Maxime Labonne mlabonne

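| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| dolphin-2.8-mistral-7b-v02 | 38.99 | 72.22 | 51.96 | 40.41 | 50.9 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| agieval_aqua_rat | 0 | acc | 21.65 | ± | 2.59 |
| | | acc_norm | 20.47 | ± | 2.54 |
| agieval_logiqa_en | 0 | acc | 35.79 | ± | 1.88 |
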
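| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| distilabeled-Marcoro14-7B-slerp | 45.38 | 76.48 | 65.68 | 48.18 | 58.93 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| agieval_aqua_rat | 0 | acc | 27.56 | ± | 2.81 |
| | | acc_norm | 25.98 | ± | 2.76 |
| agieval_logiqa_en | 0 | acc | 39.17 | ± | 1.91 |
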
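| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| openchat-3.5-1210 | 42.62 | 72.84 | 53.21 | 43.88 | 53.14 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| agieval_aqua_rat | 0 | acc | 22.44 | ± | 2.62 |
| | | acc_norm | 24.41 | ± | 2.70 |
| agieval_logiqa_en | 0 | acc | 41.17 | ± | 1.93 |
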
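| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| openchat_3.5 | 42.67 | 72.92 | 47.27 | 42.51 | 51.34 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| agieval_aqua_rat | 0 | acc | 24.02 | ± | 2.69 |
| | | acc_norm | 24.80 | ± | 2.72 |
| agieval_logiqa_en | 0 | acc | 38.86 | ± | 1.91 |
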
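| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| zephyr-7b-beta | 37.33 | 71.83 | 55.1 | 39.7 | 50.99 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| agieval_aqua_rat | 0 | acc | 21.26 | ± | 2.57 |
| | | acc_norm | 20.47 | ± | 2.54 |
| agieval_logiqa_en | 0 | acc | 33.33 | ± | 1.85 |
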
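| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| MistralTrix-v1 | 44.98 | 76.62 | 71.44 | 47.17 | 60.05 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| agieval_aqua_rat | 0 | acc | 25.59 | ± | 2.74 |
| | | acc_norm | 24.80 | ± | 2.72 |
| agieval_logiqa_en | 0 | acc | 37.48 | ± | 1.90 |
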
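| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| Mistral-7B-Instruct-v0.2 | 38.5 | 71.64 | 66.82 | 42.29 | 54.81 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| agieval_aqua_rat | 0 | acc | 23.62 | ± | 2.67 |
| | | acc_norm | 22.05 | ± | 2.61 |
| agieval_logiqa_en | 0 | acc | 36.10 | ± | 1.88 |
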
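| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| dolphin-2.2.1-mistral-7b | 38.64 | 72.24 | 54.09 | 39.22 | 51.05 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| agieval_aqua_rat | 0 | acc | 23.23 | ± | 2.65 |
| | | acc_norm | 21.26 | ± | 2.57 |
| agieval_logiqa_en | 0 | acc | 35.48 | ± | 1.88 |
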
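| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| zephyr-7b-alpha | 38 | 72.24 | 56.06 | 40.57 | 51.72 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| agieval_aqua_rat | 0 | acc | 20.47 | ± | 2.54 |
| | | acc_norm | 19.69 | ± | 2.50 |
| agieval_logiqa_en | 0 | acc | 31.49 | ± | 1.82 |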
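
The Average column in the summary rows above appears to be the plain arithmetic mean of the four suite scores (AGIEval, GPT4All, TruthfulQA, Bigbench). The sketch below checks that assumption against the listed numbers and ranks the models by it. The helper name `suite_average` and the `rows` dictionary are illustrative only; they are not taken from the evaluation tooling that produced these tables.

```python
from statistics import mean

# Scores copied from the summary rows above:
# model -> ((AGIEval, GPT4All, TruthfulQA, Bigbench), listed Average)
rows = {
    "dolphin-2.8-mistral-7b-v02": ((38.99, 72.22, 51.96, 40.41), 50.9),
    "distilabeled-Marcoro14-7B-slerp": ((45.38, 76.48, 65.68, 48.18), 58.93),
    "openchat-3.5-1210": ((42.62, 72.84, 53.21, 43.88), 53.14),
    "openchat_3.5": ((42.67, 72.92, 47.27, 42.51), 51.34),
    "zephyr-7b-beta": ((37.33, 71.83, 55.1, 39.7), 50.99),
    "MistralTrix-v1": ((44.98, 76.62, 71.44, 47.17), 60.05),
    "Mistral-7B-Instruct-v0.2": ((38.5, 71.64, 66.82, 42.29), 54.81),
    "dolphin-2.2.1-mistral-7b": ((38.64, 72.24, 54.09, 39.22), 51.05),
    "zephyr-7b-alpha": ((38, 72.24, 56.06, 40.57), 51.72),
}


def suite_average(scores):
    """Unweighted mean of the four suite scores (assumed definition of Average)."""
    return mean(scores)


# Every listed Average should match the unweighted mean to within rounding.
mismatches = [
    model
    for model, (scores, listed) in rows.items()
    if abs(suite_average(scores) - listed) >= 0.01
]

# Rank models from highest to lowest computed average.
ranking = sorted(rows, key=lambda m: suite_average(rows[m][0]), reverse=True)

if __name__ == "__main__":
    print("mismatches:", mismatches)
    for model in ranking:
        print(f"{suite_average(rows[model][0]):6.2f}  {model}")
```

Because the mean is unweighted, a large gain on one suite (TruthfulQA varies the most across these models) moves the Average as much as the same gain on any other suite.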