
🏆 LLM Leaderboard

Welcome to the LLM Leaderboard, the definitive platform for LLM performance metrics. Our mission is to provide a centralized, comprehensive overview of the various LLM models so that users can compare and contrast their capabilities.



LLM Leaderboard

Besides the Chatbot Arena Elo rating, the source leaderboard tracks these benchmark columns: HellaSwag (few-, zero-, and one-shot), HumanEval-Python (pass@1), LAMBADA (zero- and one-shot), MMLU (zero- and few-shot), TriviaQA (zero- and one-shot), and WinoGrande (zero-, one-, and few-shot). Each model's reported scores are listed left to right in that column order; a dash means no score was reported.

| Model | Publisher | Open? | Chatbot Arena Elo | Other benchmark scores |
| --- | --- | --- | --- | --- |
| alpaca-7b | Stanford | no | – | 0.739, 0.661 |
| alpaca-13b | Stanford | no | 1008 | – |
| bloom-176b | BigScience | yes | – | 0.744, 0.155, 0.299 |
| cerebras-gpt-7b | Cerebras | yes | – | 0.636, 0.636, 0.259, 0.141 |
| cerebras-gpt-13b | Cerebras | yes | – | 0.635, 0.635, 0.258, 0.146 |
| chatglm-6b | ChatGLM | yes | 985 | – |
| chinchilla-70b | DeepMind | no | – | 0.808, 0.774, 0.675, 0.749 |
| codex-12b / code-cushman-001 | OpenAI | no | – | 0.317 |
| codegen-16B-mono | Salesforce | yes | – | 0.293 |
| codegen-16B-multi | Salesforce | yes | – | 0.183 |
| codegx-13b | Tsinghua University | no | – | 0.229 |
| dolly-v2-12b | Databricks | yes | 944 | 0.710, 0.622 |
| eleuther-pythia-7b | EleutherAI | yes | – | 0.667, 0.667, 0.265, 0.198, 0.661 |
| eleuther-pythia-12b | EleutherAI | yes | – | 0.704, 0.704, 0.253, 0.233, 0.638 |
| falcon-7b | TII | yes | – | 0.781, 0.350 |
| falcon-40b | TII | yes | – | 0.853, 0.527 |
| fastchat-t5-3b | Lmsys.org | yes | 951 | – |
| gal-120b | Meta AI | no | – | 0.526 |
| gpt-3-7b / curie | OpenAI | no | – | 0.682, 0.243 |
| gpt-3-175b / davinci | OpenAI | no | – | 0.793, 0.789, 0.439, 0.702 |
| gpt-3.5-175b / text-davinci-003 | OpenAI | no | – | 0.822, 0.834, 0.481, 0.762, 0.569, 0.758, 0.816 |
| gpt-3.5-175b / code-davinci-002 | OpenAI | no | – | 0.463 |
| gpt-4 | OpenAI | no | – | 0.953, 0.670, 0.864, 0.875 |
| gpt4all-13b-snoozy | Nomic AI | yes | – | 0.750, 0.713 |
| gpt-neox-20b | EleutherAI | yes | – | 0.718, 0.719, 0.719, 0.269, 0.276, 0.347 |
| gpt-j-6b | EleutherAI | yes | – | 0.663, 0.683, 0.683, 0.261, 0.249, 0.234 |
| koala-13b | Berkeley BAIR | no | 1082 | 0.726, 0.688 |
| llama-7b | Meta AI | no | – | 0.738, 0.105, 0.738, 0.302, 0.443, 0.701 |
| llama-13b | Meta AI | no | 932 | 0.792, 0.158, 0.730 |
| llama-33b | Meta AI | no | – | 0.828, 0.217, 0.760 |
| llama-65b | Meta AI | no | – | 0.842, 0.237, 0.634, 0.770 |
| llama-2-70b | Meta AI | yes | – | 0.873, 0.698 |
| mpt-7b | MosaicML | yes | – | 0.761, 0.702, 0.296, 0.343 |
| oasst-pythia-12b | Open Assistant | yes | 1065 | 0.681, 0.650 |
| opt-7b | Meta AI | no | – | 0.677, 0.677, 0.251, 0.227 |
| opt-13b | Meta AI | no | – | 0.692, 0.692, 0.257, 0.282 |
| opt-66b | Meta AI | no | – | 0.745, 0.276 |
| opt-175b | Meta AI | no | – | 0.791, 0.318 |
| palm-62b | Google Research | no | – | 0.770 |
| palm-540b | Google Research | no | – | 0.838, 0.834, 0.836, 0.262, 0.779, 0.818, 0.693, 0.814, 0.811, 0.837, 0.851 |
| palm-coder-540b | Google Research | no | – | 0.359 |
| palm-2-s | Google Research | no | – | 0.820, 0.807, 0.752, 0.779 |
| palm-2-s* | Google Research | no | – | 0.376 |
| palm-2-m | Google Research | no | – | 0.840, 0.837, 0.817, 0.792 |
| palm-2-l | Google Research | no | – | 0.868, 0.869, 0.861, 0.830 |
| palm-2-l-instruct | Google Research | no | – | 0.909 |
| replit-code-v1-3b | Replit | yes | – | 0.219 |
| stablelm-base-alpha-7b | Stability AI | yes | – | 0.412, 0.533, 0.251, 0.049, 0.501 |
| stablelm-tuned-alpha-7b | Stability AI | no | 858 | 0.536, 0.548 |
| starcoder-base-16b | BigCode | yes | – | 0.304 |
| starcoder-16b | BigCode | yes | – | 0.336 |
| vicuna-13b | Lmsys.org | no | 1169 | – |
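The HumanEval-Python column reports pass@1: the probability that a single generated sample passes a problem's unit tests. When several samples per problem are available, pass@k is usually computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021); below is a minimal sketch (the function name and example numbers are illustrative, not from the leaderboard):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples generated for a problem
    c: number of those samples that pass all unit tests
    k: the k in pass@k
    """
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable product.
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples, 62 passing -> pass@1 reduces to c/n = 0.31.
print(round(pass_at_k(200, 62, 1), 3))  # 0.31
```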

LLM Benchmarks

  1. Chatbot Arena Elo – an Elo rating computed from human votes on head-to-head chatbot battles (see the sketch after this list).

  2. HellaSwag – commonsense inference: choose the most plausible continuation of a short scene description.

  3. HumanEval – hand-written Python programming problems, scored as pass@k against unit tests.

  4. LAMBADA – predict the final word of a passage, which requires tracking long-range context.

  5. MMLU – multiple-choice questions spanning 57 academic and professional subjects.

  6. TriviaQA – open-domain trivia question answering.

  7. WinoGrande – large-scale Winograd-schema-style pronoun resolution.
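For reference, a single rating update under the standard Elo formula looks like the sketch below. The K-factor and starting ratings are illustrative assumptions; the actual Chatbot Arena pipeline fits ratings over its full battle log rather than one update at a time.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    # Elo model: probability that A beats B.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    # score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# Two models start at 1000; winning one battle moves the winner to 1016.
print(elo_update(1000.0, 1000.0, score_a=1.0))  # (1016.0, 984.0)
```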

Acknowledgements and Sources

The data in the LLM Leaderboard was carefully compiled from the results reported in the individual papers and by the model authors. For a detailed breakdown of sources, see the llm-leaderboard repository.

Special thanks to:

Disclaimer

The information in the LLM Leaderboard is provided for reference only. Before using a commercial model, please consult your legal counsel.