
🏆 LLM Leaderboard

Welcome to the LLM Leaderboard, the definitive platform for LLM performance metrics. Our mission is to provide a centralized, comprehensive overview of available LLMs so that users can compare and contrast their capabilities.

Open Models: At the LLM Leaderboard, we champion transparency. Models labeled as "open" can be deployed locally and used for commercial purposes.
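For instance, an open model from the table below can be run on your own hardware with an inference framework such as Hugging Face Transformers. The snippet is a minimal sketch, assuming the tiiuae/falcon-7b checkpoint and the transformers and accelerate libraries; swap in any other open model and adjust the prompt and generation settings to your setup.

```python
# Minimal local-inference sketch (assumes: pip install transformers accelerate torch).
# The model id and prompt are illustrative; any "open" model from the table works.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b"  # an open, locally deployable model from the leaderboard
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Large language models are evaluated on benchmarks such as"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```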

LLM Leaderboard

Scores are reported on the benchmarks listed under LLM Benchmarks below (Chatbot Arena Elo, HellaSwag, HumanEval-Python pass@1, LAMBADA, MMLU, TriviaQA, and WinoGrande) in zero-shot, one-shot, or few-shot settings. The full per-benchmark breakdown and source links for each figure are maintained in the llm-leaderboard repository (see Acknowledgements & Sources).

| Model | Publisher | Open? | Chatbot Arena Elo | Reported benchmark scores |
| --- | --- | --- | --- | --- |
| alpaca-7b | Stanford | no | | 0.739, 0.661 |
| alpaca-13b | Stanford | no | 1008 | |
| bloom-176b | BigScience | yes | | 0.744, 0.155, 0.299 |
| cerebras-gpt-7b | Cerebras | yes | | 0.636, 0.636, 0.259, 0.141 |
| cerebras-gpt-13b | Cerebras | yes | | 0.635, 0.635, 0.258, 0.146 |
| chatglm-6b | ChatGLM | yes | 985 | |
| chinchilla-70b | DeepMind | no | | 0.808, 0.774, 0.675, 0.749 |
| codex-12b / code-cushman-001 | OpenAI | no | | 0.317 |
| codegen-16B-mono | Salesforce | yes | | 0.293 |
| codegen-16B-multi | Salesforce | yes | | 0.183 |
| codegx-13b | Tsinghua University | no | | 0.229 |
| dolly-v2-12b | Databricks | yes | 944 | 0.710, 0.622 |
| eleuther-pythia-7b | EleutherAI | yes | | 0.667, 0.667, 0.265, 0.198, 0.661 |
| eleuther-pythia-12b | EleutherAI | yes | | 0.704, 0.704, 0.253, 0.233, 0.638 |
| falcon-7b | TII | yes | | 0.781, 0.350 |
| falcon-40b | TII | yes | | 0.853, 0.527 |
| fastchat-t5-3b | Lmsys.org | yes | 951 | |
| gal-120b | Meta AI | no | | 0.526 |
| gpt-3-7b / curie | OpenAI | no | | 0.682, 0.243 |
| gpt-3-175b / davinci | OpenAI | no | | 0.793, 0.789, 0.439, 0.702 |
| gpt-3.5-175b / text-davinci-003 | OpenAI | no | | 0.822, 0.834, 0.481, 0.762, 0.569, 0.758, 0.816 |
| gpt-3.5-175b / code-davinci-002 | OpenAI | no | | 0.463 |
| gpt-4 | OpenAI | no | | 0.953, 0.670, 0.864, 0.875 |
| gpt4all-13b-snoozy | Nomic AI | yes | | 0.750, 0.713 |
| gpt-neox-20b | EleutherAI | yes | | 0.718, 0.719, 0.719, 0.269, 0.276, 0.347 |
| gpt-j-6b | EleutherAI | yes | | 0.663, 0.683, 0.683, 0.261, 0.249, 0.234 |
| koala-13b | Berkeley BAIR | no | 1082 | 0.726, 0.688 |
| llama-7b | Meta AI | no | | 0.738, 0.105, 0.738, 0.302, 0.443, 0.701 |
| llama-13b | Meta AI | no | 932 | 0.792, 0.158, 0.730 |
| llama-33b | Meta AI | no | | 0.828, 0.217, 0.760 |
| llama-65b | Meta AI | no | | 0.842, 0.237, 0.634, 0.770 |
| llama-2-70b | Meta AI | yes | | 0.873, 0.698 |
| mpt-7b | MosaicML | yes | | 0.761, 0.702, 0.296, 0.343 |
| oasst-pythia-12b | Open Assistant | yes | 1065 | 0.681, 0.650 |
| opt-7b | Meta AI | no | | 0.677, 0.677, 0.251, 0.227 |
| opt-13b | Meta AI | no | | 0.692, 0.692, 0.257, 0.282 |
| opt-66b | Meta AI | no | | 0.745, 0.276 |
| opt-175b | Meta AI | no | | 0.791, 0.318 |
| palm-62b | Google Research | no | | 0.770 |
| palm-540b | Google Research | no | | 0.838, 0.834, 0.836, 0.262, 0.779, 0.818, 0.693, 0.814, 0.811, 0.837, 0.851 |
| palm-coder-540b | Google Research | no | | 0.359 |
| palm-2-s | Google Research | no | | 0.820, 0.807, 0.752, 0.779 |
| palm-2-s* | Google Research | no | | 0.376 |
| palm-2-m | Google Research | no | | 0.840, 0.837, 0.817, 0.792 |
| palm-2-l | Google Research | no | | 0.868, 0.869, 0.861, 0.830 |
| palm-2-l-instruct | Google Research | no | | 0.909 |
| replit-code-v1-3b | Replit | yes | | 0.219 |
| stablelm-base-alpha-7b | Stability AI | yes | | 0.412, 0.533, 0.251, 0.049, 0.501 |
| stablelm-tuned-alpha-7b | Stability AI | no | 858 | 0.536, 0.548 |
| starcoder-base-16b | BigCode | yes | | 0.304 |
| starcoder-16b | BigCode | yes | | 0.336 |
| vicuna-13b | Lmsys.org | no | 1169 | |

LLM Benchmarks

  1. Chatbot Arena Elo: an Elo-style rating derived from crowdsourced, randomized head-to-head human votes on the LMSYS Chatbot Arena (a rating-update sketch follows this list).

  2. HellaSwag: commonsense inference, where the model must pick the most plausible continuation of a short everyday scenario.

  3. HumanEval: hand-written Python programming problems scored by unit tests; the table reports pass@1 (an estimator sketch follows this list).

  4. LAMBADA: word prediction that requires tracking long-range context to complete the final word of a passage.

  5. MMLU: Massive Multitask Language Understanding, multiple-choice questions spanning 57 academic and professional subjects.

  6. TriviaQA: open-domain question answering over trivia questions.

  7. WinoGrande: large-scale Winograd-schema-style pronoun resolution that tests commonsense reasoning.
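The Chatbot Arena Elo column is built from crowdsourced head-to-head votes. Below is a minimal sketch of a standard Elo update; the K-factor of 32 and the 1000-point starting rating are illustrative assumptions, not the exact constants behind the table (LMSYS has also published Bradley-Terry-based ratings).

```python
# Minimal sketch of an Elo-style rating update for pairwise chatbot comparisons.
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """score_a is 1.0 if A won the human vote, 0.0 if it lost, 0.5 for a tie."""
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# Example: two models start at an assumed 1000 rating; A wins one vote.
a, b = update_elo(1000.0, 1000.0, score_a=1.0)
print(round(a), round(b))  # 1016 984
```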
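The HumanEval-Python pass@1 figures are conventionally computed with the unbiased pass@k estimator introduced in the Codex paper (Chen et al., 2021). A minimal sketch, where n is the number of samples generated per problem and c the number that pass the unit tests:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from Chen et al. (2021).
    n: samples generated per problem, c: samples passing all unit tests, k: budget."""
    if n - c < k:
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed in a numerically stable product form
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

# Example: 200 completions per problem, 67 pass the tests -> pass@1 = 0.335
print(pass_at_k(n=200, c=67, k=1))
```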

Acknowledgements & Sources

Data on the LLM Leaderboard is sourced from the individual papers and the results published by model authors. For a detailed source breakdown, visit the llm-leaderboard repository.


Disclaimer

Information on the LLM Leaderboard is provided for reference only. Before using any model commercially, review its license and consult legal counsel.