AI Model Ranking (LLM Leaderboard)
Fastest AI Models
Language models ranked by inference speed and throughput
Column guide:
Model: AI model name and provider organization.
Price/1M: cost per 1 million tokens, shown as input (text you send) / output (text the model generates).
MMLU-Pro: Massive Multitask Language Understanding (Professional); tests broad knowledge across 14 subjects including STEM, humanities, and social sciences.
Speed: inference throughput in tokens per second, i.e. how fast the model generates responses.
GPQA: Graduate-level Google-Proof Q&A benchmark; tests PhD-level reasoning and advanced intelligence.
AIME 2025: American Invitational Mathematics Examination 2025; tests advanced mathematical problem-solving ability.
Release: when the model was released; newer models may have more capabilities.

| Model | Price/1M (input / output) | MMLU-Pro | Speed | GPQA | AIME 2025 | Release | Compare |
|---|---|---|---|---|---|---|---|
| #1 Mercury 2 by Inception | $0.25 / $0.75 | - | 869 tok/s | 77.0% | - | Feb 20, 2026 | |
| #2 Granite 4.0 H Small by IBM | $0.06 / $0.25 | 62.4% | 482 tok/s | 41.6% | 13.7% | Sep 22, 2025 | |
| #3 Granite 3.3 8B (Non-reasoning) by IBM | $0.03 / $0.25 | 46.8% | 409 tok/s | 33.8% | 6.7% | Apr 16, 2025 | |
| #4 Gemini 3.1 Flash-Lite Preview by Google | $0.25 / $1.50 | - | 335 tok/s | 82.2% | - | Mar 3, 2026 | |
| #5 Nova Micro by Amazon | $0.04 / $0.14 | 53.1% | 333 tok/s | 35.8% | 6.0% | Dec 3, 2024 | |
| #6 gpt-oss-20B (high) by OpenAI | $0.07 / $0.20 | 74.8% | 294 tok/s | 68.8% | 89.3% | Aug 5, 2025 | |
| #7 Ministral 3 3B by Mistral | $0.10 / $0.10 | 52.4% | 289 tok/s | 35.8% | 22.0% | Dec 2, 2025 | |
| #8 Qwen3.5 0.8B (Non-reasoning) by Alibaba | $0.01 / $0.05 | - | 283 tok/s | 23.6% | - | Mar 2, 2026 | |
| #9 Gemini 2.5 Flash-Lite (Non-reasoning) by Google | $0.10 / $0.40 | 72.4% | 279 tok/s | 47.4% | 35.3% | Jun 17, 2025 | |
| #10 Gemini 2.5 Flash-Lite (Reasoning) by Google | $0.10 / $0.40 | 75.9% | 278 tok/s | 62.5% | 53.3% | Jun 17, 2025 | |
| #11 gpt-oss-20B (low) by OpenAI | $0.07 / $0.20 | 71.8% | 272 tok/s | 61.1% | 62.3% | Aug 5, 2025 | |
| #12 Sarvam 30B (high) by Sarvam | N/A / N/A | - | 271 tok/s | 63.3% | - | Mar 6, 2026 | |
| #13 Qwen3.5 2B (Non-reasoning) by Alibaba | $0.02 / $0.10 | - | 241 tok/s | 43.8% | - | Mar 2, 2026 | |
| #14 Qwen3.6 35B A3B (Reasoning) by Alibaba | $0.38 / $2.25 | - | 238 tok/s | 84.1% | - | Apr 16, 2026 | |
| #15 Nova Lite by Amazon | $0.06 / $0.24 | 59.0% | 228 tok/s | 43.3% | 7.0% | Dec 3, 2024 | |
| #16 Gemini 2.5 Flash (Reasoning) by Google | $0.30 / $2.50 | 83.2% | 227 tok/s | 79.0% | 73.3% | May 20, 2025 | |
| #17 Nova 2.0 Omni (Non-reasoning) by Amazon | $0.30 / $2.50 | 71.9% | 223 tok/s | 55.5% | 37.0% | Nov 26, 2025 | |
| #18 Grok 4.20 0309 v2 (Reasoning) by xAI | $2.00 / $6.00 | - | 221 tok/s | 91.1% | - | Apr 7, 2026 | |
| #19 Grok 3 mini Reasoning (high) by xAI | $0.30 / $0.50 | 82.8% | 217 tok/s | 79.1% | 84.7% | Feb 19, 2025 | |
| #20 Grok 4.20 0309 (Reasoning) by xAI | $2.00 / $6.00 | - | 215 tok/s | 88.5% | - | Mar 10, 2026 | |
| #21 gpt-oss-120B (high) by OpenAI | $0.15 / $0.60 | 80.8% | 212 tok/s | 78.2% | 93.4% | Aug 5, 2025 | |
| #22 gpt-oss-120B (low) by OpenAI | $0.15 / $0.60 | 77.5% | 210 tok/s | 67.2% | 66.7% | Aug 5, 2025 | |
| #23 Grok 4 Fast (Reasoning) by xAI | $0.20 / $0.50 | 85.0% | 208 tok/s | 84.7% | 89.7% | Sep 19, 2025 | |
| #24 GPT-5 Codex (high) by OpenAI | $1.25 / $10.00 | 86.5% | 208 tok/s | 83.7% | 98.7% | Sep 23, 2025 | |
| #25 Devstral Small (Jul '25) by Mistral | $0.10 / $0.30 | 62.2% | 208 tok/s | 41.4% | 29.3% | Jul 10, 2025 | |
| #26 GPT-5.1 Codex mini (high) by OpenAI | $0.25 / $2.00 | 82.0% | 208 tok/s | 81.3% | 91.7% | Nov 13, 2025 | |
| #27 Gemini 3 Flash Preview (Non-reasoning) by Google | $0.50 / $3.00 | 88.2% | 204 tok/s | 81.2% | 55.7% | Dec 17, 2025 | |
| #28 Grok 4 Fast (Non-reasoning) by xAI | $0.20 / $0.50 | 73.0% | 204 tok/s | 60.6% | 41.3% | Sep 19, 2025 | |
| #29 Nova 2.0 Lite (low) by Amazon | $0.30 / $2.50 | 78.8% | 199 tok/s | 69.8% | 46.7% | Oct 29, 2025 | |
| #30 Nova 2.0 Lite (medium) by Amazon | $0.30 / $2.50 | 81.3% | 197 tok/s | 76.8% | 88.7% | Oct 29, 2025 | |
| #31 Gemini 3 Flash Preview (Reasoning) by Google | $0.50 / $3.00 | 89.0% | 196 tok/s | 89.8% | 97.0% | Dec 17, 2025 | |
| #32 Qwen3 0.6B (Non-reasoning) by Alibaba | $0.11 / $0.42 | 23.1% | 195 tok/s | 23.1% | 10.3% | Apr 28, 2025 | |
| #33 Mistral 7B Instruct by Mistral | $0.25 / $0.25 | 24.5% | 193 tok/s | 17.7% | - | Sep 27, 2023 | |
| #34 Ministral 3 8B by Mistral | $0.15 / $0.15 | 64.2% | 192 tok/s | 47.1% | 31.7% | Dec 2, 2025 | |
| #35 GPT-4.1 nano by OpenAI | $0.10 / $0.40 | 65.7% | 192 tok/s | 51.2% | 24.0% | Apr 14, 2025 | |
| #36 GPT-5.4 mini (xhigh) by OpenAI | $0.75 / $4.50 | - | 188 tok/s | 87.5% | - | Mar 17, 2026 | |
| #37 Step 3.5 Flash 2603 by StepFun | N/A / N/A | - | 188 tok/s | 82.6% | - | Apr 2, 2026 | |
| #38 Nova 2.0 Lite (high) by Amazon | $0.30 / $2.50 | 81.8% | 188 tok/s | 81.1% | 94.3% | Oct 29, 2025 | |
| #39 Qwen3.5 4B (Non-reasoning) by Alibaba | $0.03 / $0.15 | - | 188 tok/s | 71.2% | - | Mar 2, 2026 | |
| #40 Jamba 1.6 Mini by AI21 Labs | $0.20 / $0.40 | 36.7% | 186 tok/s | 30.0% | - | Mar 6, 2025 | |
| #41 Qwen3.5 4B (Reasoning) by Alibaba | $0.03 / $0.15 | - | 185 tok/s | 77.1% | - | Mar 2, 2026 | |
| #42 Qwen3 0.6B (Reasoning) by Alibaba | $0.11 / $1.26 | 34.7% | 185 tok/s | 23.9% | 18.0% | Apr 28, 2025 | |
| #43 Gemini 2.5 Flash (Non-reasoning) by Google | $0.30 / $2.50 | 80.9% | 183 tok/s | 68.3% | 60.3% | May 20, 2025 | |
| #44 Grok Code Fast 1 by xAI | $0.20 / $1.50 | 79.3% | 183 tok/s | 72.7% | 43.3% | Aug 28, 2025 | |
| #45 Nova 2.0 Pro Preview (Non-reasoning) by Amazon | $1.25 / $10.00 | 77.2% | 182 tok/s | 63.6% | 30.7% | Nov 27, 2025 | |
| #46 GPT-4o (Nov '24) by OpenAI | $2.50 / $10.00 | 74.8% | 182 tok/s | 54.3% | 6.0% | Nov 20, 2024 | |
| #47 Grok 4.20 0309 (Non-reasoning) by xAI | $2.00 / $6.00 | - | 177 tok/s | 78.5% | - | Mar 10, 2026 | |
| #48 GPT-5.4 mini (medium) by OpenAI | $0.75 / $4.50 | - | 177 tok/s | 82.3% | - | Mar 17, 2026 | |
| #49 Magistral Small 1.2 by Mistral | $0.50 / $1.50 | 76.8% | 177 tok/s | 66.3% | 80.3% | Sep 17, 2025 | |
| #50 Mistral Small 4 (Reasoning) by Mistral | $0.15 / $0.60 | - | 175 tok/s | 76.9% | - | Mar 16, 2026 | |
| #51 Nova 2.0 Lite (Non-reasoning) by Amazon | $0.30 / $2.50 | 74.3% | 173 tok/s | 60.3% | 33.7% | Oct 29, 2025 | |
| #52 Qwen3 Next 80B A3B Instruct by Alibaba | $0.50 / $2.00 | 81.9% | 172 tok/s | 73.8% | 66.3% | Sep 11, 2025 | |
| #53 GPT-5.1 Codex (high) by OpenAI | $1.25 / $10.00 | 86.0% | 170 tok/s | 86.0% | 95.7% | Nov 13, 2025 | |
| #54 Qwen3.5 Omni Flash by Alibaba | $0.10 / $0.80 | - | 170 tok/s | 74.2% | - | Mar 30, 2026 | |
| #55 NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) by NVIDIA | $0.20 / $0.60 | 64.9% | 170 tok/s | 43.9% | 26.7% | Oct 28, 2025 | |
| #56 Step 3.5 Flash by StepFun | $0.10 / $0.30 | - | 169 tok/s | 83.1% | - | Feb 2, 2026 | |
| #57 Grok 4.20 0309 v2 (Non-reasoning) by xAI | $2.00 / $6.00 | - | 169 tok/s | 77.6% | - | Apr 7, 2026 | |
| #58 GPT-5.4 nano (xhigh) by OpenAI | $0.20 / $1.25 | - | 168 tok/s | 81.7% | - | Mar 17, 2026 | |
| #59 Qwen3 Next 80B A3B (Reasoning) by Alibaba | $0.50 / $6.00 | 82.4% | 168 tok/s | 75.9% | 84.3% | Sep 11, 2025 | |
| #60 Qwen3.5 122B A10B (Reasoning) by Alibaba | $0.40 / $3.20 | - | 162 tok/s | 85.7% | - | Feb 24, 2026 | |
| #61 NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) by NVIDIA | $0.06 / $0.24 | 79.4% | 162 tok/s | 75.7% | 91.0% | Dec 15, 2025 | |
| #62 GPT-5.4 nano (medium) by OpenAI | $0.20 / $1.25 | - | 161 tok/s | 76.1% | - | Mar 17, 2026 | |
| #63 Mistral Small 3.2 by Mistral | $0.10 / $0.30 | 68.1% | 160 tok/s | 50.5% | 27.0% | Jun 20, 2025 | |
| #64 GPT-5.4 mini (Non-Reasoning) by OpenAI | $0.75 / $4.50 | - | 160 tok/s | 60.6% | - | Mar 17, 2026 | |
| #65 Llama 3.1 Instruct 8B by Meta | $0.10 / $0.10 | 47.6% | 160 tok/s | 25.9% | 4.3% | Jul 23, 2024 | |
| #66 NVIDIA Nemotron 3 Super 120B A12B (Reasoning) by NVIDIA | $0.30 / $0.75 | - | 159 tok/s | 80.0% | - | Mar 11, 2026 | |
| #67 o3-mini by OpenAI | $1.10 / $4.40 | 79.1% | 158 tok/s | 74.8% | - | Jan 31, 2025 | |
| #68 o3-mini (high) by OpenAI | $1.10 / $4.40 | 80.2% | 156 tok/s | 77.3% | - | Jan 31, 2025 | |
| #69 Grok 4.1 Fast (Reasoning) by xAI | $0.20 / $0.50 | 85.4% | 155 tok/s | 85.3% | 89.3% | Nov 19, 2025 | |
| #70 Qwen3.5 122B A10B (Non-reasoning) by Alibaba | $0.40 / $3.20 | - | 155 tok/s | 82.7% | - | Feb 24, 2026 | |
| #71 GPT-5 (ChatGPT) by OpenAI | $1.25 / $10.00 | 82.0% | 155 tok/s | 68.6% | 48.3% | Aug 7, 2025 | |
| #72 Mistral Small (Sep '24) by Mistral | $0.20 / $0.60 | 52.9% | 155 tok/s | 38.1% | - | Sep 17, 2024 | |
| #73 GPT-5.4 nano (Non-Reasoning) by OpenAI | $0.20 / $1.25 | - | 154 tok/s | 55.8% | - | Mar 17, 2026 | |
| #74 Nova 2.0 Pro Preview (low) by Amazon | $1.25 / $10.00 | 82.2% | 154 tok/s | 75.1% | 63.3% | Nov 27, 2025 | |
| #75 GPT-5 nano (medium) by OpenAI | $0.05 / $0.40 | 77.2% | 154 tok/s | 67.0% | 78.3% | Aug 7, 2025 | |
| #76 Qwen3 Coder Next by Alibaba | $0.35 / $1.20 | - | 154 tok/s | 73.7% | - | Feb 3, 2026 | |
| #77 NVIDIA Nemotron Nano 12B v2 VL (Reasoning) by NVIDIA | $0.20 / $0.60 | 75.9% | 152 tok/s | 57.2% | 75.0% | Oct 28, 2025 | |
| #78 Mistral Small 3 by Mistral | $0.10 / $0.30 | 65.2% | 152 tok/s | 46.2% | 4.3% | Jan 30, 2025 | |
| #79 Grok 4.1 Fast (Non-reasoning) by xAI | $0.20 / $0.50 | 74.3% | 151 tok/s | 63.7% | 34.3% | Nov 19, 2025 | |
| #80 Mistral Small 3.1 by Mistral | $0.10 / $0.30 | 65.9% | 149 tok/s | 45.4% | 3.7% | Mar 17, 2025 | |
| #81 Qwen3 30B A3B 2507 (Reasoning) by Alibaba | $0.20 / $2.40 | 80.5% | 148 tok/s | 70.7% | 56.3% | Jul 30, 2025 | |
| #82 LFM2 24B A2B by Liquid AI | $0.03 / $0.12 | - | 148 tok/s | 47.4% | - | Feb 25, 2026 | |
| #83 Mistral Small 4 (Non-reasoning) by Mistral | $0.15 / $0.60 | - | 147 tok/s | 57.1% | - | Mar 16, 2026 | |
| #84 GPT-5 nano (high) by OpenAI | $0.05 / $0.40 | 78.0% | 147 tok/s | 67.6% | 83.7% | Aug 7, 2025 | |
| #85 Mistral Small (Feb '24) by Mistral | $1.00 / $3.00 | 41.9% | 147 tok/s | 30.2% | - | Feb 26, 2024 | |
| #86 Qwen3.5 35B A3B (Reasoning) by Alibaba | $0.25 / $2.00 | - | 146 tok/s | 84.5% | - | Feb 24, 2026 | |
| #87 Qwen3 VL 8B Instruct by Alibaba | $0.18 / $0.70 | 68.6% | 145 tok/s | 42.7% | 27.3% | Oct 14, 2025 | |
| #88 Nova 2.0 Pro Preview (medium) by Amazon | $1.25 / $10.00 | 83.0% | 144 tok/s | 78.5% | 89.0% | Nov 27, 2025 | |
| #89 LongCat Flash Lite by LongCat | N/A / N/A | - | 143 tok/s | 63.6% | - | Jan 28, 2026 | |
| #90 Claude 4.5 Haiku (Reasoning) by Anthropic | $1.00 / $5.00 | 76.0% | 143 tok/s | 67.2% | 83.7% | Oct 15, 2025 | |
| #91 Qwen3.5 9B (Non-reasoning) by Alibaba | $0.04 / $0.20 | - | 143 tok/s | 78.6% | - | Mar 2, 2026 | |
| #92 GPT-5 nano (minimal) by OpenAI | $0.05 / $0.40 | 55.6% | 142 tok/s | 42.8% | 27.3% | Aug 7, 2025 | |
| #93 o4-mini (high) by OpenAI | $1.10 / $4.40 | 83.2% | 141 tok/s | 78.4% | 90.7% | Apr 16, 2025 | |
| #94 Qwen3 1.7B (Non-reasoning) by Alibaba | $0.11 / $0.42 | 41.1% | 141 tok/s | 28.3% | 7.3% | Apr 28, 2025 | |
| #95 Qwen3.5 35B A3B (Non-reasoning) by Alibaba | $0.25 / $2.00 | - | 141 tok/s | 81.9% | - | Feb 24, 2026 | |
| #96 Qwen3 1.7B (Reasoning) by Alibaba | $0.11 / $1.26 | 57.0% | 140 tok/s | 35.6% | 38.7% | Apr 28, 2025 | |
| #97 Devstral Medium by Mistral | $0.40 / $2.00 | 70.8% | 140 tok/s | 49.2% | 4.7% | Jul 10, 2025 | |
| #98 MiMo-V2-Flash (Non-reasoning) by Xiaomi | $0.10 / $0.30 | 74.4% | 139 tok/s | 65.6% | 67.7% | Dec 16, 2025 | |
| #99 GLM-4.7-Flash (Non-reasoning) by Z AI | $0.07 / $0.40 | - | 139 tok/s | 45.2% | - | Jan 19, 2026 | |
| #100 MiMo-V2-Flash (Reasoning) by Xiaomi | $0.10 / $0.30 | 84.3% | 134 tok/s | 84.6% | 96.3% | Dec 16, 2025 | |
Showing 100 of 477 models
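For readers who want to work with the rows above programmatically, here is a minimal parsing sketch. It assumes the pipe-delimited layout shown in the table; the helper names (`parse_percent`, `parse_price`, `parse_row`) and the returned dictionary keys are hypothetical and not part of any official API.

```python
import re
from typing import Optional

# One row copied verbatim from the table above.
ROW = "| #6 gpt-oss-20B (high) by OpenAI | $0.07 / $0.20 | 74.8% | 294 tok/s | 68.8% | 89.3% | Aug 5, 2025 | |"


def parse_percent(cell: str) -> Optional[float]:
    """'74.8%' becomes 74.8; '-' (no score listed) becomes None."""
    return None if cell == "-" else float(cell.rstrip("%"))


def parse_price(cell: str) -> Optional[float]:
    """'$0.07' becomes 0.07; 'N/A' becomes None."""
    return None if cell == "N/A" else float(cell.lstrip("$"))


def parse_row(row: str) -> dict:
    """Split one table row into typed fields (hypothetical field names)."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    # Ignore the trailing, empty "Compare" cell if present.
    model, price, mmlu, speed, gpqa, aime, release = cells[:7]

    match = re.fullmatch(r"#(\d+) (.+) by (.+)", model)
    if match is None:
        raise ValueError(f"unexpected model cell: {model!r}")
    input_cell, output_cell = price.split(" / ")

    return {
        "rank": int(match.group(1)),
        "name": match.group(2),
        "provider": match.group(3),
        "input_price": parse_price(input_cell),    # USD per 1M input tokens
        "output_price": parse_price(output_cell),  # USD per 1M output tokens
        "mmlu_pro": parse_percent(mmlu),
        "tokens_per_second": float(speed.split()[0]),
        "gpqa": parse_percent(gpqa),
        "aime_2025": parse_percent(aime),
        "release": release,  # kept as text, e.g. "Aug 5, 2025"
    }


parsed = parse_row(ROW)
print(parsed["name"], parsed["tokens_per_second"])
```

Missing scores and prices are mapped to `None` rather than 0 so that later filters do not mistake an unlisted benchmark for a failing one.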
Understanding the AI Model Leaderboard
This AI model leaderboard helps you compare large language models (LLMs) and choose the best one for your needs. We track standardized AI benchmarks, token pricing, inference speed, and model capabilities across major AI providers such as OpenAI, Anthropic, Google, Meta, and DeepSeek.
Core AI Benchmarks Explained
Key Metrics to Consider
How to Choose the Right AI Model for Your Use Case
For Research & Analysis
Prioritize models with high MMLU-Pro (70%+) and GPQA (60%+) scores for complex reasoning tasks, academic research, and technical documentation.
For Cost Optimization
Sort by input/output pricing. As a rule of thumb, smaller models often deliver roughly 80% of flagship performance at about 10% of the cost on simple tasks.
For Math & STEM
Filter by Math Index or AIME 2025 scores (50%+) for quantitative analysis, engineering calculations, and scientific applications.
All benchmark scores and pricing data are updated daily from Artificial Analysis to reflect the latest model versions and capabilities. Use the sort filters above to find AI models by intelligence, cost, coding ability, math performance, speed, or release date.
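The selection rules in this section can be sketched as a small filter. This is an illustration only, not the site's own implementation: the `Model` dataclass and the `clears` and `pick` helpers are hypothetical names, the four sample rows are copied from the table above, and the thresholds are the ones suggested in this section.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Model:
    name: str
    input_price: float          # USD per 1M input tokens
    output_price: float         # USD per 1M output tokens
    mmlu_pro: Optional[float]   # percent; None where the table shows "-"
    gpqa: Optional[float]
    aime_2025: Optional[float]


# Sample rows copied from the leaderboard table above.
MODELS = [
    Model("Granite 4.0 H Small", 0.06, 0.25, 62.4, 41.6, 13.7),
    Model("gpt-oss-120B (high)", 0.15, 0.60, 80.8, 78.2, 93.4),
    Model("Grok 4 Fast (Reasoning)", 0.20, 0.50, 85.0, 84.7, 89.7),
    Model("Mercury 2", 0.25, 0.75, None, 77.0, None),
]


def clears(score: Optional[float], minimum: Optional[float]) -> bool:
    """True if no minimum is set, or the score exists and reaches it."""
    if minimum is None:
        return True
    return score is not None and score >= minimum


def pick(models: List[Model],
         min_mmlu_pro: Optional[float] = None,
         min_gpqa: Optional[float] = None,
         min_aime: Optional[float] = None) -> List[Model]:
    """Keep models that clear every threshold, cheapest output price first."""
    chosen = [
        m for m in models
        if clears(m.mmlu_pro, min_mmlu_pro)
        and clears(m.gpqa, min_gpqa)
        and clears(m.aime_2025, min_aime)
    ]
    return sorted(chosen, key=lambda m: m.output_price)


research = pick(MODELS, min_mmlu_pro=70, min_gpqa=60)  # research and analysis
math_stem = pick(MODELS, min_aime=50)                  # math and STEM
print([m.name for m in research])
print([m.name for m in math_stem])
```

A model with a missing score is excluded whenever a threshold is set for that benchmark, which is the cautious reading of a "-" entry in the table.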
Frequently Asked Questions
What is MMLU-Pro and why is it the standard AI intelligence benchmark?
MMLU-Pro (Massive Multitask Language Understanding - Professional) is a broad knowledge and reasoning benchmark that tests models across 14 academic subjects, including mathematics, science, history, law, and philosophy. Among the 100 models listed above that report a score, results range from 23.1% (Qwen3 0.6B, non-reasoning) to 89.0% (Gemini 3 Flash Preview, reasoning). Models scoring above 75% demonstrate strong general intelligence suitable for professional applications, while scores below 60% indicate limitations in complex reasoning tasks.
What does GPQA measure and which models score highest?
GPQA (Graduate-level Google-Proof Q&A) tests PhD-level reasoning with questions designed to be "Google-proof", meaning they require deep understanding rather than simple fact retrieval. High scorers include GPT-5.1 (87.3%), GPT-5 mini (82.8%), and o3 (82.7%); in the table above, Grok 4.20 0309 v2 (Reasoning) reaches 91.1%. Models at this level are well suited to research, technical analysis, and complex problem-solving. Models below 50% GPQA struggle with advanced reasoning and may provide superficial answers to complex questions.
What is AIME 2025 and how does it evaluate AI mathematical ability?
AIME 2025 (American Invitational Mathematics Examination) is an elite math competition benchmark that tests advanced problem-solving, algebra, geometry, and number theory. Scores above 80% (like GPT-5 Codex at 98.7% or GPT-5.1 at 94%) indicate exceptional mathematical reasoning suitable for engineering, scientific computing, and quantitative analysis. Models scoring below 50% may struggle with multi-step mathematical problems or require explicit problem breakdown.
How is AI model pricing calculated and what's considered cost-effective?
AI model pricing is measured per 1 million tokens (approximately 750,000 words of English text). Input pricing covers the text you send, while output pricing covers the generated response. Budget models like GPT-5 nano cost $0.05/$0.40 per million tokens, mid-tier models like Llama 3.3 70B cost $0.54/$0.71, and premium models like GPT-5 cost $1.25/$10. For a typical application with a 3:1 input-to-output token ratio, budget models can be an order of magnitude or more cheaper than flagship models (about 25x for GPT-5 nano versus GPT-5 at these prices) while often retaining roughly 70-80% of the performance.
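The 3:1 comparison can be checked with a short calculation. The sketch below assumes a blended price defined as the weighted average of input and output price; the function name `blended_price` is hypothetical, and the prices are the ones quoted in this answer.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: float = 3.0, output_parts: float = 1.0) -> float:
    """Weighted average price in USD per 1M tokens for an input:output mix."""
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts


# (input, output) prices in USD per 1M tokens, as quoted in this answer.
PRICES = {
    "GPT-5 nano": (0.05, 0.40),
    "Llama 3.3 70B": (0.54, 0.71),
    "GPT-5": (1.25, 10.00),
}

flagship = blended_price(*PRICES["GPT-5"])
for name, (inp, out) in PRICES.items():
    blended = blended_price(inp, out)
    print(f"{name}: ${blended:.4f} per 1M tokens; "
          f"GPT-5 costs {flagship / blended:.1f}x as much")
```

At these prices the blended cost comes to about $0.14 for GPT-5 nano, $0.58 for Llama 3.3 70B, and $3.44 for GPT-5, so the gap to the flagship depends heavily on which smaller model is chosen.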
Which AI models are best for coding and programming tasks?
Sort by Coding Index to see the top programming models. Our Coding Index combines LiveCodeBench, SciCode, and other coding benchmarks. Top performers include GPT-5.1 (57.5 index), GPT-5 Codex (53.5), and GPT-5 mini (51.4). These models excel at code generation, debugging, refactoring, and explaining complex algorithms. For budget-conscious developers, models with a Coding Index of 40 or higher offer excellent value for routine programming tasks.
How often are AI model benchmarks and rankings updated?
Our leaderboard syncs daily with the Artificial Analysis API so that benchmark scores (MMLU-Pro, GPQA, AIME 2025), pricing, and inference speed data reflect the latest model versions. New model releases appear under the "Newest" sort option as soon as they are synced. Benchmark scores can change when providers release updated versions. For example, GPT-5.1, released in November 2025, achieved an intelligence score of 69.7, compared with 68.5 for GPT-5 from August 2025.
What inference speed (tokens/second) do I need for my application?
Inference speed determines how fast models generate responses. For real-time chatbots and interactive applications, target 100+ tokens/second (for example, gpt-oss-120B (high) is listed above at 212 tok/s). For background processing and batch jobs, 50-100 tok/s is sufficient. Premium reasoning models like GPT-5 (103 tok/s) balance speed and capability. Note that higher inference speed doesn't always mean better quality: slower models often deliver more thoughtful, detailed responses.
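To translate a tokens-per-second figure into a wait time, divide the expected response length by the throughput. The sketch below is a simplification: it assumes a fixed 500-token response, ignores time to first token and any hidden reasoning tokens, and uses a hypothetical helper name, `generation_seconds`.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Approximate time to stream a response, excluding time to first token."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


RESPONSE_TOKENS = 500  # assumed response length, for illustration only

# 50 and 100 tok/s are the thresholds named above; 212 tok/s is the
# gpt-oss-120B (high) figure from the table.
for speed in (50, 100, 212):
    seconds = generation_seconds(RESPONSE_TOKENS, speed)
    print(f"{speed} tok/s: about {seconds:.1f} s for {RESPONSE_TOKENS} tokens")
```

Under these assumptions a 500-token reply takes about 10 seconds at 50 tok/s and about 5 seconds at 100 tok/s, which is why the higher figure is the usual target for interactive use.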
Can I test these AI models for free before committing?
Yes! Try our free AI chat interface to test different models instantly without creating an account. Many providers also offer free tiers: OpenAI (ChatGPT with daily limits), Anthropic (Claude with usage caps), and Google (Gemini free tier). Open-weight models such as Llama 3.3 can also be downloaded and self-hosted. Compare performance on your specific use case before upgrading to a paid plan.