AI Model Ranking (LLM Leaderboard)

Newest AI Models

Latest language model releases sorted by date

Model	Input/1M	Output/1M	MMLU-Pro	GPQA	AIME 2025	Release	Compare
#1 Mercury 2 by Inception	$0.25	$0.75	-	77.0%	-	Feb 20, 2026	Chat now
#2 Gemini 3.1 Pro Preview by Google	$2.00	$12.00	-	94.1%	-	Feb 19, 2026	Chat now
#3 Claude Sonnet 4.6 (Non-reasoning, High Effort) by Anthropic	$3.00	$15.00	-	79.9%	-	Feb 17, 2026	Chat now
#4 Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) by Anthropic	$3.00	$15.00	-	87.5%	-	Feb 17, 2026	Chat now
#5 Claude Sonnet 4.6 (Non-reasoning, Low Effort) by Anthropic	$3.00	$15.00	-	79.7%	-	Feb 17, 2026	Chat now
#6 Tiny Aya Global by Cohere	N/A	N/A	-	30.5%	-	Feb 17, 2026	Chat now
#7 Qwen3.5 397B A17B (Reasoning) by Alibaba	$0.60	$3.60	-	89.3%	-	Feb 16, 2026	Chat now
#8 Qwen3.5 397B A17B (Non-reasoning) by Alibaba	$0.60	$3.60	-	86.1%	-	Feb 16, 2026	Chat now
#9 Doubao Seed 2.0 lite (Reasoning) by ByteDance Seed	N/A	N/A	-	65.6%	-	Feb 15, 2026	Chat now
#10 MiniMax-M2.5 by MiniMax	$0.30	$1.20	-	84.8%	-	Feb 12, 2026	Chat now
#11 GLM-5 (Reasoning) by Z AI	$1.00	$3.20	-	82.0%	-	Feb 11, 2026	Chat now
#12 GLM-5 (Non-reasoning) by Z AI	$1.00	$3.20	-	66.6%	-	Feb 11, 2026	Chat now
#13 Tri-21B-Think by Trillion Labs	N/A	N/A	-	60.1%	-	Feb 10, 2026	Chat now
#14 Tri-21B-think Preview by Trillion Labs	N/A	N/A	-	53.8%	-	Feb 10, 2026	Chat now
#15 Claude Opus 4.6 (Adaptive Reasoning, Max Effort) by Anthropic	$5.00	$25.00	-	89.6%	-	Feb 5, 2026	Chat now
#16 Claude Opus 4.6 (Non-reasoning, High Effort) by Anthropic	$5.00	$25.00	-	84.0%	-	Feb 5, 2026	Chat now
#17 Qwen3 Coder Next by Alibaba	$0.20	$1.20	-	73.7%	-	Feb 3, 2026	Chat now
#18 Kimi K2.5 (Reasoning) by Kimi	$0.60	$3.00	-	87.9%	-	Jan 27, 2026	Chat now
#19 Kimi K2.5 (Non-reasoning) by Kimi	$0.60	$3.00	-	78.9%	-	Jan 27, 2026	Chat now
#20 Qwen3 Max Thinking by Alibaba	$1.20	$6.00	-	86.1%	-	Jan 26, 2026	Chat now
#21 LFM2.5-1.2B-Thinking by Liquid AI	N/A	N/A	-	33.9%	-	Jan 20, 2026	Chat now
#22 Step3 VL 10B by StepFun	N/A	N/A	-	69.0%	-	Jan 20, 2026	Chat now
#23 GLM-4.7-Flash (Reasoning) by Z AI	$0.07	$0.40	-	58.1%	-	Jan 19, 2026	Chat now
#24 GLM-4.7-Flash (Non-reasoning) by Z AI	$0.07	$0.40	-	45.2%	-	Jan 19, 2026	Chat now
#25 Olmo 3.1 32B Instruct by Allen Institute for AI	$0.20	$0.60	-	53.9%	-	Jan 13, 2026	Chat now
#26 LFM2.5-VL-1.6B by Liquid AI	N/A	N/A	-	28.9%	-	Jan 5, 2026	Chat now
#27 LFM2.5-1.2B-Instruct by Liquid AI	N/A	N/A	-	32.6%	-	Jan 5, 2026	Chat now
#28 Falcon-H1R-7B by TII UAE	N/A	N/A	72.5%	66.1%	80.0%	Jan 4, 2026	Chat now
#29 K-EXAONE (Reasoning) by LG AI Research	N/A	N/A	83.8%	78.3%	90.3%	Dec 31, 2025	Chat now
#30 K-EXAONE (Non-reasoning) by LG AI Research	N/A	N/A	81.0%	69.5%	44.0%	Dec 31, 2025	Chat now
#31 HyperCLOVA X SEED Think (32B) by Naver	N/A	N/A	78.5%	61.5%	59.0%	Dec 26, 2025	Chat now
#32 MiniMax-M2.1 by MiniMax	$0.30	$1.20	87.5%	83.0%	82.7%	Dec 23, 2025	Chat now
#33 GLM-4.7 (Reasoning) by Z AI	$0.55	$2.15	85.6%	85.9%	95.0%	Dec 22, 2025	Chat now
#34 GLM-4.7 (Non-reasoning) by Z AI	$0.55	$2.15	79.4%	66.4%	48.0%	Dec 22, 2025	Chat now
#35 Gemini 3 Flash Preview (Non-reasoning) by Google	$0.50	$3.00	88.2%	81.2%	55.7%	Dec 17, 2025	Chat now
#36 Gemini 3 Flash Preview (Reasoning) by Google	$0.50	$3.00	89.0%	89.8%	97.0%	Dec 17, 2025	Chat now
#37 Grok Voice Agent by xAI	N/A	N/A	-	-	-	Dec 17, 2025	Chat now
#38 Solar Open 100B (Reasoning) by Upstage	N/A	N/A	-	65.7%	-	Dec 17, 2025	Chat now
#39 MiMo-V2-Flash (Non-reasoning) by Xiaomi	$0.10	$0.30	74.4%	65.6%	67.7%	Dec 16, 2025	Chat now
#40 MiMo-V2-Flash (Feb 2026) by Xiaomi	$0.10	$0.30	-	83.5%	-	Dec 16, 2025	Chat now
#41 MiMo-V2-Flash (Reasoning) by Xiaomi	$0.10	$0.30	84.3%	84.6%	96.3%	Dec 16, 2025	Chat now
#42 NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) by NVIDIA	$0.06	$0.24	79.4%	75.7%	91.0%	Dec 15, 2025	Chat now
#43 NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) by NVIDIA	$0.05	$0.20	57.9%	39.9%	13.3%	Dec 15, 2025	Chat now
#44 K2 Think V2 by MBZUAI Institute of Foundation Models	N/A	N/A	-	71.3%	-	Dec 15, 2025	Chat now
#45 Olmo 3.1 32B Think by Allen Institute for AI	N/A	N/A	76.3%	59.1%	77.3%	Dec 12, 2025	Chat now
#46 GPT-5.2 (medium) by OpenAI	$1.75	$14.00	85.9%	86.4%	96.7%	Dec 11, 2025	Chat now
#47 GPT-5.2 (Non-reasoning) by OpenAI	$1.75	$14.00	81.4%	71.2%	51.0%	Dec 11, 2025	Chat now
#48 GPT-5.2 (xhigh) by OpenAI	$1.75	$14.00	87.4%	90.3%	99.0%	Dec 11, 2025	Chat now
#49 GPT-5.2 Codex (xhigh) by OpenAI	$1.75	$14.00	-	89.9%	-	Dec 11, 2025	Chat now
#50 Molmo2-8B by Allen Institute for AI	N/A	N/A	-	42.5%	-	Dec 11, 2025	Chat now
#51 Mi:dm K 2.5 Pro by Korea Telecom	N/A	N/A	80.9%	70.1%	76.7%	Dec 11, 2025	Chat now
#52 Mi:dm K 2.5 Pro Preview by Korea Telecom	N/A	N/A	81.3%	72.2%	78.7%	Dec 11, 2025	Chat now
#53 Devstral 2 by Mistral	N/A	N/A	76.2%	59.4%	36.7%	Dec 9, 2025	Chat now
#54 Devstral Small 2 by Mistral	N/A	N/A	67.8%	53.2%	34.3%	Dec 9, 2025	Chat now
#55 GLM-4.6V (Reasoning) by Z AI	$0.30	$0.90	79.9%	71.9%	85.3%	Dec 8, 2025	Chat now
#56 GLM-4.6V (Non-reasoning) by Z AI	$0.30	$0.90	75.2%	56.6%	26.3%	Dec 8, 2025	Chat now
#57 K2-V2 (medium) by MBZUAI Institute of Foundation Models	N/A	N/A	76.1%	59.8%	64.7%	Dec 5, 2025	Chat now
#58 K2-V2 (low) by MBZUAI Institute of Foundation Models	N/A	N/A	71.3%	54.1%	35.3%	Dec 5, 2025	Chat now
#59 K2-V2 (high) by MBZUAI Institute of Foundation Models	N/A	N/A	78.6%	68.1%	78.3%	Dec 5, 2025	Chat now
#60 Motif-2-12.7B-Reasoning by Motif Technologies	N/A	N/A	79.6%	69.5%	80.3%	Dec 4, 2025	Chat now
#61 Ministral 3 14B by Mistral	$0.20	$0.20	69.3%	57.2%	30.0%	Dec 2, 2025	Chat now
#62 Ministral 3 8B by Mistral	$0.15	$0.15	64.2%	47.1%	31.7%	Dec 2, 2025	Chat now
#63 Mistral Large 3 by Mistral	$0.50	$1.50	80.7%	68.0%	38.0%	Dec 2, 2025	Chat now
#64 Ministral 3 3B by Mistral	$0.10	$0.10	52.4%	35.8%	22.0%	Dec 2, 2025	Chat now
#65 DeepSeek V3.2 (Non-reasoning) by DeepSeek	$0.28	$0.42	83.7%	75.1%	59.0%	Dec 1, 2025	Chat now
#66 DeepSeek V3.2 (Reasoning) by DeepSeek	$0.28	$0.42	86.2%	84.0%	92.0%	Dec 1, 2025	Chat now
#67 DeepSeek V3.2 Speciale by DeepSeek	N/A	N/A	86.3%	87.1%	96.7%	Dec 1, 2025	Chat now
#68 Nova 2.0 Pro Preview (medium) by Amazon	$1.25	$10.00	83.0%	78.5%	89.0%	Nov 27, 2025	Chat now
#69 Nova 2.0 Pro Preview (Non-reasoning) by Amazon	$1.25	$10.00	77.2%	63.6%	30.7%	Nov 27, 2025	Chat now
#70 Nova 2.0 Pro Preview (low) by Amazon	$1.25	$10.00	82.2%	75.1%	63.3%	Nov 27, 2025	Chat now
#71 INTELLECT-3 by Prime Intellect	N/A	N/A	82.2%	76.1%	88.0%	Nov 27, 2025	Chat now
#72 Nova 2.0 Omni (Non-reasoning) by Amazon	$0.30	$2.50	71.9%	55.5%	37.0%	Nov 26, 2025	Chat now
#73 Nova 2.0 Omni (medium) by Amazon	$0.30	$2.50	80.9%	76.0%	89.7%	Nov 26, 2025	Chat now
#74 Nova 2.0 Omni (low) by Amazon	$0.30	$2.50	79.8%	69.9%	56.0%	Nov 26, 2025	Chat now
#75 Apriel-v1.6-15B-Thinker by ServiceNow	N/A	N/A	79.0%	73.3%	88.0%	Nov 25, 2025	Chat now
#76 Claude Opus 4.5 (Reasoning) by Anthropic	$5.00	$25.00	89.5%	86.6%	91.3%	Nov 24, 2025	Chat now
#77 Claude Opus 4.5 (Non-reasoning) by Anthropic	$5.00	$25.00	88.9%	81.0%	62.7%	Nov 24, 2025	Chat now
#78 Olmo 3 7B Instruct by Allen Institute for AI	$0.10	$0.20	52.2%	40.0%	41.3%	Nov 20, 2025	Chat now
#79 Olmo 3 7B Think by Allen Institute for AI	$0.12	$0.20	65.5%	51.6%	70.7%	Nov 20, 2025	Chat now
#80 Olmo 3 32B Think by Allen Institute for AI	N/A	N/A	75.9%	61.0%	73.7%	Nov 20, 2025	Chat now
#81 Grok 4.1 Fast (Non-reasoning) by xAI	$0.20	$0.50	74.3%	63.7%	34.3%	Nov 19, 2025	Chat now
#82 Grok 4.1 Fast (Reasoning) by xAI	$0.20	$0.50	85.4%	85.3%	89.3%	Nov 19, 2025	Chat now
#83 Gemini 3 Pro Preview (low) by Google	$2.00	$12.00	89.5%	88.7%	86.7%	Nov 18, 2025	Chat now
#84 Cogito v2.1 (Reasoning) by Deep Cogito	$1.25	$1.25	84.9%	76.8%	72.7%	Nov 18, 2025	Chat now
#85 Gemini 3 Pro Preview (high) by Google	$2.00	$12.00	89.8%	90.8%	95.7%	Nov 18, 2025	Chat now
#86 GPT-5.1 Codex mini (high) by OpenAI	$0.25	$2.00	82.0%	81.3%	91.7%	Nov 13, 2025	Chat now
#87 ERNIE 5.0 Thinking Preview by Baidu	N/A	N/A	83.0%	77.7%	85.0%	Nov 13, 2025	Chat now
#88 GPT-5.1 (high) by OpenAI	$1.25	$10.00	87.0%	87.3%	94.0%	Nov 13, 2025	Chat now
#89 GPT-5.1 (Non-reasoning) by OpenAI	$1.25	$10.00	80.1%	64.3%	38.0%	Nov 13, 2025	Chat now
#90 GPT-5.1 Codex (high) by OpenAI	$1.25	$10.00	86.0%	86.0%	95.7%	Nov 13, 2025	Chat now
#91 KAT-Coder-Pro V1 by KwaiKAT	$0.30	$1.20	81.3%	76.4%	94.7%	Nov 11, 2025	Chat now
#92 Doubao Seed Code by ByteDance Seed	N/A	N/A	85.4%	76.4%	79.3%	Nov 11, 2025	Chat now
#93 Kimi K2 Thinking by Kimi	$0.60	$2.50	84.8%	83.8%	94.7%	Nov 6, 2025	Chat now
#94 Qwen3 Max Thinking (Preview) by Alibaba	$1.20	$6.00	82.4%	77.6%	82.3%	Nov 3, 2025	Chat now
#95 Kimi Linear 48B A3B Instruct by Kimi	N/A	N/A	58.5%	41.2%	36.3%	Oct 30, 2025	Chat now
#96 Nova 2.0 Lite (low) by Amazon	$0.30	$2.50	78.8%	69.8%	46.7%	Oct 29, 2025	Chat now
#97 Nova 2.0 Lite (Non-reasoning) by Amazon	$0.30	$2.50	74.3%	60.3%	33.7%	Oct 29, 2025	Chat now
#98 Nova 2.0 Lite (medium) by Amazon	$0.30	$2.50	81.3%	76.8%	88.7%	Oct 29, 2025	Chat now
#99 NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) by NVIDIA	$0.20	$0.60	64.9%	43.9%	26.7%	Oct 28, 2025	Chat now
#100 NVIDIA Nemotron Nano 12B v2 VL (Reasoning) by NVIDIA	$0.20	$0.60	75.9%	57.2%	75.0%	Oct 28, 2025	Chat now

Showing 100 of 408 models
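
The rows above are tab-separated, with N/A marking a missing price and - marking a missing score. A minimal Python sketch of reading one row into a record might look like the following. The ModelRow class and the parse_row helper are hypothetical names introduced here for illustration, not part of any published API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelRow:
    rank: int
    name: str
    provider: str
    input_price: Optional[float]   # USD per 1M input tokens, None when "N/A"
    output_price: Optional[float]  # USD per 1M output tokens, None when "N/A"
    mmlu_pro: Optional[float]      # percent, None when "-"
    gpqa: Optional[float]          # percent, None when "-"
    aime_2025: Optional[float]     # percent, None when "-"
    release: str


def _price(cell: str) -> Optional[float]:
    """Turn '$0.25' into 0.25 and 'N/A' into None."""
    cell = cell.strip()
    return None if cell == "N/A" else float(cell.lstrip("$"))


def _score(cell: str) -> Optional[float]:
    """Turn '77.0%' into 77.0 and '-' into None."""
    cell = cell.strip()
    return None if cell == "-" else float(cell.rstrip("%"))


def parse_row(line: str) -> ModelRow:
    """Parse one tab-separated leaderboard row (assumed layout, see table)."""
    label, inp, out, mmlu, gpqa, aime, release = line.split("\t")[:7]
    rank_text, rest = label.split(" ", 1)      # "#28", "Falcon-H1R-7B by TII UAE"
    name, provider = rest.rsplit(" by ", 1)    # split on the last " by "
    return ModelRow(
        rank=int(rank_text.lstrip("#")),
        name=name,
        provider=provider,
        input_price=_price(inp),
        output_price=_price(out),
        mmlu_pro=_score(mmlu),
        gpqa=_score(gpqa),
        aime_2025=_score(aime),
        release=release.strip(),
    )
```

Missing values are kept as None rather than 0 so that later filters and averages do not mistake an unreported score for a failing one.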

Understanding the AI Model Leaderboard

This AI model leaderboard helps you compare and choose the best large language models (LLMs) for your needs. We track standardized AI benchmarks, token pricing, inference speed, and model capabilities across major AI providers such as OpenAI, Anthropic, Google, Meta, and DeepSeek.

Core AI Benchmarks Explained

MMLU-Pro: Tests broad knowledge across 14 academic subjects including STEM, humanities, and social sciences - the foundational intelligence benchmark
GPQA: Graduate-level Google-Proof Q&A benchmark - measures PhD-level reasoning and advanced problem-solving capabilities
AIME 2025: American Invitational Mathematics Examination - evaluates elite mathematical reasoning and competition-level problem solving
Coding Index: Composite score of LiveCodeBench, SciCode, and other coding benchmarks - measures programming ability
Math Index: Composite score of AIME, MATH-500, and mathematical reasoning tests

Key Metrics to Consider

Token Pricing: Compare input vs output token costs per million - crucial for estimating API expenses and optimizing usage patterns
Inference Speed: Measured in tokens/second - determines response time for chatbots, streaming, and real-time applications
Release Date: Newer models often incorporate latest training techniques and updated knowledge cutoffs
Benchmark Scores: Percentage scores (0-100%) make it easy to compare model capabilities at a glance

How to Choose the Right AI Model for Your Use Case

For Research & Analysis

Prioritize models with high MMLU-Pro (70%+) and GPQA (60%+) scores for complex reasoning tasks, academic research, and technical documentation

For Cost Optimization

Sort by input/output pricing - smaller models often deliver 80% of flagship performance at 10% of the cost for simple tasks

For Math & STEM

Filter by Math Index or AIME 2025 scores (50%+) for quantitative analysis, engineering calculations, and scientific applications
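
The three selection rules above can be sketched in a few lines of Python. This is an illustrative sketch only: the Model tuple and the helper names are hypothetical, the five sample rows are copied from the table above, and the thresholds are the ones suggested in this section.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens
    mmlu_pro: float      # percent
    gpqa: float          # percent
    aime_2025: float     # percent


# Sample rows taken from the leaderboard table above.
MODELS = [
    Model("MiniMax-M2.1", 0.30, 1.20, 87.5, 83.0, 82.7),
    Model("GLM-4.7 (Non-reasoning)", 0.55, 2.15, 79.4, 66.4, 48.0),
    Model("Ministral 3 3B", 0.10, 0.10, 52.4, 35.8, 22.0),
    Model("DeepSeek V3.2 (Reasoning)", 0.28, 0.42, 86.2, 84.0, 92.0),
    Model("Claude Opus 4.5 (Reasoning)", 5.00, 25.00, 89.5, 86.6, 91.3),
]


def for_research(models: List[Model], min_mmlu: float = 70.0,
                 min_gpqa: float = 60.0) -> List[Model]:
    """Keep models that clear both the MMLU-Pro and GPQA thresholds."""
    return [m for m in models if m.mmlu_pro >= min_mmlu and m.gpqa >= min_gpqa]


def by_cost(models: List[Model]) -> List[Model]:
    """Sort cheapest first by input price, breaking ties on output price."""
    return sorted(models, key=lambda m: (m.input_price, m.output_price))


def for_math(models: List[Model], min_aime: float = 50.0) -> List[Model]:
    """Keep models that clear the AIME 2025 threshold."""
    return [m for m in models if m.aime_2025 >= min_aime]


# Combine the rules: cheapest model that is still research-grade.
cheapest_research = by_cost(for_research(MODELS))[0]
print(cheapest_research.name)
```

Chaining the filters this way makes the trade-off explicit: quality thresholds narrow the candidate list first, and price only decides among the models that remain.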

All benchmark scores and pricing data are updated daily from Artificial Analysis to reflect the latest model versions and capabilities. Use the sort filters above to find AI models by intelligence, cost, coding ability, math performance, speed, or release date.

Frequently Asked Questions

What is MMLU-Pro and why is it the standard AI intelligence benchmark?

MMLU-Pro (Massive Multitask Language Understanding - Professional) is a broad AI knowledge benchmark, testing models across 14 academic subjects including mathematics, science, history, law, and ethics. Among the models in the table above that report a score, results run from 52.2% to 89.8%. Models scoring above 75% demonstrate strong general intelligence suitable for professional applications, while scores below 60% indicate limitations in complex reasoning tasks.

What does GPQA measure and which models score highest?

GPQA (Graduate-level Google-Proof Q&A) tests PhD-level reasoning with questions designed to be "Google-proof" - requiring deep understanding rather than simple fact retrieval. The highest scorers in the table above are Gemini 3.1 Pro Preview (94.1%), Gemini 3 Pro Preview (high) (90.8%), and GPT-5.2 (xhigh) (90.3%), which makes them strong candidates for research, technical analysis, and complex problem-solving. Models below 50% GPQA struggle with advanced reasoning and may provide superficial answers to complex questions.

What is AIME 2025 and how does it evaluate AI mathematical ability?

AIME 2025 (American Invitational Mathematics Examination) is an elite math competition benchmark that tests advanced problem-solving, algebra, geometry, and number theory. Scores above 80% (like GPT-5 Codex at 98.7% or GPT-5.1 at 94%) indicate exceptional mathematical reasoning suitable for engineering, scientific computing, and quantitative analysis. Models scoring below 50% may struggle with multi-step mathematical problems or require explicit problem breakdown.

How is AI model pricing calculated and what's considered cost-effective?

AI model pricing is measured per 1 million tokens (approximately 750,000 words). Input pricing covers text you send, while output pricing covers generated responses. Prices are quoted as input/output per million tokens: budget models like GPT-5 nano cost $0.05/$0.40, mid-tier models like Llama 3.3 70B cost $0.54/$0.71, and premium models like GPT-5 cost $1.25/$10. For typical applications with a 3:1 input-to-output ratio, budget models can be 10-20x cheaper than flagship models while maintaining 70-80% performance.
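
To make the 3:1 input-to-output arithmetic concrete, here is a small Python sketch. The function names blended_price and request_cost are hypothetical, and the prices are the list prices quoted in the answer above.

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_share: float = 0.75) -> float:
    """Average USD cost per 1M tokens when `input_share` of tokens are input.

    The default of 0.75 corresponds to a 3:1 input-to-output ratio.
    """
    return input_per_m * input_share + output_per_m * (1.0 - input_share)


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """USD cost of a single request given per-million-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# List prices (input, output) in USD per 1M tokens, from the answer above.
PRICES = {
    "GPT-5": (1.25, 10.00),
    "Llama 3.3 70B": (0.54, 0.71),
    "GPT-5 nano": (0.05, 0.40),
}

for name, (inp, out) in PRICES.items():
    print(f"{name}: ${blended_price(inp, out):.4f} per 1M blended tokens")

# A single request with 3,000 input tokens and 1,000 output tokens on GPT-5.
print(f"${request_cost(3000, 1000, 1.25, 10.00):.5f}")
```

At these list prices the blended cost works out to roughly $3.44 for GPT-5, $0.58 for Llama 3.3 70B, and $0.14 for GPT-5 nano, so the size of the saving depends heavily on which two models are compared.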

Which AI models are best for coding and programming tasks?

Sort by Coding Index to see top programming models. Our Coding Index combines LiveCodeBench, SciCode, and other coding benchmarks. Top performers include GPT-5.1 (57.5 index), GPT-5 Codex (53.5), and GPT-5 mini (51.4). These models excel at code generation, debugging, refactoring, and explaining complex algorithms. For budget-conscious developers, models with 40+ coding index scores offer excellent value for routine programming tasks.

How often are AI model benchmarks and rankings updated?

Our leaderboard syncs daily with the Artificial Analysis API to ensure benchmark scores (MMLU-Pro, GPQA, AIME 2025), pricing, and inference speed data reflect the latest model versions. New model releases appear immediately under the "Newest" sort option. Benchmark scores can change when providers release updated versions - for example, GPT-5.1, released in November 2025, achieved an intelligence score of 69.7 compared to GPT-5's 68.5 from August 2025.

What inference speed (tokens/second) do I need for my application?

Inference speed determines how fast models generate responses. For real-time chatbots and interactive applications, target 100+ tokens/second (models like gpt-oss-120B at 340 tok/s). For background processing and batch jobs, 50-100 tok/s is sufficient. Premium reasoning models like GPT-5 (103 tok/s) balance speed and capability. Note that higher inference speed doesn't always mean better quality - slower models often deliver more thoughtful, detailed responses.
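
As a rough way to translate tokens/second into waiting time, the sketch below divides the response length by the generation speed. The generation_seconds helper and its time_to_first_token parameter are hypothetical, and real latency also depends on network and queueing delays that are not modeled here.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float,
                       time_to_first_token: float = 0.0) -> float:
    """Estimate seconds to stream a full response.

    `time_to_first_token` is an assumed fixed startup delay in seconds;
    it defaults to 0 because this page does not report it.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return time_to_first_token + output_tokens / tokens_per_second


# Speeds quoted in the answer above, for a 500-token response.
for label, speed in [("gpt-oss-120B", 340.0), ("GPT-5", 103.0), ("batch floor", 50.0)]:
    print(f"{label}: {generation_seconds(500, speed):.2f} s")
```

For a 500-token reply this gives about 1.5 seconds at 340 tok/s, about 4.9 seconds at 103 tok/s, and 10 seconds at 50 tok/s, which is why the slower range is better suited to background jobs than to interactive chat.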

Can I test these AI models for free before committing?

Yes! Try our free AI chat interface to test different models instantly without creating an account. Many providers also offer free tiers: OpenAI (ChatGPT with daily limits), Anthropic (Claude with usage caps), Google (Gemini free tier), and open-source models like Llama 3.3. Compare performance on your specific use case before upgrading to paid plans.