NVIDIA Models
NVIDIA logo

Llama 3.3 Nemotron Super 49B V1.5

49B

by NVIDIA

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and multi-turn chat, followed by multiple RL stages; Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior. A distillation-driven Neural Architecture Search (“Puzzle”) replaces some attention blocks and varies FFN widths to shrink memory footprint and improve throughput, enabling single-GPU (H100/H200) deployment while preserving instruction following and CoT quality. In internal evaluations (NeMo-Skills, up to 16 runs, temp = 0.6, top_p = 0.95), the model reports strong reasoning/coding results, e.g., MATH500 pass@1 = 97.4, AIME-2024 = 87.5, AIME-2025 = 82.71, GPQA = 71.97, LiveCodeBench (24.10–25.02) = 73.58, and MMLU-Pro (CoT) = 79.53. The model targets practical inference efficiency (high tokens/s, reduced VRAM) with Transformers/vLLM support and explicit “reasoning on/off” modes (chat-first defaults, greedy recommended when disabled). Suitable for building agents, assistants, and long-context retrieval systems where balanced accuracy-to-cost and reliable tool use matter.

Chat with Llama 3.3 Nemotron Super 49B V1.5
Input Price$0.00/1M tokens
Output Price$0.00/1M tokens
Intelligence14.3
Coding7.6

Specifications

Technical details and pricing.

ProviderNVIDIA
Context Window131,072 tokens
Release DateMar 18, 2025
ModalitiesText

Benchmarks

11 benchmark scores from Artificial Analysis.

GPQA51.7%
MMLU Pro69.8%
HLE3.5%
LiveCodeBench28.0%
MATH 50077.5%
AIME 20257.7%
AIME19.3%
SciCode22.9%
LCR11.3%
IFBench39.5%
TerminalBench Hard0.0%

Composite Indices

Intelligence, Coding, Math

Standard Benchmarks

Academic and industry benchmarks

Frequently Asked Questions

What is Llama 3.3 Nemotron Super 49B V1.5 good for?

Use Llama 3.3 Nemotron Super 49B V1.5 for everyday tasks like writing, summarizing, brainstorming, and getting clear explanations.

How much does Llama 3.3 Nemotron Super 49B V1.5 cost?

Pricing is based on usage. Current rates are $0.00/1M tokens for input and $0.00/1M tokens for output.

Can I try Llama 3.3 Nemotron Super 49B V1.5 for free?

Yes. You can start a chat instantly and test the model before deciding on a plan.

Does Llama 3.3 Nemotron Super 49B V1.5 support images or audio?

Llama 3.3 Nemotron Super 49B V1.5 focuses on text-based tasks.

Benchmarks and pricing are sourced from Artificial Analysis where available. OpenRouter specs are used as a fallback.