Llama 3.3 Nemotron Super 49B V1.5
49B by NVIDIA
Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct, with a 128K-token context window. It is post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and multi-turn chat, followed by multiple RL stages: Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior. A distillation-driven Neural Architecture Search (“Puzzle”) replaces some attention blocks and varies FFN widths to shrink the memory footprint and improve throughput, enabling single-GPU (H100/H200) deployment while preserving instruction following and CoT quality. In internal evaluations (NeMo-Skills, up to 16 runs, temp = 0.6, top_p = 0.95), the reported reasoning and coding results are strong, e.g., MATH500 pass@1 = 97.4, AIME-2024 = 87.5, AIME-2025 = 82.71, GPQA = 71.97, LiveCodeBench (24.10–25.02) = 73.58, and MMLU-Pro (CoT) = 79.53. The model targets practical inference efficiency (high tokens/s, reduced VRAM), is supported in Transformers and vLLM, and offers explicit “reasoning on/off” modes (chat-first defaults; greedy decoding is recommended when reasoning is disabled). It is suitable for building agents, assistants, and long-context retrieval systems where a balanced accuracy-to-cost ratio and reliable tool use matter.
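The decoding settings and the multi-run scoring mentioned above can be illustrated with a small sketch. It is not NVIDIA's evaluation code. The function names `generation_settings` and `pass_at_1` are hypothetical, the dictionary keys are modeled on common text-generation options, and the assumption that a multi-run pass@1 is the mean of per-run accuracies is ours; the description only states "up to 16 runs". The sample results are toy data, not real benchmark output.

```python
# Sampling values quoted in the evaluation settings above.
REASONING_TEMPERATURE = 0.6
REASONING_TOP_P = 0.95


def generation_settings(reasoning: bool) -> dict:
    """Return illustrative decoding settings for one request (hypothetical helper)."""
    if reasoning:
        # Reasoning on: sample with the temperature/top_p used in the evaluations.
        return {
            "do_sample": True,
            "temperature": REASONING_TEMPERATURE,
            "top_p": REASONING_TOP_P,
        }
    # Reasoning off: greedy decoding, so no sampling parameters are set.
    return {"do_sample": False}


def pass_at_1(runs: list[list[bool]]) -> float:
    """Mean per-run accuracy, in percent.

    Assumption: each inner list holds one run's per-problem pass/fail flags,
    and the reported score is the average accuracy across runs.
    """
    per_run = [sum(run) / len(run) for run in runs]
    return 100 * sum(per_run) / len(per_run)


if __name__ == "__main__":
    print(generation_settings(reasoning=True))
    print(generation_settings(reasoning=False))
    # Toy data: two runs over four problems (not real benchmark results).
    toy_runs = [[True, True, False, True], [True, False, False, True]]
    print(pass_at_1(toy_runs))
```

Averaging over several sampled runs reduces the run-to-run variance that a temperature of 0.6 introduces, which is presumably why the evaluations repeat each benchmark rather than scoring a single pass.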
Specifications
Technical details and pricing.
Benchmarks
11 benchmark scores from Artificial Analysis.
Composite Indices
Intelligence, Coding, Math
Standard Benchmarks
Academic and industry benchmarks
Frequently Asked Questions
What is Llama 3.3 Nemotron Super 49B V1.5 good for?
Use Llama 3.3 Nemotron Super 49B V1.5 for everyday tasks like writing, summarizing, brainstorming, and getting clear explanations.
How much does Llama 3.3 Nemotron Super 49B V1.5 cost?
Pricing is based on usage. Current rates are $0.00/1M tokens for input and $0.00/1M tokens for output.
Can I try Llama 3.3 Nemotron Super 49B V1.5 for free?
Yes. You can start a chat instantly and test the model before deciding on a plan.
Does Llama 3.3 Nemotron Super 49B V1.5 support images or audio?
Llama 3.3 Nemotron Super 49B V1.5 focuses on text-based tasks.
Benchmarks and pricing are sourced from Artificial Analysis where available. OpenRouter specs are used as a fallback.