Llama 3.3 Nemotron Super 49B V1.5

Name: Llama 3.3 Nemotron Super 49B V1.5
Brand: NVIDIA

49B

by NVIDIA

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and multi-turn chat, followed by multiple RL stages; Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior. A distillation-driven Neural Architecture Search (“Puzzle”) replaces some attention blocks and varies FFN widths to shrink memory footprint and improve throughput, enabling single-GPU (H100/H200) deployment while preserving instruction following and CoT quality. In internal evaluations (NeMo-Skills, up to 16 runs, temp = 0.6, top_p = 0.95), the model reports strong reasoning/coding results, e.g., MATH500 pass@1 = 97.4, AIME-2024 = 87.5, AIME-2025 = 82.71, GPQA = 71.97, LiveCodeBench (24.10–25.02) = 73.58, and MMLU-Pro (CoT) = 79.53. The model targets practical inference efficiency (high tokens/s, reduced VRAM) with Transformers/vLLM support and explicit “reasoning on/off” modes (chat-first defaults, greedy recommended when disabled). Suitable for building agents, assistants, and long-context retrieval systems where balanced accuracy-to-cost and reliable tool use matter.

Chat with Llama 3.3 Nemotron Super 49B V1.5

Input Price$0.00/1M tokens

Output Price$0.00/1M tokens

Intelligence14.3

Coding7.6

Specifications

Technical details and pricing.

ProviderNVIDIA

Context Window131,072 tokens

Release DateMar 18, 2025

ModalitiesText

Benchmarks

11 benchmark scores from Artificial Analysis.

GPQA51.7%

MMLU Pro69.8%

HLE3.5%

LiveCodeBench28.0%

MATH 50077.5%

AIME 20257.7%

AIME19.3%

SciCode22.9%

LCR11.3%

IFBench39.5%

TerminalBench Hard0.0%

Composite Indices

Intelligence, Coding, Math

Standard Benchmarks

Academic and industry benchmarks

Frequently Asked Questions

What is Llama 3.3 Nemotron Super 49B V1.5 good for?

Use Llama 3.3 Nemotron Super 49B V1.5 for everyday tasks like writing, summarizing, brainstorming, and getting clear explanations.

How much does Llama 3.3 Nemotron Super 49B V1.5 cost?

Pricing is based on usage. Current rates are $0.00/1M tokens for input and $0.00/1M tokens for output.

Can I try Llama 3.3 Nemotron Super 49B V1.5 for free?

Yes. You can start a chat instantly and test the model before deciding on a plan.

Does Llama 3.3 Nemotron Super 49B V1.5 support images or audio?

Llama 3.3 Nemotron Super 49B V1.5 focuses on text-based tasks.

Similar Models

Other models you might want to explore.

Llama 3.1 Nemotron 70B Instruct

NVIDIA

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses.

Details →

Nemotron 3 Nano 30B A3B (free)

NVIDIA

NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems.

Details →

Nemotron Nano 12B 2 VL (free)

NVIDIA

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence.

Details →

Benchmarks and pricing are sourced from Artificial Analysis where available. OpenRouter specs are used as a fallback.