Qwen2.5 VL 32B Instruct

Name: Qwen2.5 VL 32B Instruct
Brand: Qwen

Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual interpretation within images, and precise event localization in extended videos. Qwen2.5-VL-32B demonstrates state-of-the-art performance across multimodal benchmarks such as MMMU, MathVista, and VideoMME, while maintaining strong reasoning and clarity in text-based tasks like MMLU, mathematical problem-solving, and code generation.

Input Price: $0.00/1M tokens

Output Price: $0.00/1M tokens

Intelligence: 13.2

Coding: N/A

Specifications

Technical details and pricing.

Provider: Qwen

Context Window: 128,000 tokens

Release Date: Sep 19, 2024

Modalities: Text, Image → Text

Capabilities: Vision

Benchmarks

7 benchmark scores from Artificial Analysis.

GPQA: 46.6%

MMLU Pro: 69.7%

HLE: 3.8%

LiveCodeBench: 24.8%

MATH 500: 80.5%

AIME: 11.0%

SciCode: 22.9%

Frequently Asked Questions

What is Qwen2.5 VL 32B Instruct good for?

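Use Qwen2.5 VL 32B Instruct for visual analysis tasks such as object recognition, reading text within images, and locating events in long videos, as well as mathematical reasoning, structured outputs, and text tasks like code generation.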

How much does Qwen2.5 VL 32B Instruct cost?

Pricing is based on usage. Current rates are $0.00/1M tokens for input and $0.00/1M tokens for output.
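
As a rough illustration of usage-based pricing, the cost of a request is the token count divided by one million, multiplied by the per-million rate, summed over input and output. The sketch below assumes the rates listed on this page as defaults; the function name and parameters are hypothetical and not part of any provider API.

```python
def estimate_cost(input_tokens: int,
                  output_tokens: int,
                  input_rate_per_million: float = 0.0,
                  output_rate_per_million: float = 0.0) -> float:
    """Estimate the cost in USD of one request under per-1M-token pricing.

    The default rates mirror the $0.00/1M figures listed on this page;
    pass other values to model a different price list (hypothetical).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = (input_tokens / 1_000_000) * input_rate_per_million
    output_cost = (output_tokens / 1_000_000) * output_rate_per_million
    return input_cost + output_cost


# At the listed rates, any request costs nothing.
listed = estimate_cost(input_tokens=120_000, output_tokens=4_000)

# With made-up rates of $2.00 (input) and $4.00 (output) per 1M tokens.
hypothetical = estimate_cost(1_000_000, 500_000, 2.0, 4.0)
```

At the rates shown here the result is always zero; the non-zero call only demonstrates how the arithmetic would scale if the rates changed.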

Can I try Qwen2.5 VL 32B Instruct for free?

Yes. You can start a chat instantly and test the model before deciding on a plan.

Does Qwen2.5 VL 32B Instruct support images or audio?

Qwen2.5 VL 32B Instruct accepts text and image input and produces text output. Audio is not among its listed modalities.

Similar Models

Other models you might want to explore.

Qwen2.5 VL 72B Instruct

Qwen

Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects.

Qwen2.5-VL 7B Instruct

Qwen

Qwen2.5 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements:

- State-of-the-art understanding of images of various resolutions and aspect ratios: Qwen2.5-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, and MTVQA.
- Understanding of videos longer than 20 minutes: Qwen2.5-VL can understand videos over 20 minutes long for high-quality video-based question answering, dialog, and content creation.
- Agent that can operate mobile phones, robots, and other devices: with its complex reasoning and decision-making abilities, Qwen2.5-VL can be integrated with such devices for automatic operation based on the visual environment and text instructions.
- Multilingual support: besides English and Chinese, Qwen2.5-VL understands text in other languages inside images, including most European languages, Japanese, Korean, Arabic, and Vietnamese.

Qwen2.5 Coder 32B Instruct

Qwen

Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen).

Benchmarks and pricing are sourced from Artificial Analysis where available. OpenRouter specs are used as a fallback.