PaddleOCR-VL-0.9B

by PaddlePaddle

Baidu's 0.9B-parameter vision-language OCR model, combining a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model. It handles multilingual text, tables, charts, and formulas within a 16,384-token context window and is optimized for efficient on-device document parsing.

Context Window: 16,384 tokens
Parameters: 0.9B
License: Apache 2.0
Modalities: Text, Image

Specifications

Technical details.

Provider: PaddlePaddle
Context Window: 16,384 tokens
Release Date: Oct 1, 2025
Modalities: Text, Image → Text
Capabilities: OCR, Document Parsing, Vision
License: Apache 2.0

Frequently Asked Questions

What is PaddleOCR-VL-0.9B?

PaddleOCR-VL-0.9B is Baidu's 0.9B-parameter vision-language OCR model, combining a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model. It handles multilingual text, tables, charts, and formulas within a 16,384-token context window and is optimized for efficient on-device document parsing.

What input formats does PaddleOCR-VL-0.9B support?

PaddleOCR-VL-0.9B accepts text and image input and produces text output.

What is the context length of PaddleOCR-VL-0.9B?

PaddleOCR-VL-0.9B supports up to 16,384 tokens of context.
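As a rough illustration of what that limit means in practice, the sketch below computes how many tokens would remain for generated output once a prompt has been counted. It assumes the 16,384-token window is shared between input (image and text tokens) and output, which is common for decoder-style models but is not stated in the specification; the function name `remaining_output_tokens` is hypothetical, and the prompt token count would have to come from the model's own tokenizer and image preprocessor.

```python
# Context window from the specification above.
CONTEXT_WINDOW = 16_384


def remaining_output_tokens(prompt_tokens: int,
                            context_window: int = CONTEXT_WINDOW) -> int:
    """Return the tokens left for output after the prompt is counted.

    Assumption (not confirmed by the spec): input and output tokens
    share a single context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Clamp at zero: a prompt longer than the window leaves no room.
    return max(context_window - prompt_tokens, 0)


if __name__ == "__main__":
    # A hypothetical page whose image and instruction use 4,000 tokens.
    print(remaining_output_tokens(4_000))
```

Under this assumption, a dense page that consumes most of the window leaves little room for the parsed output, so very long documents would likely need to be processed page by page.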

Is PaddleOCR-VL-0.9B open source?

Yes. PaddleOCR-VL-0.9B is released under the Apache 2.0 license.

Specifications are based on publicly available model documentation.