PaddleOCR-VL-1.5 Example

Introduction

PaddleOCR-VL-1.5 is a unified Vision-Language OCR model that supports multiple document understanding tasks through a single model architecture. Built on a vision-language foundation, it handles diverse OCR scenarios, including text recognition, table extraction, formula recognition, chart understanding, seal recognition, and text spotting with bounding boxes.

Here, we will show you how to use PaddleOCR-VL-1.5 in X-AnyLabeling to perform various OCR and document understanding tasks.

Let's get started!

Supported Tasks

PaddleOCR-VL-1.5 supports six distinct tasks:

Task	Description	Output
OCR	Optical Character Recognition for text extraction	Text content
Table Recognition	Extract table structure and content	HTML/Markdown table
Formula Recognition	Recognize mathematical formulas	LaTeX format
Chart Recognition	Extract information from charts and graphs	Structured data
Text Spotting	Detect and recognize text with bounding boxes	Polygon shapes with text
Seal Recognition	Recognize seal stamps and chop marks	Text content

PP-DocLayoutV3 Label Routing

When the PPOCR panel uses PP-DocLayoutV3 for layout detection and then sends each cropped block to PaddleOCR-VL-1.5, labels are routed as follows:

Routed Task	PP-DocLayoutV3 Labels
OCR	`doc_title`, `paragraph_title`, `header`, `footer`, `content`, `reference`, `reference_content`, `text`, `vertical_text`, `aside_text`, `abstract`, `footnote`, `vision_footnote`, `figure_title`, `number`
Table Recognition	`table`
Formula Recognition	`inline_formula`, `display_formula`, `formula_number`, `algorithm`
Chart Recognition	`chart`
Seal Recognition	`seal`
Image Only (no text task)	`image`, `header_image`, `footer_image`

Note

PP-DocLayoutV3 officially provides 25 labels, all listed in the table above. X-AnyLabeling also routes one additional compatibility label, `formula`, to Formula Recognition whenever it appears in the layout output.
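The routing boils down to a lookup from layout label to task. Below is a minimal Python sketch of that table, not X-AnyLabeling's actual code. The names `LABEL_ROUTES`, `IMAGE_ONLY_LABELS`, and `route_label` are hypothetical. The source does not say how labels outside the table are handled, so this sketch assumes they raise `KeyError`.

```python
# Hypothetical sketch of the PP-DocLayoutV3 -> PaddleOCR-VL-1.5 routing table.
LABEL_ROUTES = {
    "OCR": (
        "doc_title", "paragraph_title", "header", "footer", "content",
        "reference", "reference_content", "text", "vertical_text",
        "aside_text", "abstract", "footnote", "vision_footnote",
        "figure_title", "number",
    ),
    "Table Recognition": ("table",),
    # "formula" is the extra compatibility label, not one of the official 25.
    "Formula Recognition": (
        "inline_formula", "display_formula", "formula_number", "algorithm",
        "formula",
    ),
    "Chart Recognition": ("chart",),
    "Seal Recognition": ("seal",),
}

# Blocks that stay as images and are never sent to a text task.
IMAGE_ONLY_LABELS = frozenset({"image", "header_image", "footer_image"})

# Invert the table once so each lookup is a single dict access.
LABEL_TO_TASK = {
    label: task for task, labels in LABEL_ROUTES.items() for label in labels
}


def route_label(label):
    """Return the task name for a layout label, or None for image-only blocks.

    Raises KeyError for labels outside the table (assumed behavior).
    """
    if label in IMAGE_ONLY_LABELS:
        return None
    return LABEL_TO_TASK[label]


print(route_label("display_formula"))  # Formula Recognition
```

Counting the entries confirms the note: 22 official text-task labels plus 3 image-only labels make 25, and `formula` is the 26th entry.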

Installation

You'll need to get X-AnyLabeling-Server up and running first. Check out the installation guide for the details. Make sure you're running at least v0.0.7 of the server and v3.3.9 of the X-AnyLabeling client; otherwise, you might run into compatibility issues.

Important

PaddleOCR-VL-1.5 requires transformers>=5.0.0. Install it with:

python -m pip install "transformers>=5.0.0"

For more details, see the official model page.
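If you'd like to check these minimums from Python before starting the server, you can use a small helper like the one below. It is a hypothetical sketch, and `version_tuple`, `is_at_least`, and `transformers_ok` are not part of X-AnyLabeling. It compares only the leading numeric parts, so a pre-release such as `5.0.0rc1` is treated as `5.0.0`. For full PEP 440 semantics, use `packaging.version.Version` instead. You can pass it whatever version strings your server and client report.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Parse 'v3.3.9' or '5.0.0' into a tuple of ints, stopping at any suffix."""
    parts = []
    for piece in version.strip().lstrip("vV").split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # e.g. '0rc1': keep 0, drop the pre-release suffix
    return tuple(parts)


def is_at_least(installed, minimum):
    """True if `installed` >= `minimum`, comparing numeric parts only."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so '3.4' compares like '3.4.0'.
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def transformers_ok(minimum="5.0.0"):
    """Check the installed transformers package against the required minimum."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False
    return is_at_least(installed, minimum)


print(is_at_least("v3.3.9", "v3.3.9"))  # True
print(is_at_least("v0.0.6", "v0.0.7"))  # False
```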

Tip

We highly recommend installing flash-attn to boost performance and reduce memory usage:

pip install flash-attn --no-build-isolation

Once that's done, head over to configs/models.yaml and enable paddleocr_vl_1_5. There's an example config you can reference if you're not sure how to set it up.

You can tweak the settings in paddleocr_vl_1_5.yaml to fit your needs.
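If you installed flash-attn as suggested above, a loader can opt into it only when the package is actually importable and fall back to another backend otherwise. The sketch below is an assumption about how that choice could be made, not X-AnyLabeling-Server's code. `pick_attn_implementation` is hypothetical, while `"flash_attention_2"` and `"sdpa"` are standard values of the `attn_implementation` argument to Hugging Face `from_pretrained`.

```python
import importlib.util


def pick_attn_implementation(device="cuda:0"):
    """Choose an attention backend string for transformers' from_pretrained.

    Hypothetical helper: FlashAttention only runs on CUDA, so CPU devices
    and environments without the flash_attn package fall back to SDPA.
    """
    if not device.startswith("cuda"):
        return "sdpa"
    if importlib.util.find_spec("flash_attn") is None:
        return "sdpa"
    return "flash_attention_2"


print(pick_attn_implementation("cpu"))  # sdpa
```

You would pass the returned string as `attn_implementation=...` when loading the model. This only confirms that the package is installed. It does not check whether FlashAttention supports your GPU or dtype.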

Configuration Parameters

Parameter	Default	Description
`model_path`	`PaddlePaddle/PaddleOCR-VL-1.5`	HuggingFace model path
`device`	`cuda:0`	Device for inference
`torch_dtype`	`bfloat16`	Model precision
`max_new_tokens`	`512`	Maximum tokens for generation
`max_pixels`	`1605632`	Maximum pixels for text tasks (2048×28×28)
`spotting_max_pixels`	`1605632`	Maximum pixels for spotting task (2048×28×28)
`spotting_upscale_threshold`	`1500`	Size threshold in pixels for spotting: images whose width and height are both below it are upscaled 2x
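For reference, you can mirror the defaults above in a small typed settings object. This is a hypothetical Python sketch, and `PaddleOCRVL15Settings` is not a class in X-AnyLabeling-Server. It accepts a plain dict, such as the output of a YAML loader, and ignores keys it does not recognize. It also checks the pixel arithmetic: 2048 × 28 × 28 = 1,605,632.

```python
from dataclasses import dataclass, fields


@dataclass
class PaddleOCRVL15Settings:
    """Hypothetical mirror of the paddleocr_vl_1_5.yaml defaults listed above."""

    model_path: str = "PaddlePaddle/PaddleOCR-VL-1.5"
    device: str = "cuda:0"
    torch_dtype: str = "bfloat16"
    max_new_tokens: int = 512
    max_pixels: int = 1605632
    spotting_max_pixels: int = 1605632
    spotting_upscale_threshold: int = 1500

    @classmethod
    def from_dict(cls, raw):
        """Build settings from a dict, keeping defaults for missing keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})


# The default pixel budget is a multiple of 28 x 28 = 784.
assert 2048 * 28 * 28 == 1605632

settings = PaddleOCRVL15Settings.from_dict({"max_new_tokens": 1024, "device": "cuda:1"})
print(settings.max_new_tokens, settings.device)  # 1024 cuda:1
```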

Note

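If inference times out, try reducing max_new_tokens, max_pixels, and spotting_max_pixels, or lowering spotting_upscale_threshold so that fewer images are upscaled, depending on your GPU memory. Keep in mind that a smaller max_new_tokens can cut off long outputs such as large tables.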
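To see why lowering `max_pixels` or `spotting_max_pixels` reduces the load, it helps to look at how an image is fit into a pixel budget. The sketch below assumes a Qwen2-VL-style `smart_resize`: both sides are rounded to multiples of 28, then scaled down when the area exceeds the budget. Whether PaddleOCR-VL-1.5's processor follows exactly these steps is an assumption, so treat this as an illustration rather than the model's actual preprocessing code.

```python
import math


def smart_resize(height, width, factor=28, min_pixels=56 * 56, max_pixels=1605632):
    """Fit (height, width) to multiples of `factor` within [min_pixels, max_pixels].

    Assumed Qwen2-VL-style resizing, shown for illustration only.
    """
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif h_bar * w_bar < min_pixels:
        # Too small: grow both sides by the same ratio, rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


print(smart_resize(2000, 3000))  # (1008, 1540)
```

In this scheme, the resized area grows with the budget. Smaller budgets therefore tend to finish faster, at the cost of detail on dense pages.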

Getting Started

Launch the X-AnyLabeling client, then press Ctrl+A or click the AI button in the left menu bar to open the auto-labeling panel. In the model dropdown list, select Remote-Server, then choose PaddleOCR-VL-1.5.

OCR (Text Recognition)

The OCR task extracts text content from images.

Usage:

Select the "OCR" task from the task dropdown
Click the "Run" button to extract text
The recognized text will be displayed in the description field

Table Recognition

The Table Recognition task extracts table structure and content from document images.

Usage:

Select the "Table Recognition" task from the task dropdown
Click the "Run" button to extract table content
The result will be formatted as an HTML/Markdown table
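If you need Markdown but receive the table as HTML, one way to post-process it is a small converter built on Python's standard `html.parser` module. This is a hypothetical helper, not part of X-AnyLabeling. It handles only simple `<tr>`/`<th>`/`<td>` tables and ignores `colspan`/`rowspan`, so merged cells will not line up.

```python
from html.parser import HTMLParser


class _TableCollector(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr") and self._cell is not None:
            # Close the open cell; escape pipes so they don't split Markdown columns.
            self._row.append("".join(self._cell).strip().replace("|", "\\|"))
            self._cell = None
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_markdown(html):
    """Convert a simple HTML table to a Markdown pipe table (first row = header)."""
    collector = _TableCollector()
    collector.feed(html)
    collector.close()
    rows = collector.rows
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]  # pad short rows
    lines = ["| " + " | ".join(rows[0]) + " |", "|" + " --- |" * width]
    lines += ["| " + " | ".join(row) + " |" for row in rows[1:]]
    return "\n".join(lines)


print(html_table_to_markdown(
    "<table><tr><th>Item</th><th>Qty</th></tr><tr><td>Pen</td><td>2</td></tr></table>"
))
# | Item | Qty |
# | --- | --- |
# | Pen | 2 |
```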

Formula Recognition

The Formula Recognition task recognizes mathematical formulas and converts them to LaTeX format.

Usage:

Select the "Formula Recognition" task from the task dropdown
Click the "Run" button to recognize formulas
The result will be in LaTeX format
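Before rendering recognized formulas, you might want a quick check that the LaTeX at least has balanced braces and matching environments. Truncated generations, for example when `max_new_tokens` is reached, can leave these unclosed. The function below is a hypothetical heuristic, not part of X-AnyLabeling or PaddleOCR-VL, and passing it does not guarantee that the LaTeX compiles.

```python
import re


def latex_looks_balanced(latex):
    """Heuristic check: balanced {} braces and properly nested \\begin/\\end pairs."""
    depth = 0
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "\\":
            i += 2  # skip the escaped character, so \{ and \} are not counted
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
        i += 1
    if depth != 0:
        return False

    # Environments must close in reverse order of opening.
    stack = []
    for kind, name in re.findall(r"\\(begin|end)\{([^}]*)\}", latex):
        if kind == "begin":
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack


print(latex_looks_balanced(r"\sqrt{x^{2} + 1"))  # False
```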

Chart Recognition

The Chart Recognition task extracts information from charts and graphs.

Usage:

Select the "Chart Recognition" task from the task dropdown
Click the "Run" button to analyze the chart
The extracted data will be displayed in a structured format

Text Spotting

The Text Spotting task detects text regions as polygon bounding boxes and recognizes the text inside each one.

Usage:

Select the "Text Spotting" task from the task dropdown
Click the "Run" button to detect and recognize text
Polygon shapes with recognized text will be created on the canvas

Tip

For small images (width and height both less than 1500 pixels), the image is automatically upscaled by 2x before recognition for better detection accuracy. You can adjust this threshold via spotting_upscale_threshold.
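The upscaling rule in the tip, together with the polygon shapes the task creates, can be sketched as follows. Everything here is an assumption for illustration. The helper names are hypothetical. The spotting result is assumed to be a list of `(polygon, text)` pairs in the coordinates of the (possibly upscaled) image. The output dicts follow a LabelMe-style `label`/`points`/`shape_type` layout with the text placed in `description`, which may differ from what X-AnyLabeling actually stores.

```python
def spotting_scale(width, height, threshold=1500):
    """Upscale factor for spotting: 2 when both sides are below the threshold, else 1."""
    return 2 if width < threshold and height < threshold else 1


def to_polygon_shapes(results, scale, label="text"):
    """Map (polygon, text) pairs from the upscaled image back to original pixels.

    The `results` format and the shape dict layout are assumptions, not the
    server's actual response schema.
    """
    shapes = []
    for polygon, text in results:
        shapes.append({
            "label": label,  # hypothetical label name
            "points": [[x / scale, y / scale] for x, y in polygon],
            "shape_type": "polygon",
            "description": text,  # assumed field for the recognized text
        })
    return shapes


scale = spotting_scale(800, 600)  # both sides < 1500, so the image is upscaled 2x
shapes = to_polygon_shapes([([(0, 0), (200, 0), (200, 40), (0, 40)], "Hello")], scale)
print(shapes[0]["points"])  # [[0.0, 0.0], [100.0, 0.0], [100.0, 20.0], [0.0, 20.0]]
```

Dividing by the scale factor is what keeps the boxes aligned. Without it, shapes detected on an upscaled image would land at twice their true position on the canvas.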

Seal Recognition

The Seal Recognition task recognizes text from seal stamps and chop marks.

Usage:

Select the "Seal Recognition" task from the task dropdown
Click the "Run" button to recognize seal text
The recognized text will be displayed in the description field

Tip

All tasks support batch processing. You can run inference on the entire dataset with a single click using Ctrl+M or the batch processing feature in X-AnyLabeling.

Related Models

PP-DocLayoutV3: Document layout analysis model that works well with PaddleOCR-VL-1.5
