Document Layout Analysis is used to identify and extract the layout structure of documents, including text blocks, images, tables, and other elements.
DocLayout-YOLO is a real-time and robust layout detection model for diverse documents, based on YOLO-v10. This model is enriched with diversified document pre-training and structural optimization tailored for layout detection.
- Import your image (Ctrl+I) or video (Ctrl+O) file into X-AnyLabeling.
- Select and load the DocLayout-YOLO model.
- Initiate the process by clicking Run (i). Once you've verified that everything is set up correctly, use the keyboard shortcut Ctrl+M to process all images in one go.
PP-DocLayoutV3 is specifically engineered to handle non-planar document images. It can directly predict multi-point bounding boxes for layout elements—as opposed to standard two-point boxes—and determine logical reading orders for skewed and curved surfaces within a single forward pass, significantly reducing cascading errors.
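
To make the difference between a two-point box and a multi-point box concrete, here is a minimal sketch in plain Python. The point format, the helper names, and the sample coordinates are assumptions made for illustration; they are not the model's actual output schema.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def enclosing_box(polygon: List[Point]) -> Tuple[Point, Point]:
    """Collapse a multi-point box into a two-point (top-left, bottom-right) box."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys)), (max(xs), max(ys))


def polygon_area(polygon: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(polygon):
        x2, y2 = polygon[(i + 1) % len(polygon)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical four-point outline of a skewed text block (pixel coordinates).
skewed_block = [(10.0, 20.0), (110.0, 40.0), (105.0, 70.0), (5.0, 50.0)]

top_left, bottom_right = enclosing_box(skewed_block)
box_area = (bottom_right[0] - top_left[0]) * (bottom_right[1] - top_left[1])

# The two-point box covers noticeably more area than the element itself,
# which is why tilted or curved regions benefit from multi-point boxes.
print(top_left, bottom_right, box_area, polygon_area(skewed_block))
```

In this sample, the axis-aligned box has to cover the whole tilted outline, so it includes background pixels and may overlap neighbouring elements.
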
This model is an essential component of PaddleOCR-VL-1.5, providing crucial layout analysis for the high-precision parsing of various real-world documents.
PP-DocLayoutV3 supports 25 layout element categories:
| Category | Labels |
|---|---|
| Document Structure | doc_title, paragraph_title, header, footer, content, reference, reference_content |
| Text Elements | text, vertical_text, aside_text, abstract, footnote, vision_footnote |
| Visual Elements | image, chart, figure_title, header_image, footer_image, seal |
| Math & Formulas | inline_formula, display_formula, formula_number, algorithm |
| Tables | table |
| Other | number |
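
If you post-process the predicted labels, the table above can be mirrored as a lookup structure. The sketch below copies the labels from the table; the variable and function names are hypothetical and not part of X-AnyLabeling.

```python
from typing import Dict, List

# Labels grouped exactly as in the category table.
LAYOUT_CATEGORIES: Dict[str, List[str]] = {
    "Document Structure": [
        "doc_title", "paragraph_title", "header", "footer",
        "content", "reference", "reference_content",
    ],
    "Text Elements": [
        "text", "vertical_text", "aside_text",
        "abstract", "footnote", "vision_footnote",
    ],
    "Visual Elements": [
        "image", "chart", "figure_title",
        "header_image", "footer_image", "seal",
    ],
    "Math & Formulas": [
        "inline_formula", "display_formula", "formula_number", "algorithm",
    ],
    "Tables": ["table"],
    "Other": ["number"],
}

# Reverse index: label -> category group.
LABEL_TO_GROUP: Dict[str, str] = {
    label: group
    for group, labels in LAYOUT_CATEGORIES.items()
    for label in labels
}


def group_of(label: str) -> str:
    """Return the category group for a label, or 'Unknown' if it is not listed."""
    return LABEL_TO_GROUP.get(label, "Unknown")


print(len(LABEL_TO_GROUP), group_of("seal"))
```

A reverse index like this makes it easy to filter results, for example keeping only the labels in the Math & Formulas group.
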
You'll need to get X-AnyLabeling-Server up and running first. Check out the installation guide for the details. Make sure you're running at least v0.0.7 of the server and v3.3.9 of the X-AnyLabeling client, otherwise you might run into compatibility issues.
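
The version requirement can be checked with a small helper like the one below. This is only a sketch: it assumes versions are written as `vMAJOR.MINOR.PATCH` (with or without the leading `v`), and the function names are hypothetical. How you obtain the installed version strings is up to your setup.

```python
import re
from typing import Tuple

MIN_SERVER = (0, 0, 7)  # minimum X-AnyLabeling-Server version
MIN_CLIENT = (3, 3, 9)  # minimum X-AnyLabeling client version


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'v3.3.9' or '3.3.9' into (3, 3, 9)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_compatible(server_version: str, client_version: str) -> bool:
    """True if both versions meet the minimums stated above."""
    return (
        parse_version(server_version) >= MIN_SERVER
        and parse_version(client_version) >= MIN_CLIENT
    )


print(is_compatible("v0.0.7", "v3.3.9"))
```

Comparing integer tuples rather than raw strings avoids the common mistake of ranking `3.10.0` below `3.3.9`.
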
Note
PP-DocLayoutV3 requires the latest transformers development branch. Install it with:

    pip install --upgrade git+https://github.com/huggingface/transformers.git

For more details, see the official model page.
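
After installing, you may want to confirm that a development build is what actually got picked up. The sketch below assumes that builds installed from the repository carry a PEP 440 `.dev` segment in their version string (for example `X.Y.Z.dev0`); treat that as a heuristic rather than a guarantee. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def looks_like_dev_build(version_string: str) -> bool:
    """Heuristic: PEP 440 development releases contain a '.dev' segment."""
    return ".dev" in version_string


def describe(package: str = "transformers") -> str:
    """Summarize whether the package appears to be a development build."""
    found = installed_version(package)
    if found is None:
        return f"{package} is not installed"
    kind = "development build" if looks_like_dev_build(found) else "release build"
    return f"{package} {found} ({kind})"


print(describe())
```
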
Once that's done, head over to configs/models.yaml and enable pp_doclayout_v3. There's an example config you can reference if you're not sure how to set it up.
You can tweak the settings in pp_doclayout_v3.yaml to fit your needs.
Launch the X-AnyLabeling client, then press Ctrl+A or click the AI button in the left menu bar to open the auto-labeling panel. In the model dropdown list, select Remote-Server, then choose PP-DocLayoutV3.
