feat(document): add artifacts_dir param to export_to_markdown#585

Open
Smeet23 wants to merge 2 commits into
docling-project:mainfrom
Smeet23:feat/export-to-markdown-artifacts-dir

@Smeet23 Smeet23 commented Apr 10, 2026

Summary

  • Adds an optional artifacts_dir: Optional[Path] parameter to DoclingDocument.export_to_markdown()
  • When image_mode=ImageRefMode.REFERENCED and artifacts_dir is provided, images are automatically saved to that directory and the returned markdown contains relative paths referencing them
  • When artifacts_dir is None (default) or image_mode is not REFERENCED, behaviour is unchanged — fully backwards-compatible
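
The gating described in the bullets above can be sketched without docling itself. This is a minimal, hypothetical stand-in and not the PR's implementation: `ImageRefMode` is a local enum that only mirrors the mode names mentioned here, and `export_images`, the `image_{i}.png` naming scheme, and the placeholder string are all assumptions made for illustration.

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class ImageRefMode(Enum):
    """Local stand-in for docling's enum; only the names come from the PR."""

    PLACEHOLDER = "placeholder"
    EMBEDDED = "embedded"
    REFERENCED = "referenced"


def export_images(
    images: list[bytes],
    image_mode: ImageRefMode,
    artifacts_dir: Optional[Path] = None,
) -> list[str]:
    """Hypothetical helper: return one markdown line per picture."""
    # No directory given, or a mode that does not reference files:
    # nothing is written. The real method keeps its existing behaviour here;
    # this sketch collapses that to a placeholder string (an assumption).
    if artifacts_dir is None or image_mode is not ImageRefMode.REFERENCED:
        return ["<!-- image -->" for _ in images]

    artifacts_dir.mkdir(parents=True, exist_ok=True)
    lines = []
    for i, data in enumerate(images):
        target = artifacts_dir / f"image_{i}.png"  # naming scheme is assumed
        target.write_bytes(data)
        lines.append(f"![Image]({target.as_posix()})")
    return lines
```

The point of the sketch is the single guard at the top: any combination other than `REFERENCED` plus a directory touches nothing on disk, which is what makes the new parameter backwards-compatible.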

Before:

# Manual image saving required before calling export_to_markdown
# (the ./images directory must already exist)
i = 0  # running index used to build unique file names
for item, _ in doc.iterate_items():
    if isinstance(item, PictureItem):
        img = item.get_image(doc=doc)
        img.save(f"./images/image_{i}.png")
        item.image.uri = Path(f"./images/image_{i}.png")
        i += 1

md = doc.export_to_markdown(image_mode=ImageRefMode.REFERENCED)

After:

md = doc.export_to_markdown(
    image_mode=ImageRefMode.REFERENCED,
    artifacts_dir=Path("./images"),
)

Relation to upstream issue

Resolves docling-project/docling#3094

Test plan

  • test_export_to_markdown_with_artifacts_dir — verifies images are saved and referenced in markdown
  • test_export_to_markdown_referenced_without_artifacts_dir — verifies fallback when artifacts_dir is omitted
  • test_export_to_markdown_artifacts_dir_ignored_for_non_referenced — verifies no side-effects for PLACEHOLDER/EMBEDDED modes
  • Existing test_save_to_disk, test_construct_doc, test_save_pictures all still pass


mergify Bot commented Apr 10, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviewer for test updates

This rule is failing. When test data is updated, we require two reviewers.

Waiting for

  • #approved-reviews-by >= 2

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
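
As a rough local check of the title rule above, the same pattern can be tried with Python's `re` module. This is only a sketch: it assumes Mergify's `~=` operator behaves like a regex search with Python semantics, and `is_conventional` is a hypothetical helper name.

```python
import re

# Pattern copied verbatim from the merge-protection rule.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True when the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


# This PR's title has a type, an optional scope in parentheses, and a colon.
print(is_conventional("feat(document): add artifacts_dir param to export_to_markdown"))  # True
# A title without a recognised type prefix is rejected.
print(is_conventional("Add artifacts_dir param"))  # False
```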

dosubot Bot commented Apr 10, 2026

Related Documentation

1 document(s) may need updating based on files changed in this PR:

Docling

What are the differences between vlm_pipeline_model_local and picture_description_local in Docling, and how do image descriptions, OCR, and table extraction work together? Also, how do the include_annotations and mark_annotations properties affect exported output?
View Suggested Changes
@@ -93,6 +93,7 @@
 - `mark_annotations` (bool): Whether to mark annotations with special formatting in the export (default: `False`)
 - `compact_tables` (bool): Whether to use compact table format without column padding (default: `False`, Markdown only)
 - `traverse_pictures` (bool): Whether to traverse into picture items and serialize their text children (default: `False`)
+- `artifacts_dir` (Optional[Path]): Directory where images are saved when `image_mode=ImageRefMode.REFERENCED`. The markdown output will contain relative paths pointing into this directory. When `None` (default) or when `image_mode` is not `REFERENCED`, this parameter is ignored (default: `None`, Markdown only)
 
 **Handling OCR Text in Scanned/Image-Based PDFs:**
 When processing scanned or image-based PDFs with `force_full_page_ocr=True`, the layout model classifies full-page scans as `PictureItem` nodes. OCR text items are added as children of that picture node in the document tree.


github-actions Bot commented Apr 10, 2026

DCO Check Passed

Thanks @Smeet23, all your commits are properly signed off. 🎉

When image_mode=ImageRefMode.REFERENCED, users previously had to
manually iterate over pictures and save them to disk before calling
export_to_markdown(). This adds an optional artifacts_dir parameter
that, when provided, automatically saves images to that directory and
returns markdown with relative paths referencing the saved files.

Resolves docling-project/docling#3094

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Smeet Agrawal <smeetagrawal23@gmail.com>

Smeet23 force-pushed the feat/export-to-markdown-artifacts-dir branch from 9404ba1 to 87ff805 on April 10, 2026 16:22

Smeet23 commented Apr 14, 2026

Hi @PeterStaar-IBM and @dolfim-ibm — could you take a look at this PR when you get a chance? Mergify requires 2 approvals since test data was updated. DCO and all other checks are passing. Would really appreciate a review!

PeterStaar-IBM previously approved these changes Apr 14, 2026
Signed-off-by: Smeet23 <smeetagrawal2003@gmail.com>

Smeet23 commented Apr 15, 2026

Hi @PeterStaar-IBM — thanks for the approval! I've pushed a small follow-up fix: ruff reformatted one line in document.py that was too long — that was the only CI failure. DCO ✅ is passing. Could you re-approve the new commit when you get a chance so Mergify can proceed? Happy to address any other feedback. Thanks!


Smeet23 commented Apr 17, 2026

Hi @PeterStaar-IBM and @dolfim-ibm — just a gentle ping on this one. DCO ✅ is passing and all checks are green. Would appreciate a re-approval so Mergify can proceed. Happy to address any feedback. Thanks!

Development

Successfully merging this pull request may close these issues.

Add image_dir parameter to export_to_markdown for automatic image saving

3 participants