Skip to content

feat(markdown): add picture index to image placeholder #555

Open
nuri-yoo wants to merge 5 commits into docling-project:main from nuri-yoo:feat/image-placeholder-index

nuri-yoo commented Mar 18, 2026

Summary

Add sequential picture indexing to the markdown image placeholder by introducing a {index} format token in image_placeholder.

  • Default: "<!-- image -->" → "<!-- image_{index} -->", which renders as <!-- image_0 -->, <!-- image_1 -->, ...
  • Index is extracted from item.self_ref (e.g. "#/pictures/6" → 6), matching JSON export references
  • Backward compatible: custom placeholders without {index} are unaffected (.replace() is a no-op)
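
The token resolution described above can be sketched in isolation. This is a minimal illustration, not the code in this PR: `picture_index` and `resolve_placeholder` are hypothetical helper names, and the assumption is that a picture's `self_ref` always has the form `#/pictures/<n>`.

```python
import re

# Default placeholder proposed in this PR.
DEFAULT_PLACEHOLDER = "<!-- image_{index} -->"


def picture_index(self_ref: str) -> str:
    """Return the trailing index of a reference such as '#/pictures/6'.

    Hypothetical helper: assumes the reference has the form '#/pictures/<n>'.
    """
    match = re.fullmatch(r"#/pictures/(\d+)", self_ref)
    if match is None:
        raise ValueError(f"not a picture reference: {self_ref!r}")
    return match.group(1)


def resolve_placeholder(placeholder: str, self_ref: str) -> str:
    """Substitute the {index} token in a placeholder.

    str.replace() leaves the string untouched when the token is absent,
    so custom placeholders without {index} come back unchanged.
    """
    return placeholder.replace("{index}", picture_index(self_ref))


print(resolve_placeholder(DEFAULT_PLACEHOLDER, "#/pictures/6"))
print(resolve_placeholder("<!-- image -->", "#/pictures/6"))
```

Using `str.replace()` rather than `str.format()` also means a custom placeholder that happens to contain other braces does not raise an error.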

Changes

  • MarkdownParams.image_placeholder default updated
  • MarkdownPictureSerializer._serialize_image_part(): resolve {index} token before emitting placeholder
  • Ground truth test data regenerated

Testing

  • All existing tests pass (369 passed, 0 failed)
  • Verified backward compatibility with explicit image_placeholder="<!-- image -->" (no {index} token → no change)

Resolves docling-project/docling#3078


github-actions bot commented Mar 18, 2026

DCO Check Passed

Thanks @nuri-yoo, all your commits are properly signed off. 🎉


mergify bot commented Mar 18, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviewer for test updates

This rule is failing.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
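
To check a title against this rule locally, the pattern can be tried in Python. This is only a sketch: `re.search` is used as a stand-in for Mergify's `~=` operator, and `is_conventional_title` is a hypothetical helper name.

```python
import re

# Pattern copied from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional_title(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


print(is_conventional_title("feat(markdown): add picture index to image placeholder"))
print(is_conventional_title("Add picture index to image placeholder"))
```

The first title is the one used by this PR and matches; the second has no type prefix and does not.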


dosubot bot commented Mar 18, 2026

Related Documentation

1 document(s) may need updating based on files changed in this PR:

Docling

What are the detailed pipeline options and processing behaviors for PDF, DOCX, PPTX, and XLSX files in the Python SDK?
@@ -7,7 +7,7 @@
     - `do_ocr` (default True): Use OCR
     - `force_ocr`: Replace existing text with OCR-generated text
     - `ocr_engine`, `ocr_lang`: OCR engine and language options
-    - `image_export_mode`: `placeholder`, `embedded`, `referenced`
+    - `image_export_mode`: `placeholder`, `embedded`, `referenced`. When using `placeholder` mode with Markdown export, the default placeholder format is `"<!-- image_{index} -->"`, which renders as sequential placeholders like `<!-- image_0 -->`, `<!-- image_1 -->`, etc. The index corresponds to the picture reference in the JSON export (e.g., `item.self_ref` like `"#/pictures/6"` → `6`). This is backward compatible—custom placeholders without the `{index}` token are unaffected.
     - `do_table_structure`, `table_mode`, `table_cell_matching`: Table extraction options (see Table Structure Models section below for details on TableFormer V1 and V2)
     - `do_code_enrichment`, `do_formula_enrichment`: Code/formula recognition
     - `vlm_pipeline_preset`, `vlm_pipeline_custom_config`, `picture_description_preset`, `picture_description_custom_config`, `code_formula_preset`, `code_formula_custom_config`: New model inference engine and preset options for VLM, picture description, and code/formula extraction



codecov bot commented Mar 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


ceberam self-requested a review March 19, 2026 13:31
nryoo added 4 commits March 20, 2026 20:42
Use `{index}` token in `image_placeholder` to include the picture index
from `item.self_ref`. Default placeholder changes from `<!-- image -->`
to `<!-- image_{index} -->`, producing `<!-- image_0 -->`, etc.

Backward compatible: custom placeholders without `{index}` are unaffected.

Related: docling-project/docling#3078
I, nryoo <nryoo@nryooui-MacBookPro.local>, hereby add my Signed-off-by to this commit: 5de57a5

Signed-off-by: nryoo <nryoo@nryooui-MacBookPro.local>
@nuri-yoo nuri-yoo force-pushed the feat/image-placeholder-index branch from 4d418c3 to 8ee7731 Compare March 20, 2026 11:43
I, nryoo <nryoo@nryooui-MacBookPro.local>, hereby add my Signed-off-by to this commit: c157073

Signed-off-by: nryoo <nryoo@nryooui-MacBookPro.local>

ceberam commented Mar 20, 2026

@nuri-yoo I see you've been adding new commits lately. Please let us know when it's ready for review.

nuri-yoo commented

Ready for review. All CI checks are passing now.


Development

Successfully merging this pull request may close these issues.

image/figure count in md an txt
