feat(markdown): add picture index to image placeholder #555
Open
nuri-yoo wants to merge 5 commits into docling-project:main from
Contributor
✅ DCO Check Passed. Thanks @nuri-yoo, all your commits are properly signed off. 🎉
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🔴 Require two reviewer for test updates: This rule is failing. When test data is updated, we require two reviewers.
🟢 Enforce conventional commit: Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
Related Documentation
1 document(s) may need updating based on files changed in this PR:
Docling: What are the detailed pipeline options and processing behaviors for PDF, DOCX, PPTX, and XLSX files in the Python SDK?
Suggested changes:
```diff
@@ -7,7 +7,7 @@
 - `do_ocr` (default True): Use OCR
 - `force_ocr`: Replace existing text with OCR-generated text
 - `ocr_engine`, `ocr_lang`: OCR engine and language options
- - `image_export_mode`: `placeholder`, `embedded`, `referenced`
+ - `image_export_mode`: `placeholder`, `embedded`, `referenced`. When using `placeholder` mode with Markdown export, the default placeholder format is `"<!-- image_{index} -->"`, which renders as sequential placeholders like `<!-- image_0 -->`, `<!-- image_1 -->`, etc. The index corresponds to the picture reference in the JSON export (e.g., `item.self_ref` like `"#/pictures/6"` → `6`). This is backward compatible: custom placeholders without the `{index}` token are unaffected.
 - `do_table_structure`, `table_mode`, `table_cell_matching`: Table extraction options (see Table Structure Models section below for details on TableFormer V1 and V2)
 - `do_code_enrichment`, `do_formula_enrichment`: Code/formula recognition
 - `vlm_pipeline_preset`, `vlm_pipeline_custom_config`, `picture_description_preset`, `picture_description_custom_config`, `code_formula_preset`, `code_formula_custom_config`: New model inference engine and preset options for VLM, picture description, and code/formula extraction
```
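Because the placeholder index matches the picture reference in the JSON export, a consumer of the Markdown output can map each placeholder back to its picture. The sketch below assumes the default `<!-- image_{index} -->` format and the `#/pictures/<n>` reference shape given in the suggested text; `PLACEHOLDER_RE` and `placeholder_refs` are hypothetical helpers, not part of docling, and custom placeholder formats would need a different pattern.

```python
import re

# Assumption: placeholders use the default format "<!-- image_<n> -->".
PLACEHOLDER_RE = re.compile(r"<!-- image_(\d+) -->")


def placeholder_refs(markdown: str) -> list[str]:
    """Return the JSON picture ref for each indexed placeholder, in document order.

    Hypothetical helper: assumes placeholder index n corresponds to the
    JSON reference "#/pictures/n".
    """
    return [f"#/pictures/{m.group(1)}" for m in PLACEHOLDER_RE.finditer(markdown)]


if __name__ == "__main__":
    md = "# Report\n\n<!-- image_0 -->\n\nSome text.\n\n<!-- image_1 -->\n"
    print(placeholder_refs(md))  # ['#/pictures/0', '#/pictures/1']
```

Legacy `<!-- image -->` placeholders carry no index, so this pattern simply skips them.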
Codecov Report
✅ All modified and coverable lines are covered by tests.
added 4 commits
March 20, 2026 20:42
Use `{index}` token in `image_placeholder` to include the picture index
from `item.self_ref`. Default placeholder changes from `<!-- image -->`
to `<!-- image_{index} -->`, producing `<!-- image_0 -->`, etc.
Backward compatible: custom placeholders without `{index}` are unaffected.
Related: docling-project/docling#3078
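A minimal sketch of the substitution the commit describes, assuming picture refs of the form `#/pictures/<n>` and plain `str.replace` on the `{index}` token (the PR summary states that `.replace()` is what makes old placeholders a no-op). `DEFAULT_IMAGE_PLACEHOLDER` and `resolve_placeholder` are illustrative names, not the actual docling-core code.

```python
DEFAULT_IMAGE_PLACEHOLDER = "<!-- image_{index} -->"


def resolve_placeholder(template: str, self_ref: str) -> str:
    """Substitute the picture index into a placeholder template.

    Hypothetical helper: takes the last path segment of a ref such as
    "#/pictures/6" as the index.
    """
    index = self_ref.rsplit("/", 1)[-1]
    # str.replace leaves the template untouched when "{index}" is absent,
    # so legacy placeholders like "<!-- image -->" pass through unchanged.
    return template.replace("{index}", index)


print(resolve_placeholder(DEFAULT_IMAGE_PLACEHOLDER, "#/pictures/0"))  # <!-- image_0 -->
print(resolve_placeholder("<!-- image -->", "#/pictures/0"))  # <!-- image -->
```

Using `str.replace` rather than `str.format` also means a custom placeholder containing other braces, such as `"<!-- {caption} -->"`, passes through instead of raising `KeyError`.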
I, nryoo <nryoo@nryooui-MacBookPro.local>, hereby add my Signed-off-by to this commit: 5de57a5
Signed-off-by: nryoo <nryoo@nryooui-MacBookPro.local>
Signed-off-by: nryoo <nryoo@nryooui-MacBookPro.local>
Signed-off-by: nryoo <nryoo@nryooui-MacBookPro.local>
4d418c3 to 8ee7731
I, nryoo <nryoo@nryooui-MacBookPro.local>, hereby add my Signed-off-by to this commit: c157073
Signed-off-by: nryoo <nryoo@nryooui-MacBookPro.local>
Member
@nuri-yoo I've seen you've been adding new commits lately. Just please let us know when it's ready for review.
Author
Ready for review. All CI checks are passing now.
Summary
Add sequential picture indexing to the Markdown image placeholder by introducing an `{index}` format token in `image_placeholder`.

- Default placeholder: `"<!-- image -->"` → `"<!-- image_{index} -->"`, which renders as `<!-- image_0 -->`, `<!-- image_1 -->`, ...
- The index is taken from `item.self_ref` (e.g. `"#/pictures/6"` → `6`), matching the references in the JSON export.
- Backward compatible: custom placeholders without `{index}` are unaffected (`.replace()` is a no-op).

Changes
- `MarkdownParams.image_placeholder`: default updated
- `MarkdownPictureSerializer._serialize_image_part()`: resolve the `{index}` token before emitting the placeholder

Testing
- `image_placeholder="<!-- image -->"` (no `{index}` token → no change)

Resolves docling-project/docling#3078