Can confidence and bbox be included in document metadata (dl_meta)?

First of all, thank you for building such a powerful and well-structured document intelligence toolkit!

I'm currently using Docling to process PDF documents and extract structured content with metadata via the chunker pipeline. I noticed that the dl_meta output already includes useful provenance information, such as page_no and bbox under the prov field, which is great!
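
For context, here is a minimal sketch of reading those provenance fields from a chunk's dl_meta. It assumes dl_meta is available as a plain dict with a doc_items list, where each item may carry a prov list of entries with page_no and a bbox holding l, t, r, b and coord_origin. The sample payload is hand-written for illustration, not real Docling output.

```python
import json
from typing import Any, Iterator

# Hand-written sample payload for illustration only. The key names mirror the
# dl_meta structure described above; the values are made up.
SAMPLE_DL_META = json.loads("""
{
  "doc_items": [
    {"self_ref": "#/texts/3", "label": "text",
     "prov": [{"page_no": 1,
               "bbox": {"l": 72.0, "t": 700.5, "r": 540.0, "b": 650.2,
                        "coord_origin": "BOTTOMLEFT"},
               "charspan": [0, 215]}]}
  ]
}
""")


def iter_provenance(dl_meta: dict[str, Any]) -> Iterator[tuple[str, int, dict[str, Any]]]:
    """Yield (self_ref, page_no, bbox) for every provenance entry in a chunk's dl_meta."""
    for item in dl_meta.get("doc_items", []):
        # Use an empty list when "prov" is missing or null instead of raising.
        for prov in item.get("prov") or []:
            yield item.get("self_ref", "?"), prov["page_no"], prov["bbox"]


for ref, page, bbox in iter_provenance(SAMPLE_DL_META):
    print(f"{ref} p.{page}: l={bbox['l']} t={bbox['t']} "
          f"r={bbox['r']} b={bbox['b']} ({bbox['coord_origin']})")
```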

However, I have two related questions:

Bounding Box (bbox):
Is the bbox in prov guaranteed to be present for every text item? And is it possible to get more granular bbox information (e.g., per word or per line) if needed?
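
Until presence is confirmed either way, a downstream consumer can avoid assuming it. The sketch below merges all bboxes of one chunk into a single enclosing box per page and skips items that have no prov or bbox. It assumes the coord_origin strings BOTTOMLEFT and TOPLEFT, and that all boxes on the same page share one origin; union_bbox_per_page is a hypothetical helper, not a Docling API. It produces chunk-level regions only and says nothing about per-word or per-line boxes.

```python
from collections import defaultdict
from typing import Any


def union_bbox_per_page(dl_meta: dict[str, Any]) -> dict[int, dict[str, Any]]:
    """Merge a chunk's bboxes into one enclosing box per page.

    Hypothetical helper. Items whose "prov" is missing or empty, and prov
    entries without a "bbox", are skipped instead of raising an error.
    """
    boxes: dict[int, list[dict[str, Any]]] = defaultdict(list)
    for item in dl_meta.get("doc_items", []):
        for prov in item.get("prov") or []:
            bbox = prov.get("bbox")
            if bbox is not None:
                boxes[prov["page_no"]].append(bbox)

    merged: dict[int, dict[str, Any]] = {}
    for page, page_boxes in boxes.items():
        # Assumption: every box on this page uses the same coordinate origin.
        origin = page_boxes[0].get("coord_origin", "BOTTOMLEFT")
        # With a bottom-left origin the top edge has the larger y value;
        # with a top-left origin it has the smaller one.
        top = max if origin == "BOTTOMLEFT" else min
        bottom = min if origin == "BOTTOMLEFT" else max
        merged[page] = {
            "l": min(b["l"] for b in page_boxes),
            "t": top(b["t"] for b in page_boxes),
            "r": max(b["r"] for b in page_boxes),
            "b": bottom(b["b"] for b in page_boxes),
            "coord_origin": origin,
        }
    return merged


meta = {"doc_items": [
    {"prov": [{"page_no": 1, "bbox": {"l": 72, "t": 700, "r": 300, "b": 680,
                                      "coord_origin": "BOTTOMLEFT"}}]},
    {"prov": [{"page_no": 1, "bbox": {"l": 80, "t": 670, "r": 540, "b": 640,
                                      "coord_origin": "BOTTOMLEFT"}}]},
    {"label": "text"},  # an item with no provenance at all
]}
print(union_bbox_per_page(meta))
# {1: {'l': 72, 't': 700, 'r': 540, 'b': 640, 'coord_origin': 'BOTTOMLEFT'}}
```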

Confidence Score (confidence):
Does Docling support exposing a confidence score (e.g., from OCR or layout detection models) in the metadata? This would be extremely helpful for downstream filtering or quality assessment, especially when processing scanned or low-quality documents.
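
To make the filtering use case concrete, here is a sketch of how an optional confidence field could be consumed. The per-item "confidence" key is hypothetical: it is the field being requested, not something dl_meta is known to contain today. The sketch scores each chunk by the minimum confidence over its doc items, so a single poorly recognised item flags the whole chunk. Chunks without any score are kept by default.

```python
from typing import Any, Iterable


def filter_by_confidence(
    chunks: Iterable[dict[str, Any]],
    min_confidence: float = 0.8,
    keep_unscored: bool = True,
) -> list[dict[str, Any]]:
    """Keep chunks whose (hypothetical) confidence meets a threshold.

    HYPOTHETICAL: assumes each doc item in dl_meta may carry an optional
    "confidence" float in [0, 1]. A chunk's score is the minimum over its items.
    """
    kept = []
    for chunk in chunks:
        items = chunk.get("dl_meta", {}).get("doc_items", [])
        scores = [it["confidence"] for it in items if "confidence" in it]
        if not scores:
            # No score available: keep or drop depending on the caller's policy.
            if keep_unscored:
                kept.append(chunk)
            continue
        if min(scores) >= min_confidence:
            kept.append(chunk)
    return kept


chunks = [
    {"text": "clean", "dl_meta": {"doc_items": [{"confidence": 0.97}, {"confidence": 0.91}]}},
    {"text": "smudged", "dl_meta": {"doc_items": [{"confidence": 0.95}, {"confidence": 0.42}]}},
    {"text": "unscored", "dl_meta": {"doc_items": [{}]}},
]
print([c["text"] for c in filter_by_confidence(chunks)])
# ['clean', 'unscored']
```

Taking the minimum rather than the mean is a deliberately conservative choice for scanned input, where one bad OCR region can corrupt an otherwise clean chunk.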

If these fields are not currently exposed but are available internally, would you consider adding them as optional metadata in future versions?

Can confidence and bbox be included in document metadata (dl_meta)? #29

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Can confidence and bbox be included in document metadata (dl_meta)? #29

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions