Description
Summary
After commit dda9c88, table footnotes are no longer present in markdown export. Before this commit, footnotes were serialized as TextItem during the items iteration and appeared in the markdown output. The commit changed footnote handling to delegate serialization to the parent item (e.g., TableItem), but the markdown serializers (like MarkdownTableSerializer) were not updated to serialize footnotes, so they are now missing.
Important: Because this is a serialization problem, it affects both document export and chunking.
Steps to Reproduce
Create a document with a table that includes a footnote.
Export the document to markdown.
Observe that the table footnotes are missing from the markdown output, even though the corresponding TextItem elements are correctly assigned as footnotes of the table.
Example PDF
table_parsing_test.pdf
Code to reproduce:
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
conv_res = converter.convert("table_parsing_test.pdf")
# The dict export still lists the footnotes on the table item
json_export = conv_res.document.export_to_dict()
# The markdown export is where the footnotes go missing
md_export = conv_res.document.export_to_markdown()
md_export output:
## Structured Document Parsing Test
The following inventory report is designed to test how document parsers handle multi-line annotations that are geographically tied to a table structure.
| Record ID | Asset Category | Unit Count | Status |
|-------------|------------------|--------------|-------------|
| 440-A | Hardware | 142 | Operational |
| 440-B | Software License | 12 | Pending* |
| 441-C | Cloud Instance | 3 | Operational |
| 442-D | Legacy Support | 1 | Archived** |
This final paragraph provides a clear end-of-section marker. Parsers should correctly associate the two footnotes with the table above rather than this trailing body text.
Expected Behavior
Table footnotes should appear in the markdown export, as they did before commit dda9c88.
Actual Behavior
Table footnotes are omitted from the markdown export.
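The omission can also be checked programmatically rather than by eye. The helper below is a minimal sketch, not part of docling: `find_missing_footnotes` is a hypothetical name, and it assumes that the exported dict has top-level `tables` and `texts` lists and that each table stores its footnotes as JSON-pointer-style references of the form `{"$ref": "#/texts/<index>"}`. If the actual export layout differs, the lookup needs adjusting.

```python
def find_missing_footnotes(doc_dict: dict, markdown: str) -> list[str]:
    """Return table footnote texts that do not appear in the markdown string.

    Assumed layout (not verified against docling): doc_dict["tables"][i]["footnotes"]
    is a list of {"$ref": "#/texts/<index>"} entries pointing into doc_dict["texts"].
    """
    texts = doc_dict.get("texts", [])
    missing = []
    for table in doc_dict.get("tables", []):
        for ref in table.get("footnotes", []):
            # Turn "#/texts/3" into ["texts", "3"]
            parts = ref.get("$ref", "").lstrip("#/").split("/")
            if len(parts) != 2 or parts[0] != "texts" or not parts[1].isdigit():
                continue  # skip references that do not match the assumed format
            index = int(parts[1])
            if index >= len(texts):
                continue  # skip dangling references
            text = texts[index].get("text", "")
            if text and text not in markdown:
                missing.append(text)
    return missing
```

With the reproduction code above, `find_missing_footnotes(json_export, md_export)` would be expected to return a non-empty list on affected versions and an empty list once the regression is fixed.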
Technical Details
Before dda9c88, footnotes were serialized as TextItem during the main iteration.
After dda9c88, footnotes are delegated to the parent floating item for serialization, but MarkdownTableSerializer does not call the footnote serialization method, so footnotes are not included.
The markdown serializers need to be updated to serialize table footnotes via the appropriate method.
References
Commit dda9c88
MarkdownTableSerializer code
Discussion of the regression and cause
Suggested Fix
Update MarkdownTableSerializer to serialize table footnotes using the appropriate method, similar to how captions are handled.
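To illustrate the shape of the suggested fix, here is a self-contained sketch of a table serializer that emits the caption, then the table body, then the footnotes. It deliberately does not use docling's classes: `Table` and `serialize_table_markdown` are hypothetical stand-ins, and the actual change would go through whatever caption and footnote serialization methods MarkdownTableSerializer has access to.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    # Hypothetical stand-in for a table item; not docling's TableItem.
    rows: list[list[str]]
    caption: str = ""
    footnotes: list[str] = field(default_factory=list)


def serialize_table_markdown(table: Table) -> str:
    """Serialize a table to markdown, including its caption and footnotes."""
    parts = []
    if table.caption:
        parts.append(table.caption)
    if table.rows:
        header, *body = table.rows
        lines = [
            "| " + " | ".join(header) + " |",
            "|" + "|".join("---" for _ in header) + "|",
        ]
        lines += ["| " + " | ".join(row) + " |" for row in body]
        parts.append("\n".join(lines))
    # The step this issue reports as missing: the parent item is now
    # responsible for emitting its own footnotes, so append them here.
    parts.extend(table.footnotes)
    return "\n\n".join(parts)
```

Placing the footnotes directly after the table body keeps them adjacent to the table in the output, which also matters for chunking, where the table and its footnotes should end up in the same chunk.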