In document formats like Microsoft Word (.docx), comments can be anchored to specific text ranges within the document. The current data model supports storing comments in the ContentLayer.NOTES layer within GroupLabel.COMMENT_SECTION groups, but there is no mechanism to link a comment back to the specific DocItem(s) it annotates.
This linking capability is important for:
- Round-trip fidelity - reconstructing the original document structure
- Semantic understanding - knowing which content a comment refers to
- LLM/RAG applications - providing context about what text a comment is discussing
- Document analysis - tracing review/collaboration trails
Basic comment extraction for Word documents was implemented in docling PR #2834. The next step is comment-to-content linking, which first requires data model changes in docling-core.
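
To make the gap concrete, here is a minimal sketch of what such a link could look like. It uses plain dataclasses as stand-ins and is not the docling-core API: the `Item` and `Comment` classes, the `targets` field, and the `annotated_text` helper are all hypothetical names chosen for illustration, and the `#/texts/N` reference strings are only assumed to resemble how items are addressed.

```python
from dataclasses import dataclass, field
from enum import Enum


class ContentLayer(str, Enum):
    # Simplified stand-in for the real enum; only two layers are modelled here.
    BODY = "body"
    NOTES = "notes"


@dataclass
class Item:
    """Stand-in for a DocItem (hypothetical, not the docling-core class)."""
    ref: str
    text: str
    layer: ContentLayer = ContentLayer.BODY


@dataclass
class Comment(Item):
    """A comment stored in the notes layer."""
    layer: ContentLayer = ContentLayer.NOTES
    # Hypothetical field: refs of the items this comment annotates.
    targets: list[str] = field(default_factory=list)


def annotated_text(comment: Comment, items: list[Item]) -> list[str]:
    """Resolve a comment's target refs to the text of the items it annotates."""
    by_ref = {item.ref: item for item in items}
    # Skip refs that do not resolve, so a dangling link does not raise.
    return [by_ref[ref].text for ref in comment.targets if ref in by_ref]


body = [
    Item("#/texts/0", "Revenue grew 12% in Q3."),
    Item("#/texts/1", "Costs were flat."),
]
comment = Comment("#/texts/2", "Please cite the source.", targets=["#/texts/0"])
print(annotated_text(comment, body))
```

Without something like the `targets` field, the comment text is still stored, but nothing says which body item it refers to. Storing references rather than copies of the annotated text is one possible design choice; it keeps the comment in sync with the content it points at.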