Add data model support for linking comments to their annotated document content (Word/Office comments) #464

@s1v4-d

Description

In document formats like Microsoft Word (.docx), comments can be anchored to specific text ranges within the document. The current data model supports storing comments in the ContentLayer.NOTES layer within GroupLabel.COMMENT_SECTION groups, but there is no mechanism to link a comment back to the specific DocItem(s) it annotates.

This linking capability is important for:

  • Round-trip fidelity - reconstructing the original document structure
  • Semantic understanding - knowing which content a comment refers to
  • LLM/RAG applications - providing context about what text a comment is discussing
  • Document analysis - tracing review/collaboration trails

docling PR #2834 implemented basic comment extraction for Word documents. The next step is comment-to-content linking, which first requires data model changes in docling-core.
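To make the request concrete, here is a minimal sketch of what such a link could look like. It is written with plain dataclasses rather than the actual docling-core classes, and every name in it (`AnchorRange`, `CommentLink`, `comments_for`, `annotated_text`) is hypothetical. It assumes items are addressable by JSON-pointer-style strings such as `#/texts/3`, and that an anchor may optionally narrow to a character range inside one item.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AnchorRange:
    """Hypothetical pointer from a comment to (part of) one annotated item."""
    target_ref: str                 # assumed reference string, e.g. "#/texts/3"
    start: Optional[int] = None     # character offset; None means the whole item
    end: Optional[int] = None       # exclusive end offset


@dataclass
class CommentLink:
    """Hypothetical link record: one comment, one or more anchored ranges."""
    comment_ref: str                # reference to the comment in the notes layer
    anchors: List[AnchorRange] = field(default_factory=list)


def comments_for(target_ref: str, links: List[CommentLink]) -> List[CommentLink]:
    """Return every link with at least one anchor on the given item."""
    return [
        link for link in links
        if any(a.target_ref == target_ref for a in link.anchors)
    ]


def annotated_text(anchor: AnchorRange, items: Dict[str, str]) -> str:
    """Resolve an anchor to the text it covers (items maps ref -> item text)."""
    text = items[anchor.target_ref]
    if anchor.start is None or anchor.end is None:
        return text
    return text[anchor.start:anchor.end]
```

A comment spanning several paragraphs would carry one `AnchorRange` per item. Keeping the link on the comment side, rather than adding a field to every DocItem, is only one possible design; a back-reference on the annotated item would make the reverse lookup cheaper.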
