-
Notifications
You must be signed in to change notification settings - Fork 155
fix: updated caption references in add_node_items #514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
81ac541
51cce6c
968887a
25a78c3
92c244c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4562,6 +4562,30 @@ def _append_item_copies( | |
| for item in node_items: | ||
| item_copy = item.model_copy(deep=True) | ||
|
|
||
| # handle DocItem pointers (comments) | ||
| if isinstance(item, DocItem): | ||
| if item.comments: | ||
| if isinstance(item_copy, DocItem): | ||
| item_copy.comments = self._copy_and_reindex_refs(item.comments, doc=doc, parent_ref=parent_ref) | ||
|
|
||
| # handling new references for floating items | ||
| if isinstance(item, FloatingItem): | ||
| if item.captions: | ||
| if isinstance(item_copy, FloatingItem): | ||
| item_copy.captions = self._copy_and_reindex_refs(item.captions, doc=doc, parent_ref=parent_ref) | ||
|
|
||
| if item.footnotes: | ||
| if isinstance(item_copy, FloatingItem): | ||
| item_copy.footnotes = self._copy_and_reindex_refs( | ||
| item.footnotes, doc=doc, parent_ref=parent_ref | ||
| ) | ||
|
|
||
| if item.references: | ||
| if isinstance(item_copy, FloatingItem): | ||
| item_copy.references = self._copy_and_reindex_refs( | ||
| item.references, doc=doc, parent_ref=parent_ref | ||
| ) | ||
|
|
||
| self._append_item(item=item_copy, parent_ref=parent_ref) | ||
|
|
||
| if item_copy.children: | ||
|
|
@@ -4578,6 +4602,31 @@ def _append_item_copies( | |
|
|
||
| return new_refs | ||
|
|
||
| def _copy_and_reindex_refs(self, ref_list: list[Any], doc: "DoclingDocument", parent_ref: RefItem) -> list[Any]: | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fine grain comment: can we be more precise with the type hints? Would
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @ceberam Just a gentle ping on this when you have a moment. If list[Any] is a bit too loose for the project's typing standards, I can implement a generic TypeVar (e.g., T_Ref = TypeVar("T_Ref", bound="RefItem") alongside Sequence[T_Ref]). This approach should satisfy MyPy's strict list invariance rules while keeping the exact typing intact for both FineRef and RefItem outputs. Let me know if you'd like me to push that update, or if you're comfortable moving forward with it as-is! |
||
| """Helper to copy referenced items and return their new indices | ||
|
|
||
| :param ref_list: list[Any]: The list of references (e.g., captions, footnotes, comments) to be copied | ||
| :param doc: "DoclingDocument": The document from which the NodeItems are taken | ||
| :param parent_ref: RefItem: The reference of the parent item in the current document where copies will be appended to | ||
|
|
||
| :returns: list[Any]: A new list of references pointing to the newly appended items in the current document | ||
|
Comment on lines
+4606
to
+4612
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Even though we have not been consistent in the past, we want to stick to the google docstring conventions at least on new code, as we specify it on pyproject.toml |
||
| """ | ||
| if not ref_list: | ||
| return [] | ||
|
|
||
| new_refs = [] | ||
| for ref in ref_list: | ||
| resolved_item = ref.resolve(doc) | ||
| if resolved_item: | ||
| ref_copy = resolved_item.model_copy(deep=True) | ||
| self._append_item(item=ref_copy, parent_ref=parent_ref) | ||
|
|
||
| new_ref_pointer = ref.model_copy(deep=True) | ||
| new_ref_pointer.cref = ref_copy.get_ref().cref | ||
|
|
||
| new_refs.append(new_ref_pointer) | ||
| return new_refs | ||
|
|
||
| def num_pages(self): | ||
| """num_pages.""" | ||
| return len(self.pages.values()) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For readability, can we keep it more compact? (this one and the other similar patterns)