Bug: Don't parse annotations as documents, even if the tag is set #636

kbolashev · 2025-12-14T12:47:59Z

In Data Engine, if you have a metadata field that is set both as an annotation and a document, loading the annotation field fails.

Example of setting an annotation AND a document (might happen automatically on the backend):

    builder = ds.metadata_field("exported_annotation").set_annotation()
    builder._add_tags({ReservedTags.DOCUMENT.value})
    builder.apply()

This PR fixes it: if an annotation field also has the document tag set, the annotation tag now takes precedence, and the field is skipped when loading documents.
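For readers skimming the thread, the precedence rule described above can be sketched in isolation. This is a minimal illustration, not the code from `query_result.py`: the helper name `split_fields_by_tag` and the field names `notes` and `width` are hypothetical, and plain sets stand in for the datasource's annotation and document field lists.

```python
from typing import Iterable, List, Set, Tuple


def split_fields_by_tag(
    fields: Iterable[str],
    annotation_fields: Set[str],
    document_fields: Set[str],
) -> Tuple[List[str], List[str]]:
    """Return (fields to load as annotations, fields to load as documents).

    Hypothetical helper: it only mirrors the rule from the PR description.
    """
    fields = list(fields)
    to_annotations = [f for f in fields if f in annotation_fields]
    # The annotation tag wins: a field tagged as both is not loaded as a document.
    to_documents = [
        f for f in fields if f in document_fields and f not in annotation_fields
    ]
    return to_annotations, to_documents


annotations, documents = split_fields_by_tag(
    ["exported_annotation", "notes", "width"],
    annotation_fields={"exported_annotation"},
    document_fields={"exported_annotation", "notes"},
)
print(annotations)
print(documents)
```

Here `exported_annotation` carries both tags and ends up only in the annotation list, while `notes` is still treated as a document and `width` is left untouched.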

dagshub · 2025-12-14T12:48:02Z

Join the discussion on DagsHub!

Copilot

Pull request overview

This PR fixes a bug where metadata fields tagged as both annotations and documents would fail to load correctly. The fix ensures that annotation fields take precedence over document fields during data loading by excluding annotation fields from document field processing.

Key changes:

Modified query result processing to filter out annotation fields from document field conversion
Added test coverage for the scenario where a field has both annotation and document tags

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File	Description
`tests/data_engine/conftest.py`	Added `PreprocessingStatus` import and initialization in mock datasource setup
`tests/data_engine/annotation_import/test_annotation_parsing.py`	Added comprehensive test for annotation fields that also have document tags
`tests/data_engine/annotation_import/res/annotation1.json`	Added test fixture containing segmentation annotation data
`dagshub/data_engine/model/query_result.py`	Fixed bug by filtering annotation fields from document field processing

simonlsk

Seems to solve the issue, thank you.
I have a suggestion to make the flow more solid, but otherwise LGTM.

simonlsk · 2025-12-14T14:18:06Z

dagshub/data_engine/model/query_result.py

        # Convert any downloaded document fields
        document_fields = [f for f in fields if f in self.document_fields]
+        # Exclude any annotation fields, because they are already converted above
+        document_fields = [f for f in document_fields if f not in self.annotation_fields]


It would make more sense to me for every processed field to be loaded as either a document or an annotation here in this function, rather than having the function called twice from outside and then filtering out annotations when it's called for documents.

In the current flow, if we were to add a new tag that conflicts with the existing ones, we would need to add another exclusion for every loader the field doesn't belong to.

Let me know if this fix is easy to make.

kbolashev · 2025-12-15 (edited)

Would require a refactor, because of the way annotation errors are handled and printed out.
It can totally be refactored, just not a 5-minute refactor.
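
As a rough illustration of the single-pass flow suggested in this thread, each field could be assigned to exactly one loader by walking a priority list of tags. This is only a sketch under stated assumptions: `TAG_PRIORITY`, `assign_loaders`, and the tag strings are hypothetical names, not part of the DagsHub client, and it deliberately leaves out the annotation error handling mentioned in the reply above, which is the part that makes the real refactor non-trivial.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical priority order: earlier entries win when a field carries several tags.
TAG_PRIORITY: Tuple[str, ...] = ("annotation", "document")


def assign_loaders(
    fields: Iterable[str], tagged: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Assign each field to at most one loader, using the highest-priority tag it has."""
    assignment: Dict[str, List[str]] = {tag: [] for tag in TAG_PRIORITY}
    for field in fields:
        for tag in TAG_PRIORITY:
            if field in tagged.get(tag, set()):
                assignment[tag].append(field)
                break  # first match wins, so no pairwise exclusions are needed
    return assignment


result = assign_loaders(
    ["exported_annotation", "notes", "width"],
    {
        "annotation": {"exported_annotation"},
        "document": {"exported_annotation", "notes"},
    },
)
print(result)
```

With this shape, adding a new conflicting tag would mean inserting it at the right position in the priority list instead of adding an exclusion to every other loader.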

Don't parse annotations as documents, even if the tag is set

4e254d4

kbolashev requested review from Copilot and simonlsk December 14, 2025 12:47

kbolashev self-assigned this Dec 14, 2025

kbolashev added the bug Something isn't working label Dec 14, 2025

Copilot AI reviewed Dec 14, 2025

simonlsk approved these changes Dec 14, 2025

kbolashev merged commit de4e8fa into main Dec 15, 2025
7 of 8 checks passed

kbolashev deleted the bug/annotation-documents branch December 15, 2025 09:26
