@kbolashev
Member

In Data Engine, if you have a metadata field that is set both as an annotation and a document, loading the annotation field fails.

Example of setting an annotation AND a document (might happen automatically on the backend):

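    # Tag the field as an annotation
    builder = ds.metadata_field("exported_annotation").set_annotation()
    # Also tag it as a document (private helper, used here only to reproduce the conflict)
    builder._add_tags({ReservedTags.DOCUMENT.value})
    builder.apply()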

This PR fixes it: if an annotation field also has the document tag set, the annotation tag takes precedence and the field is skipped when loading documents.
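
A minimal sketch of the precedence rule described above, using plain lists in place of the real query result state. The helper name `fields_to_load_as_documents` and its parameters are hypothetical and are not part of the dagshub client API.

```python
from typing import Iterable, List


def fields_to_load_as_documents(
    fields: Iterable[str],
    document_fields: Iterable[str],
    annotation_fields: Iterable[str],
) -> List[str]:
    """Return the requested fields that should be converted as documents.

    Hypothetical helper: a field tagged as both annotation and document is
    treated as an annotation only, so it is left out of the result.
    """
    documents = set(document_fields)
    annotations = set(annotation_fields)
    return [f for f in fields if f in documents and f not in annotations]


selected = fields_to_load_as_documents(
    fields=["exported_annotation", "notes", "width"],
    document_fields=["exported_annotation", "notes"],
    annotation_fields=["exported_annotation"],
)
print(selected)  # ['notes']
```

Here `exported_annotation` carries both tags, so only `notes` is left for document conversion.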

@kbolashev kbolashev self-assigned this Dec 14, 2025
@kbolashev kbolashev added the bug Something isn't working label Dec 14, 2025
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a bug where metadata fields tagged as both annotations and documents would fail to load correctly. The fix ensures that annotation fields take precedence over document fields during data loading by excluding annotation fields from document field processing.

Key changes:

  • Modified query result processing to filter out annotation fields from document field conversion
  • Added test coverage for the scenario where a field has both annotation and document tags

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/data_engine/conftest.py | Added PreprocessingStatus import and initialization in mock datasource setup |
| tests/data_engine/annotation_import/test_annotation_parsing.py | Added comprehensive test for annotation fields that also have document tags |
| tests/data_engine/annotation_import/res/annotation1.json | Added test fixture containing segmentation annotation data |
| dagshub/data_engine/model/query_result.py | Fixed bug by filtering annotation fields from document field processing |

Contributor

@simonlsk simonlsk left a comment

Seems to solve the issue, thank you.
I have a suggestion to make the flow more robust, but otherwise LGTM.

# Convert any downloaded document fields
document_fields = [f for f in fields if f in self.document_fields]
# Exclude any annotation fields, because they are already converted above
document_fields = [f for f in document_fields if f not in self.annotation_fields]
Contributor

It would make more sense to me for this function to load every processed field as either a document or an annotation, rather than being called twice from outside and then filtering out annotations when it's called for documents.

In the current flow, if we were to add a new tag that conflicts with the existing ones, we would need to explicitly exclude every field from each category it doesn't belong to.

Let me know if this fix is easy to make.
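
One way to picture the suggested flow is a single classification pass with an explicit precedence order. This is only a sketch under assumptions: `FieldKind`, `PRECEDENCE`, and `classify_fields` are hypothetical names, not existing dagshub code, and error reporting is left out entirely.

```python
from enum import Enum
from typing import Dict, Iterable, List, Set


class FieldKind(Enum):
    ANNOTATION = "annotation"
    DOCUMENT = "document"


# Earlier entries win when a field carries several conflicting tags.
PRECEDENCE: List[FieldKind] = [FieldKind.ANNOTATION, FieldKind.DOCUMENT]


def classify_fields(
    fields: Iterable[str],
    tagged: Dict[FieldKind, Set[str]],
) -> Dict[FieldKind, List[str]]:
    """Assign each field to at most one kind, following PRECEDENCE."""
    result: Dict[FieldKind, List[str]] = {kind: [] for kind in PRECEDENCE}
    for field in fields:
        for kind in PRECEDENCE:
            if field in tagged.get(kind, set()):
                result[kind].append(field)
                break  # first matching kind wins; later kinds never see it
    return result


classified = classify_fields(
    ["exported_annotation", "notes", "width"],
    {
        FieldKind.ANNOTATION: {"exported_annotation"},
        FieldKind.DOCUMENT: {"exported_annotation", "notes"},
    },
)
```

With this shape, a new conflicting tag would mean adding one entry to `PRECEDENCE` rather than another exclusion filter at each call site.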

Member Author

@kbolashev kbolashev Dec 15, 2025

Would require a refactor, due to the way annotation errors are handled and printed out.
Can totally be refactored though, it's just not a 5-minute refactor.

@kbolashev kbolashev merged commit de4e8fa into main Dec 15, 2025
7 of 8 checks passed
@kbolashev kbolashev deleted the bug/annotation-documents branch December 15, 2025 09:26

3 participants