
Conversation

@Tommel71
Member

No description provided.

@Tommel71 Tommel71 requested a review from soad003 December 19, 2025 14:35
Member

@soad003 soad003 left a comment

If I understand correctly, small files and files containing an !include use the same logic as before; only big files without an include use the new fast YAML parser. Is the main downside, apart from the missing include support, that it does not handle duplicate keys properly? Most changes are localized around the validation, so in the worst case not all validations work as expected, or we might fail to import the data because of some violation on insert (a missing FK or something). The only critical changes that might affect the imported data are in tagpack.py (init_default_values, all_header_fields and init), right? They look pretty much identical to me, but it's hard for me to judge whether these changes could lead to wrong data imports.

category=DeprecationWarning,
)
import ryml as _ryml

Member

is it a good idea to just remove these warnings?

VENV := venv
RELEASE := 'v25.11.10'
RELEASESEM := 'v2.8.10'
RELEASE := 'v25.11.11'
Member

is this really a change, or is it an artifact of not merging the last release to master?

has_include = b"!include" in f.read(4096)

if header_dir is not None or has_include:
YamlIncludeConstructor.add_to_loader_class(
Member

this means as soon as we have a header we fall back to the old impl, right?
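
For reference, the condition in this hunk can be read as the minimal sketch below. `needs_legacy_loader` and `check` are hypothetical names used only for illustration and are not functions from the PR; only the `!include` test over the first 4096 bytes and the `header_dir is not None` test come from the diff.

```python
import os
import tempfile

SNIFF_BYTES = 4096  # prefix length taken from the hunk above


def needs_legacy_loader(file_path, header_dir=None):
    """Hypothetical helper: True when the old include-aware path is taken."""
    with open(file_path, "rb") as f:
        has_include = b"!include" in f.read(SNIFF_BYTES)
    return header_dir is not None or has_include


def check(data, header_dir=None):
    """Write data to a temporary file, run the check, then clean up."""
    fd, path = tempfile.mkstemp(suffix=".yaml")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return needs_legacy_loader(path, header_dir)
    finally:
        os.remove(path)


plain = b"title: demo\ntags: []\n"
print(check(plain))                        # no header, no include
print(check(plain, header_dir="headers"))  # a header_dir alone triggers the fallback
print(check(b"header: !include header.yaml\n"))
# An include that starts after the first 4096 bytes is not seen by the prefix check.
print(check(b"#" * SNIFF_BYTES + b"\nheader: !include header.yaml\n"))
```

Under this reading, any non-None `header_dir` selects the old implementation regardless of file content, and an `!include` that first appears beyond the sniffed prefix would go undetected.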

# Context is already a string, parse once to get tags
try:
context_tags = json.loads(context).get("tags")
except json.JSONDecodeError:
Member

is the speedup really worth the additional code?

Member

also, is ignoring the JSON decode errors the same behavior as before?
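
To make the question concrete, here is a minimal sketch of one way the parse-once step could behave. The body of the `except` branch is cut off in the hunk, so returning `None` on malformed JSON is an assumption, and `extract_context_tags` is a hypothetical name, not code from the PR.

```python
import json


def extract_context_tags(context):
    """Parse the context string once and return its "tags" entry, or None.

    Assumption: malformed JSON is treated as "no tags" instead of raising.
    """
    if not context:
        return None
    try:
        parsed = json.loads(context)
    except json.JSONDecodeError:
        # Assumed fallback; the actual handler is not visible in the diff.
        return None
    if not isinstance(parsed, dict):
        # Valid JSON that is not an object has no .get(), so guard explicitly.
        return None
    return parsed.get("tags")


print(extract_context_tags('{"tags": ["exchange", "service"]}'))
print(extract_context_tags("not json"))
```

One thing worth checking against the old behavior: `except json.JSONDecodeError` alone does not cover valid JSON that is not an object (for example a list), where `.get` would raise `AttributeError`.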

if actor:
# Collect actors from header level and tag level
actors_to_check = set()
header_actor = tagpack.all_header_fields.get("actor")
Member

is there a reason we need more cases? or is this a new feature?
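
As a point of comparison, collecting actors from both levels could look roughly like the sketch below. It assumes that header fields are a plain dict and that each tag is a dict with an optional `"actor"` key; `collect_actors` is a hypothetical helper, and the real tagpack objects in the PR may be structured differently.

```python
def collect_actors(header_fields, tags):
    """Return the set of distinct actors named in the header or in any tag."""
    actors = set()
    header_actor = header_fields.get("actor")
    if header_actor:
        actors.add(header_actor)
    for tag in tags:
        tag_actor = tag.get("actor")
        if tag_actor:
            actors.add(tag_actor)
    return actors


header = {"actor": "binance"}
tags = [{"actor": "kraken"}, {"label": "no actor here"}, {"actor": "binance"}]
print(sorted(collect_actors(header, tags)))
```

Using a set means an actor that appears in both the header and a tag is checked only once.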

file_size = os.path.getsize(file_path)

# Use UniqueKeyLoader for small files (duplicate key detection)
if file_size < 100 * 1024:
Member

is there a reason to fall back for small files?
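
The size check in this hunk amounts to the dispatch sketched below. The 100 * 1024 byte threshold and the `os.path.getsize` call come from the diff; `pick_loader`, `loader_for_size` and the returned labels are hypothetical placeholders, not names from the PR.

```python
import os
import tempfile

SMALL_FILE_LIMIT = 100 * 1024  # bytes, threshold from the hunk above


def pick_loader(file_path):
    """Hypothetical dispatcher; the returned labels are placeholders."""
    if os.path.getsize(file_path) < SMALL_FILE_LIMIT:
        return "unique-key-loader"  # path with duplicate key detection
    return "fast-parser"


def loader_for_size(n_bytes):
    """Create a temporary file of n_bytes and report which path it would take."""
    fd, path = tempfile.mkstemp(suffix=".yaml")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"#" * n_bytes)
        return pick_loader(path)
    finally:
        os.remove(path)


print(loader_for_size(10))
print(loader_for_size(SMALL_FILE_LIMIT - 1))
print(loader_for_size(SMALL_FILE_LIMIT))
```

Because the comparison is strict, a file of exactly 100 KiB already takes the fast path, so according to the code comment duplicate key detection applies only below that size.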


3 participants