-
Notifications
You must be signed in to change notification settings - Fork 6
Feature/faster validation #38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
soad003
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I get everything correctly for small files and if an !include is in the file, we use the same logic as before, only big files without include use the new fast yaml parser. There is the main downside except it does not support includes, is that it does not support proper duplicate key handling? Most changes are localized around the validation, in the worst case not all validations work as expected, or we might fail to import the data because of some violation on insert (missing fk or something). There are only some critical changes in tagpack.py init_default_values and all_header_fields and init that might have an effect on the imported data, right. And they look pretty much identical to me. But It's hard to judge for me if these change could lead to wrong data imports.
| category=DeprecationWarning, | ||
| ) | ||
| import ryml as _ryml | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it a good idea to just remove these warnings?
| VENV := venv | ||
| RELEASE := 'v25.11.10' | ||
| RELEASESEM := 'v2.8.10' | ||
| RELEASE := 'v25.11.11' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this really a change or is this a artifact of not merging the last release to master?
| has_include = b"!include" in f.read(4096) | ||
|
|
||
| if header_dir is not None or has_include: | ||
| YamlIncludeConstructor.add_to_loader_class( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this means as soon as we have a header we fall back to the old impl, right?
| # Context is already a string, parse once to get tags | ||
| try: | ||
| context_tags = json.loads(context).get("tags") | ||
| except json.JSONDecodeError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the speedup really worth the additional code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, is the ignoring the json decoder errors the same as before?
| if actor: | ||
| # Collect actors from header level and tag level | ||
| actors_to_check = set() | ||
| header_actor = tagpack.all_header_fields.get("actor") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason we need more cases? or is this a new feature?
| file_size = os.path.getsize(file_path) | ||
|
|
||
| # Use UniqueKeyLoader for small files (duplicate key detection) | ||
| if file_size < 100 * 1024: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason to fallback on small files?
No description provided.