Feature/faster validation #38

Tommel71 · 2025-12-19T14:35:14Z

No description provided.

soad003

If I understand everything correctly, small files and files containing an !include still use the same logic as before; only big files without includes use the new fast YAML parser. Apart from not supporting includes, is the main downside of the new parser that it does not handle duplicate keys properly? Most changes are localized around the validation, so in the worst case not all validations work as expected, or we might fail to import the data because of some violation on insert (a missing FK or something similar). The only critical changes that might have an effect on the imported data are in tagpack.py (init_default_values, all_header_fields and init), right? They look pretty much identical to me, but it's hard for me to judge whether these changes could lead to wrong data imports.
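To make the dispatch described above concrete, here is a minimal sketch of how I read it. The function name `choose_parser` and the labels "legacy" and "fast" are hypothetical and not taken from the PR; the 100 KiB threshold and the 4096-byte prefix come from the diff hunks in this review. How the two checks are actually combined in the code is my assumption.

```python
import os

SIZE_THRESHOLD = 100 * 1024  # bytes; value from the hunk in __init__.py
SNIFF_BYTES = 4096           # prefix length from the hunk in tagpack.py


def choose_parser(path, header_dir=None):
    """Return "legacy" or "fast" (hypothetical labels) for a TagPack file."""
    # A header directory forces the old include-aware loader.
    if header_dir is not None:
        return "legacy"
    # Small files keep the old loader, which detects duplicate keys.
    if os.path.getsize(path) < SIZE_THRESHOLD:
        return "legacy"
    # Files that mention !include near the top also keep the old loader.
    with open(path, "rb") as f:
        if b"!include" in f.read(SNIFF_BYTES):
            return "legacy"
    # Only big files without includes reach the new fast parser.
    return "fast"
```

If this reading is right, duplicate-key detection is lost exactly for the large files, which is where a silent overwrite would be hardest to spot by eye.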

soad003 · 2025-12-19T15:48:59Z

src/graphsenselib/tagpack/__init__.py

+        category=DeprecationWarning,
+    )
+    import ryml as _ryml
+


is it a good idea to just remove these warnings?

soad003 · 2025-12-19T15:52:24Z

Makefile

 VENV := venv
-RELEASE := 'v25.11.10'
-RELEASESEM := 'v2.8.10'
+RELEASE := 'v25.11.11'


is this really a change, or is it an artifact of not merging the last release to master?

soad003 · 2025-12-19T16:02:12Z

src/graphsenselib/tagpack/tagpack.py

+            has_include = b"!include" in f.read(4096)
+
+        if header_dir is not None or has_include:
+            YamlIncludeConstructor.add_to_loader_class(


this means as soon as we have a header we fall back to the old impl, right?
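One more thing to consider for this check: `f.read(4096)` only inspects the first 4096 bytes, so an `!include` further down the file would not trigger the fallback. Whether that can happen in real TagPacks is not something I can tell from the diff. The sketch below contrasts the prefix check with a full scan; both helper names are hypothetical.

```python
def has_include_in_prefix(path, prefix_bytes=4096):
    """Mirror of the check in the diff: look only at the file's prefix."""
    with open(path, "rb") as f:
        return b"!include" in f.read(prefix_bytes)


def has_include_anywhere(path, chunk_size=64 * 1024):
    """Scan the whole file in chunks (hypothetical alternative)."""
    marker = b"!include"
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if marker in window:
                return True
            # Keep the last len(marker) - 1 bytes so a marker that is
            # split across two chunks is still found.
            tail = window[-(len(marker) - 1):]
```

The full scan costs one extra pass over the file, which may or may not be acceptable given that the goal of the PR is speed.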

soad003 · 2025-12-19T16:06:04Z

src/graphsenselib/tagpack/tagpack.py

+            # Context is already a string, parse once to get tags
+            try:
+                context_tags = json.loads(context).get("tags")
+            except json.JSONDecodeError:


is the speedup really worth the additional code?

also, is ignoring the JSON decoder errors the same behavior as before?
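For reference, this is roughly the behavior I would expect from the context parsing. The hunk does not show the body of the `except` branch, so returning `None` there is an assumption, and `extract_context_tags` is a hypothetical name. Note that `json.loads(context).get("tags")` raises `AttributeError` rather than `JSONDecodeError` when the context is valid JSON but not an object (for example a list), so the sketch guards against that case as well.

```python
import json


def extract_context_tags(context):
    """Return the "tags" entry of a JSON context string, or None."""
    if not context:
        return None
    try:
        parsed = json.loads(context)
    except json.JSONDecodeError:
        # Assumption: malformed JSON is ignored instead of failing validation.
        return None
    if not isinstance(parsed, dict):
        # Valid JSON that is not an object has no .get(); treat as "no tags".
        return None
    return parsed.get("tags")
```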

soad003 · 2025-12-19T16:09:03Z

src/graphsenselib/tagpack/cli.py

-                    if actor:
+                    # Collect actors from header level and tag level
+                    actors_to_check = set()
+                    header_actor = tagpack.all_header_fields.get("actor")


is there a reason we need more cases? or is this a new feature?
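As I read the hunk, the check now gathers actors from both the header and the individual tags before validating them. A minimal sketch of that collection step, assuming tags can be treated as plain dicts with an optional "actor" key (the real tag objects in cli.py may look different, and `collect_actors` is a hypothetical name):

```python
def collect_actors(header_fields, tags):
    """Collect the distinct actor ids set at header level and tag level."""
    actors_to_check = set()
    header_actor = header_fields.get("actor")
    if header_actor:
        actors_to_check.add(header_actor)
    for tag in tags:
        tag_actor = tag.get("actor")
        if tag_actor:
            actors_to_check.add(tag_actor)
    return actors_to_check
```

Using a set means each actor is validated once even if many tags repeat it.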

soad003 · 2025-12-19T16:11:34Z

src/graphsenselib/tagpack/__init__.py

+    file_size = os.path.getsize(file_path)
+
+    # Use UniqueKeyLoader for small files (duplicate key detection)
+    if file_size < 100 * 1024:


is there a reason to fall back on small files?

Tommel71 added 3 commits December 17, 2025 14:00

Fix tagpack actor validation on tag level

3494e35

Faster validation

de7c8aa

More performance optimizations

dbb2061

Tommel71 requested a review from soad003 December 19, 2025 14:35

soad003 reviewed Dec 19, 2025

3 participants
