fix(deserialization): Fix deserialization of special unicode characters #7
Merged
alesanfra merged 3 commits into alesanfra:main on Feb 13, 2026
Pull request overview
This PR fixes a Unicode-related deserialization bug where parsing logic used character indices (from .chars().enumerate()) as if they were byte offsets, causing incorrect slicing/positioning when keys/values contain multi-byte characters (e.g., ®). It also extends the integration test corpus to cover the regression.
Changes:
- Update deserialization scanning logic to iterate with `char_indices()` so returned positions are valid byte offsets for string slicing.
- Add new smoke integration tests covering Unicode in keys, fields, values, and arrays.
- Extend `complex_test` fixtures (`.toon` and `.json`) with an additional Unicode edge-case value (®).
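The difference between the two iterators can be illustrated with a minimal sketch. The function names below (`find_delim_char_index`, `find_delim_byte_offset`, `split_key_value`) are hypothetical and are not taken from `src/deserialization.rs`; they only show why a position from `.chars().enumerate()` is unsafe for slicing once a multi-byte character such as `®` precedes it.

```rust
/// Buggy pattern (hypothetical helper): returns the *character* index of the
/// first `delim`. This is only a valid slice position for pure-ASCII input.
fn find_delim_char_index(s: &str, delim: char) -> Option<usize> {
    s.chars().enumerate().find(|&(_, c)| c == delim).map(|(i, _)| i)
}

/// Fixed pattern (hypothetical helper): returns the *byte* offset of the
/// first `delim`, which is always a valid char boundary for slicing.
fn find_delim_byte_offset(s: &str, delim: char) -> Option<usize> {
    s.char_indices().find(|&(_, c)| c == delim).map(|(i, _)| i)
}

/// Splits a `key: value` line at the first ':' using byte offsets.
/// Assumes a simplified line format for illustration only.
fn split_key_value(line: &str) -> Option<(&str, &str)> {
    let pos = find_delim_byte_offset(line, ':')?;
    // ':' is one byte in UTF-8, so `pos + 1` is also a char boundary.
    Some((&line[..pos], line[pos + 1..].trim_start()))
}
```

For the input `"name®: Acme"`, the character index of `:` is 5 while its byte offset is 6, because `®` occupies two bytes in UTF-8. Slicing with `&line[..5]` would panic, since byte 5 falls inside `®`; the byte-offset variant yields the key `name®` and the value `Acme`.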
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/deserialization.rs` | Switches several parsers/scanners to byte-safe indices for Unicode correctness. |
| `tests/integration/test_smoke.py` | Adds direct regression tests for Unicode round-tripping through dumps/loads. |
| `tests/data/complex_test.toon` | Adds ® to the Unicode edge-cases in TOON fixture. |
| `tests/data/complex_test.json` | Keeps JSON fixture in sync with the updated TOON fixture. |
Owner
Excellent work, thanks a lot for this PR
Description
Deserialization of special unicode characters (like ®) failed.
Changed the complex_test data to show the error and the fix.
Type of Change
Related Issues
Fixes #8
Testing
Checklist