fix(deserialization): Fix deserialization of special unicode characters by medvekoma · Pull Request #7 · alesanfra/toons

medvekoma · 2026-02-13T14:52:21Z

Description

Deserialization of special Unicode characters (such as ®) failed.
This PR updates the complex_test data to demonstrate the error and the fix.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

Related Issues

Fixes #8

Testing

Tests pass locally
New tests added for new functionality
Documentation updated

Checklist

Copilot

Pull request overview

This PR fixes a Unicode-related deserialization bug where parsing logic used character indices (from .chars().enumerate()) as if they were byte offsets, causing incorrect slicing/positioning when keys/values contain multi-byte characters (e.g., ®). It also extends the integration test corpus to cover the regression.

Changes:

Update deserialization scanning logic to iterate with char_indices() so returned positions are valid byte offsets for string slicing.
Add new smoke integration tests covering Unicode in keys, fields, values, and arrays.
Extend complex_test fixtures (.toon and .json) with an additional Unicode edge-case value (®).
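
The difference between the two iterators can be illustrated with a minimal sketch. This is not the code from `src/deserialization.rs`; the helper names `find_delim_char_index` and `find_delim_byte_offset` and the sample input are hypothetical and chosen only to show how a character index and a byte offset diverge once a multi-byte character such as `®` appears before the delimiter.

```rust
/// Hypothetical buggy variant: returns the *character* index of the first `delim`.
fn find_delim_char_index(s: &str, delim: char) -> Option<usize> {
    s.chars().enumerate().find(|&(_, c)| c == delim).map(|(i, _)| i)
}

/// Hypothetical fixed variant: returns the *byte* offset of the first `delim`,
/// which is what string slicing expects.
fn find_delim_byte_offset(s: &str, delim: char) -> Option<usize> {
    s.char_indices().find(|&(_, c)| c == delim).map(|(i, _)| i)
}

fn main() {
    // '®' (U+00AE) takes two bytes in UTF-8, so positions after it differ.
    let line = "na®me: value";
    let char_idx = find_delim_char_index(line, ':').unwrap();
    let byte_off = find_delim_byte_offset(line, ':').unwrap();
    assert_eq!(char_idx, 5);
    assert_eq!(byte_off, 6);

    // Slicing with the character index silently truncates the key.
    assert_eq!(&line[..char_idx], "na®m");
    // Slicing with the byte offset yields the full key.
    assert_eq!(&line[..byte_off], "na®me");

    // When the index lands inside a multi-byte character, `&s[..i]` would panic;
    // the checked `get` returns `None` instead.
    let short = "®:";
    let bad = find_delim_char_index(short, ':').unwrap();
    assert_eq!(short.get(..bad), None);

    println!("char index = {char_idx}, byte offset = {byte_off}");
}
```

Depending on where the index falls, the character-index variant either produces a wrong slice or hits a non-boundary byte, which is consistent with the incorrect slicing described in the overview.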

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `src/deserialization.rs` | Switches several parsers/scanners to byte-safe indices for Unicode correctness. |
| `tests/integration/test_smoke.py` | Adds direct regression tests for Unicode round-tripping through `dumps`/`loads`. |
| `tests/data/complex_test.toon` | Adds `®` to the Unicode edge cases in the TOON fixture. |
| `tests/data/complex_test.json` | Keeps the JSON fixture in sync with the updated TOON fixture. |

src/deserialization.rs

alesanfra · 2026-02-13T21:11:45Z

Excellent work, thanks a lot for this PR

medvekoma added 2 commits February 13, 2026 14:58

Fix issue

0be75b4

Fix issue in deserialization

368407b

medvekoma marked this pull request as draft February 13, 2026 15:32

Adding further fixes and unit tests

31b399a

medvekoma marked this pull request as ready for review February 13, 2026 16:32

alesanfra requested a review from Copilot February 13, 2026 20:39

Copilot started reviewing on behalf of alesanfra February 13, 2026 20:39

Copilot AI reviewed Feb 13, 2026

alesanfra merged commit d3d2026 into alesanfra:main Feb 13, 2026
22 checks passed

alesanfra merged 3 commits into alesanfra:main from medvekoma:fix/unicode-deserialization