Add support for `alphabet` in `{"type": "string"}` by Liam-DeVoe · Pull Request #72 · hegeldev/hegel-core

Liam-DeVoe · 2026-03-28T03:12:57Z

Closes #44.

Here's a problem I ran into: CBOR encodes text strings as UTF-8, and UTF-8 disallows surrogate code points. We want to be able to generate surrogate code points, because some languages use UTF-16 or raw bytes as their string representation. Therefore the protocol must be able to transport surrogate code points.
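To make the problem concrete, here is a minimal Python sketch of a strict UTF-8 encoder rejecting a lone surrogate. The helper name `strict_utf8_ok` is made up for illustration and is not part of hegel-core.

```python
def strict_utf8_ok(s: str) -> bool:
    """Return True if s can be encoded as well-formed UTF-8."""
    try:
        s.encode("utf-8")  # the default "strict" handler rejects surrogates
        return True
    except UnicodeEncodeError:
        return False


# A lone surrogate is a legal element of a Python str, but it has no
# well-formed UTF-8 encoding, so it cannot travel in a CBOR text string.
lone_surrogate = "\ud800"

print(strict_utf8_ok("hello"))
print(strict_utf8_ok(lone_surrogate))
```

The second call returns `False`, which is the case the protocol needs to handle.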

I chose to change the representation of all strings in the protocol to a new tag 6, which represents the string as WTF-8. WTF-8 is byte-for-byte equivalent to UTF-8, except that it relaxes the well-formedness requirement that surrogate code points not appear. Tags 6-15 are reserved for local assignment in the CBOR spec.
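A rough sketch of what WTF-8 encoding and decoding could look like in Python, using the standard `surrogatepass` error handler. The function names `wtf8_encode` and `wtf8_decode` are hypothetical and may not match what this PR actually implements. One detail worth calling out: WTF-8 does not allow a high/low surrogate pair to be written as two 3-byte sequences, so the sketch first collapses such pairs into one supplementary code point.

```python
def wtf8_encode(s: str) -> bytes:
    """Encode s as WTF-8 (sketch; not necessarily the PR's implementation)."""
    # Round-trip through UTF-16 so an adjacent high/low surrogate pair
    # becomes a single supplementary code point. Lone surrogates pass
    # through unchanged thanks to "surrogatepass".
    normalized = s.encode("utf-16-le", "surrogatepass").decode(
        "utf-16-le", "surrogatepass"
    )
    # "surrogatepass" emits the generalized 3-byte form for lone surrogates.
    return normalized.encode("utf-8", "surrogatepass")


def wtf8_decode(data: bytes) -> str:
    """Decode WTF-8 bytes back into a str that may contain lone surrogates."""
    return data.decode("utf-8", "surrogatepass")


encoded = wtf8_encode("a\ud800")
print(encoded.hex())
print(wtf8_decode(encoded) == "a\ud800")
```

For strings without surrogates the output is identical to plain UTF-8, which matches the "byte-for-byte equivalent" property described above.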

Every client library will need to understand this and implement a decoder for tag 6. I considered only encoding as tag 6 when a surrogate is present, but I think unifying the representation and forcing libraries to contend with this early is the right choice.
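For client library authors, here is a stdlib-only sketch of what handling tag 6 might involve. It assumes the tagged payload is a definite-length CBOR byte string holding the WTF-8 bytes; that framing is my assumption for illustration and should be checked against the actual protocol. The names `encode_tagged_string` and `decode_tagged_string` are hypothetical, and a real client would normally hook this into its CBOR library's tag handling rather than parse heads by hand.

```python
TAG_HEAD = 0xC0 | 6   # CBOR major type 6 (tag) with tag number 6
BYTES_MAJOR = 2 << 5  # CBOR major type 2 (byte string)


def encode_tagged_string(s: str) -> bytes:
    """Wrap the WTF-8 bytes of s in tag 6 (assumed framing, see above)."""
    # Simplification: surrogate pairs are not collapsed here.
    payload = s.encode("utf-8", "surrogatepass")
    n = len(payload)
    if n < 24:
        head = bytes([BYTES_MAJOR | n])
    else:
        # Additional info 24..27 selects a 1, 2, 4, or 8 byte length field.
        for info, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
            if n < 1 << (8 * width):
                head = bytes([BYTES_MAJOR | info]) + n.to_bytes(width, "big")
                break
    return bytes([TAG_HEAD]) + head + payload


def decode_tagged_string(data: bytes) -> str:
    """Decode a single tag 6 item back into a str."""
    if len(data) < 2 or data[0] != TAG_HEAD:
        raise ValueError("expected CBOR tag 6")
    if data[1] >> 5 != 2:
        raise ValueError("expected a byte string inside tag 6")
    info = data[1] & 0x1F
    if info < 24:
        n, offset = info, 2
    elif info <= 27:
        width = 1 << (info - 24)
        n = int.from_bytes(data[2 : 2 + width], "big")
        offset = 2 + width
    else:
        raise ValueError("indefinite-length byte strings are not handled")
    payload = data[offset : offset + n]
    if len(payload) != n:
        raise ValueError("truncated payload")
    return payload.decode("utf-8", "surrogatepass")


wire = encode_tagged_string("a\ud800")
print(wire.hex())
print(decode_tagged_string(wire) == "a\ud800")
```

Because every string goes through the same path, a client only needs this one decoder rather than a branch for "has surrogates" versus "plain text".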

Liam-DeVoe added 9 commits March 27, 2026 20:26

add support for alphabet in type: "string"

8a8a958

format

6eeecc3

fix typing

f88085f

fix decoding and tests

6d62a63

derive length metric from codepoints

cf4d18a

use uv in ci so we respect lockfile

3337026

update black

9d4f4c6

bump protocol version

a0c65b4

mention protocol bump in release notes

188001e

Liam-DeVoe mentioned this pull request Mar 28, 2026

Add text() options and characters() generator hegeldev/hegel-rust#147

Open

format

495d728

Liam-DeVoe wants to merge 10 commits into main from str-alphabet
