
Add tokenization benchmarks comparing with YAML, XML, JSON #209

@konard

Description


We need to see how many tokens are used in GPT contexts.

See https://github.com/toon-format/toon for a benchmark reference. Do not include TOON itself in the benchmark, but you can take similar examples to benchmark from there and adapt them to Links Notation style.

We also need the benchmarks to measure UTF-8 character count.
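A minimal sketch of the character-count comparison, in Rust since that is the version that would run in CI. The sample payloads and the Links Notation shape are assumptions for illustration, not actual benchmark data; a real benchmark would serialize the same dataset through proper libraries rather than hardcoded strings.

```rust
// Hypothetical sketch: compare UTF-8 character counts (and byte lengths)
// of the same record expressed in each format. All sample strings below
// are assumptions, including the Links Notation form.

fn report(label: &str, text: &str) {
    // chars().count() gives Unicode scalar values; len() gives UTF-8 bytes.
    println!("{label}: {} chars, {} bytes", text.chars().count(), text.len());
}

fn main() {
    let json = r#"{"id":1,"name":"Ada","tags":["a","b"]}"#;
    let yaml = "id: 1\nname: Ada\ntags:\n  - a\n  - b\n";
    let xml = "<item><id>1</id><name>Ada</name><tags><tag>a</tag><tag>b</tag></tags></item>";
    let lino = "(item (id 1) (name Ada) (tags a b))"; // assumed Links Notation shape

    report("JSON", json);
    report("YAML", yaml);
    report("XML", xml);
    report("LiNo", lino);
}
```

Token counts would be layered on top of this with whatever tokenizer we pick for the GPT-context comparison.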

Make sure we have the benchmark implemented in all our supported languages, but we should use the Rust version in CI/CD to automatically generate the benchmark markdown pages on push to the main branch. These should be separate workflows, and we should update the markdown documents only if something changed and the benchmark didn't fail, using GITHUB_TOKEN (or whatever the default GitHub Actions token is called).
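Something like the following workflow shape could work. This is only a sketch: the workflow name, binary name, and output file path are assumptions. The `permissions: contents: write` line is what lets the default `GITHUB_TOKEN` push the regenerated pages, and the `git diff` guard skips the commit when nothing changed.

```yaml
# Hypothetical sketch; names and paths are assumptions.
name: benchmarks
on:
  push:
    branches: [main]
permissions:
  contents: write  # allows the default GITHUB_TOKEN to push
jobs:
  bench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo run --release --bin benchmark > BENCHMARKS.md
      - run: |
          if ! git diff --quiet BENCHMARKS.md; then
            git config user.name "github-actions[bot]"
            git config user.email "github-actions[bot]@users.noreply.github.com"
            git commit -am "Update benchmarks"
            git push
          fi
```

Because the commit step runs only after `cargo run` succeeds, a failed benchmark never updates the markdown, which matches the requirement above.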

Metadata

Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)
