Enhancement: Properly handle encoding and decoding of raw non-UTF-8 binary data #31

@lcn2

Is there an existing issue for this?

  • I have searched for existing issues and did not find anything like this

Describe the enhancement

Not all sequences of bytes are valid UTF-8.

The JSON encoder and JSON decoder should handle raw non-UTF-8 binary data in some way.
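
Whether a given run of bytes is well-formed UTF-8 can be checked mechanically. The following is a minimal sketch of such a check, following the rules in RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF). The names `utf8_valid_prefix` and `utf8_is_valid` are hypothetical and are not part of jparse; this only illustrates the kind of test an encoder or decoder might apply before deciding how to treat its input.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * utf8_valid_prefix - hypothetical helper, NOT part of jparse
 *
 * Return the number of leading bytes of buf that form well-formed UTF-8.
 * A return value equal to len means the whole buffer is valid; a smaller
 * value is the offset of the first offending sequence.
 */
size_t
utf8_valid_prefix(const uint8_t *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        uint8_t b = buf[i];
        size_t need;    /* continuation bytes expected after the lead byte */
        uint32_t cp;    /* code point being decoded */
        uint32_t min;   /* smallest code point allowed for this length */

        if (b < 0x80) {
            i++;        /* plain ASCII */
            continue;
        } else if ((b & 0xE0) == 0xC0) {
            need = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            need = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            need = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return i;   /* stray continuation byte or impossible lead byte */
        }

        if (len - i - 1 < need) {
            return i;   /* sequence is cut off by the end of the buffer */
        }
        for (size_t k = 1; k <= need; k++) {
            uint8_t c = buf[i + k];

            if ((c & 0xC0) != 0x80) {
                return i;   /* not a continuation byte */
            }
            cp = (cp << 6) | (uint32_t)(c & 0x3F);
        }

        /* reject overlong encodings, surrogates and out-of-range values */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return i;
        }
        i += need + 1;
    }
    return i;
}

/* true if every byte of buf belongs to a well-formed UTF-8 sequence */
bool
utf8_is_valid(const uint8_t *buf, size_t len)
{
    return utf8_valid_prefix(buf, len) == len;
}
```

With this sketch, the ASCII string "hello" and the bytes E2 82 AC are accepted, while a lone FF byte, the overlong pair C0 80, and a sequence truncated at the end of the buffer are rejected. Returning the offset rather than a plain yes/no lets a caller decide what to do at the first bad byte (warn, stop, or encode it somehow), which is the open question in this issue.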

Relevant images, screenshots or other files

From issue 1056 in the mkiocccentry repo (GH-issuecomment-2566736763), as reported by @SirWumpus:

elf$ jparse/jstrencode -v 0 -n < jparse/jstrencode | wc
wc: <stdin>: invalid byte sequence
wc: <stdin>: invalid byte sequence
wc: <stdin>: invalid byte sequence
wc: <stdin>: invalid byte sequence
wc: <stdin>: invalid byte sequence
wc: <stdin>: invalid byte sequence
wc: <stdin>: invalid byte sequence
wc: <stdin>: invalid byte sequence
wc: <stdin>: invalid byte sequence
wc: <stdin>: invalid byte sequence
wc: <stdin>: invalid byte sequence
wc: <stdin>: invalid byte sequence
wc: <stdin>: invalid byte sequence
wc: <stdin>: invalid byte sequence
...

Relevant links

Anything else?

If raw non-UTF-8 binary data is not properly handled in some way, commands that object to invalid UTF-8 can issue warnings, as the wc output above shows.

Or worse, programs might hang as a result of invalid UTF-8.

Some mechanism is needed to protect code from invalid UTF-8: some form of warning/error handling and/or some form of encoding.
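
As one illustration of what "some form of encoding" could look like, raw bytes can be carried as hexadecimal text, which is pure ASCII and therefore always valid UTF-8. This is only a sketch of one option among several (base64 or escape sequences would be others); the issue does not prescribe it, and `hex_encode` is a hypothetical helper that is not part of jparse.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * hex_encode - hypothetical helper, NOT part of jparse
 *
 * Return a malloc()ed, NUL-terminated lowercase hex string representing
 * the len bytes at buf, or NULL on allocation failure or size overflow.
 * The caller is responsible for free()ing the result.
 */
char *
hex_encode(const uint8_t *buf, size_t len)
{
    static const char digits[] = "0123456789abcdef";
    char *out;

    /* guard against overflow in len * 2 + 1 */
    if (len > (SIZE_MAX - 1) / 2) {
        return NULL;
    }
    out = malloc(len * 2 + 1);
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; i++) {
        out[2 * i] = digits[buf[i] >> 4];         /* high nibble */
        out[2 * i + 1] = digits[buf[i] & 0x0F];   /* low nibble */
    }
    out[len * 2] = '\0';
    return out;
}
```

The trade-off is that the output is twice the size of the input and is no longer human-readable as text, so a scheme like this would most likely apply only to data already known to be invalid UTF-8.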

Labels

enhancement: New feature or request
