Add TextDecoder support for x-user-defined encoding (fixes #6039) #6040
Implements the x-user-defined decoder per the WHATWG Encoding Standard.
- Map bytes 0x00–0x7F to identical ASCII code points
- Map bytes 0x80–0xFF to Unicode PUA U+F780–U+F7FF
- Add dedicated XUserDefinedDecoder with ASCII fast path (no ICU)
- Register "x-user-defined" label
- Wire through TextDecoder constructor, getImpl(), and decodePtr()
- Add unit tests for decoding, streaming, and fatal mode
Fixes cloudflare#6039
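The byte mapping listed in this commit message can be sketched in a few lines of standard C++. This is a minimal illustration only, not the code from the PR: the names `decodeXUserDefinedByte` and `decodeXUserDefined` are hypothetical, and the real `XUserDefinedDecoder` is wired into workerd's own types.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not from the PR): maps one x-user-defined byte to a
// UTF-16 code unit. ASCII bytes pass through unchanged; bytes 0x80..0xFF
// land in the Private Use Area at U+F780..U+F7FF, i.e. 0xF700 + byte.
constexpr char16_t decodeXUserDefinedByte(std::uint8_t byte) {
  return byte < 0x80 ? static_cast<char16_t>(byte)
                     : static_cast<char16_t>(0xF700 + byte);
}

// Decodes a whole buffer. Every byte value has a mapping, so this sketch
// has no error path.
std::u16string decodeXUserDefined(const std::vector<std::uint8_t>& bytes) {
  std::u16string out;
  out.reserve(bytes.size());
  for (std::uint8_t b : bytes) {
    out.push_back(decodeXUserDefinedByte(b));
  }
  return out;
}
```

Because each byte decodes independently of its neighbours, decoding a buffer in chunks gives the same result as decoding it in one call, which is why streaming needs no carried-over state in this sketch.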
All contributors have signed the CLA ✍️ ✅
I have read the CLA Document and I hereby sign the CLA
The linter and some tests seem to be failing. Can you look into it?
@JosephDoUrden ... to run linting, if you have
I think only the lint failures are real; the test failures appear to have been a CI glitch. @JosephDoUrden ... the "run internal build" check is one we'll have to run ourselves, just FYI. Thank you for the contribution!
…flare#6039) Replace manual byte loop with simdutf::validate_ascii() when detecting high bytes in XUserDefinedDecoder::decode. Fix JSG_REQUIRE line break in TextDecoder::constructor to satisfy clang-format.
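The fast path this commit describes (check whether the input is pure ASCII, and only fall back to per-byte mapping when it is not) might look roughly like the sketch below. It is an assumption-laden illustration: `isAllAscii` is a portable stand-in written here in place of `simdutf::validate_ascii()`, and `decodeWithFastPath` is a hypothetical name, not a function from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Portable stand-in for an ASCII validation call such as
// simdutf::validate_ascii(). ORs all bytes together so the loop has no
// early exit, then tests the high bit once at the end.
bool isAllAscii(const std::uint8_t* data, std::size_t len) {
  std::uint8_t acc = 0;
  for (std::size_t i = 0; i < len; ++i) {
    acc |= data[i];
  }
  return (acc & 0x80) == 0;
}

// Hypothetical decoder entry point (not from the PR).
std::u16string decodeWithFastPath(const std::uint8_t* data, std::size_t len) {
  if (isAllAscii(data, len)) {
    // Fast path: every byte is ASCII, so widen each one unchanged.
    return std::u16string(data, data + len);
  }
  // Slow path: at least one byte is >= 0x80 and maps to 0xF700 + byte.
  std::u16string out;
  out.reserve(len);
  for (std::size_t i = 0; i < len; ++i) {
    const std::uint8_t b = data[i];
    out.push_back(b < 0x80 ? static_cast<char16_t>(b)
                           : static_cast<char16_t>(0xF700 + b));
  }
  return out;
}
```

A vectorized validator does the same job as `isAllAscii` but processes many bytes per instruction, which is the usual reason to prefer a library call over a hand-written loop.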
Summary
Adds support for the x-user-defined encoding to `TextDecoder`, as required by the WHATWG Encoding Standard and requested in #6039.
Behavior
- Bytes 0x00–0x7F map to the identical ASCII code points.
- Bytes 0x80–0xFF map to the Unicode Private Use Area, U+F780–U+F7FF (0xF700 + byte).
This gives a simple, reversible single-byte mapping useful for legacy binary-over-string use cases (e.g. when you need an isomorphic byte↔code point mapping; `latin1` is not suitable because it is mapped to windows-1252 and is not isomorphic).
Implementation
- `XUserDefinedDecoder` in `encoding.h` / `encoding.c++`, with an ASCII-only fast path and a slow path for bytes ≥ 0x80.
- `"x-user-defined"` is registered in the encoding label table and handled in the `TextDecoder` constructor (no ICU).
- `x-user-defined` in `allTheDecoders`, plus dedicated tests in `encoding-test.js` for decoding, streaming, and fatal mode.
Tests
- `api/tests/encoding-test.js`: `xUserDefinedDecode`, `xUserDefinedFatal`, and `x-user-defined` in `allTheDecoders`.
Fixes #6039
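The reversibility claimed in the description can be illustrated by writing the inverse of the mapping. The sketch below is not part of the PR (which only adds a decoder); `xUserDefinedToByte` is a hypothetical helper that recovers the original byte from a decoded UTF-16 code unit and returns -1 for code units the decoder can never produce.

```cpp
#include <cstdint>

// Hypothetical inverse of the x-user-defined mapping (not from the PR).
// Returns the original byte value (0..255), or -1 when the code unit is
// outside both ranges the decoder can emit.
int xUserDefinedToByte(char16_t unit) {
  if (unit < 0x80) {
    return static_cast<int>(unit);  // ASCII maps to itself
  }
  if (unit >= 0xF780 && unit <= 0xF7FF) {
    return static_cast<int>(unit) - 0xF700;  // undo 0xF700 + byte
  }
  return -1;  // e.g. U+00E9 or U+20AC: never produced by the decoder
}
```

Since the two output ranges do not overlap, every one of the 256 byte values round-trips exactly, which is the property the binary-over-string use case depends on.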