Fix Incomplete CSI Final-Byte And Intermediate-Byte Matching #66
MRayermannMSFT wants to merge 2 commits into chalk:main
|
Hi there, this looks interesting. Do you have any sources for these? I'd be curious to read more about it.
|
Sure! Here's what I'm basing my work off of:

ECMA-48 §5.4 - CSI sequence structure
The spec (ECMA-48) defines the CSI control sequence format in §5.4 as:
With the byte ranges spelled out explicitly:
Wikipedia's ANSI escape code - CSI article also summarizes these ranges with the same ECMA-48 §5.4 citation.

Problems with the current regex
The current final-byte character class is:
1. Missing final bytes. The regex omits 26 valid final bytes from the
2. Digits treated as final bytes. The class includes

Real-world impact/what motivated my PR - Windows ConPTY ECH
Windows ConPTY emits ECH sequences (
After stripping with the current regex, the
Other sequences from the same page that are similarly affected include ICH (

I'm far from an escape code expert, so there's a chance I'm understanding the spec wrong, so happy to make changes to this PR as needed!
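The "missing final bytes" point can be checked mechanically. The sketch below is only an illustration, not code from the PR: it takes the final-byte class quoted in the PR description, walks the ECMA-48 final-byte range 0x40–0x7E, and collects every byte the class rejects. The names `currentFinalByte` and `missing` are made up for this example.

```javascript
// Final-byte class as quoted in the PR description, anchored so that it
// tests exactly one character at a time.
const currentFinalByte = /^[\dA-PR-TZcf-nq-uy=><~]$/;

// ECMA-48 final bytes span 0x40 ('@') through 0x7E ('~').
const missing = [];
for (let code = 0x40; code <= 0x7e; code++) {
  const ch = String.fromCharCode(code);
  if (!currentFinalByte.test(ch)) {
    missing.push(ch);
  }
}

console.log(missing.length);    // 26
console.log(missing.join(' ')); // includes X (ECH) and @ (ICH)
```

The count agrees with the 26 omitted final bytes mentioned above.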
|
A few things:
|
What
Replace the hand-enumerated CSI final-byte character class with the full ECMA-48 range (0x40–0x7E), add support for intermediate bytes (0x20–0x2F), and extract ESC<, ESC=, ESC> into a dedicated pattern so they are no longer conflated with CSI sequences.

Why
The CSI final-byte class [\dA-PR-TZcf-nq-uy=><~] has gaps: it omits valid final bytes such as X (ECH), @ (ICH), b (REP), d (VPA), e (VPR), and others. It also incorrectly treats digits as valid final bytes, so a sequence like ESC[31X is only partially matched as ESC[31 (with 1 consumed as the final byte), leaving a stray X in the stripped output. This breaks any downstream code that relies on clean ANSI stripping; for example, Windows ConPTY emits ECH sequences that were being corrupted after stripping.

Using the spec-defined range [@-~] (equivalently \x40–\x7E) eliminates these gaps.
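To make the before/after behavior concrete, here is a minimal sketch under stated assumptions. `oldCsi` is a simplified stand-in built around the old final-byte class, not the library's exact pattern, and `specCsi` and `escSingle` are hypothetical patterns written for this example rather than the ones in the PR. The parameter-byte range 0x30–0x3F comes from ECMA-48 and is not quoted in this description.

```javascript
const ESC = '\u001B';

// Simplified stand-in for the old matching: optional numeric parameters, then
// the hand-enumerated final-byte class (which also accepts digits).
const oldCsi = /\u001B\[(?:\d{1,4}(?:[;:]\d{0,4})*)?[\dA-PR-TZcf-nq-uy=><~]/g;

// Spec-shaped CSI: parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F,
// then exactly one final byte in 0x40-0x7E.
const specCsi = /\u001B\[[0-?]*[ -\/]*[@-~]/g;

// Hypothetical dedicated pattern for ESC<, ESC=, ESC>.
const escSingle = /\u001B[<=>]/g;

const stripOld = (text) => text.replace(oldCsi, '');
const stripSpec = (text) => text.replace(specCsi, '').replace(escSingle, '');

const ech = `abc${ESC}[31Xdef`;
console.log(JSON.stringify(stripOld(ech)));  // "abcXdef" (stray X left behind)
console.log(JSON.stringify(stripSpec(ech))); // "abcdef"

// An intermediate byte (space) before the final byte, followed by ESC=.
console.log(JSON.stringify(stripSpec(`${ESC}[2 q${ESC}=ok`))); // "ok"
```

In the old-style pattern the digit group backtracks from 31 to 3 so that 1 can satisfy the final-byte class, which is why the X survives. The spec-shaped pattern has no such overlap, because digits are only ever parameter bytes.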