Fix Incomplete CSI Final-Byte And Intermediate-Byte Matching by MRayermannMSFT · Pull Request #66 · chalk/ansi-regex

MRayermannMSFT · 2026-02-15T23:52:52Z

What

Replace the hand-enumerated CSI final-byte character class with the full ECMA-48 range (0x40–0x7E), add support for intermediate bytes (0x20–0x2F), and extract ESC<, ESC=, ESC> into a dedicated pattern so they are no longer conflated with CSI sequences.

Why

The CSI final-byte class [\dA-PR-TZcf-nq-uy=><~] has gaps: it omits valid final bytes like X (ECH), @ (ICH), b (REP), d (VPA), e (VPR), and others. It also incorrectly treats digits as valid final bytes, causing sequences like ESC[31X to be partially matched as ESC[31 (with 1 consumed as the final byte), leaving a stray X in the stripped output. This breaks downstream code that relies on clean ANSI stripping. For example, Windows ConPTY emits ECH sequences, which were only partially removed and left stray characters in the stripped output.

Using the spec-defined range [@-~] (equivalently \x40–\x7E) eliminates these gaps.

Qix- · 2026-02-17T16:36:38Z

Hi there, this looks interesting. Do you have any sources for these? I'd be curious to read more about it.

MRayermannMSFT · 2026-02-18T00:28:54Z

Sure! Here's what I'm basing my work off of:

ECMA-48 §5.4 - CSI sequence structure

The spec (ECMA-48) defines the CSI control sequence format in §5.4 as:

CSI P...P I...I F

With the byte ranges spelled out explicitly:

P ... P are Parameter Bytes, which, if present, consist of bit combinations from 03/00 to 03/15 [i.e. 0x30–0x3F];

I ... I are Intermediate Bytes, which, if present, consist of bit combinations from 02/00 to 02/15 [i.e. 0x20–0x2F]. Together with the Final Byte F, they identify the control function;

F is the Final Byte; it consists of a bit combination from 04/00 to 07/14 [i.e. 0x40–0x7E]; it terminates the control sequence.

Wikipedia's ANSI escape code — CSI article also summarizes these ranges with the same ECMA-48 §5.4 citation.
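
The three byte classes quoted above translate fairly directly into a regular expression. Here is a minimal sketch, assuming only the 7-bit introducer (ESC [) and ignoring the 8-bit CSI byte. `csiPattern` and `samples` are hypothetical names for illustration; this shows the spec's grammar and is not the pattern from this PR.

```javascript
// Sketch of the ECMA-48 §5.4 CSI grammar: CSI P...P I...I F.
// `csiPattern` is a hypothetical name, not an identifier from ansi-regex.
const csiPattern = new RegExp(
	'\\x1B\\[' +        // CSI introducer in its 7-bit form (ESC [)
	'[\\x30-\\x3F]*' +  // parameter bytes, 03/00 to 03/15
	'[\\x20-\\x2F]*' +  // intermediate bytes, 02/00 to 02/15
	'[\\x40-\\x7E]'     // final byte, 04/00 to 07/14
);

const samples = {
	ech: '\x1B[5X',       // ECH: parameter "5", final byte "X"
	ich: '\x1B[2@',       // ICH: parameter "2", final byte "@"
	decscusr: '\x1B[2 q', // DECSCUSR: parameter "2", intermediate SP, final byte "q"
};

for (const [name, sequence] of Object.entries(samples)) {
	const match = csiPattern.exec(sequence);
	// Reports whether the whole sequence was matched, not just a prefix of it.
	console.log(name, match !== null && match[0] === sequence);
}
```

Because the final byte is mandatory and parameter bytes cannot stand in for it, a sequence either matches in full or not at all.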

Problems with the current regex

The current final-byte character class is:

[\dA-PR-TZcf-nq-uy=><~]

1. Missing final bytes. The regex omits 26 valid final bytes from the 0x40–0x7E range, including commonly-used ones like X (0x58, ECH – Erase Character), @ (0x40, ICH – Insert Character), b (0x62, REP – Repeat), d (0x64, VPA – Vertical Position Absolute), and e (0x65, VPR – Vertical Position Relative).

2. Digits treated as final bytes. The class includes \d (0x30–0x39), but digits are parameter bytes per ECMA-48 §5.4 item 2 - not final bytes. This causes partial matching: ESC[31X is consumed as ESC[31 (treating 1 as the final byte), leaving a stray X in the output.
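
Both points can be checked mechanically by testing the quoted class one character at a time. This is a small sketch; `oldFinalByte`, `missing`, and `extra` are names chosen for this example only.

```javascript
// The final-byte class quoted above, anchored so it tests exactly one character.
const oldFinalByte = /^[\dA-PR-TZcf-nq-uy=><~]$/;

// Every final byte ECMA-48 allows: 0x40 ("@") through 0x7E ("~").
const specFinalBytes = [];
for (let code = 0x40; code <= 0x7E; code++) {
	specFinalBytes.push(String.fromCharCode(code));
}

// Spec-valid final bytes that the old class rejects.
const missing = specFinalBytes.filter(character => !oldFinalByte.test(character));

// Characters the old class accepts although they lie below 0x40,
// that is, in the intermediate-byte or parameter-byte ranges.
const extra = [];
for (let code = 0x20; code <= 0x3F; code++) {
	const character = String.fromCharCode(code);
	if (oldFinalByte.test(character)) {
		extra.push(character);
	}
}

console.log(missing.length);   // 26
console.log(missing.join(''));
console.log(extra.join(''));   // 0123456789<=>
```

The `extra` list shows that, besides the digits, the class also admits `<`, `=`, and `>`, all of which fall in the parameter-byte range 0x30 to 0x3F.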

Real-world impact/what motivated my PR - Windows ConPTY ECH

Windows ConPTY emits ECH sequences (ESC[nX) to erase characters on a line. Microsoft documents this under Console Virtual Terminal Sequences → Text Modification:

| Sequence | Code | Description | Behavior |
| --- | --- | --- | --- |
| ESC [ <n> X | ECH | Erase Character | Erase <n> characters from the current cursor position by overwriting them with a space character. |

After stripping with the current regex, the X leaks through:

Input:  "hello\x1b[5Xworld"
Expect: "helloworld"
Actual: "helloXworld"  ← \x1b[5 matched (digit '5' consumed as final byte), X left behind
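
The leak can be reproduced without the library. This is a minimal sketch that assumes a simplified stand-in for the CSI branch (optional numeric parameters followed by the final-byte class quoted earlier). `oldCsi` and `specCsi` are hypothetical names; neither is the exact pattern shipped in ansi-regex or proposed in this PR.

```javascript
// Assumed simplification of the old CSI branch: optional digits, then the old final-byte class.
const oldCsi = /\x1B\[(?:\d{1,4}(?:;\d{0,4})*)?[\dA-PR-TZcf-nq-uy=><~]/g;

// Spec-shaped alternative: parameter bytes, intermediate bytes, one final byte.
const specCsi = /\x1B\[[0-?]*[ -\/]*[@-~]/g;

const input = 'hello\x1B[5Xworld';

// "X" is not in the old class, so the regex backtracks and accepts "5" as the final byte.
console.log(JSON.stringify(input.replace(oldCsi, '')));  // "helloXworld"

// With digits restricted to parameter bytes, "X" is the only possible final byte.
console.log(JSON.stringify(input.replace(specCsi, ''))); // "helloworld"
```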

Other sequences from the same page that are similarly affected include ICH (ESC[n@) and sequences with intermediate bytes such as DECSCUSR (ESC[n SP q, documented under Cursor Shape).

I'm far from an escape code expert, so there's a chance I'm misreading the spec. Happy to make changes to this PR as needed!

sindresorhus · 2026-02-18T13:43:20Z

A few things:

The new pattern strips ESC[>0h and ESC[<0c as only ESC[, not the full sequence. That means valid private-parameter CSI gets partially consumed and leaves control text behind.
The pattern now matches bare ESC[ as a complete escape sequence. So incomplete or truncated CSI gets treated as valid and removed.
There is no test that asserts full matching for private-parameter CSI (ESC[>..., ESC[<...) or prevents partial ESC[ matches.
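
These review points can be phrased as executable checks. The sketch below assumes a hypothetical spec-shaped pattern, `candidateCsi`, which is not the pattern from this PR, and uses plain boolean checks rather than the repository's test framework.

```javascript
// Hypothetical candidate: parameter bytes, intermediate bytes, then one mandatory final byte.
const candidateCsi = /\x1B\[[0-?]*[ -\/]*[@-~]/g;

const strip = text => text.replace(candidateCsi, '');

const results = {
	// Private-parameter CSI must be consumed whole, leaving no "0h" or "0c" behind.
	privateGreaterThan: strip('a\x1B[>0hb') === 'ab',
	privateLessThan: strip('a\x1B[<0cb') === 'ab',
	// A truncated CSI with no final byte must be left untouched.
	bareCsi: strip('a\x1B[') === 'a\x1B[',
};

console.log(results);
```

The last check only covers ESC[ at the end of the input. Under the spec grammar, ESC[ followed by ordinary text such as a letter is indistinguishable from a parameterless control sequence, so a regex alone cannot reject that case.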

MRayermannMSFT mentioned this pull request Feb 15, 2026

PowerShell Tool Hangs Indefinitely on Windows 11 Pro github/copilot-cli#1434

Closed

MRayermannMSFT changed the title ~~Fix Incomplete Csi Final-Byte And Intermediate-Byte Matching~~ Fix Incomplete CSI Final-Byte And Intermediate-Byte Matching Feb 15, 2026

MRayermannMSFT force-pushed the bug-incomplete-csi-final-byte-class branch from 673bfeb to 63145fe Compare February 15, 2026 23:55

MRayermannMSFT added 2 commits February 18, 2026 20:44

Fix incomplete CSI final-byte and intermediate-byte matching

cb24c34

Fix CI checks

d52f332

sindresorhus force-pushed the bug-incomplete-csi-final-byte-class branch from 63145fe to d52f332 Compare February 18, 2026 13:44

MRayermannMSFT wants to merge 2 commits into chalk:main from MRayermannMSFT:bug-incomplete-csi-final-byte-class
