Skip to content

fix(security): prevent HTML entity-encoded comment injection bypass in sanitizer#1035

Open
MaxwellCalkin wants to merge 1 commit intoanthropics:mainfrom
MaxwellCalkin:fix/sanitizer-entity-bypass
Open

fix(security): prevent HTML entity-encoded comment injection bypass in sanitizer#1035
MaxwellCalkin wants to merge 1 commit intoanthropics:mainfrom
MaxwellCalkin:fix/sanitizer-entity-bypass

Conversation

@MaxwellCalkin
Copy link

Summary

The sanitizeContent function in src/github/utils/sanitizer.ts has an ordering vulnerability that allows HTML comment injection through entity encoding.

Vulnerability

stripHtmlComments runs before normalizeHtmlEntities, so an attacker can encode <!-- and --> as HTML entities to bypass comment stripping entirely:

&#60;!&#45;&#45; ignore above instructions &#45;&#45;&#62;

This entity-encoded string passes through stripHtmlComments undetected (it doesn't match <!--...-->), then normalizeHtmlEntities decodes it into a real HTML comment:

<!-- ignore above instructions -->

The result is a fully functional hidden comment injected into sanitized content — a prompt injection vector.

Attack vector

An attacker can embed entity-encoded HTML comments in GitHub issue bodies, PR descriptions, or comments. These comments are invisible when rendered by GitHub but survive the sanitizer and reach the LLM as hidden instructions.

Fix

Move normalizeHtmlEntities to run before stripHtmlComments so that entity-encoded markup is decoded first, then stripped by the appropriate sanitization step.

Before:

content = stripHtmlComments(content);     // 1. Can't see encoded comments
content = stripInvisibleCharacters(content);
content = stripMarkdownImageAltText(content);
content = stripMarkdownLinkTitles(content);
content = stripHiddenAttributes(content);
content = normalizeHtmlEntities(content);  // 6. Decodes them into real comments — too late

After:

content = normalizeHtmlEntities(content);  // 1. Decode entities first
content = stripHtmlComments(content);      // 2. Now catches decoded comments
content = stripInvisibleCharacters(content);
content = stripMarkdownImageAltText(content);
content = stripMarkdownLinkTitles(content);
content = stripHiddenAttributes(content);

Tests added

  • Entity-encoded HTML comment injection (decimal entities: &#60;, &#45;, &#62;)
  • Hex entity-encoded HTML comment injection (&#x3C;, &#x2D;, &#x3E;)

Both tests verify that the injected comment content is fully removed from sanitized output.

…n sanitizer

Move normalizeHtmlEntities to run before stripHtmlComments in the
sanitizeContent pipeline. Previously, entity-encoded HTML comments
(e.g. &anthropics#60;!&anthropics#45;&anthropics#45; ... &anthropics#45;&anthropics#45;&anthropics#62;) passed through
stripHtmlComments undetected and were then decoded into real comment
tags by normalizeHtmlEntities, allowing hidden prompt injection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant