sanitize_content_core.cjs: incomplete invisible-char strip set and missing detection gate for MCP-driven safe-output writes #24154

@szabta89

Summary

hardenUnicodeText in sanitize_content_core.cjs (v0.65.0–v0.65.3) strips a subset of invisible Unicode characters but omits U+034F (Combining Grapheme Joiner), U+200E (Left-to-Right Mark), U+200F (Right-to-Left Mark), and U+00AD (Soft Hyphen). A prompt-injection payload can insert any of these four characters to obfuscate markers or patterns and pass through the sanitizer unchanged.

Separately, native-card-style workflows, in which write safe-outputs are triggered via MCP tool calls, compile to a safe_outputs job with needs: agent only. No detection job is wired, contrary to the reference documentation's claim that detection is "automatically enabled when safe outputs are configured."

Affected Area

Safe-outputs content sanitization layer (output trust boundary). Specifically:

  • sanitize_content_core.cjs: hardenUnicodeText strip regex (line 935)
  • Compiled safe_outputs job dependency in native-card workflows vs. pentest-full-style orchestrated workflows

Reproduction Outline

  1. On any gh-aw runner, load the deployed sanitizer and inject a CGJ-obfuscated string:
    const {hardenUnicodeText} = require(process.env.RUNNER_TEMP + '/gh-aw/safeoutputs/sanitize_content_core.cjs');
    const marker = 'SECRET-MARKER-test';
    const obf = marker.split('').join('\u034F'); // Combining Grapheme Joiner between each char
    const result = hardenUnicodeText(obf);
    console.log('CGJ survives:', result.includes('\u034F')); // → true
  2. Pass the obfuscated string as the body of a create_issue safe-outputs call from an individual native-card workflow (one that uses MCP tool calls rather than frontmatter-declared safe-outputs).
  3. Observe: safe_outputs executes without a detection gate; the issue body contains the obfuscated string verbatim.
  4. Confirm the detection gap by comparing the compiled safe_outputs job: native-card has needs: agent; pentest-full has needs: [agent, detection].
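The comparison in step 4 can be automated once the compiled workflow has been parsed. The following is a minimal sketch, assuming the safe_outputs job is already available as a plain object; hasDetectionGate is a hypothetical helper name, and the two job objects contain only the needs values quoted in step 4.

```javascript
// Hypothetical helper: reports whether a compiled job waits on a `detection` job.
// `job` is assumed to be the already-parsed YAML mapping for `safe_outputs`.
function hasDetectionGate(job) {
  // GitHub Actions allows `needs` to be a single string or a list of strings.
  const needs = job.needs === undefined ? [] : [].concat(job.needs);
  return needs.includes('detection');
}

// Shapes taken from step 4; every other key of the job is omitted.
const nativeCard = { needs: 'agent' };
const pentestFull = { needs: ['agent', 'detection'] };

console.log('native-card gated:', hasDetectionGate(nativeCard));   // → false
console.log('pentest-full gated:', hasDetectionGate(pentestFull)); // → true
```

A check like this only inspects the dependency edge. It does not confirm that the job is also conditioned on needs.detection.result == 'success'.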

Observed Behavior

  • U+034F, U+200E, U+200F, and U+00AD survive hardenUnicodeText unchanged (output length equals input length, characters still present).
  • Markdown table-cell and code-block fragmentation of marker strings also survives sanitizeContentCore intact.
  • Native-card compiled workflows have no detection job and no detection dependency on safe_outputs, leaving the detection gate absent for MCP-driven writes.
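To check the first observation for all four characters in one pass, a small probe can interleave each character into a marker and report which ones remain after sanitization. This is a sketch only: findSurvivors and SUSPECTS are hypothetical names, and standIn is a placeholder sanitizer used to make the example self-contained, not the actual strip regex from sanitize_content_core.cjs.

```javascript
// The four code points this report says survive hardenUnicodeText.
const SUSPECTS = {
  'U+034F': '\u034F', // Combining Grapheme Joiner
  'U+200E': '\u200E', // Left-to-Right Mark
  'U+200F': '\u200F', // Right-to-Left Mark
  'U+00AD': '\u00AD', // Soft Hyphen
};

// Returns the labels of the characters that are still present after `sanitize`
// runs on a marker with that character interleaved between every letter.
function findSurvivors(sanitize, marker = 'SECRET-MARKER-test') {
  return Object.entries(SUSPECTS)
    .filter(([, ch]) => sanitize(marker.split('').join(ch)).includes(ch))
    .map(([label]) => label);
}

// Stand-in sanitizer (NOT the real strip regex): removes only U+200B-U+200D and U+FEFF.
const standIn = (s) => s.replace(/[\u200B-\u200D\uFEFF]/g, '');

console.log(findSurvivors(standIn));
// → [ 'U+034F', 'U+200E', 'U+200F', 'U+00AD' ]
```

On a runner, the stand-in would be replaced by the hardenUnicodeText function loaded as in step 1 of the reproduction outline.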

Expected Behavior

  • hardenUnicodeText should strip (or normalize) all invisible/format Unicode characters, including U+034F, U+200E, U+200F, and U+00AD. Consider extending to the full Unicode Cf (format) category.
  • Any workflow where write safe-outputs are configured — whether via frontmatter declarations or MCP tool calls — should compile with a detection job and gate safe_outputs on needs.detection.result == 'success', consistent with the documented automatic-detection guarantee.
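One possible shape for the extended strip set in the first bullet is sketched below. It is an assumption, not the project's implementation: the existing regex at line 935 is not reproduced in this report, and stripInvisible and INVISIBLE_RE are hypothetical names. U+034F is listed explicitly because its general category is Mn (nonspacing mark) rather than Cf, so a Cf-only class would still let it through.

```javascript
// Hypothetical replacement strip set; the real regex at line 935 is not shown in this report.
// \p{Cf} covers U+00AD, U+200E and U+200F; U+034F is category Mn, so it is listed explicitly.
const INVISIBLE_RE = /[\p{Cf}\u034F]/gu;

function stripInvisible(text) {
  return text.replace(INVISIBLE_RE, '');
}

const marker = 'SECRET-MARKER-test';
for (const ch of ['\u034F', '\u200E', '\u200F', '\u00AD']) {
  const obfuscated = marker.split('').join(ch);
  console.log(stripInvisible(obfuscated) === marker); // → true for each character
}
```

Removing the whole Cf category also removes U+200C and U+200D, which some scripts and emoji sequences rely on, so a maintainer may prefer normalization or an explicit allow-list over a blanket strip.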

Security Relevance

Incomplete invisible-character stripping allows prompt-injection payloads to use standard Unicode obfuscation techniques to survive the sanitizer and reach GitHub API write endpoints undetected. The missing detection gate in native-card workflows means the "automatically enabled" detection guarantee does not hold for the MCP-driven write path, which is the primary write path for agentic tool-call workflows. Together these gaps reduce the effective depth of the output trust boundary.

gh-aw version (this workflow): v0.65.5

Original finding: https://github.com/githubnext/gh-aw-security/issues/1654

Generated by File Issue
