a modular offensive security framework designed for executing Unicode-based attacks, like those seen in the "GlassWorm" compromises

umpolungfish/noseeum

noseeum

A FRAMEWORK FOR UNICODE-BASED EXPLOITATION

glasswurm

Python   Unicode   Security   Cross-Platform

Overview · Features · Agents · Installation · Usage · Development · Structure


OVERVIEW

Primary Function: Execute Unicode smuggling attacks including Trojan Source, homoglyph substitution, and invisible character encoding to hide malicious code in plain sight

noseeum is a modular offensive security framework for executing Unicode-based attacks

noseeum encodes its payloads in a fashion similar to that seen in the "GlassWorm" malware of late 2025

noseeum brings a range of obfuscation and encoding techniques together in an extensible CLI

NOSEEUM IN ACTION

Below is a screencap of the VirusTotal analysis of the unencoded powershell malware (BEFORE processing with noseeum) as well as its "MITRE ATT&CK Tactics and Techniques" Chart

  • NOTE THE 8/62 DETECTION RATE
  • HASH = f6adc7db3ce7e756bcfd995c6bfeae1480e4626ab4c049644754903e2610a104

before

before MITRE

Below is a screencap of the VirusTotal analysis of the Zero Width Character-encoded powershell malware (AFTER processing with noseeum) as well as its "MITRE ATT&CK Tactics and Techniques" Chart

  • NOTE THE 0/62 DETECTION RATE
  • HASH = b700553732b9c8c2843885dc4f1122d2471beac47d682e67863f81cbb6d9a55f

after

after MITRE

FEATURES

Unified Command-Line Interface

noseeum provides a single, clean command-line interface built on Python's click library

  • Modular Architecture: Each attack vector is a self-contained module, allowing for rapid development and integration of new exploits

  • Multiple Attack Vectors:

    • Bidi (Trojan Source): Make malicious code appear as harmless comments
    • Homoglyph: Evade signature-based detection and confuse human analysts by substituting characters with visually identical ones
    • Invisible Ink: Hide payloads steganographically within benign text or generate imperceptible prompts to jailbreak LLMs
    • File Steganography: Encode entire files as zero-width character sequences and decode them back
    • Language-Specific Exploits: Target unique weaknesses in Python, JavaScript, and Java
    • Normalization Exploitation: Craft payloads that normalize differently across system components (parser vs. scanner)
    • Unassigned Planes / Variation Selectors: Generate syntactically valid identifiers using characters from unassigned Unicode planes (U+20000–U+2FFFD)
    • Payload-injection via Identifier Characters: Encode malicious data within language constructs like object properties, class names, or function names
  • Advanced Language Modules:

    • Go: Exploits Go's configurable lexer and permissive Unicode handling
    • Kotlin: Uses permissive frontend with restrictive backend to create compilation-failing code
    • JavaScript: Performs AST-level manipulations and low-entropy payload generation
    • Swift: Leverages ambiguous identifier handling and unassigned planes support
  • Globally Installable: Can be installed as a system-wide command-line tool using pip
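
The homoglyph and normalization items above both depend on the fact that different code point sequences can look alike or fold to the same string. Seen from the reviewer's side, a minimal sketch of flagging such identifiers could look like the following. It is not noseeum's code: the function names are hypothetical, and because the standard library has no Unicode script property, the first word of each character name is used as a rough stand-in for the script.

```python
import unicodedata


def nfkc_changes(identifier: str) -> bool:
    """Return True if NFKC normalization alters the identifier."""
    return unicodedata.normalize("NFKC", identifier) != identifier


def scripts_used(identifier: str) -> set:
    """Approximate the scripts present in an identifier.

    Assumption: the first word of the Unicode character name
    (LATIN, CYRILLIC, GREEK, ...) is treated as the script.
    This is a heuristic, not the real Script property.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def is_suspicious(identifier: str) -> bool:
    """Flag identifiers that change under NFKC or mix several scripts."""
    return nfkc_changes(identifier) or len(scripts_used(identifier)) > 1


if __name__ == "__main__":
    # Escapes are used so the confusable characters stay visible in review.
    for ident in ["payload", "p\u0430yload", "\ufb01le"]:
        print(ascii(ident), is_suspicious(ident))
```

The heuristic will also flag some legitimate identifiers (for example fullwidth forms), so it is better suited to surfacing candidates for human review than to blocking code outright.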

Detection and Scanning Module

Includes a scanner to identify the presence of these same Unicode smuggling vulnerabilities in source code

  • File Vulnerability Scanning: Scan individual files for Unicode smuggling vulnerabilities
  • Multi-Language Support: Detect vulnerabilities across Python, JavaScript, Java, and other languages
  • Comprehensive Detection: Identifies various types of Unicode exploits including Bidi, homoglyphs, and invisible characters
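
To make the kind of check described above concrete, here is a minimal sketch of a scanner for Bidi control characters and other invisible format characters. It is an illustration under stated assumptions rather than the detector module's actual implementation: `Finding` and `scan_text` are hypothetical names, and "invisible" is approximated as Unicode general category `Cf`, which does not cover variation selectors or homoglyphs.

```python
import unicodedata
from typing import List, NamedTuple

# Explicit Bidi embedding, override, and isolate controls.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


class Finding(NamedTuple):
    line: int        # 1-based line number
    column: int      # 1-based column, counted in code points
    codepoint: str   # e.g. "U+202E"
    kind: str        # "bidi-control" or "invisible-format"


def scan_text(text: str) -> List[Finding]:
    """Report Bidi controls and other format (Cf) characters in text."""
    findings = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                kind = "bidi-control"
            elif unicodedata.category(ch) == "Cf":
                kind = "invisible-format"
            else:
                continue
            findings.append(Finding(line_no, col, f"U+{ord(ch):04X}", kind))
    return findings


if __name__ == "__main__":
    sample = "a = 1\nb\u200b = 2  # \u202e comment"
    for finding in scan_text(sample):
        print(finding)
```

Reporting the code point and position, rather than silently stripping characters, keeps the output usable in code review and CI logs.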

AGENT MENAGERIE

AI Agents   Claude   Swarm

New in 2026: noseeum now includes a complete autonomous agent system powered by Claude AI, featuring 15 specialized agents that can operate independently or as coordinated swarms for comprehensive Unicode security research, attack development, and defense.

Agent Categories

🔬 Research & Discovery

  • Unicode Archaeologist: Discovers new Unicode vulnerabilities, mines CVE databases, tracks exploitable control characters
  • Language Grammar Hunter: Analyzes programming language specifications for Unicode handling quirks and parser edge cases

⚔️ Attack Development

  • Payload Artisan: Generates context-aware malicious payloads that blend naturally with target codebases
  • Stealth Optimizer: Optimizes attacks for maximum evasion against security tools (Semgrep, Bandit, ESLint)
  • Polyglot Specialist: Creates sophisticated cross-language polyglot attacks exploiting syntax overlaps

🛡️ Defense & Validation

  • Red Team Validator: Tests attack effectiveness against real-world security scanners and linters
  • YARA Rule Smith: Generates YARA detection rules and IOC signatures for Unicode attacks
  • Detector Adversary: Continuously improves noseeum's scanner through adversarial testing

📊 Analysis & Documentation

  • Vulnerability Cartographer: Maps attack surfaces, generates visualizations, creates comprehensive attack trees
  • Report Synthesizer: Produces technical reports, CVE submissions, and security advisories

🔧 Infrastructure

  • Test Oracle: Maintains comprehensive test coverage, generates fuzzing inputs, ensures code quality
  • Module Architect: Scaffolds new attack modules following framework patterns and best practices

🎯 Specialized Research

  • Homoglyph Curator: Discovers and maintains a registry of visually similar Unicode characters
  • Normalization Alchemist: Exploits Unicode normalization (NFC, NFD, NFKC, NFKD) edge cases and collisions
  • Bidirectional Puppeteer: Masters Trojan Source attacks using RTL/LTR control characters

Agent System Features

  • Autonomous Operation: Each agent operates independently with minimal supervision
  • Swarm Intelligence: Coordinate multiple agents for complex multi-stage attacks
  • Persistent Memory: Agents maintain state and learnings across sessions
  • Inter-Agent Communication: Agents collaborate and share findings
  • Tool Integration: Native integration with noseeum framework modules
  • Artifact Generation: All agents produce structured outputs and actionable results
  • Enhanced Logging: Comprehensive logging with configurable verbosity levels (DEBUG/INFO/WARNING/ERROR)
  • Detailed Output: Rich console output with timestamps, status updates, and execution metrics
  • File-based Logging: All agent activities logged to individual files in agents/logs/ directory
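
The logging behaviour listed above (configurable verbosity, timestamped console output, one log file per agent) can be sketched with the standard `logging` module. This is a guess at the shape, not the project's code: `make_agent_logger` is a hypothetical helper, and only the `agents/logs/` directory name is taken from the list above.

```python
import logging
from pathlib import Path


def make_agent_logger(agent_name: str, log_dir: str = "agents/logs",
                      level: str = "INFO") -> logging.Logger:
    """Create a logger that writes to the console and to <log_dir>/<agent>.log."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(f"agents.{agent_name}")
    logger.setLevel(getattr(logging, level.upper()))  # DEBUG/INFO/WARNING/ERROR
    logger.propagate = False  # avoid duplicate lines via the root logger

    if not logger.handlers:  # do not stack handlers on repeated calls
        fmt = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

        file_handler = logging.FileHandler(
            Path(log_dir) / f"{agent_name}.log", encoding="utf-8"
        )
        file_handler.setFormatter(fmt)
        logger.addHandler(file_handler)

        console = logging.StreamHandler()
        console.setFormatter(fmt)
        logger.addHandler(console)

    return logger
```

A call such as `make_agent_logger("test_oracle", level="DEBUG")` would then give that agent its own file while sharing one format across the console and the file.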

Quick Start with Agents

# Setup agent system
./agents/setup.sh
export ANTHROPIC_API_KEY="your-key"

# List all available agents
python3 agents/cli.py list

# Run single agent
python3 agents/cli.py run unicode_archaeologist "Discover new vulnerabilities"

# Run coordinated swarm
python3 agents/cli.py swarm "Comprehensive Python Unicode analysis"

# Run specific agents as swarm
python3 agents/cli.py swarm "Generate and validate attacks" \
  --agents payload_artisan,red_team_validator,yara_rule_smith

Agent Usage Examples

Discover new Unicode vulnerabilities:

python3 agents/cli.py run unicode_archaeologist \
  "Find exploitable format characters in Unicode blocks U+2000-U+206F"

Generate stealthy attack payloads:

python3 agents/cli.py run payload_artisan \
  "Generate context-aware Bidi attacks for Python" \
  --context '{"language":"python","attack_type":"bidi"}'

Test and validate attacks:

python3 agents/cli.py run red_team_validator \
  "Validate attacks against Semgrep and Bandit" \
  --context '{"attack":"payload.py","tools":["semgrep","bandit"]}'

Create detection rules:

python3 agents/cli.py run yara_rule_smith \
  "Generate YARA rules for invisible character attacks" \
  --context '{"attack_type":"invisible"}'

Multi-stage research pipeline:

from agents import AgentOrchestrator

orchestrator = AgentOrchestrator()

# Stage 1: Research
research = orchestrator.run_agent('unicode_archaeologist', 'Find vulnerabilities')

# Stage 2: Weaponize
attacks = orchestrator.run_agent('payload_artisan', 'Generate attacks',
                                 context={'findings': research['findings']})

# Stage 3: Validate
results = orchestrator.run_agent('red_team_validator', 'Test attacks',
                                 context={'attacks': attacks['payloads']})

# Stage 4: Document
report = orchestrator.run_agent('report_synthesizer', 'Create report',
                                context={'results': results})

Agent Documentation

For comprehensive agent documentation, see:

Agent System Architecture

The agent system is built on a modular architecture with:

  • Base Framework: Shared tools, memory, and communication systems
  • Orchestrator: Swarm coordination with thread-pool execution
  • CLI Interface: Comprehensive command-line tools
  • Python API: Programmatic agent control
  • Test Suite: Comprehensive integration tests
  • Examples: Working examples for all use cases
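
As a rough illustration of swarm coordination with thread-pool execution, the sketch below fans one task out to several agents with `concurrent.futures`. It does not reproduce `AgentOrchestrator`: `run_swarm` and the stub agents are hypothetical, and each "agent" is assumed to be a plain callable that takes a task string and returns a dict.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict

AgentFn = Callable[[str], dict]


def run_swarm(agents: Dict[str, AgentFn], task: str,
              max_workers: int = 4) -> Dict[str, dict]:
    """Run every agent on the same task concurrently and collect the results."""
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn, task): name for name, fn in agents.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = {"status": "ok", "output": future.result()}
            except Exception as exc:  # one failing agent must not sink the swarm
                results[name] = {"status": "error", "error": str(exc)}
    return results


if __name__ == "__main__":
    # Stub agents standing in for real ones.
    def scanner_stub(task: str) -> dict:
        return {"task": task, "findings": []}

    def failing_stub(task: str) -> dict:
        raise RuntimeError("agent unavailable")

    print(run_swarm({"scanner": scanner_stub, "broken": failing_stub}, "demo"))
```

Threads fit this pattern when agents spend most of their time waiting on network calls. Capturing exceptions per agent keeps a single failure from discarding the other results.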

INSTALLATION

Global Installation (Recommended)

noseeum can be installed as a globally accessible command-line tool:

  1. Clone the repository:

    git clone <repository_url>
    cd noseeum
  2. Install required data files: Before using the framework, you need to generate the required registry files:

    python3 create_registry.py      # Creates homoglyph_registry.json
    python3 create_nfkc_map.py      # Creates nfkc_map.json
  3. Install the package:

    pip install .

    or using the Makefile:

    make install

This will install the noseeum command globally on your system, making it accessible from any directory

Uninstallation

To remove the globally installed package:

make uninstall

BASIC USAGE

All functionality is accessed through the noseeum command

View all available commands:

noseeum --help

View attack-specific commands:

noseeum attack --help

Scan a file for vulnerabilities:

noseeum detect --file /path/to/your/file.js

For a complete breakdown of every command, option, and argument, refer to the USAGE.md document

DEVELOPMENT

This project uses a Makefile to streamline common development tasks.

  • make install: Sets up the development environment, installs dependencies from requirements.txt, creates required data files, and installs the noseeum package in editable mode
  • make uninstall: Removes the noseeum package from your system
  • make clean: Deletes all build artifacts, such as build/, dist/, and .egg-info/ directories

PACKAGE STRUCTURE

The framework is organized as follows:

  • noseeum/: Main Python package containing:
    • attacks/: Individual modules for each attack vector
    • core/: Core engine, grammar database, and integration components
    • detector/: Scanning and detection functionality
    • utils/: Helper utilities and error handling
    • data/: Embedded data files (homoglyph_registry.json, nfkc_map.json)
  • agents/: NEW Autonomous agent system:
    • base/: Base agent framework (agent, tools, memory, communication)
    • research/: Research agents (Unicode Archaeologist, Language Grammar Hunter)
    • attack_dev/: Attack development agents (Payload Artisan, Stealth Optimizer, Polyglot Specialist)
    • defense/: Defense agents (Red Team Validator, YARA Rule Smith, Detector Adversary)
    • analysis/: Analysis agents (Vulnerability Cartographer, Report Synthesizer)
    • infrastructure/: Infrastructure agents (Test Oracle, Module Architect)
    • specialized/: Specialized research agents (Homoglyph Curator, Normalization Alchemist, Bidi Puppeteer)
    • orchestrator.py: Swarm coordination system
    • cli.py: Agent CLI interface
    • tests/: Agent integration tests
    • examples/: Usage examples and tutorials
  • create_registry.py: Script to generate the homoglyph registry
  • create_nfkc_map.py: Script to generate the NFKC mapping

RECENT IMPROVEMENTS

Autonomous Agent System (2026 - Latest)

  • 🚀 NEW: Complete Agent Menagerie - Added 15 autonomous Claude-powered agents for Unicode security research
  • Agent Categories: Research, Attack Development, Defense, Analysis, Infrastructure, Specialized Research
  • Swarm Intelligence: Coordinate multiple agents for complex multi-stage operations
  • Orchestration System: Thread-pool based swarm coordinator with intelligent task distribution
  • CLI & API: Comprehensive command-line interface and Python API for agent control
  • Persistent Memory: Agents maintain state and learnings across sessions
  • Inter-Agent Communication: Collaboration and information sharing between agents
  • Complete Documentation: 4 comprehensive docs (7,000+ lines), 4 working examples, full test suite
  • Production Ready: 41 files, 36 Python modules, comprehensive integration tests

Code Quality & Reliability

  • Fixed critical logic bug in homoglyph identifier replacement that could cause incorrect output
  • Added Python 3.8+ compatibility by replacing Python 3.9+ type annotations
  • Improved error handling by replacing bare except clauses with proper exception types
  • Consolidated duplicate code by moving file encoding logic to shared utilities
  • Enhanced CLI consistency by standardizing error output with click.echo()
  • Completed language support by adding grammar definitions for Java, Rust, C, and C++
  • Improved path validation with more reliable directory traversal prevention
  • Added pytest dependency to requirements for proper test execution
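
The path-validation item above is about rejecting paths that escape an allowed directory. One common way to do that, shown here as a sketch rather than as the project's actual fix, is to resolve both paths and compare their common prefix. `resolve_inside` is a hypothetical helper name, and `os.path` is used so the code also runs on Python 3.8.

```python
import os


def resolve_inside(base_dir: str, user_path: str) -> str:
    """Resolve user_path under base_dir, refusing anything that escapes it.

    Symlinks and ".." segments are resolved first, so the comparison is
    made on real locations rather than on the raw input string.
    """
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return target
```

Comparing with `os.path.commonpath` avoids the classic mistake of a plain string prefix test, which would accept `/data/out-evil` as being inside `/data/out`.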

Testing

Run the test suite with:

pip install -e ".[dev]"  # Install with dev dependencies
pytest tests/

LICENSE

noseeum is sicced upon the Whyrld under the UNLICENSE
