noseeum

A FRAMEWORK FOR UNICODE-BASED EXPLOITATION

Overview • Features • Agents • Installation • Usage • Development • Structure

OVERVIEW

Primary Function: Execute Unicode smuggling attacks including Trojan Source, homoglyph substitution, and invisible character encoding to hide malicious code in plain sight

noseeum is a modular offensive security framework for executing Unicode-based attacks

noseeum encodes payloads in a fashion similar to that exhibited by the "GlassWorm" malware of late 2025

noseeum combines a range of obfuscation and encoding techniques in an extensible CLI

NOSEEUM IN ACTION

Below is a screencap of the VirusTotal analysis of the unencoded PowerShell malware (BEFORE processing with noseeum), along with its "MITRE ATT&CK Tactics and Techniques" chart

NOTE THE 8/62 DETECTION RATE
HASH = f6adc7db3ce7e756bcfd995c6bfeae1480e4626ab4c049644754903e2610a104

Below is a screencap of the VirusTotal analysis of the zero-width-character-encoded PowerShell malware (AFTER processing with noseeum), along with its "MITRE ATT&CK Tactics and Techniques" chart

NOTE THE 0/62 DETECTION RATE
HASH = b700553732b9c8c2843885dc4f1122d2471beac47d682e67863f81cbb6d9a55f

FEATURES

Unified Command-Line Interface

noseeum provides a single, clean command-line interface powered by Python's click library

Modular Architecture: Each attack vector is a self-contained module, allowing for rapid development and integration of new exploits
Multiple Attack Vectors:
- Bidi (Trojan Source): Make malicious code appear as harmless comments
- Homoglyph: Evade signature-based detection and confuse human analysts by substituting characters with visually identical ones
- Invisible Ink: Hide payloads steganographically within benign text or generate imperceptible prompts to jailbreak LLMs
- File Steganography: Encode entire files as zero-width character sequences and decode them back
- Language-Specific Exploits: Target unique weaknesses in Python, JavaScript, and Java
- Normalization Exploitation: Craft payloads that normalize differently across system components (parser vs. scanner)
- Unassigned Code Points / Variation Selectors: Generate syntactically valid identifiers using unassigned code points in supplementary planes (for example, within U+20000–U+2FFFD)
- Payload-injection via Identifier Characters: Encode malicious data within language constructs like object properties, class names, or function names
Advanced Language Modules:
- Go: Exploits Go's configurable lexer and permissive Unicode handling
- Kotlin: Uses permissive frontend with restrictive backend to create compilation-failing code
- JavaScript: Performs AST-level manipulations and low-entropy payload generation
- Swift: Leverages ambiguous identifier handling and unassigned planes support
Globally Installable: Can be installed as a system-wide command-line tool using pip
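
The normalization item above turns on the fact that two components may disagree about whether two strings are "the same". Seen from the reviewing side, a minimal sketch of a check for this could look like the following. It uses only Python's standard `unicodedata` module; the function names `nfkc_changes` and `find_nfkc_collisions` are illustrative assumptions and are not part of noseeum.

```python
import unicodedata


def nfkc_changes(identifier: str) -> bool:
    """Return True when NFKC normalization alters the identifier."""
    return unicodedata.normalize("NFKC", identifier) != identifier


def find_nfkc_collisions(identifiers):
    """Group identifiers that differ as written but are equal after NFKC.

    Returns a dict mapping the normalized form to the sorted list of
    distinct spellings that collapse onto it.
    """
    groups = {}
    for name in identifiers:
        normalized = unicodedata.normalize("NFKC", name)
        groups.setdefault(normalized, set()).add(name)
    return {norm: sorted(names) for norm, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # "\ufb01" is LATIN SMALL LIGATURE FI, which NFKC expands to "fi".
    names = ["file", "\ufb01le", "other"]
    for norm, spellings in find_nfkc_collisions(names).items():
        print(norm, [s.encode("unicode_escape").decode() for s in spellings])
```

A check like this is only meaningful when it applies the same normalization form as the component being protected; which form that is depends on the language and toolchain in question.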

Detection and Scanning Module

Includes a scanner to identify the presence of these same Unicode smuggling vulnerabilities in source code

File Vulnerability Scanning: Scan individual files for Unicode smuggling vulnerabilities
Multi-Language Support: Detect vulnerabilities across Python, JavaScript, Java, and other languages
Comprehensive Detection: Identifies various types of Unicode exploits including Bidi, homoglyphs, and invisible characters
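
To make the detection categories above concrete, here is a minimal sketch of a scanner for bidirectional control characters and other invisible format characters. It is not noseeum's detector, whose internals are not shown in this README; the names `BIDI_CONTROLS` and `scan_text` are assumptions made for illustration, and only the standard library is used.

```python
import unicodedata

# Explicit bidirectional embedding, override, and isolate controls.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def scan_text(text):
    """Return (line, column, code point, kind) for each suspicious character.

    Lines and columns are 1-based. Characters in general category Cf
    (format) that are not bidi controls are reported as invisible-format;
    this covers zero-width spaces and joiners, among others.
    """
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                kind = "bidi-control"
            elif unicodedata.category(ch) == "Cf":
                kind = "invisible-format"
            else:
                continue
            findings.append((line_no, col, f"U+{ord(ch):04X}", kind))
    return findings


if __name__ == "__main__":
    sample = "a = 1\nx\u200b = 2 # \u202e"
    for finding in scan_text(sample):
        print(finding)
```

Variation selectors have general category Mn rather than Cf, so this sketch does not report them; a fuller scanner would need an explicit range check for those.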

AGENT MENAGERIE

New in 2026: noseeum now includes a complete autonomous agent system powered by Claude AI, featuring 15 specialized agents that can operate independently or as coordinated swarms for comprehensive Unicode security research, attack development, and defense.

Agent Categories

🔬 Research & Discovery

Unicode Archaeologist: Discovers new Unicode vulnerabilities, mines CVE databases, tracks exploitable control characters
Language Grammar Hunter: Analyzes programming language specifications for Unicode handling quirks and parser edge cases

⚔️ Attack Development

Payload Artisan: Generates context-aware malicious payloads that blend naturally with target codebases
Stealth Optimizer: Optimizes attacks for maximum evasion against security tools (Semgrep, Bandit, ESLint)
Polyglot Specialist: Creates sophisticated cross-language polyglot attacks exploiting syntax overlaps

🛡️ Defense & Validation

Red Team Validator: Tests attack effectiveness against real-world security scanners and linters
YARA Rule Smith: Generates YARA detection rules and IOC signatures for Unicode attacks
Detector Adversary: Continuously improves noseeum's scanner through adversarial testing

📊 Analysis & Documentation

Vulnerability Cartographer: Maps attack surfaces, generates visualizations, creates comprehensive attack trees
Report Synthesizer: Produces technical reports, CVE submissions, and security advisories

🔧 Infrastructure

Test Oracle: Maintains comprehensive test coverage, generates fuzzing inputs, ensures code quality
Module Architect: Scaffolds new attack modules following framework patterns and best practices

🎯 Specialized Research

Homoglyph Curator: Discovers and maintains a registry of visually similar Unicode characters
Normalization Alchemist: Exploits Unicode normalization (NFC, NFD, NFKC, NFKD) edge cases and collisions
Bidirectional Puppeteer: Masters Trojan Source attacks using RTL/LTR control characters
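
On the detection side, one common heuristic for spotting visually similar characters is to flag identifiers that mix scripts. The sketch below is a rough approximation under stated assumptions: Python's standard library does not expose the Unicode Script property, so the first word of each character's Unicode name stands in for its script. The helper names `scripts_in` and `is_mixed_script` are hypothetical and do not come from noseeum.

```python
import unicodedata


def scripts_in(token):
    """Approximate the set of scripts used by the letters in token.

    Heuristic: the first word of the character name, e.g. "LATIN" for
    "LATIN SMALL LETTER A" or "CYRILLIC" for "CYRILLIC SMALL LETTER A".
    """
    scripts = set()
    for ch in token:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split(" ", 1)[0])
    return scripts


def is_mixed_script(token):
    """Return True when the letters of token appear to span several scripts."""
    return len(scripts_in(token)) > 1


if __name__ == "__main__":
    # The second character is U+0430 CYRILLIC SMALL LETTER A.
    print(is_mixed_script("p\u0430yload"), is_mixed_script("payload"))
```

The name-prefix heuristic misclassifies some characters (fullwidth Latin letters, for instance, have names beginning with "FULLWIDTH"), so a production check would rely on real Script property data instead.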

Agent System Features

Autonomous Operation: Each agent operates independently with minimal supervision
Swarm Intelligence: Coordinate multiple agents for complex multi-stage attacks
Persistent Memory: Agents maintain state and learnings across sessions
Inter-Agent Communication: Agents collaborate and share findings
Tool Integration: Native integration with noseeum framework modules
Artifact Generation: All agents produce structured outputs and actionable results
Enhanced Logging: Comprehensive logging with configurable verbosity levels (DEBUG/INFO/WARNING/ERROR)
Detailed Output: Rich console output with timestamps, status updates, and execution metrics
File-based Logging: All agent activities logged to individual files in agents/logs/ directory
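
The logging behavior described above (configurable verbosity, console output with timestamps, one log file per agent) can be approximated with Python's standard `logging` module. This is a sketch under assumptions, not the framework's actual logging code: the function name `make_agent_logger`, the log file naming, and the message format are all hypothetical; only the agents/logs/ directory comes from the list above.

```python
import logging
from pathlib import Path


def make_agent_logger(agent_name, log_dir="agents/logs", level="INFO"):
    """Create (or fetch) a logger that writes to the console and to a per-agent file.

    level is one of DEBUG, INFO, WARNING, ERROR (case-insensitive).
    """
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(f"agents.{agent_name}")
    logger.setLevel(getattr(logging, level.upper()))
    logger.propagate = False  # avoid duplicate lines via the root logger

    if not logger.handlers:  # attach handlers only once per agent
        formatter = logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s %(message)s"
        )
        file_handler = logging.FileHandler(
            Path(log_dir) / f"{agent_name}.log", encoding="utf-8"
        )
        file_handler.setFormatter(formatter)
        console_handler = logging.StreamHandler()
        console_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
        logger.addHandler(console_handler)
    return logger
```

Guarding on `logger.handlers` keeps repeated calls for the same agent from attaching duplicate handlers, which would otherwise write every message several times.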

Quick Start with Agents

# Setup agent system
./agents/setup.sh
export ANTHROPIC_API_KEY="your-key"

# List all available agents
python3 agents/cli.py list

# Run single agent
python3 agents/cli.py run unicode_archaeologist "Discover new vulnerabilities"

# Run coordinated swarm
python3 agents/cli.py swarm "Comprehensive Python Unicode analysis"

# Run specific agents as swarm
python3 agents/cli.py swarm "Generate and validate attacks" \
  --agents payload_artisan,red_team_validator,yara_rule_smith

Agent Usage Examples

Discover new Unicode vulnerabilities:

python3 agents/cli.py run unicode_archaeologist \
  "Find exploitable format characters in Unicode blocks U+2000-U+206F"

Generate stealthy attack payloads:

python3 agents/cli.py run payload_artisan \
  "Generate context-aware Bidi attacks for Python" \
  --context '{"language":"python","attack_type":"bidi"}'

Test and validate attacks:

python3 agents/cli.py run red_team_validator \
  "Validate attacks against Semgrep and Bandit" \
  --context '{"attack":"payload.py","tools":["semgrep","bandit"]}'

Create detection rules:

python3 agents/cli.py run yara_rule_smith \
  "Generate YARA rules for invisible character attacks" \
  --context '{"attack_type":"invisible"}'

Multi-stage research pipeline:

from agents import AgentOrchestrator

orchestrator = AgentOrchestrator()

# Stage 1: Research
research = orchestrator.run_agent('unicode_archaeologist', 'Find vulnerabilities')

# Stage 2: Weaponize
attacks = orchestrator.run_agent('payload_artisan', 'Generate attacks',
                                 context={'findings': research['findings']})

# Stage 3: Validate
results = orchestrator.run_agent('red_team_validator', 'Test attacks',
                                 context={'attacks': attacks['payloads']})

# Stage 4: Document
report = orchestrator.run_agent('report_synthesizer', 'Create report',
                                context={'results': results})

Agent Documentation

For comprehensive agent documentation, see:

AGENT_QUICKSTART.md - Quick start guide
agents/README.md - Architecture overview
agents/USAGE.md - Detailed usage guide
AGENTS_IMPLEMENTATION.md - Implementation details

Agent System Architecture

The agent system is built on a modular architecture with:

Base Framework: Shared tools, memory, and communication systems
Orchestrator: Swarm coordination with thread-pool execution
CLI Interface: Comprehensive command-line tools
Python API: Programmatic agent control
Test Suite: Comprehensive integration tests
Examples: Working examples for all use cases
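
The thread-pool execution mentioned for the orchestrator follows a familiar pattern, sketched below with `concurrent.futures` from the standard library. This is not the code in orchestrator.py: the function `run_swarm`, its result shape, and the stub callables are assumptions made for illustration, with plain functions standing in for agents.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_swarm(agents, task, max_workers=4):
    """Run each callable in agents (name -> callable(task)) on a thread pool.

    Returns a dict mapping each name to {"ok": True, "result": ...} or
    {"ok": False, "error": ...}, so one failing agent does not abort the rest.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn, task): name for name, fn in agents.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = {"ok": True, "result": future.result()}
            except Exception as exc:  # record the failure and continue
                results[name] = {"ok": False, "error": str(exc)}
    return results


if __name__ == "__main__":
    # Stub "agents" used only to exercise the coordinator.
    stubs = {
        "summarize": lambda task: f"summary of {task}",
        "count": lambda task: len(task),
    }
    print(run_swarm(stubs, "example task"))
```

Threads suit this kind of workload when each agent spends most of its time waiting on network calls; CPU-bound work would be a better fit for a process pool.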

INSTALLATION

Global Installation (Recommended)

noseeum can be installed as a globally accessible command-line tool:

Clone the repository:
```
git clone <repository_url>
cd noseeum
```

Generate required data files: Before using the framework, generate the registry files it depends on:

python3 create_registry.py      # Creates homoglyph_registry.json
python3 create_nfkc_map.py      # Creates nfkc_map.json

Install the package:
```
pip install .
```
or using the Makefile:
```
make install
```

This will install the noseeum command globally on your system, making it accessible from any directory

Uninstallation

To remove the globally installed package:

make uninstall

BASIC USAGE

All functionality is accessed through the noseeum command

View all available commands:

noseeum --help

View attack-specific commands:

noseeum attack --help

Scan a file for vulnerabilities:

noseeum detect --file /path/to/your/file.js

For a complete breakdown of every command, option, and argument, refer to the USAGE.md document

DEVELOPMENT

This project uses a Makefile to streamline common development tasks.

make install: Sets up the development environment, installs dependencies from requirements.txt, creates required data files, and installs the noseeum package in editable mode
make uninstall: Removes the noseeum package from your system
make clean: Deletes all build artifacts, such as build/, dist/, and .egg-info/ directories

PACKAGE STRUCTURE

The framework is organized as follows:

noseeum/: Main Python package containing:
- attacks/: Individual modules for each attack vector
- core/: Core engine, grammar database, and integration components
- detector/: Scanning and detection functionality
- utils/: Helper utilities and error handling
- data/: Embedded data files (homoglyph_registry.json, nfkc_map.json)
agents/: NEW Autonomous agent system:
- base/: Base agent framework (agent, tools, memory, communication)
- research/: Research agents (Unicode Archaeologist, Language Grammar Hunter)
- attack_dev/: Attack development agents (Payload Artisan, Stealth Optimizer, Polyglot Specialist)
- defense/: Defense agents (Red Team Validator, YARA Rule Smith, Detector Adversary)
- analysis/: Analysis agents (Vulnerability Cartographer, Report Synthesizer)
- infrastructure/: Infrastructure agents (Test Oracle, Module Architect)
- specialized/: Specialized research agents (Homoglyph Curator, Normalization Alchemist, Bidi Puppeteer)
- orchestrator.py: Swarm coordination system
- cli.py: Agent CLI interface
- tests/: Agent integration tests
- examples/: Usage examples and tutorials
create_registry.py: Script to generate the homoglyph registry
create_nfkc_map.py: Script to generate the NFKC mapping

RECENT IMPROVEMENTS

Autonomous Agent System (2026 - Latest)

🚀 NEW: Complete Agent Menagerie - Added 15 autonomous Claude-powered agents for Unicode security research
Agent Categories: Research, Attack Development, Defense, Analysis, Infrastructure, Specialized Research
Swarm Intelligence: Coordinate multiple agents for complex multi-stage operations
Orchestration System: Thread-pool based swarm coordinator with intelligent task distribution
CLI & API: Comprehensive command-line interface and Python API for agent control
Persistent Memory: Agents maintain state and learnings across sessions
Inter-Agent Communication: Collaboration and information sharing between agents
Complete Documentation: 4 comprehensive docs (7,000+ lines), 4 working examples, full test suite
Production Ready: 41 files, 36 Python modules, comprehensive integration tests

Code Quality & Reliability

Fixed critical logic bug in homoglyph identifier replacement that could cause incorrect output
Added Python 3.8+ compatibility by replacing Python 3.9+ type annotations
Improved error handling by replacing bare except clauses with proper exception types
Consolidated duplicate code by moving file encoding logic to shared utilities
Enhanced CLI consistency by standardizing error output with click.echo()
Completed language support by adding grammar definitions for Java, Rust, C, and C++
Improved path validation with more reliable directory traversal prevention
Added pytest dependency to requirements for proper test execution

Testing

Run the test suite with:

pip install -e ".[dev]"  # Install with dev dependencies
pytest tests/

LICENSE

noseeum is sicced upon the Whyrld under the
