Autonomous Web Penetration Testing Agent
Multi-provider LLM support · Human-in-the-loop · Lightweight architecture
GhostPWN is an autonomous web penetration testing agent designed for academic research in offensive security. It orchestrates multiple LLM providers to perform grey-box web application testing through a multi-agent pipeline, with human oversight at every critical decision point.
The core research contribution is a comparative analysis of LLM providers (Claude, GPT, Gemini) on offensive security tasks, evaluating reasoning quality, vulnerability detection accuracy, and exploit generation for each provider.
┌─────────────────────────────────────────────────────┐
│                   GhostPWN Agent                    │
│                                                     │
│  ┌──────────┐  ┌──────────┐  ┌─────────┐  ┌──────┐  │
│  │  Recon   │→ │ Analysis │→ │ Exploit │→ │Report│  │
│  │  Agent   │  │  Agent   │  │  Agent  │  │Agent │  │
│  └──────────┘  └──────────┘  └─────────┘  └──────┘  │
│       │              │             │           │    │
│       └──────────────┼─────────────┴───────────┘    │
│                      │                              │
│            ┌─────────▼─────────┐                    │
│            │   Vercel AI SDK   │                    │
│            │  (Orchestrator)   │                    │
│            └─────────┬─────────┘                    │
│       ┌──────────────┼──────────────┐               │
│       ▼              ▼              ▼               │
│  ┌─────────┐   ┌──────────┐    ┌────────┐           │
│  │ OpenAI  │   │Anthropic │    │ Other  │           │
│  └─────────┘   └──────────┘    └────────┘           │
│                                                     │
│  ┌──────────────────┐  ┌────────────────────────┐   │
│  │      SQLite      │  │  OpenTUI Terminal UI   │   │
│  └──────────────────┘  └────────────────────────┘   │
│                                                     │
│ ┌─────────────────────────────────────────────────┐ │
│ │ HTTP Clients: Playwright · Selenium · requests  │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
Pipeline: Recon → Analysis → Exploit → Report
Each agent operates autonomously within its own phase. The pipeline pauses for human approval before any exploit is executed.
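
The phase ordering and the approval gate described above can be sketched as follows. This is a minimal illustration, not GhostPWN's actual code: `Phase`, `Finding`, `ApprovalFn`, and `runPipeline` are hypothetical names, and the recon and analysis phases are reduced to placeholders so the gate itself stays visible.

```typescript
// Hypothetical names throughout; none are taken from the GhostPWN codebase.
type Phase = "recon" | "analysis" | "exploit" | "report";

interface Finding {
  id: string;
  description: string;
}

// Resolves to true when the operator approves exploiting a finding.
type ApprovalFn = (finding: Finding) => Promise<boolean>;

interface PipelineResult {
  phasesRun: Phase[];
  exploited: string[]; // ids of findings the operator approved
  skipped: string[]; // ids of findings the operator rejected
}

async function runPipeline(
  findings: Finding[],
  approve: ApprovalFn,
): Promise<PipelineResult> {
  const result: PipelineResult = { phasesRun: [], exploited: [], skipped: [] };

  // Recon and analysis run without interruption. They are placeholders here;
  // `findings` stands in for what the analysis phase would produce.
  result.phasesRun.push("recon", "analysis");

  // Exploit phase: every finding must pass the human gate first.
  result.phasesRun.push("exploit");
  for (const finding of findings) {
    if (await approve(finding)) {
      result.exploited.push(finding.id);
    } else {
      result.skipped.push(finding.id);
    }
  }

  result.phasesRun.push("report");
  return result;
}
```

Passing the approval step in as a function keeps the gate testable: a terminal prompt can be supplied in interactive use, and a stub that always rejects can be supplied in automated runs.
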
| Layer | Technology |
|---|---|
| Runtime | Bun |
| Language | TypeScript (strict mode, no `any`) |
| Framework | SolidJS (OpenTUI template) |
| LLM Orchestration | Vercel AI SDK |
| Terminal UI | OpenTUI (TypeScript TUI library) |
| Database | SQLite |
| HTTP Clients | Playwright, Selenium, requests |
- Agent Orchestration Framework — Multi-agent pipeline powered by Vercel AI SDK with phase-based task delegation.
- OpenTUI Terminal Interface — Real-time state sync and interactive terminal UI for monitoring agent progress.
- Multi-Provider LLM Integration — Comparative execution across OpenAI, Anthropic, and local models.
- Human-in-the-Loop Workflow — Mandatory human validation before exploit execution.
- SQLite Knowledge Base — Persistent storage for reconnaissance data, vulnerability fingerprints, and session history.
- Academic Documentation — Publication-ready analysis and methodology documentation.
- No Docker/Temporal dependencies — Self-contained, single-process architecture.
- Lightweight — Minimal external dependencies; runs on a single machine.
- Research-grade — Code quality and documentation suitable for academic publication.
- Ethical boundaries — Designed for authorized testing against known vulnerable applications (OWASP Juice Shop, DVWA).
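
The comparative execution across providers mentioned in the list above could look roughly like the sketch below. It deliberately avoids any real SDK call: `Provider`, `ProviderRun`, and `runAcrossProviders` are hypothetical names, and in practice each `complete` function would wrap whichever LLM client the project uses.

```typescript
// Hypothetical adapter interface; a real adapter would wrap an LLM SDK call.
interface Provider {
  name: string;
  complete(prompt: string): Promise<string>;
}

interface ProviderRun {
  provider: string;
  output: string | null; // null when the provider failed
  error: string | null; // null when the provider succeeded
  elapsedMs: number;
}

// Sends the same prompt to every provider concurrently. A failing provider
// is recorded as an error instead of aborting the whole comparison.
async function runAcrossProviders(
  providers: Provider[],
  prompt: string,
): Promise<ProviderRun[]> {
  return Promise.all(
    providers.map(async (p): Promise<ProviderRun> => {
      const started = Date.now();
      try {
        const output = await p.complete(prompt);
        return { provider: p.name, output, error: null, elapsedMs: Date.now() - started };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return { provider: p.name, output: null, error: message, elapsedMs: Date.now() - started };
      }
    }),
  );
}
```

Catching errors per provider means one rate-limited or misconfigured provider still leaves usable results from the others, which matters when the goal is a side-by-side comparison.
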
| Target | Type | Purpose |
|---|---|---|
| OWASP Juice Shop | Intentionally vulnerable | Grey-box web app testing |
| DVWA | Intentionally vulnerable | Baseline vulnerability coverage |
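
One way to keep testing restricted to lab targets like these is an origin allowlist checked before any request is sent. The sketch below is an assumption about how such a guard might look: `ALLOWED_ORIGINS` and `isInScope` are hypothetical names, and the localhost addresses are placeholders for wherever the targets are actually deployed.

```typescript
// Assumed local addresses for the lab targets; adjust to the real deployment.
const ALLOWED_ORIGINS: ReadonlySet<string> = new Set([
  "http://localhost:3000", // assumed Juice Shop address
  "http://localhost:8080", // assumed DVWA address
]);

// Returns true only when the target URL parses and its origin
// (scheme + host + port) is on the allowlist.
function isInScope(
  target: string,
  allowed: ReadonlySet<string> = ALLOWED_ORIGINS,
): boolean {
  let url: URL;
  try {
    url = new URL(target);
  } catch {
    return false; // unparseable targets are never in scope
  }
  return allowed.has(url.origin);
}
```

Comparing the parsed origin rather than matching on a string prefix avoids lookalike URLs such as `http://localhost:3000@evil.example/`, whose real host is `evil.example`.
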
- Comparative LLM Analysis — Benchmark Claude, GPT, and Gemini on vulnerability detection, exploit reasoning, and report generation.
- Autonomous Agent Evaluation — Measure end-to-end pipeline effectiveness across different provider configurations.
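
To make the benchmarking goals above concrete, here is one possible way to score a provider's vulnerability detections against a known ground-truth list. Precision, recall, and F1 are an assumed choice of metrics, not one stated by the project, and `DetectionScore` and `scoreDetections` are hypothetical names.

```typescript
interface DetectionScore {
  precision: number; // share of reported findings that are real
  recall: number; // share of real vulnerabilities that were reported
  f1: number; // harmonic mean of precision and recall
}

// Scores one provider's reported vulnerability ids against the ground truth.
// Duplicate reports are counted once.
function scoreDetections(reported: string[], groundTruth: string[]): DetectionScore {
  const truth = new Set(groundTruth);
  const uniqueReported = reported.filter((id, i) => reported.indexOf(id) === i);
  const truePositives = uniqueReported.filter((id) => truth.has(id)).length;

  const precision = uniqueReported.length === 0 ? 0 : truePositives / uniqueReported.length;
  const recall = truth.size === 0 ? 0 : truePositives / truth.size;
  const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);

  return { precision, recall, f1 };
}
```

For example, a provider that reports `sqli`, `xss`, and `csrf` against a ground truth of `sqli`, `xss`, `idor`, and `xxe` has two true positives, giving a precision of 2/3 and a recall of 0.5.
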
# Clone
git clone https://github.com/GhostPWN/ghostpwn.git
cd ghostpwn
# Install dependencies
bun install
# Run
bun run start

MIT License. See LICENSE for details.
Contributions are welcome. Please open an issue before submitting a PR to discuss the proposed change.
Built for academic research in offensive security.