A series of Capture The Flag (CTF) challenges demonstrating security risks in AI agents with command execution capabilities.
This repository contains multiple levels of CTF challenges, each demonstrating different security concepts around AI agents and command execution:
Level 1: Command Execution via API
- Demonstrates basic command injection risks
- Shows why blindly executing commands is dangerous
- Uses a restricted command allowlist
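Level 1's allowlist idea can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the level's actual code. `ALLOWED_COMMANDS`, `SHELL_METACHARACTERS`, and `is_allowed` are hypothetical names, and the level's real allowlist may contain different commands and be enforced differently.

```python
import shlex

# Hypothetical allowlist for illustration; the level's real list may differ.
ALLOWED_COMMANDS = {"ls", "echo", "date", "whoami"}

# Characters a shell would use to chain, pipe, redirect, or substitute commands.
SHELL_METACHARACTERS = set(";&|`$<>()\n")


def is_allowed(command_line: str) -> bool:
    """Accept only a single allowlisted program with no shell metacharacters."""
    if any(ch in SHELL_METACHARACTERS for ch in command_line):
        return False
    try:
        argv = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected outright.
        return False
    # Only the program name is checked; arguments are not restricted.
    return bool(argv) and argv[0] in ALLOWED_COMMANDS


for candidate in ["ls /tmp", "ls; whoami", "echo $(id)", "rm -rf /"]:
    print(f"{candidate!r}: {is_allowed(candidate)}")
```

Checking only the program name says nothing about its arguments: an allowlisted program can still be pointed at any path the process can reach, which is part of why blindly executing agent-generated commands is risky.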
Level 2: Multi-Stage Command Injection
- Demonstrates input validation bypass techniques
- Shows attack chaining through the note → report → summary workflow
- Features naive security filtering that can be circumvented
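The Level 2 pattern of validating input once and then trusting it in later stages can be illustrated with a toy pipeline. Everything here is hypothetical: `BLOCKED_SUBSTRINGS`, `naive_filter`, and the `stage_*` functions are not taken from the level, and the sketch only builds strings; nothing is executed.

```python
# Hypothetical blocklist: it catches a few obvious patterns and nothing else.
BLOCKED_SUBSTRINGS = [";", "&&", "rm "]


def naive_filter(text: str) -> bool:
    """Return True if `text` contains none of the blocked substrings."""
    return not any(token in text for token in BLOCKED_SUBSTRINGS)


def stage_note(text: str) -> str:
    # Stage 1 (note): the only place the input is checked.
    if not naive_filter(text):
        raise ValueError("input rejected by filter")
    return text


def stage_report(note: str) -> str:
    # Stage 2 (report): embeds the stored note without re-validating it.
    return f"Report: {note}"


def stage_summary(report: str) -> str:
    # Stage 3 (summary): builds a shell command string from earlier output.
    # If this string were handed to a shell, e.g. subprocess.run(cmd, shell=True),
    # any $(...) carried over from stage 1 would be executed.
    return f"echo {report}"


print(naive_filter("hello; rm -rf /"))  # False: ';' and 'rm ' are blocked
print(naive_filter("hello | id"))       # True: a pipe is not on the list
print(stage_summary(stage_report(stage_note("$(id)"))))  # echo Report: $(id)
```

A blocklist has to anticipate every dangerous spelling, while an attacker needs only one it missed. The stage that finally turns stored text into a command is where validation, or better, avoiding the shell entirely, matters most.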
Prerequisites:
- Docker Desktop for Mac
- curl
- jq (for pretty-printing JSON output), installable with Homebrew:
    brew install jq
Getting started:
- Clone the repository:
    git clone https://github.com/yourusername/agent-ctf.git
    cd agent-ctf
- Start a level (example with Level 1):
    cd level1
    docker compose up --build
- Try the challenge by interacting with the API endpoint
- Goal: Find and read a hidden flag file
- Concept: Command execution via API endpoints
- Target: /tmp/ctf/level1/flag.txt
Level 1 Details:
- Containers run read-only
- Dropped capabilities
- Command allowlisting
- Resource limits
- Temporary filesystems
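The controls above are applied at the container level. A rough process-level analogue in Python, assuming an agent that runs commands through `subprocess`, combines `shell=False`, a wall-clock timeout, and POSIX resource limits. `run_limited`, `_apply_limits`, `_cap`, and the limit values are hypothetical; the repository's `agent.py` may work quite differently.

```python
import resource
import shlex
import subprocess

# Hypothetical limits for illustration only.
CPU_SECONDS = 2
MAX_MEMORY_BYTES = 512 * 1024 * 1024
TIMEOUT_SECONDS = 5


def _cap(kind: int, value: int) -> None:
    """Lower soft and hard limits to `value`, never above the current hard limit."""
    _soft, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def _apply_limits() -> None:
    # Runs in the child process after fork and before exec (POSIX only).
    _cap(resource.RLIMIT_CPU, CPU_SECONDS)
    _cap(resource.RLIMIT_AS, MAX_MEMORY_BYTES)


def run_limited(command_line: str) -> subprocess.CompletedProcess:
    """Run one command without a shell, under CPU, memory, and wall-clock limits."""
    argv = shlex.split(command_line)
    # shell=False: ';', '|' and '$(...)' reach the program as literal
    # arguments instead of being interpreted by a shell.
    return subprocess.run(
        argv,
        shell=False,
        capture_output=True,
        text=True,
        timeout=TIMEOUT_SECONDS,
        preexec_fn=_apply_limits,
    )


if __name__ == "__main__":
    result = run_limited("echo hello; id")
    print(result.stdout, end="")  # prints: hello; id
```

Because `shell=False` passes `argv` straight to the program, `run_limited("echo hello; id")` prints the literal text `hello; id` instead of running `id`. `preexec_fn` only works on POSIX and is not safe in multithreaded programs, so container-level limits remain the more robust option.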
Each level follows this structure:
levelN/
├── docker-compose.yml
├── Dockerfile
├── seed.sh
├── safe_sh
├── app/
│ ├── agent.py
│ └── run.sh
└── README.md
Want to add a level? PRs welcome! Each level should:
- Be self-contained in Docker
- Have clear learning objectives
- Include proper security controls
- Document solution methods
MIT License - See LICENSE for details