Write test steps in plain English. An AI agent executes them. You get structured pass/fail results.
Bugatti is a test harness that drives AI coding agents through structured test plans defined in TOML files. Point it at a Flask app, an Express API, a static site, or a CLI tool — the agent figures out how to verify each step and reports back with OK, WARN, or ERROR.
Manual QA doesn't scale. Traditional E2E test frameworks are brittle and expensive to maintain. Bugatti sits in between — you describe what to test in natural language, and an AI agent handles the how.
- Plain-English test steps: no selectors, no page objects, no test framework DSL
- Structured results: `RESULT OK`, `RESULT WARN`, or `RESULT ERROR` per step
- Composable test files: include shared setup, glob multiple suites together
- Built-in infrastructure: short-lived setup commands, long-lived servers with readiness polling
- Full audit trail: transcripts, logs, and reports saved per run
Each run goes through these stages:

- Config: loads `bugatti.config.toml` (optional)
- Parse: reads the test file and expands includes into a flat step list
- Setup: runs short-lived commands, spawns long-lived commands, polls readiness
- Bootstrap: sends harness instructions and the result contract to the agent
- Execute: sends each step instruction, streams the response, and parses the `RESULT` verdict
- Report: writes run metadata, transcripts, and a markdown report to `.bugatti/runs/<run_id>/`
- Teardown: stops long-lived processes
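The Execute stage depends on pulling a verdict out of free-form agent output. Here is a minimal Bash sketch of that parsing, assuming the reply contains a line that begins with `RESULT OK`, `RESULT WARN`, or `RESULT ERROR`. bugatti's exact contract and parser may differ, and `parse_verdict` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of bugatti: extract the verdict from an
# agent's reply to one step.
# ASSUMPTION: the verdict is a line starting with "RESULT OK", "RESULT WARN",
# or "RESULT ERROR", optionally followed by a note ("RESULT ERROR: 500 on /login").

# Reads the reply on stdin and prints OK, WARN, or ERROR.
# If several verdict lines appear, the last one wins; returns 1 if none is found.
parse_verdict() {
  local verdict
  # The trailing group rejects look-alikes such as "RESULT OKAY".
  verdict=$(sed -E -n 's/^[[:space:]]*RESULT (OK|WARN|ERROR)([^A-Za-z_].*)?$/\1/p' | tail -n 1)
  [ -n "$verdict" ] || return 1
  printf '%s\n' "$verdict"
}
```

For example, `printf 'Loaded /login\nRESULT OK\n' | parse_verdict` prints `OK`. In this sketch, taking the last match means an agent can quote the contract earlier in its reply without the quote being read as its verdict.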
```sh
curl -sSf https://raw.githubusercontent.com/codesoda/bugatti-cli/main/install.sh | sh
```

Downloads the latest release binary from GitHub.
Or, from a clone of the repository:

```sh
git clone https://github.com/codesoda/bugatti-cli.git
cd bugatti-cli
sh install.sh
```

Builds from source with `cargo build --release`.
Already using Claude Code, Cursor, Windsurf, or another AI coding agent? Just paste this prompt:
```text
I want to add automated testing to this project using bugatti (https://bugatti.dev/llms.txt).
Help me get it installed and configured, then interview me about what tests I want to create first.
```
The agent will read the docs, install bugatti, set up your config, and walk you through writing your first tests.
Create a test file:
```toml
# login.test.toml
name = "Login flow"

[[steps]]
instruction = "Navigate to /login and verify the page loads"

[[steps]]
instruction = "Enter valid credentials and submit the form"

[[steps]]
instruction = "Verify you are redirected to the dashboard"
```

Run it:

```sh
bugatti test login.test.toml
```

Or discover and run all test files in the project:

```sh
bugatti test
```

Discovery finds all `*.test.toml` files recursively, skipping hidden directories and `_`-prefixed files.
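The discovery rules above can be approximated with `find(1)`, which is handy for previewing which files a bare `bugatti test` is expected to pick up. This sketch assumes "hidden" means a name starting with `.` and applies the `_` filter to file names only, as the text describes. `discover_tests` is a hypothetical helper, not bugatti's implementation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list *.test.toml files the way discovery is described,
# i.e. recursive, skipping hidden directories and _-prefixed files.
# ASSUMPTION: "hidden" = name starts with "."; only files are filtered by "_".
discover_tests() {   # discover_tests [ROOT]
  local root="${1:-.}"
  # -mindepth 1 keeps ROOT itself (e.g. ".") from matching '.*' and being pruned.
  find "$root" -mindepth 1 \
    -type d -name '.*' -prune -o \
    -type f -name '*.test.toml' ! -name '_*' -print |
    LC_ALL=C sort
}
```

Run from the project root, `discover_tests .` should list `login.test.toml`, `checkout.test.toml`, and `features/signup.test.toml` for a layout like the project tree shown further below, while leaving out the `_`-prefixed helpers.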
Create bugatti.config.toml in your project root:
```toml
[provider]
name = "claude-code"
extra_system_prompt = "Use the browser for UI tests."
agent_args = ["--dangerously-skip-permissions"]
step_timeout_secs = 300
strict_warnings = true
base_url = "http://localhost:3000"

[commands.migrate]
kind = "short_lived"
cmd = "npm run db:migrate"

[commands.server]
kind = "long_lived"
cmd = "npm start"
readiness_url = "http://localhost:3000/health"

# Multiple readiness URLs and custom timeout
[commands.docker-stack]
kind = "long_lived"
cmd = "docker compose up"
readiness_urls = ["http://localhost:3000/health", "http://localhost:5432"]
readiness_timeout_secs = 120
```

`[provider]` fields:

| Field | Default | Description |
|---|---|---|
| `name` | `"claude-code"` | Provider to use |
| `extra_system_prompt` | (none) | Additional system prompt for the agent |
| `agent_args` | `[]` | Extra CLI args passed to the provider |
| `step_timeout_secs` | `300` | Default timeout per step (seconds) |
| `strict_warnings` | `false` | Treat WARN results as failures |
| `base_url` | (none) | Base URL for the app under test (relative URLs in steps resolve against this) |
Each command's `kind` controls its lifecycle:

| Kind | Behavior |
|---|---|
| `short_lived` | Runs to completion before tests start. Fails the run on non-zero exit. |
| `long_lived` | Spawns in the background. Optional `readiness_url`/`readiness_urls` polled until ready. Torn down after tests complete. |
Readiness options for `long_lived` commands:

| Field | Default | Description |
|---|---|---|
| `readiness_url` | (none) | Single URL to poll before the command is considered ready |
| `readiness_urls` | `[]` | Multiple URLs to poll (all must respond) |
| `readiness_timeout_secs` | `30` | How long to wait for readiness before failing |
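Readiness polling for `long_lived` commands can be sketched in Bash with `curl`. Assumptions: a URL counts as ready when `curl -f` gets an HTTP status below 400, and URLs are retried once a second. bugatti's real readiness criterion and poll interval are not documented here, so a non-HTTP endpoint such as the `http://localhost:5432` entry in the config above would never pass this particular check. `wait_ready` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until every URL answers, or give up after
# TIMEOUT_SECS (loosely mirrors readiness_urls + readiness_timeout_secs).
# ASSUMPTIONS: "ready" = curl -f succeeds (HTTP status < 400); 1s poll interval.
wait_ready() {   # wait_ready TIMEOUT_SECS URL...
  local timeout_secs="$1"; shift
  local deadline=$(( $(date +%s) + timeout_secs ))
  local url pending
  while :; do
    pending=""
    for url in "$@"; do
      # --max-time keeps one hung endpoint from stalling the whole poll
      if ! curl -sf -o /dev/null --max-time 2 "$url"; then
        pending="$url"
        break
      fi
    done
    [ -z "$pending" ] && return 0   # every URL answered
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "not ready after ${timeout_secs}s: $pending" >&2
      return 1
    fi
    sleep 1
  done
}
```

For example, after starting a server in the background, `wait_ready 30 http://localhost:3000/health` returns 0 once the health endpoint answers. Checking all URLs on every pass (rather than remembering earlier successes) means a service that goes down again during startup is still caught.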
CLI flags:

| Flag | Description |
|---|---|
| `--strict-warnings` | Treat WARN results as failures (overrides config) |
| `--skip-cmd <name>` | Skip a configured command |
| `--skip-readiness <name>` | Skip readiness check for a command |
| `--from-checkpoint <name>` | Resume from a named checkpoint (auto-skips earlier steps, restores state) |
Test files are TOML with a .test.toml extension. Each step must have exactly one of instruction, include_path, or include_glob.
| Field | Description |
|---|---|
| `instruction` | Plain-English instruction sent to the agent |
| `include_path` | Path to another test file to inline |
| `include_glob` | Glob pattern to inline multiple test files |
| `step_timeout_secs` | Per-step timeout override (seconds) |
| `skip` | If `true`, the step is skipped (counts as passed) |
| `checkpoint` | Checkpoint name: saves state after the step passes, restores it if the step is skipped |
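The "exactly one of `instruction`, `include_path`, or `include_glob`" rule is easy to lint before a run. Below is a rough Bash/awk sketch. It assumes one key per line and no multi-line strings; a real check would use a TOML parser. `check_steps` is a hypothetical helper that prints nothing for a valid file.

```bash
#!/usr/bin/env bash
# Hypothetical lint: report [[steps]] entries that set zero or several of
# instruction / include_path / include_glob. Returns 1 if any are found.
# ASSUMPTION: one `key = value` per line; commented-out keys start with #.
check_steps() {   # check_steps FILE
  awk '
    function report() {
      if (in_step && n != 1) {
        printf "%s: step %d sets %d of instruction/include_path/include_glob\n", FILENAME, step, n
        bad = 1
      }
    }
    # A new [[steps]] table closes the previous one
    /^[ \t]*\[\[steps\]\][ \t]*$/ { report(); in_step = 1; step++; n = 0; next }
    # Any other table header (e.g. [overrides.provider]) ends the current step
    /^[ \t]*\[/ { report(); in_step = 0; next }
    in_step && /^[ \t]*(instruction|include_path|include_glob)[ \t]*=/ { n++ }
    END { report(); exit (bad ? 1 : 0) }
  ' "$1"
}
```

A step holding only `skip = true` is reported as setting 0 of the three keys, since skipping does not exempt a step from the rule as stated above.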
Steps can include other test files inline using include_path or include_glob. The included file's steps are expanded in place, creating a flat step list for execution.
```toml
# login.test.toml
name = "Login flow"

[[steps]]
include_path = "_setup.test.toml"

[[steps]]
instruction = "Navigate to /login and submit credentials"

[[steps]]
instruction = "Verify redirect to dashboard"
```

The steps from `_setup.test.toml` are expanded in place of the first step, so they run before the login steps. Paths are relative to the including file's directory.
Include multiple files matching a glob pattern. Matched files are sorted alphabetically for deterministic order.
```toml
# full-suite.test.toml
name = "Full test suite"

[[steps]]
include_path = "_setup.test.toml"

[[steps]]
include_glob = "features/*.test.toml"
```

This includes `_setup.test.toml` first, then all `*.test.toml` files under `features/` in alphabetical order.
Files prefixed with _ are excluded from automatic discovery (bugatti test without a path argument). This lets you create reusable building blocks that are only executed when included by other test files.
```text
project/
  bugatti.config.toml
  _setup.test.toml            # shared, not discovered
  _teardown.test.toml         # shared, not discovered
  login.test.toml             # discovered; includes _setup
  checkout.test.toml          # discovered; includes _setup
  features/
    _auth-helpers.test.toml   # shared, not discovered
    signup.test.toml          # discovered
```
Example shared setup file:
```toml
# _setup.test.toml
name = "Shared setup"

[[steps]]
instruction = "Verify the health endpoint returns 200"

[[steps]]
instruction = "Clear test data from the database"
```

Included files can include other files. Bugatti expands everything into a flat step list with sequential IDs.
```toml
# _auth.test.toml
name = "Auth helpers"

[[steps]]
instruction = "Log in as test user"

[[steps]]
include_path = "_verify-session.test.toml"
```

Bugatti detects circular includes and fails with a clear error showing the chain:

```text
include cycle detected: root.test.toml -> auth.test.toml -> root.test.toml
```
Each step tracks which file it came from. The console output shows this:
```text
STEP 1/5 ... Verify health endpoint (from _setup.test.toml)
STEP 2/5 ... Clear test data (from _setup.test.toml)
STEP 3/5 ... Navigate to /login (from login.test.toml)
```
This makes it easy to identify which file to edit when a step fails.
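Include expansion, glob ordering, cycle detection, and per-step source tracking fit together in one recursive walk. The Bash sketch below illustrates the idea under explicit assumptions: one `key = "value"` per line, double-quoted values with no escapes or trailing comments, a glob with no matches silently yields no steps, and byte-order (`LC_ALL=C`) sorting stands in for "alphabetical". `abs_path`, `expand_file`, and `print_plan` are hypothetical helpers, not bugatti's implementation, and the script requires Bash.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of include expansion; not bugatti's implementation.

# Absolute, symlink-free path of a file, so the same file always compares equal.
abs_path() {
  (cd "$(dirname "$1")" && printf '%s/%s\n' "$(pwd -P)" "$(basename "$1")")
}

# expand_file FILE [ANCESTOR...]
# Prints "source-file<TAB>instruction" for every step, in execution order.
expand_file() {
  [ -f "$1" ] || { echo "missing test file: $1" >&2; return 1; }
  local file dir line key value target f names
  file=$(abs_path "$1") || return 1
  shift
  local chain=("$@")   # files currently being expanded, outermost first

  # A file already on the chain would include itself forever.
  for f in "${chain[@]}"; do
    if [ "$f" = "$file" ]; then
      names=""
      for f in "${chain[@]}" "$file"; do
        names="${names:+$names -> }$(basename "$f")"
      done
      echo "include cycle detected: $names" >&2
      return 1
    fi
  done

  dir=$(dirname "$file")
  local re='^[[:space:]]*(instruction|include_path|include_glob)[[:space:]]*=[[:space:]]*"(.*)"'
  while IFS= read -r line || [ -n "$line" ]; do
    [[ $line =~ $re ]] || continue
    key=${BASH_REMATCH[1]}
    value=${BASH_REMATCH[2]}
    case "$key" in
      instruction)
        printf '%s\t%s\n' "$(basename "$file")" "$value" ;;
      include_path)   # relative to the including file's directory
        expand_file "$dir/$value" "${chain[@]}" "$file" || return 1 ;;
      include_glob)   # matches expanded in byte-sorted order
        while IFS= read -r target; do
          [ -n "$target" ] || continue
          expand_file "$target" "${chain[@]}" "$file" || return 1
        done < <(compgen -G "$dir/$value" | LC_ALL=C sort) ;;
    esac
  done < "$file"
}

# print_plan FILE: number the flat list the way the console output does.
print_plan() {
  local steps total i=0 src text
  steps=$(expand_file "$1") || return 1
  [ -n "$steps" ] || return 0
  total=$(( $(printf '%s\n' "$steps" | wc -l) ))
  while IFS=$'\t' read -r src text; do
    i=$((i + 1))
    printf 'STEP %d/%d ... %s (from %s)\n' "$i" "$total" "$text" "$src"
  done <<< "$steps"
}
```

Running `print_plan full-suite.test.toml` would print one `STEP n/N ... instruction (from file)` line per expanded step. Keeping the chain of ancestors (rather than a global "seen" set) is what lets the same shared file be included twice in sibling positions while still rejecting true cycles.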
Override provider settings for a specific test:
```toml
name = "Custom provider test"

[overrides.provider]
extra_system_prompt = "Be concise"
step_timeout_secs = 600
base_url = "http://localhost:5000"
```

Add `skip = true` to any step to bypass it during execution. Skipped steps count as passed, take zero time, and do not send anything to the agent.
```toml
[[steps]]
instruction = "Create account and complete onboarding"
skip = true

[[steps]]
instruction = "Configure billing with test card"
skip = true

[[steps]]
instruction = "Invite team member and verify email"
```

Console output when steps are skipped:

```text
SKIP 1/3 ... Create account and complete onboarding (from ftue.test.toml)
SKIP 2/3 ... Configure billing with test card (from ftue.test.toml)
STEP 3/3 ... Invite team member and verify email (from ftue.test.toml)
```
When to use `skip`:

- Iterative development: you've run the full suite and steps 1-4 pass. Now you're working on step 5. Mark 1-4 as `skip = true` so each run goes straight to step 5 without re-executing the passing steps.
- Focusing on a failing step: a step deep in the suite fails. Skip everything before it to iterate faster on the fix.
- Pairing with checkpoints: skip steps that set up state, and use checkpoints to restore that state instead (see below).
Disabling a skip: remove the line or comment it out with `#`:

```toml
[[steps]]
instruction = "Create account and complete onboarding"
#skip = true
```

Important: skipping a step also skips its side effects. If steps 1-3 set up database state and you skip them, that state won't exist unless you either run them first or restore it via a checkpoint.
Checkpoints save and restore external state (databases, files, services) at step boundaries. Combined with skip = true, they let you jump to any point in a test suite without re-executing earlier steps.
A 10-step FTUE (first-time user experience) test takes 15 minutes. Step 8 fails. You fix the code and re-run, but steps 1-7 execute again — another 12 minutes wasted. With checkpoints, you run once, save state after step 7, then on subsequent runs skip steps 1-7 and restore the checkpoint. Step 8 runs immediately against the saved state.
1. Add a `[checkpoint]` section to `bugatti.config.toml`:

```toml
[checkpoint]
save = "./scripts/checkpoint.sh save"
restore = "./scripts/checkpoint.sh restore"
timeout_secs = 180  # optional, default 120
```

The `save` and `restore` fields are shell commands. They receive environment variables telling them which checkpoint to operate on.
2. Add `checkpoint = "name"` to steps in your test file:

```toml
name = "FTUE: Full onboarding flow"

[[steps]]
instruction = "Create account via signup form"
checkpoint = "after-signup"

[[steps]]
instruction = "Complete onboarding wizard"
checkpoint = "after-onboarding"

[[steps]]
instruction = "Configure billing with test card"
checkpoint = "after-billing"

[[steps]]
instruction = "Invite team member"

[[steps]]
instruction = "Verify team member received invite email"
```

Checkpoint names must be unique within a test file. Not every step needs a checkpoint; place them at meaningful state boundaries.
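Because checkpoint names must be unique per file, a quick pre-commit check can catch copy-paste duplicates. A minimal Bash sketch, assuming each `checkpoint = "..."` sits on its own line; commented-out lines (starting with `#`) are ignored. `duplicate_checkpoints` is a hypothetical helper, and it does not look inside included files.

```bash
#!/usr/bin/env bash
# Hypothetical check: print every checkpoint name used more than once in FILE.
# ASSUMPTION: one `checkpoint = "name"` per line; "#checkpoint = ..." is skipped.
duplicate_checkpoints() {   # duplicate_checkpoints FILE
  sed -E -n 's/^[[:space:]]*checkpoint[[:space:]]*=[[:space:]]*"([^"]*)".*/\1/p' "$1" |
    LC_ALL=C sort | uniq -d
}
```

Empty output means every name is unique, so `[ -z "$(duplicate_checkpoints ftue.test.toml)" ]` works as a pass/fail guard in a hook or CI step.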
When a non-skipped step with checkpoint passes, bugatti runs the save command immediately after:
```text
STEP 1/5 ... Create account via signup form (from ftue.test.toml)
OK 1/5 (23.4s)
SAVE ....... checkpoint "after-signup"
OK ......... checkpoint "after-signup" saved
STEP 2/5 ... Complete onboarding wizard (from ftue.test.toml)
OK 2/5 (45.1s)
SAVE ....... checkpoint "after-onboarding"
OK ......... checkpoint "after-onboarding" saved
```
Checkpoints are not saved when a step fails — there's no point saving broken state.
When you mark steps as skip = true, bugatti looks at the skipped steps to find the last checkpoint before the first non-skipped step, then runs the restore command:
```toml
[[steps]]
instruction = "Create account via signup form"
checkpoint = "after-signup"
skip = true

[[steps]]
instruction = "Complete onboarding wizard"
checkpoint = "after-onboarding"
skip = true

[[steps]]
instruction = "Configure billing with test card"
checkpoint = "after-billing"
skip = true

[[steps]]
instruction = "Invite team member"

[[steps]]
instruction = "Verify team member received invite email"
```

Console output:

```text
SKIP 1/5 ... Create account via signup form (from ftue.test.toml)
SKIP 2/5 ... Complete onboarding wizard (from ftue.test.toml)
SKIP 3/5 ... Configure billing with test card (from ftue.test.toml)
RESTORE .... checkpoint "after-billing"
OK ......... checkpoint "after-billing" restored
STEP 4/5 ... Invite team member (from ftue.test.toml)
```
Only the last checkpoint is restored — restoring "after-billing" already includes the state from "after-signup" and "after-onboarding".
If you skip steps after the last checkpoint, bugatti warns you that the restored state may be incomplete:
```toml
[[steps]]
instruction = "Create account via signup form"
checkpoint = "after-signup"
skip = true

[[steps]]
instruction = "Complete onboarding wizard"
skip = true  # no checkpoint!

[[steps]]
instruction = "Configure billing with test card"
skip = true  # no checkpoint!

[[steps]]
instruction = "Invite team member"
```

Console output:

```text
WARN ....... restoring checkpoint "after-signup" from step 1, but 2 step(s) after it were also skipped without checkpoints
RESTORE .... checkpoint "after-signup"
OK ......... checkpoint "after-signup" restored
STEP 4/5 ... Invite team member (from ftue.test.toml)
```
This means steps 2-3 were skipped but their effects aren't captured in the restored checkpoint. The test may fail because of missing state. Either add checkpoints to those steps or accept the gap.
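The restore rule above (last checkpoint among the leading skipped steps, plus a warning for skipped steps after it) fits in a few lines. Below is a Bash sketch that uses a hypothetical `skip:<checkpoint>` / `run:<checkpoint>` encoding for each step, with an empty name when a step has no checkpoint. `plan_restore` is not bugatti's code, and what bugatti does when skipped steps contain no checkpoint at all is not covered here; the sketch simply restores nothing in that case.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: choose which checkpoint to restore before the first
# non-skipped step. Each argument is "skip:NAME" or "run:NAME" (NAME may be empty).
# Prints the checkpoint name on stdout; the gap warning goes to stderr.
plan_restore() {
  local i=0 last_cp="" last_idx=0 gap=0 step state name
  for step in "$@"; do
    i=$((i + 1))
    state=${step%%:*}
    name=${step#*:}
    [ "$state" = skip ] || break        # only the leading skipped steps matter
    if [ -n "$name" ]; then
      last_cp=$name; last_idx=$i; gap=0 # a later checkpoint supersedes earlier ones
    else
      gap=$((gap + 1))                  # skipped with no checkpoint to capture it
    fi
  done
  [ -n "$last_cp" ] || return 0         # nothing to restore
  if [ "$gap" -gt 0 ]; then
    printf 'WARN ....... restoring checkpoint "%s" from step %d, but %d step(s) after it were also skipped without checkpoints\n' \
      "$last_cp" "$last_idx" "$gap" >&2
  fi
  printf '%s\n' "$last_cp"
}
```

With the second example above, `plan_restore skip:after-signup skip: skip: run:` prints `after-signup` and emits the same warning text as the console output.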
Save and restore commands receive these environment variables:
| Variable | Example | Description |
|---|---|---|
| `BUGATTI_CHECKPOINT_ID` | `after-onboarding` | The checkpoint name from the step |
| `BUGATTI_CHECKPOINT_PATH` | `.bugatti/checkpoints/after-onboarding/` | Directory for this checkpoint's data |
The checkpoint directory is created automatically before the command runs. Your script decides what to put in it.
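When writing or debugging your own save/restore script, it can help to invoke it with the same environment the table describes. A sketch of such a wrapper, assuming the command runs under `sh -c`, the checkpoint root is `.bugatti/checkpoints`, and GNU coreutils `timeout` enforces the time limit (on macOS that is `gtimeout` from Homebrew coreutils). `run_checkpoint_cmd` is hypothetical, and bugatti may invoke commands differently.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a save/restore command for one checkpoint with the
# documented environment variables. Returns 124 if the timeout expires.
# ASSUMPTIONS: `sh -c` execution, .bugatti/checkpoints root, GNU `timeout`.
run_checkpoint_cmd() {   # run_checkpoint_cmd CMD CHECKPOINT_ID [TIMEOUT_SECS] [ROOT]
  local cmd="$1" id="$2" timeout_secs="${3:-120}" root="${4:-.bugatti/checkpoints}"
  local dir="$root/$id/"
  mkdir -p "$dir" || return 1   # the directory exists before the command runs
  # SIGTERM at the deadline, then SIGKILL 5s later if the command ignores it
  BUGATTI_CHECKPOINT_ID="$id" BUGATTI_CHECKPOINT_PATH="$dir" \
    timeout -k 5 "$timeout_secs" sh -c "$cmd"
}
```

For example, `run_checkpoint_cmd "./scripts/checkpoint.sh save" after-signup 180` followed by the matching `restore` call exercises a script end to end without running a full test suite.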
`[checkpoint]` fields:

| Field | Default | Description |
|---|---|---|
| `save` | required | Shell command to save a checkpoint |
| `restore` | required | Shell command to restore a checkpoint |
| `timeout_secs` | `120` | Timeout for save/restore commands (kills the process on expiry) |
A checkpoint script that saves and restores a PostgreSQL database and an uploads directory:
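```bash
#!/bin/bash
set -euo pipefail

action="${1:?usage: checkpoint.sh save|restore}"

case "$action" in
  save)
    pg_dump myapp_dev > "$BUGATTI_CHECKPOINT_PATH/db.sql"
    # Clear any earlier copy so re-saving doesn't nest uploads/uploads
    rm -rf "$BUGATTI_CHECKPOINT_PATH/uploads"
    cp -r ./uploads "$BUGATTI_CHECKPOINT_PATH/uploads"
    echo "Saved DB + uploads for checkpoint $BUGATTI_CHECKPOINT_ID"
    ;;
  restore)
    dropdb --if-exists myapp_dev
    createdb myapp_dev
    # Abort on the first SQL error instead of leaving a half-restored database
    psql -q -v ON_ERROR_STOP=1 -d myapp_dev -f "$BUGATTI_CHECKPOINT_PATH/db.sql"
    rm -rf ./uploads
    cp -r "$BUGATTI_CHECKPOINT_PATH/uploads" ./uploads
    echo "Restored DB + uploads for checkpoint $BUGATTI_CHECKPOINT_ID"
    ;;
  *)
    echo "usage: checkpoint.sh save|restore" >&2
    exit 2
    ;;
esac
```

To disable checkpoints, comment out the `checkpoint` line on individual steps: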
```toml
[[steps]]
instruction = "Create account via signup form"
#checkpoint = "after-signup"
```

Or remove the `[checkpoint]` section from `bugatti.config.toml`; steps with `checkpoint` will be ignored if no save/restore commands are configured.
Instead of manually adding skip = true to steps, use --from-checkpoint:
```sh
# Resume from the "after-billing" checkpoint, skipping steps 1-3 automatically
bugatti test ftue.test.toml --from-checkpoint after-billing
```

This auto-skips all steps up to and including the step with `checkpoint = "after-billing"`, restores the checkpoint, and executes the remaining steps. No need to edit the TOML file.
If the checkpoint name doesn't exist, bugatti lists the available checkpoints:
```text
ERROR: checkpoint "typo" not found. Available: after-signup, after-onboarding, after-billing
```
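The lookup behind `--from-checkpoint` can be sketched as a scan over the steps' checkpoint names in order: the step after the match is the first one to execute, and a miss lists what exists. This Bash sketch passes one argument per step, using an empty string for steps without a checkpoint. `resolve_from_checkpoint` is a hypothetical name, and the error text mirrors the example above rather than any guaranteed format.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: resolve --from-checkpoint NAME against a test file's
# steps. Arguments after NAME are each step's checkpoint ("" if none), in order.
# Prints the 1-based number of the first step to execute.
resolve_from_checkpoint() {   # resolve_from_checkpoint NAME CP1 CP2 ...
  local want="$1"; shift
  local i=0 cp available=""
  for cp in "$@"; do
    i=$((i + 1))
    if [ -n "$cp" ] && [ "$cp" = "$want" ]; then
      echo $((i + 1))   # everything up to and including step i is skipped
      return 0
    fi
    if [ -n "$cp" ]; then
      available="${available:+$available, }$cp"
    fi
  done
  echo "ERROR: checkpoint \"$want\" not found. Available: $available" >&2
  return 1
}
```

For the FTUE file, `resolve_from_checkpoint after-billing after-signup after-onboarding after-billing "" ""` prints `4`, matching the "skips steps 1-3" behavior described above.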
Typical workflow:

```sh
# 1. First run: all steps execute, checkpoints saved at each boundary
bugatti test ftue.test.toml

# 2. Step 4 fails. Fix the code.

# 3. Re-run from the last checkpoint before step 4
bugatti test ftue.test.toml --from-checkpoint after-billing

# 4. Step 4 passes. Do a clean run to confirm everything works end-to-end.
bugatti test ftue.test.toml
```

Working examples in `examples/`:
| Example | What it tests | Key features |
|---|---|---|
| `static-html` | Local HTML page via browser | No server, browser testing |
| `python-flask` | Flask API + UI | Long-lived server, readiness URL, strict warnings |
| `node-express` | Express TypeScript API + UI | pnpm install, shared setup via `_` prefix, multi-port, test discovery |
| `rust-cli` | Rust CLI tool | Short-lived build command, per-step timeout |
Exit codes:

| Code | Meaning |
|---|---|
| `0` | All steps passed |
| `1` | One or more steps failed |
| `2` | Configuration or parse error |
| `3` | Provider or readiness failure |
| `4` | Step timed out |
| `5` | Run interrupted (Ctrl+C) |
| `6` | Setup command failed |
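Only codes `0` and `1` depend on step verdicts. A small Bash sketch of that mapping, assuming `ERROR` fails a step, `WARN` fails it only under `strict_warnings` / `--strict-warnings`, and skipped steps pass. `run_exit_code` is a hypothetical name; codes 2 to 6 come from other failure paths it does not model.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: collapse per-step verdicts into exit status 0 or 1.
# ASSUMPTIONS: OK and SKIP pass; WARN fails only when STRICT is "true";
# ERROR (or anything unrecognized) fails.
run_exit_code() {   # run_exit_code STRICT VERDICT...
  local strict="$1"; shift
  local v
  for v in "$@"; do
    case "$v" in
      OK|SKIP) ;;
      WARN) if [ "$strict" = true ]; then return 1; fi ;;
      *) return 1 ;;
    esac
  done
  return 0
}
```

In CI you would normally just check the exit status of `bugatti test` itself; the sketch only spells out why the same run can exit `0` without strict warnings and `1` with them.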
Full documentation at bugatti.dev. LLM-friendly reference at bugatti.dev/llms.txt.
MIT