bugatti

Write test steps in plain English. An AI agent executes them. You get structured pass/fail results.

Bugatti is a test harness that drives AI coding agents through structured test plans defined in TOML files. Point it at a Flask app, an Express API, a static site, or a CLI tool — the agent figures out how to verify each step and reports back with OK, WARN, or ERROR.

Why?

Manual QA doesn't scale. Traditional E2E test frameworks are brittle and expensive to maintain. Bugatti sits in between — you describe what to test in natural language, and an AI agent handles the how.

Plain-English test steps — no selectors, no page objects, no test framework DSL
Structured results — RESULT OK, RESULT WARN, RESULT ERROR per step
Composable test files — include shared setup, glob multiple suites together
Built-in infrastructure — short-lived setup commands, long-lived servers with readiness polling
Full audit trail — transcripts, logs, and reports saved per run

How It Works

1. Config: loads bugatti.config.toml (optional)
2. Parse: reads the test file and expands includes into a flat step list
3. Setup: runs short-lived commands, spawns long-lived commands, polls readiness
4. Bootstrap: sends harness instructions + result contract to the agent
5. Execute: sends each step instruction, streams the response, parses the RESULT verdict
6. Report: writes run metadata, transcripts, and a markdown report to .bugatti/runs/<run_id>/
7. Teardown: stops long-lived processes
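
The Execute stage ends by pulling a verdict out of the agent's response. The sketch below illustrates one way that could look in Python. It is not bugatti's implementation (the project is written in Rust), and the exact marker shape, a line beginning with RESULT followed by OK, WARN, or ERROR, is an assumption based on the verdict names above. The names VERDICT_RE and parse_verdict are hypothetical.

```python
import re
from typing import Optional

# Assumption: the verdict appears on its own line as "RESULT <VERDICT> ...".
VERDICT_RE = re.compile(r"^\s*RESULT\s+(OK|WARN|ERROR)\b", re.MULTILINE)


def parse_verdict(response: str) -> Optional[str]:
    """Return the last RESULT verdict in the agent response, or None."""
    matches = VERDICT_RE.findall(response)
    # If the agent repeats itself, the final verdict line wins.
    return matches[-1] if matches else None
```

Taking the last match rather than the first is a design choice in this sketch: an agent may quote the result contract back before giving its actual verdict.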

Install

Pre-built binary (macOS arm64)

curl -sSf https://raw.githubusercontent.com/codesoda/bugatti-cli/main/install.sh | sh

Downloads the latest release binary from GitHub.

From a clone (requires Rust)

git clone https://github.com/codesoda/bugatti-cli.git
cd bugatti-cli
sh install.sh

Builds from source with cargo build --release.

Quick Start with a Coding Agent

Already using Claude Code, Cursor, Windsurf, or another AI coding agent? Just paste this prompt:

I want to add automated testing to this project using bugatti (https://bugatti.dev/llms.txt).
Help me get it installed and configured, then interview me about what tests I want to create first.

The agent will read the docs, install bugatti, set up your config, and walk you through writing your first tests.

Quick Start (Manual)

Create a test file:

# login.test.toml
name = "Login flow"

[[steps]]
instruction = "Navigate to /login and verify the page loads"

[[steps]]
instruction = "Enter valid credentials and submit the form"

[[steps]]
instruction = "Verify you are redirected to the dashboard"

Run it:

bugatti test login.test.toml

Or discover and run all test files in the project:

bugatti test

Discovery finds all *.test.toml files recursively, skipping hidden directories and _-prefixed files.
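
The discovery rule can be sketched as a small filter. This is an illustration only, not bugatti's code: the function names are hypothetical, sorting the results is an assumption, and applying the hidden-directory rule to every path component is one reading of the rule above.

```python
from pathlib import Path, PurePath
from typing import List


def is_discoverable(rel_path: PurePath) -> bool:
    """True if a path relative to the project root should be picked up."""
    *dirs, filename = rel_path.parts
    # Anything under a hidden directory (.git, .bugatti, ...) is skipped.
    if any(d.startswith(".") for d in dirs):
        return False
    # Only *.test.toml files count, and _-prefixed files are shared helpers.
    return filename.endswith(".test.toml") and not filename.startswith("_")


def discover_tests(root: Path) -> List[Path]:
    """Recursively collect discoverable test files under root."""
    candidates = (p for p in root.rglob("*.test.toml") if p.is_file())
    return sorted(p for p in candidates if is_discoverable(p.relative_to(root)))
```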

Configuration

Create bugatti.config.toml in your project root:

[provider]
name = "claude-code"
extra_system_prompt = "Use the browser for UI tests."
agent_args = ["--dangerously-skip-permissions"]
step_timeout_secs = 300
strict_warnings = true
base_url = "http://localhost:3000"

[commands.migrate]
kind = "short_lived"
cmd = "npm run db:migrate"

[commands.server]
kind = "long_lived"
cmd = "npm start"
readiness_url = "http://localhost:3000/health"

# Multiple readiness URLs and custom timeout
[commands.docker-stack]
kind = "long_lived"
cmd = "docker compose up"
readiness_urls = ["http://localhost:3000/health", "http://localhost:5432"]
readiness_timeout_secs = 120

Provider Settings

Field	Default	Description
`name`	`"claude-code"`	Provider to use
`extra_system_prompt`	—	Additional system prompt for the agent
`agent_args`	`[]`	Extra CLI args passed to the provider
`step_timeout_secs`	`300`	Default timeout per step (seconds)
`strict_warnings`	`false`	Treat WARN results as failures
`base_url`	—	Base URL for the app under test (relative URLs in steps resolve against this)

Commands

Kind	Behavior
`short_lived`	Runs to completion before tests start. Fails the run on non-zero exit.
`long_lived`	Spawns in the background. Optional `readiness_url`/`readiness_urls` polled until ready. Torn down after tests complete.

Long-lived command options

Field	Default	Description
`readiness_url`	—	Single URL to poll before the command is considered ready
`readiness_urls`	`[]`	Multiple URLs to poll (all must respond)
`readiness_timeout_secs`	`30`	How long to wait for readiness before failing
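
To make the readiness options concrete, here is a minimal polling loop in Python. It is a sketch under assumptions, not bugatti's implementation: the 0.5 second poll interval, the 2 second per-request timeout, and the rule that only a successful HTTP request counts as "responding" are all guesses, and the function names are hypothetical.

```python
import http.client
import time
import urllib.request
from typing import Callable, Iterable


def http_responds(url: str) -> bool:
    """Assumed probe: the URL counts as ready once a request succeeds."""
    try:
        with urllib.request.urlopen(url, timeout=2):
            return True
    except (OSError, http.client.HTTPException, ValueError):
        return False


def wait_until_ready(
    urls: Iterable[str],
    timeout_secs: float = 30.0,
    probe: Callable[[str], bool] = http_responds,
    interval_secs: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every URL has responded, or give up after timeout_secs."""
    deadline = time.monotonic() + timeout_secs
    pending = list(urls)
    while True:
        # Keep only the URLs that have not answered yet.
        pending = [u for u in pending if not probe(u)]
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_secs)
```

A command with no readiness URLs is treated as ready immediately, which matches the readiness fields being optional.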

CLI Flags

Flag	Description
`--strict-warnings`	Treat WARN results as failures (overrides config)
`--skip-cmd <name>`	Skip a configured command
`--skip-readiness <name>`	Skip readiness check for a command
`--from-checkpoint <name>`	Resume from a named checkpoint (auto-skips earlier steps, restores state)

Test Files

Test files are TOML with a .test.toml extension. Each step must have exactly one of instruction, include_path, or include_glob.

Steps

Field	Description
`instruction`	Plain-English instruction sent to the agent
`include_path`	Path to another test file to inline
`include_glob`	Glob pattern to inline multiple test files
`step_timeout_secs`	Per-step timeout override (seconds)
`skip`	If `true`, step is skipped (counts as passed)
`checkpoint`	Checkpoint name — saves state after pass, restores if skipped
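
The "exactly one of" rule is easy to check mechanically. The following sketch works on a step that has already been parsed into a dictionary; the helper name action_field is hypothetical and the check is only an illustration of the rule, not bugatti's validator.

```python
from typing import Any, Dict, Optional

# The three mutually exclusive fields that say what a step does.
ACTION_FIELDS = ("instruction", "include_path", "include_glob")


def action_field(step: Dict[str, Any]) -> Optional[str]:
    """Return the single action field a step uses, or None if it is invalid."""
    present = [field for field in ACTION_FIELDS if field in step]
    # Zero action fields or more than one both violate the rule.
    return present[0] if len(present) == 1 else None
```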

Includes and Shared Test Files

Steps can include other test files inline using include_path or include_glob. The included file's steps are expanded in place, creating a flat step list for execution.

Single file include

# login.test.toml
name = "Login flow"

[[steps]]
include_path = "_setup.test.toml"

[[steps]]
instruction = "Navigate to /login and submit credentials"

[[steps]]
instruction = "Verify redirect to dashboard"

The steps from _setup.test.toml are inserted at position 1, before the login steps. Paths are relative to the including file's directory.

Glob include

Include multiple files matching a glob pattern. Matched files are sorted alphabetically for deterministic order.

# full-suite.test.toml
name = "Full test suite"

[[steps]]
include_path = "_setup.test.toml"

[[steps]]
include_glob = "features/*.test.toml"

This includes _setup.test.toml first, then all *.test.toml files under features/ in alphabetical order.

Shared files with `_` prefix

Files prefixed with _ are excluded from automatic discovery (bugatti test without a path argument). This lets you create reusable building blocks that are only executed when included by other test files.

project/
  bugatti.config.toml
  _setup.test.toml          # shared — not discovered
  _teardown.test.toml       # shared — not discovered
  login.test.toml            # discovered — includes _setup
  checkout.test.toml         # discovered — includes _setup
  features/
    _auth-helpers.test.toml  # shared — not discovered
    signup.test.toml          # discovered

Example shared setup file:

# _setup.test.toml
name = "Shared setup"

[[steps]]
instruction = "Verify the health endpoint returns 200"

[[steps]]
instruction = "Clear test data from the database"

Nested includes

Included files can include other files. Bugatti expands everything into a flat step list with sequential IDs.

# _auth.test.toml
name = "Auth helpers"

[[steps]]
instruction = "Log in as test user"

[[steps]]
include_path = "_verify-session.test.toml"
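
The expansion described in this section can be sketched as a depth-first walk. The code below is an illustration, not bugatti's implementation: it models the project as a hypothetical in-memory dictionary of file path to steps, resolves includes relative to the including file, sorts glob matches, and records the source file of each step. It uses fnmatch for globbing, whose * also matches /, so it is looser than a real filesystem glob, and it does no cycle detection.

```python
import fnmatch
import posixpath
from typing import Dict, List, Tuple

# Hypothetical in-memory project: file path -> list of step tables.
Files = Dict[str, List[dict]]


def expand(path: str, files: Files) -> List[Tuple[int, str, str]]:
    """Flatten a test file into (step_id, instruction, source_file) tuples."""
    flat: List[Tuple[str, str]] = []

    def visit(current: str) -> None:
        base = posixpath.dirname(current)  # includes resolve against this
        for step in files[current]:
            if "include_path" in step:
                visit(posixpath.normpath(posixpath.join(base, step["include_path"])))
            elif "include_glob" in step:
                pattern = posixpath.normpath(posixpath.join(base, step["include_glob"]))
                # Alphabetical order keeps the expansion deterministic.
                for match in sorted(f for f in files if fnmatch.fnmatchcase(f, pattern)):
                    visit(match)
            else:
                flat.append((step["instruction"], current))

    visit(path)
    # Sequential IDs are assigned only after everything is flattened.
    return [(i, text, src) for i, (text, src) in enumerate(flat, start=1)]
```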

Cycle detection

Bugatti detects circular includes and fails with a clear error showing the chain:

include cycle detected: root.test.toml -> auth.test.toml -> root.test.toml
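
One common way to detect such a cycle is a depth-first search that tracks the chain of files currently being expanded. The sketch below shows that technique on a hypothetical include graph (file name to list of included files); it is not taken from bugatti's source, and find_include_cycle is an invented name.

```python
from typing import Dict, List, Optional


def find_include_cycle(root: str, includes: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return the include chain that forms a cycle, or None if there is none."""
    chain: List[str] = []  # files on the current path from the root

    def visit(name: str) -> Optional[List[str]]:
        if name in chain:
            # Seeing a file that is already on the path closes the loop.
            return chain[chain.index(name):] + [name]
        chain.append(name)
        for child in includes.get(name, []):
            cycle = visit(child)
            if cycle:
                return cycle
        chain.pop()  # done with this branch; the file may be included elsewhere
        return None

    return visit(root)
```

Because a file is removed from the chain once its branch is finished, including the same helper from two different places is not reported as a cycle.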

Provenance

Each step tracks which file it came from. The console output shows this:

STEP 1/5 ... Verify health endpoint (from _setup.test.toml)
STEP 2/5 ... Clear test data (from _setup.test.toml)
STEP 3/5 ... Navigate to /login (from login.test.toml)

This makes it easy to identify which file to edit when a step fails.

Per-Test Overrides

Override provider settings for a specific test:

name = "Custom provider test"

[overrides.provider]
extra_system_prompt = "Be concise"
step_timeout_secs = 600
base_url = "http://localhost:5000"
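
Conceptually, an override is a field-by-field merge on top of the project's provider settings. The sketch below assumes that fields missing from [overrides.provider] fall back to the values in bugatti.config.toml; that fallback behavior and the function name are assumptions, not confirmed details.

```python
from typing import Any, Dict


def effective_provider(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge per-test overrides over project-level provider settings."""
    merged = dict(config)     # start from the project-wide values
    merged.update(overrides)  # per-test values win where both are set
    return merged
```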

Skipping Steps

Add skip = true to any step to bypass it during execution. A skipped step counts as passed, takes zero time, and nothing is sent to the agent for it.

[[steps]]
instruction = "Create account and complete onboarding"
skip = true

[[steps]]
instruction = "Configure billing with test card"
skip = true

[[steps]]
instruction = "Invite team member and verify email"

Console output when steps are skipped:

SKIP 1/3 ... Create account and complete onboarding (from ftue.test.toml)
SKIP 2/3 ... Configure billing with test card (from ftue.test.toml)
STEP 3/3 ... Invite team member and verify email (from ftue.test.toml)

When to use skip:

Iterative development — you've run the full suite and steps 1-4 pass. Now you're working on step 5. Mark 1-4 as skip = true so each run goes straight to step 5 without re-executing the passing steps.
Focusing on a failing step — a step deep in the suite fails. Skip everything before it to iterate faster on the fix.
Pairing with checkpoints — skip steps that set up state, and use checkpoints to restore that state instead (see below).

Disabling a skip — remove the line or comment it out with #:

[[steps]]
instruction = "Create account and complete onboarding"
#skip = true

Important: skipping a step does not undo its effects. If steps 1-3 set up database state and you skip them, that state won't exist unless you either run them first or restore it via a checkpoint.

Checkpoints

Checkpoints save and restore external state (databases, files, services) at step boundaries. Combined with skip = true, they let you jump to any point in a test suite without re-executing earlier steps.

The problem checkpoints solve

A 10-step FTUE (first-time user experience) test takes 15 minutes. Step 8 fails. You fix the code and re-run, but steps 1-7 execute again — another 12 minutes wasted. With checkpoints, you run once, save state after step 7, then on subsequent runs skip steps 1-7 and restore the checkpoint. Step 8 runs immediately against the saved state.

Setup

1. Add a [checkpoint] section to bugatti.config.toml:

[checkpoint]
save = "./scripts/checkpoint.sh save"
restore = "./scripts/checkpoint.sh restore"
timeout_secs = 180   # optional, default 120s

The save and restore fields are shell commands. They receive environment variables telling them which checkpoint to operate on.

2. Add checkpoint = "name" to steps in your test file:

name = "FTUE: Full onboarding flow"

[[steps]]
instruction = "Create account via signup form"
checkpoint = "after-signup"

[[steps]]
instruction = "Complete onboarding wizard"
checkpoint = "after-onboarding"

[[steps]]
instruction = "Configure billing with test card"
checkpoint = "after-billing"

[[steps]]
instruction = "Invite team member"

[[steps]]
instruction = "Verify team member received invite email"

Checkpoint names must be unique within a test file. Not every step needs a checkpoint — place them at meaningful state boundaries.

How save works

When a non-skipped step with checkpoint passes, bugatti runs the save command immediately after:

STEP 1/5 ... Create account via signup form (from ftue.test.toml)
  OK 1/5 (23.4s)
SAVE ....... checkpoint "after-signup"
OK ......... checkpoint "after-signup" saved
STEP 2/5 ... Complete onboarding wizard (from ftue.test.toml)
  OK 2/5 (45.1s)
SAVE ....... checkpoint "after-onboarding"
OK ......... checkpoint "after-onboarding" saved

Checkpoints are not saved when a step fails — there's no point saving broken state.

How restore works

When you mark steps as skip = true, bugatti looks at the skipped steps to find the last checkpoint before the first non-skipped step, then runs the restore command:

[[steps]]
instruction = "Create account via signup form"
checkpoint = "after-signup"
skip = true

[[steps]]
instruction = "Complete onboarding wizard"
checkpoint = "after-onboarding"
skip = true

[[steps]]
instruction = "Configure billing with test card"
checkpoint = "after-billing"
skip = true

[[steps]]
instruction = "Invite team member"

[[steps]]
instruction = "Verify team member received invite email"

Console output:

SKIP 1/5 ... Create account via signup form (from ftue.test.toml)
SKIP 2/5 ... Complete onboarding wizard (from ftue.test.toml)
SKIP 3/5 ... Configure billing with test card (from ftue.test.toml)
RESTORE .... checkpoint "after-billing"
OK ......... checkpoint "after-billing" restored
STEP 4/5 ... Invite team member (from ftue.test.toml)

Only the last checkpoint is restored — restoring "after-billing" already includes the state from "after-signup" and "after-onboarding".
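
The selection rule can be written as a short scan over the leading skipped steps. This is a sketch of the rule as described here, using steps parsed into dictionaries; checkpoint_to_restore is a hypothetical name, not a bugatti API.

```python
from typing import List, Optional


def checkpoint_to_restore(steps: List[dict]) -> Optional[str]:
    """Last checkpoint among the skipped steps before the first non-skipped step."""
    last = None
    for step in steps:
        if not step.get("skip", False):
            break  # first step that will actually run; stop looking
        if "checkpoint" in step:
            last = step["checkpoint"]  # later checkpoints supersede earlier ones
    return last
```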

Gap warning

If you skip steps after the last checkpoint, bugatti warns you that the restored state may be incomplete:

[[steps]]
instruction = "Create account via signup form"
checkpoint = "after-signup"
skip = true

[[steps]]
instruction = "Complete onboarding wizard"
skip = true                                    # no checkpoint!

[[steps]]
instruction = "Configure billing with test card"
skip = true                                    # no checkpoint!

[[steps]]
instruction = "Invite team member"

WARN ....... restoring checkpoint "after-signup" from step 1, but 2 step(s) after it were also skipped without checkpoints
RESTORE .... checkpoint "after-signup"
OK ......... checkpoint "after-signup" restored
STEP 4/4 ... Invite team member (from ftue.test.toml)

This means steps 2-3 were skipped but their effects aren't captured in the restored checkpoint. The test may fail because of missing state. Either add checkpoints to those steps or accept the gap.
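
The gap in the warning is the number of skipped steps that follow the restored checkpoint. A minimal sketch of that count, with a hypothetical function name and steps represented as dictionaries:

```python
from typing import List, Optional, Tuple


def restore_gap(steps: List[dict]) -> Tuple[Optional[str], int]:
    """Return (checkpoint, gap); gap counts skipped steps after that checkpoint."""
    name: Optional[str] = None
    gap = 0
    for step in steps:
        if not step.get("skip", False):
            break  # reached the first step that will run
        if "checkpoint" in step:
            name, gap = step["checkpoint"], 0  # a newer checkpoint resets the gap
        elif name is not None:
            gap += 1  # skipped with no checkpoint: its effects are not captured
    return name, gap
```

A non-zero gap corresponds to the WARN line shown above.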

Environment variables

Save and restore commands receive these environment variables:

Variable	Example	Description
`BUGATTI_CHECKPOINT_ID`	`after-onboarding`	The checkpoint name from the step
`BUGATTI_CHECKPOINT_PATH`	`.bugatti/checkpoints/after-onboarding/`	Directory for this checkpoint's data

The checkpoint directory is created automatically before the command runs. Your script decides what to put in it.
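
From the caller's side, running a save or restore command looks roughly like the sketch below. It is an illustration in Python rather than bugatti's code: the function names are hypothetical, and the checkpoint root is inferred from the example path in the table. Note that subprocess.run with shell=True only kills the shell itself on timeout, so a real harness would need extra care for grandchild processes.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict


def checkpoint_env(checkpoint_id: str, root: Path = Path(".bugatti/checkpoints")) -> Dict[str, str]:
    """Build the two variables a save or restore command receives."""
    return {
        "BUGATTI_CHECKPOINT_ID": checkpoint_id,
        "BUGATTI_CHECKPOINT_PATH": (root / checkpoint_id).as_posix() + "/",
    }


def run_checkpoint_command(command: str, checkpoint_id: str, timeout_secs: float = 120.0) -> int:
    """Run a save/restore shell command and return its exit status."""
    extra = checkpoint_env(checkpoint_id)
    # The directory exists before the command starts, as described above.
    Path(extra["BUGATTI_CHECKPOINT_PATH"]).mkdir(parents=True, exist_ok=True)
    # Raises subprocess.TimeoutExpired if the command runs past timeout_secs.
    done = subprocess.run(command, shell=True, env={**os.environ, **extra}, timeout=timeout_secs)
    return done.returncode
```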

Checkpoint config reference

Field	Default	Description
`save`	required	Shell command to save a checkpoint
`restore`	required	Shell command to restore a checkpoint
`timeout_secs`	`120`	Timeout for save/restore commands (kills process on expiry)

Example checkpoint script

A checkpoint script that saves and restores a PostgreSQL database and an uploads directory:

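#!/bin/bash
set -eu
action="${1:?usage: checkpoint.sh save|restore}"

case "$action" in
  save)
    # Dump the database into the checkpoint directory.
    pg_dump myapp_dev > "$BUGATTI_CHECKPOINT_PATH/db.sql"
    # Drop any copy left by an earlier save so cp does not nest uploads/uploads.
    rm -rf "$BUGATTI_CHECKPOINT_PATH/uploads"
    cp -r ./uploads "$BUGATTI_CHECKPOINT_PATH/uploads"
    echo "Saved DB + uploads for checkpoint $BUGATTI_CHECKPOINT_ID"
    ;;
  restore)
    # Recreate the database from the dump; stop on the first SQL error.
    dropdb --if-exists myapp_dev
    createdb myapp_dev
    psql -d myapp_dev -v ON_ERROR_STOP=1 -f "$BUGATTI_CHECKPOINT_PATH/db.sql"
    # Replace the uploads directory with the saved copy.
    rm -rf ./uploads
    cp -r "$BUGATTI_CHECKPOINT_PATH/uploads" ./uploads
    echo "Restored DB + uploads for checkpoint $BUGATTI_CHECKPOINT_ID"
    ;;
  *)
    # Fail loudly on a typo instead of exiting 0 without doing anything.
    echo "unknown action: $action (expected save or restore)" >&2
    exit 2
    ;;
esac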

Disabling checkpoints

Comment out the checkpoint line on individual steps:

[[steps]]
instruction = "Create account via signup form"
#checkpoint = "after-signup"

Or remove the [checkpoint] section from bugatti.config.toml. When no save/restore commands are configured, the checkpoint field on steps is ignored.

Resuming from a checkpoint via CLI

Instead of manually adding skip = true to steps, use --from-checkpoint:

# Resume from the "after-billing" checkpoint — skips steps 1-3 automatically
bugatti test ftue.test.toml --from-checkpoint after-billing

This auto-skips all steps up to and including the step with checkpoint = "after-billing", restores the checkpoint, and executes the remaining steps. No need to edit the TOML file.

If the checkpoint name doesn't exist, bugatti lists the available checkpoints:

ERROR: checkpoint "typo" not found. Available: after-signup, after-onboarding, after-billing
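
In effect, --from-checkpoint computes the skip flags for you. The sketch below shows that transformation on steps parsed into dictionaries. It is an illustration with a hypothetical function name, and the error text only approximates the message shown above.

```python
from typing import List


def apply_from_checkpoint(steps: List[dict], name: str) -> List[dict]:
    """Mark every step up to and including the named checkpoint as skipped."""
    names = [s["checkpoint"] for s in steps if "checkpoint" in s]
    if name not in names:
        raise ValueError(f'checkpoint "{name}" not found. Available: {", ".join(names)}')
    # Index of the step that carries the requested checkpoint.
    cut = next(i for i, s in enumerate(steps) if s.get("checkpoint") == name)
    # Return copies so the parsed test file itself is left untouched.
    return [dict(s, skip=True) if i <= cut else dict(s) for i, s in enumerate(steps)]
```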

Typical workflow

# 1. First run — all steps execute, checkpoints saved at each boundary
bugatti test ftue.test.toml

# 2. Step 4 fails. Fix the code.

# 3. Re-run from the last checkpoint before step 4
bugatti test ftue.test.toml --from-checkpoint after-billing

# 4. Step 4 passes. Do a clean run to confirm everything works end-to-end.
bugatti test ftue.test.toml

Examples

Working examples in examples/:

Example	What it tests	Key features
`static-html`	Local HTML page via browser	No server, browser testing
`python-flask`	Flask API + UI	Long-lived server, readiness URL, strict warnings
`node-express`	Express TypeScript API + UI	pnpm install, shared setup via `_` prefix, multi-port, test discovery
`rust-cli`	Rust CLI tool	Short-lived build command, per-step timeout

Exit Codes

Code	Meaning
`0`	All steps passed
`1`	One or more steps failed
`2`	Configuration or parse error
`3`	Provider or readiness failure
`4`	Step timed out
`5`	Run interrupted (Ctrl+C)
`6`	Setup command failed
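
For scripts that wrap bugatti, for example in CI, the table can be mirrored as an enum. The sketch below also shows how step verdicts would map to codes 0 and 1 once strict warnings are taken into account. The member and function names are hypothetical; only the numeric values and their meanings come from the table.

```python
from enum import IntEnum
from typing import Iterable


class ExitCode(IntEnum):
    PASSED = 0
    STEPS_FAILED = 1
    CONFIG_ERROR = 2
    PROVIDER_OR_READINESS = 3
    STEP_TIMEOUT = 4
    INTERRUPTED = 5
    SETUP_FAILED = 6


def verdict_exit_code(verdicts: Iterable[str], strict_warnings: bool = False) -> ExitCode:
    """Map per-step verdicts to exit code 0 or 1."""
    # WARN only counts as a failure when strict warnings are enabled.
    failing = {"ERROR", "WARN"} if strict_warnings else {"ERROR"}
    failed = any(v in failing for v in verdicts)
    return ExitCode.STEPS_FAILED if failed else ExitCode.PASSED
```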

Docs

Full documentation at bugatti.dev. LLM-friendly reference at bugatti.dev/llms.txt.

License

MIT
