GitHub - sauravvenkat/forkline: Forkline is a replay-first tracing and diffing library for agentic AI workflows that lets you deterministically reproduce, fork, and compare agent runs to find exactly where behavior diverged.

Forkline is a local-first, replay-first tracing and diffing library for agentic AI workflows.

Its purpose is simple and strict:

Make agent runs reproducible, inspectable, and diffable.

Forkline treats nondeterminism as something to be controlled, not merely observed.

Why Forkline exists

Modern agentic systems fail in a frustrating way:

The same prompt behaves differently on different days
Tool calls change silently
Debugging becomes guesswork
CI becomes flaky or meaningless

Logs and dashboards tell you that something changed.
Forkline is built to tell you where, when, and why.

What Forkline does

Forkline allows you to:

Record an agent run as a deterministic, local artifact
Replay that run without re-invoking the LLM
Diff two runs and detect the first point of divergence
Capture tool calls safely with deterministic redaction
Use agent workflows in CI without network calls or flakiness

This turns agent behavior into something you can reason about like code.
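
The record-then-replay idea above can be sketched in a few lines. This is an illustration only, not Forkline's actual API: the `Recorder` and `Replayer` classes, the JSON artifact layout, and the `fake_llm` stand-in are all hypothetical names chosen for this example.

```python
import json
import tempfile
from pathlib import Path


class Recorder:
    """Hypothetical recorder: calls the real function and logs each step."""

    def __init__(self):
        self.steps = []

    def call(self, name, fn, *args):
        output = fn(*args)
        self.steps.append({"name": name, "input": list(args), "output": output})
        return output

    def save(self, path):
        # sort_keys keeps the artifact byte-stable across runs.
        Path(path).write_text(json.dumps(self.steps, indent=2, sort_keys=True))


class Replayer:
    """Hypothetical replayer: serves recorded outputs and never calls fn."""

    def __init__(self, path):
        self.steps = json.loads(Path(path).read_text())
        self.cursor = 0

    def call(self, name, fn, *args):
        step = self.steps[self.cursor]
        if step["name"] != name or step["input"] != list(args):
            raise RuntimeError(f"replay diverged at step {self.cursor}: {name}")
        self.cursor += 1
        return step["output"]


def fake_llm(prompt):
    # Stand-in for a real model call (assumption for this sketch).
    return prompt.upper()


def fail_if_called(prompt):
    raise AssertionError("the LLM must not be invoked during replay")


def agent(session, llm):
    # The agent logic is identical in record mode and replay mode.
    plan = session.call("llm", llm, "plan the trip")
    return session.call("llm", llm, plan + " now")


artifact = Path(tempfile.mkdtemp()) / "run.json"
recorder = Recorder()
recorded = agent(recorder, fake_llm)
recorder.save(artifact)

# Replay uses only the artifact; the model function is never called.
replayed = agent(Replayer(artifact), fail_if_called)
```

Because the replayer checks each step's name and input before returning the recorded output, a change in agent logic surfaces as an error at the exact step where the run stops matching the artifact.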

Quick Start

# Clone and set up
git clone https://github.com/sauravvenkat/forkline.git
cd forkline
source dev.env

# Run the example
python examples/minimal.py

# Inspect the recorded run
python scripts/inspect_runs.py

See QUICKSTART_RECORDING_V0.md for the full getting-started guide.

Design principles

Forkline is intentionally opinionated.

Replay-first, not dashboards-first
Determinism over probabilistic insight
Local-first artifacts
Diff over metrics
Explicit schemas over implicit behavior

If a feature does not help reproduce, replay, or diff an agent run, it does not belong in Forkline.

Security & Data Redaction

Forkline is designed to be safe by default when handling sensitive data.

Core invariant

By default, Forkline artifacts MUST NOT contain recoverable sensitive user, customer, or proprietary data.

This means:

No raw LLM prompts or responses are persisted by default
Secrets are NEVER written to disk in any mode
PII and customer data are redacted before persistence
Redaction happens at capture time, before any disk write

What IS recorded (SAFE mode)

Forkline preserves everything needed for replay and diffing:

Step ordering and control flow
Tool and model identifiers
Timestamps and execution metadata
Stable cryptographic hashes of redacted values
Structural shape of inputs/outputs

This enables deterministic replay, accurate diffing, and forensic debugging — without exposing sensitive data.
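
One way to picture capture-time redaction that keeps structure and stable hashes is the sketch below. It is an assumption-laden illustration, not the mechanism described in docs/REDACTION_POLICY.md: the `redact` function, the record layout, and the use of a keyed HMAC-SHA256 digest (chosen here because plain hashes of short values can be guessed by brute force) are all choices made for this example.

```python
import hashlib
import hmac
import json


def redact(value, key):
    """Replace every leaf value with its type name and a keyed digest.

    Containers are walked recursively so the structural shape survives.
    Dictionary keys are kept as-is, which assumes keys are not sensitive.
    """
    if isinstance(value, dict):
        return {name: redact(item, key) for name, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [redact(item, key) for item in value]
    canonical = json.dumps(value, sort_keys=True, default=str).encode("utf-8")
    digest = hmac.new(key, canonical, hashlib.sha256).hexdigest()
    return {"type": type(value).__name__, "hmac_sha256": digest}


KEY = b"example-key"  # hypothetical; a real key would not live in source code
args = {"query": "alice@example.com", "limit": 5}

# The tool identifier stays readable; only the arguments are redacted.
record = {"tool": "search", "args": redact(args, KEY)}
serialized = json.dumps(record, sort_keys=True)
```

The same input always yields the same digest under the same key, so two runs can be compared for equality step by step even though the raw values never reach disk.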

Escalation modes

For development and debugging, Forkline supports explicit opt-in modes:

SAFE (default): Production-safe, full redaction
DEBUG: Local development, raw values persisted
ENCRYPTED_DEBUG: Encrypted payloads for break-glass production debugging
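
The three modes could be modeled as an explicit enum with SAFE as the default, as in the sketch below. The `Mode` enum, the `payload_for_disk` function, and the `encrypt` callable are hypothetical names for illustration; Forkline's real configuration surface may look different. Secret scrubbing, which applies in every mode, is assumed to happen before this function and is not shown.

```python
import hashlib
from enum import Enum


class Mode(Enum):
    SAFE = "safe"
    DEBUG = "debug"
    ENCRYPTED_DEBUG = "encrypted_debug"


def payload_for_disk(raw, mode=Mode.SAFE, encrypt=None):
    """Decide what is persisted for one captured value (bytes).

    Assumes secrets were already stripped from `raw` upstream.
    """
    if mode is Mode.DEBUG:
        # Explicit opt-in: the raw value is kept for local development.
        return {"mode": mode.value, "raw": raw.decode("utf-8")}
    if mode is Mode.ENCRYPTED_DEBUG:
        if encrypt is None:
            raise ValueError("ENCRYPTED_DEBUG requires an encrypt callable")
        # `encrypt` is supplied by the caller; no cipher is assumed here.
        return {"mode": mode.value, "ciphertext": encrypt(raw).hex()}
    # SAFE: only a digest is persisted, never the raw value.
    return {"mode": mode.value, "sha256": hashlib.sha256(raw).hexdigest()}


safe_payload = payload_for_disk(b"user prompt text")
debug_payload = payload_for_disk(b"user prompt text", Mode.DEBUG)
```

Making SAFE the default argument means a caller has to name DEBUG or ENCRYPTED_DEBUG explicitly before any raw or recoverable value can be written.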

Full policy

For the complete security design and redaction mechanisms, see:

👉 docs/REDACTION_POLICY.md

Why CLI-first

Forkline is CLI-first by design, not by convenience.

Agent debugging and reproducibility are developer workflows.
They live in terminals, CI pipelines, local machines, and code reviews — not dashboards.

Determinism and scriptability

CLI commands are composable, automatable, and repeatable.

This makes Forkline usable in:

CI pipelines
test suites
local debugging loops
regression checks

If it can’t be scripted, it can’t be trusted as infrastructure.

Local-first by default

A CLI enforces Forkline’s local-first philosophy:

artifacts live on disk
runs replay offline
no hidden network dependencies
no opaque browser state

This keeps behavior inspectable and failure modes obvious.
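
A simple way to make "no hidden network dependencies" checkable in tests is to block socket creation while a replay runs. The sketch below is a generic Python technique, not something Forkline is known to ship: `no_network`, `NetworkBlocked`, and `replay_offline` are hypothetical names.

```python
import contextlib
import socket


class NetworkBlocked(RuntimeError):
    """Raised when code tries to open a socket inside no_network()."""


@contextlib.contextmanager
def no_network():
    """Temporarily make socket creation fail so hidden network calls surface."""
    original = socket.socket

    def guarded(*args, **kwargs):
        raise NetworkBlocked("network access attempted during offline replay")

    socket.socket = guarded
    try:
        yield
    finally:
        socket.socket = original  # always restore the real class


def replay_offline(replay_fn):
    """Run a replay callable (hypothetical) with the network guard active."""
    with no_network():
        return replay_fn()
```

This guard only catches code that looks up `socket.socket` at call time; a library that bound the class earlier with `from socket import socket` would bypass it, so treat it as a tripwire rather than a sandbox.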

Diff is terminal-native

Diffing is already how developers reason about change:

git diff
pytest failures
compiler diagnostics
performance regressions

Forkline extends this mental model to agent behavior.

A CLI makes Forkline additive to existing tooling, not a replacement.
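
First-divergence diffing can be illustrated with a short function that walks two runs in step order and stops at the first mismatch. This is a minimal sketch under assumptions: the `first_divergence` name and the step dictionaries (`step`, `output_hash`) are invented for the example and do not reflect Forkline's artifact schema.

```python
from itertools import zip_longest


def first_divergence(run_a, run_b):
    """Return (index, step_a, step_b) for the first differing step, or None.

    A missing step (one run is shorter) is reported as None on that side.
    """
    for index, (step_a, step_b) in enumerate(zip_longest(run_a, run_b)):
        if step_a != step_b:
            return index, step_a, step_b
    return None


baseline = [
    {"step": "llm", "output_hash": "a1"},
    {"step": "tool:search", "output_hash": "b2"},
    {"step": "llm", "output_hash": "c3"},
]
candidate = [
    {"step": "llm", "output_hash": "a1"},
    {"step": "tool:search", "output_hash": "ff"},
    {"step": "llm", "output_hash": "c3"},
]

divergence = first_divergence(baseline, candidate)
```

Reporting only the first mismatch mirrors how a failing test points at one assertion: later steps usually differ as a consequence of the first one, so they add noise rather than information.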

Avoiding dashboard gravity

Dashboards optimize for:

aggregation over root cause
real-time metrics over replayability
visualization over determinism

Forkline explicitly avoids this gravity.

If a feature requires a UI to be understandable, it is usually hiding complexity rather than exposing truth.

UIs can come later — CLIs must come first

Forkline does not reject UIs.
It rejects UI-first design.

The CLI defines the real API surface and semantic contract. Any future UI must be a thin layer on top — never the other way around.

Forkline is CLI-first because reproducibility, diffing, and trust are terminal-native problems.

What Forkline is NOT

Forkline explicitly does not aim to be:

An evaluation or benchmarking framework
Prompt engineering or prompt optimization tooling
A hosted SaaS or dashboard product
A generic “AI observability” platform

Forkline is a debugging and reproducibility tool, not an analytics product.

Roadmap

Forkline follows a disciplined, execution-first roadmap.

The v0 series focuses on correctness and determinism, not polish.

Deterministic run recording
Offline replay engine
First-divergence diffing
Minimal CLI (run, replay, diff)
CI-friendly deterministic mode

The canonical roadmap and design contract live here:

👉 docs/ROADMAP.md

Status

Forkline is early-stage and under active development.

APIs are expected to change until v1.0.
Feedback is welcome, especially around replay semantics and diffing behavior.

License

Forkline is licensed under the Apache License 2.0.

Philosophy (one sentence)

Forkline exists because “it changed” is not a useful debugging answer.
