Forkline is a replay-first tracing and diffing library for agentic AI workflows that lets you deterministically reproduce, fork, and compare agent runs to find exactly where behavior diverged.

sauravvenkat/forkline

Forkline

Forkline is a local-first, replay-first tracing and diffing library for agentic AI workflows.

Its purpose is simple and strict:

Make agent runs reproducible, inspectable, and diffable.

Forkline treats nondeterminism as something to be controlled, not merely observed.


Why Forkline exists

Modern agentic systems fail in a frustrating way:

  • The same prompt behaves differently on different days
  • Tool calls change silently
  • Debugging becomes guesswork
  • CI becomes flaky or meaningless

Logs and dashboards tell you that something changed.
Forkline is built to tell you where, when, and why.


What Forkline does

Forkline allows you to:

  • Record an agent run as a deterministic, local artifact
  • Replay that run without re-invoking the LLM
  • Diff two runs and detect the first point of divergence
  • Capture tool calls safely with deterministic redaction
  • Use agent workflows in CI without network calls or flakiness

This turns agent behavior into something you can reason about like code.
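
To make "first point of divergence" concrete, here is a minimal sketch of the idea. The `Step` shape and the `first_divergence` function are hypothetical names chosen for illustration; they are not Forkline's schema or API.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass(frozen=True)
class Step:
    """One recorded step. Hypothetical shape, not Forkline's schema."""
    kind: str     # e.g. "llm_call" or "tool_call"
    name: str     # tool or model identifier
    payload: Any  # recorded (or redacted) input/output


def first_divergence(a: Sequence[Step], b: Sequence[Step]) -> Optional[int]:
    """Return the index of the first differing step, or None if the runs match."""
    for index, (left, right) in enumerate(zip(a, b)):
        if left != right:
            return index
    # One run is a strict prefix of the other: they diverge where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


run_a = [Step("llm_call", "planner", "h1"), Step("tool_call", "search", "h2")]
run_b = [Step("llm_call", "planner", "h1"), Step("tool_call", "fetch", "h2")]
print(first_divergence(run_a, run_b))  # 1
```

Reporting a single index, rather than every difference, reflects the fact that steps after the first divergence usually differ as a consequence of it.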


Quick Start

# Clone and set up
git clone https://github.com/sauravvenkat/forkline.git
cd forkline
source dev.env

# Run the example
python examples/minimal.py

# Inspect the recorded run
python scripts/inspect_runs.py

See QUICKSTART_RECORDING_V0.md for the full getting-started guide.


Design principles

Forkline is intentionally opinionated.

  • Replay-first, not dashboards-first
  • Determinism over probabilistic insight
  • Local-first artifacts
  • Diff over metrics
  • Explicit schemas over implicit behavior

If a feature does not help reproduce, replay, or diff an agent run, it does not belong in Forkline.


Security & Data Redaction

Forkline is designed to be safe by default when handling sensitive data.

Core invariant

By default, Forkline artifacts MUST NOT contain recoverable sensitive user, customer, or proprietary data.

This means:

  • No raw LLM prompts or responses are persisted by default
  • Secrets are NEVER written to disk in any mode
  • PII and customer data are redacted before persistence
  • Redaction happens at capture time, before any disk write

What IS recorded (SAFE mode)

Forkline preserves everything needed for replay and diffing:

  • Step ordering and control flow
  • Tool and model identifiers
  • Timestamps and execution metadata
  • Stable cryptographic hashes of redacted values
  • Structural shape of inputs/outputs

This enables deterministic replay, accurate diffing, and forensic debugging — without exposing sensitive data.
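
One way to picture "stable hashes plus structural shape" is the sketch below. It is an illustration under stated assumptions, not Forkline's redaction mechanism: the `redact` function, the output field names, and the use of a keyed HMAC (rather than a plain hash) are all choices made here. The keyed hash is used because plain hashes of short or guessable values can be reversed by brute force. Note that this sketch keeps dictionary keys in the clear and assumes they are not sensitive.

```python
import hashlib
import hmac
import json
from typing import Any


def redact(value: Any, key: bytes) -> Any:
    """Replace leaf values with a keyed hash while keeping the container shape."""
    if isinstance(value, dict):
        return {k: redact(v, key) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [redact(v, key) for v in value]
    # Canonical JSON so that equal values always produce the same digest.
    canonical = json.dumps(value, sort_keys=True, default=str).encode("utf-8")
    digest = hmac.new(key, canonical, hashlib.sha256).hexdigest()
    return {"type": type(value).__name__, "hmac_sha256": digest}


key = b"example-key-not-for-real-use"  # hypothetical; a real key needs safe storage
first = redact({"user": "alice@example.com", "items": [1, 2]}, key)
second = redact({"user": "alice@example.com", "items": [1, 2]}, key)
print(first == second)  # True
print(sorted(first))    # ['items', 'user']
```

Because equal inputs produce equal digests, two redacted runs can still be compared step by step even though neither contains the raw values.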

Escalation modes

For development and debugging, Forkline supports explicit opt-in modes:

  • SAFE (default): Production-safe, full redaction
  • DEBUG: Local development, raw values persisted (secrets are still never written to disk)
  • ENCRYPTED_DEBUG: Encrypted payloads for break-glass production debugging
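
As a rough sketch of how the three modes could gate what reaches disk, consider the following. The `Mode` enum, the `persistable` function, and the record fields are hypothetical and do not describe Forkline's implementation. The cipher is passed in by the caller because the Python standard library has no symmetric encryption, and secret detection is out of scope for this sketch.

```python
import enum
import hashlib
from typing import Callable, Dict, Optional


class Mode(enum.Enum):
    SAFE = "safe"
    DEBUG = "debug"
    ENCRYPTED_DEBUG = "encrypted_debug"


def persistable(
    raw: str,
    mode: Mode = Mode.SAFE,
    encrypt: Optional[Callable[[bytes], bytes]] = None,
) -> Dict[str, str]:
    """Return the form of `raw` that may be written to disk under `mode`."""
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    if mode is Mode.SAFE:
        return {"sha256": digest}
    if mode is Mode.DEBUG:
        return {"sha256": digest, "raw": raw}
    # ENCRYPTED_DEBUG: never fall back to plaintext when no cipher is supplied.
    if encrypt is None:
        raise ValueError("ENCRYPTED_DEBUG requires an encrypt callable")
    return {"sha256": digest, "ciphertext": encrypt(raw.encode("utf-8")).hex()}


print(sorted(persistable("hello")))              # ['sha256']
print(sorted(persistable("hello", Mode.DEBUG)))  # ['raw', 'sha256']
```

Making SAFE the default argument mirrors the opt-in nature of the other modes: raw values are only persisted when the caller asks for them explicitly.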

Full policy

For the complete security design and redaction mechanisms, see:

👉 docs/REDACTION_POLICY.md


Why CLI-first

Forkline is CLI-first by design, not by convenience.

Agent debugging and reproducibility are developer workflows.
They live in terminals, CI pipelines, local machines, and code reviews — not dashboards.

Determinism and scriptability

CLI commands are composable, automatable, and repeatable.

This makes Forkline usable in:

  • CI pipelines
  • test suites
  • local debugging loops
  • regression checks

If it can’t be scripted, it can’t be trusted as infrastructure.
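
Scriptability mostly comes down to predictable exit codes. The sketch below shows that convention for a comparison command: exit 0 when two recorded runs match, 1 when they differ. It is an assumption-laden illustration, not the Forkline CLI; the `main` function, its arguments, and the JSON artifact format are hypothetical.

```python
import argparse
import json
import sys
from pathlib import Path
from typing import List, Optional


def main(argv: Optional[List[str]] = None) -> int:
    """Compare two JSON run artifacts and return a diff-style exit code."""
    parser = argparse.ArgumentParser(description="Compare two recorded runs.")
    parser.add_argument("baseline", type=Path)
    parser.add_argument("candidate", type=Path)
    args = parser.parse_args(argv)

    baseline = json.loads(args.baseline.read_text(encoding="utf-8"))
    candidate = json.loads(args.candidate.read_text(encoding="utf-8"))
    if baseline == candidate:
        return 0
    print(f"runs differ: {args.baseline} vs {args.candidate}", file=sys.stderr)
    return 1


# As a script entry point, this would end with: raise SystemExit(main())
```

A CI job can then fail on a nonzero exit status without parsing any output, the same way it would with `git diff --exit-code`.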


Local-first by default

A CLI enforces Forkline’s local-first philosophy:

  • artifacts live on disk
  • runs replay offline
  • no hidden network dependencies
  • no opaque browser state

This keeps behavior inspectable and failure modes obvious.
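
Offline replay can be pictured as serving recorded results from a local artifact instead of calling the real tool. The sketch below assumes a hypothetical artifact format (a JSON list of steps with `tool`, `args`, and `result` fields) and hypothetical `Replayer` and `ReplayError` names; none of these come from Forkline itself.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


class ReplayError(RuntimeError):
    """Raised when a replayed call does not match what was recorded."""


class Replayer:
    """Serve recorded tool results in order, with no network access."""

    def __init__(self, steps: List[Dict[str, Any]]) -> None:
        self._steps = steps
        self._cursor = 0

    @classmethod
    def from_file(cls, path: Path) -> "Replayer":
        # The artifact is a plain file on disk, so replay works offline.
        return cls(json.loads(path.read_text(encoding="utf-8")))

    def call(self, tool: str, args: Dict[str, Any]) -> Any:
        if self._cursor >= len(self._steps):
            raise ReplayError(f"no recorded step left for {tool!r}")
        step = self._steps[self._cursor]
        if step["tool"] != tool or step["args"] != args:
            raise ReplayError(
                f"step {self._cursor}: recorded {step['tool']!r}, got {tool!r}"
            )
        self._cursor += 1
        return step["result"]


recorded = [{"tool": "search", "args": {"q": "forkline"}, "result": ["hit"]}]
replayer = Replayer(recorded)
print(replayer.call("search", {"q": "forkline"}))  # ['hit']
```

Failing loudly on an unrecorded or mismatched call, rather than falling through to the real tool, is what keeps the failure mode obvious.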


Diff is terminal-native

Diffing is already how developers reason about change:

  • git diff
  • pytest failures
  • compiler diagnostics
  • performance regressions

Forkline extends this mental model to agent behavior.

A CLI makes Forkline additive to existing tooling, not a replacement.


Avoiding dashboard gravity

Dashboards optimize for:

  • aggregation over root cause
  • real-time metrics over replayability
  • visualization over determinism

Forkline explicitly avoids this gravity.

If a feature requires a UI to be understandable, it is usually hiding complexity rather than exposing truth.


UIs can come later — CLIs must come first

Forkline does not reject UIs.
It rejects UI-first design.

The CLI defines the real API surface and semantic contract. Any future UI must be a thin layer on top — never the other way around.

Forkline is CLI-first because reproducibility, diffing, and trust are terminal-native problems.


What Forkline is NOT

Forkline explicitly does not aim to be:

  • An evaluation or benchmarking framework
  • Prompt engineering or prompt optimization tooling
  • A hosted SaaS or dashboard product
  • A generic “AI observability” platform

Forkline is a debugging and reproducibility tool, not an analytics product.


Roadmap

Forkline follows a disciplined, execution-first roadmap.

The v0 series focuses on correctness and determinism, not polish.

  1. Deterministic run recording
  2. Offline replay engine
  3. First-divergence diffing
  4. Minimal CLI (run, replay, diff)
  5. CI-friendly deterministic mode

The canonical roadmap and design contract live here:

👉 docs/ROADMAP.md


Status

Forkline is early-stage and under active development.

APIs are expected to change until v1.0.
Feedback is welcome, especially around replay semantics and diffing behavior.


License

Forkline is licensed under the Apache 2.0 License.


Philosophy (one sentence)

Forkline exists because “it changed” is not a useful debugging answer.
