A CLI framework for orchestrating benchmarks across distributed or local setups. Designed for research benchmarking scenarios with workflow coordination, data collection, visualization, and result management.
As part of my studies and work, I have had to write many different benchmarks, and I consistently lost a lot of time on plumbing work:
- Managing a bunch of different ssh connections to different VMs
- Copying files over different filesystems
- Running commands everywhere
- Plotting data
- Keeping track of which results belong to which benchmark parameters
- Managing metadata
So I decided to write a framework that would take care of all of this for me. It ended up turning into a specialized "workflow engine" of sorts. I also looked into Apache Airflow, but it was too complex for this use case.
- Distributed Execution: Run benchmarks across multiple remote hosts or locally
- YAML Configuration: Declarative workflow definition with hosts, stages, and plots
- Health Checks: Built-in readiness detection (port, HTTP, file, process, command)
- Data Collection: Automatic file collection via SCP with schema validation
- Background Stages: Keep monitoring commands running alongside your benchmark until all your non-background stages finish
- Visualization: Auto-generated plots (time series, histograms, boxplots)
- Metadata Tracking: Custom metadata support for benchmark runs
- Result Management: Organized storage with run IDs and comprehensive metadata, so you always know exactly which parameters and configuration were used for a specific benchmark run
- Append Metadata from Stages: Stages can emit JSON on stdout and append it to run metadata automatically
- Live Command Streaming: Stage commands stream directly to your terminal with preserved ANSI colors locally and over SSH
Note:
benchctl is under active development. There is currently no commitment to API stability. Features, flags, and file formats may change in future releases until I release v1.0.0.
curl -sSL https://raw.githubusercontent.com/luccadibe/benchctl/main/install-benchctl.sh | bash
Or download and run manually:
wget https://raw.githubusercontent.com/luccadibe/benchctl/main/install-benchctl.sh
chmod +x install-benchctl.sh
./install-benchctl.sh
Or install from the AUR:
yay -S benchctl-bin
Alternatively, go to the releases page and download the latest binary for your OS.
- Create Configuration (benchmark.yaml):
benchmark:
name: my-benchmark
output_dir: ./results
hosts:
local: {} # Local execution
server1:
ip: 192.168.1.100
username: user
key_file: ~/.ssh/id_rsa
stages:
- name: setup
host: local
command: echo "Setting up benchmark..."
- name: start-server
host: server1
command: docker run -d -p 8080:8080 my-server:latest
health_check:
type: port
target: "8080"
timeout: 30s
- name: run-load-test
host: local
script: load-generator.sh
outputs:
- name: results
remote_path: /tmp/results.csv
data_schema:
format: csv
columns:
- name: timestamp
type: timestamp
unit: s
format: unix
- name: latency_ms
type: float
unit: ms
plots:
- name: latency-over-time
title: Request Latency Over Time
source: results
type: time_series
x: timestamp
y: latency_ms
engine: seaborn
format: png
options:
dpi: 150
width_px: 1200
height_px: 600
x_label_angle: 45
      x_timestamp_format: medium
- Run Benchmark:
benchctl run --config benchmark.yaml
- View Results:
# Results saved to ./results/1/...
# Check metadata.json for run details
# Generated plots and collected files are stored directly in ./results/1/
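Assuming the run was assigned ID 1, you can list the run directory directly or use the built-in inspect command:

```bash
ls ./results/1/                 # collected files and generated plots
cat ./results/1/metadata.json   # parameters, stages, and custom metadata
benchctl inspect 1              # inspect the run via the CLI
```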
- Default engine: seaborn (Python) via uv run with an embedded script (PEP 723).
- Alternative: gonum (pure Go).
- You need python >= 3.10 and uv available on your PATH.
- First run downloads Python deps into uv's cache; subsequent runs are fast.
- No virtualenvs or repo Python files required; everything is embedded and invoked via uv run.
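To check the prerequisites (the one-liner below is uv's official installer script; see the uv documentation for other installation methods):

```bash
python3 --version   # needs >= 3.10
uv --version        # uv must be on your PATH
# install uv if it is missing
curl -LsSf https://astral.sh/uv/install.sh | sh
```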
Define execution environments:
hosts:
local: {} # Local host
remote:
ip: 10.0.0.1
username: benchmark
key_file: ~/.ssh/benchmark_key
    password: optional_password
Stages are sequential workflow steps: they are executed in the order they are defined, and each stage must have a unique name.
stages:
- name: build
host: local
command: make build
- name: deploy
host: remote
script: deploy.sh
health_check:
type: http
target: "http://localhost:8080/health"
- name: monitor-resources
host: local
command: ./scripts/monitor.sh
background: true # keeps running until the workflow shuts it down safely
- name: load-test
host: local
script: load-test.sh
outputs:
- name: metrics
        remote_path: /tmp/metrics.csv
Stages run through a shell command. Set benchmark.shell to control it (the default is bash -lic, which loads the login + interactive environment: PATH, JAVA_HOME, etc.). Override it per stage with stages[].shell.
Note: You cannot pass arguments to a script like script.sh <args>. Use command instead.
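A minimal sketch of both options (the sh -c value and the --rps flag are purely illustrative):

```yaml
benchmark:
  name: shell-example
  output_dir: ./results
  shell: bash -lc        # global override; bash -lic is used when unset

stages:
  - name: quick-step
    host: local
    shell: sh -c         # per-stage override (illustrative value)
    command: echo "runs under sh"
  - name: load-test
    host: local
    # scripts cannot take arguments, so pass them through command instead
    command: ./load-generator.sh --rps 500
```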
- Use host for a single host or hosts for multiple hosts. If neither is set, the stage runs on local.
- Hosts in hosts execute sequentially in the listed order.
- Outputs collected from multi-host stages are suffixed as outputName__host.ext (see the sketch after the example below).
- If append_metadata is enabled with multiple hosts, only the first host's output is used (a warning is logged).
Example:
stages:
- name: run-everywhere
hosts: [vm1, vm2]
    command: uname -a
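A sketch of how multi-host outputs are named (host names and paths are illustrative):

```yaml
stages:
  - name: collect-sysinfo
    hosts: [vm1, vm2]
    command: uname -a > /tmp/sysinfo.txt
    outputs:
      - name: sysinfo
        remote_path: /tmp/sysinfo.txt
# The collected files appear in the run directory as sysinfo__vm1.txt and sysinfo__vm2.txt
```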
- Set stages[].skip: true to skip a stage.
- Or pass benchctl run --skip <stage-name> multiple times (CLI overrides config).
- The metadata.json stored in each run directory records exactly which stages were executed and which were skipped.
Background stages run alongside the rest of the workflow. benchctl keeps them alive until the final non-background stage finishes, then sends SIGTERM to the stage's process group, waits BackgroundTerminationGrace (2 seconds by default), and finally sends SIGKILL if they are still running.
This uses setsid to start a new process group, so the entire background task tree is terminated reliably.
Their outputs are collected after shutdown, which makes background stages ideal for monitoring tasks such as resource-usage tracking.
Background stages cannot use append_metadata.
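For example, a monitoring script just needs to keep writing its data and exit cleanly on SIGTERM. A Linux-only sketch (file path, column names, and sampling interval are arbitrary):

```bash
#!/usr/bin/env bash
# monitor.sh - sample available memory once per second until benchctl shuts the stage down
OUT=/tmp/metrics.csv
echo "timestamp,mem_available_kb" > "$OUT"

# Exit cleanly when the process group receives SIGTERM (within the 2s grace period)
trap 'exit 0' TERM

while true; do
  mem=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
  echo "$(date +%s),$mem" >> "$OUT"
  sleep 1
done
```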
Define CSV column types for validation and plotting:
data_schema:
format: csv
columns:
- name: timestamp
type: timestamp
unit: s
format: unix # optional; supported: unix, unix_ms, unix_us, unix_ns, rfc3339, rfc3339_nano, iso8601
- name: latency_ms
type: float
unit: ms
- name: status
      type: string
Auto-generated visualizations:
plots:
- name: latency-histogram
title: Latency Distribution
source: metrics
type: histogram
x: latency_ms
- name: throughput-timeseries
title: Requests Over Time
source: metrics
type: time_series
x: timestamp
y: requests_per_second
groupby: pod_name
    engine: seaborn
- Set groupby (seaborn engine only) to split plots by a categorical column.
- Leave engine unset to use the seaborn backend; set it to gonum for Go-rendered plots (no Python needed).
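For instance, the same histogram could be rendered with the Go engine by reusing the fields shown above (a sketch, not an exhaustive list of options):

```yaml
plots:
  - name: latency-histogram-go
    title: Latency Distribution
    source: metrics
    type: histogram
    x: latency_ms
    engine: gonum
```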
# Run benchmark
benchctl run --config benchmark.yaml
# Skip stages by name
benchctl run --config benchmark.yaml --skip setup --skip warmup
# Add custom metadata
benchctl run --config benchmark.yaml --metadata "someFeature"="true" --metadata "someOtherFeature"="false"
# Pass environment variables to stages
benchctl run --config benchmark.yaml -e BRANCH=main -e LG_MAX_RPS=2000
# Inspect a run
benchctl inspect <run-id>
Append JSON metadata directly from a stage by enabling append_metadata.
In your configuration, you can add a stage that prints some metadata in JSON format to stdout:
stages:
- name: analyse
host: local
command: |
uv run python - <<'PY'
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
import json
# Compute something and print a single JSON object
print(json.dumps({
"latency_p50_ms": "123.4",
"notes": "baseline run"
}))
PY
    append_metadata: true
At runtime, the JSON keys/values are stringified and merged into the run’s metadata under custom.
This is useful for annotating runs with metadata that depends on a stage's output: for example, run a data-analysis script as a stage and append its summary statistics so that runs can be compared easily.
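Illustratively, the metadata.json of the run above would then contain the appended keys under custom (surrounding fields omitted; the exact layout may differ):

```json
{
  "custom": {
    "latency_p50_ms": "123.4",
    "notes": "baseline run"
  }
}
```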
During stage execution, the following environment variables are exported for commands/scripts:
- BENCHCTL_RUN_ID: the current run ID
- BENCHCTL_RUN_DIR: absolute path to the run directory (e.g., ./results/1)
- BENCHCTL_OUTPUT_DIR: benchmark output root (from benchmark.output_dir)
- BENCHCTL_CONFIG_PATH: set if provided in the environment when invoking benchctl
- BENCHCTL_BIN: absolute path to the running benchctl binary
Use these to locate inputs/outputs or to parameterize your scripts.
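For example, a stage script can drop its artifacts straight into the current run directory (the file names here are illustrative):

```bash
#!/usr/bin/env bash
# Place a summary next to metadata.json inside this run's directory
echo "run ${BENCHCTL_RUN_ID} finished on $(hostname)" > "${BENCHCTL_RUN_DIR}/summary.txt"

# Keep shared artifacts in the benchmark output root, one level above the run directories
cp build.log "${BENCHCTL_OUTPUT_DIR}/latest-build.log"
```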
See the examples/ directory for complete benchmark configurations.
- Local container testing
- Add a local Web UI for viewing benchmark results and plots.
MIT