A CLI framework for orchestrating benchmarks across distributed or local setups. Designed for research benchmarking scenarios with workflow coordination, data collection, visualization, and result management.
As part of my studies and work, I have had to write many different benchmarks, and I consistently lost a lot of time on plumbing work:
- Managing a bunch of different ssh connections to different VMs
- Copying files over different filesystems
- Running commands everywhere
- Plotting data
- Keeping track of which results belong to which benchmark parameters
- Managing metadata
So I decided to write a framework that would take care of all of this for me. It ended up turning into a specialized "workflow engine" of sorts. I also looked into Apache Airflow, but it was too complex for this use case.
- Distributed Execution: Run benchmarks across multiple remote hosts or locally
- YAML Configuration: Declarative workflow definition with hosts, stages, and plots
- Health Checks: Built-in readiness detection (port, HTTP, file, process, command)
- Data Collection: Automatic file collection via SCP with schema validation
- Background Stages: Keep monitoring commands running alongside your benchmark until all your non-background stages finish
- Visualization: Auto-generated plots (time series, histograms, boxplots)
- Metadata Tracking: Custom metadata support for benchmark runs
- Result Management: Organized storage with run IDs and comprehensive metadata, so you always know exactly which parameters and configuration were used for a specific benchmark run
- Append Metadata from Stages: Stages can emit JSON on stdout and append it to run metadata automatically
- Live Command Streaming: Stage commands stream directly to your terminal with preserved ANSI colors locally and over SSH
Note:
benchctl is under active development. There is currently no commitment to API stability. Features, flags, and file formats may change in future releases until I release v1.0.0.
curl -sSL https://raw.githubusercontent.com/luccadibe/benchctl/main/install-benchctl.sh | bash
Or download and run manually:
wget https://raw.githubusercontent.com/luccadibe/benchctl/main/install-benchctl.sh
chmod +x install-benchctl.sh
./install-benchctl.sh
Or install from the AUR:
yay -S benchctl-bin
Alternatively, go to the releases page and download the latest binary for your OS.
- Create Configuration (benchmark.yaml):
benchmark:
name: my-benchmark
output_dir: ./results
hosts:
local: {} # Local execution
server1:
ip: 192.168.1.100
username: user
key_file: ~/.ssh/id_rsa
stages:
- name: setup
host: local
command: echo "Setting up benchmark..."
- name: start-server
host: server1
command: docker run -d -p 8080:8080 my-server:latest
health_check:
type: port
target: "8080"
timeout: 30s
- name: run-load-test
host: local
script: load-generator.sh
outputs:
- name: results
remote_path: /tmp/results.csv
data_schema:
format: csv
columns:
- name: timestamp
type: timestamp
unit: s
format: unix
- name: latency_ms
type: float
unit: ms
plots:
- name: latency-over-time
title: Request Latency Over Time
source: results
type: time_series
x: timestamp
y: latency_ms
engine: seaborn
format: png
options:
dpi: 150
width_px: 1200
height_px: 600
x_label_angle: 45
      x_timestamp_format: medium
- Run Benchmark:
benchctl run --config benchmark.yaml
- View Results:
# Results saved to ./results/1/...
# Check metadata.json for run details
# Generated plots and collected files are stored directly in ./results/1/
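Assuming the run was assigned ID 1, you can list the run directory directly or use the built-in inspect command:

```bash
ls ./results/1/                 # collected files and generated plots
cat ./results/1/metadata.json   # parameters, stages, and custom metadata
benchctl inspect 1              # inspect the run via the CLI
```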
- Default engine: seaborn (Python) via uv run with an embedded script (PEP 723).
- Alternative: gonum (pure Go).
- You need python >= 3.10 and uv available on your PATH.
- First run downloads Python deps into uv's cache; subsequent runs are fast.
- No virtualenvs or repo Python files required; everything is embedded and invoked via uv run.
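To check the prerequisites (the one-liner below is uv's official installer script; see the uv documentation for other installation methods):

```bash
python3 --version   # needs >= 3.10
uv --version        # uv must be on your PATH
# install uv if it is missing
curl -LsSf https://astral.sh/uv/install.sh | sh
```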
Define execution environments:
hosts:
local: {} # Local host
remote:
ip: 10.0.0.1
username: benchmark
key_file: ~/.ssh/benchmark_key
    password: optional_password
Stages are sequential workflow steps: they are executed in the order they are defined, and each stage must have a unique name.
stages:
- name: build
host: local
command: make build
- name: deploy
host: remote
script: deploy.sh
health_check:
type: http
target: "http://localhost:8080/health"
- name: monitor-resources
host: local
command: ./scripts/monitor.sh
background: true # keeps running until the workflow shuts it down safely
- name: load-test
host: local
script: load-test.sh
outputs:
- name: metrics
        remote_path: /tmp/metrics.csv
Stages run through a shell command. Set benchmark.shell to control it (the default is bash -lic, which loads the login + interactive environment: PATH, JAVA_HOME, etc.). Override it per stage with stages[].shell.
Note: You cannot pass arguments to a script like script.sh <args>. Use command instead.
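A minimal sketch of both options (the sh -c value and the --rps flag are purely illustrative):

```yaml
benchmark:
  name: shell-example
  output_dir: ./results
  shell: bash -lc        # global override; bash -lic is used when unset

stages:
  - name: quick-step
    host: local
    shell: sh -c         # per-stage override (illustrative value)
    command: echo "runs under sh"
  - name: load-test
    host: local
    # scripts cannot take arguments, so pass them through command instead
    command: ./load-generator.sh --rps 500
```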
- Use host for a single host or hosts for multiple hosts. If neither is set, the stage runs on local.
- Hosts in hosts execute sequentially in the listed order.
- Outputs collected from multi-host stages are suffixed as outputName__host.ext (see the sketch after the example below).
- If append_metadata is enabled with multiple hosts, only the first host's output is used (a warning is logged).
Example:
stages:
- name: run-everywhere
hosts: [vm1, vm2]
    command: uname -a
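A sketch of how multi-host outputs are named (host names and paths are illustrative):

```yaml
stages:
  - name: collect-sysinfo
    hosts: [vm1, vm2]
    command: uname -a > /tmp/sysinfo.txt
    outputs:
      - name: sysinfo
        remote_path: /tmp/sysinfo.txt
# The collected files appear in the run directory as sysinfo__vm1.txt and sysinfo__vm2.txt
```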
- Set stages[].skip: true to skip a stage.
- Or pass benchctl run --skip <stage-name> multiple times (CLI overrides config).
- The metadata.json stored in each run directory records exactly which stages were executed and which were skipped.
Background stages run alongside the rest of the workflow. benchctl keeps them alive until the final non-background stage finishes, then sends SIGTERM to the stage's process group, waits BackgroundTerminationGrace (2 seconds by default), and finally sends SIGKILL if they are still running.
This uses setsid to start a new process group, so the entire background task tree is terminated reliably.
Their outputs are collected after shutdown, which makes background stages ideal for monitoring tasks such as resource-usage tracking.
Background stages cannot use append_metadata.
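For example, a monitoring script just needs to keep writing its data and exit cleanly on SIGTERM. A Linux-only sketch (file path, column names, and sampling interval are arbitrary):

```bash
#!/usr/bin/env bash
# monitor.sh - sample available memory once per second until benchctl shuts the stage down
OUT=/tmp/metrics.csv
echo "timestamp,mem_available_kb" > "$OUT"

# Exit cleanly when the process group receives SIGTERM (within the 2s grace period)
trap 'exit 0' TERM

while true; do
  mem=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
  echo "$(date +%s),$mem" >> "$OUT"
  sleep 1
done
```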
Define CSV column types for validation and plotting:
data_schema:
format: csv
columns:
- name: timestamp
type: timestamp
unit: s
format: unix # optional; supported: unix, unix_ms, unix_us, unix_ns, rfc3339, rfc3339_nano, iso8601
- name: latency_ms
type: float
unit: ms
- name: status
      type: string
Auto-generated visualizations:
plots:
- name: latency-histogram
title: Latency Distribution
source: metrics
type: histogram
x: latency_ms
- name: throughput-timeseries
title: Requests Over Time
source: metrics
type: time_series
x: timestamp
y: requests_per_second
groupby: pod_name
    engine: seaborn
- Set groupby (seaborn engine only) to split plots by a categorical column.
- Leave engine unset to use the seaborn backend; set it to gonum for Go-rendered plots (no Python needed).
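For instance, the same histogram could be rendered with the Go engine by reusing the fields shown above (a sketch, not an exhaustive list of options):

```yaml
plots:
  - name: latency-histogram-go
    title: Latency Distribution
    source: metrics
    type: histogram
    x: latency_ms
    engine: gonum
```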
# Run benchmark
benchctl run --config benchmark.yaml
# Skip stages by name
benchctl run --config benchmark.yaml --skip setup --skip warmup
# Add custom metadata
benchctl run --config benchmark.yaml --metadata "someFeature"="true" --metadata "someOtherFeature"="false"
# Pass environment variables to stages
benchctl run --config benchmark.yaml -e BRANCH=main -e LG_MAX_RPS=2000
# Inspect a run
benchctl inspect <run-id>
Append JSON metadata directly from a stage by enabling append_metadata.
In your configuration, you can add a stage that prints some metadata in JSON format to stdout:
stages:
- name: analyse
host: local
command: |
uv run python - <<'PY'
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
import json
# Compute something and print a single JSON object
print(json.dumps({
"latency_p50_ms": "123.4",
"notes": "baseline run"
}))
PY
    append_metadata: true
At runtime, the JSON keys/values are stringified and merged into the run’s metadata under custom.
This is useful for annotating runs with metadata that depends on a stage's output: for example, run a data-analysis script as a stage and append its summary statistics so that runs can be compared easily.
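Illustratively, the metadata.json of the run above would then contain the appended keys under custom (surrounding fields omitted; the exact layout may differ):

```json
{
  "custom": {
    "latency_p50_ms": "123.4",
    "notes": "baseline run"
  }
}
```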
During stage execution, the following environment variables are exported for commands/scripts:
- BENCHCTL_RUN_ID: the current run ID
- BENCHCTL_RUN_DIR: absolute path to the run directory (e.g., ./results/1)
- BENCHCTL_OUTPUT_DIR: benchmark output root (from benchmark.output_dir)
- BENCHCTL_CONFIG_PATH: set if provided in the environment when invoking benchctl
- BENCHCTL_BIN: absolute path to the running benchctl binary
Use these to locate inputs/outputs or to parameterize your scripts.
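For example, a stage script can drop its artifacts straight into the current run directory (the file names here are illustrative):

```bash
#!/usr/bin/env bash
# Place a summary next to metadata.json inside this run's directory
echo "run ${BENCHCTL_RUN_ID} finished on $(hostname)" > "${BENCHCTL_RUN_DIR}/summary.txt"

# Keep shared artifacts in the benchmark output root, one level above the run directories
cp build.log "${BENCHCTL_OUTPUT_DIR}/latest-build.log"
```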
See the examples/ directory for complete benchmark configurations.
- Local container testing
- Add a local Web UI for viewing benchmark results and plots.
MIT