Performance Baselines
I use this document as the frozen performance-baseline contract for the current bootstrap toolchain.
Contract version: perfbase.v1
I freeze deterministic throughput floors (ops/sec) for representative CLI workloads in the pinned Linux x86-64 bootstrap environment.
The gated workloads are:
- `verify` on `tests/valid_min.l0`
- `build` on `tests/valid_min.l0`
- `run` on the `valid_min` image (2-arg arithmetic kernel)
- `run` on the `valid_sysv_abi_sum6_lowered` image (6-arg SysV kernel)
- `build-elf` on `tests/valid_sysv_abi_sum6_lowered.l0`
- `mapcat`, `schemacat`, `tracecat`, `tracejoin` on deterministic trace artifacts
I enforce this contract in `tests/performance_gates.sh`.
That harness is integrated into `tests/run.sh` and is therefore enforced by `make test`.
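The shape of such a gate can be sketched as a tiny shell harness. This is illustrative only: `measure_ops_per_sec`, `check_floor`, and the `demo.noop` workload are my hypothetical names, not the internals of `tests/performance_gates.sh`.

```sh
#!/bin/sh
# Illustrative throughput-floor gate; function names and the demo workload
# are hypothetical, not the real harness internals.
set -eu

# Time `iters` runs of a command and report integer ops/sec
# (uses GNU date's nanosecond %N, available on the pinned Linux target).
measure_ops_per_sec() {
  iters=$1; shift
  start=$(date +%s%N)
  i=0
  while [ "$i" -lt "$iters" ]; do
    "$@" > /dev/null 2>&1
    i=$((i + 1))
  done
  end=$(date +%s%N)
  echo $(( iters * 1000000000 / (end - start) ))
}

# Fail loudly when measured throughput drops below the frozen floor.
check_floor() {
  name=$1; floor=$2; actual=$3
  if [ "$actual" -lt "$floor" ]; then
    echo "FAIL $name: $actual ops/s < floor $floor" >&2
    return 1
  fi
  echo "PASS $name: $actual ops/s >= floor $floor"
}

ops=$(measure_ops_per_sec 100 true)   # stand-in workload, not a real CLI run
check_floor "demo.noop" 1 "$ops"
```

A real gate would substitute the pinned workloads and floors for the stand-ins above; the pass/fail contract stays the same.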
I also run this automatically in GitHub Actions:
- `.github/workflows/performance.yml` runs on a nightly schedule and on version tags (`v*`).
- CI uses a blocking smoke gate (`tests/ci_smoke_bench.sh`) plus a non-blocking full-gate observation run (`tests/performance_gates.sh`), because hosted-runner throughput is less stable than my pinned local baseline environment.
The frozen v1 floors are:

| Workload | Floor (ops/s) |
| --- | --- |
| `verify.valid_min` | >= 1800 |
| `build.valid_min` | >= 1300 |
| `run.add` | >= 2400 |
| `run.sum6` | >= 2200 |
| `build-elf.sum6` | >= 1100 |
| `mapcat.trace_map` | >= 2500 |
| `schemacat.trace_schema` | >= 2500 |
| `tracecat.trace_bin` | >= 2500 |
| `tracejoin.trace_bin+map` | >= 2300 |
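One way a gate script could keep these floors as data rather than hard-coded branches is a plain name/floor table. The floor values below come from this contract, but the two-column format and the loop are my sketch, not the harness's actual layout:

```sh
# Floors from this contract as a plain name/floor table; the two-column
# format is illustrative, not the harness's own storage format.
floors='verify.valid_min 1800
build.valid_min 1300
run.add 2400
run.sum6 2200
build-elf.sum6 1100
mapcat.trace_map 2500
schemacat.trace_schema 2500
tracecat.trace_bin 2500
tracejoin.trace_bin+map 2300'

# A gate loop would measure each workload and compare against its floor;
# here we just print the contract.
echo "$floors" | while read -r name floor; do
  printf '%s must sustain >= %s ops/s\n' "$name" "$floor"
done
```

Keeping floors as data makes the `perfbase.v1` pin auditable in one place.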
I support a CI profile for hosted runners by setting `L0_PERF_PROFILE=ci`.
This uses conservative floors intended for shared GitHub-hosted runner variance. My local/pinned production profile remains the default when this variable is not set.
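A minimal sketch of how a harness can branch on `L0_PERF_PROFILE`: only the variable name and the 1800 default floor come from this document; the `ci` value of 900 is a placeholder I made up, not the repository's actual CI floor.

```sh
# Select a floor set by profile. The 900 ops/s CI value is a placeholder;
# only L0_PERF_PROFILE=ci and the 1800 default come from the contract.
PROFILE="${L0_PERF_PROFILE:-local}"

case "$PROFILE" in
  ci)
    VERIFY_FLOOR=900    # conservative placeholder for shared hosted runners
    ;;
  *)
    VERIFY_FLOOR=1800   # pinned local baseline floor (the default profile)
    ;;
esac

echo "profile=$PROFILE verify.valid_min floor=$VERIFY_FLOOR"
```

Defaulting to the strict local floors when the variable is unset keeps the pinned baseline as the profile of record.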
This contract explicitly excludes:

- Cross-machine or cross-CPU performance comparability guarantees.
- Full benchmarking methodology for release marketing claims.
- Auto-tuning of thresholds; v1 thresholds are pinned and explicitly versioned.
For apples-to-apples comparison snapshots, I now capture reproducibility metadata and confidence hints in:
- `docs/PERFORMANCE_COMPARISON_APPLES_TO_APPLES.md`
- `docs/PERFORMANCE_COMPARISON_APPLES_TO_APPLES.json`
I can control this benchmark behavior with:
- `L0_A2A_PIN_CPU` (optional CPU affinity, e.g. `0`)
- `L0_A2A_BUILD_ITERS`
- `L0_A2A_BUILD_SAMPLES`
- `L0_A2A_RUNTIME_ITERS`
- `L0_A2A_RUNTIME_SAMPLES`
- `L0_A2A_WARMUP_RUNS`
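Of these, `L0_A2A_PIN_CPU` is the mechanically interesting one. A sketch of honoring it with `taskset`: the `run_cmd` wrapper is my illustration of the idea, not the benchmark's actual code.

```sh
# Honor L0_A2A_PIN_CPU via taskset when set; run unpinned otherwise.
# The run_cmd wrapper is illustrative, not the benchmark's real code.
PIN="${L0_A2A_PIN_CPU:-}"

if [ -n "$PIN" ]; then
  run_cmd() { taskset -c "$PIN" "$@"; }   # pin to the requested CPU, e.g. 0
else
  run_cmd() { "$@"; }                     # no affinity requested
fi

run_cmd echo "benchmark body would run here"
```

Pinning to one CPU reduces scheduler-migration noise, which is why it belongs in an apples-to-apples reproducibility profile.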
The report includes:
- CPU/model/topology metadata
- sample counts and warmup configuration
- median throughput values
- CI95 on runtime sample means
- outlier policy controls (`L0_A2A_TRIM_COUNT`) used for CI computation when enough samples are present
- stability warning threshold control (`L0_A2A_RUNTIME_CI95_PCT_WARN`)
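The median/trim/CI95 pipeline can be sketched with `sort` and `awk`. The sample values below are made up, and I assume a symmetric trim and a normal-approximation CI on the sample mean; the report's real policy may differ.

```sh
# Sketch of median + trimmed CI95 on runtime samples (made-up numbers).
# Assumes symmetric trimming and a normal-approximation CI on the mean.
samples='2400 2410 2395 2405 2600 2200 2402'
trim="${L0_A2A_TRIM_COUNT:-1}"

stats=$(echo "$samples" | tr ' ' '\n' | sort -n | awk -v trim="$trim" '
  { v[NR] = $1 }
  END {
    lo = trim + 1; hi = NR - trim   # drop `trim` lowest and highest samples
    n = hi - lo + 1
    for (i = lo; i <= hi; i++) sum += v[i]
    mean = sum / n
    for (i = lo; i <= hi; i++) ss += (v[i] - mean) ^ 2
    ci95 = 1.96 * sqrt(ss / (n - 1)) / sqrt(n)   # CI95 on the sample mean
    if (n % 2) median = v[lo + (n - 1) / 2]
    else median = (v[lo + n / 2 - 1] + v[lo + n / 2]) / 2
    printf "median=%d mean=%.1f ci95_pct=%.2f", median, mean, 100 * ci95 / mean
  }')
echo "$stats"
```

A warning threshold like `L0_A2A_RUNTIME_CI95_PCT_WARN` would then simply compare `ci95_pct` against its configured percentage.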