perf: node-native benchmark runner (+ BMF output)#388
perf: node-native benchmark runner (+ BMF output)#388charliecreates[bot] wants to merge 10 commits intomainfrom
Conversation
|
orrery preview: https://pr-388.orrery-c4f.pages.dev/ |
|
@CharlieHelps please review this PR |
There was a problem hiding this comment.
Key issues: the CLI and built-in suite resolution are currently process.cwd()-dependent and will break when run outside the repo root, and fixture resolution also uses process.cwd() which can lead to non-deterministic or unsafe kernel loading. normalizeFixtureRefs() uses an unsafe cast that makes it brittle when validate() is bypassed. The node-native runner’s timing conversion can lose precision for long durations, and meta-kernel symbol/value mismatches are silently truncated, which will produce confusing resolution behavior.
Summary of changes
What changed
New benchmark CLI entrypoint
- Added a root-level
benchscript that builds@rybosome/tspice-perf-analysisand runspackages/perf-analysis/dist/cli/bench.js. - Introduced
packages/perf-analysis/src/cli/bench.tsimplementing a minimal argument parser for--backend,--suite, and--outDir, with built-in suite resolution underpackages/perf-analysis/src/suites/yaml/v1/.
v1 contract implementation (parse/validate/normalize)
- Implemented YAML parsing via
yamlpackage (parseYaml.ts). - Added runtime validation for the v1 benchmark contract (
validate.ts), including fixture reference shorthands. - Added structural normalization of fixture references (
normalizeFixtureRefs.ts) to canonicalFixtureRefshapes. - Expanded
FixtureRefV1to be YAML-friendly (types.ts).
Node-native benchmark runner + outputs
- Added
packages/perf-analysis/src/runners/node-native/*:runNodeNativeBench.ts: warmup + measured iterations, per-op latency sampling, p50/p95 + throughput, writesraw.jsonandbmf.json.dispatch.ts: maps suitecallstrings to backend invocations.metaKernel.ts: expands.tmmeta-kernels safely with optional directory restriction and sanitization.stats.ts: quantile helper.
- Re-exported node-native runner types and
runNodeNativeBenchfrompackages/perf-analysis/src/runners/index.ts.
Fixture resolution
- Implemented
$FIXTURESpath resolution, fixture id resolution underfixtureRoots, and fixture-pack directory aliasing (dir -><dir>/<basename>.tm) inresolveFixtureRef.ts. - Extended
ResolvedFixtureRefwith optionalrestrictToDirto constrain meta-kernel expansion.
Docs + suite
- Updated
packages/perf-analysis/README.mdwith instructions forpnpm bench --backend node-native --suite microand output locations. - Added suite file
packages/perf-analysis/src/suites/yaml/v1/micro.yml.
Package dependencies
- Added dependencies to
packages/perf-analysis/package.json:@rybosome/tspiceandyaml(and corresponding lockfile updates).
packages/perf-analysis/src/contracts/benchmark-contract/v1/normalizeFixtureRefs.ts
Show resolved
Hide resolved
packages/perf-analysis/src/contracts/benchmark-contract/v1/normalizeFixtureRefs.ts
Show resolved
Hide resolved
packages/perf-analysis/src/shared/fixtures/resolveFixtureRef.ts
Outdated
Show resolved
Hide resolved
|
@CharlieHelps I need your help preparing this PR to merge. You should read the diff, related issues/docs, and PR reviews closely and then ensure ALL of the following is true:
It's your job to push commits to the PR to complete all of these tasks. You MUST wait to confirm that ALL of the above is true before stopping. I expect to be able to merge the PR after you are done. |
|
@rybosome Pushed the 4 local merge-readiness commits to :\n- 8fdaceb\n- bceaa58\n- 112c4ac\n- 1f1f1cc (HEAD)\n\nAll CI checks are green (Deploy production is skipped as expected). Branch is up-to-date and GitHub reports no merge conflicts (mergeable: MERGEABLE).\n\nResolved all 9 review threads listed in the task. Ready to merge once review requirement is satisfied. Remaining blocker: an approving review from a human reviewer (I can’t approve my own PR). |
|
@CharlieHelps please conduct a thorough PR review |
|
Pulled down PR #388 locally and ran High-levelThis is a nice chunk of functionality:
What looks strong
Suggestions / nits (mostly non-blocking)
Overall: this looks close to merge-ready to me — the only thing I’d strongly consider tightening before shipping is the Windows/absolute-path edge case in the meta-kernel restriction helper. |
@CharlieHelps sounds good, please fix the pathin edge case in the meta-kernel restriction helper so that we can merge this. |
|
(Edited) Formatting issue in previous comment — please ignore. See follow-up: #388 (comment) |
|
Pushed a fix for the Windows cross-drive escape case.
|
|
@CharlieHelps please conduct a thorough review of this PR |
|
Thanks for the ping — I pulled PR #388 and did a pass over the contract/fixture/runner pieces. Verification
What looks good
One fix I’d like before merge (Windows cross-drive escape)You patched On Windows, Suggested fix is to mirror the backend-verify logic: const rel = path.relative(allowedDirReal, resolvedReal)
if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
throw new Error(...)
}Nits / follow-ups
|
|
@CharlieHelps thank you for the additional review, my replies inline.
Agreed, please add this to the PR
good suggestion, please add this
please go with timestamped subdir default |
|
✅ Pushed the requested updates in commit cef0db644b4a68f0e15e0b3516dad0cd68323cb0:\n\n- Hardened to also reject cross-drive Windows returning absolute paths (), matching backend-verify.\n- Documented metric units: are , is (BMF has no units field). Updated and bench CLI help.\n- Default output dir now uses a timestamped subdir to avoid clobbering; still uses the exact provided directory.\n- Verified
Compliance file check passed.
Checking formatting...
packages/tspice-native-darwin-arm64 | WARN Unsupported platform: wanted: {"cpu":["arm64"],"os":["darwin"],"libc":["any"]} (current: {"os":"linux","cpu":"arm64","libc":"glibc"})
• Packages in scope: @rybosome/orrery, @rybosome/tspice, @rybosome/tspice-backend-contract, @rybosome/tspice-backend-fake, @rybosome/tspice-backend-shim-c, @rybosome/tspice-backend-verify, @rybosome/tspice-backend-wasm, @rybosome/tspice-bench-contract, @rybosome/tspice-core, @rybosome/tspice-native-darwin-arm64, @rybosome/tspice-native-darwin-x64, @rybosome/tspice-native-linux-x64-gnu, @rybosome/tspice-perf-analysis, @rybosome/tspice-web-components Tasks: 11 successful, 11 total
OK
• Packages in scope: @rybosome/tspice-backend-contract Tasks: 1 successful, 1 total • Packages in scope: @rybosome/orrery, @rybosome/tspice, @rybosome/tspice-backend-contract, @rybosome/tspice-backend-fake, @rybosome/tspice-backend-node, @rybosome/tspice-backend-shim-c, @rybosome/tspice-backend-verify, @rybosome/tspice-backend-wasm, @rybosome/tspice-bench-contract, @rybosome/tspice-core, @rybosome/tspice-native-darwin-arm64, @rybosome/tspice-native-darwin-x64, @rybosome/tspice-native-linux-x64-gnu, @rybosome/tspice-perf-analysis, @rybosome/tspice-web-components Tasks: 11 successful, 11 total
• Packages in scope: @rybosome/orrery, @rybosome/tspice, @rybosome/tspice-backend-contract, @rybosome/tspice-backend-fake, @rybosome/tspice-backend-shim-c, @rybosome/tspice-backend-verify, @rybosome/tspice-backend-wasm, @rybosome/tspice-bench-contract, @rybosome/tspice-core, @rybosome/tspice-native-darwin-arm64, @rybosome/tspice-native-darwin-x64, @rybosome/tspice-native-linux-x64-gnu, @rybosome/tspice-perf-analysis, @rybosome/tspice-web-components Tasks: 20 successful, 20 total
• Packages in scope: @rybosome/orrery, @rybosome/tspice, @rybosome/tspice-backend-contract, @rybosome/tspice-backend-fake, @rybosome/tspice-backend-node, @rybosome/tspice-backend-shim-c, @rybosome/tspice-backend-verify, @rybosome/tspice-backend-wasm, @rybosome/tspice-bench-contract, @rybosome/tspice-core, @rybosome/tspice-native-darwin-arm64, @rybosome/tspice-native-darwin-x64, @rybosome/tspice-native-linux-x64-gnu, @rybosome/tspice-perf-analysis, @rybosome/tspice-web-components Tasks: 11 successful, 11 total
|
|
✅ Added the requested updates in commit:
|
|
@CharlieHelps please conduct a thorough review of this PR |
There was a problem hiding this comment.
Thanks — this is a big step toward reproducible perf runs and the contract-v1 pipeline reads nicely.
What looks solid
parse → validate → normalizeseparation underpackages/perf-analysis/src/contracts/benchmark-contract/v1/is clean and makes runner logic much simpler.- Fixture resolution (
$FIXTURES+ escape protection + the “fixture-pack dir alias” meta-kernel restriction) feels thoughtfully designed (packages/perf-analysis/src/shared/fixtures/resolveFixtureRef.ts). - Runner structure (warmup vs measured, isolate between iterations) is clear, and the missing-native-addon remediation message is excellent (
.../runners/node-native/runNodeNativeBench.ts). - Emitting both
raw.jsonandbmf.jsonis a great debugging + pipeline combo.
Things to double-check / consider tightening
-
Docs/description vs behavior on output dir + overwrite
- PR text mentions default
./benchmarks/results+ overwrite, butbench.tsappears to default to a timestamped subdir (benchmarks/results/<timestamp>). - Either way,
writeFile()will silently clobber existing files in a chosen--outDir. - Suggestion: align README/PR text with actual default, and consider an explicit
--overwrite/--no-clobberchoice (or at least warn when outputs already exist).
- PR text mentions default
-
Atomic writes
- Current
writeFile(raw.json)/writeFile(bmf.json)isn’t atomic. A crash mid-write can leave truncated JSON. - Consider
write tmp → renamefor each output.
- Current
-
BMF semantics / units
- Metrics are implicitly
ns/opfor latency andops/secfor throughput. That’s fine, but worth making very explicit since BMF has no unit field. - Throughput uses mean while latency uses quantiles — also fine, but I’d document that as a deliberate choice.
- Metrics are implicitly
-
Suite YAML extensibility
validate()rejecting unknown top-level keys is great for catching typos, but may make forward-compat painful. If you expect extension, maybe allowx-...keys (or similar) rather than hard erroring.
-
Timing + warmup semantics
- Timer excludes kernel loading (
furnshAll()happens beforehrtime), which seems intentional — but I’d call it out in docs so consumers know what’s being measured. - Warmup isolates per warmup-iteration (not per op). Probably fine; just confirming intent.
- Timer excludes kernel loading (
-
Native addon detection brittleness
- If any logic relies on regex-ing error messages for “addon missing”, that can be brittle across Node/platforms. If you see flakes, a more structured detection (expected path checks / typed error) might be worth it.
Tests / hardening follow-ups
I didn’t see new unit coverage for the highest-risk bits:
- contract v1 validate/normalize (
unknown keys, fixture ref forms, null handling) $FIXTURESresolution (multi-root selection, escape attempts, Windows drive-letter behavior)- fixture-pack dir alias + meta-kernel restriction behavior
stats.tsquantiles + BMF mapping shape
Local verification
pnpm checkpassed for me.pnpm bench --backend node-native ...fails on linux-arm64 due to missing native addon (expected), and the error message is clear + actionable.
|
Done — I left a thorough PR review here: #388 (review) |
|
@CharlieHelps thank you for the review, replies inline
Please update the README/PR text with reality (the code behavior)
Sounds good, please do this
Ok, please document this thoroughly.
No extension expected.
Can you explain this one more clearly? Not sure this is fine.
ok, please add unit test coverage for these highest-risk bits.
|
|
Re: #388 (comment) Updates in the latest push:
BMF units / semanticsBMF doesn’t have a unit field, so the docs are explicit about implied units:
(See the “BMF semantics + units” section in Timing + warmup semantics
Tests / verificationAdded unit tests covering:
And |
Closes #337
Summary
$FIXTURESin paths)raw.jsonandbmf.jsonoutputspnpm bench --backend node-native --suite microBMF mapping note
We emit
latency_p50andlatency_p95as separate measures (instead of using BMFupper_value) becauseupper_valueis just a generic bound, while distinct measures are first-class for alerting/thresholding.Output directory / overwrite behavior
./benchmarks/results/<YYYYMMDD-HHmmss>/(timestamped per run to avoid clobbering).--outDirwrites directly to the chosen directory.raw.json/bmf.jsonalready exist in the output dir, they are replaced (via atomic tmp→rename).Native prerequisite
The
node-nativebackend requires the native addon to be built/available; if it's missing, the runner exits with a clear prerequisite/failure message.