feat(benchmarks): competitive comparison docs and JSON output (closes #7)#18

Merged
ringo380 merged 3 commits into main from feature/metal-benchmarking-docs
Apr 3, 2026

Conversation


@ringo380 ringo380 commented Apr 2, 2026

Summary

  • Updates METAL_GPU_RESULTS.md with a competitive comparison table (Inferno vs llama.cpp, Ollama, LM Studio on M4 Max), projected throughput across all Apple Silicon variants based on memory bandwidth scaling, and a community contribution guide
  • Adds --output-json <file> flag to cargo run -- bench for machine-readable result export
  • Expands the ignored test_inference_performance test to report the full metric set required by Test Metal GPU Performance Across Apple Silicon Variants #7: hardware info, load time, per-run throughput, latency stats
  • Adds scripts/benchmark_metal.sh — one-command standardized benchmark that writes a JSON results file contributors can share in Test Metal GPU Performance Across Apple Silicon Variants #7

Test plan

  • cargo test --lib — 900 tests pass
  • cargo test --test metal_performance_tests --features gguf — 4 pass, 1 ignored (requires real model)
  • Manual: cargo run --release -- bench --model <gguf> --output-json /tmp/result.json to verify JSON output
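
For reference, a sketch of what an --output-json export might contain. Only hostname, total_memory_gb, and os_version are field names confirmed by this PR; every other key and value below is an illustrative assumption, not the actual schema:

```json
{
  "hostname": "m4-max.local",
  "total_memory_gb": 64.0,
  "os_version": "macOS 15.3",
  "model": "example-8b-q4_k_m.gguf",
  "load_time_ms": 1840,
  "runs": [
    { "tokens_per_second": 62.4, "latency_ms": 512.8 }
  ]
}
```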

ringo380 added 2 commits April 2, 2026 17:32
Run npm audit fix to patch all non-breaking security issues:
- next: 15.5.11 → 15.5.14 (HTTP request smuggling, image cache growth)
- axios: DoS via __proto__ in mergeConfig
- socket.io-parser: unbounded binary attachments
- serialize-javascript: RCE via RegExp/Date
- storybook: dev server WebSocket hijacking
- lodash: prototype pollution + code injection
- brace-expansion, minimatch, picomatch, ajv: ReDoS vulnerabilities

10 low-severity vulnerabilities remain in deep transitive chains
(elliptic via @storybook/nextjs, @tootallnate/once via jest-environment-jsdom)
that require breaking changes to fix and affect dev tooling only.

Closes #2
… issue #7

- Update METAL_GPU_RESULTS.md with competitive comparison table (vs
  llama.cpp, Ollama, LM Studio), projected throughput across all Apple
  Silicon variants based on memory bandwidth scaling, and a community
  contribution guide for other hardware owners
- Add --output-json flag to bench CLI for machine-readable result export
- Expand ignored test_inference_performance to report full metric set
  (hardware info, load time, per-run breakdown, latency stats)
- Add scripts/benchmark_metal.sh for standardized one-command benchmarking

Closes #7

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 35872e6a44


echo "Tokens: $TOKENS per iteration"
echo ""

cargo run --release -- bench \

P1: Add the gguf feature flag when invoking the benchmark command

The new benchmark script calls cargo run --release -- bench without enabling any model backend feature, so a default checkout builds with default features only (which exclude gguf), and bench --model *.gguf cannot resolve a backend at runtime. The advertised one-command Metal benchmark therefore fails unless users manually add features; invoke Cargo with --features gguf (or an equivalent feature set) here so the script works out of the box.
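
A minimal sketch of the suggested fix; MODEL and OUTPUT_FILE are assumed variable names for illustration, not identifiers from the actual script:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical patch to scripts/benchmark_metal.sh: build with the gguf
# backend feature enabled so `bench --model *.gguf` can resolve a backend.
MODEL="${MODEL:-model.gguf}"
OUTPUT_FILE="${OUTPUT_FILE:-/tmp/result.json}"

cargo run --release --features gguf -- bench \
  --model "$MODEL" \
  --output-json "$OUTPUT_FILE"
```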


let mut sys = System::new_all();
sys.refresh_all();
let hostname = sys.host_name().unwrap_or_else(|| "unknown".to_string());
let total_memory_gb = sys.total_memory() as f64 / 1_073_741_824.0;

P2: Convert sysinfo memory units before reporting total RAM

sysinfo in this repo is pinned to 0.29.x, where total_memory() reports KiB; dividing by 1_073_741_824.0 treats the value as bytes and underreports RAM by roughly 1024x (typical 32–64 GB machines print near 0.0 GB). Because this test output is intended for benchmark submissions, the hardware metadata is misleading unless the conversion uses KiB → GiB scaling.

Useful? React with 👍 / 👎.
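
A minimal sketch of the suggested conversion, assuming sysinfo 0.29's KiB units; the function name kib_to_gib is mine, not from the codebase:

```rust
/// Convert sysinfo 0.29's KiB figure to GiB. The correct divisor is
/// 1024^2 (KiB -> GiB), not the 1024^3 bytes -> GiB divisor the test used.
fn kib_to_gib(kib: u64) -> f64 {
    kib as f64 / 1_048_576.0
}
```

A 64 GiB machine reports 67_108_864 KiB, which this maps back to 64.0; the bytes divisor would have printed roughly 0.06 instead.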

- Fix inverted token heuristic in performance test: was dividing by 0.75
  (inflating count ~33%); now uses chars/4 to match bench CLI exactly
- Fix median calculation to handle partial inference failures: use the
  proper odd/even formula instead of indexing len()/2, which is wrong for
  even-length arrays
- Add hardware fields to BenchmarkJsonResult (hostname, total_memory_gb,
  os_version) so JSON exports are self-contained for issue #7 contributors
- Validate --output-json parent directory exists before running benchmark
  to avoid a cryptic OS error after a full run completes
- Fix sysinfo memory conversion to divide by 1_048_576.0 (KiB → GiB),
  since sysinfo 0.29 reports total_memory() in KiB
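
The token-heuristic and median fixes described above can be sketched as follows; estimate_tokens and median are illustrative names, not the repo's actual identifiers:

```rust
/// chars/4 heuristic, matching the bench CLI (the old test divided by
/// 0.75, inflating the count by about a third).
fn estimate_tokens(text: &str) -> usize {
    text.chars().count() / 4
}

/// Median of an already-sorted slice, correct for both odd and even
/// lengths (plain `sorted[len / 2]` is off for even-length input).
fn median(sorted: &[f64]) -> Option<f64> {
    match sorted.len() {
        0 => None,
        n if n % 2 == 1 => Some(sorted[n / 2]),
        n => Some((sorted[n / 2 - 1] + sorted[n / 2]) / 2.0),
    }
}
```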
@ringo380 ringo380 merged commit 95e7998 into main Apr 3, 2026
7 of 20 checks passed
@ringo380 ringo380 deleted the feature/metal-benchmarking-docs branch April 3, 2026 00:06