[Tracking] March 2026 benchmark findings baseline #1

@Simonsbs

Context

Latest benchmark snapshots (March 2, 2026) show mixed results:

  • Apples-to-apples sum6 runtime: L0 501.81 Mops/s vs GCC 501.09 Mops/s (near parity).
  • Build throughput on the same kernel: L0 1250 ops/s vs GCC 60 ops/s.
  • Real LLM cmd benchmark (qwen2.5:1.5b): verify success 0%, semantic success 0% across 6 tasks.
  • Token compactness remains favorable: avg L0/C token ratio 0.6202.
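For quick reference, the L0-vs-GCC ratios implied by the figures above can be derived as in the sketch below. The variable names are illustrative only and are not taken from the benchmark tooling; the numbers are copied from this snapshot.

```python
# Figures copied from the March 2, 2026 snapshot in this issue.
# The dictionary and variable names are illustrative, not part of any L0 tooling.
runtime_mops = {"L0": 501.81, "GCC": 501.09}  # sum6 runtime, Mops/s
build_ops = {"L0": 1250.0, "GCC": 60.0}       # build throughput, ops/s

# Ratio > 1 means L0 is faster than GCC on that metric.
runtime_ratio = runtime_mops["L0"] / runtime_mops["GCC"]
build_ratio = build_ops["L0"] / build_ops["GCC"]

print(f"runtime L0/GCC: {runtime_ratio:.4f}")  # about 1.0014 (near parity)
print(f"build   L0/GCC: {build_ratio:.1f}x")   # about 20.8x
```

Runtime is within about 0.15% of GCC, while build throughput is roughly 20.8 times higher on this kernel.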

Goal

Track this snapshot as the baseline and keep follow-up work linked to concrete deltas.

Tasks

  • Add links to the latest benchmark docs/wiki in this issue.
  • Track improvement deltas for runtime throughput, build throughput, verify success, semantic success, and token ratio.
  • Close only after successor tasks have landed and new benchmark reports are published.
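One possible way to track the deltas listed above is sketched below. This is a minimal illustration, not existing project tooling: the `Snapshot` type, its field names, and the `deltas` helper are hypothetical, and treating a lower token ratio as an improvement is an assumption based on the "compactness" wording. Only the baseline values come from this issue.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical container for the five tracked L0 metrics."""
    runtime_mops: float   # sum6 runtime throughput, Mops/s
    build_ops: float      # build throughput, ops/s
    verify_pct: float     # LLM verify success, percent
    semantic_pct: float   # LLM semantic success, percent
    token_ratio: float    # avg L0/C token ratio


# Baseline values copied from the March 2, 2026 snapshot in this issue.
BASELINE = Snapshot(501.81, 1250.0, 0.0, 0.0, 0.6202)

# Assumption: a smaller L0/C token ratio means more compact output, so lower is better.
LOWER_IS_BETTER = {"token_ratio"}


def deltas(baseline: Snapshot, current: Snapshot) -> dict:
    """Return {metric: (absolute change, improved?)} relative to the baseline."""
    result = {}
    for name, base in asdict(baseline).items():
        change = getattr(current, name) - base
        improved = change < 0 if name in LOWER_IS_BETTER else change > 0
        result[name] = (change, improved)
    return result


# Usage with a made-up successor snapshot (these numbers are NOT real results).
example = Snapshot(510.0, 1250.0, 50.0, 0.0, 0.60)
for metric, (change, improved) in deltas(BASELINE, example).items():
    print(f"{metric}: {change:+.4f} ({'improved' if improved else 'no improvement'})")
```

Reporting each metric as a signed change plus an improved flag keeps follow-up issues tied to a concrete delta against this baseline.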

References

  • docs/PERFORMANCE_COMPARISON_APPLES_TO_APPLES.md
  • docs/PERFORMANCE_COMPARISON.md
  • docs/LLM_BENCHMARK_RESULTS.md

Metadata

Labels

  • area/benchmark: Benchmark methodology and automation
  • area/llm-usability: LLM generation reliability and token efficiency
  • area/perf: Compiler/runtime performance workstreams
  • priority/high: High priority
  • type/tracking: Tracking/meta issue