perf(assertion-executor): perf-only hot-path wins and profiling loop#675

Open
odyslam wants to merge 15 commits into main from codex/final-polish-bench

@odyslam (Contributor) commented Mar 14, 2026

Summary

  • lands only the perf-path changes extracted from the stacked executor PRs; no new cheatcodes or other business-logic feature work
  • keeps the executor, tracer, store, and precompile hot-path optimizations together with the benchmark/profiling scaffolding used to isolate and validate them
  • hardens the new perf paths with targeted tests, CI cleanup, and explanatory comments for reviewers

Review Guide

  1. Executor hot path
    • crates/assertion-executor/src/executor/mod.rs
    • crates/assertion-executor/src/executor/with_inspector.rs
    • empty-selector short-circuits, single-selector direct execution, and small-batch scheduling
  2. Tracer / store hot path
    • crates/assertion-executor/src/inspectors/tracer.rs
    • crates/assertion-executor/src/store/assertion_store.rs
    • crates/assertion-executor/src/inspectors/precompiles/legacy/get_logs.rs
    • crates/assertion-executor/src/inspectors/precompiles/legacy/state_changes.rs
    • cached log encoding, tx-log indexing, storage/write indexing, and lower-allocation trigger reads
  3. Benchmarking / profiling loop
    • crates/benchmark-utils/src/perf.rs
    • crates/benchmark-utils/src/bin/benchmark-utils-perf.rs
    • crates/assertion-executor/benches/
    • Makefile
    • .github/workflows/benchmarks.yml
    • .claude/skills/samply-profile/SKILL.md
    • phase-isolated Criterion benches plus the warmed samply entrypoint

Non-goals

  • no new cheatcodes
  • no new trigger semantics or assertion feature work
  • no intended behavior changes: the existing assertion path is preserved, and only setup and tracing overhead is reduced

Validation

  • cargo clippy -p assertion-executor -p benchmark-utils --all-targets --features test -- -D warnings -D clippy::pedantic
  • cargo test -p benchmark-utils
  • cargo test -p assertion-executor --lib
  • cargo bench --manifest-path crates/assertion-executor/Cargo.toml --features test --benches --no-run
  • RUSTFLAGS='-C opt-level=1' SIDECAR_BENCH_TRANSPORT=mock RUST_LOG=error cargo bench -p sidecar --features bench-utils --benches --no-run

Representative Bench Deltas Vs main

  • assertion_store::read/miss_nonexistent_assertion: about -97.7%
  • call_tracer_truncate/15k_deep_pending: about -19.4%
  • avg_block_100_aa: about -32.5%
  • erc20_transaction: about -65.5%
  • uniswap_transaction_aa: about -28.1%

Samply Follow-up

  • added warmed samply runs via benchmark-utils-perf
  • the warmed profiles currently point to assertion-side account-load / db-clone setup churn as the next likely optimization target

@github-actions
Benchmark report is ready.

@odyslam odyslam changed the title Fix assertion_store hot path regression Bucket executor performance improvements Mar 14, 2026
@odyslam odyslam changed the base branch from stack/06-final-polish to main March 14, 2026 12:16
@odyslam odyslam force-pushed the codex/final-polish-bench branch from 696078b to f0d5516 Compare March 14, 2026 13:13
@odyslam odyslam changed the title Bucket executor performance improvements perf(assertion-executor): extract perf-only executor and tracer wins Mar 14, 2026
odyslam and others added 7 commits March 14, 2026 15:29
…path

Assertions that match no trigger selectors were previously returned from
the store read path with an empty selector set, causing unnecessary setup
work (persistent account insertion, MultiForkDb creation) in the executor.

Changed read_adopter from .map() to .filter_map() so assertions with zero
matched selectors are dropped before reaching execution.

Tests: assertion_store suite (32/32 pass), full lib suite (210 pass, 1
pre-existing failure from missing test artifact)

Bench baseline (before change):
  assertion_store::read/hit_existing_assertion: 679.68 ns
  assertion_store::read/miss_nonexistent: 324.00 ns
  executor_transaction_performance/eoa_transaction_aa: 44.824 µs
  executor_transaction_performance/erc20_transaction_aa: 67.632 µs
  executor_transaction_performance/uniswap_transaction_aa: 387.06 µs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit a10e259)
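
For reviewers, the `.map()` to `.filter_map()` change described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the actual `read_adopter` code: `StoredAssertion`, `AssertionForExecution`, `read_matching`, and the trigger representation are hypothetical stand-ins chosen for the example.

```rust
// Hypothetical stand-ins for the store's types; names and fields are assumptions.
type Selector = [u8; 4];

#[derive(Debug, Clone, PartialEq)]
struct StoredAssertion {
    id: u32,
    // (trigger id, selector) pairs registered for this assertion.
    triggers: Vec<(u8, Selector)>,
}

#[derive(Debug, PartialEq)]
struct AssertionForExecution {
    id: u32,
    selectors: Vec<Selector>,
}

/// Returns only the assertions that have at least one selector matching a
/// fired trigger. With `.map()` every stored assertion would be returned,
/// including those whose selector list ends up empty.
fn read_matching(stored: &[StoredAssertion], fired: &[u8]) -> Vec<AssertionForExecution> {
    stored
        .iter()
        .filter_map(|a| {
            let selectors: Vec<Selector> = a
                .triggers
                .iter()
                .filter(|(t, _)| fired.contains(t))
                .map(|(_, s)| *s)
                .collect();
            if selectors.is_empty() {
                // Dropped here, so the executor never does setup work for it.
                None
            } else {
                Some(AssertionForExecution { id: a.id, selectors })
            }
        })
        .collect()
}
```

The point of the sketch is that pruning happens at the read boundary, so callers downstream never see an entry with an empty selector set.
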
…xecution

Both run_assertion_contract and run_assertion_contract_with_inspector now
return immediately when fn_selectors is empty, avoiding unnecessary
prepare_assertion_contract work (persistent account insertion, MultiForkDb
creation, PhEvmInspector cloning).

Combined with Phase 1 selector-pruning, this ensures zero-work execution
when trigger matching produces no selectors.

Tests: executor suite (6/6 pass)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit dac36b2)
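
The empty-selector short-circuit from the commit above might look roughly like this. It is a sketch under assumptions: the real `run_assertion_contract` signature is not shown in this PR page, so `Executor`, `RunResult`, `prepare`, and the `setups` counter are hypothetical and exist only to make the skipped setup observable.

```rust
type Selector = [u8; 4];

// Hypothetical executor; `setups` counts how often the expensive preparation
// step runs (standing in for account insertion, db creation, inspector cloning).
#[derive(Default)]
struct Executor {
    setups: usize,
}

#[derive(Debug, Default, PartialEq)]
struct RunResult {
    executed: Vec<Selector>,
}

impl Executor {
    // Stand-in for the expensive preparation work.
    fn prepare(&mut self) {
        self.setups += 1;
    }

    // Assumed signature, for illustration only.
    fn run_assertion_contract(&mut self, fn_selectors: &[Selector]) -> RunResult {
        // Nothing to execute: return before any setup is performed.
        if fn_selectors.is_empty() {
            return RunResult::default();
        }
        self.prepare();
        RunResult {
            executed: fn_selectors.to_vec(),
        }
    }
}
```

Calling it with an empty slice leaves `setups` at zero, which is the "zero-work execution" the commit message describes.
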
Adds a PARALLEL_THRESHOLD constant (currently 2). When the number of
assertion contracts or fn_selectors is below this threshold, execution
uses sequential iterators instead of rayon's parallel iterators to avoid
thread scheduling overhead.

This matters because:
- Most transactions trigger 0-1 assertion contracts
- Most assertion contracts have 1-3 fn_selectors
- Rayon fan-out has ~10-20µs scheduling overhead per task

The change is behavior-preserving: only scheduling strategy changes.

Also re-exports AssertionsForExecution from store module for use in
the executor's sequential path closure type annotation.

Tests: full lib suite (210 pass, 1 pre-existing failure)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit 1d72940)
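
The threshold-based scheduling from the commit above can be illustrated with the sketch below. To keep it self-contained it swaps rayon's parallel iterators for `std::thread::scope`, which is an assumption made for the example and not what the PR uses. `run_all` is a hypothetical helper, and only the `PARALLEL_THRESHOLD` name and its value of 2 come from the commit message.

```rust
use std::thread;

// Value taken from the commit message ("currently 2").
const PARALLEL_THRESHOLD: usize = 2;

/// Applies `f` to every item, sequentially for small batches and in parallel
/// otherwise. Results are returned in input order in both cases, so only the
/// scheduling strategy differs.
fn run_all<T, R, F>(items: &[T], f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    if items.len() < PARALLEL_THRESHOLD {
        // Small batch: skip the fan-out cost entirely.
        return items.iter().map(|item| f(item)).collect();
    }
    // Larger batch: one scoped thread per item (stand-in for rayon's par_iter).
    thread::scope(|s| {
        let f = &f;
        let handles: Vec<_> = items
            .iter()
            .map(|item| s.spawn(move || f(item)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect::<Vec<R>>()
    })
}
```

With a threshold of 2, batches of zero or one item take the sequential branch, which matches the observation that most transactions trigger 0-1 assertion contracts.
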
@odyslam odyslam force-pushed the codex/final-polish-bench branch from 6db8e8a to d99f7d6 Compare March 14, 2026 14:40
@odyslam odyslam changed the title perf(assertion-executor): extract perf-only executor and tracer wins perf(assertion-executor): perf-only hot-path wins and profiling loop Mar 17, 2026
@odyslam (Contributor, Author) commented Mar 17, 2026

Updated the PR title/body to make review easier.

Suggested review order:

  1. executor hot path (executor/mod.rs, executor/with_inspector.rs)
  2. tracer/store hot path (inspectors/tracer.rs, store/assertion_store.rs, legacy precompile caches)
  3. benchmarking/profiling scaffolding (benchmark-utils, benches, Makefile, workflow, samply skill)

This PR is intentionally perf-only: no new cheatcodes and no new business-logic feature work.

@odyslam odyslam marked this pull request as ready for review March 17, 2026 02:10
@odyslam odyslam requested a review from makemake-kbo as a code owner March 17, 2026 02:10