refactor: harden runtime safety across client and prover#2342

Open
sergeytimoshin wants to merge 6 commits into main from sergey/runtime-safety-hardening

Conversation

sergeytimoshin commented Mar 14, 2026

Summary by CodeRabbit

  • New Features

    • Expose richer batch snapshots and a next-insertion index for address queues to improve batch processing visibility.
  • Bug Fixes

    • More robust startup and health checks with clearer error reporting and guarded initialization.
    • Improved error handling for proof and integer conversion failures.
  • Refactor

    • Unified batch retrieval and proof reconstruction paths; grouped PDA fetching for better batching and concurrency.
  • Tests

    • Added tests validating queue/leaf index behavior and batch proof reconstruction.

coderabbitai bot commented Mar 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Refactors batch retrieval into a height-parametric AddressBatchSnapshot<HEIGHT>, adds on-demand leaves_hashchain derivation, makes several prover/proof-client constructors fallible with propagated errors, converts many proof inputs to borrowed slices and fixed-size proof arrays, adds tree_next_insertion_index propagation, and introduces grouped PDA proof fetching and additional health/startup checks.

Changes

Cohort / File(s) Summary
Forester batch snapshot & queue
forester/src/processor/v2/helpers.rs, forester/src/processor/v2/strategy/address.rs
Replaces BatchDataSlice with AddressBatchSnapshot<const HEIGHT: usize>, adds get_batch_snapshot with on-demand leaves_hashchain derivation, and exposes tree_next_insertion_index().
Proof worker init propagation
forester/src/processor/v2/proof_worker.rs, forester/src/processor/v2/processor.rs
ProofClients::new() and spawn_proof_workers() now return Result and errors are propagated from processor when spawning workers.
Prover client startup & health
prover/client/src/prover.rs, prover/client/src/constants.rs, prover/client/src/proof_client.rs, prover/client/src/errors.rs, prover/client/src/helpers.rs
Introduces HTTP/TCP/curl health checks, build_http_client() helper, makes client constructors fallible (Result), adds IntegerConversion error variant, and changes SERVER_ADDRESS to 127.0.0.1.
Proof input API & proofs → slices/arrays
prover/client/src/proof_types/batch_address_append/proof_inputs.rs, prover/client/src/proof_types/batch_append/proof_inputs.rs, prover/client/src/proof_types/batch_update/proof_inputs.rs, prover/client/tests/batch_address_append.rs
Converts many inputs from owned Vec to borrowed slices and generic index types, changes per-element proofs to fixed-size [[u8;32]; HEIGHT], adds TryInto<usize> conversions and fallible Merkle-root computations. Tests updated accordingly (warmup, fixed-size proofs).
SDK indexer: proof reconstruction & queue data
sdk-libs/client/src/indexer/types/queue.rs, sdk-libs/client/src/indexer/photon_indexer.rs, sdk-libs/program-test/src/indexer/test_indexer.rs
Adds tree_next_insertion_index to queue types, moves to const-HEIGHT proof arrays, adds reconstruct_proofs/reconstruct_all_proofs using build_node_lookup() for O(1) lookups, validates proof height and populates tree_next_insertion_index in responses.
SDK load-accounts batching
sdk-libs/client/src/interface/load_accounts.rs
Adds PDA grouping (MAX_PDAS_PER_IX = 8), batched proof fetch (fetch_proof_batches), and LoadAccountsError::MissingPdaCompressedData for missing PDA compressed data.
SDK decompression token seed grouping
sdk-libs/sdk-types/src/interface/program/decompression/processor.rs, .../token.rs, .../pda.rs
Deepens token_seeds to Vec<Vec<Vec<u8>>> (per-vault seed groups) and documents CPI seed/group behavior (comment-only in pda.rs).
Instructions / token seed minor fixes
sdk-libs/client/src/interface/instructions.rs, sdk-libs/sdk-types/src/interface/account/token_seeds.rs
Injects queue_pubkey_index into PackedStateTreeInfo and sources queue index from tree_info.queue_pubkey_index; renames unused param to suppress warnings.
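The height-parametric snapshot described in the forester changes above can be sketched roughly as follows. This is a minimal illustration based only on the summary; the struct fields and method names are assumptions, not the real forester API:

```rust
// Hypothetical sketch of a height-parametric batch snapshot. The point of
// the const generic is that every per-element Merkle proof has exactly
// HEIGHT sibling hashes, checked at compile time instead of at runtime.
#[derive(Debug)]
pub struct AddressBatchSnapshot<const HEIGHT: usize> {
    pub addresses: Vec<[u8; 32]>,
    // Fixed-size proofs: HEIGHT siblings of 32 bytes each, per element.
    pub proofs: Vec<[[u8; 32]; HEIGHT]>,
    pub leaves_hashchain: [u8; 32],
    pub tree_next_insertion_index: u64,
}

impl<const HEIGHT: usize> AddressBatchSnapshot<HEIGHT> {
    /// Proof height is a compile-time constant of the snapshot type.
    pub fn proof_height(&self) -> usize {
        HEIGHT
    }
}

fn main() {
    let snap: AddressBatchSnapshot<26> = AddressBatchSnapshot {
        addresses: vec![[1u8; 32]],
        proofs: vec![[[0u8; 32]; 26]],
        leaves_hashchain: [0u8; 32],
        tree_next_insertion_index: 42,
    };
    assert_eq!(snap.proof_height(), 26);
    assert_eq!(snap.proofs[0].len(), 26);
}
```

With this shape, a mismatched proof height becomes a type error at the call site rather than a runtime length check.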

Sequence Diagram(s)

sequenceDiagram
participant StreamQ as StreamingAddressQueue
participant Storage as AddressStorage/Canopy
participant HashUtil as HashChainUtil
participant Staging as StagingTree

StreamQ->>Storage: get_batch_snapshot(start,end,hashchain_idx)
alt leaves_hashchain present
  Storage-->>StreamQ: AddressBatchSnapshot{addresses,values,indices,proofs,leaves_hashchain}
else missing leaves_hashchain
  Storage->>HashUtil: create_hash_chain_from_slice(addresses)
  HashUtil-->>Storage: leaves_hashchain
  Storage-->>StreamQ: AddressBatchSnapshot{... leaves_hashchain}
end
StreamQ->>Staging: process_batch(AddressBatchSnapshot)
Staging-->>StreamQ: process result / mapped errors

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

ai-review

Suggested reviewers

  • SwenSchaeferjohann
  • ananas-block

Poem

🌱 Small snapshots rise in ordered rows,

Heights aligned where hash-chains softly grow.
Queues whisper indices, proofs take flight,
Fallible starts now guarded by light.
Code stitched tighter — ready for the night.

🚥 Pre-merge checks — ✅ 3 passed
  • Description Check: ✅ Passed — check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed — the title accurately captures the primary objective: converting fallible operations to return Results, making error handling explicit rather than implicit.
  • Docstring Coverage: ✅ Passed — docstring coverage is 96.45%, above the required threshold of 70.00%.




@sergeytimoshin sergeytimoshin force-pushed the sergey/runtime-safety-hardening branch 4 times, most recently from af9dae3 to e3f17dd Compare March 14, 2026 19:12
@sergeytimoshin sergeytimoshin force-pushed the sergey/runtime-safety-hardening branch from e3f17dd to 6e6d9c0 Compare March 14, 2026 19:21
@sergeytimoshin sergeytimoshin force-pushed the sergey/runtime-safety-hardening branch 2 times, most recently from 65676df to 131375b Compare March 14, 2026 19:52
coderabbitai bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
sdk-libs/client/src/interface/initialize_config.rs (1)

120-124: ⚠️ Potential issue | 🟠 Major

Import and reuse the module constant instead of hardcoding the discriminator.

Line 122 duplicates INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR from instructions.rs as a hardcoded local constant. Since the discriminator is already defined at the module level, hardcoding it here creates a maintenance burden and risks silent drift if one copy is updated without the other. Replace the local const with an import from the instructions module.

use crate::interface::instructions::INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR;

// Then use it:
data.extend_from_slice(&INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/interface/initialize_config.rs` around lines 120 - 124,
Replace the locally hardcoded const DISCRIMINATOR used for the
"initialize_compression_config" anchor with the shared module constant
INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR from instructions.rs: remove the
local DISCRIMINATOR definition and import
INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR into the file, then use that
constant where the code currently appends the discriminator (the code around
serialize_anchor_data and instruction_data) so the anchor uses the single
canonical value.
sdk-libs/client/src/utils.rs (1)

10-16: ⚠️ Potential issue | 🟡 Minor

Use strict UTF-8 parsing and trim only line endings when extracting filesystem paths from command output.

String::from_utf8_lossy silently replaces invalid UTF-8 bytes with the Unicode replacement character (U+FFFD), and str::trim() removes all leading/trailing whitespace—including spaces and tabs. For a binary locator, these behaviors can corrupt a valid path. Decode strictly with from_utf8(), then strip only the trailing line ending (\r\n or \n), leaving any legitimate path characters intact.

🛠️ Correct path extraction
-        let light_path = String::from_utf8_lossy(&output.stdout).trim().to_string();
+        let light_path = String::from_utf8(output.stdout)
+            .ok()?
+            .trim_end_matches(&['\r', '\n'][..])
+            .to_owned();

Also applies to: lines 30–38

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/utils.rs` around lines 10 - 16, The current use of
String::from_utf8_lossy and str::trim when building light_path can corrupt valid
filesystem paths; replace the lossy decoding with strict UTF-8 parsing using
std::str::from_utf8(&output.stdout).ok() (returning None on parse error), then
remove only trailing line endings (handle "\r\n" and "\n") via trim_end_matches
to produce the final String; apply the same fix to the other occurrence (the
similar code block around lines 30–38) so both code paths strictly validate
UTF-8 and only strip line endings when producing light_path.
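The fix the comment describes can be sketched as a small std-only helper. `extract_path` is a hypothetical name standing in for the logic inside utils.rs:

```rust
// Sketch of the suggested fix: decode strictly (reject invalid UTF-8 rather
// than substituting U+FFFD), then strip only the trailing line ending so
// interior spaces or tabs in a legitimate path survive intact.
fn extract_path(stdout: &[u8]) -> Option<String> {
    let s = std::str::from_utf8(stdout).ok()?; // None on invalid UTF-8
    // Trim only trailing '\r' / '\n' characters (handles "\r\n" and "\n").
    Some(s.trim_end_matches(|c| c == '\r' || c == '\n').to_owned())
}

fn main() {
    // Trailing newline stripped, interior space preserved.
    assert_eq!(
        extract_path(b"/usr/local/bin/my tool\n").as_deref(),
        Some("/usr/local/bin/my tool")
    );
    // Invalid UTF-8 is rejected instead of silently mangled.
    assert_eq!(extract_path(&[0xff, 0xfe]), None);
}
```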
forester/src/forester_status.rs (1)

673-687: ⚠️ Potential issue | 🟠 Major

Don't report unreadable V1 queues as empty.

Lines 674-684 and 729-739 currently turn parse_hash_set_from_bytes failures into (Some(0), Some(0)). That makes a corrupted or layout-mismatched queue look healthy, which underreports pending work in the status output and hides the real operational failure. Prefer logging and returning None here, or bubble the parse error so "empty" and "unparseable" stay distinct.

Also applies to: 728-742

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/forester_status.rs` around lines 673 - 687, The code currently
maps parse_hash_set_from_bytes::<QueueAccount> failures into (Some(0), Some(0)),
hiding parse/layout errors; change the logic in the block using
parse_hash_set_from_bytes::<QueueAccount> so that on parse failure you log the
error (including context) and return (None, None) (or propagate the parse error)
instead of unwrap_or((0,0)); adjust the closure that computes (len, cap) (which
references .iter(), .get_capacity(), and .sequence_number) to only produce
(Some(len), Some(cap)) when parse succeeds and otherwise produce (None, None);
apply the same fix to the second occurrence of this pattern (the similar block
around the other parse call).
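The "unparseable is not empty" distinction the review asks for can be sketched as follows. `parse_queue` and the `Queue` struct below are toy stand-ins for `parse_hash_set_from_bytes::<QueueAccount>`, not the real forester types:

```rust
// Sketch: surface a parse failure as (None, None) instead of conflating it
// with an empty-but-healthy queue reported as (Some(0), Some(0)).
struct Queue {
    len: usize,
    capacity: usize,
}

// Toy parser standing in for parse_hash_set_from_bytes: expects at least
// two bytes interpreted as [len, capacity].
fn parse_queue(bytes: &[u8]) -> Result<Queue, String> {
    match bytes {
        [len, cap, ..] => Ok(Queue { len: *len as usize, capacity: *cap as usize }),
        _ => Err("account data too short".to_string()),
    }
}

fn queue_stats(bytes: &[u8]) -> (Option<usize>, Option<usize>) {
    match parse_queue(bytes) {
        Ok(queue) => (Some(queue.len), Some(queue.capacity)),
        Err(e) => {
            eprintln!("failed to parse queue account: {e}");
            (None, None) // unparseable stays distinct from empty
        }
    }
}

fn main() {
    assert_eq!(queue_stats(&[0, 8]), (Some(0), Some(8))); // genuinely empty
    assert_eq!(queue_stats(&[]), (None, None));           // corrupted/short
}
```

Status consumers can then render `None` as "unknown" and alert on it, instead of seeing a healthy-looking zero.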
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@forester/src/processor/v2/helpers.rs`:
- Around line 489-534: The get_batch_snapshot function must only return fully
available, internally consistent fixed-size batches: do not shrink the requested
range to available; instead, if end > self.wait_for_batch(end) return Ok(None).
Also verify that all companion vectors (data.addresses, data.low_element_values,
data.low_element_next_values, data.low_element_indices,
data.low_element_next_indices, and data.reconstruct_proofs range) have at least
end elements and that their slices [start..end] produce the same length as
addresses before constructing AddressBatchSnapshot; if any length mismatch or
addresses.is_empty() then return Ok(None) (not Err or a truncated snapshot).
Keep the existing fallback for missing leaves_hash_chains but only use it when
the full batch is present.

In `@prover/client/src/proof_types/batch_address_append/proof_inputs.rs`:
- Line 369: The new error-return paths can leave changelog, indexed_changelog,
and sparse_merkle_tree half-updated (e.g., after calling
patcher.push_changelog_entry::<HEIGHT>, inserting into indexed_changelog, and
sparse_merkle_tree.append()), so make these mutations transactional: either
stage changes in temporary structures and only commit them back to changelog,
indexed_changelog, and sparse_merkle_tree after all fallible operations succeed,
or implement explicit rollback logic that reverts the earlier
push_changelog_entry::<HEIGHT>, the indexed_changelog insert, and the
sparse_merkle_tree.append() on every subsequent ?-return path (the branches
noted around the indexed_changelog insertion and the block that can fail at
lines ~398-408 and ~431-435). Ensure any temporary clones or rollback routines
preserve original state and are invoked before returning Err.
- Around line 202-209: Before taking slices or indexing, validate that
new_element_values.len() >= zkp_batch_size and that all parallel input
slices/vecs have length >= zkp_batch_size; if any check fails return the
appropriate ProverClientError instead of panicking. Specifically, add a guard
before the line using let new_element_values =
&new_element_values[..zkp_batch_size]; that verifies new_element_values.len() >=
zkp_batch_size and also verify lengths of the parallel collections used later
(the collections referenced by low_element_circuit_merkle_proofs,
new_element_circuit_merkle_proofs, patched_low_element_next_values/indices,
patched_low_element_values/indices or the original input slices they derive
from) and return an Err(ProverClientError::...) on mismatch. Apply the same
validation for the other slicing/indexing blocks around the 244-273 region.

In `@prover/client/src/proof_types/batch_update/proof_inputs.rs`:
- Around line 115-117: The public_inputs method currently indexes self.0[0],
which panics on empty BatchUpdateInputs and only returns the first circuit's
inputs; change public_inputs to first guard for empty inputs (e.g., return a
ProverClientError variant when self.0.is_empty()) and then iterate over all
entries in self.0, calling public_inputs_arr()? on each element and collect
their [u8; 32] results into the returned Vec<[u8; 32]>, preserving failure
semantics via the existing Result type; update references to public_inputs and
use ProverClientError for the empty-case error.
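The shape of that fix can be sketched with simplified stand-in types (`Circuit`, a `String` error in place of `ProverClientError`); only the guard-then-iterate structure mirrors the suggestion:

```rust
// Sketch: guard the empty case explicitly, then collect public inputs from
// every circuit instead of indexing only self.0[0] (which would panic on
// an empty batch and silently drop the remaining circuits' inputs).
struct Circuit(u8);

impl Circuit {
    fn public_inputs_arr(&self) -> Result<[u8; 32], String> {
        Ok([self.0; 32])
    }
}

struct BatchUpdateInputs(Vec<Circuit>);

impl BatchUpdateInputs {
    fn public_inputs(&self) -> Result<Vec<[u8; 32]>, String> {
        if self.0.is_empty() {
            return Err("no circuit inputs".to_string());
        }
        // collect() on an iterator of Results propagates the first Err.
        self.0.iter().map(|c| c.public_inputs_arr()).collect()
    }
}

fn main() {
    let inputs = BatchUpdateInputs(vec![Circuit(1), Circuit(2)]);
    assert_eq!(inputs.public_inputs().unwrap().len(), 2);
    assert!(BatchUpdateInputs(vec![]).public_inputs().is_err());
}
```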

In `@prover/client/src/proof.rs`:
- Around line 72-89: Replace the loose trim_start_matches("0x") logic in
deserialize_hex_string_to_be_bytes with a strict single-prefix handling and
switch to unsigned parsing: check hex_str.strip_prefix("0x") and use the
stripped slice if present (or the original string if not) so inputs like
"0x0x01" do not silently become valid; parse into num_bigint::BigUint (use
BigUint::from_str_radix or parse_bytes) instead of BigInt, use
BigUint::to_bytes_be() for the bytes, and keep the existing
zero-pad/length-check logic but map parse errors to
ProverClientError::InvalidHexString with the original hex_str included for
debugging (referencing the function deserialize_hex_string_to_be_bytes and
variables big_int/big_int_bytes which should be renamed to
big_uint/big_uint_bytes).
- Around line 167-170: The deserialization in negate_g1 currently uses
G1::deserialize_with_mode(..., Validate::No), allowing untrusted prover bytes
through; change the call to use Validate::Yes so the G1 point is checked
(on-curve/subgroup) before any negation. Update the G1::deserialize_with_mode
invocation in negate_g1 (after convert_endianness) to pass Validate::Yes and
keep the existing map_err to ProverClientError::InvalidProofData to surface
validation failures.
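The strict-prefix parsing suggested for deserialize_hex_string_to_be_bytes can be sketched as below. To keep the sketch dependency-free, `u128` stands in for `num_bigint::BigUint`, and the function name is illustrative:

```rust
// Sketch: strip the "0x" prefix at most once via strip_prefix, then parse
// as unsigned. A doubled prefix like "0x0x01" is no longer silently
// accepted, because the leftover "0x01" contains a non-hex character.
fn parse_hex_be(hex_str: &str) -> Result<[u8; 32], String> {
    let digits = hex_str.strip_prefix("0x").unwrap_or(hex_str);
    let value = u128::from_str_radix(digits, 16)
        .map_err(|_| format!("invalid hex string: {hex_str}"))?;
    // Left-pad to 32 big-endian bytes, mirroring the zero-pad logic the
    // review says to keep.
    let mut out = [0u8; 32];
    out[16..].copy_from_slice(&value.to_be_bytes());
    Ok(out)
}

fn main() {
    assert_eq!(parse_hex_be("0x01").unwrap()[31], 1);
    // Previously trim_start_matches("0x") would strip repeated prefixes.
    assert!(parse_hex_be("0x0x01").is_err());
}
```

The real fix would parse into a BigUint (for values wider than 128 bits) and keep the existing length check, but the prefix handling is the same.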

In `@prover/client/src/prover.rs`:
- Around line 35-36: The startup and health-check logic currently treats any
Process::wait_with_output() result and any reqwest send().await as success; fix
by explicitly checking the child process exit status (use
Output::status.success() or Output::exit_ok()) after wait_with_output() in the
startup sequence (the function invoking wait_with_output()), and update the
health_check() implementation to inspect the HTTP response status (call
response.error_for_status() or check response.status().is_success() after
send().await) before returning true; ensure both the startup caller and
health_check() return Err on non-successful exit codes or HTTP 4xx/5xx responses
so failing startups/health checks are detected.
- Line 19: spawn_prover currently uses the blocking
std::process::Command::wait_with_output() and health_check uses
std::thread::sleep(), which block the Tokio runtime; replace the child spawn and
wait logic in spawn_prover by using tokio::process::Command and await its
.output().await (or .status().await) instead of wait_with_output, and change any
std::thread::sleep(duration) in health_check to
tokio::time::sleep(duration).await so all waits are async; update imports (use
tokio::process::Command and tokio::time::sleep) and ensure error handling
remains the same for the functions spawn_prover and health_check.
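The exit-status check the first comment asks for can be sketched with std alone (the async half of the fix would additionally swap in tokio::process::Command and await .output(), which this dependency-free sketch omits). `run_and_check` and the `sh -c` invocation are illustrative assumptions for Unix-like hosts:

```rust
// Sketch: after waiting on the child, inspect ExitStatus::success()
// instead of treating any Output from wait_with_output() as success.
use std::process::Command;

fn run_and_check(cmd: &str) -> Result<(), String> {
    let output = Command::new("sh")
        .arg("-c")
        .arg(cmd)
        .output()
        .map_err(|e| format!("failed to spawn: {e}"))?;
    if !output.status.success() {
        // Surface the non-zero exit instead of reporting a healthy start.
        return Err(format!("command exited with {}", output.status));
    }
    Ok(())
}

fn main() {
    assert!(run_and_check("exit 0").is_ok());
    assert!(run_and_check("exit 7").is_err());
}
```

The same principle applies to the HTTP health check: a completed `send().await` only means the request round-tripped, so the status code must still be checked (e.g. via `error_for_status()`).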

In `@prover/client/tests/batch_address_append.rs`:
- Around line 97-102: The test still treats
proof_types::batch_address_append::to_json as infallible; update the test to
handle its Result return (for the calls around the existing to_json usages at
the two sites noted) by either changing the test signature to return Result<(),
Box<dyn std::error::Error>> and using the ? operator on each
proof_types::batch_address_append::to_json(...) call, or explicitly
unwrap/map_err them (e.g., .unwrap() or .expect("...")); locate the two to_json
call sites in batch_address_append.rs and replace the current direct String
usage with error-aware handling so the code compiles with the fallible API.

In `@sdk-libs/client/src/indexer/types/proof.rs`:
- Around line 214-223: The loop currently keeps the first computed
output_tree_index and silently ignores any subsequent differing indices; change
this to detect mismatches and return an error instead of keeping the first
value. In the code paths using account.tree_info.next_tree_info and
account.tree_info.pack_output_tree_index(packed_accounts) (and the local
variable output_tree_index), compute each index as before but if
output_tree_index is Some(prev) and the new index != prev then return a
descriptive error (e.g., Err or a custom ValidationError) indicating mixed
output-tree indices; otherwise set output_tree_index = Some(index) when it was
None. Apply this guard where PackedStateTreeInfos expects a single
output_tree_index so inconsistent state proofs are rejected.

In `@sdk-libs/client/src/interface/load_accounts.rs`:
- Around line 209-214: The function group_pda_specs currently uses
assert!(max_per_group > 0, ...) which introduces a panic; replace this with
debug_assert!(max_per_group > 0, ...) to keep the developer invariant check
during testing while avoiding release-time panics, and keep the signature and
return type unchanged (call sites that pass MAX_PDAS_PER_IX will be unaffected);
if you intend this to be a public/runtime-validated API instead, alternatively
change group_pda_specs to return Result<Vec<Vec<&PdaSpec<V>>>, SomeError> and
explicitly return an Err when max_per_group == 0, but prefer debug_assert! for
internal invariants.
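The Result-returning alternative mentioned at the end of that comment can be sketched as below. `group_specs` and the `String` error are assumed names, simplified from `group_pda_specs` over `&PdaSpec<V>`:

```rust
// Sketch: fallible grouping that rejects max_per_group == 0 explicitly
// instead of panicking via assert!. chunks() yields full groups of
// max_per_group followed by one possibly-shorter remainder group.
fn group_specs<T>(specs: &[T], max_per_group: usize) -> Result<Vec<&[T]>, String> {
    if max_per_group == 0 {
        return Err("max_per_group must be > 0".to_string());
    }
    Ok(specs.chunks(max_per_group).collect())
}

fn main() {
    let specs: Vec<u32> = (0..10).collect();
    // With a cap like the PR's MAX_PDAS_PER_IX = 8, 10 PDAs split as 8 + 2.
    let groups = group_specs(&specs, 8).unwrap();
    assert_eq!(groups.len(), 2);
    assert_eq!(groups[0].len(), 8);
    assert_eq!(groups[1].len(), 2);
    assert!(group_specs(&specs, 0).is_err());
}
```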

In `@sdk-libs/client/src/local_test_validator.rs`:
- Around line 69-70: The local variable named `path` is misleading because it
becomes a full command string; rename both occurrences (the initial `let path =
"cli/test_bin/run test-validator";` and the subsequent `let mut path =
format!("{}/{}", project_root.trim(), path);`) to `command` (e.g., `let command
= ...` and `let mut command = ...`) and update all subsequent uses like
`path.push_str(...)` to `command.push_str(...)` so the variable name reflects it
holds a command/command line rather than a filesystem path.
- Around line 128-136: The explicit std::mem::drop(child) after spawning the
process (Command::new(...).spawn() -> child) is unnecessary; remove the drop and
instead add a brief comment above the spawn or before awaiting
tokio::time::sleep(tokio::time::Duration::from_secs(config.wait_time)).await
explaining the intent to detach/ignore the child process (e.g., "intentionally
detaching spawned child; not awaiting its exit"), so the purpose is clear while
relying on scope-based drop for cleanup.
- Around line 113-126: The blocking std::process::Command.wait() call in
local_test_validator.rs stalls the tokio runtime; replace std::process::Command
and std::process::Stdio with tokio::process::Command and tokio::process::Stdio,
spawn the child the same way you do now, and await the child's termination with
child.wait().await (checking the returned ExitStatus.success()). Update imports
accordingly and remove the blocking child.wait() call so the async function does
not perform blocking IO.

---

Outside diff comments:
In `@forester/src/forester_status.rs`:
- Around line 673-687: The code currently maps
parse_hash_set_from_bytes::<QueueAccount> failures into (Some(0), Some(0)),
hiding parse/layout errors; change the logic in the block using
parse_hash_set_from_bytes::<QueueAccount> so that on parse failure you log the
error (including context) and return (None, None) (or propagate the parse error)
instead of unwrap_or((0,0)); adjust the closure that computes (len, cap) (which
references .iter(), .get_capacity(), and .sequence_number) to only produce
(Some(len), Some(cap)) when parse succeeds and otherwise produce (None, None);
apply the same fix to the second occurrence of this pattern (the similar block
around the other parse call).

In `@sdk-libs/client/src/interface/initialize_config.rs`:
- Around line 120-124: Replace the locally hardcoded const DISCRIMINATOR used
for the "initialize_compression_config" anchor with the shared module constant
INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR from instructions.rs: remove the
local DISCRIMINATOR definition and import
INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR into the file, then use that
constant where the code currently appends the discriminator (the code around
serialize_anchor_data and instruction_data) so the anchor uses the single
canonical value.

In `@sdk-libs/client/src/utils.rs`:
- Around line 10-16: The current use of String::from_utf8_lossy and str::trim
when building light_path can corrupt valid filesystem paths; replace the lossy
decoding with strict UTF-8 parsing using
std::str::from_utf8(&output.stdout).ok() (returning None on parse error), then
remove only trailing line endings (handle "\r\n" and "\n") via trim_end_matches
to produce the final String; apply the same fix to the other occurrence (the
similar code block around lines 30–38) so both code paths strictly validate
UTF-8 and only strip line endings when producing light_path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0f7749e7-8edf-4ebf-9da0-2da516629b7a

📥 Commits

Reviewing files that changed from the base of the PR and between 2644529 and 65676df.

⛔ Files ignored due to path filters (63)
  • forester-utils/src/account_zero_copy.rs is excluded by none and included by none
  • forester-utils/src/address_merkle_tree_config.rs is excluded by none and included by none
  • forester-utils/src/address_staging_tree.rs is excluded by none and included by none
  • forester-utils/src/forester_epoch.rs is excluded by none and included by none
  • forester-utils/src/instructions/compress_and_close_mint.rs is excluded by none and included by none
  • forester-utils/src/rate_limiter.rs is excluded by none and included by none
  • forester-utils/src/registry.rs is excluded by none and included by none
  • program-tests/account-compression-test/tests/address_merkle_tree_tests.rs is excluded by none and included by none
  • program-tests/account-compression-test/tests/merkle_tree_tests.rs is excluded by none and included by none
  • program-tests/batched-merkle-tree-test/tests/merkle_tree.rs is excluded by none and included by none
  • program-tests/compressed-token-test/tests/freeze/functional.rs is excluded by none and included by none
  • program-tests/compressed-token-test/tests/v1.rs is excluded by none and included by none
  • program-tests/system-cpi-v2-test/tests/event.rs is excluded by none and included by none
  • program-tests/system-cpi-v2-test/tests/invoke_cpi_with_read_only.rs is excluded by none and included by none
  • program-tests/utils/src/account_zero_copy.rs is excluded by none and included by none
  • program-tests/utils/src/actions/legacy/instructions/transfer2.rs is excluded by none and included by none
  • program-tests/utils/src/address_tree_rollover.rs is excluded by none and included by none
  • program-tests/utils/src/assert_compressed_tx.rs is excluded by none and included by none
  • program-tests/utils/src/assert_merkle_tree.rs is excluded by none and included by none
  • program-tests/utils/src/assert_queue.rs is excluded by none and included by none
  • program-tests/utils/src/batched_address_tree.rs is excluded by none and included by none
  • program-tests/utils/src/e2e_test_env.rs is excluded by none and included by none
  • program-tests/utils/src/lib.rs is excluded by none and included by none
  • program-tests/utils/src/mock_batched_forester.rs is excluded by none and included by none
  • program-tests/utils/src/state_tree_rollover.rs is excluded by none and included by none
  • program-tests/utils/src/test_batch_forester.rs is excluded by none and included by none
  • program-tests/utils/src/test_forester.rs is excluded by none and included by none
  • sdk-tests/anchor-manual-test/tests/shared.rs is excluded by none and included by none
  • sdk-tests/anchor-semi-manual-test/tests/shared/mod.rs is excluded by none and included by none
  • sdk-tests/anchor-semi-manual-test/tests/stress_test.rs is excluded by none and included by none
  • sdk-tests/client-test/tests/light_client.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/amm_stress_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/amm_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/basic_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/d10_ata_idempotent_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/d10_token_accounts_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/d11_zero_copy_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/failing_tests.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/integration_tests.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/shared.rs is excluded by none and included by none
  • sdk-tests/pinocchio-light-program-test/tests/shared/mod.rs is excluded by none and included by none
  • sdk-tests/pinocchio-light-program-test/tests/stress_test.rs is excluded by none and included by none
  • sdk-tests/pinocchio-manual-test/tests/shared.rs is excluded by none and included by none
  • sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/read_only.rs is excluded by none and included by none
  • sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-native-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-pinocchio-v1-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-pinocchio-v2-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/ctoken_pda.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/decompress_full_cpi.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/pda_ctoken.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test_4_invocations.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test_4_transfer2.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test_deposit.rs is excluded by none and included by none
  • sdk-tests/sdk-v1-native-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-account-loader-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-ata-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-mint-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-pda-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-token-test/tests/test.rs is excluded by none and included by none
  • sparse-merkle-tree/src/indexed_changelog.rs is excluded by none and included by none
  • sparse-merkle-tree/tests/indexed_changelog.rs is excluded by none and included by none
📒 Files selected for processing (56)
  • forester/src/config.rs
  • forester/src/forester_status.rs
  • forester/src/main.rs
  • forester/src/processor/v2/helpers.rs
  • forester/src/processor/v2/proof_worker.rs
  • forester/src/processor/v2/strategy/address.rs
  • forester/src/queue_helpers.rs
  • forester/tests/test_compressible_pda.rs
  • prover/client/src/errors.rs
  • prover/client/src/helpers.rs
  • prover/client/src/proof.rs
  • prover/client/src/proof_client.rs
  • prover/client/src/proof_types/batch_address_append/json.rs
  • prover/client/src/proof_types/batch_address_append/proof_inputs.rs
  • prover/client/src/proof_types/batch_append/json.rs
  • prover/client/src/proof_types/batch_append/proof_inputs.rs
  • prover/client/src/proof_types/batch_update/json.rs
  • prover/client/src/proof_types/batch_update/proof_inputs.rs
  • prover/client/src/proof_types/combined/v1/json.rs
  • prover/client/src/proof_types/combined/v2/json.rs
  • prover/client/src/proof_types/combined/v2/proof_inputs.rs
  • prover/client/src/proof_types/inclusion/v1/json.rs
  • prover/client/src/proof_types/inclusion/v1/proof_inputs.rs
  • prover/client/src/proof_types/inclusion/v2/json.rs
  • prover/client/src/proof_types/inclusion/v2/proof_inputs.rs
  • prover/client/src/proof_types/non_inclusion/v1/json.rs
  • prover/client/src/proof_types/non_inclusion/v1/proof_inputs.rs
  • prover/client/src/proof_types/non_inclusion/v2/json.rs
  • prover/client/src/proof_types/non_inclusion/v2/proof_inputs.rs
  • prover/client/src/prover.rs
  • prover/client/tests/batch_address_append.rs
  • prover/client/tests/batch_append.rs
  • prover/client/tests/batch_update.rs
  • prover/client/tests/combined.rs
  • prover/client/tests/inclusion.rs
  • prover/client/tests/init_merkle_tree.rs
  • prover/client/tests/non_inclusion.rs
  • sdk-libs/client/src/fee.rs
  • sdk-libs/client/src/indexer/error.rs
  • sdk-libs/client/src/indexer/options.rs
  • sdk-libs/client/src/indexer/photon_indexer.rs
  • sdk-libs/client/src/indexer/types/proof.rs
  • sdk-libs/client/src/indexer/types/queue.rs
  • sdk-libs/client/src/indexer/types/token.rs
  • sdk-libs/client/src/interface/initialize_config.rs
  • sdk-libs/client/src/interface/instructions.rs
  • sdk-libs/client/src/interface/load_accounts.rs
  • sdk-libs/client/src/interface/mod.rs
  • sdk-libs/client/src/interface/pack.rs
  • sdk-libs/client/src/interface/serialize.rs
  • sdk-libs/client/src/local_test_validator.rs
  • sdk-libs/client/src/rpc/client.rs
  • sdk-libs/client/src/utils.rs
  • sdk-libs/macros/src/light_pdas/seeds/extract.rs
  • sdk-libs/program-test/src/program_test/compressible_setup.rs
  • sdk-libs/program-test/src/program_test/light_program_test.rs

Comment on lines +489 to +534
pub fn get_batch_snapshot<const HEIGHT: usize>(
&self,
start: usize,
end: usize,
hashchain_idx: usize,
) -> crate::Result<Option<AddressBatchSnapshot<HEIGHT>>> {
let available = self.wait_for_batch(end);
if start >= available {
return None;
return Ok(None);
}
let actual_end = end.min(available);
let data = lock_recover(&self.data, "streaming_address_queue.data");
Some(BatchDataSlice {
addresses: data.addresses[start..actual_end].to_vec(),

let addresses = data.addresses[start..actual_end].to_vec();
if addresses.is_empty() {
return Err(anyhow!("Empty batch at start={}", start));
}

let leaves_hashchain = match data.leaves_hash_chains.get(hashchain_idx).copied() {
Some(hashchain) => hashchain,
None => {
tracing::debug!(
"Missing leaves_hash_chain for batch {} (available: {}), deriving from addresses",
hashchain_idx,
data.leaves_hash_chains.len()
);
create_hash_chain_from_slice(&addresses).map_err(|error| {
anyhow!(
"Failed to derive leaves_hash_chain for batch {} from {} addresses: {}",
hashchain_idx,
addresses.len(),
error
)
})?
}
};

Ok(Some(AddressBatchSnapshot {
low_element_values: data.low_element_values[start..actual_end].to_vec(),
low_element_next_values: data.low_element_next_values[start..actual_end].to_vec(),
low_element_indices: data.low_element_indices[start..actual_end].to_vec(),
low_element_next_indices: data.low_element_next_indices[start..actual_end].to_vec(),
})
low_element_proofs: data.reconstruct_proofs::<HEIGHT>(start..actual_end)?,
addresses,
leaves_hashchain,
}))

⚠️ Potential issue | 🟠 Major

Only return complete, internally consistent address batches.

Lines 495-534 can shrink the requested range to available and then slice the companion vectors unchecked. If background fetching ends early or the indexer returns mismatched vector lengths, this either panics or feeds process_batch a truncated batch with a derived hashchain. Short or inconsistent batches should stay "not ready" until the full fixed-size batch is present.

Suggested guard
-        let actual_end = end.min(available);
+        if available < end {
+            return Ok(None);
+        }
+        let actual_end = end;
         let data = lock_recover(&self.data, "streaming_address_queue.data");

-        let addresses = data.addresses[start..actual_end].to_vec();
+        let addresses = data
+            .addresses
+            .get(start..actual_end)
+            .ok_or_else(|| anyhow!("addresses shorter than requested batch"))?
+            .to_vec();
+        // Apply the same checked slicing to every parallel vector before
+        // constructing `AddressBatchSnapshot`.
Based on learnings: ZKP batch sizes for address trees must be 10 or 250 in `src/initialize_address_tree.rs`. Only these circuit sizes are supported for address trees.
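To illustrate the checked slicing, a minimal standalone sketch with hypothetical names (plain `u32` vectors standing in for the forester's parallel address/low-element vectors, not the real types):

```rust
// Hypothetical stand-in: slice several parallel vectors with one checked
// range. Any vector shorter than the requested batch yields None ("not
// ready") instead of panicking or returning a truncated snapshot.
fn snapshot_ranges(
    addresses: &[u32],
    low_values: &[u32],
    start: usize,
    end: usize,
) -> Option<(Vec<u32>, Vec<u32>)> {
    // `get` returns None when the range is out of bounds for either vector.
    let addrs = addresses.get(start..end)?;
    let lows = low_values.get(start..end)?;
    Some((addrs.to_vec(), lows.to_vec()))
}
```

Applying the same `get(start..end)` to every companion vector before constructing `AddressBatchSnapshot` turns length mismatches into an `Ok(None)` "not ready" signal rather than a panic.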
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/processor/v2/helpers.rs` around lines 489 - 534, The
get_batch_snapshot function must only return fully available, internally
consistent fixed-size batches: do not shrink the requested range to available;
instead, if end > self.wait_for_batch(end) return Ok(None). Also verify that all
companion vectors (data.addresses, data.low_element_values,
data.low_element_next_values, data.low_element_indices,
data.low_element_next_indices, and data.reconstruct_proofs range) have at least
end elements and that their slices [start..end] produce the same length as
addresses before constructing AddressBatchSnapshot; if any length mismatch or
addresses.is_empty() then return Ok(None) (not Err or a truncated snapshot).
Keep the existing fallback for missing leaves_hash_chains but only use it when
the full batch is present.

);
)?;

patcher.push_changelog_entry::<HEIGHT>(changelog, changelog_entry);

⚠️ Potential issue | 🔴 Critical

Make the new error-return path transactional.

By the time Line 404 can fail, the current iteration has already pushed to changelog (Line 369), indexed_changelog (Line 385), and mutated sparse_merkle_tree via append(). Returning Err from these new ? paths leaves caller-owned state half-applied, so retries can start from a corrupted tree/changelog snapshot.

🔧 Safer direction

Stage mutations in temporary state and only commit them back after the whole batch succeeds. If cloning the tree is too expensive, add explicit rollback for sparse_merkle_tree, changelog, and indexed_changelog on every fallible branch after the first mutation.

Also applies to: 385-385, 398-408, 431-435
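To make the suggested shape concrete, here is a minimal stage-then-commit sketch with purely hypothetical types (plain `Vec<u64>` standing in for the tree and changelogs, a zero item standing in for any fallible step):

```rust
// Scratch buffers for one batch; nothing here touches caller-owned state.
struct Staged {
    changelog: Vec<u64>,
    appended: Vec<u64>,
}

fn process_batch(
    changelog: &mut Vec<u64>,
    tree: &mut Vec<u64>,
    items: &[u64],
) -> Result<(), String> {
    let mut staged = Staged { changelog: Vec::new(), appended: Vec::new() };
    for &item in items {
        if item == 0 {
            // Fallible step: caller-owned `changelog` and `tree` are still
            // untouched, so a retry starts from the original snapshot.
            return Err("invalid item".to_string());
        }
        staged.changelog.push(item);
        staged.appended.push(item * 2);
    }
    // Commit phase: only infallible moves after every check has passed.
    changelog.extend(staged.changelog);
    tree.extend(staged.appended);
    Ok(())
}
```

The key property is that every `?`/`return Err` path fires before the commit phase, so partial batches never leak into caller-owned state.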

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof_types/batch_address_append/proof_inputs.rs` at line
369, The new error-return paths can leave changelog, indexed_changelog, and
sparse_merkle_tree half-updated (e.g., after calling
patcher.push_changelog_entry::<HEIGHT>, inserting into indexed_changelog, and
sparse_merkle_tree.append()), so make these mutations transactional: either
stage changes in temporary structures and only commit them back to changelog,
indexed_changelog, and sparse_merkle_tree after all fallible operations succeed,
or implement explicit rollback logic that reverts the earlier
push_changelog_entry::<HEIGHT>, the indexed_changelog insert, and the
sparse_merkle_tree.append() on every subsequent ?-return path (the branches
noted around the indexed_changelog insertion and the block that can fail at
lines ~398-408 and ~431-435). Ensure any temporary clones or rollback routines
preserve original state and are invoked before returning Err.

Comment on lines +115 to +117
pub fn public_inputs(&self) -> Result<Vec<[u8; 32]>, ProverClientError> {
// Concatenate all public inputs into a single flat vector
vec![self.0[0].public_inputs_arr()]
Ok(vec![self.0[0].public_inputs_arr()?])

⚠️ Potential issue | 🔴 Critical

Guard empty input and include every circuit in public_inputs.

Line 117 still does self.0[0], so an empty BatchUpdateInputs panics, and the method silently drops every element after the first even though it returns a Vec<[u8; 32]>.

🩹 Proposed fix
 impl BatchUpdateInputs<'_> {
     pub fn public_inputs(&self) -> Result<Vec<[u8; 32]>, ProverClientError> {
-        // Concatenate all public inputs into a single flat vector
-        Ok(vec![self.0[0].public_inputs_arr()?])
+        if self.0.is_empty() {
+            return Err(ProverClientError::GenericError(
+                "batch update inputs cannot be empty".to_string(),
+            ));
+        }
+
+        self.0
+            .iter()
+            .map(BatchUpdateCircuitInputs::public_inputs_arr)
+            .collect()
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof_types/batch_update/proof_inputs.rs` around lines 115
- 117, The public_inputs method currently indexes self.0[0], which panics on
empty BatchUpdateInputs and only returns the first circuit's inputs; change
public_inputs to first guard for empty inputs (e.g., return a ProverClientError
variant when self.0.is_empty()) and then iterate over all entries in self.0,
calling public_inputs_arr()? on each element and collect their [u8; 32] results
into the returned Vec<[u8; 32]>, preserving failure semantics via the existing
Result type; update references to public_inputs and use ProverClientError for
the empty-case error.

Comment on lines 72 to 89
pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
let trimmed_str = hex_str.trim_start_matches("0x");
let big_int = num_bigint::BigInt::from_str_radix(trimmed_str, 16).unwrap();
let big_int = num_bigint::BigInt::from_str_radix(trimmed_str, 16)
.map_err(|e| ProverClientError::InvalidHexString(format!("{} ({})", hex_str, e)))?;
let big_int_bytes = big_int.to_bytes_be().1;
if big_int_bytes.len() < 32 {
let mut result = [0u8; 32];
result[32 - big_int_bytes.len()..].copy_from_slice(&big_int_bytes);
result
Ok(result)
} else {
big_int_bytes.try_into().unwrap()
let len = big_int_bytes.len();
big_int_bytes.try_into().map_err(|_| {
ProverClientError::InvalidHexString(format!(
"expected at most 32 bytes, got {} for {}",
len, hex_str
))
})
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's read the file to confirm the code matches the review
wc -l prover/client/src/proof.rs

Repository: Lightprotocol/light-protocol

Length of output: 102


🏁 Script executed:

# Read the specific lines mentioned in the review
cat -n prover/client/src/proof.rs | sed -n '70,95p'

Repository: Lightprotocol/light-protocol

Length of output: 1162


🏁 Script executed:

# Also check the negate_g1 function mentioned in the scratchpad
cat -n prover/client/src/proof.rs | sed -n '100,120p'

Repository: Lightprotocol/light-protocol

Length of output: 989


🏁 Script executed:

# Check the Cargo.toml to understand dependencies
fd -t f "Cargo.toml" prover/client/ | head -1 | xargs cat

Repository: Lightprotocol/light-protocol

Length of output: 1402


🏁 Script executed:

# Find the negate_g1 function
cat -n prover/client/src/proof.rs | grep -A 15 "fn negate_g1"

Repository: Lightprotocol/light-protocol

Length of output: 935


🏁 Script executed:

# Check the actual behavior of trim_start_matches with multiple prefixes
python3 << 'EOF'
# Simulate Rust's trim_start_matches behavior
# With a &str pattern, trim_start_matches("0x") repeatedly removes the exact
# prefix "0x" -- it does NOT strip the character set {'0', 'x'}
def trim_start_matches(s, pat):
    while s.startswith(pat):
        s = s[len(pat):]
    return s

test_str = "0x0x01"
result = trim_start_matches(test_str, "0x")
print(f"trim_start_matches('0x') on '{test_str}' = '{result}'")

# Test with strip_prefix behavior (removes at most one prefix)
test_str2 = "0x0x01"
if test_str2.startswith("0x"):
    result2 = test_str2[2:]  # strip_prefix
    print(f"strip_prefix('0x') on '{test_str2}' = '{result2}'")

# Test with more examples
examples = ["0x01", "0x0x01", "0x00x01", "00xABC"]
for ex in examples:
    print(f"trim_start_matches('0x') on '{ex}' = '{trim_start_matches(ex, '0x')}'")
EOF

Repository: Lightprotocol/light-protocol

Length of output: 267


🏁 Script executed:

# Check the num_bigint documentation and test negative parsing
python3 << 'EOF'
# The behavior of BigInt::from_str_radix with a sign
# num_bigint's BigInt implements Num::from_str_radix with an optional
# leading sign, so "-ff" with radix 16 parses to BigInt(-255)

# Let's check: what if someone passes "0x-ff"?
test_input = "0x-ff"
trimmed = test_input.removeprefix("0x")
print(f"Input: '{test_input}'")
print(f"After trimming '0x': '{trimmed}'")
# Trimming the "0x" prefix leaves "-ff", which BigInt accepts as negative

# BigInt has a to_bytes_be() that returns (Sign, Vec<u8>)
# If you discard the sign, you lose information about negativity
print("\nThe concern is valid IF:")
print("1. Somehow a negative BigInt is created")
print("2. to_bytes_be().1 discards the sign")
print("3. The magnitude is then treated as the full value")
EOF

Repository: Lightprotocol/light-protocol

Length of output: 266


🏁 Script executed:

# Search for any sign-handling code or documentation
fd -t f "proof.rs" prover/client/src/ --exec head -30 {} \;

Repository: Lightprotocol/light-protocol

Length of output: 938


🏁 Script executed:

# Search for tests of deserialize_hex_string_to_be_bytes
rg "deserialize_hex_string_to_be_bytes" -A 5 -B 5 prover/client/src/

Repository: Lightprotocol/light-protocol

Length of output: 4954


🏁 Script executed:

# Check if there's documentation about the expected input format
fd -t f "\.md$" prover/client/ -exec cat {} \;

Repository: Lightprotocol/light-protocol

Length of output: 218


🏁 Script executed:

# Let's verify if from_str_radix can actually parse negative numbers
python3 << 'EOF'
# In Rust, BigInt::from_str_radix signature is:
# fn from_str_radix(s: &str, radix: u32) -> Result<BigInt, ParseBigIntError>
# num_bigint's BigInt parser accepts an optional leading sign

# The behavior is:
# "ff" with radix 16 -> BigInt(255)
# "-ff" with radix 16 -> BigInt(-255), and to_bytes_be().1 drops the sign

# So the risks are:
# 1. Malformed prefixes like "0x0x01" being silently accepted
# 2. A negative value losing its sign via to_bytes_be().1

# The correct analysis:
print("Risk 1: trim_start_matches('0x') strips the '0x' prefix repeatedly")
print("  '0x0x01' -> '01' (doubled prefix silently accepted, should error)")
print("Risk 2: BigInt parses '-ff'; to_bytes_be().1 discards the sign")
print("  Using BigUint makes intent clear (always unsigned field elements)")
print()
print("Conclusion: both the prefix trimming and sign handling need tightening")
print("Switching to BigUint rejects signed input at parse time")
EOF

Repository: Lightprotocol/light-protocol

Length of output: 441


🏁 Script executed:

# Check if the Gnark prover output could have any unusual prefixes
rg "0x" prover/client/src/ -B 2 -A 2 | head -50

Repository: Lightprotocol/light-protocol

Length of output: 1583


🏁 Script executed:

# Look for any integration tests or examples
fd -t f "test\|example" prover/client/ | head -10

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

# Search for GnarkProofJson to understand the input format
rg "GnarkProofJson" prover/client/src/ -B 3 -A 10

Repository: Lightprotocol/light-protocol

Length of output: 2310


🏁 Script executed:

# Let me verify the actual behavior of trim_start_matches one more time with concrete Rust
python3 << 'EOF'
# In Rust, trim_start_matches("0x") repeatedly removes the exact prefix "0x"
# A &str pattern matches the substring, not the character set {'0', 'x'}

def trim_start_matches(s, pat):
    while s.startswith(pat):
        s = s[len(pat):]
    return s

test_cases = [
    "0x01",
    "0x0x01",
    "0xab",
    "0x0x0x01",
    "xx01",
    "0x",
    "x0x01"
]

for test in test_cases:
    trimmed = trim_start_matches(test, "0x")
    # Then parse as hex
    try:
        value = int(trimmed, 16)
        print(f"'{test}' -> '{trimmed}' -> {value:032x}")
    except ValueError as e:
        print(f"'{test}' -> '{trimmed}' -> ERROR: {e}")
EOF

Repository: Lightprotocol/light-protocol

Length of output: 443


🏁 Script executed:

# Verify what Validate::No actually does in arkworks
web_search "arkworks Validate::No CanonicalDeserialize"

Repository: Lightprotocol/light-protocol

Length of output: 122


🌐 Web query:

arkworks CanonicalDeserialize Validate::No point validation

💡 Result:

In arkworks, point validation is controlled by the validate: Validate flag passed to CanonicalDeserialize::deserialize_with_mode.

  • Validate::No means “skip post-deserialization validation”; the exact checks skipped are type-dependent (it’s whatever that type implements in Valid::check()). (arkworks.rs)
  • The “unchecked” convenience methods are literally just deserialize_with_mode(..., Validate::No):
    • deserialize_compressed_unchecked → Validate::No
    • deserialize_uncompressed_unchecked → Validate::No (docs.rs)

For elliptic-curve points, the skipped validation typically includes checks like “is this encoding a valid curve point (on-curve)?” and often “is it in the correct prime-order subgroup?” (again: depends on that point type’s Valid::check() implementation). (arkworks.rs)

If you use Validate::No / *_unchecked on untrusted input, you can end up accepting invalid points; some downstream code assumes validated points and this can lead to serious bugs (some projects even warn about potential undefined behavior when skipping these checks). (docs.rs)


Use strip_prefix to enforce strict hex format.

The current code uses trim_start_matches("0x") on line 73, which strips the 0x prefix repeatedly rather than at most once. Inputs like 0x0x01 are silently accepted as 01 instead of triggering an error. This validation bypass is a bug: either trim exactly one 0x prefix using strip_prefix("0x").unwrap_or(hex_str), or reject malformed prefixes outright.

Additionally, parse into BigUint instead of BigInt to make explicit that these are field element magnitudes (always unsigned), eliminating any future confusion about sign handling and matching bn254 semantics.
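The repeated-prefix behavior can be confirmed with a tiny standalone snippet (the helper names loose and strict exist only for this demo):

```rust
// With a &str pattern, trim_start_matches removes the exact prefix
// repeatedly, while strip_prefix removes it at most once.
fn loose(s: &str) -> &str {
    s.trim_start_matches("0x")
}

fn strict(s: &str) -> &str {
    s.strip_prefix("0x").unwrap_or(s)
}
```

With a well-formed single prefix both helpers agree; they diverge only on malformed inputs such as a doubled prefix, which is exactly the case the strict version should surface as an error downstream.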

Proposed fix
 pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
-    let trimmed_str = hex_str.trim_start_matches("0x");
-    let big_int = num_bigint::BigInt::from_str_radix(trimmed_str, 16)
+    let trimmed_str = hex_str.strip_prefix("0x").unwrap_or(hex_str);
+    let big_int = num_bigint::BigUint::from_str_radix(trimmed_str, 16)
         .map_err(|e| ProverClientError::InvalidHexString(format!("{} ({})", hex_str, e)))?;
-    let big_int_bytes = big_int.to_bytes_be().1;
+    let big_int_bytes = big_int.to_bytes_be();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
let trimmed_str = hex_str.trim_start_matches("0x");
let big_int = num_bigint::BigInt::from_str_radix(trimmed_str, 16).unwrap();
let big_int = num_bigint::BigInt::from_str_radix(trimmed_str, 16)
.map_err(|e| ProverClientError::InvalidHexString(format!("{} ({})", hex_str, e)))?;
let big_int_bytes = big_int.to_bytes_be().1;
if big_int_bytes.len() < 32 {
let mut result = [0u8; 32];
result[32 - big_int_bytes.len()..].copy_from_slice(&big_int_bytes);
result
Ok(result)
} else {
big_int_bytes.try_into().unwrap()
let len = big_int_bytes.len();
big_int_bytes.try_into().map_err(|_| {
ProverClientError::InvalidHexString(format!(
"expected at most 32 bytes, got {} for {}",
len, hex_str
))
})
}
pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
let trimmed_str = hex_str.strip_prefix("0x").unwrap_or(hex_str);
let big_int = num_bigint::BigUint::from_str_radix(trimmed_str, 16)
.map_err(|e| ProverClientError::InvalidHexString(format!("{} ({})", hex_str, e)))?;
let big_int_bytes = big_int.to_bytes_be();
if big_int_bytes.len() < 32 {
let mut result = [0u8; 32];
result[32 - big_int_bytes.len()..].copy_from_slice(&big_int_bytes);
Ok(result)
} else {
let len = big_int_bytes.len();
big_int_bytes.try_into().map_err(|_| {
ProverClientError::InvalidHexString(format!(
"expected at most 32 bytes, got {} for {}",
len, hex_str
))
})
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof.rs` around lines 72 - 89, Replace the loose
trim_start_matches("0x") logic in deserialize_hex_string_to_be_bytes with a
strict single-prefix handling and switch to unsigned parsing: check
hex_str.strip_prefix("0x") and use the stripped slice if present (or the
original string if not) so inputs like "0x0x01" do not silently become valid;
parse into num_bigint::BigUint (use BigUint::from_str_radix or parse_bytes)
instead of BigInt, use BigUint::to_bytes_be() for the bytes, and keep the
existing zero-pad/length-check logic but map parse errors to
ProverClientError::InvalidHexString with the original hex_str included for
debugging (referencing the function deserialize_hex_string_to_be_bytes and
variables big_int/big_int_bytes which should be renamed to
big_uint/big_uint_bytes).

Comment on lines 214 to 223
if let Some(next) = account.tree_info.next_tree_info {
// SAFETY: account will always have a state Merkle tree context.
// pack_output_tree_index only panics on an address Merkle tree context.
let index = next.pack_output_tree_index(packed_accounts).unwrap();
let index = next.pack_output_tree_index(packed_accounts)?;
if output_tree_index.is_none() {
output_tree_index = Some(index);
}
} else {
// SAFETY: account will always have a state Merkle tree context.
// pack_output_tree_index only panics on an address Merkle tree context.
let index = account
.tree_info
.pack_output_tree_index(packed_accounts)
.unwrap();
let index = account.tree_info.pack_output_tree_index(packed_accounts)?;
if output_tree_index.is_none() {
output_tree_index = Some(index);
}

⚠️ Potential issue | 🟠 Major

Reject mixed output-tree indices instead of keeping the first one.

PackedStateTreeInfos can only carry one output_tree_index, but this loop silently keeps the first computed value and ignores later mismatches. If different state proofs resolve to different output trees, we pack inconsistent instruction data with no error.

🛡️ Suggested guard
             if let Some(next) = account.tree_info.next_tree_info {
                 let index = next.pack_output_tree_index(packed_accounts)?;
-                if output_tree_index.is_none() {
-                    output_tree_index = Some(index);
-                }
+                match output_tree_index {
+                    Some(existing) if existing != index => {
+                        return Err(IndexerError::InvalidParameters(format!(
+                            "state proofs reference multiple output tree indices: {} and {}",
+                            existing, index
+                        )));
+                    }
+                    None => output_tree_index = Some(index),
+                    _ => {}
+                }
             } else {
                 let index = account.tree_info.pack_output_tree_index(packed_accounts)?;
-                if output_tree_index.is_none() {
-                    output_tree_index = Some(index);
-                }
+                match output_tree_index {
+                    Some(existing) if existing != index => {
+                        return Err(IndexerError::InvalidParameters(format!(
+                            "state proofs reference multiple output tree indices: {} and {}",
+                            existing, index
+                        )));
+                    }
+                    None => output_tree_index = Some(index),
+                    _ => {}
+                }
             }
📝 Committable suggestion


Suggested change
if let Some(next) = account.tree_info.next_tree_info {
// SAFETY: account will always have a state Merkle tree context.
// pack_output_tree_index only panics on an address Merkle tree context.
let index = next.pack_output_tree_index(packed_accounts).unwrap();
let index = next.pack_output_tree_index(packed_accounts)?;
if output_tree_index.is_none() {
output_tree_index = Some(index);
}
} else {
// SAFETY: account will always have a state Merkle tree context.
// pack_output_tree_index only panics on an address Merkle tree context.
let index = account
.tree_info
.pack_output_tree_index(packed_accounts)
.unwrap();
let index = account.tree_info.pack_output_tree_index(packed_accounts)?;
if output_tree_index.is_none() {
output_tree_index = Some(index);
}
if let Some(next) = account.tree_info.next_tree_info {
let index = next.pack_output_tree_index(packed_accounts)?;
match output_tree_index {
Some(existing) if existing != index => {
return Err(IndexerError::InvalidParameters(format!(
"state proofs reference multiple output tree indices: {} and {}",
existing, index
)));
}
None => output_tree_index = Some(index),
_ => {}
}
} else {
let index = account.tree_info.pack_output_tree_index(packed_accounts)?;
match output_tree_index {
Some(existing) if existing != index => {
return Err(IndexerError::InvalidParameters(format!(
"state proofs reference multiple output tree indices: {} and {}",
existing, index
)));
}
None => output_tree_index = Some(index),
_ => {}
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/indexer/types/proof.rs` around lines 214 - 223, The loop
currently keeps the first computed output_tree_index and silently ignores any
subsequent differing indices; change this to detect mismatches and return an
error instead of keeping the first value. In the code paths using
account.tree_info.next_tree_info and
account.tree_info.pack_output_tree_index(packed_accounts) (and the local
variable output_tree_index), compute each index as before but if
output_tree_index is Some(prev) and the new index != prev then return a
descriptive error (e.g., Err or a custom ValidationError) indicating mixed
output-tree indices; otherwise set output_tree_index = Some(index) when it was
None. Apply this guard where PackedStateTreeInfos expects a single
output_tree_index so inconsistent state proofs are rejected.

Comment on lines +209 to +214
fn group_pda_specs<'a, V>(
specs: &[&'a PdaSpec<V>],
max_per_group: usize,
) -> Vec<Vec<&'a PdaSpec<V>>> {
assert!(max_per_group > 0, "max_per_group must be non-zero");
if specs.is_empty() {

🧹 Nitpick | 🔵 Trivial

The assert! contradicts this PR's goal of removing panic-prone patterns.

Given the PR objective is to harden runtime safety by replacing unwrap/panic with Result-based handling, this assert! stands out. While the current callers always pass MAX_PDAS_PER_IX = 8, making this safe in practice, the pattern is inconsistent with the broader refactoring philosophy.

Consider using debug_assert! if this is purely a programmer invariant check, or returning a Result if this function could eventually be exposed more broadly.

Option A: Use debug_assert! for internal invariant
 fn group_pda_specs<'a, V>(
     specs: &[&'a PdaSpec<V>],
     max_per_group: usize,
 ) -> Vec<Vec<&'a PdaSpec<V>>> {
-    assert!(max_per_group > 0, "max_per_group must be non-zero");
+    debug_assert!(max_per_group > 0, "max_per_group must be non-zero");
     if specs.is_empty() {
         return Vec::new();
     }
📝 Committable suggestion


Suggested change
fn group_pda_specs<'a, V>(
specs: &[&'a PdaSpec<V>],
max_per_group: usize,
) -> Vec<Vec<&'a PdaSpec<V>>> {
assert!(max_per_group > 0, "max_per_group must be non-zero");
if specs.is_empty() {
fn group_pda_specs<'a, V>(
specs: &[&'a PdaSpec<V>],
max_per_group: usize,
) -> Vec<Vec<&'a PdaSpec<V>>> {
debug_assert!(max_per_group > 0, "max_per_group must be non-zero");
if specs.is_empty() {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/interface/load_accounts.rs` around lines 209 - 214, The
function group_pda_specs currently uses assert!(max_per_group > 0, ...) which
introduces a panic; replace this with debug_assert!(max_per_group > 0, ...) to
keep the developer invariant check during testing while avoiding release-time
panics, and keep the signature and return type unchanged (call sites that pass
MAX_PDAS_PER_IX will be unaffected); if you intend this to be a
public/runtime-validated API instead, alternatively change group_pda_specs to
return Result<Vec<Vec<&PdaSpec<V>>>, SomeError> and explicitly return an Err
when max_per_group == 0, but prefer debug_assert! for internal invariants.

Comment on lines +69 to +70
let path = "cli/test_bin/run test-validator";
let mut path = format!("{}/{}", project_root.trim(), path);

🧹 Nitpick | 🔵 Trivial

Minor: Variable name path is misleading.

The variable starts as a path but quickly becomes a full command string with arguments. Consider renaming to command or cmd_line for clarity.

💡 Suggested rename
-    let path = "cli/test_bin/run test-validator";
-    let mut path = format!("{}/{}", project_root.trim(), path);
+    let base_cmd = "cli/test_bin/run test-validator";
+    let mut command = format!("{}/{}", project_root.trim(), base_cmd);

Then update all subsequent path.push_str(...) to command.push_str(...).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/local_test_validator.rs` around lines 69 - 70, The local
variable named `path` is misleading because it becomes a full command string;
rename both occurrences (the initial `let path = "cli/test_bin/run
test-validator";` and the subsequent `let mut path = format!("{}/{}",
project_root.trim(), path);`) to `command` (e.g., `let command = ...` and `let
mut command = ...`) and update all subsequent uses like `path.push_str(...)` to
`command.push_str(...)` so the variable name reflects it holds a command/command
line rather than a filesystem path.

Comment on lines 113 to 126
let mut child = Command::new("sh")
.arg("-c")
.arg(path)
.stdin(Stdio::null())
.stdout(Stdio::inherit())
.stderr(Stdio::inherit())
.spawn()?;
let status = child.wait()?;
if !status.success() {
return Err(io::Error::other(format!(
"validator CLI exited with status {}",
status
)));
}

⚠️ Potential issue | 🟠 Major

Blocking wait() call inside async function will stall the tokio runtime.

When child.wait() is called on line 120, it performs a blocking syscall that halts the current tokio worker thread. This prevents other async tasks from making progress on that thread—particularly problematic if the validator takes time to start.

Consider using tokio::process::Command which provides an async-compatible wait():

🔧 Proposed fix using tokio::process

Update the import at the top:

 use std::{
     io,
-    process::{Command, Stdio},
+    process::Stdio,
 };
+use tokio::process::Command;

The spawn and wait calls remain the same syntactically, but wait() becomes .await-able:

         let mut child = Command::new("sh")
             .arg("-c")
             .arg(path)
             .stdin(Stdio::null())
             .stdout(Stdio::inherit())
             .stderr(Stdio::inherit())
             .spawn()?;
-        let status = child.wait()?;
+        let status = child.wait().await?;

As per coding guidelines: "Use async functions with #[tokio::main] for asynchronous Rust code in the Light Protocol client library"—this implies async functions should avoid blocking the runtime.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/local_test_validator.rs` around lines 113 - 126, The
blocking std::process::Command::wait() call in local_test_validator.rs stalls the
tokio runtime; replace std::process::Command with tokio::process::Command
(tokio::process reuses std::process::Stdio, so that import stays as-is), spawn
the child the same way you do now, and await the child's termination with
child.wait().await (checking the returned ExitStatus.success()). Update imports
accordingly and remove the blocking child.wait() call so the async function does
not perform blocking IO.

Comment on lines +128 to +136
let child = Command::new("sh")
.arg("-c")
.arg(path)
.stdin(Stdio::null())
.stdout(Stdio::null())
.stderr(Stdio::null())
.spawn()?;
std::mem::drop(child);
tokio::time::sleep(tokio::time::Duration::from_secs(config.wait_time)).await;

🧹 Nitpick | 🔵 Trivial

Minor: Explicit drop is unnecessary; consider a brief comment on intent.

std::mem::drop(child) on line 135 is functionally equivalent to just letting child go out of scope. The explicit drop doesn't change behavior—it's stylistic. If the intent is to communicate "we're intentionally detaching this process," a comment would be clearer than the explicit drop.

That said, this is a nitpick—the code works correctly.

💡 Optional: Replace explicit drop with comment
         let child = Command::new("sh")
             .arg("-c")
             .arg(path)
             .stdin(Stdio::null())
             .stdout(Stdio::null())
             .stderr(Stdio::null())
             .spawn()?;
-        std::mem::drop(child);
+        // Child handle dropped intentionally; validator runs detached in background
+        drop(child);
         tokio::time::sleep(tokio::time::Duration::from_secs(config.wait_time)).await;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/local_test_validator.rs` around lines 128 - 136, The
explicit std::mem::drop(child) after spawning the process
(Command::new(...).spawn() -> child) is unnecessary; remove the drop and instead
add a brief comment above the spawn or before awaiting
tokio::time::sleep(tokio::time::Duration::from_secs(config.wait_time)).await
explaining the intent to detach/ignore the child process (e.g., "intentionally
detaching spawned child; not awaiting its exit"), so the purpose is clear while
relying on scope-based drop for cleanup.

@sergeytimoshin sergeytimoshin force-pushed the sergey/runtime-safety-hardening branch from 131375b to 1012099 Compare March 14, 2026 21:07
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
sdk-libs/client/src/interface/load_accounts.rs (1)

213-213: ⚠️ Potential issue | 🟡 Minor

Replace panicking invariant check in group_pda_specs (already reported).

Line 213 uses assert!, which can panic in release and reintroduces panic-prone behavior in a runtime-safety hardening PR. Prefer debug_assert! for internal invariants, or return a Result for runtime validation.

#!/bin/bash
# Verify the panic-prone invariant is still present.
rg -n -C2 'assert!\(max_per_group > 0' sdk-libs/client/src/interface/load_accounts.rs
Suggested minimal fix
-    assert!(max_per_group > 0, "max_per_group must be non-zero");
+    debug_assert!(max_per_group > 0, "max_per_group must be non-zero");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/interface/load_accounts.rs` at line 213, Replace the
panic-prone assert in group_pda_specs: instead of assert!(max_per_group > 0,
...), use debug_assert!(max_per_group > 0, ...) if this is an internal
invariant, or change group_pda_specs to return a Result and validate
max_per_group at runtime (returning an Err when <= 0) so callers can handle the
error; update callers accordingly if you opt for the Result approach.
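If the Result route is taken, a minimal std-only sketch could look like the following (the generic signature and String error type are assumptions for illustration, not the real types in load_accounts.rs):

```rust
// Hedged sketch of the Result-returning alternative; the real
// group_pda_specs signature and error type are assumed here.
fn group_pda_specs<T: Clone>(specs: &[T], max_per_group: usize) -> Result<Vec<Vec<T>>, String> {
    // Runtime validation instead of a release-mode panic via assert!.
    if max_per_group == 0 {
        return Err("max_per_group must be non-zero".to_string());
    }
    Ok(specs.chunks(max_per_group).map(|c| c.to_vec()).collect())
}

fn main() {
    let groups = group_pda_specs(&[1, 2, 3, 4, 5], 2).unwrap();
    assert_eq!(groups, vec![vec![1, 2], vec![3, 4], vec![5]]);
    assert!(group_pda_specs(&[1], 0).is_err());
    println!("{} groups", groups.len());
}
```

Callers then decide how to surface the error instead of the process aborting in release builds.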
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@sdk-libs/client/src/interface/load_accounts.rs`:
- Around line 252-259: The current try_join_all in fetch_individual_proofs and
fetch_proof_batches issues all RPCs at once; replace those unbounded fans with a
bounded concurrent stream using futures::stream::iter(...) +
TryStreamExt::try_buffer_unordered(CONCURRENCY_LIMIT).try_collect().await so
only CONCURRENCY_LIMIT in-flight get_validity_proof calls run at once; pick a
sane limit (e.g., introduce a constant MAX_CONCURRENT_PROOF_RPCS = 32 or make it
configurable) and keep the existing mapping that calls
indexer.get_validity_proof(...).await.map(|r| r.value) but wrap that in the
stream element (use cloned hashes or iterate over hashes.clone()) and use
try_buffer_unordered/try_collect to preserve Result error propagation; apply the
same change in both fetch_individual_proofs and fetch_proof_batches (referencing
those function names and reuse MAX_PDAS_PER_IX/MAX_ATAS_PER_IX only for
grouping, not for concurrency).

In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 2510-2516: Replace the unsafe .unwrap() calls on
response_result.text().await and deserialize_gnark_proof_json(&body) with proper
error propagation consistent with the surrounding code: call
response_result.text().await? (or map_err into IndexerError::CustomError) to
assign body and then use deserialize_gnark_proof_json(&body).map_err(|e|
IndexerError::CustomError(e.to_string()))? to produce proof_json so failures
return IndexerError rather than panicking; update the function signature to
return Result where necessary to support the ? operator.

---

Duplicate comments:
In `@sdk-libs/client/src/interface/load_accounts.rs`:
- Line 213: Replace the panic-prone assert in group_pda_specs: instead of
assert!(max_per_group > 0, ...), use debug_assert!(max_per_group > 0, ...) if
this is an internal invariant, or change group_pda_specs to return a Result and
validate max_per_group at runtime (returning an Err when <= 0) so callers can
handle the error; update callers accordingly if you opt for the Result approach.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ae0a0a1d-a04c-4e6c-bc59-d4b6c1826804

📥 Commits

Reviewing files that changed from the base of the PR and between 65676df and 1012099.

⛔ Files ignored due to path filters (4)
  • forester-utils/src/account_zero_copy.rs is excluded by none and included by none
  • program-tests/utils/src/e2e_test_env.rs is excluded by none and included by none
  • program-tests/utils/src/mock_batched_forester.rs is excluded by none and included by none
  • program-tests/utils/src/test_batch_forester.rs is excluded by none and included by none
📒 Files selected for processing (2)
  • sdk-libs/client/src/interface/load_accounts.rs
  • sdk-libs/program-test/src/indexer/test_indexer.rs

@sergeytimoshin sergeytimoshin force-pushed the sergey/runtime-safety-hardening branch from 1012099 to 0b04dbc Compare March 14, 2026 21:34
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
sdk-libs/client/src/rpc/client.rs (2)

342-344: 🧹 Nitpick | 🔵 Trivial

Use tracing::warn! instead of println! for consistency.

The rest of this file uses tracing::warn for logging (see lines 213, 394, 466). Using println! here bypasses structured logging and won't appear in your observability pipeline. The same applies to line 471.

📝 Suggested fix
                         UiInstruction::Parsed(_) => {
-                            println!("Parsed instructions are not implemented yet");
+                            warn!("Parsed instructions are not implemented yet");
                         }

Apply the same change at line 471.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/rpc/client.rs` around lines 342 - 344, The match arm for
UiInstruction::Parsed(_) currently uses println! which bypasses structured
logging; replace that println! with tracing::warn!("Parsed instructions are not
implemented yet") and do the same for the other println! usage noted later in
this file; ensure the tracing crate is in scope (use tracing::warn) so the
warnings integrate with the existing tracing-based observability used by the
file.

906-906: ⚠️ Potential issue | 🟠 Major

Use .first() instead of unbounded indexing to handle potential empty event vectors defensively.

At line 906, e[0] will panic if the vector is empty. While the logic in event_from_light_transaction suggests non-empty vectors are returned when Some(...) is present, defensive programming—especially in a PR focused on hardening runtime safety—should avoid relying on this assumption.

Replace the unbounded index with .first() to handle the edge case explicitly:

Suggested fix
-        let event = parsed_event.map(|(e, signature, slot)| (e[0].event.clone(), signature, slot));
+        let event = match parsed_event {
+            Some((e, signature, slot)) => {
+                let first = e.first().ok_or_else(|| {
+                    RpcError::CustomError("No events found in transaction".to_string())
+                })?;
+                Some((first.event.clone(), signature, slot))
+            }
+            None => None,
+        };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/rpc/client.rs` at line 906, The mapping that constructs
`event` uses `e[0]` which can panic on empty vectors; change the closure over
`parsed_event` to use `e.first()` and propagate the absence safely. Concretely,
replace the `parsed_event.map(|(e, signature, slot)| (e[0].event.clone(),
signature, slot))` logic with an approach that uses `e.first()` (e.g.
`parsed_event.and_then(|(e, signature, slot)| e.first().cloned().map(|ev|
(ev.event.clone(), signature, slot)))`) so empty `e` yields `None` instead of
panicking; update any downstream handling accordingly (this touches the
`parsed_event` -> `event` mapping and the `event_from_light_transaction` flow).
sdk-libs/program-test/src/indexer/test_indexer.rs (1)

2498-2531: ⚠️ Potential issue | 🟠 Major

Retry loop can spin forever on non-success HTTP responses.

When send() succeeds but status is non-2xx, retries is never decremented, so this can loop indefinitely and hammer the prover.

🛠️ Suggested fix
             let mut retries = 3;
             while retries > 0 {
                 let response_result = client
                     .post(format!("{}{}", SERVER_ADDRESS, PROVE_PATH))
                     .header("Content-Type", "text/plain; charset=utf-8")
                     .body(json_payload.clone())
                     .send()
                     .await;
                 if let Ok(response_result) = response_result {
                     if response_result.status().is_success() {
                         let body = response_result.text().await.unwrap();
                         let proof_json = deserialize_gnark_proof_json(&body).unwrap();
                         let (proof_a, proof_b, proof_c) = proof_from_json_struct(proof_json)
                             .map_err(|e| IndexerError::CustomError(e.to_string()))?;
                         let (proof_a, proof_b, proof_c) =
                             compress_proof(&proof_a, &proof_b, &proof_c)
                                 .map_err(|e| IndexerError::CustomError(e.to_string()))?;
                         return Ok(ValidityProofWithContext {
                             accounts: account_proof_inputs,
                             addresses: address_proof_inputs,
                             proof: CompressedProof {
                                 a: proof_a,
                                 b: proof_b,
                                 c: proof_c,
                             }
                             .into(),
                         });
+                    } else {
+                        tokio::time::sleep(Duration::from_secs(5)).await;
+                        retries -= 1;
                     }
                 } else {
                     println!("Error: {:#?}", response_result);
                     tokio::time::sleep(Duration::from_secs(5)).await;
                     retries -= 1;
                 }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 2498 - 2531,
The retry loop using the mutable retries variable can spin forever when
client.post.send() returns Ok(response_result) but response_result.status() is
non-success because retries is only decremented in the Err branch; modify the
loop in test_indexer.rs so that whenever the response is Ok but
!response_result.status().is_success() you decrement retries and await a backoff
(e.g., tokio::time::sleep(Duration::from_secs(5))). Concretely, inside the while
loop after checking if let Ok(response_result) = response_result, add an else
branch for the non-success status that logs the status, sleeps, and decrements
retries (mirroring the current Err behavior) so both network errors and non-2xx
responses progress toward loop termination.
sdk-libs/client/src/interface/initialize_config.rs (1)

120-122: 🧹 Nitpick | 🔵 Trivial

Consider using Discriminator trait instead of hardcoded bytes.

The coding guidelines recommend using anchor_lang::Discriminator trait for instruction discriminators. While this is pre-existing code, if you're touching this area anyway:

-        // Anchor discriminator for "initialize_compression_config"
-        // SHA256("global:initialize_compression_config")[..8]
-        const DISCRIMINATOR: [u8; 8] = [133, 228, 12, 169, 56, 76, 222, 61];
+        // Use Anchor's Discriminator trait when the anchor feature is enabled
+        #[cfg(feature = "anchor")]
+        let discriminator = <YourInstructionType as anchor_lang::Discriminator>::DISCRIMINATOR;

This would make the code more maintainable if the instruction signature changes. As per coding guidelines: "Use anchor_lang::Discriminator trait to get 8-byte instruction discriminators from Anchor programs."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/interface/initialize_config.rs` around lines 120 - 122,
Replace the hardcoded DISCRIMINATOR bytes for "initialize_compression_config"
with Anchor's Discriminator helper: remove the const DISCRIMINATOR and obtain
the 8-byte discriminator at runtime from the instruction type using the
anchor_lang::Discriminator implementation for the corresponding instruction
struct (e.g., the InitializeCompressionConfig instruction struct) and take the
first 8 bytes of that discriminator; update any usages that referenced
DISCRIMINATOR to call the Discriminator::discriminator method on the instruction
type instead so changes to the instruction signature automatically produce the
correct bytes.
forester/src/forester_status.rs (1)

674-685: ⚠️ Potential issue | 🟡 Minor

Queue parse failures are silently masked as empty queues.

At Line 674 and Line 729, deserialization errors are swallowed with .ok().map(…).unwrap_or((0, 0)), which can misreport a broken queue as “0 pending / 0 capacity”. Please at least emit a warning when parsing fails (or surface None stats) to avoid false operational signals.

Suggested patch (apply to both StateV1 and AddressV1 branches)
-                    parse_hash_set_from_bytes::<QueueAccount>(&acc.data)
-                        .ok()
-                        .map(|hs| {
-                            let len = hs
-                                .iter()
-                                .filter(|(_, cell)| cell.sequence_number.is_none())
-                                .count() as u64;
-                            let cap = hs.get_capacity() as u64;
-                            (len, cap)
-                        })
-                        .unwrap_or((0, 0))
+                    match parse_hash_set_from_bytes::<QueueAccount>(&acc.data) {
+                        Ok(hs) => {
+                            let len = hs
+                                .iter()
+                                .filter(|(_, cell)| cell.sequence_number.is_none())
+                                .count() as u64;
+                            let cap = hs.get_capacity() as u64;
+                            (len, cap)
+                        }
+                        Err(e) => {
+                            warn!("Failed to parse queue account {}: {:?}", tree.queue, e);
+                            (0, 0)
+                        }
+                    }

Also applies to: 729-740

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/forester_status.rs` around lines 674 - 685, The code currently
swallows deserialization errors from
parse_hash_set_from_bytes::<QueueAccount>(&acc.data) by using
.ok().unwrap_or((0,0)), causing broken queues to be reported as 0/0; change this
to detect Err and emit a warning (e.g. tracing::warn! or process logger)
including the error and account identity, or return None/Option for the stats so
callers can treat parse failures explicitly; apply the same change in both
StateV1 and AddressV1 branches where parse_hash_set_from_bytes and the
(.ok().map(...).unwrap_or((0,0))) pattern is used to compute (len, cap),
replacing the silent unwrap with explicit error logging or Option propagation.
prover/client/src/errors.rs (1)

6-6: ⚠️ Potential issue | 🟠 Major

This widens a public exhaustive enum, creating a breaking API change that needs explicit release communication.

ProverClientError is public and lacks #[non_exhaustive], so adding variants (IntegerConversion, JsonSerialization, ProcessStart, ProcessWait, ProjectRootNotFound, HealthCheckFailed) breaks exhaustive downstream match statements. Combined with the changed return types (e.g., to_string() now returns Result<String, ProverClientError>), this is a significant API shift.

While the version bump to 8.0.0 reflects major release intent, there's no CHANGELOG, README, or migration guide in prover/client/ documenting these breaking changes. Before release, add explicit release notes or migration guidance so downstream users understand what changed and how to adapt.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/errors.rs` at line 6, Public enum ProverClientError was
widened which breaks downstream exhaustive matches; either restore backwards
compatibility or mark the enum non-exhaustive and document the breaking changes:
add #[non_exhaustive] to ProverClientError (so consumers must add a fallback
arm), or revert the new variants until you can coordinate a major release, and
create clear release notes/migration guide in prover/client/ (CHANGELOG, README)
describing the new variants (IntegerConversion, JsonSerialization, ProcessStart,
ProcessWait, ProjectRootNotFound, HealthCheckFailed) and the changed return
types (e.g., to_string() -> Result<String, ProverClientError>) with examples of
how callers should update match statements and error handling.
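The #[non_exhaustive] route can be sketched in a few lines (variant names follow the review's list; the enum body is illustrative, not the crate's real definition):

```rust
// Hedged sketch: once the enum carries #[non_exhaustive], downstream crates
// must write a wildcard arm, so adding variants later is no longer a
// compile-time break for them.
#[non_exhaustive]
#[derive(Debug)]
pub enum ProverClientError {
    GenericError(String),
    IntegerConversion(String),
    HealthCheckFailed(String),
}

fn describe(err: &ProverClientError) -> &'static str {
    match err {
        ProverClientError::IntegerConversion(_) => "integer conversion failed",
        ProverClientError::HealthCheckFailed(_) => "prover health check failed",
        // The fallback arm downstream code is forced to provide.
        _ => "other prover-client error",
    }
}

fn main() {
    assert_eq!(
        describe(&ProverClientError::GenericError("boom".into())),
        "other prover-client error"
    );
    assert_eq!(
        describe(&ProverClientError::IntegerConversion("i".into())),
        "integer conversion failed"
    );
    println!("{}", describe(&ProverClientError::HealthCheckFailed("503".into())));
}
```

Note that #[non_exhaustive] only restricts matching outside the defining crate, so it changes the public contract without touching internal code.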
prover/client/src/proof_types/batch_update/proof_inputs.rs (1)

122-132: ⚠️ Potential issue | 🔴 Critical

Validate all batch vector lengths before computing the updated root.

This function zips leaves, merkle_proofs, and path_indices, but Line 182 still indexes tx_hashes[i] and Lines 199-220 serialize the full vectors afterward. A short tx_hashes panics; short proofs or path indices silently compute a root for only a prefix of the batch.

Proposed fix
 pub fn get_batch_update_inputs<const HEIGHT: usize>(
     current_root: [u8; 32],
     tx_hashes: Vec<[u8; 32]>,
     leaves: Vec<[u8; 32]>,
@@
     batch_size: u32,
     previous_changelogs: &[ChangelogEntry<HEIGHT>],
 ) -> Result<(BatchUpdateCircuitInputs, Vec<ChangelogEntry<HEIGHT>>), ProverClientError> {
+    let expected = batch_size as usize;
+    if leaves.len() != expected
+        || old_leaves.len() != expected
+        || merkle_proofs.len() != expected
+        || tx_hashes.len() != expected
+        || path_indices.len() != expected
+    {
+        return Err(ProverClientError::GenericError(format!(
+            "Input vector length mismatch: leaves={}, old_leaves={}, merkle_proofs={}, tx_hashes={}, path_indices={}, expected batch_size={}",
+            leaves.len(),
+            old_leaves.len(),
+            merkle_proofs.len(),
+            tx_hashes.len(),
+            path_indices.len(),
+            batch_size,
+        )));
+    }
+
     let mut new_root = [0u8; 32];

Based on learnings: Batch nullify ZKP verification must validate that public_input_hash equals Hash([old_root, new_root, leaves_hash_chain]), verify nullifiers are correctly computed as Hash(Leaves[i], PathIndices[i], TxHashes[i]) for each position in batch.

Also applies to: 139-184, 199-220

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof_types/batch_update/proof_inputs.rs` around lines 122
- 132, The function get_batch_update_inputs must validate that all batch vectors
(tx_hashes, leaves, merkle_proofs, path_indices, old_leaves) have lengths
consistent with each other and with batch_size before any indexing or computing
the updated root; add an early check (and return an error) ensuring
tx_hashes.len() == leaves.len() == merkle_proofs.len() == path_indices.len() ==
old_leaves.len() and that these equal batch_size as usize (or at least are >=
batch_size) so subsequent zips/indices cannot panic or silently truncate. After
validation, compute nullifiers per position using the exact relationship
Hash(leaves[i], path_indices[i], tx_hashes[i]) and only then serialize/return
the full vectors (remove any partial-prefix logic). Also ensure the
public_input_hash computation uses Hash([old_root, new_root, leaves_hashchain])
after the validated new_root is produced. Reference get_batch_update_inputs,
BatchUpdateCircuitInputs, and ChangelogEntry<HEIGHT> when making these changes.
♻️ Duplicate comments (7)
sdk-libs/program-test/src/indexer/test_indexer.rs (1)

2508-2514: ⚠️ Potential issue | 🔴 Critical

Unresolved panic path remains in proof response parsing (Line 2508, Line 2509).

unwrap() is still used on response body extraction and GNARK JSON deserialization; malformed/proxy/error payloads will still panic here.

♻️ Suggested fix
-                        let body = response_result.text().await.unwrap();
-                        let proof_json = deserialize_gnark_proof_json(&body).unwrap();
+                        let body = response_result
+                            .text()
+                            .await
+                            .map_err(|e| IndexerError::CustomError(e.to_string()))?;
+                        let proof_json = deserialize_gnark_proof_json(&body)
+                            .map_err(|e| IndexerError::CustomError(e.to_string()))?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 2508 - 2514,
The code currently uses unwrap() on response_result.text().await and
deserialize_gnark_proof_json(&body) which can panic on malformed or error
payloads; replace these unwraps with proper error propagation by mapping
failures into IndexerError (e.g., use .await.map_err(|e|
IndexerError::CustomError(e.to_string())) for response_result.text(), and change
deserialize_gnark_proof_json(&body).unwrap() to
deserialize_gnark_proof_json(&body).map_err(|e|
IndexerError::CustomError(e.to_string())) so that the subsequent
proof_from_json_struct and compress_proof error handling remains consistent.
sdk-libs/client/src/local_test_validator.rs (1)

69-105: ⚠️ Potential issue | 🟠 Major

Replace sh -c command-string execution with argumentized async process spawning.

Line 69–105 builds one shell command string from dynamic values, then Line 113–115 executes it via sh -c; this is injection-prone and breaks on shell-special characters. Also, Line 120 uses blocking child.wait()? inside an async function, which can stall Tokio workers.

🔧 Proposed refactor (safe args + non-blocking wait)
 use std::{
     io,
-    process::{Command, Stdio},
+    process::Stdio,
 };
+use tokio::process::Command;

@@
-    let path = "cli/test_bin/run test-validator";
-    let mut path = format!("{}/{}", project_root.trim(), path);
+    let cli_path = format!("{}/cli/test_bin/run", project_root.trim());
+    let mut args = vec!["test-validator".to_string()];

     if !config.enable_indexer {
-        path.push_str(" --skip-indexer");
+        args.push("--skip-indexer".to_string());
     }

     if let Some(limit_ledger_size) = config.limit_ledger_size {
-        path.push_str(&format!(" --limit-ledger-size {}", limit_ledger_size));
+        args.push("--limit-ledger-size".to_string());
+        args.push(limit_ledger_size.to_string());
     }

     for sbf_program in config.sbf_programs.iter() {
-        path.push_str(&format!(
-            " --sbf-program {} {}",
-            sbf_program.0, sbf_program.1
-        ));
+        args.push("--sbf-program".to_string());
+        args.push(sbf_program.0.clone());
+        args.push(sbf_program.1.clone());
     }

     for upgradeable_program in config.upgradeable_programs.iter() {
-        path.push_str(&format!(
-            " --upgradeable-program {} {} {}",
-            upgradeable_program.program_id,
-            upgradeable_program.program_path,
-            upgradeable_program.upgrade_authority
-        ));
+        args.push("--upgradeable-program".to_string());
+        args.push(upgradeable_program.program_id.clone());
+        args.push(upgradeable_program.program_path.clone());
+        args.push(upgradeable_program.upgrade_authority.clone());
     }

     if !config.enable_prover {
-        path.push_str(" --skip-prover");
+        args.push("--skip-prover".to_string());
     }

     if config.use_surfpool {
-        path.push_str(" --use-surfpool");
+        args.push("--use-surfpool".to_string());
     }

     for arg in config.validator_args.iter() {
-        path.push_str(&format!(" {}", arg));
+        args.push(arg.clone());
     }

-    println!("Starting validator with command: {}", path);
+    println!("Starting validator: {} {}", cli_path, args.join(" "));

@@
-        let mut child = Command::new("sh")
-            .arg("-c")
-            .arg(path)
+        let mut child = Command::new(&cli_path)
+            .args(&args)
             .stdin(Stdio::null())
             .stdout(Stdio::inherit())
             .stderr(Stdio::inherit())
             .spawn()?;
-        let status = child.wait()?;
+        let status = child.wait().await?;
#!/bin/bash
# Verify no shell-string execution and no blocking wait in async function.
rg -n --type=rust 'Command::new\("sh"\)|\.arg\("-c"\)|child\.wait\(\)\?' sdk-libs/client/src/local_test_validator.rs

As per coding guidelines: "Use async functions with #[tokio::main] for asynchronous Rust code in the Light Protocol client library".

Also applies to: 113-121

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/local_test_validator.rs` around lines 69 - 105, The code
builds a single shell command string in the variable path using config flags
(enable_indexer, limit_ledger_size, sbf_programs, upgradeable_programs,
enable_prover, use_surfpool, validator_args) and later executes it with sh -c
and uses blocking child.wait() inside an async context; change this to construct
a Vec<String>/Vec<&str> of argv entries (program path as the command and each
flag/value as separate args using the same config fields and
sbf_program/program_id/program_path/upgrade_authority values), spawn the process
with tokio::process::Command
(Command::new(program_path).args(&args).spawn()?), and replace the
blocking child.wait()? usage with the async child.wait().await?
to avoid shell injection and blocking the Tokio reactor. Ensure each sbf_program
and upgradeable_program adds separate args (no shell interpolation) and pass
validator_args elements as individual args.
prover/client/src/proof.rs (2)

167-170: ⚠️ Potential issue | 🟠 Major

Validate the G1 point before negating it.

Line 169 still deserializes prover-supplied bytes with Validate::No, so off-curve or wrong-subgroup points can pass through negate_g1.

Proposed fix
-    let g1: G1 = G1::deserialize_with_mode(g1_le.as_slice(), Compress::No, Validate::No)
+    let g1: G1 = G1::deserialize_with_mode(g1_le.as_slice(), Compress::No, Validate::Yes)
         .map_err(|e| ProverClientError::InvalidProofData(e.to_string()))?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof.rs` around lines 167 - 170, The function negate_g1
currently deserializes prover bytes using G1::deserialize_with_mode(...,
Validate::No) which allows off-curve or wrong-subgroup points; change the
deserialization call in negate_g1 to use a validating mode (e.g., Validate::Yes)
so the G1 point is validated during deserialization before any negation, and
propagate the validation error via ProverClientError::InvalidProofData as
already done.

72-89: ⚠️ Potential issue | 🟠 Major

Strip at most one 0x prefix when decoding proof coordinates.

Line 73 still uses trim_start_matches("0x"), which strips repeated 0x prefixes. Malformed input like 0x0x01 becomes 01 and parses successfully instead of failing.

Proposed fix
 pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
-    let trimmed_str = hex_str.trim_start_matches("0x");
-    let big_int = num_bigint::BigInt::from_str_radix(trimmed_str, 16)
+    let trimmed_str = hex_str.strip_prefix("0x").unwrap_or(hex_str);
+    let big_uint = num_bigint::BigUint::from_str_radix(trimmed_str, 16)
         .map_err(|e| ProverClientError::InvalidHexString(format!("{} ({})", hex_str, e)))?;
-    let big_int_bytes = big_int.to_bytes_be().1;
-    if big_int_bytes.len() < 32 {
+    let big_uint_bytes = big_uint.to_bytes_be();
+    if big_uint_bytes.len() < 32 {
         let mut result = [0u8; 32];
-        result[32 - big_int_bytes.len()..].copy_from_slice(&big_int_bytes);
+        result[32 - big_uint_bytes.len()..].copy_from_slice(&big_uint_bytes);
         Ok(result)
     } else {
-        let len = big_int_bytes.len();
-        big_int_bytes.try_into().map_err(|_| {
+        let len = big_uint_bytes.len();
+        big_uint_bytes.try_into().map_err(|_| {
             ProverClientError::InvalidHexString(format!(
                 "expected at most 32 bytes, got {} for {}",
                 len, hex_str
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof.rs` around lines 72 - 89, The function
deserialize_hex_string_to_be_bytes currently uses trim_start_matches("0x"),
which removes every leading "0x" and lets malformed inputs like "0x0x01" parse;
change the logic to strip at most one "0x" prefix by replacing
trim_start_matches with a single-strip approach (e.g. use
strip_prefix("0x").unwrap_or(hex_str) or equivalent) when computing trimmed_str
in deserialize_hex_string_to_be_bytes so only the first "0x" is removed and
inputs with repeated prefixes fail to parse.
prover/client/src/prover.rs (2)

57-65: ⚠️ Potential issue | 🟠 Major

Check the child exit status and HTTP status before returning healthy.

Line 58 only waits for the child process; it never verifies output.status.success(). Line 84 treats any Ok(Response) as healthy, including 4xx/5xx responses. That can make spawn_prover() return Ok(()) after a failed startup or a failing health endpoint.

Proposed fix
-        command
-            .wait_with_output()
-            .map_err(|error| ProverClientError::ProcessWait(error.to_string()))?;
+        let output = command
+            .wait_with_output()
+            .map_err(|error| ProverClientError::ProcessWait(error.to_string()))?;
+
+        if !output.status.success() {
+            return Err(ProverClientError::ProcessWait(format!(
+                "start-prover exited with status {}: {}",
+                output.status,
+                String::from_utf8_lossy(&output.stderr),
+            )));
+        }
@@
-            Ok(_) => {
+            Ok(response) if response.status().is_success() => {
                 result = true;
                 break;
             }
-            Err(_) => {
+            Ok(_) | Err(_) => {
                 sleep(Duration::from_secs(timeout as u64));
             }

Also applies to: 79-85

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/prover.rs` around lines 57 - 65, spawn_prover currently
waits for the child via command.wait_with_output() but doesn't verify the
child's exit status, and health_check treats any Ok(Response) as success; update
spawn_prover to inspect the output.status.success() from
command.wait_with_output() and return an error (ProverClientError::ProcessExit
or similar) if the process failed, and update health_check to examine the HTTP
response status (use response.status().is_success()) rather than just Ok/Err so
non-2xx responses are considered failures; key symbols to modify:
command.wait_with_output(), spawn_prover(), and health_check() (check Response
handling to require status.is_success()) ensuring callers only return Ok(())
when both the process exit is successful and health_check reports a 2xx status.

1-5: ⚠️ Potential issue | 🟠 Major

Use Tokio process/sleep APIs in this async startup path.

Lines 58 and 89 still use blocking std APIs inside async functions. With the 120-second retry window, a stalled prover startup can pin Tokio worker threads for a long time.

Proposed fix
 use std::{
-    process::Command,
     sync::atomic::{AtomicBool, Ordering},
-    thread::sleep,
     time::Duration,
 };
 
+use tokio::{process::Command, time::sleep};
+
@@
-        command
-            .wait_with_output()
-            .map_err(|error| ProverClientError::ProcessWait(error.to_string()))?;
+        command
+            .wait_with_output()
+            .await
+            .map_err(|error| ProverClientError::ProcessWait(error.to_string()))?;
@@
-                sleep(Duration::from_secs(timeout as u64));
+                sleep(Duration::from_secs(timeout as u64)).await;

Also applies to: 51-59, 75-90

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/prover.rs` around lines 1 - 5, The startup path is using
blocking std APIs (std::process::Command and thread::sleep) inside async code
which can block Tokio workers; replace uses of std::process::Command with
tokio::process::Command and call .output().await (or spawn and await) instead of
blocking output(), and replace thread::sleep/Duration with
tokio::time::sleep(Duration::from_secs(...)). Update any loops that poll with
AtomicBool to use async-friendly waits (await tokio::time::sleep) so the
120-second retry loop yields to the runtime; ensure you import
tokio::process::Command and tokio::time::sleep and change any synchronous calls
at the sites referencing Command and sleep (around the reported locations) to
their async equivalents and .await them.
prover/client/src/proof_types/batch_update/proof_inputs.rs (1)

115-117: ⚠️ Potential issue | 🔴 Critical

Return every batch public input instead of self.0[0].

Line 117 still panics on empty input and silently drops every BatchUpdateCircuitInputs after the first, even though this API returns a Vec<[u8; 32]>.

Proposed fix
 impl BatchUpdateInputs<'_> {
     pub fn public_inputs(&self) -> Result<Vec<[u8; 32]>, ProverClientError> {
-        // Concatenate all public inputs into a single flat vector
-        Ok(vec![self.0[0].public_inputs_arr()?])
+        if self.0.is_empty() {
+            return Err(ProverClientError::GenericError(
+                "batch update inputs cannot be empty".to_string(),
+            ));
+        }
+
+        self.0
+            .iter()
+            .map(BatchUpdateCircuitInputs::public_inputs_arr)
+            .collect()
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof_types/batch_update/proof_inputs.rs` around lines 115
- 117, The current public_inputs method only returns the first element
(self.0[0]) and will panic on an empty vector; change it to iterate over all
BatchUpdateCircuitInputs in self.0, call public_inputs_arr() on each, and
collect the results into a Vec<[u8; 32]> (propagating any error returned by
public_inputs_arr) instead of indexing [0]; also handle the empty-case by
returning an empty Vec rather than panicking. Use the public_inputs function,
the BatchUpdateCircuitInputs entries (self.0) and the public_inputs_arr() method
as the points to modify.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@prover/client/src/proof_types/batch_append/proof_inputs.rs`:
- Around line 180-186: Before iterating with zip, validate that
old_leaves.len(), leaves.len(), and merkle_proofs.len() all equal batch_size and
return a hard error if any mismatch; specifically add a guard near the start of
the function that checks these lengths and returns
ProverClientError::InvalidProofData (or another appropriate error) when they
differ so you never silently build a partial batch (this covers the loop that
consumes old_leaves, leaves, and merkle_proofs and the subsequent
merkle_proof.as_slice().try_into() conversion).

In `@sdk-libs/client/src/indexer/types/queue.rs`:
- Around line 159-210: The method reconstruct_proof_with_lookup currently
redundantly calls validate_proof_height::<HEIGHT>() even though its only caller,
reconstruct_proofs, already performs that check; remove the
validate_proof_height::<HEIGHT>() invocation and its error propagation from
reconstruct_proof_with_lookup so reconstruct_proofs remains the single
validation point (leave reconstruct_proofs and validate_proof_height unchanged),
ensuring reconstruct_proof_with_lookup assumes the caller has validated the
height (or add a brief comment noting that responsibility).

In `@sdk-libs/client/src/interface/instructions.rs`:
- Line 104: Unify the builder return type by introducing a single module-level
alias or error enum (e.g., type BuilderResult = Result<Instruction, Box<dyn
std::error::Error + Send + Sync>> or a custom InstructionBuildError) and replace
the mixed signatures (the functions currently returning
std::io::Result<Instruction> and those returning Result<Instruction, Box<dyn
std::error::Error>>) with that alias; update all builder function signatures and
their internal error returns to use BuilderResult and adjust imports/usages
accordingly so callers only handle one consistent error type for Instruction
construction.
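A sketch of the unified alias; the `Instruction` struct below is a hypothetical stand-in so the example is self-contained (the real type is `solana_sdk::instruction::Instruction`):

```rust
// Stand-in for the real Instruction type so this sketch compiles on its own.
#[derive(Debug)]
pub struct Instruction {
    pub data: Vec<u8>,
}

// Single module-level result alias shared by every builder.
pub type BuilderResult = Result<Instruction, Box<dyn std::error::Error + Send + Sync>>;

// Every builder returns BuilderResult, so callers handle one error type.
pub fn build_update_instruction(data: Vec<u8>) -> BuilderResult {
    if data.is_empty() {
        return Err("instruction data must not be empty".into());
    }
    Ok(Instruction { data })
}
```

`std::io::Error` converts into the boxed error via `?`, so builders that previously returned `std::io::Result<Instruction>` can adopt the alias without touching their internal error sites.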

In `@sdk-libs/client/src/interface/load_accounts.rs`:
- Line 334: Remove the redundant declaration that recomputes program_id (the
line using specs.first().map(|s| s.program_id()).unwrap_or_default()); instead
reuse the existing program_id binding already declared earlier in the same
scope; delete this duplicate statement so there is only a single program_id
binding referenced by the surrounding code (e.g., the later uses in
load_accounts-related logic).

---

Outside diff comments:
In `@forester/src/forester_status.rs`:
- Around line 674-685: The code currently swallows deserialization errors from
parse_hash_set_from_bytes::<QueueAccount>(&acc.data) by using
.ok().unwrap_or((0,0)), causing broken queues to be reported as 0/0; change this
to detect Err and emit a warning (e.g. tracing::warn! or process logger)
including the error and account identity, or return None/Option for the stats so
callers can treat parse failures explicitly; apply the same change in both
StateV1 and AddressV1 branches where parse_hash_set_from_bytes and the
(.ok().map(...).unwrap_or((0,0))) pattern is used to compute (len, cap),
replacing the silent unwrap with explicit error logging or Option propagation.
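The Option-propagating variant can be sketched as follows; `parse_queue` is a hypothetical stand-in for `parse_hash_set_from_bytes::<QueueAccount>`, and the real code would log through `tracing::warn!` rather than `eprintln!`:

```rust
// Hypothetical stand-in for parse_hash_set_from_bytes::<QueueAccount>.
fn parse_queue(data: &[u8]) -> Result<(usize, usize), String> {
    if data.len() < 8 {
        return Err(format!("account data too short: {} bytes", data.len()));
    }
    Ok((data[0] as usize, data[1] as usize))
}

// Returns None on parse failure instead of silently reporting 0/0,
// and surfaces the error with the account identity.
fn queue_stats(pubkey: &str, data: &[u8]) -> Option<(usize, usize)> {
    match parse_queue(data) {
        Ok(stats) => Some(stats),
        Err(e) => {
            eprintln!("failed to parse queue account {pubkey}: {e}");
            None
        }
    }
}
```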

In `@prover/client/src/errors.rs`:
- Line 6: Public enum ProverClientError was widened which breaks downstream
exhaustive matches; either restore backwards compatibility or mark the enum
non-exhaustive and document the breaking changes: add #[non_exhaustive] to
ProverClientError (so consumers must add a fallback arm), or revert the new
variants until you can coordinate a major release, and create clear release
notes/migration guide in prover/client/ (CHANGELOG, README) describing the new
variants (IntegerConversion, JsonSerialization, ProcessStart, ProcessWait,
ProjectRootNotFound, HealthCheckFailed) and the changed return types (e.g.,
to_string() -> Result<String, ProverClientError>) with examples of how callers
should update match statements and error handling.
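A sketch of the `#[non_exhaustive]` approach, with only two variants shown for brevity:

```rust
// Downstream crates can no longer match exhaustively, so adding variants
// (ProcessStart, ProcessWait, HealthCheckFailed, ...) later is not a breaking change.
#[non_exhaustive]
#[derive(Debug)]
pub enum ProverClientError {
    InvalidProofData(String),
    IntegerConversion(String),
}

// In a consuming crate, a wildcard arm becomes mandatory once the enum is non_exhaustive.
pub fn describe(err: &ProverClientError) -> &'static str {
    match err {
        ProverClientError::InvalidProofData(_) => "invalid proof data",
        _ => "other prover error",
    }
}
```

Note the attribute only constrains other crates; within `prover/client` itself, matches may still list every variant.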

In `@prover/client/src/proof_types/batch_update/proof_inputs.rs`:
- Around line 122-132: The function get_batch_update_inputs must validate that
all batch vectors (tx_hashes, leaves, merkle_proofs, path_indices, old_leaves)
have lengths consistent with each other and with batch_size before any indexing
or computing the updated root; add an early check (and return an error) ensuring
tx_hashes.len() == leaves.len() == merkle_proofs.len() == path_indices.len() ==
old_leaves.len() and that these equal batch_size as usize (or at least are >=
batch_size) so subsequent zips/indices cannot panic or silently truncate. After
validation, compute nullifiers per position using the exact relationship
Hash(leaves[i], path_indices[i], tx_hashes[i]) and only then serialize/return
the full vectors (remove any partial-prefix logic). Also ensure the
public_input_hash computation uses Hash([old_root, new_root, leaves_hashchain])
after the validated new_root is produced. Reference get_batch_update_inputs,
BatchUpdateCircuitInputs, and ChangelogEntry<HEIGHT> when making these changes.
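The length guard can be factored as a small helper (names are illustrative, not the crate's API):

```rust
// Returns an error naming the first vector whose length disagrees with batch_size.
fn validate_batch_lens(batch_size: usize, lens: &[(&str, usize)]) -> Result<(), String> {
    for (name, len) in lens {
        if *len != batch_size {
            return Err(format!(
                "{name} length mismatch: expected {batch_size}, got {len}"
            ));
        }
    }
    Ok(())
}

// Call site, before any zip or indexing:
// validate_batch_lens(batch_size as usize, &[
//     ("tx_hashes", tx_hashes.len()),
//     ("leaves", leaves.len()),
//     ("merkle_proofs", merkle_proofs.len()),
//     ("path_indices", path_indices.len()),
//     ("old_leaves", old_leaves.len()),
// ])?;
```

Failing before the loop guarantees the circuit never sees a batch silently truncated by `zip`.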

In `@sdk-libs/client/src/interface/initialize_config.rs`:
- Around line 120-122: Replace the hardcoded DISCRIMINATOR bytes for
"initialize_compression_config" with Anchor's Discriminator helper: remove the
const DISCRIMINATOR and obtain the 8-byte discriminator at runtime from the
instruction type using the anchor_lang::Discriminator implementation for the
corresponding instruction struct (e.g., the InitializeCompressionConfig
instruction struct) and take the first 8 bytes of that discriminator; update any
usages that referenced DISCRIMINATOR to call the Discriminator::discriminator
method on the instruction type instead so changes to the instruction signature
automatically produce the correct bytes.

In `@sdk-libs/client/src/rpc/client.rs`:
- Around line 342-344: The match arm for UiInstruction::Parsed(_) currently uses
println! which bypasses structured logging; replace that println! with
tracing::warn!("Parsed instructions are not implemented yet") and do the same
for the other println! usage noted later in this file; ensure the tracing crate
is in scope (use tracing::warn) so the warnings integrate with the existing
tracing-based observability used by the file.
- Line 906: The mapping that constructs `event` uses `e[0]` which can panic on
empty vectors; change the closure over `parsed_event` to use `e.first()` and
propagate the absence safely. Concretely, replace the `parsed_event.map(|(e,
signature, slot)| (e[0].event.clone(), signature, slot))` logic with an approach
that uses `e.first()` (e.g. `parsed_event.and_then(|(e, signature, slot)|
e.first().cloned().map(|ev| (ev.event.clone(), signature, slot)))`) so empty `e`
yields `None` instead of panicking; update any downstream handling accordingly
(this touches the `parsed_event` -> `event` mapping and the
`event_from_light_transaction` flow).
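A sketch of the panic-free mapping, with simplified types (the real tuple also carries the signature alongside the slot):

```rust
// Empty event vectors yield None instead of panicking on e[0].
fn first_event(parsed: Option<(Vec<String>, u64)>) -> Option<(String, u64)> {
    parsed.and_then(|(events, slot)| {
        events.first().cloned().map(|event| (event, slot))
    })
}
```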

In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 2498-2531: The retry loop using the mutable retries variable can
spin forever when client.post.send() returns Ok(response_result) but
response_result.status() is non-success because retries is only decremented in
the Err branch; modify the loop in test_indexer.rs so that whenever the response
is Ok but !response_result.status().is_success() you decrement retries and await
a backoff (e.g., tokio::time::sleep(Duration::from_secs(5))). Concretely, inside
the while loop after checking if let Ok(response_result) = response_result, add
an else branch for the non-success status that logs the status, sleeps, and
decrements retries (mirroring the current Err behavior) so both network errors
and non-2xx responses progress toward loop termination.
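The termination logic can be sketched synchronously; the real loop would await `tokio::time::sleep(Duration::from_secs(5))` between attempts:

```rust
// Retries are consumed on network errors AND non-2xx statuses,
// so the loop always terminates.
fn retry_until_success<F>(mut attempt: F, mut retries: u32) -> bool
where
    F: FnMut() -> Result<u16, String>,
{
    while retries > 0 {
        match attempt() {
            Ok(status) if (200..300).contains(&status) => return true,
            Ok(status) => {
                eprintln!("non-success status {status}, retrying");
                retries -= 1;
            }
            Err(e) => {
                eprintln!("request failed: {e}, retrying");
                retries -= 1;
            }
        }
        // Real code: tokio::time::sleep(Duration::from_secs(5)).await;
    }
    false
}
```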

---

Duplicate comments:
In `@prover/client/src/proof_types/batch_update/proof_inputs.rs`:
- Around line 115-117: The current public_inputs method only returns the first
element (self.0[0]) and will panic on an empty vector; change it to iterate over
all BatchUpdateCircuitInputs in self.0, call public_inputs_arr() on each, and
collect the results into a Vec<[u8; 32]> (propagating any error returned by
public_inputs_arr) instead of indexing [0]; also handle the empty-case by
returning an empty Vec rather than panicking. Use the public_inputs function,
the BatchUpdateCircuitInputs entries (self.0) and the public_inputs_arr() method
as the points to modify.

In `@prover/client/src/proof.rs`:
- Around line 167-170: The function negate_g1 currently deserializes prover
bytes using G1::deserialize_with_mode(..., Validate::No) which allows off-curve
or wrong-subgroup points; change the deserialization call in negate_g1 to use a
validating mode (e.g., Validate::Yes) so the G1 point is validated during
deserialization before any negation, and propagate the validation error via
ProverClientError::InvalidProofData as already done.
- Around line 72-89: The function deserialize_hex_string_to_be_bytes currently
uses trim_start_matches("0x"), which removes every leading "0x" and lets
malformed inputs like "0x0x01" parse; change the logic to strip at most one "0x"
prefix by replacing trim_start_matches with a single-strip approach (e.g. use
strip_prefix("0x").unwrap_or(hex_str) or equivalent) when computing trimmed_str
in deserialize_hex_string_to_be_bytes so only the first "0x" is removed and
inputs with repeated prefixes fail to parse.

In `@prover/client/src/prover.rs`:
- Around line 57-65: spawn_prover currently waits for the child via
command.wait_with_output() but doesn't verify the child's exit status, and
health_check treats any Ok(Response) as success; update spawn_prover to inspect
the output.status.success() from command.wait_with_output() and return an error
(ProverClientError::ProcessExit or similar) if the process failed, and update
health_check to examine the HTTP response status (use
response.status().is_success()) rather than just Ok/Err so non-2xx responses are
considered failures; key symbols to modify: command.wait_with_output(),
spawn_prover(), and health_check() (check Response handling to require
status.is_success()) ensuring callers only return Ok(()) when both the process
exit is successful and health_check reports a 2xx status.
- Around line 1-5: The startup path is using blocking std APIs
(std::process::Command and thread::sleep) inside async code which can block
Tokio workers; replace uses of std::process::Command with
tokio::process::Command and call .output().await (or spawn and await) instead of
blocking output(), and replace thread::sleep/Duration with
tokio::time::sleep(Duration::from_secs(...)). Update any loops that poll with
AtomicBool to use async-friendly waits (await tokio::time::sleep) so the
120-second retry loop yields to the runtime; ensure you import
tokio::process::Command and tokio::time::sleep and change any synchronous calls
at the sites referencing Command and sleep (around the reported locations) to
their async equivalents and .await them.

In `@sdk-libs/client/src/local_test_validator.rs`:
- Around line 69-105: The code builds a single shell command string in the
variable path using config flags (enable_indexer, limit_ledger_size,
sbf_programs, upgradeable_programs, enable_prover, use_surfpool, validator_args)
and later executes it with sh -c and uses blocking child.wait() inside an async
context; change this to construct a Vec<String>/Vec<&str> of argv entries
(program path as the command and each flag/value as separate args using the same
config fields and sbf_program/program_id/program_path/upgrade_authority values),
spawn the process with tokio::process::Command
(Command::new(program_path).args(&args).spawn()?), and replace the
blocking child.wait()? usage with the async child.wait().await
to avoid shell injection and blocking the Tokio reactor. Ensure each sbf_program
and upgradeable_program adds separate args (no shell interpolation) and pass
validator_args elements as individual args.

In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 2508-2514: The code currently uses unwrap() on
response_result.text().await and deserialize_gnark_proof_json(&body) which can
panic on malformed or error payloads; replace these unwraps with proper error
propagation by mapping failures into IndexerError (e.g., use .await.map_err(|e|
IndexerError::CustomError(e.to_string())) for response_result.text(), and change
deserialize_gnark_proof_json(&body).unwrap() to
deserialize_gnark_proof_json(&body).map_err(|e|
IndexerError::CustomError(e.to_string())) so that the subsequent
proof_from_json_struct and compress_proof error handling remains consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0e88447c-7e72-4ee8-aac1-79ba2ed62bd2

📥 Commits

Reviewing files that changed from the base of the PR and between 1012099 and 713ac36.

⛔ Files ignored due to path filters (61)
  • Cargo.lock is excluded by !**/*.lock and included by none
  • forester-utils/src/account_zero_copy.rs is excluded by none and included by none
  • forester-utils/src/address_merkle_tree_config.rs is excluded by none and included by none
  • forester-utils/src/forester_epoch.rs is excluded by none and included by none
  • forester-utils/src/instructions/compress_and_close_mint.rs is excluded by none and included by none
  • forester-utils/src/rate_limiter.rs is excluded by none and included by none
  • forester-utils/src/registry.rs is excluded by none and included by none
  • program-tests/account-compression-test/tests/address_merkle_tree_tests.rs is excluded by none and included by none
  • program-tests/account-compression-test/tests/merkle_tree_tests.rs is excluded by none and included by none
  • program-tests/batched-merkle-tree-test/tests/merkle_tree.rs is excluded by none and included by none
  • program-tests/compressed-token-test/tests/freeze/functional.rs is excluded by none and included by none
  • program-tests/compressed-token-test/tests/v1.rs is excluded by none and included by none
  • program-tests/system-cpi-v2-test/tests/event.rs is excluded by none and included by none
  • program-tests/system-cpi-v2-test/tests/invoke_cpi_with_read_only.rs is excluded by none and included by none
  • program-tests/utils/src/account_zero_copy.rs is excluded by none and included by none
  • program-tests/utils/src/actions/legacy/instructions/transfer2.rs is excluded by none and included by none
  • program-tests/utils/src/address_tree_rollover.rs is excluded by none and included by none
  • program-tests/utils/src/assert_compressed_tx.rs is excluded by none and included by none
  • program-tests/utils/src/assert_merkle_tree.rs is excluded by none and included by none
  • program-tests/utils/src/assert_queue.rs is excluded by none and included by none
  • program-tests/utils/src/batched_address_tree.rs is excluded by none and included by none
  • program-tests/utils/src/e2e_test_env.rs is excluded by none and included by none
  • program-tests/utils/src/lib.rs is excluded by none and included by none
  • program-tests/utils/src/mock_batched_forester.rs is excluded by none and included by none
  • program-tests/utils/src/state_tree_rollover.rs is excluded by none and included by none
  • program-tests/utils/src/test_batch_forester.rs is excluded by none and included by none
  • program-tests/utils/src/test_forester.rs is excluded by none and included by none
  • sdk-tests/anchor-manual-test/tests/shared.rs is excluded by none and included by none
  • sdk-tests/anchor-semi-manual-test/tests/shared/mod.rs is excluded by none and included by none
  • sdk-tests/anchor-semi-manual-test/tests/stress_test.rs is excluded by none and included by none
  • sdk-tests/client-test/tests/light_client.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/amm_stress_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/amm_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/basic_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/d10_ata_idempotent_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/d10_token_accounts_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/d11_zero_copy_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/failing_tests.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/integration_tests.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/shared.rs is excluded by none and included by none
  • sdk-tests/pinocchio-light-program-test/tests/shared/mod.rs is excluded by none and included by none
  • sdk-tests/pinocchio-light-program-test/tests/stress_test.rs is excluded by none and included by none
  • sdk-tests/pinocchio-manual-test/tests/shared.rs is excluded by none and included by none
  • sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/read_only.rs is excluded by none and included by none
  • sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-native-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-pinocchio-v1-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-pinocchio-v2-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/ctoken_pda.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/decompress_full_cpi.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/pda_ctoken.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test_4_invocations.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test_4_transfer2.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test_deposit.rs is excluded by none and included by none
  • sdk-tests/sdk-v1-native-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-account-loader-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-ata-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-mint-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-pda-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-token-test/tests/test.rs is excluded by none and included by none
📒 Files selected for processing (53)
  • forester/src/config.rs
  • forester/src/forester_status.rs
  • forester/src/main.rs
  • forester/src/processor/v2/proof_worker.rs
  • forester/src/queue_helpers.rs
  • forester/tests/test_compressible_pda.rs
  • prover/client/src/errors.rs
  • prover/client/src/helpers.rs
  • prover/client/src/proof.rs
  • prover/client/src/proof_client.rs
  • prover/client/src/proof_types/batch_address_append/json.rs
  • prover/client/src/proof_types/batch_append/json.rs
  • prover/client/src/proof_types/batch_append/proof_inputs.rs
  • prover/client/src/proof_types/batch_update/json.rs
  • prover/client/src/proof_types/batch_update/proof_inputs.rs
  • prover/client/src/proof_types/combined/v1/json.rs
  • prover/client/src/proof_types/combined/v2/json.rs
  • prover/client/src/proof_types/combined/v2/proof_inputs.rs
  • prover/client/src/proof_types/inclusion/v1/json.rs
  • prover/client/src/proof_types/inclusion/v1/proof_inputs.rs
  • prover/client/src/proof_types/inclusion/v2/json.rs
  • prover/client/src/proof_types/inclusion/v2/proof_inputs.rs
  • prover/client/src/proof_types/non_inclusion/v1/json.rs
  • prover/client/src/proof_types/non_inclusion/v1/proof_inputs.rs
  • prover/client/src/proof_types/non_inclusion/v2/json.rs
  • prover/client/src/proof_types/non_inclusion/v2/proof_inputs.rs
  • prover/client/src/prover.rs
  • prover/client/tests/batch_append.rs
  • prover/client/tests/batch_update.rs
  • prover/client/tests/combined.rs
  • prover/client/tests/inclusion.rs
  • prover/client/tests/init_merkle_tree.rs
  • prover/client/tests/non_inclusion.rs
  • sdk-libs/client/src/fee.rs
  • sdk-libs/client/src/indexer/error.rs
  • sdk-libs/client/src/indexer/options.rs
  • sdk-libs/client/src/indexer/photon_indexer.rs
  • sdk-libs/client/src/indexer/types/proof.rs
  • sdk-libs/client/src/indexer/types/queue.rs
  • sdk-libs/client/src/indexer/types/token.rs
  • sdk-libs/client/src/interface/initialize_config.rs
  • sdk-libs/client/src/interface/instructions.rs
  • sdk-libs/client/src/interface/load_accounts.rs
  • sdk-libs/client/src/interface/mod.rs
  • sdk-libs/client/src/interface/pack.rs
  • sdk-libs/client/src/interface/serialize.rs
  • sdk-libs/client/src/local_test_validator.rs
  • sdk-libs/client/src/rpc/client.rs
  • sdk-libs/client/src/utils.rs
  • sdk-libs/macros/src/light_pdas/seeds/extract.rs
  • sdk-libs/program-test/src/indexer/test_indexer.rs
  • sdk-libs/program-test/src/program_test/compressible_setup.rs
  • sdk-libs/program-test/src/program_test/light_program_test.rs

Comment on lines +180 to +186
let merkle_proof_array = merkle_proof.as_slice().try_into().map_err(|_| {
    ProverClientError::InvalidProofData(format!(
        "invalid merkle proof length: expected {}, got {}",
        HEIGHT,
        merkle_proof.len()
    ))
})?;

⚠️ Potential issue | 🟠 Major

Validate full batch vector lengths before iterating.

At Line 135, zip truncates to the shortest input. If old_leaves, leaves, or merkle_proofs is shorter than batch_size, this function silently builds a partial batch while still emitting the original batch_size. Please hard-fail on length mismatch before the loop.

🛠️ Suggested guard
 pub fn get_batch_append_inputs<const HEIGHT: usize>(
@@
 ) -> Result<(BatchAppendsCircuitInputs, Vec<ChangelogEntry<HEIGHT>>), ProverClientError> {
+    let expected_len = batch_size as usize;
+    if old_leaves.len() != expected_len {
+        return Err(ProverClientError::InvalidProofData(format!(
+            "old_leaves length mismatch: expected {}, got {}",
+            expected_len,
+            old_leaves.len()
+        )));
+    }
+    if leaves.len() != expected_len {
+        return Err(ProverClientError::InvalidProofData(format!(
+            "leaves length mismatch: expected {}, got {}",
+            expected_len,
+            leaves.len()
+        )));
+    }
+    if merkle_proofs.len() != expected_len {
+        return Err(ProverClientError::InvalidProofData(format!(
+            "merkle_proofs length mismatch: expected {}, got {}",
+            expected_len,
+            merkle_proofs.len()
+        )));
+    }
+
     let mut new_root = [0u8; 32];

Based on learnings: Prover server (prover/server/) must generate ZK proofs for batch operations (append, nullify, address append).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof_types/batch_append/proof_inputs.rs` around lines 180
- 186, Before iterating with zip, validate that old_leaves.len(), leaves.len(),
and merkle_proofs.len() all equal batch_size and return a hard error if any
mismatch; specifically add a guard near the start of the function that checks
these lengths and returns ProverClientError::InvalidProofData (or another
appropriate error) when they differ so you never silently build a partial batch
(this covers the loop that consumes old_leaves, leaves, and merkle_proofs and
the subsequent merkle_proof.as_slice().try_into() conversion).

Comment on lines +159 to 210
fn reconstruct_proof_with_lookup<const HEIGHT: usize>(
    &self,
    address_idx: usize,
    node_lookup: &HashMap<u64, usize>,
) -> Result<[[u8; 32]; HEIGHT], IndexerError> {
    self.validate_proof_height::<HEIGHT>()?;
    let leaf_index = *self.low_element_indices.get(address_idx).ok_or_else(|| {
        IndexerError::MissingResult {
            context: "reconstruct_proof".to_string(),
            message: format!(
                "address_idx {} out of bounds for low_element_indices (len {})",
                address_idx,
                self.low_element_indices.len(),
            ),
        }
    })?;
    let mut proof = [[0u8; 32]; HEIGHT];
    let mut pos = leaf_index;

    for (level, proof_element) in proof.iter_mut().enumerate() {
        let sibling_pos = if pos.is_multiple_of(2) {
            pos + 1
        } else {
            pos - 1
        };
        let sibling_idx = Self::encode_node_index(level, sibling_pos);
        let hash_idx = node_lookup.get(&sibling_idx).copied().ok_or_else(|| {
            IndexerError::MissingResult {
                context: "reconstruct_proof".to_string(),
                message: format!(
                    "Missing proof node at level {} position {} (encoded: {})",
                    level, sibling_pos, sibling_idx
                ),
            }
        })?;
        let hash = self
            .node_hashes
            .get(hash_idx)
            .ok_or_else(|| IndexerError::MissingResult {
                context: "reconstruct_proof".to_string(),
                message: format!(
                    "node_hashes index {} out of bounds (len {})",
                    hash_idx,
                    self.node_hashes.len(),
                ),
            })?;
        *proof_element = *hash;
        pos /= 2;
    }

    Ok(proof)
}
🧹 Nitpick | 🔵 Trivial

Minor: redundant height validation in lookup path.

reconstruct_proof_with_lookup calls validate_proof_height::<HEIGHT>() at line 164, but its only caller reconstruct_proofs already validates at line 132. Consider removing the internal check to avoid O(n) redundant validations when reconstructing many proofs:

     fn reconstruct_proof_with_lookup<const HEIGHT: usize>(
         &self,
         address_idx: usize,
         node_lookup: &HashMap<u64, usize>,
     ) -> Result<[[u8; 32]; HEIGHT], IndexerError> {
-        self.validate_proof_height::<HEIGHT>()?;
         let leaf_index = *self.low_element_indices.get(address_idx).ok_or_else(|| {

That said, keeping it is acceptable as defensive programming if this method might be called directly in the future.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/indexer/types/queue.rs` around lines 159 - 210, The
method reconstruct_proof_with_lookup currently redundantly calls
validate_proof_height::<HEIGHT>() even though its only caller,
reconstruct_proofs, already performs that check; remove the
validate_proof_height::<HEIGHT>() invocation and its error propagation from
reconstruct_proof_with_lookup so reconstruct_proofs remains the single
validation point (leave reconstruct_proofs and validate_proof_height unchanged),
ensuring reconstruct_proof_with_lookup assumes the caller has validated the
height (or add a brief comment noting that responsibility).

     address_space: Vec<Pubkey>,
     config_bump: Option<u8>,
-) -> Instruction {
+) -> std::io::Result<Instruction> {
🧹 Nitpick | 🔵 Trivial

Unify builder error types across this module.

Line 104 and Line 154 now return std::io::Result<Instruction>, while other builders still use Result<Instruction, Box<dyn std::error::Error>>. Consider a single module-level error type to avoid split caller handling and lost context.

Also applies to: 154-154

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/interface/instructions.rs` at line 104, Unify the builder
return type by introducing a single module-level alias or error enum (e.g., type
BuilderResult = Result<Instruction, Box<dyn std::error::Error + Send + Sync>> or
a custom InstructionBuildError) and replace the mixed signatures (the functions
currently returning std::io::Result<Instruction> and those returning
Result<Instruction, Box<dyn std::error::Error>>) with that alias; update all
builder function signatures and their internal error returns to use
BuilderResult and adjust imports/usages accordingly so callers only handle one
consistent error type for Instruction construction.

-        .collect();
+        .collect::<Result<Vec<_>, _>>()?;

let program_id = specs.first().map(|s| s.program_id()).unwrap_or_default();
🧹 Nitpick | 🔵 Trivial

Redundant program_id computation.

This is identical to line 311. You can remove this line and reuse the existing program_id binding.

🧹 Proposed fix
-    let program_id = specs.first().map(|s| s.program_id()).unwrap_or_default();
-
     instructions::create_decompress_accounts_idempotent_instruction(
         &program_id,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/interface/load_accounts.rs` at line 334, Remove the
redundant declaration that recomputes program_id (the line using
specs.first().map(|s| s.program_id()).unwrap_or_default()); instead reuse the
existing program_id binding already declared earlier in the same scope; delete
this duplicate statement so there is only a single program_id binding referenced
by the surrounding code (e.g., the later uses in load_accounts-related logic).

@sergeytimoshin sergeytimoshin force-pushed the sergey/runtime-safety-hardening branch from 713ac36 to 8ed76e5 Compare March 15, 2026 00:37
@sergeytimoshin sergeytimoshin force-pushed the sergey/runtime-safety-hardening branch from 8ed76e5 to f9a788c Compare March 15, 2026 15:04
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
sdk-libs/client/src/interface/load_accounts.rs (1)

336-359: ⚠️ Potential issue | 🟠 Major

Validate proof/account cardinality before instruction build.

build_pda_load trusts external proof payload shape. Add an explicit check that proof.accounts.len() matches specs.len() before constructing the decompress instruction; otherwise malformed/partial upstream responses fail late and opaquely.

Suggested guard
 fn build_pda_load<V>(
     specs: &[&PdaSpec<V>],
     proof: ValidityProofWithContext,
     fee_payer: Pubkey,
     compression_config: Pubkey,
 ) -> Result<Instruction, LoadAccountsError>
 where
     V: Pack<solana_instruction::AccountMeta> + Clone + std::fmt::Debug,
 {
+    if proof.accounts.len() != specs.len() {
+        return Err(LoadAccountsError::BuildInstruction(format!(
+            "proof account count mismatch: expected {}, got {}",
+            specs.len(),
+            proof.accounts.len()
+        )));
+    }
+
     let has_tokens = specs.iter().any(|s| {
         s.compressed()
             .map(|c| c.owner == LIGHT_TOKEN_PROGRAM_ID)
             .unwrap_or(false)
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/interface/load_accounts.rs` around lines 336 - 359,
Before calling instructions::create_decompress_accounts_idempotent_instruction,
validate that proof.accounts.len() equals specs.len(); if not, return a
LoadAccountsError (e.g., add and use a variant like
LoadAccountsError::InvalidProofCardinality { expected: specs.len(), found:
proof.accounts.len() }) so malformed/partial proofs fail fast in build_pda_load
instead of producing opaque errors later. Ensure this check sits after computing
specs/cold_accounts/hot_addresses and before invoking
create_decompress_accounts_idempotent_instruction, updating LoadAccountsError if
necessary.
prover/client/src/proof_types/batch_update/proof_inputs.rs (1)

139-178: ⚠️ Potential issue | 🔴 Critical

Validate all parallel batch inputs before entering this loop.

Line 139 truncates to the shortest iterator, Line 176 can still panic on tx_hashes[i], and Line 171 still hard-panics on a malformed proof depth. Reuse BatchUpdateCircuitInputs::new() here, or mirror its guards up front, so bad indexer data returns Err instead of partially building a batch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof_types/batch_update/proof_inputs.rs` around lines 139
- 178, The loop relies on multiple parallel iterators and contains
unwraps/panics (merkle_proof.try_into().unwrap(), indexing tx_hashes[i], and
relying on proof depth) so first validate all batch inputs like
BatchUpdateCircuitInputs::new() would: check that leaves.len() ==
merkle_proofs.len() == path_indices.len() == tx_hashes.len(), verify each
merkle_proof can convert to the expected array length (avoid try_into().unwrap()
by attempting the conversion up front and returning a ProverClientError on
failure), and validate each path index is within valid bounds before entering
the for loop; replace panicking conversions/indexing (try_into().unwrap(),
tx_hashes[i], and any implicit assumptions used by
compute_root_from_merkle_proof/update_proof) with fallible checks that return
Err with clear messages so the function fails fast on bad indexer data instead
of partially building a batch.
sdk-libs/client/src/indexer/photon_indexer.rs (1)

1657-1695: ⚠️ Potential issue | 🟠 Major

Map tree_next_insertion_index from the actual queue insertion point, not the pagination offset.

Line 1694 assigns aq.start_index to tree_next_insertion_index. However, start_index is documented as a pagination offset for the returned queue slice, while tree_next_insertion_index represents the sparse tree's next insertion point used to initialize staging trees. These are not equivalent when pagination is applied—using the offset will cause incorrect proof context initialization.

The API response provides queue_indices (array of u64) which is not currently being used. If this represents insertion indices, derive the next insertion point from it (e.g., the maximum or last element). If the API does not provide an actual tree next-index field, make tree_next_insertion_index optional rather than injecting a potentially incorrect pagination offset.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/indexer/photon_indexer.rs` around lines 1657 - 1695, The
code incorrectly sets AddressQueueData.tree_next_insertion_index =
aq.start_index (pagination offset); instead, derive tree_next_insertion_index
from the API's queue_indices (e.g., take the max or last element of
aq.queue_indices) and assign that value, or if the API does not provide
meaningful insertion indices, change AddressQueueData.tree_next_insertion_index
to Option<u64> and set it to None when queue_indices is absent/empty; update the
mapping inside the address_queue construction (referencing address_queue,
tree_next_insertion_index, start_index, and aq.queue_indices) to implement this
logic.
♻️ Duplicate comments (4)
prover/client/src/prover.rs (1)

29-116: ⚠️ Potential issue | 🟠 Major

These probes still block the Tokio runtime.

health_check() is async, but Lines 43-63 do blocking TcpStream I/O and Lines 88-105 run Command::output() synchronously on every retry. Under load this can pin a worker thread for the full timeout window. Move the probe to Tokio/reqwest or offload the blocking parts with spawn_blocking.

Also applies to: 194-205

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/prover.rs` around lines 29 - 116, health_check_once,
health_check_once_with_curl and prover_listener_present perform blocking
TcpStream I/O and synchronous Command::output() which can block the Tokio
runtime; change these probes to async by using Tokio/async APIs
(tokio::net::TcpStream and reqwest for HTTP) or wrap the blocking work in
tokio::task::spawn_blocking so they run off the async runtime, and do the same
fix for the duplicated probe implementation elsewhere; update function
signatures and call sites accordingly (health_check_once,
health_check_once_with_curl, prover_listener_present) so the health check no
longer blocks reactor threads.
prover/client/src/proof_types/batch_append/proof_inputs.rs (1)

135-194: ⚠️ Potential issue | 🟠 Major

Don't let this loop silently shrink the batch.

Line 135 still zips to the shortest of old_leaves, leaves, and merkle_proofs, while batch_size and the emitted vectors keep the original size, and Line 180 still panics on bad proof depth. Add the same upfront guards here that BatchAppendsCircuitInputs::new() already has.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof_types/batch_append/proof_inputs.rs` around lines 135
- 194, The loop zips old_leaves, leaves and merkle_proofs which silently
truncates the batch and can later panic on bad proof depth; before the for over
(i, (old_leaf, (new_leaf, mut merkle_proof))) validate that old_leaves.len(),
leaves.len(), and merkle_proofs.len() all equal batch_size and that each
merkle_proof has expected HEIGHT (or correct depth), returning a
ProverClientError if not; then iterate by index 0..batch_size (using start_index
+ i) and access old_leaves[i], leaves[i], and merkle_proofs[i] (or otherwise
convert/validate merkle_proof into fixed-size array with a clear error instead
of unwrap) so the batch is not silently shrunk and bad proofs produce explicit
errors (referencing the loop tuple and compute_root_from_merkle_proof).
prover/client/src/proof_types/batch_address_append/proof_inputs.rs (1)

386-506: ⚠️ Potential issue | 🔴 Critical

Keep caller-owned state transactional across the batch.

After Line 386, Line 402, and Line 415, the function has already mutated changelog, indexed_changelog, and sparse_merkle_tree. Any later ? or explicit Err returns with those structures half-applied, so a retry starts from a corrupted snapshot. Stage the batch in temporary state and commit only after the whole loop succeeds, or add explicit rollback.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof_types/batch_address_append/proof_inputs.rs` around
lines 386 - 506, The function mutates caller-owned state (changelog,
indexed_changelog, sparse_merkle_tree and pushes via
patcher.push_changelog_entry/append) mid-loop which leaves them half-applied on
early returns; change to stage all changes in local temporaries (e.g., local
vectors temp_changelog_entries, temp_indexed_changelog_entries,
temp_new_element_circuit_merkle_proofs and a cloned sparse_merkle_tree or a list
of merkle updates) while iterating (use SparseMerkleTree::clone() or collect
merkle_proof_array results without mutating the original), and only call
patcher.push_changelog_entry::<HEIGHT>, append into sparse_merkle_tree, and
indexed_changelog.push after the whole batch loop succeeds (or implement
explicit rollback by reverting patcher and sparse_merkle_tree if you prefer);
reference the code paths around push_changelog_entry::<HEIGHT>,
indexed_changelog.push, sparse_merkle_tree.append, and merkle_proof_array usage
to locate where to stop direct mutations and instead collect pending changes.
sdk-libs/client/src/indexer/types/queue.rs (1)

179-185: 🧹 Nitpick | 🔵 Trivial

Inner height validation is still redundant in the lookup path.

reconstruct_proofs already validates HEIGHT before iterating; re-validating in reconstruct_proof_with_lookup adds avoidable per-proof overhead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/indexer/types/queue.rs` around lines 179 - 185, The call
to self.validate_proof_height::<HEIGHT>() inside reconstruct_proof_with_lookup
is redundant because reconstruct_proofs already validates HEIGHT before calling
it; remove that per-proof validation from reconstruct_proof_with_lookup (delete
the self.validate_proof_height::<HEIGHT>()? line) and rely on
reconstruct_proofs' validation to avoid unnecessary per-proof overhead while
keeping other logic (leaf_index lookup, node_lookup usage, and returning
Result<[[u8; 32]; HEIGHT], IndexerError>) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@forester/src/processor/v2/processor.rs`:
- Around line 134-137: Create a single helper method (e.g., ensure_worker_pool)
on the processor that encapsulates the fallible initialization currently
duplicated: check self.worker_pool.is_none(), call
spawn_proof_workers(&self.context.prover_config)?, and set self.worker_pool =
Some(WorkerPool { job_tx }); then replace the three duplicated blocks with calls
to ensure_worker_pool(). Reference the existing symbols spawn_proof_workers,
WorkerPool, self.worker_pool, and self.context.prover_config so the helper
performs the same behavior and returns the same error propagation as the
original code.

In `@forester/src/processor/v2/strategy/address.rs`:
- Around line 294-308: The code currently passes addresses.len() into
AddressStagingTree::process_batch which can desync next_index and request
unsupported ZKP circuits; validate that addresses.len() equals the deterministic
zkp_batch_size_usize obtained from get_batch_snapshot (and that
zkp_batch_size_usize is one of the supported sizes 10 or 250), then pass
zkp_batch_size_usize into staging_tree.process_batch instead of addresses.len();
reference the symbols staging_tree.process_batch, addresses,
zkp_batch_size_usize, and AddressStagingTree::next_index when making this
change.

In `@prover/client/src/prover.rs`:
- Around line 29-31: The prover_listener_present() check in
health_check_once(Duration) should only log/diagnose a bound port and must not
short-circuit success; change health_check_once so that
prover_listener_present() is used for diagnostics (log a warning) but the
function still performs the actual /health HTTP probe and only returns true when
the probe succeeds (do not return true immediately from
prover_listener_present()); make the same change in the other health probe block
referenced (the similar logic around spawn_prover() / the 81-98 region) so that
listener presence never bypasses the real health check.
- Around line 20-26: build_http_client currently forces .no_proxy() and omits
per-request timeouts; change it to accept a parameter indicating the target
endpoint (e.g., server_address: &str or is_loopback: bool) or a timeout value so
it can conditionally enable .no_proxy() only for loopback addresses and set a
request timeout via .timeout(...). Update the function signature
(build_http_client) and all callers (e.g., with_config / ProofClient
constructors) to pass the endpoint or a boolean; in the builder call use
conditional logic to include .no_proxy() only when the endpoint is loopback and
add .timeout(Duration::from_secs(...)) (make the timeout configurable or derive
from max_wait_time) and keep the existing error mapping to
ProverClientError::GenericError on build failures.
- Around line 146-188: The IS_LOADING flag can remain true if the async startup
block panics (e.g., at Command::new(prover_path).spawn().expect(...) or the
panic! branches); wrap the startup future so you always clear IS_LOADING on both
success and panic: run the existing async block (the code that builds Command,
spawns child, polls health_check, and sets child_exit_status) inside a
panic-capture mechanism (e.g., futures::FutureExt::catch_unwind or
std::panic::catch_unwind with AssertUnwindSafe around the future), await it into
a Result, then call IS_LOADING.store(false, Ordering::Release) in a finally-like
path before returning; if the panic was caught, re-panic (or convert to an Err)
after clearing IS_LOADING so behavior is unchanged but the flag is never left
set; reference symbols: IS_LOADING, spawn_result, Command::new(prover_path),
health_check.

In `@sdk-libs/client/src/indexer/types/queue.rs`:
- Around line 136-147: The code currently uses proof_count() which returns
min(addresses.len(), low_element_indices.len()), allowing silently
truncated/misaligned payloads; change the logic to fail fast by validating that
addresses.len() == low_element_indices.len() and return an
IndexerError::InvalidParameters (with a clear message referencing the two
lengths) when they differ instead of computing min. Update the implementation of
proof_count (or the callers that rely on it) to perform this equality check and
adjust the checks around address_range (the block using available) and the other
similar check at the later location (lines around 175-177) to use the new
validation so malformed vectors are rejected rather than truncated.

---

Outside diff comments:
In `@prover/client/src/proof_types/batch_update/proof_inputs.rs`:
- Around line 139-178: The loop relies on multiple parallel iterators and
contains unwraps/panics (merkle_proof.try_into().unwrap(), indexing
tx_hashes[i], and relying on proof depth) so first validate all batch inputs
like BatchUpdateCircuitInputs::new() would: check that leaves.len() ==
merkle_proofs.len() == path_indices.len() == tx_hashes.len(), verify each
merkle_proof can convert to the expected array length (avoid try_into().unwrap()
by attempting the conversion up front and returning a ProverClientError on
failure), and validate each path index is within valid bounds before entering
the for loop; replace panicking conversions/indexing (try_into().unwrap(),
tx_hashes[i], and any implicit assumptions used by
compute_root_from_merkle_proof/update_proof) with fallible checks that return
Err with clear messages so the function fails fast on bad indexer data instead
of partially building a batch.

In `@sdk-libs/client/src/indexer/photon_indexer.rs`:
- Around line 1657-1695: The code incorrectly sets
AddressQueueData.tree_next_insertion_index = aq.start_index (pagination offset);
instead, derive tree_next_insertion_index from the API's queue_indices (e.g.,
take the max or last element of aq.queue_indices) and assign that value, or if
the API does not provide meaningful insertion indices, change
AddressQueueData.tree_next_insertion_index to Option<u64> and set it to None
when queue_indices is absent/empty; update the mapping inside the address_queue
construction (referencing address_queue, tree_next_insertion_index, start_index,
and aq.queue_indices) to implement this logic.

In `@sdk-libs/client/src/interface/load_accounts.rs`:
- Around line 336-359: Before calling
instructions::create_decompress_accounts_idempotent_instruction, validate that
proof.accounts.len() equals specs.len(); if not, return a LoadAccountsError
(e.g., add and use a variant like LoadAccountsError::InvalidProofCardinality {
expected: specs.len(), found: proof.accounts.len() }) so malformed/partial
proofs fail fast in build_pda_load instead of producing opaque errors later.
Ensure this check sits after computing specs/cold_accounts/hot_addresses and
before invoking create_decompress_accounts_idempotent_instruction, updating
LoadAccountsError if necessary.

---

Duplicate comments:
In `@prover/client/src/proof_types/batch_address_append/proof_inputs.rs`:
- Around line 386-506: The function mutates caller-owned state (changelog,
indexed_changelog, sparse_merkle_tree and pushes via
patcher.push_changelog_entry/append) mid-loop which leaves them half-applied on
early returns; change to stage all changes in local temporaries (e.g., local
vectors temp_changelog_entries, temp_indexed_changelog_entries,
temp_new_element_circuit_merkle_proofs and a cloned sparse_merkle_tree or a list
of merkle updates) while iterating (use SparseMerkleTree::clone() or collect
merkle_proof_array results without mutating the original), and only call
patcher.push_changelog_entry::<HEIGHT>, append into sparse_merkle_tree, and
indexed_changelog.push after the whole batch loop succeeds (or implement
explicit rollback by reverting patcher and sparse_merkle_tree if you prefer);
reference the code paths around push_changelog_entry::<HEIGHT>,
indexed_changelog.push, sparse_merkle_tree.append, and merkle_proof_array usage
to locate where to stop direct mutations and instead collect pending changes.

In `@prover/client/src/proof_types/batch_append/proof_inputs.rs`:
- Around line 135-194: The loop zips old_leaves, leaves and merkle_proofs which
silently truncates the batch and can later panic on bad proof depth; before the
for over (i, (old_leaf, (new_leaf, mut merkle_proof))) validate that
old_leaves.len(), leaves.len(), and merkle_proofs.len() all equal batch_size and
that each merkle_proof has expected HEIGHT (or correct depth), returning a
ProverClientError if not; then iterate by index 0..batch_size (using start_index
+ i) and access old_leaves[i], leaves[i], and merkle_proofs[i] (or otherwise
convert/validate merkle_proof into fixed-size array with a clear error instead
of unwrap) so the batch is not silently shrunk and bad proofs produce explicit
errors (referencing the loop tuple and compute_root_from_merkle_proof).

In `@prover/client/src/prover.rs`:
- Around line 29-116: health_check_once, health_check_once_with_curl and
prover_listener_present perform blocking TcpStream I/O and synchronous
Command::output() which can block the Tokio runtime; change these probes to
async by using Tokio/async APIs (tokio::net::TcpStream and reqwest for HTTP) or
wrap the blocking work in tokio::task::spawn_blocking so they run off the async
runtime, and do the same fix for the duplicated probe implementation elsewhere;
update function signatures and call sites accordingly (health_check_once,
health_check_once_with_curl, prover_listener_present) so the health check no
longer blocks reactor threads.

In `@sdk-libs/client/src/indexer/types/queue.rs`:
- Around line 179-185: The call to self.validate_proof_height::<HEIGHT>() inside
reconstruct_proof_with_lookup is redundant because reconstruct_proofs already
validates HEIGHT before calling it; remove that per-proof validation from
reconstruct_proof_with_lookup (delete the
self.validate_proof_height::<HEIGHT>()? line) and rely on reconstruct_proofs'
validation to avoid unnecessary per-proof overhead while keeping other logic
(leaf_index lookup, node_lookup usage, and returning Result<[[u8; 32]; HEIGHT],
IndexerError>) unchanged.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bf635eca-6c09-4841-9908-b2981799b35d

📥 Commits

Reviewing files that changed from the base of the PR and between 713ac36 and f9a788c.

⛔ Files ignored due to path filters (4)
  • Cargo.lock is excluded by !**/*.lock and included by none
  • program-tests/utils/src/e2e_test_env.rs is excluded by none and included by none
  • program-tests/utils/src/mock_batched_forester.rs is excluded by none and included by none
  • program-tests/utils/src/test_batch_forester.rs is excluded by none and included by none
📒 Files selected for processing (19)
  • forester/src/processor/v2/helpers.rs
  • forester/src/processor/v2/processor.rs
  • forester/src/processor/v2/proof_worker.rs
  • forester/src/processor/v2/strategy/address.rs
  • prover/client/src/constants.rs
  • prover/client/src/helpers.rs
  • prover/client/src/proof_client.rs
  • prover/client/src/proof_types/batch_address_append/proof_inputs.rs
  • prover/client/src/proof_types/batch_append/proof_inputs.rs
  • prover/client/src/proof_types/batch_update/proof_inputs.rs
  • prover/client/src/prover.rs
  • prover/client/tests/batch_address_append.rs
  • sdk-libs/client/src/indexer/photon_indexer.rs
  • sdk-libs/client/src/indexer/types/queue.rs
  • sdk-libs/client/src/interface/load_accounts.rs
  • sdk-libs/program-test/src/indexer/test_indexer.rs
  • sdk-libs/sdk-types/src/interface/program/decompression/pda.rs
  • sdk-libs/sdk-types/src/interface/program/decompression/processor.rs
  • sdk-libs/sdk-types/src/interface/program/decompression/token.rs

Comment on lines 134 to 137
 if self.worker_pool.is_none() {
-    let job_tx = spawn_proof_workers(&self.context.prover_config);
+    let job_tx = spawn_proof_workers(&self.context.prover_config)?;
     self.worker_pool = Some(WorkerPool { job_tx });
 }
🧹 Nitpick | 🔵 Trivial

Extract worker-pool bootstrap into a single helper.

The same fallible initialization block now exists in three paths. A small ensure_worker_pool() would keep future startup logging/retry behavior consistent and avoid one path drifting the next time worker bootstrap changes.

Also applies to: 534-537, 563-566

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/processor/v2/processor.rs` around lines 134 - 137, Create a
single helper method (e.g., ensure_worker_pool) on the processor that
encapsulates the fallible initialization currently duplicated: check
self.worker_pool.is_none(), call
spawn_proof_workers(&self.context.prover_config)?, and set self.worker_pool =
Some(WorkerPool { job_tx }); then replace the three duplicated blocks with calls
to ensure_worker_pool(). Reference the existing symbols spawn_proof_workers,
WorkerPool, self.worker_pool, and self.context.prover_config so the helper
performs the same behavior and returns the same error propagation as the
original code.
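A minimal, self-contained sketch of the suggested helper. ProverConfig, WorkerPool, Context, and the spawn_proof_workers stub below are simplified stand-ins for the real forester types; only the control flow mirrors the duplicated block.

```rust
// Sketch of the suggested ensure_worker_pool() helper.
// All types here are illustrative stubs, not the real forester definitions.
struct ProverConfig;
struct WorkerPool {
    job_tx: String,
}
struct Context {
    prover_config: ProverConfig,
}
struct Processor {
    context: Context,
    worker_pool: Option<WorkerPool>,
}

// Stand-in for the real fallible spawn_proof_workers.
fn spawn_proof_workers(_config: &ProverConfig) -> Result<String, String> {
    Ok("job_tx".to_string())
}

impl Processor {
    // Each startup path calls this instead of repeating the fallible block.
    fn ensure_worker_pool(&mut self) -> Result<(), String> {
        if self.worker_pool.is_none() {
            let job_tx = spawn_proof_workers(&self.context.prover_config)?;
            self.worker_pool = Some(WorkerPool { job_tx });
        }
        Ok(())
    }
}

fn main() {
    let mut processor = Processor {
        context: Context { prover_config: ProverConfig },
        worker_pool: None,
    };
    processor.ensure_worker_pool().expect("worker pool bootstrap failed");
    assert!(processor.worker_pool.is_some());
    println!("worker pool ready");
}
```

Because the helper is idempotent, calling it from all three paths is safe even if two of them run back to back.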

Comment on lines 294 to +308
        let zkp_batch_size_actual = addresses.len();

        if zkp_batch_size_actual == 0 {
            return Err(anyhow!("Empty batch at start={}", start));
        }

        let low_element_values = &batch_data.low_element_values;
        let low_element_next_values = &batch_data.low_element_next_values;
        let low_element_indices = &batch_data.low_element_indices;
        let low_element_next_indices = &batch_data.low_element_next_indices;

        let low_element_proofs: Vec<Vec<[u8; 32]>> = {
            let data = lock_recover(self.streaming_queue.data.as_ref(), "streaming_queue.data");
            (start..start + zkp_batch_size_actual)
                .map(|i| data.reconstruct_proof(i, DEFAULT_BATCH_ADDRESS_TREE_HEIGHT as u8))
                .collect::<Result<Vec<_>, _>>()?
        };

        let hashchain_idx = start / zkp_batch_size_usize;
        let leaves_hashchain = {
            let data = lock_recover(self.streaming_queue.data.as_ref(), "streaming_queue.data");
            get_leaves_hashchain(&data.leaves_hash_chains, hashchain_idx)?
        };
        let result = staging_tree
            .process_batch(
                &addresses,
                &low_element_values,
                &low_element_next_values,
                &low_element_indices,
                &low_element_next_indices,
                &low_element_proofs,
                leaves_hashchain,
                zkp_batch_size_actual,
                epoch,
                tree,
            )
            .map_err(|err| map_address_staging_error(tree, err))?;

🧹 Nitpick | 🔵 Trivial

Pin this path to the configured ZKP batch size.

Once get_batch_snapshot succeeds, the circuit size should be deterministic. Passing addresses.len() into process_batch means a short snapshot would advance AddressStagingTree::next_index by the wrong amount and request an address circuit size the prover does not support. Validate the length once, then pass zkp_batch_size_usize.

♻️ Suggested guard
         let zkp_batch_size_actual = addresses.len();
+        if zkp_batch_size_actual != zkp_batch_size_usize {
+            return Err(anyhow!(
+                "Incomplete address batch: expected {}, got {}",
+                zkp_batch_size_usize,
+                zkp_batch_size_actual
+            ));
+        }
         let result = staging_tree
             .process_batch(
                 &addresses,
                 &low_element_values,
                 &low_element_next_values,
                 &low_element_indices,
                 &low_element_next_indices,
                 &low_element_proofs,
                 leaves_hashchain,
-                zkp_batch_size_actual,
+                zkp_batch_size_usize,
                 epoch,
                 tree,
             )

Based on learnings: ZKP batch sizes for address trees must be 10 or 250 in src/initialize_address_tree.rs. Only these circuit sizes are supported for address trees.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/processor/v2/strategy/address.rs` around lines 294 - 308, The
code currently passes addresses.len() into AddressStagingTree::process_batch
which can desync next_index and request unsupported ZKP circuits; validate that
addresses.len() equals the deterministic zkp_batch_size_usize obtained from
get_batch_snapshot (and that zkp_batch_size_usize is one of the supported sizes
10 or 250), then pass zkp_batch_size_usize into staging_tree.process_batch
instead of addresses.len(); reference the symbols staging_tree.process_batch,
addresses, zkp_batch_size_usize, and AddressStagingTree::next_index when making
this change.
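The suggested guard can be sketched as a standalone check. validate_batch_len and SUPPORTED_ADDRESS_ZKP_BATCH_SIZES are illustrative names, not existing symbols; the 10/250 constraint follows the learning cited above.

```rust
// Illustrative guard: the snapshot length must equal the configured ZKP batch
// size, and that size must be one of the supported address circuit sizes.
const SUPPORTED_ADDRESS_ZKP_BATCH_SIZES: [usize; 2] = [10, 250];

fn validate_batch_len(addresses_len: usize, zkp_batch_size: usize) -> Result<usize, String> {
    if !SUPPORTED_ADDRESS_ZKP_BATCH_SIZES.contains(&zkp_batch_size) {
        return Err(format!("unsupported address ZKP batch size {zkp_batch_size}"));
    }
    if addresses_len != zkp_batch_size {
        return Err(format!(
            "incomplete address batch: expected {zkp_batch_size}, got {addresses_len}"
        ));
    }
    // Pass the configured size (not addresses.len()) on to process_batch so
    // next_index always advances by a full batch.
    Ok(zkp_batch_size)
}

fn main() {
    assert_eq!(validate_batch_len(250, 250), Ok(250));
    assert!(validate_batch_len(7, 250).is_err());
    assert!(validate_batch_len(10, 11).is_err());
    println!("batch-size guards hold");
}
```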

Comment on lines +20 to +26
pub(crate) fn build_http_client() -> Result<reqwest::Client, ProverClientError> {
    reqwest::Client::builder()
        .no_proxy()
        .build()
        .map_err(|error| {
            ProverClientError::GenericError(format!("failed to build HTTP client: {error}"))
        })

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's see the structure of the prover.rs file
wc -l prover/client/src/prover.rs

Repository: Lightprotocol/light-protocol

Length of output: 103


🏁 Script executed:

# Read the relevant section and surrounding context
cat -n prover/client/src/prover.rs | head -60

Repository: Lightprotocol/light-protocol

Length of output: 2370


🏁 Script executed:

# Search for ProofClient struct and its variants
rg -A 10 "impl.*ProofClient" prover/client/src/prover.rs | head -100

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

# Look for usage of build_http_client
rg -B 3 -A 3 "build_http_client" prover/client/src/

Repository: Lightprotocol/light-protocol

Length of output: 2388


🏁 Script executed:

# Search for timeout configuration in the codebase
rg "timeout|max_wait_time" prover/client/src/ -A 2 -B 2

Repository: Lightprotocol/light-protocol

Length of output: 10095


🏁 Script executed:

# Check if there's a CLAUDE.md or similar documentation
fd -t f "CLAUDE.md|claude.md|README.md" prover/ | head -10

Repository: Lightprotocol/light-protocol

Length of output: 95


🏁 Script executed:

# Read proof_client.rs to understand the client variants
head -150 prover/client/src/proof_client.rs

Repository: Lightprotocol/light-protocol

Length of output: 4579


🏁 Script executed:

# See the with_config implementation
rg -A 20 "pub fn with_config" prover/client/src/proof_client.rs

Repository: Lightprotocol/light-protocol

Length of output: 714


🏁 Script executed:

# Check if there are any timeout configurations set elsewhere on the client
rg "timeout|connect" prover/client/src/proof_client.rs

Repository: Lightprotocol/light-protocol

Length of output: 164


🏁 Script executed:

# Look for constants and defaults
rg "DEFAULT_" prover/client/src/ -B 2 -A 2

Repository: Lightprotocol/light-protocol

Length of output: 3579


🏁 Script executed:

# Check if SERVER_ADDRESS is hardcoded to localhost
rg "SERVER_ADDRESS" prover/client/src/constants.rs -A 2 -B 2

Repository: Lightprotocol/light-protocol

Length of output: 210


🏁 Script executed:

# Look for any reqwest timeout configuration patterns elsewhere in the codebase
rg "\.timeout\(" . -t rust --max-count=10

Repository: Lightprotocol/light-protocol

Length of output: 794


🏁 Script executed:

# Check how send_proof_request is implemented to see if timeouts are configured there
rg -A 30 "async fn send_proof_request" prover/client/src/proof_client.rs

Repository: Lightprotocol/light-protocol

Length of output: 1040


Configure explicit timeouts and make proxy handling context-aware for the HTTP client.

Line 22 applies .no_proxy() globally to all ProofClient variants, including with_config() which accepts remote server_address values that might require proxy support. Additionally, the client has no per-request timeout—max_wait_time only bounds the overall polling/retry loop, not individual hung HTTP requests. Add .timeout() to the builder and parameterize proxy behavior based on whether the endpoint is loopback or remote.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/prover.rs` around lines 20 - 26, build_http_client
currently forces .no_proxy() and omits per-request timeouts; change it to accept
a parameter indicating the target endpoint (e.g., server_address: &str or
is_loopback: bool) or a timeout value so it can conditionally enable .no_proxy()
only for loopback addresses and set a request timeout via .timeout(...). Update
the function signature (build_http_client) and all callers (e.g., with_config /
ProofClient constructors) to pass the endpoint or a boolean; in the builder call
use conditional logic to include .no_proxy() only when the endpoint is loopback
and add .timeout(Duration::from_secs(...)) (make the timeout configurable or
derive from max_wait_time) and keep the existing error mapping to
ProverClientError::GenericError on build failures.
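A std-only sketch of the endpoint classification the refactor needs; the helper names are illustrative. Inside build_http_client, these would drive reqwest::Client::builder().timeout(..) plus a conditional .no_proxy() for loopback endpoints only.

```rust
use std::time::Duration;

// Illustrative helpers: decide proxy bypass from the endpoint, and bound each
// HTTP request well below the overall polling budget (max_wait_time) so one
// hung request cannot consume the whole retry window.
fn is_loopback_endpoint(server_address: &str) -> bool {
    server_address.contains("127.0.0.1")
        || server_address.contains("localhost")
        || server_address.contains("[::1]")
}

fn per_request_timeout(max_wait_time: Duration) -> Duration {
    // Assumed 30 s cap; tune to the deployment.
    max_wait_time.min(Duration::from_secs(30))
}

fn main() {
    assert!(is_loopback_endpoint("http://127.0.0.1:3001"));
    assert!(!is_loopback_endpoint("https://prover.example.com"));
    assert_eq!(
        per_request_timeout(Duration::from_secs(120)),
        Duration::from_secs(30)
    );
    assert_eq!(
        per_request_timeout(Duration::from_secs(5)),
        Duration::from_secs(5)
    );
    println!("endpoint checks pass");
}
```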

Comment on lines +146 to +188
        let spawn_result = async {
+            let mut command = Command::new(prover_path);
+            command.arg("start-prover").stdout(Stdio::piped()).stderr(Stdio::piped());
+            let mut child = command.spawn().expect("Failed to start prover process");
+            let mut child_exit_status = None;

-            let command = Command::new(prover_path)
-                .arg("start-prover")
-                .spawn()
-                .expect("Failed to start prover process");
+            for _ in 0..120 {
+                if health_check(1, 1).await {
+                    info!("Prover started successfully");
+                    return;
+                }

-            let _ = command.wait_with_output();
+                if child_exit_status.is_none() {
+                    match child.try_wait() {
+                        Ok(Some(status)) => {
+                            tracing::warn!(
+                                ?status,
+                                "prover launcher exited before health check succeeded; continuing to poll for detached prover"
+                            );
+                            child_exit_status = Some(status);
+                        }
+                        Ok(None) => {}
+                        Err(error) => {
+                            tracing::error!(?error, "failed to poll prover child process");
+                        }
+                    }
+                }

-            let health_result = health_check(120, 1).await;
-            if health_result {
-                info!("Prover started successfully");
-            } else {
-                panic!("Failed to start prover, health check failed.");
+                sleep(Duration::from_secs(1)).await;
            }

+            if let Some(status) = child_exit_status {
+                panic!(
+                    "Failed to start prover, health check failed after launcher exited with status {status}."
+                );
+            }

+            panic!("Failed to start prover, health check failed.");
        }
        .await;

        IS_LOADING.store(false, Ordering::Release);
        spawn_result

⚠️ Potential issue | 🔴 Critical

Release IS_LOADING on panic paths too.

Any panic inside this block—Line 149's spawn().expect(...) or the later panic! branches—unwinds before Line 187, leaving the flag stuck true forever. After one failed startup attempt, every later caller goes down the “already loading” path until the process is restarted.

🔧 Safer pattern
+struct LoadingGuard;
+impl Drop for LoadingGuard {
+    fn drop(&mut self) {
+        IS_LOADING.store(false, Ordering::Release);
+    }
+}
+
         if IS_LOADING
             .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
             .is_err()
         {
             if health_check(120, 1).await {
                 return;
             }
             panic!("Failed to start prover, health check failed.");
         }
 
-        let spawn_result = async {
+        let _loading_guard = LoadingGuard;
+        async {
             let mut command = Command::new(prover_path);
             command.arg("start-prover").stdout(Stdio::piped()).stderr(Stdio::piped());
             let mut child = command.spawn().expect("Failed to start prover process");
             let mut child_exit_status = None;
@@
-        }
-        .await;
-
-        IS_LOADING.store(false, Ordering::Release);
-        spawn_result
+        }
+        .await
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/prover.rs` around lines 146 - 188, The IS_LOADING flag can
remain true if the async startup block panics (e.g., at
Command::new(prover_path).spawn().expect(...) or the panic! branches); wrap the
startup future so you always clear IS_LOADING on both success and panic: run the
existing async block (the code that builds Command, spawns child, polls
health_check, and sets child_exit_status) inside a panic-capture mechanism
(e.g., futures::FutureExt::catch_unwind or std::panic::catch_unwind with
AssertUnwindSafe around the future), await it into a Result, then call
IS_LOADING.store(false, Ordering::Release) in a finally-like path before
returning; if the panic was caught, re-panic (or convert to an Err) after
clearing IS_LOADING so behavior is unchanged but the flag is never left set;
reference symbols: IS_LOADING, spawn_result, Command::new(prover_path),
health_check.
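The RAII pattern suggested above can be demonstrated end to end. start_prover_or_panic below is a stand-in for the real spawn/health-check sequence; the point is that the guard's Drop clears IS_LOADING even when the startup path panics.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

// Sketch of the suggested LoadingGuard: the flag is cleared on Drop, so it
// cannot stay stuck at true across a panic.
static IS_LOADING: AtomicBool = AtomicBool::new(false);

struct LoadingGuard;
impl Drop for LoadingGuard {
    fn drop(&mut self) {
        IS_LOADING.store(false, Ordering::Release);
    }
}

// Stand-in for the real spawn + health-check sequence.
fn start_prover_or_panic(fail: bool) {
    IS_LOADING
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .expect("prover startup already in progress");
    let _guard = LoadingGuard;
    if fail {
        // Mirrors the panic! branches in the original block.
        panic!("Failed to start prover, health check failed.");
    }
}

fn main() {
    let result = panic::catch_unwind(|| start_prover_or_panic(true));
    assert!(result.is_err());
    // The guard dropped during unwind, so a retry can acquire the flag again.
    assert!(!IS_LOADING.load(Ordering::Acquire));
    println!("flag cleared after panic");
}
```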

Comment on lines +136 to +147
        let available = self.proof_count();
        if address_range.start > address_range.end {
            return Err(IndexerError::InvalidParameters(format!(
                "invalid address proof range {}..{}",
                address_range.start, address_range.end
            )));
        }
        if address_range.end > available {
            return Err(IndexerError::InvalidParameters(format!(
                "address proof range {}..{} exceeds available proofs {}",
                address_range.start, address_range.end, available
            )));

⚠️ Potential issue | 🟠 Major

Fail fast on inconsistent queue vector lengths instead of truncating with min.

proof_count() currently returns min(addresses.len(), low_element_indices.len()), which can silently accept malformed/misaligned payloads and produce partial proofs. For runtime safety, this should be a hard error.

Proposed fix
-    fn proof_count(&self) -> usize {
-        self.addresses.len().min(self.low_element_indices.len())
-    }
+    fn proof_count(&self) -> Result<usize, IndexerError> {
+        if self.addresses.len() != self.low_element_indices.len() {
+            return Err(IndexerError::InvalidParameters(format!(
+                "inconsistent address queue data: addresses={} low_element_indices={}",
+                self.addresses.len(),
+                self.low_element_indices.len()
+            )));
+        }
+        Ok(self.addresses.len())
+    }
-        let available = self.proof_count();
+        let available = self.proof_count()?;

Also applies to: 175-177

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/indexer/types/queue.rs` around lines 136 - 147, The code
currently uses proof_count() which returns min(addresses.len(),
low_element_indices.len()), allowing silently truncated/misaligned payloads;
change the logic to fail fast by validating that addresses.len() ==
low_element_indices.len() and return an IndexerError::InvalidParameters (with a
clear message referencing the two lengths) when they differ instead of computing
min. Update the implementation of proof_count (or the callers that rely on it)
to perform this equality check and adjust the checks around address_range (the
block using available) and the other similar check at the later location (lines
around 175-177) to use the new validation so malformed vectors are rejected
rather than truncated.
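The fail-fast version of proof_count can be sketched as a free function; the String error below stands in for IndexerError::InvalidParameters, and the slice parameters stand in for the struct fields.

```rust
// Illustrative fail-fast proof_count: mismatched queue vectors become a hard
// error instead of being silently truncated via min().
fn proof_count(addresses: &[[u8; 32]], low_element_indices: &[u64]) -> Result<usize, String> {
    if addresses.len() != low_element_indices.len() {
        return Err(format!(
            "inconsistent address queue data: addresses={} low_element_indices={}",
            addresses.len(),
            low_element_indices.len()
        ));
    }
    Ok(addresses.len())
}

fn main() {
    // Aligned vectors: the full count is available.
    assert_eq!(proof_count(&[[0u8; 32]; 2], &[0, 1]), Ok(2));
    // Misaligned vectors are rejected rather than truncated to min().
    assert!(proof_count(&[[0u8; 32]; 2], &[0]).is_err());
    println!("length checks pass");
}
```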


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
sdk-libs/sdk-types/src/interface/account/token_seeds.rs (1)

265-283: ⚠️ Potential issue | 🟠 Major

Do not silently ignore output_queue_index in context construction.

Line 268 keeps the parameter but ignores it, while Line 280 uses only tree_info.queue_pubkey_index. That can mask caller-side queue/index mismatches and build proof input with unintended queue context. Either remove the parameter from the trait surface, or validate consistency and fail fast.

Suggested defensive fix (non-breaking)
 fn into_in_token_data(
     &self,
     tree_info: &PackedStateTreeInfo,
-    _output_queue_index: u8,
+    output_queue_index: u8,
 ) -> Result<MultiInputTokenDataWithContext, LightSdkTypesError> {
+    if output_queue_index != tree_info.queue_pubkey_index {
+        return Err(LightSdkTypesError::InvalidInstructionData);
+    }
+
     Ok(MultiInputTokenDataWithContext {
         amount: self.token_data.amount,
         mint: self.token_data.mint,
         owner: self.token_data.owner,
         version: self.token_data.version,
         has_delegate: self.token_data.has_delegate,
         delegate: self.token_data.delegate,
         root_index: tree_info.root_index,
         merkle_context: PackedMerkleContext {
             merkle_tree_pubkey_index: tree_info.merkle_tree_pubkey_index,
-            queue_pubkey_index: tree_info.queue_pubkey_index,
+            queue_pubkey_index: output_queue_index,
             leaf_index: tree_info.leaf_index,
             prove_by_index: tree_info.prove_by_index,
         },
     })
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/sdk-types/src/interface/account/token_seeds.rs` around lines 265 -
283, The method into_in_token_data is currently ignoring the output_queue_index
parameter and always using tree_info.queue_pubkey_index for PackedMerkleContext;
update into_in_token_data to either use output_queue_index when constructing
merkle_context.queue_pubkey_index or validate that output_queue_index ==
tree_info.queue_pubkey_index and return a LightSdkTypesError if they differ
(fail fast). Specifically, modify the function into_in_token_data to perform the
consistency check on output_queue_index against tree_info.queue_pubkey_index (or
assign output_queue_index into merkle_context.queue_pubkey_index) before
returning MultiInputTokenDataWithContext so callers cannot accidentally create
proofs with the wrong queue context.
sdk-libs/client/src/interface/instructions.rs (1)

248-265: ⚠️ Potential issue | 🟠 Major

Add validation that cold_accounts and proof.accounts maintain matching order.

The code indexes tree_infos (built from proof.accounts) using indices from cold_accounts, relying on an implicit invariant: cold_accounts[i] must correspond to proof.accounts[i]. While the reordering loop preserves these indices correctly, this correspondence is never validated.

If proof.accounts and cold_accounts somehow arrive in different orders—whether through a caller error or future refactoring—the code would silently use mismatched root_index, merkle_tree_pubkey_index, leaf_index, and prove_by_index values, causing compressed account state to be corrupted during decompression.

Add a length check and a comment clarifying the invariant:

let packed_tree_infos = proof.pack_tree_infos(&mut remaining_accounts);
let tree_infos = &packed_tree_infos
    .state_trees
    .as_ref()
    .ok_or("missing state_trees in packed_tree_infos")?
    .packed_tree_infos;

// Invariant: tree_infos[i] must correspond to cold_accounts[i]
if tree_infos.len() != cold_accounts.len() {
    return Err("proof.accounts and cold_accounts length mismatch".into());
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/interface/instructions.rs` around lines 248 - 265, The
loop assumes tree_infos (from proof.pack_tree_infos) and cold_accounts are
aligned by index but never validates this; before the loop that iterates
pda_indices/token_indices, call proof.pack_tree_infos(&mut remaining_accounts)
to obtain packed_tree_infos and tree_infos (the
packed_tree_infos.state_trees.packed_tree_infos) and add an explicit length
check and short comment "Invariant: tree_infos[i] must correspond to
cold_accounts[i]" that returns an Err if tree_infos.len() !=
cold_accounts.len(); place this validation immediately after building tree_infos
and before using it in the loop so the subsequent access in the loop
(cold_accounts[i], tree_infos.get(i)) is safe.
♻️ Duplicate comments (4)
prover/client/src/prover.rs (4)

32-38: ⚠️ Potential issue | 🟠 Major

build_http_client is still over-constrained for remote deployments.

Line 34 forces .no_proxy() for all endpoints, and there is no request timeout configured. This can break with_config remote prover usage and allow hung requests to stall longer than intended.

Suggested refactor
-pub(crate) fn build_http_client() -> Result<reqwest::Client, ProverClientError> {
-    reqwest::Client::builder()
-        .no_proxy()
-        .build()
+pub(crate) fn build_http_client(
+    server_address: &str,
+    request_timeout: Duration,
+) -> Result<reqwest::Client, ProverClientError> {
+    let mut builder = reqwest::Client::builder().timeout(request_timeout);
+    if server_address.contains("127.0.0.1") || server_address.contains("localhost") {
+        builder = builder.no_proxy();
+    }
+    builder.build()
         .map_err(|error| {
             ProverClientError::GenericError(format!("failed to build HTTP client: {error}"))
         })
 }
#!/bin/bash
# Verify whether build_http_client is used by remote-config code paths.
rg -nP --type=rust -C3 '\bpub\(crate\)\s+fn\s+build_http_client\b|\bbuild_http_client\s*\(' prover/client/src
rg -nP --type=rust -C3 '\bwith_config\b|\bserver_address\b|SERVER_ADDRESS' prover/client/src
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/prover.rs` around lines 32 - 38, The build_http_client
function currently forces .no_proxy() and provides no request timeout which
breaks remote deployments (affecting calls from with_config/server_address) and
can hang; modify build_http_client to make proxy behavior configurable (remove
or conditionalize .no_proxy() based on an optional env/config flag) and set a
sensible request timeout on the reqwest::ClientBuilder (e.g. via timeout())
while preserving the existing error mapping to ProverClientError::GenericError
so callers like with_config continue to work reliably.

41-44: ⚠️ Potential issue | 🟠 Major

Port-listener presence must not short-circuit health success.

Line 42-Line 44 returns true before probing /health. A stale/conflicting listener can be misclassified as a healthy prover.

Suggested fix
 fn health_check_once(timeout: Duration) -> bool {
     if prover_listener_present() {
-        return true;
+        tracing::debug!("prover listener detected; continuing with HTTP health probe");
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/prover.rs` around lines 41 - 44, The current
health_check_once function short-circuits and returns true when
prover_listener_present() is true, which can misclassify a stale listener as
healthy; remove the early return and always perform the actual /health probe
(use the existing HTTP probe logic in health_check_once or call the helper that
queries /health with the provided timeout) even if prover_listener_present() is
true, so the function only returns true when the /health endpoint responds
successfully within timeout; keep prover_listener_present() checks only for
informational/logging if needed, but do not use it to determine success.

41-122: ⚠️ Potential issue | 🟠 Major

Blocking OS/network calls are on the async health-check path.

health_check() is async, but health_check_once() performs blocking TcpStream I/O and blocking Command::output() (lsof/curl). Under retries, this can pin Tokio worker threads and degrade unrelated async work.

Suggested direction
-fn health_check_once(timeout: Duration) -> bool {
+async fn health_check_once(client: &reqwest::Client, timeout: Duration) -> bool {
     // Prefer async reqwest health probing; keep command fallback only via
-    // std::process::Command
+    // tokio::process::Command (or tokio::task::spawn_blocking for legacy fallback)
 }
 
 pub async fn health_check(retries: usize, timeout: usize) -> bool {
+    let client = build_http_client(SERVER_ADDRESS, Duration::from_secs(timeout as u64))
+        .unwrap_or_else(|_| reqwest::Client::new());
     ...
-        if health_check_once(timeout) {
+        if health_check_once(&client, timeout).await {
             return true;
         }
#!/bin/bash
# Verify sync/blocking calls reachable from async health_check.
rg -nP --type=rust -C2 'pub async fn health_check|fn health_check_once|TcpStream::connect_timeout|Command::new\("curl"\)|Command::new\("lsof"\)' prover/client/src/prover.rs

Also applies to: 173-184

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/prover.rs` around lines 41 - 122, health_check_once (called
from the async health_check) uses blocking TcpStream I/O and blocking
Command::output() in prover_listener_present and health_check_once_with_curl,
which can block Tokio worker threads; change those synchronous calls to
async-aware equivalents: replace TcpStream::connect_timeout/read/write with
tokio::net::TcpStream async connect/read/write (with tokio::time::timeout)
inside health_check_once, and replace Command::new(...).output() usages in
prover_listener_present and health_check_once_with_curl with async alternatives
(tokio::process::Command::output or wrap the blocking lsof invocation in
tokio::task::spawn_blocking if lsof must be used), ensuring timeouts are applied
and errors are logged via tracing as before.

152-167: ⚠️ Potential issue | 🔴 Critical

IS_LOADING can remain stuck true on panic paths.

If any panic occurs between Line 152 and Line 164, Line 166 is skipped and future calls may permanently take the “already loading” branch.

Suggested fix (RAII guard)
+struct LoadingGuard;
+impl Drop for LoadingGuard {
+    fn drop(&mut self) {
+        IS_LOADING.store(false, Ordering::Release);
+    }
+}
+
-        let spawn_result = async {
+        let _loading_guard = LoadingGuard;
+        async {
             Command::new(&prover_path)
                 .arg("start-prover")
                 .spawn()
                 .unwrap_or_else(|error| panic!("Failed to start prover process: {error}"));
@@
-        }
-        .await;
-
-        IS_LOADING.store(false, Ordering::Release);
-        spawn_result
+        }
+        .await
#!/bin/bash
# Verify panic/early-unwind paths vs flag reset.
rg -nP -C3 'compare_exchange\(false,\s*true|panic!\(|unwrap_or_else\(|IS_LOADING\.store\(false' prover/client/src/prover.rs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/prover.rs` around lines 152 - 167, IS_LOADING can remain
true if the block panics before IS_LOADING.store(false, Ordering::Release) runs;
wrap the startup sequence in an RAII/scope guard that resets IS_LOADING to false
in its Drop implementation (or use catch_unwind to ensure reset on unwind) so
the flag is always cleared regardless of panics; create a small guard type
(e.g., ProverLoadingGuard) created before calling
Command::new(&prover_path).arg("start-prover").spawn() and health_check(...) and
implement Drop to call IS_LOADING.store(false, Ordering::Release), then remove
the direct store(false) call after spawn_result so the guard is authoritative.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@sdk-libs/client/src/interface/instructions.rs`:
- Around line 248-265: The loop assumes tree_infos (from proof.pack_tree_infos)
and cold_accounts are aligned by index but never validates this; before the loop
that iterates pda_indices/token_indices, call proof.pack_tree_infos(&mut
remaining_accounts) to obtain packed_tree_infos and tree_infos (the
packed_tree_infos.state_trees.packed_tree_infos) and add an explicit length
check and short comment "Invariant: tree_infos[i] must correspond to
cold_accounts[i]" that returns an Err if tree_infos.len() !=
cold_accounts.len(); place this validation immediately after building tree_infos
and before using it in the loop so the subsequent access in the loop
(cold_accounts[i], tree_infos.get(i)) is safe.

In `@sdk-libs/sdk-types/src/interface/account/token_seeds.rs`:
- Around line 265-283: The method into_in_token_data is currently ignoring the
output_queue_index parameter and always using tree_info.queue_pubkey_index for
PackedMerkleContext; update into_in_token_data to either use output_queue_index
when constructing merkle_context.queue_pubkey_index or validate that
output_queue_index == tree_info.queue_pubkey_index and return a
LightSdkTypesError if they differ (fail fast). Specifically, modify the function
into_in_token_data to perform the consistency check on output_queue_index
against tree_info.queue_pubkey_index (or assign output_queue_index into
merkle_context.queue_pubkey_index) before returning
MultiInputTokenDataWithContext so callers cannot accidentally create proofs with
the wrong queue context.

---

Duplicate comments:
In `@prover/client/src/prover.rs`:
- Around line 32-38: The build_http_client function currently forces .no_proxy()
and provides no request timeout which breaks remote deployments (affecting calls
from with_config/server_address) and can hang; modify build_http_client to make
proxy behavior configurable (remove or conditionalize .no_proxy() based on an
optional env/config flag) and set a sensible request timeout on the
reqwest::ClientBuilder (e.g. via timeout()) while preserving the existing error
mapping to ProverClientError::GenericError so callers like with_config continue
to work reliably.
- Around line 41-44: The current health_check_once function short-circuits and
returns true when prover_listener_present() is true, which can misclassify a
stale listener as healthy; remove the early return and always perform the actual
/health probe (use the existing HTTP probe logic in health_check_once or call
the helper that queries /health with the provided timeout) even if
prover_listener_present() is true, so the function only returns true when the
/health endpoint responds successfully within timeout; keep
prover_listener_present() checks only for informational/logging if needed, but
do not use it to determine success.
- Around line 41-122: health_check_once (called from the async health_check)
uses blocking TcpStream I/O and blocking Command::output() in
prover_listener_present and health_check_once_with_curl, which can block Tokio
worker threads; change those synchronous calls to async-aware equivalents:
replace TcpStream::connect_timeout/read/write with tokio::net::TcpStream async
connect/read/write (with tokio::time::timeout) inside health_check_once, and
replace Command::new(...).output() usages in prover_listener_present and
health_check_once_with_curl with async alternatives
(tokio::process::Command::output or wrap the blocking lsof invocation in
tokio::task::spawn_blocking if lsof must be used), ensuring timeouts are applied
and errors are logged via tracing as before.
- Around line 152-167: IS_LOADING can remain true if the block panics before
IS_LOADING.store(false, Ordering::Release) runs; wrap the startup sequence in an
RAII/scope guard that resets IS_LOADING to false in its Drop implementation (or
use catch_unwind to ensure reset on unwind) so the flag is always cleared
regardless of panics; create a small guard type (e.g., ProverLoadingGuard)
created before calling Command::new(&prover_path).arg("start-prover").spawn()
and health_check(...) and implement Drop to call IS_LOADING.store(false,
Ordering::Release), then remove the direct store(false) call after spawn_result
so the guard is authoritative.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8b2a06c2-57e8-4d97-871e-5018cf01747a

📥 Commits

Reviewing files that changed from the base of the PR and between f9a788c and 7ab1c90.

⛔ Files ignored due to path filters (1)
  • sdk-tests/csdk-anchor-full-derived-test/tests/basic_test.rs is excluded by none and included by none
📒 Files selected for processing (6)
  • prover/client/src/errors.rs
  • prover/client/src/prover.rs
  • sdk-libs/client/src/interface/instructions.rs
  • sdk-libs/program-test/src/indexer/test_indexer.rs
  • sdk-libs/sdk-types/src/interface/account/token_seeds.rs
  • sdk-libs/sdk-types/src/interface/program/decompression/pda.rs

