fix: solana indexer hang #647
base: develop
Conversation
Pull request overview
Mitigates Solana indexer “block height stuck” scenarios by adding bounded timeouts/retries around transaction fetches and a watchdog that forces a reconnect when the WebSocket log stream goes silent.
Changes:
- Added `get_tx_with_timeout_retry` with configurable timeout, retries, exponential backoff, and jitter for `getTransaction` calls (a sketch of this pattern follows the list).
- Updated CPI event parsing to use the new timeout+retry transaction fetch helper.
- Reworked Solana WS log subscriptions to use a watchdog via `tokio::select!` and reconnect on stalled/ended streams.
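For illustration, here is a minimal sketch of the timeout+retry shape described above. This is not the PR's actual code: `RetryCfg`, the generic `fetch` closure, and the 0..250 ms jitter range are assumptions standing in for the real `getTransaction` call.

```rust
use std::time::Duration;

use anyhow::{anyhow, Result};
use rand::Rng;

// Hypothetical config mirroring the knobs the PR describes.
pub struct RetryCfg {
    pub per_call_timeout: Duration, // budget for one getTransaction call
    pub max_attempts: u32,          // give up after this many tries
    pub base_backoff: Duration,     // doubled after every failed attempt
}

// Retry an async fetch with a per-call timeout, exponential backoff, and jitter.
// `fetch` stands in for a closure that issues the real getTransaction RPC.
pub async fn get_tx_with_timeout_retry<F, Fut, T>(cfg: &RetryCfg, mut fetch: F) -> Result<T>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T>>,
{
    let mut backoff = cfg.base_backoff;
    let mut last_err = anyhow!("no attempts were made");
    for attempt in 1..=cfg.max_attempts {
        match tokio::time::timeout(cfg.per_call_timeout, fetch()).await {
            Ok(Ok(tx)) => return Ok(tx), // success
            Ok(Err(e)) => {
                last_err = e.context(format!("attempt {attempt}/{} failed", cfg.max_attempts));
            }
            Err(_) => last_err = anyhow!("attempt {attempt}/{} timed out", cfg.max_attempts),
        }
        if attempt < cfg.max_attempts {
            // Exponential backoff plus random jitter so concurrent retries spread out.
            let jitter = Duration::from_millis(rand::thread_rng().gen_range(0..250));
            tokio::time::sleep(backoff + jitter).await;
            backoff = backoff.saturating_mul(2);
        }
    }
    Err(last_err)
}
```

The key property is that a hung RPC call can no longer block the caller forever: every attempt is bounded by `per_call_timeout`, and the whole helper is bounded by `max_attempts`.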
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 7 comments.
```rust
// On any RPC error, retry until max_attempts is reached.
let e_anyhow = anyhow::anyhow!(e).context(format!(
    "getTransaction failed (attempt {attempt}/{}) for {}",
    cfg.max_attempts, signature
));
```
Copilot AI (Jan 29, 2026):
The comment says the retry decision is based on error content, but the implementation retries unconditionally for all Err(e) until max_attempts. Either implement error-based retry filtering (e.g., avoid retrying on "not found"/finalized errors) or adjust the comment to match behavior.
Suggested change:
```rust
// For some RPC errors (e.g., transaction not found or already finalized),
// further retries are not useful. Treat those as terminal based on the
// error message content; otherwise, retry until max_attempts is reached.
let err_msg = e.to_string();
let e_anyhow = anyhow::anyhow!(e).context(format!(
    "getTransaction failed (attempt {attempt}/{}) for {}",
    cfg.max_attempts, signature
));
// Do not retry on clearly terminal conditions.
if err_msg.contains("not found") || err_msg.contains("finalized") {
    return Err(e_anyhow);
}
```
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
```rust
// stall watchdog
let stall_timeout = Duration::from_secs(60);
let mut last_ws_msg = Instant::now();
let mut watchdog = tokio::time::interval(Duration::from_secs(5));
```
Copilot AI (Jan 29, 2026):
The watchdog currently bails out if no logsSubscribe notification is received for 60s. With RpcTransactionLogsFilter::Mentions, it’s normal to receive no messages during periods with no transactions mentioning this program, so this can force reconnect loops even when the WS connection is healthy. Consider basing the stall detection on a true WS keepalive/ping mechanism (or client-level health check), or making the stall timeout configurable and large enough to tolerate expected inactivity.
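For context, here is a minimal sketch of the `tokio::select!`-based watchdog pattern under discussion, assuming `stream` is the logsSubscribe notification stream; the function and variable names are illustrative, not the PR's actual code.

```rust
use std::time::{Duration, Instant};

use anyhow::{anyhow, Result};
use futures_util::{Stream, StreamExt};

// Drive the log stream, returning Err (so the caller resubscribes) when the
// stream ends or goes silent for longer than `stall_timeout`.
async fn run_with_watchdog<S, T>(mut stream: S) -> Result<()>
where
    S: Stream<Item = T> + Unpin,
{
    let stall_timeout = Duration::from_secs(60);
    let mut last_ws_msg = Instant::now();
    let mut watchdog = tokio::time::interval(Duration::from_secs(5));

    loop {
        tokio::select! {
            msg = stream.next() => match msg {
                Some(_log) => {
                    last_ws_msg = Instant::now(); // the stream is alive
                    // ... process the log notification here ...
                }
                // Stream ended: bail out so the subscription is re-created.
                None => return Err(anyhow!("WS log stream ended")),
            },
            _ = watchdog.tick() => {
                if last_ws_msg.elapsed() > stall_timeout {
                    // No notifications within the stall window: force reconnect.
                    return Err(anyhow!("WS log stream stalled"));
                }
            }
        }
    }
}
```

As the comment notes, `last_ws_msg.elapsed() > stall_timeout` only proves the absence of notifications, not a dead connection, which is why a quiet program can trigger reconnects.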
```rust
// Watchdog: force reconnect if WS goes silent
let stall_timeout = Duration::from_secs(60);
let mut last_ws_msg = Instant::now();
let mut watchdog = tokio::time::interval(Duration::from_secs(5));
```
Copilot AI (Jan 29, 2026):
Same issue as the CPI subscription: the 60s watchdog is keyed to receipt of logsSubscribe notifications, but no notifications is a normal state when the program has no activity. This can cause perpetual reconnect churn. Consider moving to WS-level keepalive detection or making the stall timeout configurable/disabled by default for low-traffic deployments.
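One way to act on the "configurable/disabled by default" suggestion is to read the stall window from the environment. This is a hypothetical sketch; `SOLANA_WS_STALL_TIMEOUT_SECS` is an illustrative name, not something the repo defines.

```rust
use std::time::Duration;

// Returns None (watchdog disabled) unless the env var is set to a
// positive number of seconds.
fn stall_timeout_from_env() -> Option<Duration> {
    let secs: u64 = std::env::var("SOLANA_WS_STALL_TIMEOUT_SECS")
        .ok()?
        .parse()
        .ok()?;
    (secs > 0).then(|| Duration::from_secs(secs))
}
```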
This is not the first place where we use exponential backoff. Can we create utils and use them across the project?
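A shared helper along the lines the reviewer suggests could compute just the delay, so each call site keeps its own retry loop. This is a hypothetical sketch, not existing project code.

```rust
use std::time::Duration;

use rand::Rng;

// Delay for the given (1-based) attempt: exponential growth from `base`,
// capped at `max`, with up to 25% random jitter added on top.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let exp = base.saturating_mul(2u32.saturating_pow(attempt.saturating_sub(1).min(16)));
    let capped = exp.min(max);
    let jitter_ms = rand::thread_rng().gen_range(0..=capped.as_millis() as u64 / 4);
    capped + Duration::from_millis(jitter_ms)
}
```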
The reason the Solana indexer block height gets stuck could be: (1) a `getTransaction` call that hangs with no timeout, or (2) a WS log stream that goes silent without ending.
In this PR, both will result in the function returning Err, and the Solana client will then resubscribe.
Fixes issue #648.