Conversation
```go
for attempt := 0; attempt <= r.maxRetries; attempt++ {
	if attempt > 0 {
		backoffDuration := min(r.backoff*time.Duration(1<<uint(attempt-1)), 5*time.Second)
		time.Sleep(backoffDuration)
```
Instead of sleep, use time.After and req.Context in a select to return as soon as possible if the context is done.
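A minimal sketch of the suggested wait, assuming it replaces the time.Sleep call inside RoundTrip, where req and backoffDuration are in scope:

```go
// Wait for the backoff to elapse, but bail out early if the
// request's context is cancelled or times out.
select {
case <-req.Context().Done():
	return nil, req.Context().Err()
case <-time.After(backoffDuration):
}
```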
```go
// Reset request body for retry if possible
if req.Body != nil && req.GetBody != nil {
	var err error
	req.Body, err = req.GetBody()
```
This may be problematic if the Body has already been read or closed. There is a check below for a non-rewritable body, so it should be ok, I suppose.

In general, only idempotent requests should be retried, but with RPC, identifying idempotent requests is not easy. Getting information is ok, but posting information may cause problems by duplicating it.
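A hypothetical illustration of why this is hard for RPC: a method-based gate works for plain HTTP, but JSON-RPC sends reads and writes alike as POST, so it cannot tell a read call apart from a state-changing one (the helper name is made up):

```go
// isIdempotent reports whether the request is safe to retry based on
// its HTTP method alone. Since JSON-RPC uses POST for everything,
// this heuristic never fires for RPC traffic.
func isIdempotent(req *http.Request) bool {
	switch req.Method {
	case http.MethodGet, http.MethodHead, http.MethodOptions:
		return true
	default:
		return false
	}
}
```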
Maybe it is simplest to increase the values for the timeouts on the RPC side?
These align with HAProxy timeouts to prevent errors, but they look low?
```go
func newHTTPClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: (&net.Dialer{
				Timeout:   9 * time.Second, // < HAProxy timeout connect (10s)
				KeepAlive: 5 * time.Second, // Keep connections alive but close before idle timeout
			}).DialContext,
			IdleConnTimeout:       8 * time.Second, // < HAProxy timeout client (10s)
			TLSHandshakeTimeout:   9 * time.Second, // < HAProxy timeout connect (10s)
			ResponseHeaderTimeout: 9 * time.Second, // < HAProxy timeout server (10s)
		},
	}
}
```
I agree, even retries would make the rpc endpoint do more work in this case. Increasing the timeout would be better.
```go
	}
}

// isRetryableError determines if an error should trigger a retry.
func isRetryableError(err error) bool {
```
Are these error code checks compatible with Windows?
AFAIK Windows has different error codes; maybe some unit tests could reveal it?
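A sketch of a more portable check, assuming the point is to avoid matching raw errno values or error strings: errors.Is with syscall constants resolves to the platform's own code (e.g. WSAECONNRESET on Windows), though the exact set of retryable conditions here is an assumption, not the PR's:

```go
import (
	"errors"
	"io"
	"net"
	"syscall"
)

// isRetryableError reports whether err looks like a transient network
// failure worth retrying, using platform-independent checks only.
func isRetryableError(err error) bool {
	if err == nil {
		return false
	}
	// Abruptly closed connections surface as EOF / unexpected EOF.
	if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
		return true
	}
	// Timeouts implement net.Error on every platform.
	var netErr net.Error
	if errors.As(err, &netErr) && netErr.Timeout() {
		return true
	}
	// syscall constants are mapped per platform, so errors.Is matches
	// the right code on Linux, macOS, and Windows alike.
	return errors.Is(err, syscall.ECONNRESET)
}
```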
```go
	}
}

return nil, fmt.Errorf("request failed after %d retries: %w", r.maxRetries, lastErr)
```
The loop above runs for attempt from 0 to maxRetries, so with maxRetries = 3 there are 4 attempts, but the error says 3 retries.
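A minimal fix, assuming the intent is to keep the loop bounds and correct the message instead:

```go
// maxRetries retries means maxRetries+1 total attempts.
return nil, fmt.Errorf("request failed after %d attempts: %w", r.maxRetries+1, lastErr)
```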
Take a look at this package, it may help simplify the implementation:
Checklist
Description
The bee node was experiencing intermittent connection failures when communicating with blockchain RPC endpoints (especially HAProxy-backed services), resulting in EOF errors.
These EOF errors occur because short timeout configurations on HAProxy can prematurely close connections during RPC communication. Without retry logic, such transient network errors (EOF, connection resets) cause operations to fail immediately, impacting node stability and reliability.
Implemented a custom retryRoundTripper that wraps the HTTP transport with retry logic
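A rough sketch of the wrapper, pieced together from the snippets quoted in the review above; the field names and exact structure are assumptions, not the PR's verbatim code:

```go
// retryRoundTripper wraps another http.RoundTripper and retries
// transient network failures with capped exponential backoff.
type retryRoundTripper struct {
	next       http.RoundTripper
	maxRetries int
	backoff    time.Duration
}

func (r *retryRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	var lastErr error
	for attempt := 0; attempt <= r.maxRetries; attempt++ {
		if attempt > 0 {
			// Capped exponential backoff; see the review note above about
			// preferring a context-aware wait over time.Sleep.
			backoffDuration := min(r.backoff*time.Duration(1<<uint(attempt-1)), 5*time.Second)
			time.Sleep(backoffDuration)

			// Reset request body for retry if possible.
			if req.Body != nil && req.GetBody != nil {
				body, err := req.GetBody()
				if err != nil {
					return nil, err
				}
				req.Body = body
			}
		}
		resp, err := r.next.RoundTrip(req)
		if err == nil {
			return resp, nil
		}
		if !isRetryableError(err) {
			return nil, err
		}
		lastErr = err
	}
	return nil, fmt.Errorf("request failed after %d retries: %w", r.maxRetries, lastErr)
}
```

The wrapper would then replace the bare Transport in newHTTPClient, e.g. `Transport: &retryRoundTripper{next: &http.Transport{...}, maxRetries: 3, backoff: 500 * time.Millisecond}` (values illustrative).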
Key Features:
Open API Spec Version Changes (if applicable)
Motivation and Context (Optional)
Related Issue (Optional)
Screenshots (if appropriate):