feat: RPC node mode for query-only block following #72
Conversation
Add a third node mode (RPC) where a node syncs finalized blocks from a trusted validator via HTTP and serves the full REST/WebSocket/gRPC API without participating in consensus or P2P networking.

- NodeMode enum (Standalone/Validator/Rpc) with tri-state detection
- BlockApplier: shared block-application logic extracted from 3 paths in NodeCoordinator (finalization, P2P sync, block payload)
- BlockSyncService: HTTP polling sync loop with exponential backoff
- TxForwarder: ITxForwarder interface + HttpTxForwarder for forwarding transactions from RPC nodes to validators
- Sync endpoints: GET /v1/sync/status, GET /v1/sync/blocks
- TxForwarderRef mutable wrapper for late binding in Program.cs
- Enhanced /v1/health with mode field and syncLag (503 if lag > 50)
- Docker: rpc-0 service in devnet and testnet compose files
- Caddy: public API traffic routed through rpc-0 instead of validator-0
- 8 new NodeConfiguration tests for ResolvedMode (2789 total, 0 failures)
RPC nodes are the public-facing API layer — they need higher throughput than validators. Increase per-IP rate limit from 100 to 1000 req/min in RPC mode and allow any CORS origin (like debug mode) since the RPC node serves Explorer, Caldera, and third-party consumers.
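A minimal sketch of how such mode-dependent limits might be wired in Program.cs; the helper names and the exact CORS policy shape are assumptions, not the PR's actual code:

```csharp
// Hypothetical sketch: RPC nodes serve public consumers (Explorer, Caldera,
// third parties), so they get a larger per-IP budget and open CORS.
var rateLimitPerMinute = config.ResolvedMode == NodeMode.Rpc ? 1000 : 100;

builder.Services.AddCors(options =>
{
    options.AddDefaultPolicy(policy =>
    {
        if (config.ResolvedMode == NodeMode.Rpc)
            policy.AllowAnyOrigin().AllowAnyHeader().AllowAnyMethod();
    });
});
```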
Faucet and gRPC endpoints created transactions via mempool.Add(), bypassing the HttpTxForwarder wired into POST /v1/transactions. On RPC nodes this meant transactions stayed in the local mempool and never reached block-producing validators. - Move ITxForwarder/TxForwarderRef to Basalt.Execution (shared) - Add txForwarder param to FaucetEndpoint.MapFaucetEndpoint - Add ITxForwarder to BasaltNodeService via DI - Register TxForwarderRef as ITxForwarder singleton in Program.cs
UseWebSockets() defaulted to KeepAliveInterval=Zero (no pings). Reverse proxies drop idle WebSocket connections, causing repeated "network connection was lost" errors in the Explorer.
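The fix presumably amounts to passing a nonzero keep-alive interval when enabling WebSockets; the 30-second value below is illustrative:

```csharp
// Send periodic WebSocket pings so reverse proxies don't drop idle
// connections (per the PR, the previous configuration sent no pings).
app.UseWebSockets(new WebSocketOptions
{
    KeepAliveInterval = TimeSpan.FromSeconds(30) // interval is illustrative
});
```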
BlockHeader.Timestamp is stored in milliseconds but the price-history endpoint returned it as-is. TradingView Lightweight Charts expects Unix seconds, causing chart dates to render in year 50,000+.
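The conversion is likely a single integer division at the endpoint boundary (field names below are assumptions):

```csharp
// BlockHeader.Timestamp is in milliseconds; TradingView Lightweight Charts
// expects Unix seconds, so divide before emitting the candle time.
var timeSeconds = header.Timestamp / 1000; // integer division drops sub-second precision
```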
Pull request overview
Adds an RPC node mode that follows finalized blocks from a trusted validator over HTTP while serving the full API (without participating in consensus), and refactors shared block-application logic into a reusable BlockApplier.
Changes:
- Introduce tri-state node mode selection (`auto`/`validator`/`rpc`/`standalone`) via `BASALT_MODE` + `BASALT_SYNC_SOURCE`.
- Add RPC sync + forwarding components (`BlockSyncService`, `HttpTxForwarder`) and expose sync endpoints (`/v1/sync/status`, `/v1/sync/blocks`).
- Refactor consensus/sync block application into `BlockApplier`, wire RPC/devnet/testnet deployment updates, and fix DEX price-history timestamps (ms → s).
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/Basalt.Node.Tests/NodeConfigurationTests.cs | Adds tests for ResolvedMode and RPC-mode validation. |
| src/node/Basalt.Node/TxForwarder.cs | Introduces HTTP tx forwarding for RPC nodes plus a no-op implementation. |
| src/node/Basalt.Node/README.md | Documents the new runtime modes, especially RPC mode. |
| src/node/Basalt.Node/Program.cs | Wires RPC mode branch, higher rate limits/CORS in RPC, health sync-lag reporting, WS keep-alive, and forwarding hooks. |
| src/node/Basalt.Node/NodeCoordinator.cs | Refactors finalized/sync block application to use shared BlockApplier and wires epoch-transition handling. |
| src/node/Basalt.Node/NodeConfiguration.cs | Adds NodeMode, Mode/SyncSource, and ResolvedMode logic (plus env parsing). |
| src/node/Basalt.Node/BlockSyncService.cs | Implements HTTP polling sync loop with exponential backoff and lag tracking. |
| src/node/Basalt.Node/BlockApplier.cs | New shared block execution/application/persistence component for consensus + sync paths. |
| src/execution/Basalt.Execution/ITxForwarder.cs | Adds forwarding interface and a mutable TxForwarderRef used for late binding in RPC mode. |
| src/api/Basalt.Api.Rest/RestApiEndpoints.cs | Adds sync endpoints, tx forwarding hook, and fixes price-history timestamps (ms → s). |
| src/api/Basalt.Api.Rest/README.md | Documents new sync endpoints and updated faucet behavior in RPC mode. |
| src/api/Basalt.Api.Rest/FaucetEndpoint.cs | Forwards faucet-generated transactions upstream in RPC mode. |
| src/api/Basalt.Api.Grpc/BasaltNodeService.cs | Forwards gRPC-submitted transactions upstream in RPC mode. |
| docker-compose.yml | Adds rpc-0 service for devnet compose. |
| deploy/testnet/docker-compose.yml | Adds rpc-0 service and routes dependencies through it. |
| deploy/testnet/Caddyfile | Routes public traffic to rpc-0 instead of a validator. |
| README.md | Updates repo docs for added RPC service and new env vars. |
```csharp
// Phase 2: Add executed blocks to chain and persist
foreach (var (block, raw, bitmap) in blocks)
{
    if (block.Receipts == null && block.Transactions.Count > 0)
        break; // This block wasn't executed (failed in phase 1)

    var result = _chainManager.AddBlock(block);
    if (!result.IsSuccess)
```
The Phase 2 guard uses block.Receipts == null && block.Transactions.Count > 0 to decide whether a block was executed. This allows blocks with 0 transactions (but still needing DEX settlement / state root changes) to be added even if Phase 1 bailed out earlier, and also can’t distinguish "executed successfully with 0 receipts" from "never executed". Track execution success explicitly (e.g., an executed-count or per-block flag) and only add/persist the executed prefix.
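One way to make execution success explicit, as the comment suggests. This is a sketch: `TryExecute` is a hypothetical helper standing in for the Phase 1 execution path, and the surrounding batch structure is assumed from the snippet above.

```csharp
// Phase 1: execute blocks in order, counting exactly how many succeeded.
int executedCount = 0;
foreach (var (block, raw, bitmap) in blocks)
{
    if (!TryExecute(block)) // hypothetical helper wrapping Phase 1 execution
        break;              // stop at the first failure
    executedCount++;
}

// Phase 2: add/persist only the successfully executed prefix, so a
// zero-transaction block is never mistaken for an executed one.
foreach (var (block, raw, bitmap) in blocks.Take(executedCount))
{
    var result = _chainManager.AddBlock(block);
    if (!result.IsSuccess)
        break;
}
```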
```csharp
var requestedCount = Math.Min(count ?? 100, 100);
var blocks = new List<SyncBlockEntry>();

for (ulong n = from; n < from + (ulong)requestedCount; n++)
{
```
count can be negative here: Math.Min(count ?? 100, 100) will keep negative values, and (ulong)requestedCount will then wrap to a huge number. That can turn this endpoint into an unbounded/very large loop (DoS) depending on from/tip. Clamp count to a sane positive range (e.g., 1..100) before using it, and consider guarding from + requestedCount overflow.
Suggested change:

```csharp
// Clamp requested count to a sane positive range [1, 100]
var requestedCount = count.HasValue
    ? Math.Clamp(count.Value, 1, 100)
    : 100;
var blocks = new List<SyncBlockEntry>();

// Guard against overflow when computing the loop upper bound
ulong requestedCountU = (ulong)requestedCount;
ulong endExclusive;
if (from > ulong.MaxValue - requestedCountU)
{
    endExclusive = ulong.MaxValue;
}
else
{
    endExclusive = from + requestedCountU;
}

for (ulong n = from; n < endExclusive; n++)
{
```
```csharp
await _httpClient.PostAsJsonAsync(
    "/v1/transactions",
    request,
    BasaltApiJsonContext.Default.TransactionRequest,
    cts.Token);
```
ForwardAsync logs only on exceptions, but PostAsJsonAsync does not throw on non-2xx responses. As written, 4xx/5xx from the sync source will be silently ignored, contradicting the "logs warnings on failure" behavior. Capture the HttpResponseMessage, check IsSuccessStatusCode (or call EnsureSuccessStatusCode), and log status code/body on failure.
Suggested change:

```csharp
var response = await _httpClient.PostAsJsonAsync(
    "/v1/transactions",
    request,
    BasaltApiJsonContext.Default.TransactionRequest,
    cts.Token);
if (!response.IsSuccessStatusCode)
{
    string responseBody = string.Empty;
    try
    {
        responseBody = await response.Content.ReadAsStringAsync(cts.Token);
    }
    catch
    {
        // Ignore failures reading the response body; we still log the status code.
    }
    _logger?.LogWarning(
        "Sync source returned non-success status when forwarding tx {Hash}: {StatusCode} {ResponseBody}",
        tx.Hash.ToHexString()[..16],
        (int)response.StatusCode,
        responseBody);
}
```
```csharp
// Hook epoch transitions to rewire consensus-specific components
_blockApplier.OnEpochTransition += (newSet, blockNumber) =>
{
    var oldCount = _validatorSet?.Count ?? 0;
```
oldCount is computed but never used inside the epoch-transition handler. Remove it or use it in a log message to avoid dead code and keep the handler focused on the rewiring it performs.
Suggested change: delete the unused line `var oldCount = _validatorSet?.Count ?? 0;`.
```diff
 handle /health {
-    reverse_proxy validator-0:5000 {
+    reverse_proxy rpc-0:5000 {
         rewrite /v1/status
```
This /health handler rewrites to /v1/status, which always returns 200 and doesn't reflect the RPC-mode sync-lag health logic added at /v1/health. If the intention is for external health checks to fail when the RPC node is >50 blocks behind, consider rewriting to /v1/health instead of /v1/status.
Suggested change:

```diff
-rewrite /v1/status
+rewrite /v1/health
```
```csharp
if (string.Equals(Mode, "validator", StringComparison.OrdinalIgnoreCase)
    || (Peers.Length > 0 && ValidatorIndex >= 0))
    return NodeMode.Validator;

if (string.Equals(Mode, "standalone", StringComparison.OrdinalIgnoreCase))
    return NodeMode.Standalone;
```
ResolvedMode currently treats any config with Peers + ValidatorIndex>=0 as Validator even when BASALT_MODE is explicitly set to "standalone" (because the auto-detection check runs before the standalone override). This makes explicit standalone ineffective in some env combinations. Consider only applying the Peers/ValidatorIndex auto-detection when Mode == "auto", and/or check for explicit "standalone" before the validator auto-detection.
Suggested change:

```csharp
if (string.Equals(Mode, "validator", StringComparison.OrdinalIgnoreCase))
    return NodeMode.Validator;

if (string.Equals(Mode, "standalone", StringComparison.OrdinalIgnoreCase))
    return NodeMode.Standalone;

// auto / implicit: infer validator if peers + validator index are configured
if (string.Equals(Mode, "auto", StringComparison.OrdinalIgnoreCase) || string.IsNullOrWhiteSpace(Mode))
{
    if (Peers.Length > 0 && ValidatorIndex >= 0)
        return NodeMode.Validator;
}
```
```csharp
var localTip = _chainManager.LatestBlockNumber;
var remoteTip = status.LatestBlock;
Volatile.Write(ref _syncLag, (int)Math.Min(remoteTip - localTip, int.MaxValue));

if (remoteTip <= localTip)
{
    // Caught up — sleep for one block time, then poll again
    _backoffMs = 1000; // Reset backoff
    await Task.Delay((int)_chainParams.BlockTimeMs, ct);
```
SyncLag is computed via (remoteTip - localTip) before checking remoteTip <= localTip. When the remote tip is behind (or chain reset), the ulong subtraction underflows and SyncLag becomes a huge value (clamped to int.MaxValue), which will incorrectly trip the health check. Compute lag only after confirming remoteTip > localTip, or use a conditional subtraction.
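A conditional subtraction fixes this; a sketch against the snippet above:

```csharp
var localTip = _chainManager.LatestBlockNumber;
var remoteTip = status.LatestBlock;

// Only subtract when the remote tip is actually ahead; unsigned (ulong)
// subtraction would otherwise underflow and report an enormous lag.
ulong lag = remoteTip > localTip ? remoteTip - localTip : 0;
Volatile.Write(ref _syncLag, (int)Math.Min(lag, (ulong)int.MaxValue));
```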
```csharp
// Phase 3: Atomically swap state only if ALL blocks succeeded
if (applied == blocks.Count && applied > 0)
{
    stateDbRef.Swap(forkedState);
    _logger.LogInformation("Synced {Count} blocks, now at #{Height}",
        applied, _chainManager.LatestBlockNumber);
}
```
ApplyBatch adds blocks to ChainManager and persists them in Phase 2, but only swaps canonical state in Phase 3 if all blocks succeed. If a later block fails (AddBlock fails, or execution failed in Phase 1), this can leave ChainManager/block persistence ahead of canonical state (state swap skipped), causing chain/state divergence for API reads and future execution. Consider only mutating ChainManager/persistence once you know you will swap state (or implement a rollback to the pre-sync tip when applied != blocks.Count).
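A defensive shape that defers chain mutation until the state is known-good; this is a sketch under assumptions (`executed` tracks Phase 1 successes, `Persist` is a hypothetical stand-in for the Phase 2 persistence call):

```csharp
// Execute the whole batch against the forked state first; only touch
// ChainManager/persistence once we know the state swap will happen,
// so chain and canonical state can never diverge.
if (executed == blocks.Count && executed > 0)
{
    foreach (var (block, raw, bitmap) in blocks)
    {
        var result = _chainManager.AddBlock(block);
        if (!result.IsSuccess)
            throw new InvalidOperationException(
                $"AddBlock failed at #{block.Number} after successful execution");
        Persist(block, raw, bitmap); // hypothetical persistence helper
    }
    stateDbRef.Swap(forkedState); // swap last, after chain + persistence agree
}
```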
Summary
- `BlockApplier` extracted from NodeCoordinator to eliminate duplicated block-application logic across 3 code paths
- `BlockSyncService` (HTTP polling with exponential backoff), `HttpTxForwarder` (fire-and-forget tx forwarding to validators), and sync endpoints (`/v1/sync/status`, `/v1/sync/blocks`)
- `rpc-0` service added to both devnet and testnet Docker Compose; public traffic routed through RPC via Caddy
- `ResolvedMode` tri-state detection

New environment variables

| Variable | Default | Notes |
|---|---|---|
| `BASALT_MODE` | `auto` | `auto`, `validator`, `rpc`, or `standalone` |
| `BASALT_SYNC_SOURCE` | (none) | required in `rpc` mode |

Test plan

- `dotnet build` — 0 warnings, 0 errors
- `dotnet test` — 2,789 tests pass, 0 failures
- `ResolvedMode` tests cover all mode combinations (auto, rpc, validator, standalone, missing SyncSource)
- `docker compose up` — 4 validators + 1 RPC, verify RPC syncs and serves API