feat: RPC node mode for query-only block following #72

Open
0xZunia wants to merge 6 commits into main from feature/rpc-node-mode

0xZunia (Contributor) commented Mar 2, 2026

Summary

  • Add a third node mode (RPC) alongside Validator and Standalone — syncs finalized blocks from a trusted source via HTTP and serves the full API without participating in consensus
  • Extract shared BlockApplier from NodeCoordinator to eliminate duplicated block-application logic across 3 code paths
  • Add BlockSyncService (HTTP polling with exponential backoff), HttpTxForwarder (fire-and-forget tx forwarding to validators), and sync endpoints (/v1/sync/status, /v1/sync/blocks)
  • Forward faucet and gRPC transactions to validators when running in RPC mode
  • Relax rate limits (1000 req/min/IP) and CORS for public-facing RPC nodes
  • Enable 30s WebSocket keep-alive to prevent proxy idle disconnects
  • Fix price-history endpoint returning millisecond timestamps instead of seconds
  • Add rpc-0 service to both devnet and testnet Docker Compose, route public traffic through RPC via Caddy
  • 8 new NodeConfiguration tests for ResolvedMode tri-state detection
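
The backoff schedule of BlockSyncService is described only in prose here. A minimal sketch of how such a schedule could look, assuming a 1,000 ms base delay (the reset value that appears in the reviewed sync loop) and a 30 s cap; `NextBackoffMs` and the cap are hypothetical and not taken from the PR:

```csharp
using System;

// Hypothetical helper: double the delay after each failed poll, up to a cap.
// The 30 s cap is an assumption; the PR text does not state one.
static int NextBackoffMs(int currentMs, int maxMs = 30_000) =>
    (int)Math.Min((long)currentMs * 2, maxMs);

int backoffMs = 1_000; // base delay, also the value restored after a successful poll
for (int failure = 1; failure <= 6; failure++)
{
    Console.WriteLine($"failure {failure}: wait {backoffMs} ms before retrying");
    backoffMs = NextBackoffMs(backoffMs);
}
```

Doubling in `long` before clamping avoids integer overflow if a larger cap is ever configured.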

New environment variables

| Variable | Default | Description |
| --- | --- | --- |
| BASALT_MODE | auto | Node mode: auto, validator, rpc, or standalone |
| BASALT_SYNC_SOURCE | | HTTP URL of sync source (required for rpc mode) |

Test plan

  • dotnet build — 0 warnings, 0 errors
  • dotnet test — 2,789 tests pass, 0 failures
  • ResolvedMode tests cover all mode combinations (auto, rpc, validator, standalone, missing SyncSource)
  • Docker: docker compose up — 4 validators + 1 RPC, verify RPC syncs and serves API
  • Testnet: verify faucet transactions reach validators via tx forwarding
  • Testnet: verify WebSocket stays connected through Caddy reverse proxy

0xZunia added 6 commits March 2, 2026 11:46
Add a third node mode (RPC) where a node syncs finalized blocks from a
trusted validator via HTTP and serves the full REST/WebSocket/gRPC API
without participating in consensus or P2P networking.

- NodeMode enum (Standalone/Validator/Rpc) with tri-state detection
- BlockApplier: shared block-application logic extracted from 3 paths
  in NodeCoordinator (finalization, P2P sync, block payload)
- BlockSyncService: HTTP polling sync loop with exponential backoff
- TxForwarder: ITxForwarder interface + HttpTxForwarder for forwarding
  transactions from RPC nodes to validators
- Sync endpoints: GET /v1/sync/status, GET /v1/sync/blocks
- TxForwarderRef mutable wrapper for late binding in Program.cs
- Enhanced /v1/health with mode field and syncLag (503 if lag > 50)
- Docker: rpc-0 service in devnet and testnet compose files
- Caddy: public API traffic routed through rpc-0 instead of validator-0
- 8 new NodeConfiguration tests for ResolvedMode (2789 total, 0 failures)

RPC nodes are the public-facing API layer: they need higher throughput
than validators. Increase per-IP rate limit from 100 to 1000 req/min in
RPC mode and allow any CORS origin (like debug mode) since the RPC node
serves Explorer, Caldera, and third-party consumers.

Faucet and gRPC endpoints created transactions via mempool.Add(),
bypassing the HttpTxForwarder wired into POST /v1/transactions.
On RPC nodes this meant transactions stayed in the local mempool
and never reached block-producing validators.

- Move ITxForwarder/TxForwarderRef to Basalt.Execution (shared)
- Add txForwarder param to FaucetEndpoint.MapFaucetEndpoint
- Add ITxForwarder to BasaltNodeService via DI
- Register TxForwarderRef as ITxForwarder singleton in Program.cs

UseWebSockets() defaulted to KeepAliveInterval=Zero (no pings).
Reverse proxies drop idle WebSocket connections, causing repeated
"network connection was lost" errors in the Explorer.

BlockHeader.Timestamp is stored in milliseconds but the price-history
endpoint returned it as-is. TradingView Lightweight Charts expects Unix
seconds, causing chart dates to render in year 50,000+.
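
To make the unit mismatch concrete, here is a small sketch using a hypothetical block timestamp of 1,772,409,600,000 ms (2026-03-02T00:00:00Z). The variable names are illustrative and not taken from the endpoint code:

```csharp
using System;

long timestampMs = 1_772_409_600_000; // hypothetical BlockHeader.Timestamp, in milliseconds
long timestampSec = timestampMs / 1000; // what a seconds-based chart consumer expects

// Read correctly, the value is an ordinary 2026 date.
DateTimeOffset correct = DateTimeOffset.FromUnixTimeSeconds(timestampSec);
Console.WriteLine(correct.UtcDateTime.ToString("yyyy-MM-dd"));

// Misread as seconds, the same number lies tens of thousands of years ahead.
const long SecondsPerYear = 31_556_952; // average Gregorian year
long misreadYear = 1970 + timestampMs / SecondsPerYear;
Console.WriteLine($"misread as seconds: roughly year {misreadYear}");
```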

Copilot AI left a comment

Pull request overview

Adds an RPC node mode that follows finalized blocks from a trusted validator over HTTP while serving the full API (without participating in consensus), and refactors shared block-application logic into a reusable BlockApplier.

Changes:

  • Introduce tri-state node mode resolution (validator / rpc / standalone, with auto as the detecting default) via BASALT_MODE + BASALT_SYNC_SOURCE.
  • Add RPC sync + forwarding components (BlockSyncService, HttpTxForwarder) and expose sync endpoints (/v1/sync/status, /v1/sync/blocks).
  • Refactor consensus/sync block application into BlockApplier, wire RPC/devnet/testnet deployment updates, and fix DEX price-history timestamps (ms → s).

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| tests/Basalt.Node.Tests/NodeConfigurationTests.cs | Adds tests for ResolvedMode and RPC-mode validation. |
| src/node/Basalt.Node/TxForwarder.cs | Introduces HTTP tx forwarding for RPC nodes plus a no-op implementation. |
| src/node/Basalt.Node/README.md | Documents the new runtime modes, especially RPC mode. |
| src/node/Basalt.Node/Program.cs | Wires RPC mode branch, higher rate limits/CORS in RPC, health sync-lag reporting, WS keep-alive, and forwarding hooks. |
| src/node/Basalt.Node/NodeCoordinator.cs | Refactors finalized/sync block application to use shared BlockApplier and wires epoch-transition handling. |
| src/node/Basalt.Node/NodeConfiguration.cs | Adds NodeMode, Mode/SyncSource, and ResolvedMode logic (plus env parsing). |
| src/node/Basalt.Node/BlockSyncService.cs | Implements HTTP polling sync loop with exponential backoff and lag tracking. |
| src/node/Basalt.Node/BlockApplier.cs | New shared block execution/application/persistence component for consensus + sync paths. |
| src/execution/Basalt.Execution/ITxForwarder.cs | Adds forwarding interface and a mutable TxForwarderRef used for late binding in RPC mode. |
| src/api/Basalt.Api.Rest/RestApiEndpoints.cs | Adds sync endpoints, tx forwarding hook, and fixes price-history timestamps (ms → s). |
| src/api/Basalt.Api.Rest/README.md | Documents new sync endpoints and updated faucet behavior in RPC mode. |
| src/api/Basalt.Api.Rest/FaucetEndpoint.cs | Forwards faucet-generated transactions upstream in RPC mode. |
| src/api/Basalt.Api.Grpc/BasaltNodeService.cs | Forwards gRPC-submitted transactions upstream in RPC mode. |
| docker-compose.yml | Adds rpc-0 service for devnet compose. |
| deploy/testnet/docker-compose.yml | Adds rpc-0 service and routes dependencies through it. |
| deploy/testnet/Caddyfile | Routes public traffic to rpc-0 instead of a validator. |
| README.md | Updates repo docs for added RPC service and new env vars. |

Comment on lines +207 to +214
// Phase 2: Add executed blocks to chain and persist
foreach (var (block, raw, bitmap) in blocks)
{
    if (block.Receipts == null && block.Transactions.Count > 0)
        break; // This block wasn't executed (failed in phase 1)

    var result = _chainManager.AddBlock(block);
    if (!result.IsSuccess)

Copilot AI Mar 2, 2026

The Phase 2 guard uses block.Receipts == null && block.Transactions.Count > 0 to decide whether a block was executed. This allows blocks with 0 transactions (but still needing DEX settlement / state root changes) to be added even if Phase 1 bailed out earlier, and also can’t distinguish "executed successfully with 0 receipts" from "never executed". Track execution success explicitly (e.g., an executed-count or per-block flag) and only add/persist the executed prefix.
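
One way to track execution explicitly is sketched below, with generic delegates standing in for the real execution and ChainManager calls. `ApplyExecutedPrefix`, `execute`, and `addToChain` are hypothetical names used for illustration, not code from the PR:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: count successfully executed blocks in phase 1 and
// add only that prefix in phase 2, instead of inferring execution from Receipts.
static int ApplyExecutedPrefix<T>(
    IReadOnlyList<T> blocks, Func<T, bool> execute, Func<T, bool> addToChain)
{
    // Phase 1: execute in order and stop at the first failure.
    int executed = 0;
    while (executed < blocks.Count && execute(blocks[executed]))
        executed++;

    // Phase 2: add only blocks known to have executed.
    int applied = 0;
    while (applied < executed && addToChain(blocks[applied]))
        applied++;

    return applied;
}

// Usage: block 3 fails to execute, so block 4 is never added, even if it is empty.
var batch = new List<int> { 1, 2, 3, 4 };
var added = new List<int>();
int appliedCount = ApplyExecutedPrefix(batch, b => b != 3, b => { added.Add(b); return true; });
Console.WriteLine($"applied {appliedCount} of {batch.Count}");
```

Because the executed count is independent of the transaction count, a zero-transaction block after a failed block can no longer slip through the guard.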

Copilot uses AI. Check for mistakes.
Comment on lines +1121 to +1125
var requestedCount = Math.Min(count ?? 100, 100);
var blocks = new List<SyncBlockEntry>();

for (ulong n = from; n < from + (ulong)requestedCount; n++)
{

Copilot AI Mar 2, 2026

count can be negative here: Math.Min(count ?? 100, 100) will keep negative values, and (ulong)requestedCount will then wrap to a huge number. That can turn this endpoint into an unbounded/very large loop (DoS) depending on from/tip. Clamp count to a sane positive range (e.g., 1..100) before using it, and consider guarding from + requestedCount overflow.

Suggested change
- var requestedCount = Math.Min(count ?? 100, 100);
- var blocks = new List<SyncBlockEntry>();
- for (ulong n = from; n < from + (ulong)requestedCount; n++)
- {
+ // Clamp requested count to a sane positive range [1, 100]
+ var requestedCount = count.HasValue
+     ? Math.Clamp(count.Value, 1, 100)
+     : 100;
+ var blocks = new List<SyncBlockEntry>();
+ // Guard against overflow when computing the loop upper bound
+ ulong requestedCountU = (ulong)requestedCount;
+ ulong endExclusive;
+ if (from > ulong.MaxValue - requestedCountU)
+ {
+     endExclusive = ulong.MaxValue;
+ }
+ else
+ {
+     endExclusive = from + requestedCountU;
+ }
+ for (ulong n = from; n < endExclusive; n++)
+ {

Comment on lines +64 to +68
await _httpClient.PostAsJsonAsync(
    "/v1/transactions",
    request,
    BasaltApiJsonContext.Default.TransactionRequest,
    cts.Token);

Copilot AI Mar 2, 2026

ForwardAsync logs only on exceptions, but PostAsJsonAsync does not throw on non-2xx responses. As written, 4xx/5xx from the sync source will be silently ignored, contradicting the "logs warnings on failure" behavior. Capture the HttpResponseMessage, check IsSuccessStatusCode (or call EnsureSuccessStatusCode), and log status code/body on failure.

Suggested change
- await _httpClient.PostAsJsonAsync(
-     "/v1/transactions",
-     request,
-     BasaltApiJsonContext.Default.TransactionRequest,
-     cts.Token);
+ var response = await _httpClient.PostAsJsonAsync(
+     "/v1/transactions",
+     request,
+     BasaltApiJsonContext.Default.TransactionRequest,
+     cts.Token);
+ if (!response.IsSuccessStatusCode)
+ {
+     string responseBody = string.Empty;
+     try
+     {
+         responseBody = await response.Content.ReadAsStringAsync(cts.Token);
+     }
+     catch
+     {
+         // Ignore failures reading the response body; we still log the status code.
+     }
+     _logger?.LogWarning(
+         "Sync source returned non-success status when forwarding tx {Hash}: {StatusCode} {ResponseBody}",
+         tx.Hash.ToHexString()[..16],
+         (int)response.StatusCode,
+         responseBody);
+ }

// Hook epoch transitions to rewire consensus-specific components
_blockApplier.OnEpochTransition += (newSet, blockNumber) =>
{
    var oldCount = _validatorSet?.Count ?? 0;

Copilot AI Mar 2, 2026

oldCount is computed but never used inside the epoch-transition handler. Remove it or use it in a log message to avoid dead code and keep the handler focused on the rewiring it performs.

Suggested change
- var oldCount = _validatorSet?.Count ?? 0;

  handle /health {
-     reverse_proxy validator-0:5000 {
+     reverse_proxy rpc-0:5000 {
          rewrite /v1/status

Copilot AI Mar 2, 2026

This /health handler rewrites to /v1/status, which always returns 200 and doesn't reflect the RPC-mode sync-lag health logic added at /v1/health. If the intention is for external health checks to fail when the RPC node is >50 blocks behind, consider rewriting to /v1/health instead of /v1/status.

Suggested change
- rewrite /v1/status
+ rewrite /v1/health

Comment on lines +73 to +79
if (string.Equals(Mode, "validator", StringComparison.OrdinalIgnoreCase)
    || (Peers.Length > 0 && ValidatorIndex >= 0))
    return NodeMode.Validator;

if (string.Equals(Mode, "standalone", StringComparison.OrdinalIgnoreCase))
    return NodeMode.Standalone;


Copilot AI Mar 2, 2026

ResolvedMode currently treats any config with Peers + ValidatorIndex>=0 as Validator even when BASALT_MODE is explicitly set to "standalone" (because the auto-detection check runs before the standalone override). This makes explicit standalone ineffective in some env combinations. Consider only applying the Peers/ValidatorIndex auto-detection when Mode == "auto", and/or check for explicit "standalone" before the validator auto-detection.

Suggested change
- if (string.Equals(Mode, "validator", StringComparison.OrdinalIgnoreCase)
-     || (Peers.Length > 0 && ValidatorIndex >= 0))
-     return NodeMode.Validator;
- if (string.Equals(Mode, "standalone", StringComparison.OrdinalIgnoreCase))
-     return NodeMode.Standalone;
+ if (string.Equals(Mode, "validator", StringComparison.OrdinalIgnoreCase))
+     return NodeMode.Validator;
+ if (string.Equals(Mode, "standalone", StringComparison.OrdinalIgnoreCase))
+     return NodeMode.Standalone;
+ // auto / implicit: infer validator if peers + validator index are configured
+ if (string.Equals(Mode, "auto", StringComparison.OrdinalIgnoreCase) || string.IsNullOrWhiteSpace(Mode))
+ {
+     if (Peers.Length > 0 && ValidatorIndex >= 0)
+         return NodeMode.Validator;
+ }

Comment on lines +86 to +94
var localTip = _chainManager.LatestBlockNumber;
var remoteTip = status.LatestBlock;
Volatile.Write(ref _syncLag, (int)Math.Min(remoteTip - localTip, int.MaxValue));

if (remoteTip <= localTip)
{
    // Caught up: sleep for one block time, then poll again
    _backoffMs = 1000; // Reset backoff
    await Task.Delay((int)_chainParams.BlockTimeMs, ct);

Copilot AI Mar 2, 2026

SyncLag is computed via (remoteTip - localTip) before checking remoteTip <= localTip. When the remote tip is behind (or chain reset), the ulong subtraction underflows and SyncLag becomes a huge value (clamped to int.MaxValue), which will incorrectly trip the health check. Compute lag only after confirming remoteTip > localTip, or use a conditional subtraction.
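
A conditional subtraction along these lines would avoid the underflow. This is a sketch only; `ComputeSyncLag` is a hypothetical helper, not a method from the PR:

```csharp
using System;

// Hypothetical helper: lag in blocks, reported as 0 when the remote tip is at
// or behind the local tip, and capped at int.MaxValue on the high side.
static int ComputeSyncLag(ulong localTip, ulong remoteTip)
{
    if (remoteTip <= localTip)
        return 0; // caught up, or the source is behind or was reset

    ulong lag = remoteTip - localTip; // safe: remoteTip > localTip here
    return lag > int.MaxValue ? int.MaxValue : (int)lag;
}

Console.WriteLine(ComputeSyncLag(localTip: 100, remoteTip: 90));
Console.WriteLine(ComputeSyncLag(localTip: 100, remoteTip: 130));
```

With this shape, a source that is briefly behind reports zero lag instead of tripping the health check.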

Comment on lines +233 to +239
// Phase 3: Atomically swap state only if ALL blocks succeeded
if (applied == blocks.Count && applied > 0)
{
    stateDbRef.Swap(forkedState);
    _logger.LogInformation("Synced {Count} blocks, now at #{Height}",
        applied, _chainManager.LatestBlockNumber);
}

Copilot AI Mar 2, 2026

ApplyBatch adds blocks to ChainManager and persists them in Phase 2, but only swaps canonical state in Phase 3 if all blocks succeed. If a later block fails (AddBlock fails, or execution failed in Phase 1), this can leave ChainManager/block persistence ahead of canonical state (state swap skipped), causing chain/state divergence for API reads and future execution. Consider only mutating ChainManager/persistence once you know you will swap state (or implement a rollback to the pre-sync tip when applied != blocks.Count).

2 participants