Skip to content

Conversation

@MicBun
Copy link
Member

@MicBun MicBun commented Nov 21, 2025

Fixes a critical race condition where re-announcing unconfirmed transactions during block execution causes cascading nonce validation failures, leading to nodes falling out of sync with the network.

The issue occurred when:

  1. Block execution takes >1 second (e.g., 3.9s under heavy load)
  2. Re-announcement timer triggers during execution window
  3. Transactions validated against uncommitted DB state (stale nonces)
  4. All re-announced txs fail validation with "invalid nonce" errors
  5. Cascade of failures saturates node, preventing new block processing
  6. Node falls behind, requires catchup mechanism to recover

Solution:

  • Add CanReannounce() method to ConsensusEngine interface
  • Returns false when status is Proposed or Executed
  • Re-announcement logic checks CanReannounce() before proceeding
  • Skips re-announcement during block execution with debug log

Impact:

  • Eliminates invalid nonce cascades during block processing
  • Prevents nodes from falling out of sync under heavy load
  • Minimal performance impact (single mutex read per check)
  • Re-announcement delayed by at most block execution time (~4s max)

Testing:

  • Added comprehensive unit tests for CanReannounce()
  • Verified thread safety with concurrent access test
  • All existing consensus and node tests pass

Files changed:

  • node/interfaces.go: Add CanReannounce() to interface
  • node/consensus/engine.go: Implement CanReannounce()
  • node/nogossip.go: Add re-announcement check
  • node/node_test.go: Update test dummy
  • node/consensus/can_reannounce_test.go: New unit tests

resolves: https://github.com/trufnetwork/truf-network/issues/1305

This can be reviewed later as not urgent to fix for now

Summary by CodeRabbit

  • New Features

    • Added a safety mechanism that prevents transaction re-announcements while a block is being executed, reducing nonce/conflict issues.
  • Tests

    • Added unit tests covering re-announcement behavior across states, including state transition scenarios and concurrent/thread-safety verification.

✏️ Tip: You can customize this high-level summary in your review settings.

Fixes a critical race condition where re-announcing unconfirmed transactions
during block execution causes cascading nonce validation failures, leading to
nodes falling out of sync with the network.

The issue occurred when:
1. Block execution takes >1 second (e.g., 3.9s under heavy load)
2. Re-announcement timer triggers during execution window
3. Transactions validated against uncommitted DB state (stale nonces)
4. All re-announced txs fail validation with "invalid nonce" errors
5. Cascade of failures saturates node, preventing new block processing
6. Node falls behind, requires catchup mechanism to recover

Solution:
- Add CanReannounce() method to ConsensusEngine interface
- Returns false when status is Proposed or Executed
- Re-announcement logic checks CanReannounce() before proceeding
- Skips re-announcement during block execution with debug log

Impact:
- Eliminates invalid nonce cascades during block processing
- Prevents nodes from falling out of sync under heavy load
- Minimal performance impact (single mutex read per check)
- Re-announcement delayed by at most block execution time (~4s max)

Testing:
- Added comprehensive unit tests for CanReannounce()
- Verified thread safety with concurrent access test
- All existing consensus and node tests pass

Files changed:
- node/interfaces.go: Add CanReannounce() to interface
- node/consensus/engine.go: Implement CanReannounce()
- node/nogossip.go: Add re-announcement check
- node/node_test.go: Update test dummy
- node/consensus/can_reannounce_test.go: New unit tests

resolves: trufnetwork/truf-network#1305
@MicBun MicBun requested a review from outerlook November 21, 2025 02:07
@MicBun MicBun self-assigned this Nov 21, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 21, 2025

Walkthrough

Adds a CanReannounce() bool method to the ConsensusEngine interface and its implementation (read-locked, true only when state is Committed), integrates the check into transaction re-announcement logic, and adds unit tests covering state behavior and concurrent access.

Changes

Cohort / File(s) Summary
ConsensusEngine interface & implementation
node/interfaces.go, node/consensus/engine.go, node/node_test.go
Added CanReannounce() bool to the ConsensusEngine interface; implemented CanReannounce on ConsensusEngine with a read lock on stateInfo returning true only when state == Committed; added dummyCE.CanReannounce() in tests.
Re-announcement control
node/nogossip.go
startTxAnns now queries ce.CanReannounce() and logs/returns early when re-announcement is unsafe.
Unit tests
node/consensus/can_reannounce_test.go
New tests: TestCanReannounce, TestCanReannounceThreadSafety (100 goroutines), and TestCanReannounceStateTransitions validating state-based results and concurrency.

Sequence Diagram(s)

sequenceDiagram
    participant Node as Node/Mempool
    participant Nog as startTxAnns
    participant CE as ConsensusEngine
    participant SI as stateInfo

    Node->>Nog: trigger re-announcement
    Nog->>CE: CanReannounce()
    CE->>SI: RLock state
    alt state == Committed
        SI-->>CE: true
        CE-->>Nog: true
        Nog->>Node: broadcast transactions
    else state == Proposed/Executed
        SI-->>CE: false
        CE-->>Nog: false
        Nog-->>Nog: debug log, return early
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Verify read-lock granularity and consistency with other callers of stateInfo.
  • Confirm that "Committed only" semantics align with consensus lifecycle and do not suppress needed broadcasts.
  • Review tests for flakiness and adequacy of concurrency coverage.

Possibly related PRs

Suggested reviewers

  • outerlook
  • williamrusdyputra

Poem

🐇 I nibble on locks and count each state,
Committed whispers: "Now's the safe date."
If Proposed or Executed, I quietly pause,
No nonce collisions, no frantic cause.
Tests hop around to prove my laws.

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main fix—preventing re-announcement during block execution that was causing nodes to get stuck—which is the core change across all modified files.
Linked Issues check ✅ Passed The PR implements a safeguard to prevent re-announcements during block execution via CanReannounce(), addressing issue #1305's requirement to prevent re-announced transactions from blocking consensus, though it doesn't fully address all objectives like per-account nonce validation, exponential backoff, or comprehensive observability improvements.
Out of Scope Changes check ✅ Passed All changes are scoped to implementing the CanReannounce() safeguard and integrating it into re-announcement logic; no unrelated modifications detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fixReanouncing

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba07f9c and f55d8df.

📒 Files selected for processing (1)
  • node/consensus/can_reannounce_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • node/consensus/can_reannounce_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@holdex
Copy link

holdex bot commented Nov 21, 2025

Time Submission Status

Member Status Time Action Last Update
MicBun ✅ Submitted 4h Update time Dec 13, 2025, 6:01 PM

@holdex
Copy link

holdex bot commented Nov 21, 2025

Bug Report Checklist

Status Commit Link Bug Author HR Card
❌ Not Submitted

@MicBun, please use git blame and specify the link to the commit link that has introduced this bug.

Send the following message in this PR: @pr-time-tracker bug commit <commit-url> && bug author @name

@MicBun
Copy link
Member Author

MicBun commented Nov 21, 2025

@pr-time-tracker bug commit not caused by previous contributor

@MicBun
Copy link
Member Author

MicBun commented Dec 17, 2025

Log: not needed for now as the issue have been resolved without additional touch. Might be needed if the same issue appears in the future. Will keep the PR open for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants