# Audit Report: Consensus & P2P Sync — Storage, Leaderboard, and State Replication

## Executive Summary

The platform uses a multi-layered consensus system:
1. **PBFT consensus** (`p2p-consensus/consensus.rs`) for major state changes (submissions, evaluations, weight updates, epoch transitions)
2. **Storage proposal/vote mechanism** (`p2p-consensus/state.rs` + `validator-node/main.rs`) for challenge storage writes
3. **State root consensus** (`distributed-storage/state_consensus.rs`) for cross-validator state verification with fraud proofs
4. **Validated storage** (`distributed-storage/validated_storage.rs`) as a layered consensus-before-write abstraction

**Critical Finding**: WASM route handlers bypass consensus entirely for storage writes, creating a fundamental divergence risk between validators.

---

## 1. Consensus Mechanism Architecture

### 1.1 PBFT Consensus Engine
- **File**: `crates/p2p-consensus/src/consensus.rs`
- Classic PBFT: PrePrepare → Prepare → Commit phases
- **Quorum**: 2f+1 count-based quorum **AND** 2/3 stake-weighted threshold
- View changes on timeout (30s round timeout, 60s view change timeout)
- All messages are cryptographically signed and verified
- Leader election is view-based, rotating through validators
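
The dual quorum rule above can be sketched as follows. This is a minimal illustration under assumed types — the function and tuple shapes are hypothetical, not the crate's actual API:

```rust
/// Dual-quorum check: a round commits only if BOTH the classic PBFT
/// 2f+1 vote count AND a 2/3 stake-weighted threshold are met.
/// Names and types here are illustrative, not the real `p2p-consensus` API.
fn count_quorum(total_validators: usize) -> usize {
    // PBFT tolerates f faults among n = 3f + 1 validators.
    let f = total_validators.saturating_sub(1) / 3;
    2 * f + 1
}

fn has_quorum(votes: &[(u64 /* stake */, bool /* voted */)]) -> bool {
    let total_stake: u64 = votes.iter().map(|(s, _)| s).sum();
    let voted_count = votes.iter().filter(|(_, v)| *v).count();
    let voted_stake: u64 = votes.iter().filter(|(_, v)| *v).map(|(s, _)| s).sum();
    // Count-based 2f+1 quorum AND 2/3 stake-weighted threshold (integer math).
    voted_count >= count_quorum(votes.len()) && voted_stake * 3 >= total_stake * 2
}

fn main() {
    // 4 validators (f = 1): need 3 votes and at least 2/3 of total stake.
    assert!(has_quorum(&[(100, true), (100, true), (100, true), (100, false)]));
    // 3 of 4 votes, but the abstaining validator holds most of the stake:
    // the count quorum is met, the stake threshold is not.
    assert!(!has_quorum(&[(10, true), (10, true), (10, true), (970, false)]));
    println!("quorum checks passed");
}
```

Note how the two conditions fail independently: a stake-heavy validator abstaining can block commitment even when the raw vote count clears 2f+1.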

### 1.2 State Change Types via PBFT
```
ChallengeSubmission, EvaluationResult, WeightUpdate,
ValidatorChange, ConfigUpdate (sudo only), EpochTransition
```

### 1.3 P2P Network Layer
- **File**: `crates/p2p-consensus/src/network.rs`
- libp2p gossipsub + Kademlia DHT
- All messages wrapped in `SignedP2PMessage` with replay protection (nonce tracking with 5-min expiry) and rate limiting (100 msg/s per signer)
- Validator-only enforcement for consensus traffic
- Signer identity verification against message sender field

---

## 2. Storage Write Path Analysis

### 2.1 When WASM calls `host_storage_set()` — **CRITICAL FINDING**

**Path**: `challenge-sdk-wasm/host_functions.rs` → `wasm-runtime-interface/storage.rs:handle_storage_set()` → `StorageBackend::propose_write()`

**Finding**: In the validator node's `wasm_executor.rs`, all WASM execution contexts are configured with:
```rust
storage_host_config: StorageHostConfig {
    allow_direct_writes: true,
    require_consensus: false,
    ..
}
```
This is set at lines 204-205, 332-333, 811-812, and 969-970 of `wasm_executor.rs`.
**Consequence**: When WASM code calls `host_storage_set()`, the `handle_storage_set` function checks:
```rust
if storage.config.require_consensus && !storage.config.allow_direct_writes {
    return StorageHostStatus::ConsensusRequired;
}
```
Since `allow_direct_writes=true` and `require_consensus=false`, this condition is never satisfied, so the write falls through directly to the local `ChallengeStorageBackend` — **NO consensus, NO P2P proposal, NO replication to other validators**.

The `ChallengeStorageBackend` (`challenge_storage.rs`) directly writes to the local sled database via `LocalStorage::put()`.

### Severity: **HIGH**
**Impact**: If two validators process the same WASM route handler request, each writes to its own local storage independently. There is no mechanism to reconcile these writes across validators. This means different validators can have completely different storage states for the same challenge.

---

### 2.2 P2P Storage Proposal/Vote Mechanism

**Files**: `p2p-consensus/src/state.rs:StorageProposal`, `validator-node/main.rs:1912-2035`

A separate consensus mechanism exists for storage writes propagated via P2P:

1. A validator broadcasts `P2PMessage::StorageProposal`
2. Other validators **auto-approve** the proposal (line ~1948 in main.rs):
   ```rust
   // Auto-vote approve (validator trusts other validators)
   // In production, could verify via WASM validate_storage_write
   ```
3. Votes are tallied using simple majority: `threshold = (total_validators / 2) + 1`
4. On consensus approval, data is written to distributed storage
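
The tally in step 3 can be sketched as a simple-majority vote count. This is an illustrative sketch — the struct below is hypothetical, not the actual `ChainState::vote_storage_proposal()` code:

```rust
use std::collections::HashMap;

/// Illustrative sketch of the simple-majority tally used for storage
/// proposals; not the real `p2p-consensus` types.
struct StorageProposal {
    approvals: HashMap<String, bool>, // validator id -> approve/reject vote
}

impl StorageProposal {
    fn is_approved(&self, total_validators: usize) -> bool {
        // Simple majority, NOT the PBFT 2f+1 + stake-weighted quorum:
        // e.g. with 10 validators this needs 6 votes, PBFT would need 7.
        let threshold = (total_validators / 2) + 1;
        let approvals = self.approvals.values().filter(|v| **v).count();
        approvals >= threshold
    }
}

fn main() {
    let mut p = StorageProposal { approvals: HashMap::new() };
    p.approvals.insert("v1".to_string(), true);
    p.approvals.insert("v2".to_string(), true);
    // 4 validators: majority threshold is 3, so 2 approvals fail...
    assert!(!p.is_approved(4));
    p.approvals.insert("v3".to_string(), true);
    // ...and a third approval crosses it, regardless of stake.
    assert!(p.is_approved(4));
    println!("tally checks passed");
}
```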

### Severity: **MEDIUM**
**Issue 1 — Auto-approval**: Validators auto-approve all storage proposals from known validators without running WASM validation. The comment explicitly says "In production, could verify via WASM validate_storage_write" — this validation is not implemented.

**Issue 2 — Disconnected from WASM path**: The P2P storage proposal mechanism is separate from the WASM `host_storage_set()` path. WASM writes go directly to local storage; the P2P proposal path exists but is not invoked by WASM host functions.

**Issue 3 — Simple majority vs PBFT**: Storage proposals use simple majority voting `(n/2)+1` in `ChainState::vote_storage_proposal()`, not the full PBFT 2f+1 + stake-weighted quorum used for main consensus. This provides weaker Byzantine fault tolerance: with 10 validators, for example, a proposal passes with 6 votes, while the PBFT count quorum would require 7 and would additionally weigh stake.

---

## 3. Leaderboard Consistency

### Finding: Leaderboard is not actively synced

**Observation**: `ChainState` has a `leaderboard: HashMap<ChallengeId, Vec<LeaderboardEntry>>` field, and `update_leaderboard()` exists to update it. However:

- The validator node (`main.rs`) only **logs** `LeaderboardRequest` and `LeaderboardResponse` messages at debug level — it does not process them or update state
- There is no code in the validator that calls `update_leaderboard()`
- Leaderboard data in `ChainState` is part of the state that gets synced via `StateResponse`, but leaderboard updates themselves are never actively populated

### Severity: **MEDIUM**
**Impact**: Leaderboard data is effectively empty or stale across all validators. If any validator does populate it locally, there's no mechanism to propagate those updates consistently.

---

## 4. State Replication & Sync

### 4.1 State Sync via StateRequest/StateResponse

The `ChainState` (in `p2p-consensus/state.rs`) is the canonical shared state. Sync works via:
1. `StateRequest`: A validator requests full state, sending its current hash and sequence
2. `StateResponse`: Another validator responds with full serialized state
3. `StateManager::apply_sync_state()` accepts the new state only if:
   - New sequence > current sequence
   - Hash verification passes (recomputed hash matches claimed hash)
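
The acceptance rule in step 3 can be sketched as follows. This is a simplification with hypothetical types, not the actual `StateManager::apply_sync_state()` code:

```rust
/// Simplified sketch of the sync acceptance rule: take the remote state
/// only if it is strictly newer AND its recomputed hash matches the
/// claimed hash. Types and the stand-in hash are illustrative.
#[derive(Clone)]
struct SyncedState {
    sequence: u64,
    claimed_hash: u64,
    data: String,
}

fn recompute_hash(data: &str) -> u64 {
    // Toy stand-in for the real hash function (the audit notes the real
    // implementation hashes only summary counts, not the data itself).
    data.bytes().fold(0u64, |h, b| h.wrapping_mul(31).wrapping_add(b as u64))
}

fn apply_sync_state(current_seq: u64, remote: &SyncedState) -> Result<(), &'static str> {
    if remote.sequence <= current_seq {
        return Err("stale: sequence not strictly newer");
    }
    if recompute_hash(&remote.data) != remote.claimed_hash {
        return Err("hash mismatch");
    }
    Ok(()) // accept and replace local state
}

fn main() {
    let remote = SyncedState { sequence: 5, claimed_hash: recompute_hash("state-v5"), data: "state-v5".to_string() };
    assert!(apply_sync_state(3, &remote).is_ok());
    assert!(apply_sync_state(5, &remote).is_err()); // equal sequence is rejected
    let mut tampered = remote.clone();
    tampered.data = "state-vX".to_string();
    assert!(apply_sync_state(3, &tampered).is_err()); // hash mismatch is rejected
    println!("sync checks passed");
}
```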

### Severity: **LOW**
**Issue**: The hash function in `ChainState::update_hash()` only hashes a summary (sequence, epoch, counts) — NOT the actual data. Two states with the same sequence/epoch/counts but different data would have the same hash. This weakens state verification.

```rust
struct HashInput {
    sequence: SequenceNumber,
    epoch: u64,
    validator_count: usize,
    challenge_count: usize,
    pending_count: usize,
    netuid: u16,
}
```

The hash doesn't include actual validator stakes, challenge configs, evaluation data, storage proposals, leaderboards, or any other substantive state fields.
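
The weakness is easy to demonstrate: any two states that agree on the summary fields hash identically, no matter how much their underlying data diverges. A minimal sketch, with concrete stand-in types and `DefaultHasher` substituting for whatever hash the crate actually uses:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Sketch mirroring the summary-only hash input shown above, with
/// concrete types so it compiles standalone. Illustrative only.
#[derive(Hash)]
struct HashInput {
    sequence: u64,
    epoch: u64,
    validator_count: usize,
    challenge_count: usize,
}

fn summary_hash(input: &HashInput) -> u64 {
    let mut hasher = DefaultHasher::new();
    input.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Two validators with divergent storage contents but identical counts:
    // the summary hash cannot tell them apart.
    let validator_a = HashInput { sequence: 42, epoch: 7, validator_count: 4, challenge_count: 3 };
    let validator_b = HashInput { sequence: 42, epoch: 7, validator_count: 4, challenge_count: 3 };
    assert_eq!(summary_hash(&validator_a), summary_hash(&validator_b));
    println!("summary-only hashes collide despite divergent data");
}
```

A merkle root over the serialized state fields, as suggested in the recommendations, would make such divergence detectable.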

---

## 5. Epoch Transition & Storage State

### How epoch transitions work:
1. Triggered by `BlockSyncEvent::EpochTransition` from Bittensor block sync
2. Calls `state.next_epoch()` which:
   - Increments epoch counter
   - Clears current `weight_votes`
   - Prunes historical weights older than 100 epochs
   - Increments sequence number
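
The transition bookkeeping above can be sketched as follows, under the assumption that historical weights are keyed by epoch. Field and method shapes are illustrative, not the real `next_epoch()` implementation:

```rust
use std::collections::HashMap;

/// Illustrative sketch of the epoch-transition steps listed above.
struct ChainState {
    epoch: u64,
    sequence: u64,
    weight_votes: HashMap<String, f64>,         // current epoch's votes
    historical_weights: HashMap<u64, Vec<f64>>, // epoch -> finalized weights
}

const WEIGHT_HISTORY_EPOCHS: u64 = 100;

impl ChainState {
    fn next_epoch(&mut self) {
        self.epoch += 1;           // 1. increment epoch counter
        self.weight_votes.clear(); // 2. clear current weight votes
        // 3. prune history older than 100 epochs
        let cutoff = self.epoch.saturating_sub(WEIGHT_HISTORY_EPOCHS);
        self.historical_weights.retain(|epoch, _| *epoch >= cutoff);
        self.sequence += 1;        // 4. bump sequence number
    }
}

fn main() {
    let mut state = ChainState {
        epoch: 150,
        sequence: 1000,
        weight_votes: HashMap::from([("v1".to_string(), 0.5)]),
        historical_weights: HashMap::from([(40, vec![0.1]), (120, vec![0.2])]),
    };
    state.next_epoch();
    assert_eq!(state.epoch, 151);
    assert_eq!(state.sequence, 1001);
    assert!(state.weight_votes.is_empty());
    // Epoch 40 is older than 151 - 100 = 51, so it was pruned; 120 survives.
    assert!(!state.historical_weights.contains_key(&40));
    assert!(state.historical_weights.contains_key(&120));
    println!("epoch transition checks passed");
}
```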

### Finding: Epoch transitions are locally triggered
Each validator detects epoch boundaries independently, based on its own Bittensor block sync. There is no PBFT consensus on when to transition epochs — validators rely on seeing the same Bittensor blocks.

### Severity: **LOW**
If validators have slightly different views of the Bittensor chain (e.g., different RPC endpoints, a temporary fork), they could transition epochs at different times, causing temporary state divergence.
| 154 | + |
| 155 | +--- |
| 156 | + |
| 157 | +## 6. Scenarios Where Validators Could Have Divergent State |
| 158 | + |
| 159 | +### 6.1 WASM Storage Writes (HIGH) |
| 160 | +- **Scenario**: Two validators both process a WASM route request that calls `host_storage_set()` |
| 161 | +- **Result**: Each writes to its own local storage; no P2P propagation occurs |
| 162 | +- **Divergence**: Permanent until full state sync overrides one validator's data |
| 163 | + |
| 164 | +### 6.2 Race Conditions in Storage Proposals (MEDIUM) |
| 165 | +- **Scenario**: Two validators simultaneously propose writes to the same key with different values |
| 166 | +- **Result**: Each proposal gets its own consensus round; both could succeed |
| 167 | +- **Divergence**: Last-write-wins at the distributed storage level, but ordering may differ |
| 168 | + |
| 169 | +### 6.3 Network Partition During Consensus (LOW) |
| 170 | +- **Scenario**: Network partition during PBFT round |
| 171 | +- **Result**: View change triggers new leader; prepared state carries forward |
| 172 | +- **Mitigation**: PBFT view change protocol properly handles this; prepared proofs are verified |
| 173 | + |
| 174 | +### 6.4 Stale State Sync (LOW) |
| 175 | +- **Scenario**: Validator rejoins after being offline |
| 176 | +- **Result**: `apply_sync_state()` accepts newer state but relies on weak hash verification |
| 177 | +- **Mitigation**: Sequence number ordering prevents accepting old state |
| 178 | + |
| 179 | +### 6.5 Leaderboard Inconsistency (MEDIUM) |
| 180 | +- **Scenario**: Leaderboard is never populated via consensus |
| 181 | +- **Result**: All validators have empty/stale leaderboards |
| 182 | +- **Impact**: Any leaderboard queries return inconsistent or empty data |
| 183 | + |
| 184 | +--- |
| 185 | + |
| 186 | +## 7. Conflict Resolution for Concurrent Writes |
| 187 | + |
| 188 | +### PBFT Path |
| 189 | +The PBFT consensus engine serializes state changes through sequence numbers. Only one proposal can be active per round, so concurrent writes are inherently serialized. |
| 190 | + |
| 191 | +### Storage Proposal Path |
| 192 | +Multiple storage proposals can be in flight simultaneously since each gets a unique `proposal_id`. The `ChainState` tracks them in `pending_storage_proposals`. If two proposals write the same key, both can be approved and applied — the last one applied wins. |

---

## 8. Summary of Findings

| # | Finding | Severity | Component |
|---|---------|----------|-----------|
| 1 | WASM `host_storage_set()` bypasses consensus — writes directly to local storage with `allow_direct_writes=true, require_consensus=false` | **HIGH** | `wasm_executor.rs`, `storage.rs` |
| 2 | Storage proposals auto-approved without WASM validation | **MEDIUM** | `validator-node/main.rs:1948` |
| 3 | Leaderboard data never populated via consensus; request/response messages only logged | **MEDIUM** | `validator-node/main.rs:1666-1681` |
| 4 | `ChainState` hash only covers summary counts, not actual state data | **MEDIUM** | `p2p-consensus/state.rs:update_hash()` |
| 5 | Storage proposals use simple majority `(n/2)+1`, weaker than PBFT's 2f+1 + stake-weighted quorum | **LOW** | `p2p-consensus/state.rs:vote_storage_proposal()` |
| 6 | Epoch transitions triggered locally per validator from Bittensor sync, not via PBFT consensus | **LOW** | `validator-node/main.rs:2123-2126` |
| 7 | `P2PMessage::StorageProposal` path is disconnected from WASM `host_storage_set()` path | **HIGH** | Architecture gap |
| 8 | No mechanism to reconcile divergent local storage across validators | **HIGH** | System-wide |

---

## 9. Positive Security Controls Observed

- PBFT consensus properly implements cryptographic signature verification on all phases
- Replay attack protection with nonce tracking and sliding window rate limiting
- Validator-only enforcement for consensus-critical messages
- View change protocol with prepared proof verification
- ConfigUpdate proposals require sudo authorization
- Weight vote hash verification
- State deserialization size limits (256MB)
- Evaluation scores use verified stakes from validator map (prevents stake inflation)
- Fraud proof mechanism exists in `state_consensus.rs` (though integration unclear)

---

## 10. Recommendations

1. **Critical**: Route WASM `host_storage_set()` through the P2P storage proposal path, or set `require_consensus=true` and `allow_direct_writes=false` in production
2. **High**: Implement actual WASM validation in storage proposal handling instead of auto-approval
3. **Medium**: Include actual state data in `ChainState::update_hash()` (at minimum, hash the serialized state or use a merkle root)
4. **Medium**: Implement leaderboard population and synchronization via consensus
5. **Low**: Consider using the same 2f+1 + stake-weighted quorum for storage proposals as for PBFT consensus
6. **Low**: Consider using a PBFT proposal for epoch transitions to ensure atomic, coordinated transitions
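
The effect of recommendation 1 on the gate quoted in §2.1 can be sketched as follows, with illustrative stand-in types for `StorageHostConfig` and `StorageHostStatus`:

```rust
/// Sketch of the `handle_storage_set` gate under the hardened config
/// from recommendation 1. Types are illustrative stand-ins.
struct StorageHostConfig {
    allow_direct_writes: bool,
    require_consensus: bool,
}

#[derive(Debug, PartialEq)]
enum StorageHostStatus {
    Ok,                // write proceeds to local storage
    ConsensusRequired, // write must go through a P2P storage proposal
}

fn handle_storage_set(config: &StorageHostConfig) -> StorageHostStatus {
    // Same condition as the audited code: only the config values change.
    if config.require_consensus && !config.allow_direct_writes {
        return StorageHostStatus::ConsensusRequired;
    }
    StorageHostStatus::Ok
}

fn main() {
    // Current production config: the gate never fires, writes stay local.
    let current = StorageHostConfig { allow_direct_writes: true, require_consensus: false };
    assert_eq!(handle_storage_set(&current), StorageHostStatus::Ok);
    // Hardened config: every WASM write is forced onto the consensus path.
    let hardened = StorageHostConfig { allow_direct_writes: false, require_consensus: true };
    assert_eq!(handle_storage_set(&hardened), StorageHostStatus::ConsensusRequired);
    println!("gate behavior verified");
}
```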