Commit 50d0a73

fix: security audit round 3 - storage key mismatch, WASM consensus bypass, RPC auth, crypto verification
- C1: Set allow_direct_writes=false in WASM executor to enforce consensus guard
- C2: Fix storage key format mismatch between build_key/get_raw/count_category
- H3: Webhook auth now verifies validator_hotkey is a registered validator
- H4: JSON-RPC challenge_call now enforces requires_auth on routes
- H5: Metagraph sync updates ChainState.validators for RPC auth lookups
- H7: Fix double-hash in distributed storage vote signature verification
- C3: Added TODO for wiring ValidatedStorage/StateRootConsensus set_valid_voters
1 parent e858033 commit 50d0a73

File tree

8 files changed (+340, -22 lines changed)
audit_consensus_p2p_sync.md

Lines changed: 235 additions & 0 deletions
# Audit Report: Consensus & P2P Sync — Storage, Leaderboard, and State Replication

## Executive Summary

The platform uses a multi-layered consensus system:

1. **PBFT consensus** (`p2p-consensus/consensus.rs`) for major state changes (submissions, evaluations, weight updates, epoch transitions)
2. **Storage proposal/vote mechanism** (`p2p-consensus/state.rs` + `validator-node/main.rs`) for challenge storage writes
3. **State root consensus** (`distributed-storage/state_consensus.rs`) for cross-validator state verification with fraud proofs
4. **Validated storage** (`distributed-storage/validated_storage.rs`) as a layered consensus-before-write abstraction

**Critical Finding**: WASM route handlers bypass consensus entirely for storage writes, creating a fundamental divergence risk between validators.

---
## 1. Consensus Mechanism Architecture

### 1.1 PBFT Consensus Engine

- **File**: `crates/p2p-consensus/src/consensus.rs`
- Classic PBFT: PrePrepare → Prepare → Commit phases
- **Quorum**: 2f+1 count-based quorum **AND** 2/3 stake-weighted threshold
- View changes on timeout (30s round timeout, 60s view change timeout)
- All messages are cryptographically signed and verified
- Leader election is view-based, rotating through validators
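
The dual-quorum rule above can be sketched as a pair of predicates. This is a minimal illustration, not the actual `consensus.rs` API; the function names and the `f = (n - 1) / 3` derivation are assumptions based on classic PBFT.

```rust
// Sketch of the dual quorum (illustrative names, not the real API).
// Count quorum: 2f+1, where f = (n - 1) / 3 is the tolerated Byzantine fault count.
fn pbft_count_quorum(n: usize) -> usize {
    let f = (n - 1) / 3;
    2 * f + 1
}

// Stake quorum: commits must also carry strictly more than 2/3 of total stake
// (the exact >= vs > boundary in consensus.rs is not shown in this audit).
fn stake_quorum_met(committed_stake: u64, total_stake: u64) -> bool {
    committed_stake * 3 > total_stake * 2
}

fn main() {
    // With 10 validators, f = 3, so 7 matching commits are required...
    assert_eq!(pbft_count_quorum(10), 7);
    // ...and those commits must also hold > 2/3 of stake.
    assert!(stake_quorum_met(700, 1000));
    assert!(!stake_quorum_met(600, 1000));
}
```

Both conditions must hold, so a commit backed by many low-stake validators (or few high-stake ones) does not finalize.
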
### 1.2 State Change Types via PBFT

```
ChallengeSubmission, EvaluationResult, WeightUpdate,
ValidatorChange, ConfigUpdate (sudo only), EpochTransition
```
### 1.3 P2P Network Layer

- **File**: `crates/p2p-consensus/src/network.rs`
- libp2p gossipsub + Kademlia DHT
- All messages wrapped in `SignedP2PMessage` with replay protection (nonce tracking with 5-min expiry) and rate limiting (100 msg/s per signer)
- Validator-only enforcement for consensus traffic
- Signer identity verification against message sender field

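
The replay-protection scheme described above can be sketched as a nonce map with TTL eviction. This `ReplayGuard` is hypothetical, not the real `network.rs` implementation; only the 5-minute expiry is taken from the source, and the eviction-on-check design is an assumption.

```rust
use std::collections::HashMap;

// Sketch of nonce-based replay protection with a sliding expiry window.
const NONCE_TTL_SECS: u64 = 300; // the 5-minute expiry noted above

struct ReplayGuard {
    // (signer, nonce) -> first-seen timestamp (seconds)
    seen: HashMap<(String, u64), u64>,
}

impl ReplayGuard {
    fn new() -> Self {
        Self { seen: HashMap::new() }
    }

    // Returns true if the message is fresh; false if it replays a known nonce.
    fn check(&mut self, signer: &str, nonce: u64, now: u64) -> bool {
        // Drop entries older than the TTL so the map stays bounded.
        self.seen.retain(|_, t| now.saturating_sub(*t) < NONCE_TTL_SECS);
        match self.seen.get(&(signer.to_string(), nonce)) {
            Some(_) => false, // replay within the window: reject
            None => {
                self.seen.insert((signer.to_string(), nonce), now);
                true
            }
        }
    }
}

fn main() {
    let mut guard = ReplayGuard::new();
    assert!(guard.check("validator-a", 1, 0));   // first use: accepted
    assert!(!guard.check("validator-a", 1, 10)); // same nonce replayed: rejected
    assert!(guard.check("validator-a", 1, 400)); // past the TTL, the nonce is forgotten
}
```

Note the last case: once a nonce expires from the window it can be accepted again, which is why the window must be paired with timestamp checks on the signed message itself.
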
---
## 2. Storage Write Path Analysis

### 2.1 When WASM calls `host_storage_set()` (**CRITICAL FINDING**)

**Path**: `challenge-sdk-wasm/host_functions.rs` → `wasm-runtime-interface/storage.rs:handle_storage_set()` → `StorageBackend::propose_write()`

**Finding**: In the validator node's `wasm_executor.rs`, all WASM execution contexts are configured with:

```rust
storage_host_config: StorageHostConfig {
    allow_direct_writes: true,
    require_consensus: false,
    ..
}
```

This is set at lines 204-205, 332-333, 811-812, and 969-970 of `wasm_executor.rs`.

**Consequence**: When WASM code calls `host_storage_set()`, the `handle_storage_set` function checks:

```rust
if storage.config.require_consensus && !storage.config.allow_direct_writes {
    return StorageHostStatus::ConsensusRequired;
}
```

Since `allow_direct_writes=true` and `require_consensus=false`, this check passes and the write goes directly to the local `ChallengeStorageBackend`: **NO consensus, NO P2P proposal, NO replication to other validators**.

The `ChallengeStorageBackend` (`challenge_storage.rs`) directly writes to the local sled database via `LocalStorage::put()`.

### Severity: **HIGH**

**Impact**: If two validators process the same WASM route handler request, each writes to its own local storage independently. There is no mechanism to reconcile these writes across validators. This means different validators can have completely different storage states for the same challenge.

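
The guard quoted above reduces to a single predicate. This sketch uses a simplified stand-in for `StorageHostConfig` (not the real type) to show that the audited flag combination slips past the check, while the recommended production combination blocks direct writes:

```rust
#[derive(Clone, Copy)]
struct HostConfig {
    allow_direct_writes: bool,
    require_consensus: bool,
}

// Mirrors the check quoted above: a write is deferred to consensus only when
// consensus is required AND direct writes are disabled.
fn write_blocked(cfg: HostConfig) -> bool {
    cfg.require_consensus && !cfg.allow_direct_writes
}

fn main() {
    // Audited configuration: the guard never fires; writes go straight to sled.
    let audited = HostConfig { allow_direct_writes: true, require_consensus: false };
    assert!(!write_blocked(audited));

    // Recommended production configuration: direct writes are rejected,
    // forcing the caller down a consensus path.
    let fixed = HostConfig { allow_direct_writes: false, require_consensus: true };
    assert!(write_blocked(fixed));
}
```

Because the predicate is a conjunction, flipping either flag alone is enough to disarm the guard, which is why the commit sets both.
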
---
### 2.2 P2P Storage Proposal/Vote Mechanism

**Files**: `p2p-consensus/src/state.rs:StorageProposal`, `validator-node/main.rs:1912-2035`

A separate consensus mechanism exists for storage writes propagated via P2P:

1. A validator broadcasts `P2PMessage::StorageProposal`
2. Other validators **auto-approve** the proposal (line ~1948 in main.rs):

   ```rust
   // Auto-vote approve (validator trusts other validators)
   // In production, could verify via WASM validate_storage_write
   ```

3. Votes are tallied using simple majority: `threshold = (total_validators / 2) + 1`
4. On consensus approval, data is written to distributed storage

### Severity: **MEDIUM**

**Issue 1 — Auto-approval**: Validators auto-approve all storage proposals from known validators without running WASM validation. The comment explicitly says "In production, could verify via WASM validate_storage_write" — this validation is not implemented.

**Issue 2 — Disconnected from WASM path**: The P2P storage proposal mechanism is separate from the WASM `host_storage_set()` path. WASM writes go directly to local storage; the P2P proposal path exists but is not invoked by WASM host functions.

**Issue 3 — Simple majority vs PBFT**: Storage proposals use simple majority voting `(n/2)+1` in `ChainState::vote_storage_proposal()`, not the full PBFT 2f+1 + stake-weighted quorum used for main consensus. This is weaker Byzantine fault tolerance.

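
To make Issue 3 concrete, compare the two thresholds numerically. This is illustrative arithmetic only; the `(n - 1) / 3` fault derivation is the standard PBFT assumption, not code taken from `state.rs`.

```rust
// Simple-majority threshold used by vote_storage_proposal().
fn simple_majority(n: usize) -> usize {
    n / 2 + 1
}

// Count-based PBFT quorum for comparison (f = tolerated Byzantine validators).
fn pbft_quorum(n: usize) -> usize {
    let f = (n - 1) / 3;
    2 * f + 1
}

fn main() {
    // With 10 validators (f = 3): simple majority needs only 6 votes, and those 6
    // can include up to 3 Byzantine voters, leaving just 3 honest approvals.
    assert_eq!(simple_majority(10), 6);
    assert_eq!(pbft_quorum(10), 7);
    // The gap persists as n grows, so a storage write is strictly easier to force
    // through than a PBFT state change, before even counting the stake weighting.
    assert!(simple_majority(100) < pbft_quorum(100));
}
```
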
---

## 3. Leaderboard Consistency

### Finding: Leaderboard is not actively synced

**Observation**: `ChainState` has a `leaderboard: HashMap<ChallengeId, Vec<LeaderboardEntry>>` field, and `update_leaderboard()` exists to update it. However:

- The validator node (`main.rs`) only **logs** `LeaderboardRequest` and `LeaderboardResponse` messages at debug level — it does not process them or update state
- There is no code in the validator that calls `update_leaderboard()`
- Leaderboard data in `ChainState` is part of the state that gets synced via `StateResponse`, but leaderboard updates themselves are never actively populated

### Severity: **MEDIUM**

**Impact**: Leaderboard data is effectively empty or stale across all validators. If any validator does populate it locally, there's no mechanism to propagate those updates consistently.

---
## 4. State Replication & Sync

### 4.1 State Sync via StateRequest/StateResponse

The `ChainState` (in `p2p-consensus/state.rs`) is the canonical shared state. Sync works via:

1. `StateRequest`: A validator requests full state, sending its current hash and sequence
2. `StateResponse`: Another validator responds with full serialized state
3. `StateManager::apply_sync_state()` accepts the new state only if:
   - New sequence > current sequence
   - Hash verification passes (recomputed hash matches claimed hash)

### Severity: **LOW**

**Issue**: The hash function in `ChainState::update_hash()` only hashes a summary (sequence, epoch, counts) — NOT the actual data. Two states with the same sequence/epoch/counts but different data would have the same hash. This weakens state verification.

```rust
struct HashInput {
    sequence: SequenceNumber,
    epoch: u64,
    validator_count: usize,
    challenge_count: usize,
    pending_count: usize,
    netuid: u16,
}
```

The hash doesn't include actual validator stakes, challenge configs, evaluation data, storage proposals, leaderboards, or any other substantive state fields.

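
A toy demonstration of the issue, using a simplified mirror of `HashInput` and std's `DefaultHasher` in place of the real hash function (both are illustrative stand-ins): two states whose substantive data differs (say, one validator's stake) are indistinguishable, because the divergent data never feeds the hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Simplified mirror of the HashInput struct shown above.
#[derive(Hash)]
struct HashInput {
    sequence: u64,
    epoch: u64,
    validator_count: usize,
    challenge_count: usize,
    pending_count: usize,
    netuid: u16,
}

// Stand-in for update_hash(): only the summary is hashed.
fn summary_hash(input: &HashInput) -> u64 {
    let mut h = DefaultHasher::new();
    input.hash(&mut h);
    h.finish()
}

fn main() {
    // Imagine validator A recorded stake 500 and validator B recorded stake 900
    // for the same hotkey. That difference is invisible here: both states reduce
    // to the identical summary, so the hashes collide by construction.
    let a = HashInput { sequence: 42, epoch: 7, validator_count: 5, challenge_count: 3, pending_count: 0, netuid: 1 };
    let b = HashInput { sequence: 42, epoch: 7, validator_count: 5, challenge_count: 3, pending_count: 0, netuid: 1 };
    assert_eq!(summary_hash(&a), summary_hash(&b));
}
```

Hashing the full serialized state (or a merkle root over it, as Recommendation 3 suggests) would make such divergence detectable during sync.
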
---
## 5. Epoch Transition & Storage State

### How epoch transitions work:
1. Triggered by `BlockSyncEvent::EpochTransition` from Bittensor block sync
2. Calls `state.next_epoch()` which:
   - Increments epoch counter
   - Clears current `weight_votes`
   - Prunes historical weights older than 100 epochs
   - Increments sequence number

### Finding: Epoch transitions are locally triggered
Each validator detects epoch boundaries independently from Bittensor block sync. There's no PBFT consensus on when to transition epochs — validators rely on seeing the same Bittensor blocks.

### Severity: **LOW**
If validators have slightly different views of the Bittensor chain (e.g., different RPC endpoints, temporary fork), they could transition epochs at different times, causing temporary state divergence.

---
## 6. Scenarios Where Validators Could Have Divergent State

### 6.1 WASM Storage Writes (HIGH)
- **Scenario**: Two validators both process a WASM route request that calls `host_storage_set()`
- **Result**: Each writes to its own local storage; no P2P propagation occurs
- **Divergence**: Permanent until full state sync overrides one validator's data

### 6.2 Race Conditions in Storage Proposals (MEDIUM)
- **Scenario**: Two validators simultaneously propose writes to the same key with different values
- **Result**: Each proposal gets its own consensus round; both could succeed
- **Divergence**: Last-write-wins at the distributed storage level, but ordering may differ

### 6.3 Network Partition During Consensus (LOW)
- **Scenario**: Network partition during PBFT round
- **Result**: View change triggers new leader; prepared state carries forward
- **Mitigation**: PBFT view change protocol properly handles this; prepared proofs are verified

### 6.4 Stale State Sync (LOW)
- **Scenario**: Validator rejoins after being offline
- **Result**: `apply_sync_state()` accepts newer state but relies on weak hash verification
- **Mitigation**: Sequence number ordering prevents accepting old state

### 6.5 Leaderboard Inconsistency (MEDIUM)
- **Scenario**: Leaderboard is never populated via consensus
- **Result**: All validators have empty/stale leaderboards
- **Impact**: Any leaderboard queries return inconsistent or empty data

---
## 7. Conflict Resolution for Concurrent Writes

### PBFT Path
The PBFT consensus engine serializes state changes through sequence numbers. Only one proposal can be active per round, so concurrent writes are inherently serialized.

### Storage Proposal Path
Multiple storage proposals can be in flight simultaneously since each gets a unique `proposal_id`. The `ChainState` tracks them in `pending_storage_proposals`. If two proposals write the same key, both can be approved and applied — the last one applied wins.

### Direct WASM Write Path
No conflict resolution exists. Each validator writes independently to local storage.

---
## 8. Summary of Findings

| # | Finding | Severity | Component |
|---|---------|----------|-----------|
| 1 | WASM `host_storage_set()` bypasses consensus — writes directly to local storage with `allow_direct_writes=true, require_consensus=false` | **HIGH** | `wasm_executor.rs`, `storage.rs` |
| 2 | Storage proposals auto-approved without WASM validation | **MEDIUM** | `validator-node/main.rs:1948` |
| 3 | Leaderboard data never populated via consensus; request/response messages only logged | **MEDIUM** | `validator-node/main.rs:1666-1681` |
| 4 | `ChainState` hash only covers summary counts, not actual state data | **MEDIUM** | `p2p-consensus/state.rs:update_hash()` |
| 5 | Storage proposals use simple majority `(n/2)+1`, weaker than PBFT's 2f+1 + stake-weighted quorum | **LOW** | `p2p-consensus/state.rs:vote_storage_proposal()` |
| 6 | Epoch transitions triggered locally per validator from Bittensor sync, not via PBFT consensus | **LOW** | `validator-node/main.rs:2123-2126` |
| 7 | `P2PMessage::StorageProposal` path is disconnected from WASM `host_storage_set()` path | **HIGH** | Architecture gap |
| 8 | No mechanism to reconcile divergent local storage across validators | **HIGH** | System-wide |

---
## 9. Positive Security Controls Observed

- PBFT consensus properly implements cryptographic signature verification on all phases
- Replay attack protection with nonce tracking and sliding window rate limiting
- Validator-only enforcement for consensus-critical messages
- View change protocol with prepared proof verification
- ConfigUpdate proposals require sudo authorization
- Weight vote hash verification
- State deserialization size limits (256MB)
- Evaluation scores use verified stakes from validator map (prevents stake inflation)
- Fraud proof mechanism exists in `state_consensus.rs` (though integration unclear)

---
## 10. Recommendations

1. **Critical**: Route WASM `host_storage_set()` through the P2P storage proposal path, or set `require_consensus=true` and `allow_direct_writes=false` in production
2. **High**: Implement actual WASM validation in storage proposal handling instead of auto-approval
3. **Medium**: Include actual state data in `ChainState::update_hash()` (at minimum, hash the serialized state or use a merkle root)
4. **Medium**: Implement leaderboard population and synchronization via consensus
5. **Low**: Consider using the same 2f+1 + stake-weighted quorum for storage proposals as for PBFT consensus
6. **Low**: Consider using a PBFT proposal for epoch transitions to ensure atomic, coordinated transitions

bins/validator-node/src/main.rs

Lines changed: 13 additions & 1 deletion
```diff
@@ -1374,9 +1374,21 @@ fn update_validator_set_from_metagraph(
         }

         // Sync to ChainState
-        cs.registered_hotkeys.insert(hotkey);
+        cs.registered_hotkeys.insert(hotkey.clone());
+
+        // H5 fix: Also update cs.validators so RPC webhook auth can find metagraph-registered validators
+        // Use the same min_stake threshold as the P2P validator set (10000 TAO = 10e12 RAO)
+        if stake >= 10_000_000_000_000 {
+            cs.validators.entry(hotkey.clone()).or_insert_with(|| {
+                platform_core::ValidatorInfo::new(hotkey, platform_core::Stake(stake))
+            });
+        }
     }

+    // TODO(C3): When ValidatedStorage and StateRootConsensus instances are wired into the main
+    // event loop, they should call `set_valid_voters()` with the validator hotkeys collected here
+    // to enable cryptographic vote verification in distributed storage consensus layers.
+
     cs.update_hash();
 }
```
bins/validator-node/src/wasm_executor.rs

Lines changed: 4 additions & 4 deletions
```diff
@@ -201,7 +201,7 @@ impl WasmChallengeExecutor {
             restart_id: String::new(),
             config_version: 0,
             storage_host_config: StorageHostConfig {
-                allow_direct_writes: true,
+                allow_direct_writes: false,
                 require_consensus: true,
                 ..self.config.storage_host_config.clone()
             },
@@ -329,7 +329,7 @@ impl WasmChallengeExecutor {
             restart_id: String::new(),
             config_version: 0,
             storage_host_config: StorageHostConfig {
-                allow_direct_writes: true,
+                allow_direct_writes: false,
                 require_consensus: true,
                 ..self.config.storage_host_config.clone()
             },
@@ -808,7 +808,7 @@ impl WasmChallengeExecutor {
             restart_id: String::new(),
             config_version: 0,
             storage_host_config: StorageHostConfig {
-                allow_direct_writes: true,
+                allow_direct_writes: false,
                 require_consensus: true,
                 ..self.config.storage_host_config.clone()
             },
@@ -966,7 +966,7 @@ impl WasmChallengeExecutor {
             challenge_id: module_path.to_string(),
             validator_id: "validator".to_string(),
             storage_host_config: StorageHostConfig {
-                allow_direct_writes: true,
+                allow_direct_writes: false,
                 require_consensus: true,
                 ..self.config.storage_host_config.clone()
             },
```

crates/distributed-storage/src/state_consensus.rs

Lines changed: 10 additions & 2 deletions
```diff
@@ -643,9 +643,17 @@ impl StateRootConsensus {

         // Verify vote signature cryptographically when valid_voters is configured
         if !self.valid_voters.is_empty() {
-            let vote_hash = vote.compute_hash();
+            // H7 fix: Use raw serialized vote fields instead of hash to match signing side
+            let vote_bytes = bincode::serialize(&(
+                vote.block_number,
+                vote.voter.clone(),
+                vote.state_root,
+                vote.agrees_with_proposal,
+                vote.timestamp,
+            ))
+            .unwrap_or_default();
             let signed_msg = platform_core::SignedMessage {
-                message: vote_hash.to_vec(),
+                message: vote_bytes,
                 signature: vote.signature.clone(),
                 signer: vote.voter.clone(),
             };
```
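
The H7 class of bug is worth stating generally: signature verification fails (or silently checks the wrong thing) whenever the verifier reconstructs a different byte string than the one the signer signed, such as hashing the vote an extra time. A toy sketch with a stand-in hash "signature" (NOT real cryptography, and none of these names come from the codebase):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy "signature": just a hash of the message bytes, enough to show the mismatch.
fn toy_sign(message: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    message.hash(&mut h);
    h.finish()
}

// An intermediate digest step, standing in for the extra compute_hash() call.
fn digest(message: &[u8]) -> Vec<u8> {
    toy_sign(message).to_le_bytes().to_vec()
}

fn main() {
    let vote_bytes = b"block=5;voter=alice;agrees=true".to_vec();
    // Signing side: signature over the raw serialized vote fields.
    let sig = toy_sign(&vote_bytes);

    // Buggy verifier: hashes the vote first, then checks over the digest,
    // so even honest votes fail verification.
    let verified_buggy = toy_sign(&digest(&vote_bytes)) == sig;
    assert!(!verified_buggy);

    // Fixed verifier (the H7 change): check over the same raw bytes that were signed.
    let verified_fixed = toy_sign(&vote_bytes) == sig;
    assert!(verified_fixed);
}
```

The diff above applies exactly this fix: `SignedMessage.message` now carries the same bincode-serialized tuple the voter signed, instead of a hash of it.
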

crates/distributed-storage/src/validated_storage.rs

Lines changed: 9 additions & 2 deletions
```diff
@@ -517,9 +517,16 @@ impl<S: DistributedStore + 'static> ValidatedStorage<S> {
     {
         let voters = self.valid_voters.read().await;
         if !voters.is_empty() {
-            let vote_hash = vote.compute_hash();
+            // H7 fix: Use raw serialized vote fields instead of hash to match signing side
+            let vote_bytes = bincode::serialize(&(
+                vote.proposal_id,
+                vote.voter.clone(),
+                vote.approved,
+                vote.timestamp,
+            ))
+            .unwrap_or_default();
             let signed_msg = platform_core::SignedMessage {
-                message: vote_hash.to_vec(),
+                message: vote_bytes,
                 signature: vote.signature.clone(),
                 signer: vote.voter.clone(),
             };
```

crates/rpc-server/src/jsonrpc.rs

Lines changed: 42 additions & 0 deletions
```diff
@@ -1203,6 +1203,48 @@ impl RpcHandler {
         // Use resolved_id for routing
         let challenge_id = resolved_id;

+        // Enforce auth on routes that require it (matching HTTP handler behavior)
+        {
+            let chain = self.chain_state.read();
+            let challenge_uuid_check = uuid::Uuid::parse_str(&challenge_id)
+                .ok()
+                .map(platform_core::ChallengeId);
+
+            if let Some(cid) = challenge_uuid_check.as_ref() {
+                if let Some(chain_routes) = chain.challenge_routes.get(cid) {
+                    use platform_challenge_sdk::HttpMethod;
+                    for r in chain_routes {
+                        let http_method = match r.method.to_uppercase().as_str() {
+                            "GET" => HttpMethod::Get,
+                            "POST" => HttpMethod::Post,
+                            "PUT" => HttpMethod::Put,
+                            "DELETE" => HttpMethod::Delete,
+                            "PATCH" => HttpMethod::Patch,
+                            _ => HttpMethod::Get,
+                        };
+                        let mut route = platform_challenge_sdk::ChallengeRoute::new(
+                            http_method,
+                            r.path.clone(),
+                            r.description.clone(),
+                        );
+                        if r.requires_auth {
+                            route = route.with_auth();
+                        }
+                        if route.matches(&method, &path).is_some() {
+                            if r.requires_auth && auth_hotkey.is_none() {
+                                return JsonRpcResponse::error(
+                                    id,
+                                    UNAUTHORIZED,
+                                    "This route requires authentication",
+                                );
+                            }
+                            break;
+                        }
+                    }
+                }
+            }
+        }
+
         // Extract path params by matching against registered route definitions
         let path_params = {
             let chain = self.chain_state.read();
```