2020
2121## Introduction
2222
23- **Platform** is a decentralized evaluation framework that enables trustless assessment of AI agents through configurable challenges on the Bittensor network. By connecting multiple validators through a Byzantine fault-tolerant consensus mechanism, Platform ensures honest and reproducible evaluation of agent submissions while preventing gaming and manipulation.
23+ **Platform** is a decentralized evaluation framework that enables trustless assessment of miner submissions through configurable challenges on the Bittensor network. By connecting multiple validators through a Byzantine fault-tolerant consensus mechanism, Platform ensures honest and reproducible evaluation while preventing gaming and manipulation.
2424
2525> **Want to run a validator?** See the [Validator Guide](docs/validator.md) for setup instructions.
2626
2727### Key Features
2828
29- - **Decentralized Evaluation**: Multiple validators independently evaluate agent submissions
29+ - **Decentralized Evaluation**: Multiple validators independently evaluate submissions
3030- **Challenge-Based Architecture**: Modular Docker containers define custom evaluation logic
3131- **Byzantine Fault Tolerance**: PBFT consensus with $2f+1$ threshold ensures correctness
32- - **Commit-Reveal Weights**: Cryptographic scheme prevents weight copying attacks
32+ - **Secure Weight Submission**: Weights submitted to Bittensor at epoch boundaries
3333- **Merkle State Sync**: Verifiable distributed database with optimistic execution
3434- **Multi-Mechanism Support**: Each challenge maps to a Bittensor mechanism for independent weight setting
3535- **Stake-Weighted Security**: Minimum 1000 TAO stake required for validator participation
4040
4141Platform involves three main participants:
4242
43- - **Miners**: Submit AI agents (code/models) to challenges for evaluation
44- - **Validators**: Run challenge containers, evaluate agents, and submit weights to Bittensor
43+ - **Miners**: Submit code/models to challenges for evaluation
44+ - **Validators**: Run challenge containers, evaluate submissions, and submit weights to Bittensor
4545- **Sudo Owner**: Configures challenges via signed `SudoAction` messages
4646
4747The coordination between validators ensures that only verified, consensus-validated results influence the weight distribution on Bittensor.
@@ -78,7 +78,7 @@ The coordination between validators ensures that only verified, consensus-valida
7878│ │  Challenge C  │ │     │ │  Challenge C  │ │     │ │  Challenge C  │ │
7979│ └───────────────┘ │     │ └───────────────┘ │     │ └───────────────┘ │
8080│                   │     │                   │     │                   │
81- │ Evaluates agents  │     │ Evaluates agents  │     │ Evaluates agents  │
81+ │ Evaluates miners  │     │ Evaluates miners  │     │ Evaluates miners  │
8282│ Shares results    │     │ Shares results    │     │ Shares results    │
8383└─────────┬─────────┘     └─────────┬─────────┘     └─────────┬─────────┘
8484          │                         │                         │
@@ -94,33 +94,33 @@ The coordination between validators ensures that only verified, consensus-valida
9494
9595---
9696
97- ## Miners (Agent Submitters)
97+ ## Miners
9898
9999### Operations
100100
101- 1. **Agent Development**:
102- - Miners develop AI agents that solve challenge-specific tasks
103- - Agents are packaged as code submissions with metadata
101+ 1. **Development**:
102+ - Miners develop solutions that solve challenge-specific tasks
103+ - Solutions are packaged as code submissions with metadata
104104
1051052. **Submission**:
106- - Submit agent to any validator via HTTP API
107- - Agent is stored in distributed database and synced across all validators
106+ - Submit to any validator via HTTP API
107+ - Submission is stored in distributed database and synced across all validators
108108 - Submission includes: source code, miner hotkey, metadata
109109
1101103. **Evaluation**:
111- - All validators independently evaluate the agent
112- - Evaluation runs in isolated Docker containers
111+ - All validators independently evaluate the submission
112+ - Evaluation runs in isolated Docker containers (challenge-specific logic)
113113 - Results are stored in Merkle-verified distributed database
114114
1151154. **Weight Distribution**:
116116 - At epoch end, validators aggregate scores
117- - Weights are submitted to Bittensor proportional to agent performance
117+ - Weights are submitted to Bittensor proportional to miner performance
118118
119119### Formal Definitions
120120
121- - **Agent Submission**: $A_i = (code, hotkey_i, metadata)$
122- - **Submission Hash**: $h_i = \text{SHA256}(A_i)$
123- - **Evaluation Score**: $s_i^v \in [0, 1]$ - score from validator $v$ for agent $i$
121+ - **Submission**: $S_i = (code, hotkey_i, metadata)$
122+ - **Submission Hash**: $h_i = \text{SHA256}(S_i)$
123+ - **Evaluation Score**: $s_i^v \in [0, 1]$ - score from validator $v$ for submission $i$
124124
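As a rough illustration of the submission hash defined above, the sketch below hashes the tuple $(code, hotkey_i, metadata)$ with SHA-256. The byte encoding is not specified here, so the canonical JSON serialization (sorted keys, compact separators) and the function name `submission_hash` are assumptions made for this example, not Platform's actual format.

```python
import hashlib
import json


def submission_hash(code: str, hotkey: str, metadata: dict) -> str:
    """Return the hex SHA-256 digest of a submission (code, hotkey, metadata)."""
    # Assumption: canonical JSON is used as the byte encoding of the tuple.
    # Sorted keys and compact separators make the encoding deterministic.
    payload = json.dumps(
        {"code": code, "hotkey": hotkey, "metadata": metadata},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Sorting the keys keeps the digest stable regardless of the order in which metadata fields were inserted, which matters if several validators must derive the same hash independently.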
125125---
126126
@@ -133,8 +133,8 @@ The coordination between validators ensures that only verified, consensus-valida
133133 - All validators run identical challenge containers
134134 - Health monitoring ensures container availability
135135
136- 2. **Agent Evaluation**:
137- - Receive agent submissions via P2P gossipsub
136+ 2. **Submission Evaluation**:
137+ - Receive submissions via P2P gossipsub
138138 - Execute evaluation in sandboxed Docker environment
139139 - Compute score $s \in [0, 1]$ based on challenge criteria
140140
@@ -144,19 +144,16 @@ The coordination between validators ensures that only verified, consensus-valida
144144 - Verify state root matches across validators
145145
1461464. **Score Aggregation**:
147- - Collect evaluations from all validators for each agent
147+ - Collect evaluations from all validators for each submission
148148 - Compute stake-weighted median to resist manipulation
149149 - Detect and exclude outlier validators
150150
1511515. **Weight Calculation**:
152152 - Convert aggregated scores to normalized weights
153153 - Apply softmax or linear normalization
154- - Cap maximum weight per agent to prevent dominance
155154
1561556. **Weight Submission**:
157- - Commit weight hash during commit window
158- - Reveal weights during reveal window
159- - Submit to Bittensor per-mechanism
156+ - Submit weights to Bittensor per-mechanism at epoch boundaries
160157
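The stake-weighted median mentioned in the score aggregation step can be sketched as follows. This is a minimal illustration under assumptions: the function name, the plain-list inputs, and the lower-median tie rule (return the first score at which cumulative stake reaches half the total) are choices made for the example and are not confirmed by this document.

```python
def stake_weighted_median(scores: list[float], stakes: list[float]) -> float:
    """Return the score at which cumulative stake first reaches half the total."""
    if not scores or len(scores) != len(stakes):
        raise ValueError("scores and stakes must be non-empty and equal in length")
    total = sum(stakes)
    if total <= 0:
        raise ValueError("total stake must be positive")

    cumulative = 0.0
    # Walk the scores in ascending order, accumulating the stake behind each one.
    for score, stake in sorted(zip(scores, stakes)):
        cumulative += stake
        if cumulative >= total / 2:
            return score
    raise AssertionError("unreachable: cumulative stake always reaches the total")
```

With stakes `[1000, 1000, 5000]` and scores `[0.9, 0.1, 0.2]`, the result is `0.2`: the high-stake validator's score dominates, and a low-stake validator reporting an extreme value cannot move the aggregate on its own.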
161158### Formal Definitions
162159
@@ -186,31 +183,17 @@ Converts scores to probability distribution using temperature-scaled softmax:
186183$$ w_i = \frac{\exp(s_i / T)}{\sum_{j \in \mathcal{M}} \exp(s_j / T)} $$
187184
188185Where:
189- - $w_i$ - normalized weight for agent $i$
190- - $s_i$ - aggregated score for agent $i$
186+ - $w_i$ - normalized weight for submission $i$
187+ - $s_i$ - aggregated score for submission $i$
191188- $T$ - temperature parameter (higher = more distributed)
192- - $\mathcal{M}$ - set of all evaluated agents
189+ - $\mathcal{M}$ - set of all evaluated submissions
193190
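The temperature-scaled softmax above can be illustrated with a short sketch. The function name and the dictionary keyed by submission identifier are assumptions for the example; subtracting the maximum score before exponentiating is a standard numerical-stability step that does not change the result.

```python
import math


def softmax_weights(scores: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    """Compute w_i = exp(s_i / T) / sum_j exp(s_j / T) for every submission."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not scores:
        return {}
    # Shifting by the maximum score avoids overflow and cancels out in the ratio.
    peak = max(scores.values())
    exps = {key: math.exp((s - peak) / temperature) for key, s in scores.items()}
    total = sum(exps.values())
    return {key: e / total for key, e in exps.items()}
```

A low temperature concentrates weight on the best score, while a high temperature spreads it more evenly, matching the "higher = more distributed" note above.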
194191#### 2. Linear Normalization
195192
196193Simple proportional distribution:
197194
198195$$ w_i = \frac{s_i}{\sum_{j \in \mathcal{M}} s_j} $$
199196
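Linear normalization is a direct division by the score sum, as in the sketch below. The formula is undefined when every score is zero; returning all-zero weights in that case is an assumption made for the example, as is the function name.

```python
def linear_weights(scores: dict[str, float]) -> dict[str, float]:
    """Compute w_i = s_i / sum_j s_j for every submission."""
    total = sum(scores.values())
    if total <= 0:
        # Assumption: with no positive score there is nothing to distribute.
        return {key: 0.0 for key in scores}
    return {key: s / total for key, s in scores.items()}
```

Unlike softmax, a submission with a score of zero receives exactly zero weight here.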
200- ### Weight Capping
201-
202- To prevent single-agent dominance, weights are capped:
203-
204- $$ w_i' = \min(w_i, w_{max}) $$
205-
206- Excess weight is redistributed to uncapped agents:
207-
208- $$ \text{excess} = \sum_{i: w_i > w_{max}} (w_i - w_{max}) $$
209-
210- $$ w_j' = w_j + \frac{\text{excess}}{|\{k : w_k < w_{max}\}|} \quad \forall j : w_j < w_{max} $$
211-
212- Default $w_{max} = 0.5$ (50% maximum per agent).
213-
214197### Final Weight Conversion
215198
216199Weights are converted to Bittensor u16 format:
@@ -219,43 +202,6 @@ $$W_i = \lfloor w_i' \times 65535 \rfloor$$
219202
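The u16 conversion scales each normalized weight by 65535 and takes the floor. A minimal sketch, assuming weights already lie in $[0, 1]$ (the function name and the range check are illustrative, not taken from Platform's code):

```python
import math

U16_MAX = 65535


def to_u16(weights: list[float]) -> list[int]:
    """Convert normalized weights in [0, 1] to u16 integers via floor(w * 65535)."""
    converted = []
    for w in weights:
        if not 0.0 <= w <= 1.0:
            raise ValueError(f"weight out of range: {w}")
        converted.append(math.floor(w * U16_MAX))
    return converted
```

Because each value is rounded down, the converted weights can sum to slightly less than 65535 even when the inputs sum to exactly 1.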
220203---
221204
222- ## Commit-Reveal Protocol
223-
224- To prevent weight copying attacks, Platform uses a commit-reveal scheme compatible with Subtensor:
225-
226- ### Commit Phase
227-
228- 1. Generate random salt: `salt ← random 128 bits`
229-
230- 2. Compute commit hash using SCALE encoding + Blake2b-256:
231-
232- ```
233- H = Blake2b_256(SCALE(account, netuid_index, uids, weights, salt, version_key))
234- ```
235-
236- Where:
237- ```
238- netuid_index = mechanism_id × 4096 + netuid
239- ```
240-
241- 3. Submit commit: `commit_mechanism_weights(netuid, mechanism_id, H)`
242-
243- ### Reveal Phase
244-
245- After commit window closes:
246-
247- 1. Submit reveal: `reveal_mechanism_weights(netuid, mechanism_id, uids, weights, salt, version_key)`
248-
249- 2. Subtensor verifies: `H == Blake2b_256(SCALE(...))`
250-
251- ### Security Properties
252-
253- - **Hiding**: Hash reveals nothing about weights before reveal
254- - **Binding**: Cannot change weights after commit
255- - **Fairness**: All validators must commit before any reveal
256-
257- ---
258-
259205## Consensus Mechanism
260206
261207### PBFT (Practical Byzantine Fault Tolerance)
@@ -313,7 +259,7 @@ Validators with anomalous scores are detected using z-score:
313259
314260$$ z_v = \frac{s_i^v - \mu_i}{\sigma_i} $$
315261
316- Where $\mu_i$ and $\sigma_i$ are mean and standard deviation of scores for agent $i$.
262+ Where $\mu_i$ and $\sigma_i$ are mean and standard deviation of scores for submission $i$.
317263
318264Validators with $|z_v| > z_{threshold}$ (default 2.0) are excluded from aggregation.
319265
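The z-score rule above can be sketched as a filter over per-validator scores for a single submission. The document does not say whether $\sigma_i$ is the population or sample standard deviation; the population form is assumed here, and the function name and the handling of zero variance are also illustrative choices.

```python
import statistics


def exclude_outliers(scores: dict[str, float], z_threshold: float = 2.0) -> dict[str, float]:
    """Keep only validators whose score satisfies |z_v| <= z_threshold."""
    values = list(scores.values())
    if len(values) < 2:
        return dict(scores)
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # All validators agree exactly, so no z-score is defined and none is excluded.
        return dict(scores)
    return {v: s for v, s in scores.items() if abs(s - mu) / sigma <= z_threshold}
```

One consequence of the population form is that a single score among $n$ can have $|z|$ of at most $\sqrt{n-1}$, so with the default threshold of 2.0 an exclusion can only occur when at least six validators have scored the submission.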
@@ -323,7 +269,7 @@ Agreement among validators determines confidence:
323269
324270$$ \text{confidence}_i = \exp\left( -\frac{\sigma_i}{0.1} \right) $$
325271
326- Low confidence (high variance) may exclude agent from weights.
272+ Low confidence (high variance) may exclude submission from weights.
327273
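The confidence formula above maps validator disagreement to a value in $(0, 1]$. A minimal sketch, again assuming the population standard deviation for $\sigma_i$ (the function name and the `scale` parameter name for the constant 0.1 are hypothetical):

```python
import math
import statistics


def confidence(scores: list[float], scale: float = 0.1) -> float:
    """Compute exp(-sigma / scale), where sigma is the spread of validator scores."""
    sigma = statistics.pstdev(scores)  # assumption: population standard deviation
    return math.exp(-sigma / scale)
```

Perfect agreement gives a confidence of 1.0, and a spread of 0.1 (for example, two validators reporting 0.4 and 0.6) gives $e^{-1} \approx 0.37$.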
328274---
329275
@@ -372,28 +318,24 @@ Subject to:
372318
373319### Miner Utility Maximization
374320
375- Miners maximize reward by submitting high-performing agents:
321+ Miners maximize reward by submitting high-performing solutions:
376322
377- $$ \max_{A_i} \quad w_i = \frac{\exp(\bar{s}_i / T)}{\sum_j \exp(\bar{s}_j / T)} $$
323+ $$ \max_{S_i} \quad w_i = \frac{\exp(\bar{s}_i / T)}{\sum_j \exp(\bar{s}_j / T)} $$
378324
379325Subject to:
380- - Agent must pass validation
326+ - Submission must pass validation
381327- Score determined by challenge criteria
382328
383329### Security Guarantees
384330
3853311. **Byzantine Tolerance**: System remains correct with up to $f = \lfloor(n-1)/3\rfloor$ faulty validators
386332
387- 2. **Weight Integrity**: Commit-reveal prevents:
388- - Weight copying before deadline
389- - Post-hoc weight modification
390-
391- 3. **Evaluation Fairness**:
333+ 2. **Evaluation Fairness**:
392334 - Deterministic Docker execution
393335 - Outlier detection excludes manipulators
394336 - Stake weighting resists Sybil attacks
395337
396- 4. **Liveness**: System progresses if $> 2/3$ validators are honest and connected
338+ 3. **Liveness**: System progresses if $> 2/3$ validators are honest and connected
397339
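The Byzantine tolerance bound above and the $2f+1$ threshold from the feature list translate into simple integer arithmetic. The helper names below are hypothetical and serve only to make the numbers concrete.

```python
def max_faulty(n: int) -> int:
    """Return f = floor((n - 1) / 3), the number of faulty validators tolerated."""
    if n < 1:
        raise ValueError("at least one validator is required")
    return (n - 1) // 3


def quorum(n: int) -> int:
    """Return the 2f + 1 agreement threshold for n validators."""
    return 2 * max_faulty(n) + 1
```

For example, 4 validators tolerate 1 fault with a quorum of 3, and 10 validators tolerate 3 faults with a quorum of 7.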
398340---
399341
@@ -404,39 +346,14 @@ Each Bittensor epoch (~360 blocks, ~72 minutes):
404346### Continuous Evaluation
405347
406348**Evaluation runs continuously** throughout the entire epoch. Validators constantly:
407- - Receive and process agent submissions
349+ - Receive and process submissions from challenges
408350- Execute evaluations in Docker containers
409351- Sync results via P2P to distributed database
410352- Aggregate scores from all validators
411353
412- ### Weight Submission (At Epoch Boundary)
413-
414- At the end of each epoch, weights are submitted to Bittensor:
354+ ### Weight Submission
415355
416- | Event | Timing | Activity |
417- | -------| --------| ----------|
418- | Commit Window Opens | Epoch end | Validators commit weight hashes |
419- | Reveal Window Opens | After commit closes | Validators reveal actual weights |
420-
421- ```
422- ┌─────────────────────────────────────────────────────────────┐
423- │                    CONTINUOUS EVALUATION                    │
424- │        (Agents evaluated, results synced throughout)        │
425- └─────────────────────────────────┬───────────────────────────┘
426-                                   │ Epoch End
427-                                   ▼
428-                      ┌─────────────────────────┐
429-                      │  COMMIT (weight hash)   │
430-                      └────────────┬────────────┘
431-                                   │
432-                                   ▼
433-                      ┌─────────────────────────┐
434-                      │    REVEAL (weights)     │
435-                      └────────────┬────────────┘
436-                                   │
437-                                   ▼
438-                           Next Epoch Starts
439- ```
356+ At the end of each epoch, validators submit weights to Bittensor based on aggregated scores.
440357
441358---
442359
@@ -487,15 +404,14 @@ If validators run different versions:
487404
488405## Conclusion
489406
490- Platform creates a trustless, decentralized framework for evaluating AI agents on Bittensor. By combining:
407+ Platform creates a trustless, decentralized framework for evaluating miner submissions on Bittensor. By combining:
491408
492409- **PBFT Consensus** for Byzantine fault tolerance
493- - **Commit-Reveal** for weight submission integrity
494410- **Stake-Weighted Aggregation** for Sybil resistance
495- - **Docker Isolation** for deterministic evaluation
303+ - **Docker Isolation** for deterministic evaluation (challenge-specific logic)
496412- **Merkle State Sync** for verifiable distributed storage
497413
498- The system ensures that only genuine, high-performing agents receive rewards, while making manipulation economically infeasible. Validators are incentivized to provide accurate evaluations through reputation mechanics, and miners are incentivized to submit quality agents through the weight distribution mechanism.
414+ The system ensures that only genuine, high-performing submissions receive rewards, while making manipulation economically infeasible. Validators are incentivized to provide accurate evaluations through reputation mechanics, and miners are incentivized to submit quality solutions through the weight distribution mechanism.
499415
500416---
501417