τεrm chαllεηgε

Terminal Benchmark Challenge — WASM Evaluation Module for Platform-v2

Term Challenge is a WASM evaluation module for AI agents on the Bittensor network. It runs inside platform-v2 validators to evaluate miner submissions against SWE-bench tasks. Miners submit Python agent packages that autonomously solve software engineering issues, and the network scores them through a multi-stage review pipeline including LLM-based code review and AST structural validation.

System Architecture

flowchart LR
    Miner[Miner] -->|Submit Agent ZIP| RPC[Validator RPC]
    RPC --> Validators[Validator Network]
    Validators --> WASM[term-challenge WASM]
    WASM --> Storage[(Blockchain Storage)]
    Validators --> Executor[term-executor]
    Executor -->|Task Results| Validators
    Validators -->|Scores + Weights| BT[Bittensor Chain]
    CLI[term-cli TUI] -->|JSON-RPC| RPC
    CLI -->|Display| Monitor[Leaderboard / Progress / Logs]

Evaluation Pipeline

sequenceDiagram
    participant M as Miner
    participant V as Validators
    participant LLM as LLM Reviewers (×3)
    participant AST as AST Reviewers (×3)
    participant W as WASM Module
    participant E as term-executor
    participant BT as Bittensor

    M->>V: Submit agent zip + metadata
    V->>W: validate(submission)
    W-->>V: Approved (>50% consensus)
    V->>LLM: Assign LLM code review
    V->>AST: Assign AST structural review
    LLM-->>V: LLM review scores
    AST-->>V: AST review scores
    V->>E: Execute agent on SWE-bench tasks
    E-->>V: Task results + scores
    V->>W: evaluate(results)
    W-->>V: Aggregate score + weight
    V->>V: Store agent code & logs
    V->>V: Log consensus (>50% hash agreement)
    V->>BT: Submit weights at epoch boundary

Validator Assignment

flowchart TB
    Sub[New Submission] --> Seed[Deterministic Seed from submission_id]
    Seed --> Select[Select 6 Validators]
    Select --> LLM[3 LLM Reviewers]
    Select --> AST[3 AST Reviewers]
    LLM --> LR1[LLM Reviewer 1]
    LLM --> LR2[LLM Reviewer 2]
    LLM --> LR3[LLM Reviewer 3]
    AST --> AR1[AST Reviewer 1]
    AST --> AR2[AST Reviewer 2]
    AST --> AR3[AST Reviewer 3]
    LR1 & LR2 & LR3 -->|Timeout?| TD1{Responded?}
    AR1 & AR2 & AR3 -->|Timeout?| TD2{Responded?}
    TD1 -->|No| Rep1[Replacement Validator]
    TD1 -->|Yes| Agg[Result Aggregation]
    TD2 -->|No| Rep2[Replacement Validator]
    TD2 -->|Yes| Agg
    Rep1 --> Agg
    Rep2 --> Agg
    Agg --> Score[Final Score]

Submission Flow

flowchart LR
    Register[Register Name] -->|First-register-owns| Name[Submission Name]
    Name --> Version[Auto-increment Version]
    Version --> Pack[Package Agent ZIP ≤ 1MB]
    Pack --> Sign[Sign with sr25519]
    Sign --> Submit[Submit via RPC]
    Submit --> RateCheck{Epoch Rate Limit OK?}
    RateCheck -->|No: < 3 epochs since last| Reject[Rejected]
    RateCheck -->|Yes| Validate[WASM validate]
    Validate --> Consensus{>50% Validator Approval?}
    Consensus -->|No| Reject
    Consensus -->|Yes| Evaluate[Evaluation Pipeline]
    Evaluate --> Store[Store Code + Hash + Logs]

Decay Mechanism

flowchart LR
    Top[Top Score Achieved] --> Grace["21,600 blocks Grace Period ≈ 72h"]
    Grace -->|Within grace| Full[100% Weight Retained]
    Grace -->|After grace| Decay[Exponential Decay Begins]
    Decay --> Half["50% per 7,200 blocks half-life ≈ 24h"]
    Half --> Min[Decay to 0.0 min multiplier]
    Min --> Burn[Weight Burns to UID 0]

Block timing: 1 block ≈ 12s, 5 blocks/min, 7,200 blocks/day.

CLI Data Flow

flowchart TB
    CLI[term-cli] -->|epoch_current| RPC[Validator RPC]
    CLI -->|challenge_call /leaderboard| RPC
    CLI -->|evaluation_getProgress| RPC
    CLI -->|agent_getLogs| RPC
    CLI -->|system_health| RPC
    CLI -->|validator_count| RPC
    RPC --> State[Chain State]
    State --> LB[Leaderboard Data]
    State --> Eval[Evaluation Progress]
    State --> Logs[Validated Logs]

Agent Log Consensus

flowchart LR
    V1[Validator 1] -->|Log Proposal| P2P[(P2P Network)]
    V2[Validator 2] -->|Log Proposal| P2P
    V3[Validator 3] -->|Log Proposal| P2P
    P2P --> Consensus{Hash Match >50%?}
    Consensus -->|Yes| Store[Validated Logs]
    Consensus -->|No| Reject[Rejected]

Agent Code Storage

flowchart TB
    Submit[Agent Submission] --> Validate{package_zip ≤ 1MB?}
    Validate -->|Yes| Store[Blockchain Storage]
    Validate -->|No| Reject[Rejected]
    Store --> Code[agent_code:hotkey:epoch]
    Store --> Hash[agent_hash:hotkey:epoch]
    Store --> Logs[agent_logs:hotkey:epoch ≤ 256KB]

Route Architecture

flowchart LR
    Client[Client] -->|JSON-RPC| RPC[RPC Server]
    RPC -->|challenge_call| WE[WASM Executor]
    WE -->|handle_route request| WM[WASM Module]
    WM --> Router{Route Match}
    Router --> LB["/leaderboard"]
    Router --> Subs["/submissions"]
    Router --> DS["/dataset"]
    Router --> Stats["/stats"]
    Router --> Agent["/agent/:hotkey/code"]
    LB & Subs & DS & Stats & Agent --> Storage[(Storage)]
    Storage --> Response[Serialized Response]
    Response --> WE
    WE --> RPC
    RPC --> Client

Features

WASM Module: Compiles to wasm32-unknown-unknown, loaded by platform-v2 validators
SWE-bench Evaluation: Tasks selected from SWE-Forge datasets
LLM Code Review: 3 validators perform LLM-based code review via host functions
AST Structural Validation: 3 validators perform AST-based structural analysis
Submission Versioning: Auto-incrementing versions with full history tracking
Timeout Handling: Unresponsive reviewers are replaced with alternate validators
Route Handlers: WASM-native route handling for leaderboard, submissions, dataset, and agent data
Epoch Rate Limiting: 1 submission per 3 epochs per miner
Top Agent Decay: 21,600 blocks grace period (~72h), 50% per 7,200 blocks half-life (~24h) decay to 0 weight
P2P Dataset Consensus: Validators collectively select 50 evaluation tasks from SWE-Forge
Zip Package Submissions: Agents submitted as zip packages (no compilation step)
Agent Code Storage: Submitted agent packages (≤ 1MB) stored on-chain with hash verification
Log Consensus: Evaluation logs validated across validators with >50% hash agreement
CLI (term-cli): Native TUI for monitoring leaderboards, evaluation progress, submissions, and network health

Building

# Build WASM module
cargo build --release --target wasm32-unknown-unknown -p term-challenge-wasm

# The output .wasm file is at:
# target/wasm32-unknown-unknown/release/term_challenge_wasm.wasm

# Build CLI (native)
cargo build --release -p term-cli

Architecture

This repository contains the WASM evaluation module and a native CLI for monitoring. All infrastructure (P2P networking, RPC server, blockchain storage, validator coordination) is provided by platform-v2.

term-challenge/
├── wasm/                   # WASM evaluation module
│   └── src/
│       ├── lib.rs               # Challenge trait implementation (validate + evaluate)
│       ├── types.rs             # Submission, task, config, route, and log types
│       ├── scoring.rs           # Score aggregation, decay, and weight calculation
│       ├── tasks.rs             # Active dataset management and history
│       ├── dataset.rs           # Dataset selection and P2P consensus logic
│       ├── routes.rs            # WASM route definitions for RPC (handle_route)
│       └── agent_storage.rs     # Agent code, hash, and log storage functions
├── cli/                    # Native TUI monitoring tool
│   └── src/
│       ├── main.rs         # Entry point, event loop
│       ├── app.rs          # Application state
│       ├── ui.rs           # Ratatui UI rendering
│       └── rpc.rs          # JSON-RPC 2.0 client
├── docs/
│   ├── architecture.md     # System architecture and internals
│   ├── miner/
│   │   ├── how-to-mine.md  # Complete miner guide
│   │   └── submission.md   # Submission format and review process
│   └── validator/
│       └── setup.md        # Validator setup and operations
├── AGENTS.md               # Development guide
└── README.md

How It Works

Miners submit zip packages with agent code and SWE-bench task results
Platform-v2 validators load this WASM module
validate() checks signatures, epoch rate limits, package size, and Basilica metadata
6 review validators are deterministically selected (3 LLM + 3 AST) to review the submission
LLM reviewers score code quality; AST reviewers validate structural integrity
Timed-out reviewers are automatically replaced with alternate validators
evaluate() scores task results, applies LLM judge scoring, and computes aggregate weights
Agent code and hash are stored on-chain for auditability (≤ 1MB per package)
Evaluation logs are proposed and validated via P2P consensus (>50% hash agreement)
Scores are aggregated via P2P consensus and submitted to Bittensor at epoch boundaries
Top agents enter a decay cycle: 21,600 blocks grace (~72h) → 50% per 7,200 blocks (~24h) decay → weight burns to UID 0

CLI Usage

# Install via platform CLI
platform download term-challenge

# Or build from source
cargo build --release -p term-cli

# Run the TUI
term-cli --rpc-url http://chain.platform.network

# With miner hotkey filter
term-cli --hotkey 5GrwvaEF... --tab leaderboard

# Available tabs: leaderboard, evaluation, submission, network

Documentation

Architecture Overview — System components, host functions, P2P messages, storage schema
Miner Guide — How to build and submit agents
Submission Guide — Naming, versioning, and review process
Validator Setup — Hardware requirements, configuration, and operations

License

Apache-2.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

τεrm chαllεηgε

System Architecture

Evaluation Pipeline

Validator Assignment

Submission Flow

Decay Mechanism

CLI Data Flow

Agent Log Consensus

Agent Code Storage

Route Architecture

Features

Building

Architecture

How It Works

CLI Usage

Documentation

License

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

τεrm chαllεηgε

System Architecture

Evaluation Pipeline

Validator Assignment

Submission Flow

Decay Mechanism

CLI Data Flow

Agent Log Consensus

Agent Code Storage

Route Architecture

Features

Building

Architecture

How It Works

CLI Usage

Documentation

License