Term Challenge is a WASM evaluation module for AI agents on the Bittensor network via platform-v2. Miners submit Python agent packages (as zip files) that solve SWE-bench tasks. The WASM module runs inside platform-v2 validators to validate submissions, evaluate task results, and compute scores. A companion native CLI (term-cli) provides a TUI for monitoring leaderboards, evaluation progress, and network health. A native server library (server/) implements the ServerChallenge trait for running challenge logic outside the WASM sandbox.
term-challenge/
├── Cargo.toml # workspace with members = [".", "wasm", "cli", "lib", "server", "storage"]
├── src/
│   ├── lib.rs # Root library crate entry point
│   └── dataset/
│       ├── mod.rs # Dataset module re-exports
│       ├── types.rs # DatasetEntry struct (SWE-forge schema)
│       └── huggingface.rs # HuggingFaceDataset: download, list, cache
├── wasm/
│   ├── Cargo.toml # cdylib, depends on platform-challenge-sdk-wasm
│   └── src/
│       ├── lib.rs # Challenge impl + register_challenge!
│       ├── types.rs # Submission, TaskDefinition, AgentLogs, etc.
│       ├── scoring.rs # Aggregate scoring, decay, weight calculation
│       ├── tasks.rs # Active dataset storage (SWE-bench tasks)
│       ├── dataset.rs # Dataset selection, consensus, and random index generation
│       ├── routes.rs # Challenge route definitions and handlers for RPC
│       ├── agent_storage.rs # Agent code, log, and evaluation status storage
│       ├── ast_validation.rs # Python AST whitelist validation (imports, builtins, patterns)
│       ├── llm_review.rs # LLM-based code review, reviewer selection, aggregation
│       ├── submission.rs # Named submission registry and version tracking
│       └── timeout_handler.rs # Review assignment timeout tracking and replacement
├── lib/
│   ├── Cargo.toml # native library, depends on platform-core & platform-challenge-sdk
│   └── src/
│       ├── lib.rs # Module declarations, re-exports dataset/ChallengeId/Hotkey
│       ├── dataset.rs # Validator-side dataset management types
│       ├── synthetic/mod.rs # Synthetic task generation
│       ├── validation/mod.rs # Validation result types
│       ├── admin/mod.rs # Admin action types
│       ├── util/mod.rs # Utility module
│       ├── util/hotkey.rs # Hotkey parsing helpers (wraps platform_core::Hotkey)
│       ├── cache/mod.rs # Score caching layer
│       ├── chain/mod.rs # Chain submission types
│       ├── worker/mod.rs # Worker job processing types
│       └── bin/term-sudo.rs # Admin CLI binary
├── server/
│   ├── Cargo.toml # lib + bin, depends on platform-challenge-sdk (server mode)
│   └── src/
│       ├── lib.rs # TerminalBenchChallenge implementing ServerChallenge trait
│       ├── main.rs # Binary entry point: CLI args, ChallengeServer::builder().run()
│       ├── server.rs # ChallengeServerState axum HTTP wrapper
│       ├── types.rs # Shared types (std port of wasm/src/types.rs)
│       ├── scoring.rs # Aggregate scoring, decay (uses ChallengeDatabase)
│       ├── tasks.rs # Active dataset storage (uses ChallengeDatabase)
│       ├── dataset.rs # Dataset selection, consensus, random indices
│       ├── routes.rs # Route definitions + handlers (SDK ChallengeRoute/RouteRequest)
│       ├── agent_storage.rs # Agent code, log, status storage (uses ChallengeDatabase)
│       ├── ast_validation.rs # Python AST whitelist validation
│       ├── llm_review.rs # Async LLM review (reqwest HTTP client)
│       ├── submission.rs # Named submission registry and version tracking
│       └── timeout_handler.rs # Review assignment timeout tracking
├── storage/
│   ├── Cargo.toml # native library, depends on platform-core + platform-challenge-sdk
│   └── src/
│       ├── lib.rs # Root module, re-exports, From impls for StorageError
│       ├── traits.rs # ChallengeStorage trait, StorageError, Result alias
│       ├── chain.rs # Chain storage (sled)
│       └── local.rs # Local storage (SQLite)
├── cli/
│   ├── Cargo.toml # native binary, ratatui TUI
│   └── src/
│       ├── main.rs # Entry point, event loop
│       ├── app.rs # Application state
│       ├── ui.rs # Ratatui UI rendering
│       └── rpc.rs # JSON-RPC 2.0 client
├── docs/
│   ├── architecture.md
│   ├── miner/
│   │   ├── how-to-mine.md
│   │   └── submission.md
│   └── validator/
│       └── setup.md
├── .github/
│   └── workflows/
│       ├── ci.yml # Build, clippy, test, WASM build, release on tags
│       └── release.yml # release-please + artifact publishing
├── AGENTS.md
├── README.md
├── LICENSE
├── CHANGELOG.md
└── .githooks/

- Miner submits a zip package with agent code and task results
- RPC receives submission, verifies signature, relays to validators
- Validators run WASM `validate()`: checks signature, epoch rate limit, Basilica metadata, package size
- 50% validator approval → submission stored in blockchain
- Validators run WASM `evaluate()`:
  - AST validation: checks Python code against import whitelist, forbidden builtins, and dangerous patterns
  - LLM review: optional LLM-based security review via `host_http_post()` (if enabled)
  - Task scoring: scores task results, optionally applies LLM judge per task
  - Aggregate & decay: computes pass rate, applies epoch-based decay
- Agent code & logs stored on-chain for auditability (code ≤ 1MB, logs ≤ 256KB)
- Log consensus — validators propose logs, >50% hash agreement required
- Consensus aggregates scores, applies decay, submits weights to Bittensor
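
The log consensus step above requires more than 50% of validators to agree on a log hash. A minimal sketch of that strict-majority check follows. The function name `majority_hash` and the 32-byte hash width are assumptions made for illustration, not names taken from the codebase.

```rust
use std::collections::BTreeMap;

/// Returns the log hash proposed by a strict majority (> 50%) of validators,
/// or `None` when no hash reaches that threshold.
/// `proposals` holds one hash per validator (32-byte width is an assumption).
fn majority_hash(proposals: &[[u8; 32]]) -> Option<[u8; 32]> {
    // Count how many validators proposed each distinct hash.
    let mut counts: BTreeMap<[u8; 32], usize> = BTreeMap::new();
    for hash in proposals {
        *counts.entry(*hash).or_insert(0) += 1;
    }
    // At most one hash can exceed half of the votes.
    counts
        .into_iter()
        .find(|(_, count)| count * 2 > proposals.len())
        .map(|(hash, _)| hash)
}
```

With three validators, two matching proposals are enough; with two validators that disagree, no hash is accepted.
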
- Dual mode: Challenge logic is available as both a WASM module (`wasm/`) and a native server library (`server/`) implementing `ServerChallenge`
- WASM mode: The `wasm32-unknown-unknown` module is loaded by platform-v2 validators
- Server mode: The `server/` crate implements `ServerChallenge` using `ChallengeDatabase` (sled KV store) for storage and `reqwest` for HTTP
- Host functions (WASM): WASM interacts with the outside world via `host_http_post()`, `host_storage_get()`, `host_storage_set()`, `host_consensus_get_epoch()`, `host_consensus_get_submission_count()`, `host_random_seed()`, `host_get_timestamp()`
- Server storage: The server crate uses `ChallengeDatabase` (sled-backed KV store) via `db.kv_get::<T>(key)` / `db.kv_set(key, &value)` and `reqwest::Client` for HTTP
- SWE-bench datasets: Tasks are selected from HuggingFace `CortexLM/swe-bench` via P2P consensus
- Epoch rate limiting: 1 submission per 3 epochs per miner
- Top agent decay: 60-epoch grace period, then exponential decay with 20-epoch half-life
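
The rate limit and decay figures above can be sketched as two small functions. This is an illustration under stated assumptions, not the code in `scoring.rs`: the function names are hypothetical, "1 submission per 3 epochs" is read as "at least 3 epochs between submissions", and the decay is modeled as a multiplier of 0.5^((e - 60) / 20) once `e` exceeds the 60-epoch grace period.

```rust
/// Grace period (in epochs) before decay starts, as stated above.
const GRACE_EPOCHS: u64 = 60;
/// Half-life (in epochs) of the exponential decay, as stated above.
const HALF_LIFE_EPOCHS: f64 = 20.0;
/// Assumed minimum spacing (in epochs) between submissions from one miner.
const SUBMISSION_INTERVAL_EPOCHS: u64 = 3;

/// Multiplier applied to the top agent's score after `epochs_on_top` epochs.
/// Stays at 1.0 during the grace period, then halves every 20 epochs.
fn decay_multiplier(epochs_on_top: u64) -> f64 {
    if epochs_on_top <= GRACE_EPOCHS {
        return 1.0;
    }
    let decaying = (epochs_on_top - GRACE_EPOCHS) as f64;
    0.5f64.powf(decaying / HALF_LIFE_EPOCHS)
}

/// Assumed rule: a miner with no prior submission may always submit;
/// otherwise at least 3 epochs must have passed since the last one.
fn may_submit(last_submission_epoch: Option<u64>, current_epoch: u64) -> bool {
    match last_submission_epoch {
        None => true,
        Some(last) => current_epoch.saturating_sub(last) >= SUBMISSION_INTERVAL_EPOCHS,
    }
}
```

Under these assumptions the multiplier is 1.0 at epoch 60, 0.5 at epoch 80, and 0.25 at epoch 100.
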
Agent submissions are stored on-chain for auditability and retrieval. The agent_storage module manages three storage categories:
| Storage Key Format | Content | Max Size |
|---|---|---|
| `agent_code:<hotkey>:<epoch>` | Raw zip package bytes | 1 MB (1,048,576 bytes) |
| `agent_hash:<hotkey>:<epoch>` | Hash of the agent package | n/a |
| `agent_logs:<hotkey>:<epoch>` | Serialized `AgentLogs` struct | 256 KB (262,144 bytes) |

- Package size limit: Submissions with `package_zip` exceeding 1 MB are rejected at the storage layer.
- Log size limit: Serialized logs exceeding 256 KB are rejected. Individual task output previews are truncated to 4 KB (4,096 bytes) before storage.
- Key format: Keys are constructed as `<prefix><hotkey_bytes>:<epoch_le_bytes>` using little-endian encoding for the epoch.
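
The key layout and size limits above can be sketched as follows. The helper names, the 32-byte hotkey, and the `u64` epoch are assumptions made for illustration; the real `agent_storage` module may differ.

```rust
/// Size limits quoted in the table and list above.
const MAX_PACKAGE_BYTES: usize = 1_048_576;
const MAX_LOG_BYTES: usize = 262_144;
const MAX_PREVIEW_BYTES: usize = 4_096;

/// Builds `<prefix><hotkey_bytes>:<epoch_le_bytes>`.
/// The 32-byte hotkey and the u64 epoch width are assumptions.
fn storage_key(prefix: &str, hotkey: &[u8; 32], epoch: u64) -> Vec<u8> {
    let mut key = Vec::with_capacity(prefix.len() + hotkey.len() + 1 + 8);
    key.extend_from_slice(prefix.as_bytes());
    key.extend_from_slice(hotkey);
    key.push(b':');
    // Little-endian epoch encoding, as described above.
    key.extend_from_slice(&epoch.to_le_bytes());
    key
}

/// True when the zip package is within the 1 MB limit.
fn package_fits(package_zip: &[u8]) -> bool {
    package_zip.len() <= MAX_PACKAGE_BYTES
}

/// True when the serialized logs are within the 256 KB limit.
fn logs_fit(serialized_logs: &[u8]) -> bool {
    serialized_logs.len() <= MAX_LOG_BYTES
}

/// Truncates a task output preview to at most 4 KB (operates on raw bytes).
fn truncate_preview(output: &[u8]) -> &[u8] {
    &output[..output.len().min(MAX_PREVIEW_BYTES)]
}
```

For example, `storage_key("agent_code:", &hotkey, 1)` yields the prefix bytes, the 32 hotkey bytes, a `:` separator, and the eight bytes `01 00 00 00 00 00 00 00`.
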
The term-cli crate is a native binary (NOT no_std) that provides a terminal user interface for monitoring the term-challenge network.
- Framework: Built with ratatui for TUI rendering
- Transport: Connects to validators via JSON-RPC 2.0 over HTTP
- Target: Standard `x86_64` / `aarch64` native targets (not WASM)
| Tab | Description |
|---|---|
| Leaderboard | Current scores, ranks, and miner hotkeys |
| Evaluation | Live evaluation progress for pending submissions |
| Submission | Recent submission history and status |
| Network | Validator count, epoch info, system health |
| Key | Action |
|---|---|
| `Tab` / `Shift+Tab` | Switch between tabs |
| `↑` / `↓` | Navigate rows |
| `r` | Refresh data |
| `q` | Quit |
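
The tab switching described above can be modeled as a small state machine. This sketch is independent of ratatui and is not taken from `app.rs`; the `Tab` enum, its method names, and the wrap-around behavior at either end are assumptions.

```rust
/// The four tabs listed above, in display order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tab {
    Leaderboard,
    Evaluation,
    Submission,
    Network,
}

impl Tab {
    const ALL: [Tab; 4] = [
        Tab::Leaderboard,
        Tab::Evaluation,
        Tab::Submission,
        Tab::Network,
    ];

    /// Tab reached by pressing `Tab` (assumed to wrap from last to first).
    fn next(self) -> Tab {
        Tab::ALL[(self as usize + 1) % Tab::ALL.len()]
    }

    /// Tab reached by pressing `Shift+Tab` (assumed to wrap from first to last).
    fn previous(self) -> Tab {
        Tab::ALL[(self as usize + Tab::ALL.len() - 1) % Tab::ALL.len()]
    }
}
```

Keeping the cycling logic on the enum lets the event loop map a key press to a single method call, which also makes it easy to unit test without a terminal.
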
- `epoch_current`: Current epoch info (`epochNumber`, `currentBlock`, `phase`, `blocksPerEpoch`, `blockInEpoch`, `progress`)
- `system_health`: Node health status
- `validator_count`: Number of active validators
- `challenge_list`: Auto-detect challenge ID (UUID) when only one exists; response format: `{ challenges: [{ id, name, ... }] }`
- `challenge_call` with `challengeId` (UUID) param and paths:
  - `/leaderboard`: Leaderboard data (includes f64 `weight` per miner)
  - `/stats`: Total submissions and active miners
  - `/decay`: Top agent decay status
  - `/agent/:hotkey/journey`: Evaluation status journey (hotkey is SS58 address)
  - `/agent/:hotkey/logs`: Evaluation logs for a miner
- `evaluation_getProgress`: Evaluation progress for a submission
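
To make the `challenge_call` shape concrete, here is a dependency-free sketch that builds a JSON-RPC 2.0 request body as a string. Only the method name and the `challengeId` parameter come from the list above; the `path` parameter name, the object-style `params`, and both helper functions are assumptions, so check `cli/src/rpc.rs` for the actual wire format.

```rust
/// Escapes the characters that would break a JSON string literal.
fn escape_json(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for c in raw.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a JSON-RPC 2.0 request body for `challenge_call`.
/// The `path` parameter name is an assumption, not a documented field.
fn challenge_call_request(id: u64, challenge_id: &str, path: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"challenge_call","params":{{"challengeId":"{}","path":"{}"}}}}"#,
        id,
        escape_json(challenge_id),
        escape_json(path)
    )
}

/// Fills the `:hotkey` segment of the journey path with an SS58 address.
fn agent_journey_path(hotkey_ss58: &str) -> String {
    format!("/agent/{}/journey", hotkey_ss58)
}
```

A real client would more likely serialize a typed request with `serde_json`; the manual formatting here only keeps the sketch self-contained.
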
# Build library (native)
cargo build --release -p term-challenge-lib
# Build CLI (native)
cargo build --release -p term-cli
# Build storage library (native)
cargo build --release -p term-challenge-storage
# Build WASM module
cargo build --release --target wasm32-unknown-unknown -p term-challenge-wasm
# Build server library
cargo build --release -p term-challenge-server
# Check (no target needed for workspace check)
cargo check -p term-challenge-wasm
cargo check -p term-challenge-server

Git hooks live in `.githooks/` and are activated with `git config core.hooksPath .githooks`.

| Hook | What it does |
|---|---|
| `pre-commit` | Runs `cargo fmt --all`, stages formatted files. Skippable with `SKIP_GIT_HOOKS=1`. |
| `pre-push` | Full quality gate: format check → `cargo check` → `cargo clippy` → `cargo test`. Skippable with `SKIP_GIT_HOOKS=1` or `git push --no-verify`. |

1. No `std` in WASM code. The `wasm/` module compiles with `#![no_std]`. Use `alloc::` equivalents. The `server/` crate uses standard `std`.
2. Cryptographic signatures use sr25519. SS58 prefix 42. Do NOT switch schemes.
3. Conventional commits required. The project uses `release-please`.
4. No `.unwrap()` or `.expect()` in library paths. Use pattern matching or `unwrap_or_default()`.
5. Host functions are the ONLY external interface. No direct HTTP, no filesystem, no `std::net`.
6. Do NOT add `#[allow(dead_code)]` broadly. Fix unused code or remove it.

Note: The `cli/`, `core/`, `lib/`, `server/`, and `storage/` crates are exempt from the `no_std` rule (rule 1) and the host-functions-only rule (rule 5) since they are native code that runs outside the WASM sandbox. Rules 2, 3, 4, and 6 still apply to all crates.
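
As an illustration of the no-`.unwrap()` rule, the sketch below decodes a stored little-endian epoch using pattern matching and `unwrap_or_default()`. The function names and the idea that an epoch is stored as eight raw bytes are assumptions for the example only.

```rust
/// Decodes a little-endian u64 epoch from stored bytes without panicking.
/// Returns `None` unless the slice is exactly eight bytes long.
fn decode_epoch(bytes: &[u8]) -> Option<u64> {
    match bytes {
        [a, b, c, d, e, f, g, h] => {
            Some(u64::from_le_bytes([*a, *b, *c, *d, *e, *f, *g, *h]))
        }
        _ => None,
    }
}

/// Falls back to epoch 0 when the value is missing or malformed,
/// instead of calling `.unwrap()` or `.expect()`.
fn epoch_or_default(stored: Option<&[u8]>) -> u64 {
    stored.and_then(decode_epoch).unwrap_or_default()
}
```

Whether a silent default is acceptable depends on the call site; where a malformed value should be surfaced, returning the `Option` (or a `Result`) to the caller is the safer choice.
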

- Use `alloc::string::String`, `alloc::vec::Vec`, `alloc::collections::BTreeMap` (WASM code)
- Use `serde` with `default-features = false, features = ["derive", "alloc"]` (WASM code)
- Use `bincode` with `default-features = false` for serialization (WASM code)
- Use host functions for all I/O: `host_storage_get/set`, `host_http_post`, `host_consensus_get_epoch`, `host_consensus_get_submission_count`, `host_random_seed`, `host_get_timestamp` (WASM code)
- Keep the `register_challenge!` macro ABI contract intact
- Use standard `std` library features in the `cli/` and `server/` crates
- Use `ChallengeDatabase` KV store (`kv_get`/`kv_set`) for all storage in the `server/` crate

- Do NOT use `std::`, `println!`, `std::collections::HashMap` in WASM code
- Do NOT add heavy dependencies; the WASM module must stay minimal
- Do NOT break the WASM ABI (evaluate, validate, get_name, get_version, get_tasks, configure, alloc, get_routes, handle_route)
- Do NOT store sensitive data in plain text in blockchain storage