AGENTS.md — Term Challenge

Project Purpose

Term Challenge is a WASM evaluation module for AI agents on the Bittensor network via platform-v2. Miners submit Python agent packages (as zip files) that solve SWE-bench tasks. The WASM module runs inside platform-v2 validators to validate submissions, evaluate task results, and compute scores. A companion native CLI (term-cli) provides a TUI for monitoring leaderboards, evaluation progress, and network health. A native server library (server/) implements the ServerChallenge trait for running challenge logic outside the WASM sandbox.

Architecture Overview

term-challenge/
├── Cargo.toml          # workspace with members = [".", "wasm", "cli", "lib", "server", "storage"]
├── src/
│   ├── lib.rs                  # Root library crate entry point
│   └── dataset/
│       ├── mod.rs              # Dataset module re-exports
│       ├── types.rs            # DatasetEntry struct (SWE-forge schema)
│       └── huggingface.rs      # HuggingFaceDataset: download, list, cache
├── wasm/
│   ├── Cargo.toml      # cdylib, depends on platform-challenge-sdk-wasm
│   └── src/
│       ├── lib.rs              # Challenge impl + register_challenge!
│       ├── types.rs            # Submission, TaskDefinition, AgentLogs, etc.
│       ├── scoring.rs          # Aggregate scoring, decay, weight calculation
│       ├── tasks.rs            # Active dataset storage (SWE-bench tasks)
│       ├── dataset.rs          # Dataset selection, consensus, and random index generation
│       ├── routes.rs           # Challenge route definitions and handlers for RPC
│       ├── agent_storage.rs    # Agent code, log, and evaluation status storage
│       ├── ast_validation.rs   # Python AST whitelist validation (imports, builtins, patterns)
│       ├── llm_review.rs       # LLM-based code review, reviewer selection, aggregation
│       ├── submission.rs       # Named submission registry and version tracking
│       └── timeout_handler.rs  # Review assignment timeout tracking and replacement
├── lib/
│   ├── Cargo.toml      # native library, depends on platform-core & platform-challenge-sdk
│   └── src/
│       ├── lib.rs              # Module declarations, re-exports dataset/ChallengeId/Hotkey
│       ├── dataset.rs          # Validator-side dataset management types
│       ├── synthetic/mod.rs    # Synthetic task generation
│       ├── validation/mod.rs   # Validation result types
│       ├── admin/mod.rs        # Admin action types
│       ├── util/mod.rs         # Utility module
│       ├── util/hotkey.rs      # Hotkey parsing helpers (wraps platform_core::Hotkey)
│       ├── cache/mod.rs        # Score caching layer
│       ├── chain/mod.rs        # Chain submission types
│       ├── worker/mod.rs       # Worker job processing types
│       └── bin/term-sudo.rs    # Admin CLI binary
├── server/
│   ├── Cargo.toml      # lib + bin, depends on platform-challenge-sdk (server mode)
│   └── src/
│       ├── lib.rs              # TerminalBenchChallenge implementing ServerChallenge trait
│       ├── main.rs             # Binary entry point: CLI args, ChallengeServer::builder().run()
│       ├── server.rs           # ChallengeServerState axum HTTP wrapper
│       ├── types.rs            # Shared types (std port of wasm/src/types.rs)
│       ├── scoring.rs          # Aggregate scoring, decay (uses ChallengeDatabase)
│       ├── tasks.rs            # Active dataset storage (uses ChallengeDatabase)
│       ├── dataset.rs          # Dataset selection, consensus, random indices
│       ├── routes.rs           # Route definitions + handlers (SDK ChallengeRoute/RouteRequest)
│       ├── agent_storage.rs    # Agent code, log, status storage (uses ChallengeDatabase)
│       ├── ast_validation.rs   # Python AST whitelist validation
│       ├── llm_review.rs       # Async LLM review (reqwest HTTP client)
│       ├── submission.rs       # Named submission registry and version tracking
│       └── timeout_handler.rs  # Review assignment timeout tracking
├── storage/
│   ├── Cargo.toml      # native library, depends on platform-core + platform-challenge-sdk
│   └── src/
│       ├── lib.rs              # Root module, re-exports, From impls for StorageError
│       ├── traits.rs           # ChallengeStorage trait, StorageError, Result alias
│       ├── chain.rs            # Chain storage (sled)
│       └── local.rs            # Local storage (SQLite)
├── cli/
│   ├── Cargo.toml      # native binary, ratatui TUI
│   └── src/
│       ├── main.rs     # Entry point, event loop
│       ├── app.rs      # Application state
│       ├── ui.rs       # Ratatui UI rendering
│       └── rpc.rs      # JSON-RPC 2.0 client
├── docs/
│   ├── architecture.md
│   ├── miner/
│   │   ├── how-to-mine.md
│   │   └── submission.md
│   └── validator/
│       └── setup.md
├── .github/
│   └── workflows/
│       ├── ci.yml          # Build, clippy, test, WASM build, release on tags
│       └── release.yml     # release-please + artifact publishing
├── AGENTS.md
├── README.md
├── LICENSE
├── CHANGELOG.md
└── .githooks/

Data Flow

Miner submits a zip package with agent code and task results
RPC receives submission, verifies signature, relays to validators
Validators run WASM validate() — checks signature, epoch rate limit, Basilica metadata, package size
50% validator approval → submission stored in blockchain
Validators run WASM evaluate():
a. AST validation: checks Python code against the import whitelist, forbidden builtins, and dangerous patterns
b. LLM review: optional LLM-based security review via host_http_post() (if enabled)
c. Task scoring: scores task results, optionally applies an LLM judge per task
d. Aggregate & decay: computes pass rate, applies epoch-based decay
Agent code & logs stored on-chain for auditability (code ≤ 1MB, logs ≤ 256KB)
Log consensus — validators propose logs, >50% hash agreement required
Consensus aggregates scores, applies decay, submits weights to Bittensor
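
The log consensus step above requires more than 50% hash agreement. The following is a minimal sketch of that check, not the module's actual code: the function name `agreed_log_hash` and the 32-byte hash width are assumptions made for illustration. It uses `std` so it builds on its own; inside wasm/ the same logic would use `alloc::collections::BTreeMap`.

```rust
use std::collections::BTreeMap;

/// Returns the log hash proposed by strictly more than half of the
/// validators, if there is one. `proposals` holds one hash per validator.
/// (Hypothetical helper; the 32-byte hash width is an assumption.)
pub fn agreed_log_hash(proposals: &[[u8; 32]]) -> Option<[u8; 32]> {
    // Count how many validators proposed each distinct hash.
    let mut counts: BTreeMap<[u8; 32], usize> = BTreeMap::new();
    for hash in proposals {
        *counts.entry(*hash).or_insert(0) += 1;
    }

    let total = proposals.len();
    counts
        .into_iter()
        // `count * 2 > total` expresses ">50%" without floating point.
        .find(|&(_, count)| count * 2 > total)
        .map(|(hash, _)| hash)
}
```

With three validators, two matching proposals are enough; with two validators that disagree, no hash reaches the threshold and the function returns `None`.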

Key Concepts

Dual mode: Challenge logic is available as both a WASM module (wasm/) and a native server library (server/) implementing ServerChallenge
WASM mode: The wasm32-unknown-unknown module is loaded by platform-v2 validators
Server mode: The server/ crate implements ServerChallenge using ChallengeDatabase (sled KV store) for storage and reqwest for HTTP
Host functions (WASM): WASM interacts with the outside world via host_http_post(), host_storage_get(), host_storage_set(), host_consensus_get_epoch(), host_consensus_get_submission_count(), host_random_seed(), host_get_timestamp()
Server storage: The server crate uses ChallengeDatabase (sled-backed KV store) via db.kv_get::<T>(key) / db.kv_set(key, &value) and reqwest::Client for HTTP
SWE-bench datasets: Tasks are selected from HuggingFace CortexLM/swe-bench via P2P consensus
Epoch rate limiting: 1 submission per 3 epochs per miner
Top agent decay: 60-epoch grace period, then exponential decay with 20-epoch half-life
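
The rate limit and decay parameters above can be sketched as two small functions. This is an illustration under stated assumptions, not the code in scoring.rs: it assumes "1 submission per 3 epochs" means at least 3 epochs must separate two submissions, and that decay is a multiplier of the form 0.5^((t - 60) / 20) applied after the grace period. The names `can_submit` and `decay_multiplier` are hypothetical. `f64::powf` needs `std`, so a `no_std` build would need a different way to compute the power.

```rust
/// Assumed reading of "1 submission per 3 epochs per miner".
const SUBMISSION_INTERVAL_EPOCHS: u64 = 3;
/// Grace period before the top agent starts to decay.
const GRACE_EPOCHS: u64 = 60;
/// Half-life of the exponential decay, in epochs.
const HALF_LIFE_EPOCHS: f64 = 20.0;

/// True if a miner whose last accepted submission was at
/// `last_submission_epoch` may submit again at `current_epoch`.
pub fn can_submit(last_submission_epoch: Option<u64>, current_epoch: u64) -> bool {
    match last_submission_epoch {
        None => true,
        Some(last) => current_epoch.saturating_sub(last) >= SUBMISSION_INTERVAL_EPOCHS,
    }
}

/// Multiplier for the top agent after `epochs_at_top` epochs:
/// 1.0 during the grace period, then halving every 20 epochs.
pub fn decay_multiplier(epochs_at_top: u64) -> f64 {
    if epochs_at_top <= GRACE_EPOCHS {
        return 1.0;
    }
    let elapsed = (epochs_at_top - GRACE_EPOCHS) as f64;
    0.5_f64.powf(elapsed / HALF_LIFE_EPOCHS)
}
```

Under these assumptions the multiplier stays at 1.0 through epoch 60, drops to 0.5 at epoch 80, and to 0.25 at epoch 100.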

Agent Code Storage

Agent submissions are stored on-chain for auditability and retrieval. The agent_storage module manages three storage categories:

Storage Key Format	Content	Max Size
`agent_code:<hotkey>:<epoch>`	Raw zip package bytes	1 MB (1,048,576 bytes)
`agent_hash:<hotkey>:<epoch>`	Hash of the agent package	—
`agent_logs:<hotkey>:<epoch>`	Serialized `AgentLogs` struct	256 KB (262,144 bytes)

Package size limit: Submissions with package_zip exceeding 1 MB are rejected at the storage layer.
Log size limit: Serialized logs exceeding 256 KB are rejected. Individual task output previews are truncated to 4 KB (4,096 bytes) before storage.
Key format: Keys are constructed as <prefix><hotkey_bytes>:<epoch_le_bytes> using little-endian encoding for the epoch.
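
As a rough sketch of the key layout and size limits described above (not the agent_storage module itself): the helper names are hypothetical, the hotkey is taken to be a 32-byte sr25519 public key, and the epoch is assumed to be a `u64`, which the text does not state.

```rust
pub const MAX_PACKAGE_BYTES: usize = 1_048_576; // 1 MB
pub const MAX_LOG_BYTES: usize = 262_144; // 256 KB
pub const MAX_PREVIEW_BYTES: usize = 4_096; // 4 KB

/// Builds `<prefix><hotkey_bytes>:<epoch_le_bytes>`.
/// The prefix carries its own trailing colon, e.g. "agent_code:".
pub fn storage_key(prefix: &str, hotkey: &[u8; 32], epoch: u64) -> Vec<u8> {
    let mut key = Vec::with_capacity(prefix.len() + hotkey.len() + 1 + 8);
    key.extend_from_slice(prefix.as_bytes());
    key.extend_from_slice(hotkey);
    key.push(b':');
    // Little-endian epoch; the u64 width is an assumption.
    key.extend_from_slice(&epoch.to_le_bytes());
    key
}

/// Packages larger than 1 MB are rejected at the storage layer.
pub fn package_fits(package_zip: &[u8]) -> bool {
    package_zip.len() <= MAX_PACKAGE_BYTES
}

/// Serialized logs larger than 256 KB are rejected.
pub fn logs_fit(serialized_logs: &[u8]) -> bool {
    serialized_logs.len() <= MAX_LOG_BYTES
}

/// Truncates a task output preview to at most 4 KB without splitting
/// a UTF-8 character.
pub fn truncate_preview(output: &str) -> &str {
    if output.len() <= MAX_PREVIEW_BYTES {
        return output;
    }
    let mut end = MAX_PREVIEW_BYTES;
    while !output.is_char_boundary(end) {
        end -= 1;
    }
    &output[..end]
}
```

Because the hotkey and epoch are raw bytes, the resulting key is not printable text; the `<hotkey>` and `<epoch>` placeholders in the table are a readable shorthand for this binary layout.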

CLI

The term-cli crate is a native binary (NOT no_std) that provides a terminal user interface for monitoring the term-challenge network.

Design

Framework: Built with ratatui for TUI rendering
Transport: Connects to validators via JSON-RPC 2.0 over HTTP
Target: Standard x86_64 / aarch64 native targets (not WASM)

Available Tabs

Tab	Description
Leaderboard	Current scores, ranks, and miner hotkeys
Evaluation	Live evaluation progress for pending submissions
Submission	Recent submission history and status
Network	Validator count, epoch info, system health

Keyboard Shortcuts

Key	Action
`Tab` / `Shift+Tab`	Switch between tabs
`↑` / `↓`	Navigate rows
`r`	Refresh data
`q`	Quit

RPC Methods Used (platform-v2 protocol)

epoch_current — Current epoch info (epochNumber, currentBlock, phase, blocksPerEpoch, blockInEpoch, progress)
system_health — Node health status
validator_count — Number of active validators
challenge_list — Auto-detect challenge ID (UUID) when only one exists; response format: { challenges: [{ id, name, ... }] }
challenge_call with challengeId (UUID) param and paths:
- /leaderboard — Leaderboard data (includes f64 weight per miner)
- /stats — Total submissions and active miners
- /decay — Top agent decay status
- /agent/:hotkey/journey — Evaluation status journey (hotkey is SS58 address)
- /agent/:hotkey/logs — Evaluation logs for a miner
evaluation_getProgress — Evaluation progress for a submission
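
For orientation, a `challenge_call` request body might look like the output of the sketch below. Only the method name and the `challengeId` parameter come from the list above; the `path` field name and both helper functions are assumptions, and the string formatting does no JSON escaping, so it is only suitable for UUIDs and fixed paths. The actual client in cli/src/rpc.rs may build requests differently.

```rust
/// Builds a JSON-RPC 2.0 request body for `challenge_call`.
/// Assumption: the route is passed in a `path` parameter.
/// No escaping is performed on the inputs.
pub fn challenge_call_body(id: u64, challenge_id: &str, path: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"challenge_call","params":{{"challengeId":"{}","path":"{}"}}}}"#,
        id, challenge_id, path
    )
}

/// Fills in the `/agent/:hotkey/journey` route for an SS58 hotkey.
pub fn journey_path(hotkey_ss58: &str) -> String {
    format!("/agent/{}/journey", hotkey_ss58)
}
```

The `jsonrpc`, `id`, `method`, and `params` members are the standard JSON-RPC 2.0 request envelope; everything inside `params` is specific to platform-v2.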

Build Commands

# Build library (native)
cargo build --release -p term-challenge-lib

# Build CLI (native)
cargo build --release -p term-cli

# Build storage library (native)
cargo build --release -p term-challenge-storage

# Build WASM module
cargo build --release --target wasm32-unknown-unknown -p term-challenge-wasm

# Build server library
cargo build --release -p term-challenge-server

# Check (no target needed for workspace check)
cargo check -p term-challenge-wasm
cargo check -p term-challenge-server

Git Hooks

Git hooks live in .githooks/ and are activated with git config core.hooksPath .githooks.

Hook	What it does
`pre-commit`	Runs `cargo fmt --all`, stages formatted files. Skippable with `SKIP_GIT_HOOKS=1`.
`pre-push`	Full quality gate: format check → `cargo check` → `cargo clippy` → `cargo test`. Skippable with `SKIP_GIT_HOOKS=1` or `git push --no-verify`.

CRITICAL RULES

1. No std in WASM code. The wasm/ module compiles with #![no_std]. Use alloc:: equivalents. The server/ crate uses standard std.
2. Cryptographic signatures use sr25519. SS58 prefix 42. Do NOT switch schemes.
3. Conventional commits required. The project uses release-please.
4. No .unwrap() or .expect() in library paths. Use pattern matching or unwrap_or_default().
5. Host functions are the ONLY external interface. No direct HTTP, no filesystem, no std::net.
6. Do NOT add #[allow(dead_code)] broadly. Fix unused code or remove it.

Note: The cli/, lib/, server/, and storage/ crates are exempt from the no_std rule (rule 1) and the host-functions-only rule (rule 5) because they are native code that runs outside the WASM sandbox. Rules 2, 3, 4, and 6 still apply to all crates.
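
The rule against .unwrap() and .expect() can be followed with ordinary pattern matching. The sketch below decodes a little-endian epoch from stored bytes without panicking on malformed input; the function names are hypothetical and the `u64` epoch width is an assumption. It uses only `core`, so the same code would compile under `#![no_std]`.

```rust
use core::convert::TryFrom;

/// Decodes a little-endian epoch, returning `None` instead of panicking
/// when the slice is not exactly 8 bytes long.
pub fn decode_epoch(bytes: &[u8]) -> Option<u64> {
    match <[u8; 8]>::try_from(bytes) {
        Ok(raw) => Some(u64::from_le_bytes(raw)),
        Err(_) => None,
    }
}

/// Falls back to epoch 0 when nothing is stored or the value is malformed.
pub fn epoch_or_default(stored: Option<&[u8]>) -> u64 {
    stored.and_then(decode_epoch).unwrap_or_default()
}
```

Returning `Option` pushes the decision about malformed data to the caller, while `unwrap_or_default()` is reserved for cases where a zero value is an acceptable fallback.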

DO / DO NOT

DO

Use alloc::string::String, alloc::vec::Vec, alloc::collections::BTreeMap (WASM code)
Use serde with default-features = false, features = ["derive", "alloc"] (WASM code)
Use bincode with default-features = false for serialization (WASM code)
Use host functions for all I/O: host_storage_get/set, host_http_post, host_consensus_get_epoch, host_consensus_get_submission_count, host_random_seed, host_get_timestamp (WASM code)
Keep the register_challenge! macro ABI contract intact
Use standard std library features in the cli/ and server/ crates
Use ChallengeDatabase KV store (kv_get/kv_set) for all storage in the server/ crate
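
A short sketch of what the alloc-based style looks like in practice. The `TaskOutcome` struct and the three functions are hypothetical stand-ins (the real types live in wasm/src/types.rs), and the crate-level `#![no_std]` attribute is left out so the snippet also builds as an ordinary crate.

```rust
// In wasm/ the crate root would also declare `#![no_std]`.
extern crate alloc;

use alloc::collections::BTreeMap;
use alloc::string::String;
use alloc::vec::Vec;

/// Hypothetical per-task outcome used only for this illustration.
pub struct TaskOutcome {
    pub task_id: String,
    pub passed: bool,
}

/// Fraction of tasks that passed, or 0.0 for an empty result set.
pub fn pass_rate(outcomes: &[TaskOutcome]) -> f64 {
    if outcomes.is_empty() {
        return 0.0;
    }
    let passed = outcomes.iter().filter(|o| o.passed).count();
    passed as f64 / outcomes.len() as f64
}

/// Ids of the tasks that did not pass, in input order.
pub fn failed_task_ids(outcomes: &[TaskOutcome]) -> Vec<String> {
    outcomes
        .iter()
        .filter(|o| !o.passed)
        .map(|o| o.task_id.clone())
        .collect()
}

/// Indexes outcomes by task id. BTreeMap stands in for HashMap,
/// which is not available without std.
pub fn index_by_task(outcomes: &[TaskOutcome]) -> BTreeMap<String, bool> {
    outcomes
        .iter()
        .map(|o| (o.task_id.clone(), o.passed))
        .collect()
}
```

BTreeMap also iterates in key order, which keeps any output derived from it deterministic across validators.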

DO NOT

Do NOT use std::, println!, std::collections::HashMap in WASM code
Do NOT add heavy dependencies — the WASM module must stay minimal
Do NOT break the WASM ABI (evaluate, validate, get_name, get_version, get_tasks, configure, alloc, get_routes, handle_route)
Do NOT store sensitive data in plain text in blockchain storage
