2.3.0 (2026-03-11)

Bug Fixes

add --break-system-packages for pip installs + pip.conf bypass PEP 668 (14430c4)
allow clippy too_many_arguments for run_task_pipeline (6eb69c2)
auto-install deps, python3 symlink, detect full commands in fail_to_pass, language-aware test scripts (a38497f)
config test race condition with env var mutex (2963325)
correct Basilica API types and SSH key support (63d8174)
enable apt/sudo in Basilica containers (d83cb8c)
expose agent_output and agent_patch in TaskResult and API responses (348c251)
extract_agent_only for /evaluate - no tasks/ dir required (2b90ee1)
filter out apt-get/system commands from install (Basilica blocks syscalls), keep project-level installs (e5365da)
full clone for commit checkout, explicit pip/pytest symlinks (a0c1d6f)
handle null test_patch from HuggingFace API (deserialize null as empty string) (492d068)
increase clone/install timeout from 180s to 600s (95cecc3)
install base tools, runtimes, and filter redundant deps for Basilica (80a3a0c)
install corepack/yarn/pnpm globally via npm in Dockerfile (b7183e8)
move workspace to /home/agent/sessions, fix node_modules permissions, improve agent code error handling (1ced355)
normalize repo URL in parse_task (add github.com prefix) (398a6fd)
pip 22 compatibility for base tools and install commands (68bb93f)
remove redundant into_iter() for clippy (eaf2a7c)
report task status incrementally during batch execution (4440fd8)
resolve all clippy warnings for CI (2b3ae9d)
revert Dockerfile git-lfs changes, add GIT_LFS_SKIP_SMUDGE to snapshot clone (7130823)
run agent from repo_dir CWD, use absolute path to agent.py (cc6bcde)
run as root (Basilica blocks sudo), remove sudo prefix logic (477a433)
sudo for apt-get in install commands, add golang/corepack/sudo to Dockerfile (1aceb88)
upgrade Go to 1.23 and Node to 20 LTS in Dockerfile (67ca713)
use :id path params for Axum 0.7 (not {id} which is 0.8) (5dfa0c1)

Features

/evaluate endpoint using stored agent + TRUSTED_VALIDATORS whitelist (b6aee7a)
add /code-hash endpoint for code integrity verification (0a8e01b)
add /upload-agent-json endpoint for JSON-based agent upload (9cfa1da)
add Basilica API client for container provisioning (8a0afca)
add install field from swe-forge dataset, fix default split to train, add openssh-client (737ab1f)
add POST /submit_tasks endpoint + fix HuggingFace dataset compat (d92444c)
agent user with sudo for apt-install, run all commands as non-root agent (e3f574a)
agent ZIP upload frontend with env vars + SUDO_PASSWORD auth (3aa5184)
auto-install language runtimes from install_config version fields (25b2e94)
change default max_concurrent_tasks from 8 to 6, support CONCURRENTLY_TASKS env var (eaba581)
extract full agent project instead of concatenating files (3ac1023)
fat Docker image with all language runtimes (java, rust, pnpm, unzip, etc.) (3855f2d)
fetch task definitions from HF repo (workspace.yaml + tests/), remove auto_install hack (7162a39)
propagate agent_env to run_agent and pass --instruction arg to Python agents (d922264)
replace per-file HF downloads with bulk git clone snapshot (6036b78)
run each task in its own Basilica container via SSH (432107b)
swe-bench/swe-forge integration (814259e)
  - extend WorkspaceConfig with fail_to_pass/pass_to_pass/install_config/difficulty fields
  - parse swe-forge workspace.yaml native fields as test script fallback
  - capture git diff (agent patch) after agent execution
  - add /dataset endpoint to fetch from HuggingFace CortexLM/swe-forge
  - wire fail_to_pass/pass_to_pass in dataset entry conversion
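
Illustration for eaba581 (default max_concurrent_tasks of 6, overridable through the CONCURRENTLY_TASKS env var). This is a minimal Python sketch, not the project's code. The function name and the fallback to the default on a missing, non-numeric, or non-positive value are assumptions made for the example.

```python
import os

# Default taken from the changelog entry (changed from 8 to 6).
DEFAULT_MAX_CONCURRENT_TASKS = 6


def max_concurrent_tasks(env=None) -> int:
    """Hypothetical helper: resolve the task concurrency limit from the environment."""
    env = os.environ if env is None else env
    raw = env.get("CONCURRENTLY_TASKS")
    if raw is None:
        return DEFAULT_MAX_CONCURRENT_TASKS
    try:
        value = int(raw)
    except ValueError:
        # Assumption: an unparsable value falls back to the default.
        return DEFAULT_MAX_CONCURRENT_TASKS
    # Assumption: zero or negative limits are rejected in favor of the default.
    return value if value > 0 else DEFAULT_MAX_CONCURRENT_TASKS
```

Passing a plain dict as `env` keeps the helper easy to test without mutating the process environment.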

2.3.0 (2026-03-03)

Bug Fixes

add --break-system-packages for pip installs + pip.conf bypass PEP 668 (14430c4)
allow clippy too_many_arguments for run_task_pipeline (6eb69c2)
auto-install deps, python3 symlink, detect full commands in fail_to_pass, language-aware test scripts (a38497f)
config test race condition with env var mutex (2963325)
expose agent_output and agent_patch in TaskResult and API responses (348c251)
extract_agent_only for /evaluate - no tasks/ dir required (2b90ee1)
filter out apt-get/system commands from install (Basilica blocks syscalls), keep project-level installs (e5365da)
handle null test_patch from HuggingFace API (deserialize null as empty string) (492d068)
increase clone/install timeout from 180s to 600s (95cecc3)
install corepack/yarn/pnpm globally via npm in Dockerfile (b7183e8)
normalize repo URL in parse_task (add github.com prefix) (398a6fd)
report task status incrementally during batch execution (4440fd8)
run agent from repo_dir CWD, use absolute path to agent.py (cc6bcde)
run as root (Basilica blocks sudo), remove sudo prefix logic (477a433)
sudo for apt-get in install commands, add golang/corepack/sudo to Dockerfile (1aceb88)
upgrade Go to 1.23 and Node to 20 LTS in Dockerfile (67ca713)
use :id path params for Axum 0.7 (not {id} which is 0.8) (5dfa0c1)

Features

/evaluate endpoint using stored agent + TRUSTED_VALIDATORS whitelist (b6aee7a)
add /code-hash endpoint for code integrity verification (0a8e01b)
add /upload-agent-json endpoint for JSON-based agent upload (9cfa1da)
add POST /submit_tasks endpoint + fix HuggingFace dataset compat (d92444c)
agent user with sudo for apt-install, run all commands as non-root agent (e3f574a)
agent ZIP upload frontend with env vars + SUDO_PASSWORD auth (3aa5184)
change default max_concurrent_tasks from 8 to 6, support CONCURRENTLY_TASKS env var (eaba581)
extract full agent project instead of concatenating files (3ac1023)
fat Docker image with all language runtimes (java, rust, pnpm, unzip, etc.) (3855f2d)
fetch task definitions from HF repo (workspace.yaml + tests/), remove auto_install hack (7162a39)
propagate agent_env to run_agent and pass --instruction arg to Python agents (d922264)
swe-bench/swe-forge integration (814259e)
  - extend WorkspaceConfig with fail_to_pass/pass_to_pass/install_config/difficulty fields
  - parse swe-forge workspace.yaml native fields as test script fallback
  - capture git diff (agent patch) after agent execution
  - add /dataset endpoint to fetch from HuggingFace CortexLM/swe-forge
  - wire fail_to_pass/pass_to_pass in dataset entry conversion

2.3.0 (2026-03-02)

Bug Fixes

add --break-system-packages for pip installs + pip.conf bypass PEP 668 (14430c4)
allow clippy too_many_arguments for run_task_pipeline (6eb69c2)
auto-install deps, python3 symlink, detect full commands in fail_to_pass, language-aware test scripts (a38497f)
config test race condition with env var mutex (2963325)
expose agent_output and agent_patch in TaskResult and API responses (348c251)
extract_agent_only for /evaluate - no tasks/ dir required (2b90ee1)
filter out apt-get/system commands from install (Basilica blocks syscalls), keep project-level installs (e5365da)
handle null test_patch from HuggingFace API (deserialize null as empty string) (492d068)
increase clone/install timeout from 180s to 600s (95cecc3)
install corepack/yarn/pnpm globally via npm in Dockerfile (b7183e8)
normalize repo URL in parse_task (add github.com prefix) (398a6fd)
run agent from repo_dir CWD, use absolute path to agent.py (cc6bcde)
run as root (Basilica blocks sudo), remove sudo prefix logic (477a433)
sudo for apt-get in install commands, add golang/corepack/sudo to Dockerfile (1aceb88)
upgrade Go to 1.23 and Node to 20 LTS in Dockerfile (67ca713)
use :id path params for Axum 0.7 (not {id} which is 0.8) (5dfa0c1)

Features

/evaluate endpoint using stored agent + TRUSTED_VALIDATORS whitelist (b6aee7a)
add /code-hash endpoint for code integrity verification (0a8e01b)
add /upload-agent-json endpoint for JSON-based agent upload (9cfa1da)
add POST /submit_tasks endpoint + fix HuggingFace dataset compat (d92444c)
agent user with sudo for apt-install, run all commands as non-root agent (e3f574a)
agent ZIP upload frontend with env vars + SUDO_PASSWORD auth (3aa5184)
change default max_concurrent_tasks from 8 to 6, support CONCURRENTLY_TASKS env var (eaba581)
extract full agent project instead of concatenating files (3ac1023)
fat Docker image with all language runtimes (java, rust, pnpm, unzip, etc.) (3855f2d)
fetch task definitions from HF repo (workspace.yaml + tests/), remove auto_install hack (7162a39)
propagate agent_env to run_agent and pass --instruction arg to Python agents (d922264)
swe-bench/swe-forge integration (814259e)
  - extend WorkspaceConfig with fail_to_pass/pass_to_pass/install_config/difficulty fields
  - parse swe-forge workspace.yaml native fields as test script fallback
  - capture git diff (agent patch) after agent execution
  - add /dataset endpoint to fetch from HuggingFace CortexLM/swe-forge
  - wire fail_to_pass/pass_to_pass in dataset entry conversion

2.3.0 (2026-02-28)

Bug Fixes

auto-install deps, python3 symlink, detect full commands in fail_to_pass, language-aware test scripts (a38497f)
config test race condition with env var mutex (2963325)
expose agent_output and agent_patch in TaskResult and API responses (348c251)
extract_agent_only for /evaluate - no tasks/ dir required (2b90ee1)
filter out apt-get/system commands from install (Basilica blocks syscalls), keep project-level installs (e5365da)
handle null test_patch from HuggingFace API (deserialize null as empty string) (492d068)
increase clone/install timeout from 180s to 600s (95cecc3)
install corepack/yarn/pnpm globally via npm in Dockerfile (b7183e8)
normalize repo URL in parse_task (add github.com prefix) (398a6fd)
run as root (Basilica blocks sudo), remove sudo prefix logic (477a433)
sudo for apt-get in install commands, add golang/corepack/sudo to Dockerfile (1aceb88)
use :id path params for Axum 0.7 (not {id} which is 0.8) (5dfa0c1)

Features

/evaluate endpoint using stored agent + TRUSTED_VALIDATORS whitelist (b6aee7a)
add POST /submit_tasks endpoint + fix HuggingFace dataset compat (d92444c)
agent user with sudo for apt-install, run all commands as non-root agent (e3f574a)
agent ZIP upload frontend with env vars + SUDO_PASSWORD auth (3aa5184)
fat Docker image with all language runtimes (java, rust, pnpm, unzip, etc.) (3855f2d)
fetch task definitions from HF repo (workspace.yaml + tests/), remove auto_install hack (7162a39)
swe-bench/swe-forge integration: extend WorkspaceConfig with fail_to_pass/pass_to_pass/install_config/difficulty fields; parse swe-forge workspace.yaml native fields as test script fallback; capture git diff (agent patch) after agent execution; add /dataset endpoint to fetch from HuggingFace CortexLM/swe-forge; wire fail_to_pass/pass_to_pass in dataset entry conversion (814259e)

2.3.0 (2026-02-28)

Bug Fixes

auto-install deps, python3 symlink, detect full commands in fail_to_pass, language-aware test scripts (a38497f)
config test race condition with env var mutex (2963325)
expose agent_output and agent_patch in TaskResult and API responses (348c251)
extract_agent_only for /evaluate - no tasks/ dir required (2b90ee1)
filter out apt-get/system commands from install (Basilica blocks syscalls), keep project-level installs (e5365da)
handle null test_patch from HuggingFace API (deserialize null as empty string) (492d068)
install corepack/yarn/pnpm globally via npm in Dockerfile (b7183e8)
normalize repo URL in parse_task (add github.com prefix) (398a6fd)
run as root (Basilica blocks sudo), remove sudo prefix logic (477a433)
sudo for apt-get in install commands, add golang/corepack/sudo to Dockerfile (1aceb88)
use :id path params for Axum 0.7 (not {id} which is 0.8) (5dfa0c1)

Features

/evaluate endpoint using stored agent + TRUSTED_VALIDATORS whitelist (b6aee7a)
add POST /submit_tasks endpoint + fix HuggingFace dataset compat (d92444c)
agent user with sudo for apt-install, run all commands as non-root agent (e3f574a)
agent ZIP upload frontend with env vars + SUDO_PASSWORD auth (3aa5184)
fat Docker image with all language runtimes (java, rust, pnpm, unzip, etc.) (3855f2d)
fetch task definitions from HF repo (workspace.yaml + tests/), remove auto_install hack (7162a39)
swe-bench/swe-forge integration: extend WorkspaceConfig with fail_to_pass/pass_to_pass/install_config/difficulty fields; parse swe-forge workspace.yaml native fields as test script fallback; capture git diff (agent patch) after agent execution; add /dataset endpoint to fetch from HuggingFace CortexLM/swe-forge; wire fail_to_pass/pass_to_pass in dataset entry conversion (814259e)

2.2.0 (2026-02-20)

Features

evaluation: add evaluation module using platform-challenge-sdk types (#6) (78a369e)

2.1.0 (2026-02-20)

Features

integrate HuggingFace dataset handler with task/evaluation system (db3ba95)

2.0.0 (2026-02-18)

Features

auth: replace static hotkey/API-key auth with Bittensor validator whitelisting and 50% consensus (#5) (a573ad0)

BREAKING CHANGES

auth: WORKER_API_KEY env var and X-Api-Key header no longer required. All validators on Bittensor netuid 100 with sufficient stake are auto-whitelisted.
ci: trigger CI run
fix(security): address auth bypass, input validation, and config issues

Move nonce consumption AFTER signature verification in verify_request() to prevent attackers from burning legitimate nonces via invalid signatures
Fix TOCTOU race in NonceStore::check_and_insert() using atomic DashMap entry API instead of separate contains_key + insert
Add input length limits for auth headers (hotkey 128B, nonce 256B, signature 256B) to prevent memory exhaustion via oversized values
Add consensus_threshold validation in Config::from_env() — must be in range (0.0, 1.0], panics at startup if invalid
Add saturating conversion for consensus required calculation to prevent integer overflow on f64→usize cast
Add tests for all security fixes
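
The ordering and atomicity fixes listed above can be illustrated with a minimal std-only sketch. It is not the project's code: a Mutex<HashSet> stands in for the DashMap entry API, the sr25519 check is reduced to a hypothetical signature_ok flag, and the AuthError variants and constant names are invented for illustration. Only the byte limits (128/256/256) and the names verify_request and NonceStore::check_and_insert come from the entries above.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

// Byte limits quoted in the changelog entry; the constant names are assumed.
const MAX_HOTKEY_LEN: usize = 128;
const MAX_NONCE_LEN: usize = 256;
const MAX_SIGNATURE_LEN: usize = 256;

/// Hypothetical stand-in for the real NonceStore, which uses DashMap.
pub struct NonceStore {
    seen: Mutex<HashSet<String>>,
}

impl NonceStore {
    pub fn new() -> Self {
        Self { seen: Mutex::new(HashSet::new()) }
    }

    /// Returns true if the nonce was fresh. The membership check and the
    /// insert happen under one lock, so two threads cannot both see "unused"
    /// (the TOCTOU window of a separate contains_key + insert).
    pub fn check_and_insert(&self, nonce: &str) -> bool {
        let mut seen = self.seen.lock().unwrap_or_else(|e| e.into_inner());
        seen.insert(nonce.to_string())
    }
}

#[derive(Debug, PartialEq)]
pub enum AuthError {
    HeaderTooLong,
    BadSignature,
    ReplayedNonce,
}

pub fn verify_request(
    store: &NonceStore,
    hotkey: &str,
    nonce: &str,
    signature: &str,
    signature_ok: bool, // placeholder for the real signature verification
) -> Result<(), AuthError> {
    // 1. Reject oversized headers before doing any other work.
    if hotkey.len() > MAX_HOTKEY_LEN
        || nonce.len() > MAX_NONCE_LEN
        || signature.len() > MAX_SIGNATURE_LEN
    {
        return Err(AuthError::HeaderTooLong);
    }
    // 2. Verify the signature first, so an invalid request cannot burn a nonce.
    if !signature_ok {
        return Err(AuthError::BadSignature);
    }
    // 3. Only now consume the nonce, atomically.
    if !store.check_and_insert(nonce) {
        return Err(AuthError::ReplayedNonce);
    }
    Ok(())
}
```

In this sketch a request with a bad signature leaves the nonce unused, so the legitimate sender can still submit it once; a second valid request with the same nonce is rejected as a replay.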

fix(dead-code): remove orphaned default_concurrent fn and unnecessary allow(dead_code)
fix: code quality issues in bittensor validator consensus

Extract magic number 100 to configurable MAX_PENDING_CONSENSUS
Restore #[allow(dead_code)] on DEFAULT_MAX_OUTPUT_BYTES constant
Use anyhow::Context instead of map_err(anyhow::anyhow!) in validator_whitelist

fix(security): address race condition, config panic, SS58 checksum, and container security

consensus.rs: Fix TOCTOU race condition in record_vote by using DashMap entry API (remove_entry) to atomically check votes and remove entry while holding the shard lock, preventing concurrent threads from inserting votes between drop and remove
config.rs: Replace assert! with proper Result<Self, String> return from Config::from_env() to avoid panicking in production on invalid CONSENSUS_THRESHOLD values
main.rs: Update Config::from_env() call to handle Result with expect
auth.rs: Add SS58 checksum verification using Blake2b-512 (correct Substrate algorithm) in ss58_to_public_key_bytes to reject addresses with corrupted checksums; previously only decoded base58 without validating the 2-byte checksum suffix
Dockerfile: Add non-root executor user for container runtime security
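
A rough sketch of the config.rs change described above, together with the earlier threshold-range and saturating-conversion fixes. The function names parse_threshold and required_votes, the 0.5 default, and the ceil-based vote formula are assumptions for illustration; only the (0.0, 1.0] range, the Result<_, String> error style, and the CONSENSUS_THRESHOLD variable name come from the entries above.

```rust
/// Parses a consensus threshold, accepting only values in (0.0, 1.0].
/// Returns an error string instead of panicking on invalid input.
pub fn parse_threshold(raw: Option<&str>) -> Result<f64, String> {
    let value = match raw {
        // Assumed default when the variable is unset.
        None => return Ok(0.5),
        Some(s) => s
            .trim()
            .parse::<f64>()
            .map_err(|e| format!("CONSENSUS_THRESHOLD is not a number: {e}"))?,
    };
    // NaN fails both comparisons, so it is rejected here as well.
    if value > 0.0 && value <= 1.0 {
        Ok(value)
    } else {
        Err(format!("CONSENSUS_THRESHOLD must be in (0.0, 1.0], got {value}"))
    }
}

/// Number of votes needed for consensus (assumed formula: ceil(total * threshold)).
/// Float-to-integer `as` casts saturate in Rust 1.45 and later (NaN becomes 0),
/// and the clamp keeps the result between 1 and the validator count.
pub fn required_votes(total_validators: usize, threshold: f64) -> usize {
    let raw = (total_validators as f64 * threshold).ceil();
    (raw as usize).clamp(1, total_validators.max(1))
}
```

A Config::from_env built this way could call parse_threshold(std::env::var("CONSENSUS_THRESHOLD").ok().as_deref()) and propagate the error with `?`, leaving the caller in main.rs to decide how to report it.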

fix(dead-code): remove unused max_output_bytes config field and constant

Remove DEFAULT_MAX_OUTPUT_BYTES constant and max_output_bytes Config field that were defined and populated from env but never read anywhere outside config.rs. Both had #[allow(dead_code)] annotations suppressing warnings.

fix(quality): replace expect/unwrap with proper error handling, extract magic numbers to constants

main.rs: Replace .expect() on Config::from_env() with match + tracing::error! + process::exit(1)
validator_whitelist.rs: Extract retry count (3) and backoff base (2) to named constants
validator_whitelist.rs: Replace unwrap_or_else on Option with if-let pattern
consensus.rs: Extract reaper interval (30s) to REAPER_INTERVAL_SECS constant
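
For the retry constants mentioned above, a minimal sketch of exponential backoff with named constants might look like the following. The constant names, the delay formula (base raised to the attempt number, in seconds) and the injected sleep callback are assumptions; the entries only state that the retry count is 3 and the backoff base is 2.

```rust
use std::time::Duration;

// Values from the changelog entry; names are hypothetical.
const MAX_RETRIES: u32 = 3;
const BACKOFF_BASE_SECS: u64 = 2;

/// Delay before retry number `attempt` (1-based): base^attempt seconds.
pub fn backoff_delay(attempt: u32) -> Duration {
    Duration::from_secs(BACKOFF_BASE_SECS.saturating_pow(attempt))
}

/// Runs `op` up to MAX_RETRIES times, calling `sleep` between failed attempts.
/// `sleep` is injected so the sketch can be exercised without real waiting.
pub fn retry_with_backoff<T, E>(
    mut op: impl FnMut(u32) -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Last allowed attempt failed: give up and surface the error.
            Err(e) if attempt + 1 >= MAX_RETRIES => return Err(e),
            Err(_) => {
                sleep(backoff_delay(attempt + 1));
                attempt += 1;
            }
        }
    }
}
```

Passing std::thread::sleep as the second argument gives blocking behaviour; an async caller would need an awaitable equivalent instead.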

fix(security): address multiple security vulnerabilities in PR files

consensus.rs: Remove archive_data storage from PendingConsensus to prevent memory exhaustion (up to 50GB with 100 pending × 500MB each). Callers now use their own archive bytes since all votes for the same hash have identical data.
handlers.rs: Stream multipart upload with per-chunk size enforcement instead of buffering entire archive before checking size limit. Sanitize error messages to not leak internal details (file paths, extraction errors) to clients; log details server-side instead.
auth.rs: Add nonce format validation requiring non-empty printable ASCII characters (defense-in-depth against log injection and empty nonce edge cases).
main.rs: Replace .unwrap() on TcpListener::bind and axum::serve with proper error logging and process::exit per AGENTS.md rules.
ws.rs: Replace .unwrap() on serde_json::to_string with unwrap_or_default() to comply with AGENTS.md no-unwrap rule.
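
The per-chunk size enforcement and nonce format check described above can be sketched with the standard library alone. Here std::io::Read stands in for the multipart stream, and the function names, the 8 KiB buffer and the choice to accept spaces as printable are assumptions, not details taken from handlers.rs or auth.rs.

```rust
use std::io::{self, Read};

/// Reads `source` chunk by chunk and fails as soon as the running total would
/// exceed `max_bytes`, instead of buffering everything and checking afterwards.
pub fn read_with_limit<R: Read>(mut source: R, max_bytes: usize) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut chunk = [0u8; 8192];
    loop {
        let n = source.read(&mut chunk)?;
        if n == 0 {
            return Ok(out); // end of stream
        }
        if out.len() + n > max_bytes {
            // Generic message for the client; details would be logged server-side.
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "archive exceeds size limit",
            ));
        }
        out.extend_from_slice(&chunk[..n]);
    }
}

/// Accepts only non-empty nonces made of printable ASCII (0x20..=0x7e),
/// which rules out newlines and other control characters used in log injection.
pub fn is_valid_nonce(nonce: &str) -> bool {
    !nonce.is_empty() && nonce.bytes().all(|b| (0x20..=0x7e).contains(&b))
}
```

With this shape, memory use is bounded by the limit plus one chunk, whatever size the client claims or sends.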

fix(dead-code): rename misleading underscore-prefixed variable in consensus
fix(quality): replace unwrap/expect with proper error handling in production code

main.rs:21: Replace .parse().unwrap() on tracing directive with unwrap_or_else fallback to INFO level directive
main.rs:36: Replace .expect() on workspace dir creation with error log + process::exit(1) pattern
main.rs:110: Replace .expect() on ctrl_c handler with if-let-Err that logs and returns gracefully
executor.rs:189: Replace semaphore.acquire().unwrap() with match that handles closed semaphore by creating a failed TaskResult

All changes follow AGENTS.md rule: no .unwrap()/.expect() in production code paths. Test code is unchanged.

docs: refresh AGENTS.md

1.2.0 (2026-02-17)

Features

auth: add sr25519 signature + nonce verification (dc8d8d4)
auth: require API key alongside whitelisted hotkey (#3) (887f72b)

1.1.0 (2026-02-17)

Features

executor: add SWE-bench batch evaluation with hotkey auth and WebSocket streaming (#2) (8bfa8ee)

1.0.0 (2026-02-17)

Bug Fixes

bump Rust Docker image to 1.85 for edition2024 support (209f460)
lowercase GHCR image tags for Docker push (89449f9)
remove target-cpu=native to avoid SIGILL on Blacksmith runners (22bcb85)
use rust:1.93-bookworm Docker image (ddd1a24)

Features

initial term-executor — remote evaluation server for Basilica (18f4f67)
production-ready implementation with Basilica integration (5797025)

Performance Improvements

minimal Docker image - remove all language runtimes from executor (38058e8)
