2.3.0 (2026-03-11)
- add --break-system-packages for pip installs + pip.conf bypass PEP 668 (14430c4)
- allow clippy too_many_arguments for run_task_pipeline (6eb69c2)
- auto-install deps, python3 symlink, detect full commands in fail_to_pass, language-aware test scripts (a38497f)
- config test race condition with env var mutex (2963325)
- correct Basilica API types and SSH key support (63d8174)
- enable apt/sudo in Basilica containers (d83cb8c)
- expose agent_output and agent_patch in TaskResult and API responses (348c251)
- extract_agent_only for /evaluate - no tasks/ dir required (2b90ee1)
- filter out apt-get/system commands from install (Basilica blocks syscalls), keep project-level installs (e5365da)
- full clone for commit checkout, explicit pip/pytest symlinks (a0c1d6f)
- handle null test_patch from HuggingFace API (deserialize null as empty string) (492d068)
- increase clone/install timeout from 180s to 600s (95cecc3)
- install base tools, runtimes, and filter redundant deps for Basilica (80a3a0c)
- install corepack/yarn/pnpm globally via npm in Dockerfile (b7183e8)
- move workspace to /home/agent/sessions, fix node_modules permissions, improve agent code error handling (1ced355)
- normalize repo URL in parse_task (add github.com prefix) (398a6fd)
- pip 22 compatibility for base tools and install commands (68bb93f)
- remove redundant into_iter() for clippy (eaf2a7c)
- report task status incrementally during batch execution (4440fd8)
- resolve all clippy warnings for CI (2b3ae9d)
- revert Dockerfile git-lfs changes, add GIT_LFS_SKIP_SMUDGE to snapshot clone (7130823)
- run agent from repo_dir CWD, use absolute path to agent.py (cc6bcde)
- run as root (Basilica blocks sudo), remove sudo prefix logic (477a433)
- sudo for apt-get in install commands, add golang/corepack/sudo to Dockerfile (1aceb88)
- upgrade Go to 1.23 and Node to 20 LTS in Dockerfile (67ca713)
- use :id path params for Axum 0.7 (not {id} which is 0.8) (5dfa0c1)
- /evaluate endpoint using stored agent + TRUSTED_VALIDATORS whitelist (b6aee7a)
- add /code-hash endpoint for code integrity verification (0a8e01b)
- add /upload-agent-json endpoint for JSON-based agent upload (9cfa1da)
- add Basilica API client for container provisioning (8a0afca)
- add install field from swe-forge dataset, fix default split to train, add openssh-client (737ab1f)
- add POST /submit_tasks endpoint + fix HuggingFace dataset compat (d92444c)
- agent user with sudo for apt-install, run all commands as non-root agent (e3f574a)
- agent ZIP upload frontend with env vars + SUDO_PASSWORD auth (3aa5184)
- auto-install language runtimes from install_config version fields (25b2e94)
- change default max_concurrent_tasks from 8 to 6, support CONCURRENTLY_TASKS env var (eaba581)
- extract full agent project instead of concatenating files (3ac1023)
- fat Docker image with all language runtimes (java, rust, pnpm, unzip, etc.) (3855f2d)
- fetch task definitions from HF repo (workspace.yaml + tests/), remove auto_install hack (7162a39)
- propagate agent_env to run_agent and pass --instruction arg to Python agents (d922264)
- replace per-file HF downloads with bulk git clone snapshot (6036b78)
- run each task in its own Basilica container via SSH (432107b)
- swe-bench/swe-forge integration (814259e)
  - extend WorkspaceConfig with fail_to_pass/pass_to_pass/install_config/difficulty fields
  - parse swe-forge workspace.yaml native fields as test script fallback
  - capture git diff (agent patch) after agent execution
  - add /dataset endpoint to fetch from HuggingFace CortexLM/swe-forge
  - wire fail_to_pass/pass_to_pass in dataset entry conversion
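
To illustrate the parse_task entry above (398a6fd), here is a minimal sketch of what adding a github.com prefix to a repo reference could look like. The function name `normalize_repo_url`, the `https://` scheme, and the handling of inputs that already carry a host are assumptions for illustration, not the project's actual code.

```rust
/// Hypothetical helper: expand an `owner/repo` shorthand into a full GitHub URL.
/// Inputs that already carry an http(s) scheme are returned unchanged.
fn normalize_repo_url(repo: &str) -> String {
    let trimmed = repo.trim();
    if trimmed.starts_with("http://") || trimmed.starts_with("https://") {
        // Already a full URL: leave it alone.
        trimmed.to_string()
    } else if trimmed.starts_with("github.com/") {
        // Host is present but the scheme is missing (scheme choice is an assumption).
        format!("https://{}", trimmed)
    } else {
        // Bare `owner/repo`: add the github.com prefix.
        format!("https://github.com/{}", trimmed.trim_start_matches('/'))
    }
}

fn main() {
    println!("{}", normalize_repo_url("owner/repo"));
    println!("{}", normalize_repo_url("github.com/owner/repo"));
    println!("{}", normalize_repo_url("https://github.com/owner/repo"));
}
```

All three calls in `main` resolve to the same URL, which is the point of normalizing before cloning.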
2.3.0 (2026-03-03)
- add --break-system-packages for pip installs + pip.conf bypass PEP 668 (14430c4)
- allow clippy too_many_arguments for run_task_pipeline (6eb69c2)
- auto-install deps, python3 symlink, detect full commands in fail_to_pass, language-aware test scripts (a38497f)
- config test race condition with env var mutex (2963325)
- expose agent_output and agent_patch in TaskResult and API responses (348c251)
- extract_agent_only for /evaluate - no tasks/ dir required (2b90ee1)
- filter out apt-get/system commands from install (Basilica blocks syscalls), keep project-level installs (e5365da)
- handle null test_patch from HuggingFace API (deserialize null as empty string) (492d068)
- increase clone/install timeout from 180s to 600s (95cecc3)
- install corepack/yarn/pnpm globally via npm in Dockerfile (b7183e8)
- normalize repo URL in parse_task (add github.com prefix) (398a6fd)
- report task status incrementally during batch execution (4440fd8)
- run agent from repo_dir CWD, use absolute path to agent.py (cc6bcde)
- run as root (Basilica blocks sudo), remove sudo prefix logic (477a433)
- sudo for apt-get in install commands, add golang/corepack/sudo to Dockerfile (1aceb88)
- upgrade Go to 1.23 and Node to 20 LTS in Dockerfile (67ca713)
- use :id path params for Axum 0.7 (not {id} which is 0.8) (5dfa0c1)
- /evaluate endpoint using stored agent + TRUSTED_VALIDATORS whitelist (b6aee7a)
- add /code-hash endpoint for code integrity verification (0a8e01b)
- add /upload-agent-json endpoint for JSON-based agent upload (9cfa1da)
- add POST /submit_tasks endpoint + fix HuggingFace dataset compat (d92444c)
- agent user with sudo for apt-install, run all commands as non-root agent (e3f574a)
- agent ZIP upload frontend with env vars + SUDO_PASSWORD auth (3aa5184)
- change default max_concurrent_tasks from 8 to 6, support CONCURRENTLY_TASKS env var (eaba581)
- extract full agent project instead of concatenating files (3ac1023)
- fat Docker image with all language runtimes (java, rust, pnpm, unzip, etc.) (3855f2d)
- fetch task definitions from HF repo (workspace.yaml + tests/), remove auto_install hack (7162a39)
- propagate agent_env to run_agent and pass --instruction arg to Python agents (d922264)
- swe-bench/swe-forge integration (814259e)
  - extend WorkspaceConfig with fail_to_pass/pass_to_pass/install_config/difficulty fields
  - parse swe-forge workspace.yaml native fields as test script fallback
  - capture git diff (agent patch) after agent execution
  - add /dataset endpoint to fetch from HuggingFace CortexLM/swe-forge
  - wire fail_to_pass/pass_to_pass in dataset entry conversion
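
As a rough illustration of the max_concurrent_tasks entry above (eaba581), the sketch below resolves the limit from the CONCURRENTLY_TASKS environment variable and falls back to the new default of 6. The function name `resolve_max_concurrent_tasks` and the rule that invalid or zero values fall back to the default are assumptions; only the default value and the variable name come from the changelog.

```rust
/// Default stated in the changelog entry (changed from 8 to 6).
const DEFAULT_MAX_CONCURRENT_TASKS: usize = 6;

/// Hypothetical helper: parse the raw env value, falling back to the default
/// when it is missing, not a number, or zero (fallback rules are assumptions).
fn resolve_max_concurrent_tasks(raw: Option<&str>) -> usize {
    raw.and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|n| *n > 0)
        .unwrap_or(DEFAULT_MAX_CONCURRENT_TASKS)
}

fn main() {
    // Read the variable named in the changelog; absent means "use the default".
    let raw = std::env::var("CONCURRENTLY_TASKS").ok();
    let limit = resolve_max_concurrent_tasks(raw.as_deref());
    println!("max_concurrent_tasks = {}", limit);
}
```

Keeping the parsing in a function that takes `Option<&str>` means it can be unit-tested without mutating process-wide environment state.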
2.3.0 (2026-03-02)
- add --break-system-packages for pip installs + pip.conf bypass PEP 668 (14430c4)
- allow clippy too_many_arguments for run_task_pipeline (6eb69c2)
- auto-install deps, python3 symlink, detect full commands in fail_to_pass, language-aware test scripts (a38497f)
- config test race condition with env var mutex (2963325)
- expose agent_output and agent_patch in TaskResult and API responses (348c251)
- extract_agent_only for /evaluate - no tasks/ dir required (2b90ee1)
- filter out apt-get/system commands from install (Basilica blocks syscalls), keep project-level installs (e5365da)
- handle null test_patch from HuggingFace API (deserialize null as empty string) (492d068)
- increase clone/install timeout from 180s to 600s (95cecc3)
- install corepack/yarn/pnpm globally via npm in Dockerfile (b7183e8)
- normalize repo URL in parse_task (add github.com prefix) (398a6fd)
- run agent from repo_dir CWD, use absolute path to agent.py (cc6bcde)
- run as root (Basilica blocks sudo), remove sudo prefix logic (477a433)
- sudo for apt-get in install commands, add golang/corepack/sudo to Dockerfile (1aceb88)
- upgrade Go to 1.23 and Node to 20 LTS in Dockerfile (67ca713)
- use :id path params for Axum 0.7 (not {id} which is 0.8) (5dfa0c1)
- /evaluate endpoint using stored agent + TRUSTED_VALIDATORS whitelist (b6aee7a)
- add /code-hash endpoint for code integrity verification (0a8e01b)
- add /upload-agent-json endpoint for JSON-based agent upload (9cfa1da)
- add POST /submit_tasks endpoint + fix HuggingFace dataset compat (d92444c)
- agent user with sudo for apt-install, run all commands as non-root agent (e3f574a)
- agent ZIP upload frontend with env vars + SUDO_PASSWORD auth (3aa5184)
- change default max_concurrent_tasks from 8 to 6, support CONCURRENTLY_TASKS env var (eaba581)
- extract full agent project instead of concatenating files (3ac1023)
- fat Docker image with all language runtimes (java, rust, pnpm, unzip, etc.) (3855f2d)
- fetch task definitions from HF repo (workspace.yaml + tests/), remove auto_install hack (7162a39)
- propagate agent_env to run_agent and pass --instruction arg to Python agents (d922264)
- swe-bench/swe-forge integration (814259e)
  - extend WorkspaceConfig with fail_to_pass/pass_to_pass/install_config/difficulty fields
  - parse swe-forge workspace.yaml native fields as test script fallback
  - capture git diff (agent patch) after agent execution
  - add /dataset endpoint to fetch from HuggingFace CortexLM/swe-forge
  - wire fail_to_pass/pass_to_pass in dataset entry conversion
2.3.0 (2026-03-02)
- add --break-system-packages for pip installs + pip.conf bypass PEP 668 (14430c4)
- allow clippy too_many_arguments for run_task_pipeline (6eb69c2)
- auto-install deps, python3 symlink, detect full commands in fail_to_pass, language-aware test scripts (a38497f)
- config test race condition with env var mutex (2963325)
- expose agent_output and agent_patch in TaskResult and API responses (348c251)
- extract_agent_only for /evaluate - no tasks/ dir required (2b90ee1)
- filter out apt-get/system commands from install (Basilica blocks syscalls), keep project-level installs (e5365da)
- handle null test_patch from HuggingFace API (deserialize null as empty string) (492d068)
- increase clone/install timeout from 180s to 600s (95cecc3)
- install corepack/yarn/pnpm globally via npm in Dockerfile (b7183e8)
- normalize repo URL in parse_task (add github.com prefix) (398a6fd)
- run as root (Basilica blocks sudo), remove sudo prefix logic (477a433)
- sudo for apt-get in install commands, add golang/corepack/sudo to Dockerfile (1aceb88)
- upgrade Go to 1.23 and Node to 20 LTS in Dockerfile (67ca713)
- use :id path params for Axum 0.7 (not {id} which is 0.8) (5dfa0c1)
- /evaluate endpoint using stored agent + TRUSTED_VALIDATORS whitelist (b6aee7a)
- add /code-hash endpoint for code integrity verification (0a8e01b)
- add /upload-agent-json endpoint for JSON-based agent upload (9cfa1da)
- add POST /submit_tasks endpoint + fix HuggingFace dataset compat (d92444c)
- agent user with sudo for apt-install, run all commands as non-root agent (e3f574a)
- agent ZIP upload frontend with env vars + SUDO_PASSWORD auth (3aa5184)
- change default max_concurrent_tasks from 8 to 6, support CONCURRENTLY_TASKS env var (eaba581)
- extract full agent project instead of concatenating files (3ac1023)
- fat Docker image with all language runtimes (java, rust, pnpm, unzip, etc.) (3855f2d)
- fetch task definitions from HF repo (workspace.yaml + tests/), remove auto_install hack (7162a39)
- propagate agent_env to run_agent and pass --instruction arg to Python agents (d922264)
- swe-bench/swe-forge integration (814259e)
  - extend WorkspaceConfig with fail_to_pass/pass_to_pass/install_config/difficulty fields
  - parse swe-forge workspace.yaml native fields as test script fallback
  - capture git diff (agent patch) after agent execution
  - add /dataset endpoint to fetch from HuggingFace CortexLM/swe-forge
  - wire fail_to_pass/pass_to_pass in dataset entry conversion
2.3.0 (2026-03-02)
- allow clippy too_many_arguments for run_task_pipeline (6eb69c2)
- auto-install deps, python3 symlink, detect full commands in fail_to_pass, language-aware test scripts (a38497f)
- config test race condition with env var mutex (2963325)
- expose agent_output and agent_patch in TaskResult and API responses (348c251)
- extract_agent_only for /evaluate - no tasks/ dir required (2b90ee1)
- filter out apt-get/system commands from install (Basilica blocks syscalls), keep project-level installs (e5365da)
- handle null test_patch from HuggingFace API (deserialize null as empty string) (492d068)
- increase clone/install timeout from 180s to 600s (95cecc3)
- install corepack/yarn/pnpm globally via npm in Dockerfile (b7183e8)
- normalize repo URL in parse_task (add github.com prefix) (398a6fd)
- run as root (Basilica blocks sudo), remove sudo prefix logic (477a433)
- sudo for apt-get in install commands, add golang/corepack/sudo to Dockerfile (1aceb88)
- upgrade Go to 1.23 and Node to 20 LTS in Dockerfile (67ca713)
- use :id path params for Axum 0.7 (not {id} which is 0.8) (5dfa0c1)
- /evaluate endpoint using stored agent + TRUSTED_VALIDATORS whitelist (b6aee7a)
- add /code-hash endpoint for code integrity verification (0a8e01b)
- add /upload-agent-json endpoint for JSON-based agent upload (9cfa1da)
- add POST /submit_tasks endpoint + fix HuggingFace dataset compat (d92444c)
- agent user with sudo for apt-install, run all commands as non-root agent (e3f574a)
- agent ZIP upload frontend with env vars + SUDO_PASSWORD auth (3aa5184)
- change default max_concurrent_tasks from 8 to 6, support CONCURRENTLY_TASKS env var (eaba581)
- extract full agent project instead of concatenating files (3ac1023)
- fat Docker image with all language runtimes (java, rust, pnpm, unzip, etc.) (3855f2d)
- fetch task definitions from HF repo (workspace.yaml + tests/), remove auto_install hack (7162a39)
- propagate agent_env to run_agent and pass --instruction arg to Python agents (d922264)
- swe-bench/swe-forge integration (814259e)
  - extend WorkspaceConfig with fail_to_pass/pass_to_pass/install_config/difficulty fields
  - parse swe-forge workspace.yaml native fields as test script fallback
  - capture git diff (agent patch) after agent execution
  - add /dataset endpoint to fetch from HuggingFace CortexLM/swe-forge
  - wire fail_to_pass/pass_to_pass in dataset entry conversion
2.3.0 (2026-03-02)
- auto-install deps, python3 symlink, detect full commands in fail_to_pass, language-aware test scripts (a38497f)
- config test race condition with env var mutex (2963325)
- expose agent_output and agent_patch in TaskResult and API responses (348c251)
- extract_agent_only for /evaluate - no tasks/ dir required (2b90ee1)
- filter out apt-get/system commands from install (Basilica blocks syscalls), keep project-level installs (e5365da)
- handle null test_patch from HuggingFace API (deserialize null as empty string) (492d068)
- increase clone/install timeout from 180s to 600s (95cecc3)
- install corepack/yarn/pnpm globally via npm in Dockerfile (b7183e8)
- normalize repo URL in parse_task (add github.com prefix) (398a6fd)
- run as root (Basilica blocks sudo), remove sudo prefix logic (477a433)
- sudo for apt-get in install commands, add golang/corepack/sudo to Dockerfile (1aceb88)
- upgrade Go to 1.23 and Node to 20 LTS in Dockerfile (67ca713)
- use :id path params for Axum 0.7 (not {id} which is 0.8) (5dfa0c1)
- /evaluate endpoint using stored agent + TRUSTED_VALIDATORS whitelist (b6aee7a)
- add /code-hash endpoint for code integrity verification (0a8e01b)
- add /upload-agent-json endpoint for JSON-based agent upload (9cfa1da)
- add POST /submit_tasks endpoint + fix HuggingFace dataset compat (d92444c)
- agent user with sudo for apt-install, run all commands as non-root agent (e3f574a)
- agent ZIP upload frontend with env vars + SUDO_PASSWORD auth (3aa5184)
- change default max_concurrent_tasks from 8 to 6, support CONCURRENTLY_TASKS env var (eaba581)
- fat Docker image with all language runtimes (java, rust, pnpm, unzip, etc.) (3855f2d)
- fetch task definitions from HF repo (workspace.yaml + tests/), remove auto_install hack (7162a39)
- propagate agent_env to run_agent and pass --instruction arg to Python agents (d922264)
- swe-bench/swe-forge integration (814259e)
  - extend WorkspaceConfig with fail_to_pass/pass_to_pass/install_config/difficulty fields
  - parse swe-forge workspace.yaml native fields as test script fallback
  - capture git diff (agent patch) after agent execution
  - add /dataset endpoint to fetch from HuggingFace CortexLM/swe-forge
  - wire fail_to_pass/pass_to_pass in dataset entry conversion
2.3.0 (2026-03-02)
- auto-install deps, python3 symlink, detect full commands in fail_to_pass, language-aware test scripts (a38497f)
- config test race condition with env var mutex (2963325)
- expose agent_output and agent_patch in TaskResult and API responses (348c251)
- extract_agent_only for /evaluate - no tasks/ dir required (2b90ee1)
- filter out apt-get/system commands from install (Basilica blocks syscalls), keep project-level installs (e5365da)
- handle null test_patch from HuggingFace API (deserialize null as empty string) (492d068)
- increase clone/install timeout from 180s to 600s (95cecc3)
- install corepack/yarn/pnpm globally via npm in Dockerfile (b7183e8)
- normalize repo URL in parse_task (add github.com prefix) (398a6fd)
- run as root (Basilica blocks sudo), remove sudo prefix logic (477a433)
- sudo for apt-get in install commands, add golang/corepack/sudo to Dockerfile (1aceb88)
- upgrade Go to 1.23 and Node to 20 LTS in Dockerfile (67ca713)
- use :id path params for Axum 0.7 (not {id} which is 0.8) (5dfa0c1)
- /evaluate endpoint using stored agent + TRUSTED_VALIDATORS whitelist (b6aee7a)
- add /code-hash endpoint for code integrity verification (0a8e01b)
- add /upload-agent-json endpoint for JSON-based agent upload (9cfa1da)
- add POST /submit_tasks endpoint + fix HuggingFace dataset compat (d92444c)
- agent user with sudo for apt-install, run all commands as non-root agent (e3f574a)
- agent ZIP upload frontend with env vars + SUDO_PASSWORD auth (3aa5184)
- change default max_concurrent_tasks from 8 to 6, support CONCURRENTLY_TASKS env var (eaba581)
- fat Docker image with all language runtimes (java, rust, pnpm, unzip, etc.) (3855f2d)
- fetch task definitions from HF repo (workspace.yaml + tests/), remove auto_install hack (7162a39)
- swe-bench/swe-forge integration (814259e)
  - extend WorkspaceConfig with fail_to_pass/pass_to_pass/install_config/difficulty fields
  - parse swe-forge workspace.yaml native fields as test script fallback
  - capture git diff (agent patch) after agent execution
  - add /dataset endpoint to fetch from HuggingFace CortexLM/swe-forge
  - wire fail_to_pass/pass_to_pass in dataset entry conversion
2.3.0 (2026-03-02)
- auto-install deps, python3 symlink, detect full commands in fail_to_pass, language-aware test scripts (a38497f)
- config test race condition with env var mutex (2963325)
- expose agent_output and agent_patch in TaskResult and API responses (348c251)
- extract_agent_only for /evaluate - no tasks/ dir required (2b90ee1)
- filter out apt-get/system commands from install (Basilica blocks syscalls), keep project-level installs (e5365da)
- handle null test_patch from HuggingFace API (deserialize null as empty string) (492d068)
- increase clone/install timeout from 180s to 600s (95cecc3)
- install corepack/yarn/pnpm globally via npm in Dockerfile (b7183e8)
- normalize repo URL in parse_task (add github.com prefix) (398a6fd)
- run as root (Basilica blocks sudo), remove sudo prefix logic (477a433)
- sudo for apt-get in install commands, add golang/corepack/sudo to Dockerfile (1aceb88)
- use :id path params for Axum 0.7 (not {id} which is 0.8) (5dfa0c1)
- /evaluate endpoint using stored agent + TRUSTED_VALIDATORS whitelist (b6aee7a)
- add /code-hash endpoint for code integrity verification (0a8e01b)
- add /upload-agent-json endpoint for JSON-based agent upload (9cfa1da)
- add POST /submit_tasks endpoint + fix HuggingFace dataset compat (d92444c)
- agent user with sudo for apt-install, run all commands as non-root agent (e3f574a)
- agent ZIP upload frontend with env vars + SUDO_PASSWORD auth (3aa5184)
- change default max_concurrent_tasks from 8 to 6, support CONCURRENTLY_TASKS env var (eaba581)
- fat Docker image with all language runtimes (java, rust, pnpm, unzip, etc.) (3855f2d)
- fetch task definitions from HF repo (workspace.yaml + tests/), remove auto_install hack (7162a39)
- swe-bench/swe-forge integration (814259e)
  - extend WorkspaceConfig with fail_to_pass/pass_to_pass/install_config/difficulty fields
  - parse swe-forge workspace.yaml native fields as test script fallback
  - capture git diff (agent patch) after agent execution
  - add /dataset endpoint to fetch from HuggingFace CortexLM/swe-forge
  - wire fail_to_pass/pass_to_pass in dataset entry conversion
2.3.0 (2026-03-02)
- auto-install deps, python3 symlink, detect full commands in fail_to_pass, language-aware test scripts (a38497f)
- config test race condition with env var mutex (2963325)
- expose agent_output and agent_patch in TaskResult and API responses (348c251)
- extract_agent_only for /evaluate - no tasks/ dir required (2b90ee1)
- filter out apt-get/system commands from install (Basilica blocks syscalls), keep project-level installs (e5365da)
- handle null test_patch from HuggingFace API (deserialize null as empty string) (492d068)
- increase clone/install timeout from 180s to 600s (95cecc3)
- install corepack/yarn/pnpm globally via npm in Dockerfile (b7183e8)
- normalize repo URL in parse_task (add github.com prefix) (398a6fd)
- run as root (Basilica blocks sudo), remove sudo prefix logic (477a433)
- sudo for apt-get in install commands, add golang/corepack/sudo to Dockerfile (1aceb88)
- use :id path params for Axum 0.7 (not {id} which is 0.8) (5dfa0c1)
- /evaluate endpoint using stored agent + TRUSTED_VALIDATORS whitelist (b6aee7a)
- add /upload-agent-json endpoint for JSON-based agent upload (9cfa1da)
- add POST /submit_tasks endpoint + fix HuggingFace dataset compat (d92444c)
- agent user with sudo for apt-install, run all commands as non-root agent (e3f574a)
- agent ZIP upload frontend with env vars + SUDO_PASSWORD auth (3aa5184)
- change default max_concurrent_tasks from 8 to 6, support CONCURRENTLY_TASKS env var (eaba581)
- fat Docker image with all language runtimes (java, rust, pnpm, unzip, etc.) (3855f2d)
- fetch task definitions from HF repo (workspace.yaml + tests/), remove auto_install hack (7162a39)
- swe-bench/swe-forge integration (814259e)
  - extend WorkspaceConfig with fail_to_pass/pass_to_pass/install_config/difficulty fields
  - parse swe-forge workspace.yaml native fields as test script fallback
  - capture git diff (agent patch) after agent execution
  - add /dataset endpoint to fetch from HuggingFace CortexLM/swe-forge
  - wire fail_to_pass/pass_to_pass in dataset entry conversion
2.3.0 (2026-03-02)
- auto-install deps, python3 symlink, detect full commands in fail_to_pass, language-aware test scripts (a38497f)
- config test race condition with env var mutex (2963325)
- expose agent_output and agent_patch in TaskResult and API responses (348c251)
- extract_agent_only for /evaluate - no tasks/ dir required (2b90ee1)
- filter out apt-get/system commands from install (Basilica blocks syscalls), keep project-level installs (e5365da)
- handle null test_patch from HuggingFace API (deserialize null as empty string) (492d068)
- increase clone/install timeout from 180s to 600s (95cecc3)
- install corepack/yarn/pnpm globally via npm in Dockerfile (b7183e8)
- normalize repo URL in parse_task (add github.com prefix) (398a6fd)
- run as root (Basilica blocks sudo), remove sudo prefix logic (477a433)
- sudo for apt-get in install commands, add golang/corepack/sudo to Dockerfile (1aceb88)
- use :id path params for Axum 0.7 (not {id} which is 0.8) (5dfa0c1)
- /evaluate endpoint using stored agent + TRUSTED_VALIDATORS whitelist (b6aee7a)
- add /upload-agent-json endpoint for JSON-based agent upload (9cfa1da)
- add POST /submit_tasks endpoint + fix HuggingFace dataset compat (d92444c)
- agent user with sudo for apt-install, run all commands as non-root agent (e3f574a)
- agent ZIP upload frontend with env vars + SUDO_PASSWORD auth (3aa5184)
- fat Docker image with all language runtimes (java, rust, pnpm, unzip, etc.) (3855f2d)
- fetch task definitions from HF repo (workspace.yaml + tests/), remove auto_install hack (7162a39)
- swe-bench/swe-forge integration (814259e)
  - extend WorkspaceConfig with fail_to_pass/pass_to_pass/install_config/difficulty fields
  - parse swe-forge workspace.yaml native fields as test script fallback
  - capture git diff (agent patch) after agent execution
  - add /dataset endpoint to fetch from HuggingFace CortexLM/swe-forge
  - wire fail_to_pass/pass_to_pass in dataset entry conversion
2.3.0 (2026-02-28)
- auto-install deps, python3 symlink, detect full commands in fail_to_pass, language-aware test scripts (a38497f)
- config test race condition with env var mutex (2963325)
- expose agent_output and agent_patch in TaskResult and API responses (348c251)
- extract_agent_only for /evaluate - no tasks/ dir required (2b90ee1)
- filter out apt-get/system commands from install (Basilica blocks syscalls), keep project-level installs (e5365da)
- handle null test_patch from HuggingFace API (deserialize null as empty string) (492d068)
- increase clone/install timeout from 180s to 600s (95cecc3)
- install corepack/yarn/pnpm globally via npm in Dockerfile (b7183e8)
- normalize repo URL in parse_task (add github.com prefix) (398a6fd)
- run as root (Basilica blocks sudo), remove sudo prefix logic (477a433)
- sudo for apt-get in install commands, add golang/corepack/sudo to Dockerfile (1aceb88)
- use :id path params for Axum 0.7 (not {id} which is 0.8) (5dfa0c1)
- /evaluate endpoint using stored agent + TRUSTED_VALIDATORS whitelist (b6aee7a)
- add POST /submit_tasks endpoint + fix HuggingFace dataset compat (d92444c)
- agent user with sudo for apt-install, run all commands as non-root agent (e3f574a)
- agent ZIP upload frontend with env vars + SUDO_PASSWORD auth (3aa5184)
- fat Docker image with all language runtimes (java, rust, pnpm, unzip, etc.) (3855f2d)
- fetch task definitions from HF repo (workspace.yaml + tests/), remove auto_install hack (7162a39)
- swe-bench/swe-forge integration (814259e)
  - extend WorkspaceConfig with fail_to_pass/pass_to_pass/install_config/difficulty fields
  - parse swe-forge workspace.yaml native fields as test script fallback
  - capture git diff (agent patch) after agent execution
  - add /dataset endpoint to fetch from HuggingFace CortexLM/swe-forge
  - wire fail_to_pass/pass_to_pass in dataset entry conversion
2.2.0 (2026-02-20)
2.1.0 (2026-02-20)
- integrate HuggingFace dataset handler with task/evaluation system (db3ba95)
2.0.0 (2026-02-18)
- auth: replace static hotkey/API-key auth with Bittensor validator whitelisting and 50% consensus (#5) (a573ad0)
- auth: WORKER_API_KEY env var and X-Api-Key header no longer required. All validators on Bittensor netuid 100 with sufficient stake are auto-whitelisted.
- ci: trigger CI run
- fix(security): address auth bypass, input validation, and config issues
  - Move nonce consumption AFTER signature verification in verify_request() to prevent attackers from burning legitimate nonces via invalid signatures
  - Fix TOCTOU race in NonceStore::check_and_insert() using atomic DashMap entry API instead of separate contains_key + insert
  - Add input length limits for auth headers (hotkey 128B, nonce 256B, signature 256B) to prevent memory exhaustion via oversized values
  - Add consensus_threshold validation in Config::from_env(): must be in range (0.0, 1.0], panics at startup if invalid
  - Add saturating conversion for consensus required calculation to prevent integer overflow on f64→usize cast
  - Add tests for all security fixes
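The ordering described in the auth fixes above can be sketched roughly as follows. This is a minimal illustration, not the project's code: it swaps DashMap for a std `Mutex<HashSet>` so it compiles without external crates, and `AuthError`, the `MAX_*` constant names, and the `signature_is_valid` callback are hypothetical stand-ins. Only the limits (128B, 256B, 256B) and the function names `verify_request` and `check_and_insert` come from the entries.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

// Limits from the changelog entry; the constant names are hypothetical.
const MAX_HOTKEY_LEN: usize = 128;
const MAX_NONCE_LEN: usize = 256;
const MAX_SIGNATURE_LEN: usize = 256;

#[derive(Debug, PartialEq)]
pub enum AuthError {
    HeaderTooLong,
    BadSignature,
    NonceReused,
}

/// Stand-in for the real NonceStore (which uses DashMap per the changelog).
#[derive(Default)]
pub struct NonceStore {
    seen: Mutex<HashSet<String>>,
}

impl NonceStore {
    /// Returns true if the nonce was new. The check and the insert happen
    /// under one lock, so two threads cannot both see "absent" for the same
    /// nonce (the contains_key + insert pattern allowed exactly that).
    pub fn check_and_insert(&self, nonce: &str) -> bool {
        let mut seen = self.seen.lock().unwrap_or_else(|e| e.into_inner());
        seen.insert(nonce.to_owned())
    }
}

/// `signature_is_valid` is a hypothetical callback standing in for the
/// sr25519 check, which needs an external crate.
pub fn verify_request(
    store: &NonceStore,
    hotkey: &str,
    nonce: &str,
    signature: &str,
    signature_is_valid: impl Fn(&str, &str, &str) -> bool,
) -> Result<(), AuthError> {
    // 1. Reject oversized headers before doing any other work.
    if hotkey.len() > MAX_HOTKEY_LEN
        || nonce.len() > MAX_NONCE_LEN
        || signature.len() > MAX_SIGNATURE_LEN
    {
        return Err(AuthError::HeaderTooLong);
    }
    // 2. Verify the signature first: a forged request must not burn a nonce.
    if !signature_is_valid(hotkey, nonce, signature) {
        return Err(AuthError::BadSignature);
    }
    // 3. Only now consume the nonce.
    if !store.check_and_insert(nonce) {
        return Err(AuthError::NonceReused);
    }
    Ok(())
}
```

With this ordering, a request carrying a valid nonce but an invalid signature leaves the store untouched, so the legitimate sender can still use that nonce afterwards.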
- fix(dead-code): remove orphaned default_concurrent fn and unnecessary allow(dead_code)
- fix: code quality issues in bittensor validator consensus
  - Extract magic number 100 to configurable MAX_PENDING_CONSENSUS
  - Restore #[allow(dead_code)] on DEFAULT_MAX_OUTPUT_BYTES constant
  - Use anyhow::Context instead of map_err(anyhow::anyhow!) in validator_whitelist
- fix(security): address race condition, config panic, SS58 checksum, and container security
  - consensus.rs: Fix TOCTOU race condition in record_vote by using DashMap entry API (remove_entry) to atomically check votes and remove entry while holding the shard lock, preventing concurrent threads from inserting votes between drop and remove
  - config.rs: Replace assert! with proper Result<Self, String> return from Config::from_env() to avoid panicking in production on invalid CONSENSUS_THRESHOLD values
  - main.rs: Update Config::from_env() call to handle Result with expect
  - auth.rs: Add SS58 checksum verification using Blake2b-512 (correct Substrate algorithm) in ss58_to_public_key_bytes to reject addresses with corrupted checksums; previously only decoded base58 without validating the 2-byte checksum suffix
  - Dockerfile: Add non-root executor user for container runtime security
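A rough sketch of the config side of these fixes: validating the threshold with a `Result` instead of `assert!`, and converting the required-vote count without overflow. The function names `parse_consensus_threshold` and `required_votes` are hypothetical, and rounding up with a floor of one vote is an assumption; the changelog only states the valid range (0.0, 1.0] and that the f64→usize conversion saturates.

```rust
/// Parse and validate a consensus threshold; it must lie in (0.0, 1.0].
/// Returns an error string instead of panicking on bad input.
pub fn parse_consensus_threshold(raw: &str) -> Result<f64, String> {
    let value: f64 = raw
        .trim()
        .parse()
        .map_err(|e| format!("invalid CONSENSUS_THRESHOLD {raw:?}: {e}"))?;
    // NaN fails both comparisons, so it is rejected here as well.
    if value > 0.0 && value <= 1.0 {
        Ok(value)
    } else {
        Err(format!("CONSENSUS_THRESHOLD must be in (0.0, 1.0], got {value}"))
    }
}

/// Number of votes needed to reach consensus (assumed: round up, at least 1).
pub fn required_votes(validator_count: usize, threshold: f64) -> usize {
    let needed = (validator_count as f64 * threshold).ceil();
    // Float-to-integer `as` casts saturate in Rust (and map NaN to 0),
    // so an oversized product clamps to usize::MAX instead of wrapping.
    (needed as usize).max(1)
}
```

A caller in the style of the main.rs entry would match on the `Result` at startup and exit with a logged error instead of panicking.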
- fix(dead-code): remove unused max_output_bytes config field and constant
  Remove DEFAULT_MAX_OUTPUT_BYTES constant and max_output_bytes Config field that were defined and populated from env but never read anywhere outside config.rs. Both had #[allow(dead_code)] annotations suppressing warnings.
- fix(quality): replace expect/unwrap with proper error handling, extract magic numbers to constants
  - main.rs: Replace .expect() on Config::from_env() with match + tracing::error! + process::exit(1)
  - validator_whitelist.rs: Extract retry count (3) and backoff base (2) to named constants
  - validator_whitelist.rs: Replace unwrap_or_else on Option with if-let pattern
  - consensus.rs: Extract reaper interval (30s) to REAPER_INTERVAL_SECS constant
- fix(security): address multiple security vulnerabilities in PR files
  - consensus.rs: Remove archive_data storage from PendingConsensus to prevent memory exhaustion (up to 50GB with 100 pending × 500MB each). Callers now use their own archive bytes since all votes for the same hash have identical data.
  - handlers.rs: Stream multipart upload with per-chunk size enforcement instead of buffering entire archive before checking size limit. Sanitize error messages to not leak internal details (file paths, extraction errors) to clients; log details server-side instead.
  - auth.rs: Add nonce format validation requiring non-empty printable ASCII characters (defense-in-depth against log injection and empty nonce edge cases).
  - main.rs: Replace .unwrap() on TcpListener::bind and axum::serve with proper error logging and process::exit per AGENTS.md rules.
  - ws.rs: Replace .unwrap() on serde_json::to_string with unwrap_or_default() to comply with AGENTS.md no-unwrap rule.
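Two of the checks above, per-chunk size enforcement and nonce format validation, might look roughly like this. Both function names are hypothetical, the chunk source is a plain iterator standing in for the Axum multipart stream, and excluding the space character from "printable ASCII" is an assumption, since the entry does not say how the real check treats it.

```rust
/// Accumulate upload chunks, failing as soon as the running total would
/// exceed `max_bytes`, instead of buffering everything and checking after.
pub fn collect_with_limit<I, C>(chunks: I, max_bytes: usize) -> Result<Vec<u8>, String>
where
    I: IntoIterator<Item = C>,
    C: AsRef<[u8]>,
{
    let mut buf = Vec::new();
    for chunk in chunks {
        let chunk = chunk.as_ref();
        // buf.len() never exceeds max_bytes, so this subtraction cannot underflow.
        if chunk.len() > max_bytes - buf.len() {
            // Generic message only: no paths or internal details for the client.
            return Err(format!("archive exceeds {max_bytes} byte limit"));
        }
        buf.extend_from_slice(chunk);
    }
    Ok(buf)
}

/// Accept only non-empty nonces made of visible ASCII characters.
/// Assumption: space is excluded; control bytes and non-ASCII are rejected.
pub fn is_valid_nonce(nonce: &str) -> bool {
    !nonce.is_empty() && nonce.bytes().all(|b| b.is_ascii_graphic())
}
```

Rejecting on the first chunk that crosses the limit bounds memory use at `max_bytes` per upload, regardless of how much data the client tries to send.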
- fix(dead-code): rename misleading underscore-prefixed variable in consensus
- fix(quality): replace unwrap/expect with proper error handling in production code
  - main.rs:21: Replace .parse().unwrap() on tracing directive with unwrap_or_else fallback to INFO level directive
  - main.rs:36: Replace .expect() on workspace dir creation with error log + process::exit(1) pattern
  - main.rs:110: Replace .expect() on ctrl_c handler with if-let-Err that logs and returns gracefully
  - executor.rs:189: Replace semaphore.acquire().unwrap() with match that handles closed semaphore by creating a failed TaskResult
  All changes follow AGENTS.md rule: no .unwrap()/.expect() in production code paths. Test code is unchanged.
- docs: refresh AGENTS.md
1.2.0 (2026-02-17)
- auth: add sr25519 signature + nonce verification (dc8d8d4)
- auth: require API key alongside whitelisted hotkey (#3) (887f72b)
1.1.0 (2026-02-17)
- bump Rust Docker image to 1.85 for edition2024 support (209f460)
- lowercase GHCR image tags for Docker push (89449f9)
- remove target-cpu=native to avoid SIGILL on Blacksmith runners (22bcb85)
- use rust:1.93-bookworm Docker image (ddd1a24)
- initial term-executor — remote evaluation server for Basilica (18f4f67)
- production-ready implementation with Basilica integration (5797025)
- minimal Docker image - remove all language runtimes from executor (38058e8)