Reviewer-first control tower for vibe coders: orchestrate Codex, Claude, Gemini, and more in one evidence-gated loop.
Run memory-aware, multi-agent consensus workflows to find real bugs, ship safer fixes, and continuously evolve your codebase with full observability.
Brand mode (low-risk rename): display name = AWE-AgentForge, runtime/package IDs stay awe-agentcheck / awe_agentcheck.
| Date | Daily Summary |
|---|---|
| 2026-02-22 | Added anti-drift hardening for autonomous loops: auto-merge scope guard now blocks meta-only policy/doc gate relaxations in discovery mode, structural mode now requires touching architecture-violation scope, and prompt guidance now explicitly forbids “policy bypass instead of code fix.” |
| 2026-02-21 | Landed integrated memory/runtime controls and then hardened proposal-review contracts end-to-end: reviewer issue IDs, author structured issue responses, reviewer issue-check closure gate, new observability events/artifacts, and full verification pass (ruff/mypy/pytest/bandit/pytest-cov). |
| 2026-02-20 | Adapter strategy/factory split, service-layer package split, prompt/template + LangGraph round-flow upgrades, dashboard modularization, and CI/governance/security baseline hardening. |
| 2026-02-19 | Reviewer-first/manual-consensus stabilization, preflight/precompletion/resume guardrails, benchmark + analytics loops, and project history/PR summary integrations. |
Detailed timeline is maintained in CHANGELOG.auto.md.
| Capability | What it delivers |
|---|---|
| Multi-Agent Collaboration | Run cross-agent workflows where one model authors, others review, and sessions challenge each other until the result is defensible. |
| Bug Resolution Engine | Turn vague failures into structured rounds: reproduce, patch, review, verify, and gate. Built for real bug-fixing throughput, not demo chats. |
| Continuous Self-Evolution | Run guided or proactive evolution loops so agents can propose, test, and refine improvements beyond the immediate bug ticket. |
| Human + Policy Control | Manual author approval, medium-gate decisions, and force-fail controls keep operators in charge when risk is high. |
| Live Operations Console | Monitor the project tree, role sessions, and conversation flow in real time, then execute task controls from a single surface. |
| Reliability + Observability | Use watchdog timeouts, provider fallback, cooldowns, metrics, logs, and traces to keep long-running automation measurable and stable. |
Preview focus:
- Terminal pixel visual style.
- High-density multi-role session panel (not only 2-3 roles).
- Conversation-centric layout with operational controls visible.
Before diving into usage, here are the key concepts:
Every task has one author (who writes the code) and one or more reviewers (who evaluate it). Participants are identified using the provider#alias format:
| Format | Meaning |
|---|---|
| `claude#author-A` | Claude CLI acting as author, alias "author-A" |
| `codex#review-B` | Codex CLI acting as reviewer, alias "review-B" |
| `gemini#review-C` | Gemini CLI acting as second reviewer, alias "review-C" |
The provider determines which CLI tool is invoked (claude, codex, or gemini). The alias is a human-readable label for identification in the web console and logs.
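To make the `provider#alias` convention concrete, here is a minimal sketch of how such a string could be split and validated. It is illustrative only: `parse_participant`, `Participant`, and `KNOWN_PROVIDERS` are hypothetical names, not part of the project's API, and the provider set is limited to the three built-in CLIs named above.

```python
from typing import NamedTuple

# Hypothetical set of built-in providers from this README; providers registered
# through AWE_PROVIDER_ADAPTERS_JSON would have to be added here as well.
KNOWN_PROVIDERS = frozenset({"claude", "codex", "gemini"})


class Participant(NamedTuple):
    provider: str  # selects which CLI tool is invoked
    alias: str     # human-readable label shown in the console and logs


def parse_participant(spec: str, known: frozenset[str] = KNOWN_PROVIDERS) -> Participant:
    """Split a 'provider#alias' string, rejecting malformed specs and unknown providers."""
    provider, sep, alias = spec.strip().partition("#")
    if not sep or not provider or not alias:
        raise ValueError(f"expected provider#alias, got {spec!r}")
    if provider not in known:
        raise ValueError(f"unknown provider {provider!r}")
    return Participant(provider, alias)
```

For example, `parse_participant("codex#review-B")` yields `Participant(provider='codex', alias='review-B')`. Splitting on the first `#` keeps the alias free-form.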
Every task follows this lifecycle:
```
queued → running → passed / failed_gate / failed_system / canceled
```

In manual mode (`self_loop_mode=0`), an extra state is inserted:

```
queued → running → waiting_manual → (approve) → queued → running → passed/failed
                                  → (reject)  → canceled
```
In manual mode, `running` is a proposal-consensus stage (reviewer-first when `debate_mode=1`) that precedes the pause at `waiting_manual`.
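The lifecycle diagrams above can be read as a small transition table. The sketch below encodes them so a sequence of observed statuses can be sanity-checked. It is a reading aid under stated assumptions, not the project's internal state machine: `TRANSITIONS` and `check_path` are hypothetical names, `queued → canceled` comes from the dashboard's `Cancel` control, and force-fail transitions are omitted.

```python
# Hypothetical transition table transcribed from the lifecycle diagrams above.
TRANSITIONS: dict[str, set[str]] = {
    "queued": {"running", "canceled"},
    "running": {"passed", "failed_gate", "failed_system", "canceled", "waiting_manual"},
    "waiting_manual": {"queued", "canceled"},  # approve -> queued, reject -> canceled
}


def check_path(path: list[str], manual: bool) -> bool:
    """Return True if every consecutive pair of statuses is an allowed transition."""
    for current, nxt in zip(path, path[1:]):
        if nxt == "waiting_manual" and not manual:
            return False  # waiting_manual only exists when self_loop_mode=0
        if nxt not in TRANSITIONS.get(current, set()):
            return False  # terminal states have no outgoing transitions
    return True
```

For instance, `check_path(["queued", "running", "waiting_manual", "queued", "running", "passed"], manual=True)` returns `True`, while the same path with `manual=False` returns `False`.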
| Control | Values | Default | What It Does |
|---|---|---|---|
| `sandbox_mode` | `0` / `1` | `1` | `1` = run in an isolated `*-lab` copy of the workspace; `0` = run directly in the main workspace |
| `self_loop_mode` | `0` / `1` | `0` | `0` = run proposal-consensus rounds, then pause for approval; `1` = run autonomous implementation/review loops |
| `auto_merge` | `0` / `1` | `1` | `1` = on pass, merge changes back and generate a changelog; `0` = keep results in the sandbox only |
> [!TIP]
> Recommended safe defaults: `sandbox_mode=1` + `self_loop_mode=0` + `auto_merge=1`, i.e. sandboxed execution with human sign-off and automatic artifact fusion on pass.
| Control | Values | Default | What It Does |
|---|---|---|---|
| `memory_mode` | `off` / `basic` / `strict` | `basic` | Controls how aggressively memory is recalled and persisted for proposal/discussion/implementation/review prompts |
| `phase_timeout_seconds` | JSON map | `{}` | Optional per-phase timeout override for `proposal`, `discussion`, `implementation`, `review`, `command` |

The CLI mirrors these controls via `--memory-mode` and the repeatable `--phase-timeout phase=seconds`.
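As an illustration of how repeatable `--phase-timeout phase=seconds` flags map onto the `phase_timeout_seconds` JSON map, here is a minimal sketch. `parse_phase_timeouts` is a hypothetical helper; rejecting non-positive values and letting a later repeat of the same phase win are assumptions, not documented CLI behavior.

```python
import json

# Phases accepted by phase_timeout_seconds, per the table above.
PHASES = {"proposal", "discussion", "implementation", "review", "command"}


def parse_phase_timeouts(pairs: list[str]) -> dict[str, int]:
    """Turn repeated 'phase=seconds' values into a phase_timeout_seconds map."""
    result: dict[str, int] = {}
    for pair in pairs:
        phase, sep, raw = pair.partition("=")
        phase = phase.strip()
        if not sep or phase not in PHASES:
            raise ValueError(f"bad phase timeout {pair!r}")
        seconds = int(raw)  # raises ValueError for non-numeric input
        if seconds <= 0:
            raise ValueError(f"timeout must be positive: {pair!r}")
        result[phase] = seconds  # assumption: a later repeat of the same phase wins
    return result


print(json.dumps(parse_phase_timeouts(["proposal=600", "review=1200"])))
# prints {"proposal": 600, "review": 1200}
```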
- Python 3.10+
- Claude CLI installed and authenticated (for Claude participants)
- Codex CLI installed and authenticated (for Codex participants)
- Gemini CLI installed and authenticated (for Gemini participants)
- PostgreSQL (optional — falls back to in-memory database if unavailable)
```bash
git clone https://github.com/cloveric/awe-agentforge.git
cd awe-agentforge
pip install -e ".[dev]"
# Optional: copy the baseline env file and adjust
cp .env.example .env
```

The system needs to know where your tools are and how to connect. Set the following environment variables:
```powershell
# Required: tell Python where the source is
$env:PYTHONPATH="src"
# Optional: database connection (omit for in-memory mode)
$env:AWE_DATABASE_URL="postgresql+psycopg://postgres:postgres@localhost:5432/awe_agentcheck?connect_timeout=2"
# Optional: where task artifacts (logs, reports, events) are stored
$env:AWE_ARTIFACT_ROOT=".agents"
# Optional: workflow orchestrator backend (langgraph/classic)
$env:AWE_WORKFLOW_BACKEND="langgraph"
```

Or use `.env.example` as a starter and export the values in your shell.
All environment variables reference
| Variable | Default | Description |
|---|---|---|
| `PYTHONPATH` | (none) | Must include the `src/` directory |
| `AWE_DATABASE_URL` | `postgresql+psycopg://...?...connect_timeout=2` | PostgreSQL connection string. The short `connect_timeout` makes an unavailable database fail fast, after which the runtime switches to in-memory storage |
| `AWE_ARTIFACT_ROOT` | `.agents` | Directory for task artifacts (threads, events, reports) |
| `AWE_CLAUDE_COMMAND` | `claude -p --dangerously-skip-permissions --effort low --model claude-opus-4-6` | Command template for invoking the Claude CLI |
| `AWE_CODEX_COMMAND` | `codex exec --skip-git-repo-check ... -c model_reasoning_effort=xhigh` | Command template for invoking the Codex CLI |
| `AWE_GEMINI_COMMAND` | `gemini --yolo` | Command template for invoking the Gemini CLI |
| `AWE_PARTICIPANT_TIMEOUT_SECONDS` | `3600` | Max seconds a single participant (Claude/Codex/Gemini) can run per step |
| `AWE_COMMAND_TIMEOUT_SECONDS` | `300` | Max seconds for test/lint commands |
| `AWE_PARTICIPANT_TIMEOUT_RETRIES` | `1` | Retry count when a participant times out |
| `AWE_MAX_CONCURRENT_RUNNING_TASKS` | `1` | How many tasks can run simultaneously |
| `AWE_WORKFLOW_BACKEND` | `langgraph` | Workflow backend (`langgraph` preferred, `classic` fallback) |
| `AWE_ARCH_AUDIT_MODE` | (auto by evolution level) | Architecture audit enforcement mode: `off`, `warn`, `hard` |
| `AWE_ARCH_PYTHON_FILE_LINES_MAX` | `1200` | Override max lines for a Python file in the architecture audit |
| `AWE_ARCH_FRONTEND_FILE_LINES_MAX` | `2500` | Override max lines for frontend files in the architecture audit |
| `AWE_ARCH_RESPONSIBILITY_KEYWORDS_MAX` | `10` | Override the mixed-responsibility keyword threshold for large Python files |
| `AWE_ARCH_SERVICE_FILE_LINES_MAX` | `4500` | Override max lines for `src/awe_agentcheck/service.py` |
| `AWE_ARCH_WORKFLOW_FILE_LINES_MAX` | `2600` | Override max lines for `src/awe_agentcheck/workflow.py` |
| `AWE_ARCH_DASHBOARD_JS_LINES_MAX` | `3800` | Override max lines for `web/assets/dashboard.js` |
| `AWE_ARCH_PROMPT_BUILDER_COUNT_MAX` | `14` | Override the prompt-builder hotspot threshold |
| `AWE_ARCH_ADAPTER_RUNTIME_RAISE_MAX` | `0` | Max allowed raw `RuntimeError` raises in the adapter runtime path |
| `AWE_PROVIDER_ADAPTERS_JSON` | (none) | JSON map for extra providers, e.g. `{"qwen":"qwen-cli --yolo"}` |
| `AWE_PROMOTION_GUARD_ENABLED` | `true` | Enable promotion guard checks before auto-merge/promote-round |
| `AWE_PROMOTION_ALLOWED_BRANCHES` | (empty) | Optional comma-separated list of allowed branches (empty = allow any branch) |
| `AWE_PROMOTION_REQUIRE_CLEAN` | `false` | Require a clean git worktree for promotion when the guard is enabled |
| `AWE_SANDBOX_USE_PUBLIC_BASE` | `false` | Use the shared/public sandbox root only when explicitly set to `1`/`true` |
| `AWE_API_ALLOW_REMOTE` | `false` | Allow non-loopback API access (`false` keeps the local-only default) |
| `AWE_API_TOKEN` | (none) | Optional bearer token for API protection |
| `AWE_API_TOKEN_HEADER` | `Authorization` | Header name used for API token validation |
| `AWE_API_RATE_LIMIT_PER_MINUTE` | `120` | Per-client, per-path quota for `/api/*` (`0` disables the quota) |
| `AWE_DRY_RUN` | `false` | When `true`, participants are not actually invoked |
| `AWE_SERVICE_NAME` | `awe-agentcheck` | Service name for observability |
| `AWE_OTEL_EXPORTER_OTLP_ENDPOINT` | (none) | OpenTelemetry collector endpoint |
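If you write helper scripts around the service, it can help to resolve a few of these variables the same way the table documents them. The sketch below is hypothetical (`load_settings`, `env_flag`, and `Settings` are not the project's config loader) and covers only a subset of the variables. Treating `1`/`true` as enabled follows the `AWE_SANDBOX_USE_PUBLIC_BASE` description and is assumed to apply to the other boolean flags too.

```python
import json
import os
from collections.abc import Mapping
from dataclasses import dataclass


def env_flag(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Treat '1' or 'true' (any case) as enabled; any other non-empty value as disabled."""
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    return raw.strip().lower() in {"1", "true"}


@dataclass(frozen=True)
class Settings:
    artifact_root: str
    participant_timeout_seconds: int
    participant_timeout_retries: int
    max_concurrent_running_tasks: int
    api_allow_remote: bool
    dry_run: bool
    extra_providers: dict[str, str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read a subset of the variables above, falling back to the documented defaults."""
    return Settings(
        artifact_root=env.get("AWE_ARTIFACT_ROOT", ".agents"),
        participant_timeout_seconds=int(env.get("AWE_PARTICIPANT_TIMEOUT_SECONDS", "3600")),
        participant_timeout_retries=int(env.get("AWE_PARTICIPANT_TIMEOUT_RETRIES", "1")),
        max_concurrent_running_tasks=int(env.get("AWE_MAX_CONCURRENT_RUNNING_TASKS", "1")),
        api_allow_remote=env_flag(env, "AWE_API_ALLOW_REMOTE"),
        dry_run=env_flag(env, "AWE_DRY_RUN"),
        extra_providers=json.loads(env.get("AWE_PROVIDER_ADAPTERS_JSON") or "{}"),
    )
```

Passing an explicit mapping (for example `load_settings({"AWE_DRY_RUN": "1"})`) makes the function easy to test without touching the real environment.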
> [!NOTE]
> If `AWE_DATABASE_URL` is unset and you start via the provided scripts, the runtime defaults to local SQLite (`.agents/runtime/awe-agentcheck.sqlite3`) so history survives restarts. Custom startup paths may still fall back to in-memory storage.
Start the API (PowerShell):

```powershell
pwsh -NoProfile -ExecutionPolicy Bypass -File "scripts/start_api.ps1" -ForceRestart
```

Or with bash:

```bash
bash scripts/start_api.sh --force-restart
```

Health check:

```powershell
(Invoke-WebRequest -UseBasicParsing "http://127.0.0.1:8000/healthz").Content
```

Expected:

```json
{"status":"ok"}
```

Stop the API safely (PowerShell or bash):

```powershell
pwsh -NoProfile -ExecutionPolicy Bypass -File "scripts/stop_api.ps1"
```

```bash
bash scripts/stop_api.sh
```

Open your browser and navigate to:
http://localhost:8000/
You'll see the monitor dashboard with:
- Left panel: project file tree + roles/sessions
- Right panel: task controls, conversation stream, and task creation form
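If you start the API from a script (for example before running CLI commands unattended), you may want to wait until `/healthz` reports `{"status":"ok"}`. This is a minimal standard-library sketch; `wait_for_api` and its timeout/interval defaults are hypothetical choices, not part of the project.

```python
import json
import time
import urllib.request


def wait_for_api(base: str = "http://127.0.0.1:8000", timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll GET /healthz until it returns {"status": "ok"} or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base}/healthz", timeout=interval) as resp:
                payload = json.load(resp)
                if isinstance(payload, dict) and payload.get("status") == "ok":
                    return True
        except (OSError, ValueError):
            pass  # connection refused/timed out (OSError) or non-JSON body (ValueError)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A call such as `wait_for_api(timeout=60)` returns `True` once the server answers, or `False` if it never comes up within the timeout.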
If this is your first time, operate in this exact order:

1. Confirm the API is online (`API: ONLINE` at the top right).
2. Click `Refresh`.
3. In `Dialogue Scope`, choose `Project` and `Task`.
4. Read `Conversation` first, then decide whether to start, approve, or reject.
5. Use `Force Fail` only when a task is stuck and cannot recover.
| Control | What it means | When to use |
|---|---|---|
| `Refresh` | Pull the latest tasks/stats/tree/events immediately | Any time data looks stale |
| `Auto Poll: OFF/ON` | Toggle periodic refresh | Turn ON during active runs |
| `Theme` | Switch visual style (`Neon Grid`, `Terminal Pixel`, `Executive Glass`) | Personal preference |
| `API: ONLINE/RETRY(n)` | Backend health indicator | If `RETRY`, check the server logs first |
| Control | What it means | When to use |
|---|---|---|
| `Expand` | Open all currently loaded folders in the tree | Get full repository context quickly |
| `Collapse` | Close all folders | Reduce noise when the tree is too dense |
| Tree node (`[D]` / `[F]`) | Directory or file item for the selected project | Verify the target repo and key files |
| Control | What it means | When to use |
|---|---|---|
| `all roles` | Show the full mixed conversation stream | Default view for global context |
| `provider#alias` role row | Filter the conversation to a single role/session | Debug one participant's behavior |
| Control | What it means | When to use |
|---|---|---|
| `Project` | Active project scope | Switch when multiple repos are tracked |
| `Task` | Active task scope | Move between tasks in the selected project |
| `Force-fail reason` | Reason text sent when force-failing a task | Fill in before pressing `Force Fail` |
| `Start` | Start the selected queued task | Normal start action |
| `Approve + Queue` | Approve the proposal in `waiting_manual` and leave the task queued | Approve now, start later |
| `Approve + Start` | Approve the proposal and run immediately | Fast path after proposal review |
| `Reject` | Reject the proposal in `waiting_manual` and cancel the task | Proposal is risky or low quality |
| `Cancel` | Cancel the current running/queued task | Stop work intentionally |
| `Force Fail` | Mark the task `failed_system` with your reason | Last resort for stuck/hung tasks |
| `Reload Dialogue` | Force a re-fetch of the event stream for the selected task | Dialogue appears incomplete |
| Area | What it means | How to read |
|---|---|---|
| Actor label (e.g. `claude#author-A`) | Who sent the event | Track accountability by role |
| Event kind (e.g. `discussion`, `review`) | Workflow stage marker | Detect where failures happen |
| Message body | Raw or summarized event payload | Validate claims before approving |
| Field | Meaning | Recommended beginner value |
|---|---|---|
| `Title` | Task name shown everywhere | Clear and short |
| `Workspace path` | Repository root path | Your actual project path |
| `Author` | Implementing participant | `claude#author-A` / `codex#author-A` / `gemini#author-A` |
| `Reviewers` | One or more reviewers, comma-separated | At least 1 reviewer |
| `Claude Model` / `Codex Model` / `Gemini Model` | Per-provider model pinning (dropdown + editable) | Start from the defaults (`claude-opus-4-6`, `gpt-5.3-codex`, `gemini-3-pro-preview`) |
| `Claude/Codex/Gemini Model Params` | Optional extra args per provider | For Codex, use `-c model_reasoning_effort=xhigh` |
| `Policy Template` | Preset execution posture (applies multiple controls at once) | Start with `deep-discovery-first`; use `frontier-evolve` for aggressive idea/framework/UI exploration |
| `Claude Team Agents` | Enable/disable Claude `--agents` mode | `0` (disabled) |
| `Evolution Level` | `0` fix-only, `1` guided evolve, `2` proactive evolve, `3` frontier/aggressive evolve | Start with `0` |
| `Repair Mode` | `minimal` / `balanced` / `structural` | Start with `balanced` |
| `Max Rounds` | `self_loop_mode=0`: required consensus rounds; `self_loop_mode=1`: retry cap when no deadline is set | `1` |
| `Evolve Until` | Optional deadline (`YYYY-MM-DD HH:MM`) | Empty unless running overnight |
| `Max Rounds` + `Evolve Until` | Priority rule | If `Evolve Until` is set, the deadline wins; if empty, `Max Rounds` is used |
| `Conversation Language` | Prompt language for agent outputs (`en` / `zh`) | `en` for English logs, `zh` for Chinese collaboration |
| `Plain Mode` | Beginner-friendly readable output (`1` on / `0` off) | Start with `1` |
| `Stream Mode` | Real-time stream chunks from participant stdout/stderr (`1` on / `0` off) | Start with `1` |
| `Debate Mode` | Enable the reviewer-first debate/precheck stage (`1` on / `0` off) | Start with `1` |
| `Sandbox Mode` | `1` sandbox / `0` main workspace | Keep `1` for safety |
| `Sandbox Workspace Path` | Optional custom sandbox path | Leave blank (auto per-task path) |
| `Self Loop Mode` | `0` manual approval / `1` autonomous | Start with `0` |
| `Auto Merge` | `1` auto-fusion on pass / `0` disabled | Keep `1` initially |
| `Merge Target Path` | Where passing results are merged | Project root |
| `Description` | Detailed requirement text | Include acceptance criteria |
UI policy note: when Sandbox Mode = 0, the dashboard forces Auto Merge = 0 and locks that selector.
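The dashboard rule in the note above can be expressed as a small normalization step. This sketch only mirrors the UI behavior as described; whether the API or CLI applies the same rule is not stated here, and `TaskControls` / `normalize` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TaskControls:
    # Defaults follow the control tables in this README.
    sandbox_mode: int = 1
    self_loop_mode: int = 0
    auto_merge: int = 1


def normalize(controls: TaskControls) -> TaskControls:
    """Mirror the dashboard rule: without a sandbox, auto-merge is forced off."""
    if controls.sandbox_mode == 0 and controls.auto_merge != 0:
        return replace(controls, auto_merge=0)
    return controls
```

For example, `normalize(TaskControls(sandbox_mode=0))` returns a copy with `auto_merge=0`, while the default controls pass through unchanged.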
Policy template quick map:

- `deep-discovery-first` (default): audit-first, broad discovery, `evolution_level=2`.
- `frontier-evolve`: aggressive proactive evolution, `evolution_level=3`.
- `deep-evolve`: deep structural refactor posture, `auto_merge=0`.
- `safe-review`: conservative, risk-first posture with a manual bias.
- `rapid-fix`: fastest small-patch posture.
| Button | Behavior | Use case |
|---|---|---|
| `Create` | Create the task only (it stays `queued`) | You want to review settings first |
| `Create + Start` | Create and start immediately | You already trust the current settings |
Use this default stack for lowest risk:

- `Sandbox Mode = 1`
- `Self Loop Mode = 0`
- `Auto Merge = 1`
- Reviewer count `>= 1`

Then follow this rhythm: `Create + Start` → wait for `waiting_manual` → inspect `Conversation` → `Approve + Start` or `Reject`.
You can create a task via the Web UI (use the "Create Task" form at the bottom of the dashboard) or via the CLI:
```powershell
py -m awe_agentcheck.cli run `
--task "Fix the login validation bug" `
--author "codex#author-A" `
--reviewer "claude#review-B" `
--conversation-language en `
--workspace-path "." `
--auto-start
```

This will:

- Create a task titled "Fix the login validation bug"
- Assign Codex as the author and Claude as the reviewer
- Use the default policies (`sandbox_mode=1`, `self_loop_mode=0`, `auto_merge=1`)
- Start the task automatically (`--auto-start`)
- Because `self_loop_mode=0`, run reviewer-first proposal-consensus rounds, then pause at `waiting_manual` for your approval
After the system pauses at `waiting_manual`, review the proposal in the web UI or via the CLI, then approve:

```powershell
# Approve the proposal and immediately start execution
py -m awe_agentcheck.cli decide <task-id> --approve --auto-start
```

Or reject:

```powershell
# Reject the proposal (the task will be canceled)
py -m awe_agentcheck.cli decide <task-id>
```

> [!IMPORTANT]
> In manual mode, the task will not proceed to implementation until you explicitly approve. This is by design: it keeps you in full control of what gets implemented.
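When scripting manual mode, you may want to block until a task reaches `waiting_manual` (or ends) before calling `decide`. The sketch below polls `GET /api/tasks/{id}` and reads the `status` field shown in the API response example. `wait_for_status` and `fetch_task_status` are hypothetical helpers, and API-token headers are not handled. The fetch function is injectable so the polling logic can be tested without a server.

```python
import json
import time
import urllib.request
from typing import Callable

TERMINAL = {"passed", "failed_gate", "failed_system", "canceled"}


def fetch_task_status(task_id: str, base: str = "http://127.0.0.1:8000") -> str:
    """Read the `status` field from GET /api/tasks/{id}."""
    with urllib.request.urlopen(f"{base}/api/tasks/{task_id}", timeout=30) as resp:
        return json.load(resp)["status"]


def wait_for_status(
    fetch_status: Callable[[str], str],
    task_id: str,
    target: str = "waiting_manual",
    timeout: float = 3600.0,
    interval: float = 5.0,
) -> str:
    """Poll until the task reaches `target` or a terminal status; return that status."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(task_id)
        if status == target or status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{task_id} still {status!r} after {timeout}s")
        time.sleep(interval)
```

Against a running server this would be called as `wait_for_status(fetch_task_status, "<task-id>")`, followed by `decide` once it returns `waiting_manual`.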
The CLI communicates with the API server over HTTP. Make sure the server is running before using any CLI command.
```
py -m awe_agentcheck.cli [--api-base URL] <command> [options]
```

Global option: `--api-base` (default: `http://127.0.0.1:8000`), the API server URL.
Creates a task and optionally starts it immediately.
```powershell
py -m awe_agentcheck.cli run `
--task "Task title" `
--description "Detailed description of what to do" `
--author "claude#author-A" `
--reviewer "codex#review-B" `
--reviewer "claude#review-C" `
--conversation-language en `
--sandbox-mode 1 `
--self-loop-mode 0 `
--auto-merge `
--workspace-path "C:/path/to/your/project" `
--max-rounds 3 `
--test-command "py -m pytest -q" `
--lint-command "py -m ruff check ." `
--auto-start
```

| Flag | Required | Default | Description |
|---|---|---|---|
| `--task` | Yes | (none) | Task title (shown in UI and logs) |
| `--description` | No | same as `--task` | Detailed description for the AI participants |
| `--author` | Yes | (none) | Author participant in `provider#alias` format |
| `--reviewer` | Yes | (none) | Reviewer participant (repeatable for multiple reviewers) |
| `--sandbox-mode` | No | `1` | `1` = sandbox, `0` = main workspace |
| `--sandbox-workspace-path` | No | auto-generated | Custom sandbox directory path |
| `--self-loop-mode` | No | `0` | `0` = manual approval, `1` = autonomous |
| `--auto-merge` / `--no-auto-merge` | No | enabled | Enable/disable auto-fusion on pass |
| `--merge-target-path` | No | project root | Where to merge changes back to |
| `--workspace-path` | No | `.` | Path to the target repository |
| `--max-rounds` | No | `3` | Manual mode: required consensus rounds. Autonomous mode: max gate retries when no deadline is set |
| `--test-command` | No | `py -m pytest -q` | Command to run tests |
| `--lint-command` | No | `py -m ruff check .` | Command to run the linter |
| `--evolution-level` | No | `0` | `0` = fix-only, `1` = guided evolve, `2` = proactive evolve, `3` = frontier/aggressive evolve |
| `--repair-mode` | No | `balanced` | Repair policy (`minimal` / `balanced` / `structural`) |
| `--evolve-until` | No | (none) | Deadline for evolution (e.g. `2026-02-13 06:00`) |
| `--conversation-language` | No | `en` | Agent output language (`en` or `zh`) |
| `--plain-mode` / `--no-plain-mode` | No | enabled | Toggle beginner-readable output mode |
| `--stream-mode` / `--no-stream-mode` | No | enabled | Toggle real-time stream events |
| `--debate-mode` / `--no-debate-mode` | No | enabled | Toggle the reviewer-first debate/precheck stage |
| `--provider-model` | No | (none) | Per-provider model override in `provider=model` format (repeatable) |
| `--provider-model-param` | No | (none) | Per-provider extra args in `provider=args` format (repeatable) |
| `--claude-team-agents` | No | `0` | `1` enables Claude `--agents` mode for Claude participants |
| `--auto-start` | No | `false` | Start immediately after creation |
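The repeatable `--provider-model` and `--provider-model-param` flags map naturally onto the `provider_models` and `provider_model_params` objects in the API request body. Below is a minimal parsing sketch; `parse_provider_pairs` is a hypothetical helper, not the CLI's actual implementation. It splits on the first `=` only, so values that themselves contain `=` (such as `-c model_reasoning_effort=xhigh`) survive intact.

```python
def parse_provider_pairs(values: list[str]) -> dict[str, str]:
    """Parse repeated 'provider=value' flags into a {provider: value} map."""
    result: dict[str, str] = {}
    for value in values:
        provider, sep, rest = value.partition("=")  # split on the FIRST '=' only
        provider, rest = provider.strip(), rest.strip()
        if not sep or not provider or not rest:
            raise ValueError(f"expected provider=value, got {value!r}")
        result[provider] = rest
    return result
```

For example, `parse_provider_pairs(["codex=-c model_reasoning_effort=xhigh"])` gives `{"codex": "-c model_reasoning_effort=xhigh"}`, the same shape as `provider_model_params` in the API example.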
Used in manual mode to approve or reject a proposal in the `waiting_manual` state.

```powershell
# Approve and immediately start
py -m awe_agentcheck.cli decide <task-id> --approve --auto-start
# Approve without auto-start (task goes to queued)
py -m awe_agentcheck.cli decide <task-id> --approve
# Reject (task is canceled)
py -m awe_agentcheck.cli decide <task-id>
# Approve with a note
py -m awe_agentcheck.cli decide <task-id> --approve --note "Looks good, proceed" --auto-start
```

Task status:

```powershell
py -m awe_agentcheck.cli status <task-id>
```

Returns the full task object as JSON, including status, rounds completed, gate reason, etc.
List tasks:

```powershell
py -m awe_agentcheck.cli tasks --limit 20
```

Statistics:

```powershell
py -m awe_agentcheck.cli stats
```

Returns pass rates, failure buckets, provider error counts, and average task duration.
```powershell
py -m awe_agentcheck.cli analytics --limit 300
```

Returns failure taxonomy/trends and reviewer drift metrics for observability analysis.
```powershell
py -m awe_agentcheck.cli policy-templates --workspace-path "."
```

Returns the repo profile and suggested task-control presets by size/risk.
```powershell
py -m awe_agentcheck.cli benchmark `
--workspace-path "." `
--variant-a-name "baseline" `
--variant-b-name "candidate" `
--reviewer "claude#review-B"
```

Runs the fixed benchmark pack and writes JSON/Markdown reports under `.agents/benchmarks/`.
```powershell
py -m awe_agentcheck.cli github-summary <task-id>
```

Returns a markdown summary and artifact links suitable for a GitHub PR description.
Start a task (in the foreground, or in the background):

```powershell
py -m awe_agentcheck.cli start <task-id>
py -m awe_agentcheck.cli start <task-id> --background
```

Cancel a task:

```powershell
py -m awe_agentcheck.cli cancel <task-id>
```

Force-fail a task:

```powershell
py -m awe_agentcheck.cli force-fail <task-id> --reason "Manual abort: wrong branch"
```

Promote a round:

```powershell
py -m awe_agentcheck.cli promote-round <task-id> --round 2 --merge-target-path "."
```

Use when `max_rounds>1` and `auto_merge=0`. Promotes one selected round snapshot into the target path.
```powershell
py -m awe_agentcheck.cli events <task-id>
```

Returns the full event timeline for a task (discussions, reviews, verifications, gate results, etc.).
Show the workspace file tree:

```powershell
py -m awe_agentcheck.cli tree --workspace-path "." --max-depth 4
```

The most conservative approach is sandboxed execution with manual approval:
```powershell
py -m awe_agentcheck.cli run `
--task "Improve error handling in the API layer" `
--author "claude#author-A" `
--reviewer "codex#review-B" `
--reviewer "claude#review-C" `
--workspace-path "." `
--auto-start
```

What happens:

- The system creates an isolated sandbox workspace (`awe-agentcheck-lab/20260213-...`)
- Reviewers precheck and challenge the proposal first (reviewer-first stage)
- The author revises the proposal, and reviewers re-check it for consensus
- The task pauses at `waiting_manual`, and you review it in the web UI
- You approve → the system runs implementation → reviewers review the code → tests + lint → gate decision
- If passed: changes auto-merge back to your main workspace with a changelog
For unattended operation (make sure you trust the safety controls):
```powershell
py -m awe_agentcheck.cli run `
--task "Overnight continuous improvement" `
--author "codex#author-A" `
--reviewer "claude#review-B" `
--sandbox-mode 1 `
--self-loop-mode 1 `
--max-rounds 5 `
--workspace-path "." `
--auto-start
```

What happens:

- Codex (author) goes directly into the workflow loop, with no manual checkpoint
- Each round: discussion → implementation → review → verify → gate
- If the gate passes, the task is done; if it fails, the loop retries for up to 5 rounds
- Results auto-merge back on pass
When you want to review changes manually before merging:
```powershell
py -m awe_agentcheck.cli run `
--task "Experimental refactoring" `
--author "claude#author-A" `
--reviewer "codex#review-B" `
--workspace-path "." `
--no-auto-merge `
--auto-start
```

What happens:

- Everything runs as normal, but on pass, changes stay in the sandbox
- You can review the sandbox directory and merge the changes yourself
When you want changes applied directly to your main workspace:
```powershell
py -m awe_agentcheck.cli run `
--task "Quick fix: typo in README" `
--author "claude#author-A" `
--reviewer "codex#review-B" `
--sandbox-mode 0 `
--self-loop-mode 1 `
--workspace-path "." `
--auto-start
```

> [!WARNING]
> With `sandbox_mode=0`, changes are made directly in your workspace. Use this only for low-risk tasks or when you can revert with git.
All endpoints are served at http://localhost:8000. Request/response bodies are JSON.
`POST /api/tasks`

Request body:

```json
{
"title": "Fix login validation bug",
"description": "The email validator accepts invalid formats",
"author_participant": "claude#author-A",
"reviewer_participants": ["codex#review-B"],
"conversation_language": "en",
"provider_models": {
"claude": "claude-opus-4-6",
"codex": "gpt-5.3-codex"
},
"provider_model_params": {
"codex": "-c model_reasoning_effort=xhigh"
},
"claude_team_agents": false,
"sandbox_mode": true,
"self_loop_mode": 0,
"auto_merge": true,
"workspace_path": ".",
"max_rounds": 3,
"test_command": "py -m pytest -q",
"lint_command": "py -m ruff check .",
"auto_start": true
}
```

Response (201):

```json
{
"task_id": "task-abc123",
"title": "Fix login validation bug",
"status": "queued",
"sandbox_mode": true,
"self_loop_mode": 0,
"auto_merge": true,
"rounds_completed": 0,
...
}
```

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/tasks` | Create a new task |
| `GET` | `/api/tasks` | List all tasks (`?limit=100`) |
| `GET` | `/api/tasks/{id}` | Get task details |
| `POST` | `/api/tasks/{id}/start` | Start a task (`{"background": true}` for async) |
| `POST` | `/api/tasks/{id}/cancel` | Request task cancellation |
| `POST` | `/api/tasks/{id}/force-fail` | Force-fail with `{"reason": "..."}` |
| `POST` | `/api/tasks/{id}/promote-round` | Promote one selected round into the merge target (requires `max_rounds>1` and `auto_merge=0`) |
| `POST` | `/api/tasks/{id}/author-decision` | Approve/reject in manual mode: `{"approve": true, "auto_start": true}` |
| `GET` | `/api/tasks/{id}/events` | Get the full event timeline |
| `POST` | `/api/tasks/{id}/gate` | Submit a manual gate result |
| `GET` | `/api/provider-models` | Get the provider model catalog for UI dropdowns |
| `GET` | `/api/policy-templates` | Get the workspace profile and recommended control presets |
| `GET` | `/api/analytics` | Get failure taxonomy/trends and reviewer drift analytics |
| `GET` | `/api/tasks/{id}/github-summary` | Build a GitHub/PR-ready markdown summary |
| `GET` | `/api/project-history` | Project-level history records (`core_findings`, `revisions`, `disputes`, `next_steps`) |
| `POST` | `/api/project-history/clear` | Clear scoped history records (optionally including matching live tasks) |
| `GET` | `/api/workspace-tree` | File tree (`?workspace_path=.&max_depth=4`) |
| `GET` | `/api/stats` | Aggregated statistics (pass rates, durations, failure buckets) |
| `GET` | `/healthz` | Health check |
| Capability | Description | Status |
|---|---|---|
| Sandbox-first execution | Default `sandbox_mode=1`; runs in a `*-lab` workspace with auto-generated per-task isolation | GA |
| Author-approval gate | Default `self_loop_mode=0`; enters `waiting_manual` after reviewer-first proposal-consensus rounds | GA |
| Autonomous self-loop | `self_loop_mode=1` for unattended operation | GA |
| Auto fusion | On pass: merge + `CHANGELOG.auto.md` + snapshot | GA |
| Provider model pinning | Set the model per provider (`claude` / `codex` / `gemini`) per task | GA |
| Claude team-agents mode | Per-task toggle to enable Claude `--agents` behavior | GA |
| Multi-provider role model | `provider#alias` participants (cross-provider or same-provider multi-session) | GA |
| Web monitor console | Project tree, roles/sessions, avatar-based chat, task controls, drag-and-drop | GA |
| Project history ledger | Cross-task timeline with findings/revisions/disputes/next steps by project | GA |
| Multi-theme UI | Neon Grid, Terminal Pixel, Executive Glass | GA |
| Observability stack | OpenTelemetry, Prometheus, Loki, Tempo, Grafana | GA |
| Overnight supervisor | Timeout watchdog, provider fallback, cooldown, single-instance lock | GA |
This is the recommended mode for most use cases:
1. Create task → status becomes `queued`
2. Start task → the system runs proposal-consensus rounds:
   - if `debate_mode=1`, reviewers precheck first (`proposal_precheck_review`)
   - the author replies with a revised proposal based on reviewer feedback
   - reviewers evaluate proposal quality/alignment (`proposal_review`)
3. Consensus rule:
   - a round is counted only when all required reviewers return pass-level consensus
   - same-round retries continue until alignment, with a 10-retry stall guard (`proposal_consensus_stalled_in_round`)
   - repeated same-issue consensus failures across rounds have a 4-round stall guard (`proposal_consensus_stalled_across_rounds`)
   - stall details are surfaced in Project History under `Disputes` and `Next Steps` (not hidden in backend-only logs)
4. Wait for human → after the required consensus rounds are complete, status becomes `waiting_manual`
5. Author decides:
   - Approve → status becomes `queued` (with reason `author_approved`), then the task immediately restarts into the full workflow
   - Reject → status becomes `canceled`
6. Full workflow runs: reviewer-first debate (optional) → author discussion → author implementation → reviewer review → verify (test + lint) → gate
7. Gate result:
   - Pass → `passed` → Auto Fusion (merge + changelog + snapshot + sandbox cleanup)
   - Fail → retry in the next round, limited by `Evolve Until` when set and otherwise by `max_rounds`; once the limit is reached, the task ends as `failed_gate`
For unattended operation:
1. Create task → `queued`
2. Start task → immediately enters the full workflow (no manual checkpoint)
3. Rounds 1..N: reviewer-first debate (optional) → author discussion → author implementation → reviewer review → verify → gate
4. Gate result:
   - Pass → `passed` → Auto Fusion
   - Fail → retry until the deadline (`Evolve Until`) or `max_rounds` (when no deadline is set), then `failed_gate`
When a task passes and `auto_merge=1`:

- Changed files are copied from the sandbox to your main workspace
- `CHANGELOG.auto.md` is appended with a summary
- A snapshot is saved to `.agents/snapshots/`
- The sandbox is cleaned up (if it was system-generated)
- An `auto_merge_summary.json` artifact is written
Sandbox lifecycle details
- Without an explicit `sandbox_workspace_path`, the system creates a unique per-task sandbox: `<project>-lab/<timestamp>-<id>/`
- The sandbox is a filtered copy of your project (excluding `.git`, `.venv`, `node_modules`, `__pycache__`, etc.)
- When the task passes and auto-fusion completes, system-generated sandboxes are cleaned up automatically
- If you specified a custom `sandbox_workspace_path`, it is retained by default
- Sandbox-first default policy
- Author-approval gate
- Auto-fusion + changelog + snapshot
- Role/session monitor with multi-theme UI
- Richer GitHub/PR integration (change summary linking to task artifacts)
- Policy templates by repo size/risk profile
- Pluggable participant adapters beyond built-in Claude/Codex/Gemini
- Branch-aware auto promotion pipeline (sandbox -> main with policy guard)
- Advanced visual analytics (failure taxonomy trends, reviewer drift signals)
| Document | Description |
|---|---|
| `README.zh-CN.md` | Chinese documentation |
| `docs/RUNBOOK.md` | Operations guide & commands |
| `docs/ARCHITECTURE_FLOW.md` | System architecture deep dive |
| `docs/API_EXPOSURE_AUDIT.md` | Localhost/public API exposure audit and guardrails |
| `docs/TESTING_TARGET_POLICY.md` | Testing approach & policy |
| `docs/GITHUB_ABOUT.md` | Suggested GitHub About/description copy (EN/CN) |
| `docs/SESSION_HANDOFF.md` | Session handoff notes |
```powershell
# Lint
py -m ruff check .
# Test
py -m pytest -q
```

Contributions are welcome! Please ensure:

- Code passes `ruff check .` with no warnings
- All tests pass with `pytest -q`
- New features include appropriate test coverage
MIT
Built for teams that demand structured, observable, and safe multi-model code review workflows.