AWE-AgentForge




Reviewer-first control tower for vibe coders: orchestrate Codex, Claude, Gemini, and more in one evidence-gated loop.
Run memory-aware, multi-agent consensus workflows to find real bugs, ship safer fixes, and continuously evolve your codebase with full observability.

Brand mode (low-risk rename): display name = AWE-AgentForge, runtime/package IDs stay awe-agentcheck / awe_agentcheck.

🇨🇳 中文文档  ·   Runbook  ·   Architecture  ·   Contributing  ·   Security  ·   Dashboard Guide  ·   Stars  ·   Quick Start




Latest Update (Daily Summary)

Date Daily Summary
2026-02-22 Added anti-drift hardening for autonomous loops: auto-merge scope guard now blocks meta-only policy/doc gate relaxations in discovery mode, structural mode now requires touching architecture-violation scope, and prompt guidance now explicitly forbids “policy bypass instead of code fix.”
2026-02-21 Landed integrated memory/runtime controls and then hardened proposal-review contracts end-to-end: reviewer issue IDs, author structured issue responses, reviewer issue-check closure gate, new observability events/artifacts, and full verification pass (ruff/mypy/pytest/bandit/pytest-cov).
2026-02-20 Adapter strategy/factory split, service-layer package split, prompt/template + LangGraph round-flow upgrades, dashboard modularization, and CI/governance/security baseline hardening.
2026-02-19 Reviewer-first/manual-consensus stabilization, preflight/precompletion/resume guardrails, benchmark + analytics loops, and project history/PR summary integrations.

Detailed timeline is maintained in CHANGELOG.auto.md.

Why AWE-AgentForge?

Multi-Agent Collaboration

Run cross-agent workflows where one model authors, others review, and sessions challenge each other until the result is defensible.

Bug Resolution Engine

Turn vague failures into structured rounds: reproduce, patch, review, verify, and gate. Built for real bug-fixing throughput, not demo chats.

Continuous Self-Evolution

Run guided or proactive evolution loops so agents can propose, test, and refine improvements beyond the immediate bug ticket.

Human + Policy Control

Manual author approval, medium-gate decisions, and force-fail controls keep operators in charge when risk is high.

Live Operations Console

Monitor project tree, role sessions, and conversation flow in real time, then execute task controls from a single surface.

Reliability + Observability

Use watchdog timeouts, provider fallback, cooldowns, metrics, logs, and traces to keep long-running automation measurable and stable.


Architecture

system architecture


Visual Overview

Monitor Dashboard (Terminal Pixel Theme)

terminal pixel dashboard preview with multi-role sessions

Preview focus:

  1. Terminal pixel visual style.
  2. High-density multi-role session panel (not only 2-3 roles).
  3. Conversation-centric layout with operational controls visible.

Runtime Flow (Clean Lanes, No Arrow Crossing Through Bubbles)

workflow flow




Core Concepts

Before diving into usage, here are the key concepts:

Participants

Every task has one author (who writes the code) and one or more reviewers (who evaluate it). Participants are identified using the provider#alias format:

Format Meaning
claude#author-A Claude CLI acting as author, alias "author-A"
codex#review-B Codex CLI acting as reviewer, alias "review-B"
gemini#review-C Gemini CLI acting as second reviewer, alias "review-C"

The provider determines which CLI tool is invoked (claude, codex, or gemini). The alias is a human-readable label for identification in the web console and logs.
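A minimal sketch of parsing the provider#alias format described above. The provider names come from the docs; the validation rules are an illustration, not the project's actual parser.

```python
# Parse 'provider#alias' participant specs (sketch; assumed validation rules).
KNOWN_PROVIDERS = {"claude", "codex", "gemini"}  # built-in CLIs per the docs

def parse_participant(spec: str) -> tuple[str, str]:
    """Split 'provider#alias' into (provider, alias), validating both parts."""
    provider, sep, alias = spec.partition("#")
    if not sep or not provider or not alias:
        raise ValueError(f"expected 'provider#alias', got {spec!r}")
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider {provider!r}")
    return provider, alias

print(parse_participant("claude#author-A"))  # ('claude', 'author-A')
```

Extra providers registered via AWE_PROVIDER_ADAPTERS_JSON would extend the known set at runtime.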

Task Lifecycle

Every task follows this lifecycle:

queued → running → passed / failed_gate / failed_system / canceled

In manual mode (self_loop_mode=0), an extra state is inserted:

queued → running → waiting_manual → (approve) → queued → running → passed/failed
                                  → (reject)  → canceled

In manual mode, running is a proposal-consensus stage (reviewer-first when debate_mode=1) before the task pauses at waiting_manual.
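The lifecycle above can be sketched as a transition table. State names match the docs; the table itself is an illustration of the documented flow, not the project's state machine.

```python
# Task lifecycle sketch: states from the docs, transitions assumed from the diagrams.
TRANSITIONS = {
    "queued": {"running"},
    "running": {"waiting_manual", "passed", "failed_gate", "failed_system", "canceled"},
    "waiting_manual": {"queued", "canceled"},  # approve -> queued, reject -> canceled
}
TERMINAL = {"passed", "failed_gate", "failed_system", "canceled"}

def can_transition(src: str, dst: str) -> bool:
    """True when dst is a documented next state of src."""
    return dst in TRANSITIONS.get(src, set())

assert can_transition("waiting_manual", "queued")  # manual approval path
```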

Three Controls

Control Values Default What It Does
sandbox_mode 0 / 1 1 1 = run in an isolated *-lab copy of the workspace; 0 = run directly in main workspace
self_loop_mode 0 / 1 0 0 = run proposal consensus rounds, then pause for approval; 1 = run autonomous implementation/review loops
auto_merge 0 / 1 1 1 = on pass, auto-merge changes back + generate changelog; 0 = keep results in sandbox only

Tip

Recommended defaults for safety: sandbox_mode=1 + self_loop_mode=0 + auto_merge=1 — sandbox execution with human sign-off and automatic artifact fusion on pass.

Runtime Controls (New)

Control Values Default What It Does
memory_mode off / basic / strict basic Controls memory recall/persistence aggressiveness for proposal/discussion/implementation/review prompts
phase_timeout_seconds JSON map {} Optional per-phase timeout override for proposal, discussion, implementation, review, command

CLI mirrors these controls via --memory-mode and repeatable --phase-timeout phase=seconds.
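A sketch of turning repeatable --phase-timeout phase=seconds options into the JSON map that phase_timeout_seconds expects. The phase names come from the docs; the parsing behavior is an assumption about the CLI.

```python
# Parse repeatable 'phase=seconds' options into a phase-timeout map (sketch).
VALID_PHASES = {"proposal", "discussion", "implementation", "review", "command"}

def parse_phase_timeouts(options: list[str]) -> dict[str, int]:
    timeouts: dict[str, int] = {}
    for opt in options:
        phase, sep, seconds = opt.partition("=")
        if not sep or phase not in VALID_PHASES:
            raise ValueError(f"expected phase=seconds with a known phase, got {opt!r}")
        timeouts[phase] = int(seconds)
    return timeouts

print(parse_phase_timeouts(["review=600", "command=120"]))  # {'review': 600, 'command': 120}
```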


Quick Start

Prerequisites

  • Python 3.10+
  • Claude CLI installed and authenticated (for Claude participants)
  • Codex CLI installed and authenticated (for Codex participants)
  • Gemini CLI installed and authenticated (for Gemini participants)
  • PostgreSQL (optional — falls back to in-memory database if unavailable)

Step 1: Install

git clone https://github.com/cloveric/awe-agentforge.git
cd awe-agentforge
pip install -e .[dev]
# Optional: copy baseline env and adjust
cp .env.example .env

Step 2: Configure Environment

The system needs to know where your tools are and how to connect. Set the following environment variables (PowerShell syntax shown; use export VAR=value on Linux/macOS):

# Required: tell Python where the source is
$env:PYTHONPATH="src"

# Optional: database connection (omit for in-memory mode)
$env:AWE_DATABASE_URL="postgresql+psycopg://postgres:postgres@localhost:5432/awe_agentcheck?connect_timeout=2"

# Optional: where task artifacts (logs, reports, events) are stored
$env:AWE_ARTIFACT_ROOT=".agents"

# Optional: workflow orchestrator backend (langgraph/classic)
$env:AWE_WORKFLOW_BACKEND="langgraph"

Or use .env.example as a starter and export the values in your shell.

All environment variables reference
Variable Default Description
PYTHONPATH (none) Must include src/ directory
AWE_DATABASE_URL postgresql+psycopg://...?...connect_timeout=2 PostgreSQL connection string. If the database is unavailable, the connection attempt fails fast and the system falls back to in-memory storage
AWE_ARTIFACT_ROOT .agents Directory for task artifacts (threads, events, reports)
AWE_CLAUDE_COMMAND claude -p --dangerously-skip-permissions --effort low --model claude-opus-4-6 Command template for invoking Claude CLI
AWE_CODEX_COMMAND codex exec --skip-git-repo-check ... -c model_reasoning_effort=xhigh Command template for invoking Codex CLI
AWE_GEMINI_COMMAND gemini --yolo Command template for invoking Gemini CLI
AWE_PARTICIPANT_TIMEOUT_SECONDS 3600 Max seconds a single participant (Claude/Codex/Gemini) can run per step
AWE_COMMAND_TIMEOUT_SECONDS 300 Max seconds for test/lint commands
AWE_PARTICIPANT_TIMEOUT_RETRIES 1 Retry count when a participant times out
AWE_MAX_CONCURRENT_RUNNING_TASKS 1 How many tasks can run simultaneously
AWE_WORKFLOW_BACKEND langgraph Workflow backend (langgraph preferred, classic fallback)
AWE_ARCH_AUDIT_MODE (auto by evolution level) Architecture audit enforcement mode: off, warn, hard
AWE_ARCH_PYTHON_FILE_LINES_MAX 1200 Override max lines for a Python file in architecture audit
AWE_ARCH_FRONTEND_FILE_LINES_MAX 2500 Override max lines for frontend files in architecture audit
AWE_ARCH_RESPONSIBILITY_KEYWORDS_MAX 10 Override mixed-responsibility keyword threshold for large Python files
AWE_ARCH_SERVICE_FILE_LINES_MAX 4500 Override max lines for src/awe_agentcheck/service.py
AWE_ARCH_WORKFLOW_FILE_LINES_MAX 2600 Override max lines for src/awe_agentcheck/workflow.py
AWE_ARCH_DASHBOARD_JS_LINES_MAX 3800 Override max lines for web/assets/dashboard.js
AWE_ARCH_PROMPT_BUILDER_COUNT_MAX 14 Override prompt-builder hotspot threshold
AWE_ARCH_ADAPTER_RUNTIME_RAISE_MAX 0 Max allowed raw RuntimeError raises in adapter runtime path
AWE_PROVIDER_ADAPTERS_JSON (none) JSON map for extra providers, e.g. {"qwen":"qwen-cli --yolo"}
AWE_PROMOTION_GUARD_ENABLED true Enable promotion guard checks before auto-merge/promote-round
AWE_PROMOTION_ALLOWED_BRANCHES (empty) Optional comma-separated allowed branches (empty = allow any branch)
AWE_PROMOTION_REQUIRE_CLEAN false Require clean git worktree for promotion when guard is enabled
AWE_SANDBOX_USE_PUBLIC_BASE false Use shared/public sandbox root only when explicitly set to 1/true
AWE_API_ALLOW_REMOTE false Allow non-loopback API access (false keeps local-only default)
AWE_API_TOKEN (none) Optional bearer token for API protection
AWE_API_TOKEN_HEADER Authorization Header name used for API token validation
AWE_API_RATE_LIMIT_PER_MINUTE 120 Per-client/per-path API quota for /api/* (0 disables quota)
AWE_DRY_RUN false When true, participants are not actually invoked
AWE_SERVICE_NAME awe-agentcheck Service name for observability
AWE_OTEL_EXPORTER_OTLP_ENDPOINT (none) OpenTelemetry collector endpoint

Note

If AWE_DATABASE_URL is unset and you start via the provided scripts, the runtime defaults to local SQLite (.agents/runtime/awe-agentcheck.sqlite3) so history survives restarts. Direct custom startup paths may still fall back to in-memory storage.

Step 3: Start the API Server

# Windows (PowerShell)
pwsh -NoProfile -ExecutionPolicy Bypass -File "scripts/start_api.ps1" -ForceRestart

# Linux/macOS
bash scripts/start_api.sh --force-restart

Health check:

(Invoke-WebRequest -UseBasicParsing "http://127.0.0.1:8000/healthz").Content

Expected:

{"status":"ok"}

Stop API safely:

# Windows (PowerShell)
pwsh -NoProfile -ExecutionPolicy Bypass -File "scripts/stop_api.ps1"

# Linux/macOS
bash scripts/stop_api.sh

Step 4: Open the Web Monitor

Open your browser and navigate to:

http://localhost:8000/

You'll see the monitor dashboard with:

  • Left panel: project file tree + roles/sessions
  • Right panel: task controls, conversation stream, and task creation form

Beginner Dashboard Guide (Button-by-Button)

If this is your first time, operate in this exact order:

  1. Confirm API is online (API: ONLINE at the top right).
  2. Click Refresh.
  3. In Dialogue Scope, choose Project and Task.
  4. Read Conversation first, then decide start/approve/reject.
  5. Use Force Fail only when a task is stuck and cannot recover.

Top Bar

Control What it means When to use
Refresh Pull latest tasks/stats/tree/events immediately Any time data looks stale
Auto Poll: OFF/ON Toggle periodic refresh Turn ON during active runs
Theme Switch visual style (Neon Grid, Terminal Pixel, Executive Glass) Personal preference
API: ONLINE/RETRY(n) Backend health indicator If RETRY, check server logs first

Left Panel: Project Structure

Control What it means When to use
Expand Open all currently loaded folders in the tree Get full repository context quickly
Collapse Close all folders Reduce noise when tree is too dense
Tree node ([D] / [F]) Directory or file item for selected project Verify target repo and key files

Left Panel: Roles / Sessions

Control What it means When to use
all roles Show full mixed conversation stream Default view for global context
provider#alias role row Filter conversation to a single role/session Debug one participant's behavior

Right Panel: Dialogue Scope + Task Controls

Control What it means When to use
Project Active project scope Switch when multiple repos are tracked
Task Active task scope Move between tasks in selected project
Force-fail reason Reason text sent if force-failing a task Fill before pressing Force Fail
Start Start selected queued task Normal start action
Approve + Queue Approve proposal in waiting_manual, leave task queued Approve now, start later
Approve + Start Approve proposal and immediately run Fast path after proposal review
Reject Reject proposal in waiting_manual and cancel task Proposal is risky or low quality
Cancel Cancel current running/queued task Stop work intentionally
Force Fail Mark task failed_system with your reason Last resort for stuck/hung tasks
Reload Dialogue Force re-fetch event stream for selected task Dialogue appears incomplete

Conversation Panel

Area What it means How to read
Actor label (e.g. claude#author-A) Who sent the event Track accountability by role
Event kind (e.g. discussion, review) Workflow stage marker Detect where failures happen
Message body Raw or summarized event payload Validate claims before approving

Create Task Form (Every Input)

Field Meaning Recommended beginner value
Title Task name shown everywhere Clear and short
Workspace path Repository root path Your actual project path
Author Implementing participant claude#author-A / codex#author-A / gemini#author-A
Reviewers One or more reviewers, comma-separated At least 1 reviewer
Claude Model / Codex Model / Gemini Model Per-provider model pinning (dropdown + editable) Start from defaults (claude-opus-4-6, gpt-5.3-codex, gemini-3-pro-preview)
Claude/Codex/Gemini Model Params Optional extra args per provider For Codex use -c model_reasoning_effort=xhigh
Policy Template Preset execution posture (applies multiple controls at once) Start with deep-discovery-first; use frontier-evolve for aggressive idea/framework/UI exploration
Claude Team Agents Enable/disable Claude --agents mode 0 (disabled)
Evolution Level 0 fix-only, 1 guided evolve, 2 proactive evolve, 3 frontier/aggressive evolve Start with 0
Repair Mode minimal / balanced / structural Start with balanced
Max Rounds self_loop_mode=0: required consensus rounds; self_loop_mode=1: retry cap fallback when no deadline 1
Evolve Until Optional deadline (YYYY-MM-DD HH:MM) Empty unless running overnight
Max Rounds + Evolve Until Priority rule If Evolve Until is set, deadline wins; if empty, Max Rounds is used
Conversation Language Prompt language for agent outputs (en / zh) English for logs, 中文 for Chinese collaboration
Plain Mode Beginner-friendly readable output (1 on / 0 off) Start with 1
Stream Mode Realtime stream chunks from participant stdout/stderr (1 on / 0 off) Start with 1
Debate Mode Enable reviewer-first debate/precheck stage (1 on / 0 off) Start with 1
Sandbox Mode 1 sandbox / 0 main workspace Keep 1 for safety
Sandbox Workspace Path Optional custom sandbox path Leave blank (auto per-task path)
Self Loop Mode 0 manual approval / 1 autonomous Start with 0
Auto Merge 1 auto-fusion on pass / 0 disable Keep 1 initially
Merge Target Path Where pass results are merged Project root
Description Detailed requirement text Include acceptance criteria

UI policy note: when Sandbox Mode = 0, the dashboard forces Auto Merge = 0 and locks that selector.

Policy template quick map:

  • deep-discovery-first (default): audit-first, broad discovery, evolution_level=2.
  • frontier-evolve: aggressive proactive evolution, evolution_level=3.
  • deep-evolve: deep structural refactor posture, auto_merge=0.
  • safe-review: conservative risk-first/manual bias.
  • rapid-fix: fastest small-patch posture.

Create Buttons

Button Behavior Use case
Create Create task only (stays queued) You want to review settings first
Create + Start Create and start immediately You already trust current settings

Safe Beginner Preset

Use this default stack for lowest risk:

  • Sandbox Mode = 1
  • Self Loop Mode = 0
  • Auto Merge = 1
  • Reviewer count >= 1

Then run this rhythm: Create + Start -> wait for waiting_manual -> inspect Conversation -> Approve + Start or Reject.


Step 5: Create Your First Task

You can create a task via the Web UI (use the "Create Task" form at the bottom of the dashboard) or via the CLI:

py -m awe_agentcheck.cli run `
  --task "Fix the login validation bug" `
  --author "codex#author-A" `
  --reviewer "claude#review-B" `
  --conversation-language en `
  --workspace-path "." `
  --auto-start

This will:

  1. Create a task with title "Fix the login validation bug"
  2. Assign Codex as the author and Claude as the reviewer
  3. Use default policies (sandbox_mode=1, self_loop_mode=0, auto_merge=1)
  4. Automatically start the task (--auto-start)
  5. Since self_loop_mode=0, the system will run reviewer-first proposal consensus rounds, then pause at waiting_manual for your approval

Step 6: Approve and Execute (Manual Mode)

After the system pauses at waiting_manual, review the proposal in the web UI or via CLI, then approve:

# Approve the proposal and immediately start execution
py -m awe_agentcheck.cli decide <task-id> --approve --auto-start

Or reject:

# Reject the proposal (task will be canceled)
py -m awe_agentcheck.cli decide <task-id>

Important

In manual mode, the task will not proceed to implementation until you explicitly approve. This is by design — it ensures you have full control over what gets implemented.


CLI Reference

The CLI communicates with the API server over HTTP. Make sure the server is running before using any CLI command.

py -m awe_agentcheck.cli [--api-base URL] <command> [options]

Global option: --api-base (default: http://127.0.0.1:8000) — the API server URL.

run — Create a New Task

Creates a task and optionally starts it immediately.

py -m awe_agentcheck.cli run `
  --task "Task title" `
  --description "Detailed description of what to do" `
  --author "claude#author-A" `
  --reviewer "codex#review-B" `
  --reviewer "claude#review-C" `
  --conversation-language en `
  --sandbox-mode 1 `
  --self-loop-mode 0 `
  --auto-merge `
  --workspace-path "C:/path/to/your/project" `
  --max-rounds 3 `
  --test-command "py -m pytest -q" `
  --lint-command "py -m ruff check ." `
  --auto-start
Flag Required Default Description
--task Yes Task title (shown in UI and logs)
--description No same as --task Detailed description for the AI participants
--author Yes Author participant in provider#alias format
--reviewer Yes Reviewer participant (repeatable for multiple reviewers)
--sandbox-mode No 1 1 = sandbox, 0 = main workspace
--sandbox-workspace-path No auto-generated Custom sandbox directory path
--self-loop-mode No 0 0 = manual approval, 1 = autonomous
--auto-merge / --no-auto-merge No enabled Enable/disable auto-fusion on pass
--merge-target-path No project root Where to merge changes back to
--workspace-path No . Path to the target repository
--max-rounds No 3 Manual mode: required consensus rounds. Autonomous mode: max gate retries when no deadline
--test-command No py -m pytest -q Command to run tests
--lint-command No py -m ruff check . Command to run linter
--evolution-level No 0 0 = fix-only, 1 = guided evolve, 2 = proactive evolve, 3 = frontier/aggressive evolve
--repair-mode No balanced Repair policy (minimal / balanced / structural)
--evolve-until No Deadline for evolution (e.g. 2026-02-13 06:00)
--conversation-language No en Agent output language (en or zh)
--plain-mode / --no-plain-mode No enabled Toggle beginner-readable output mode
--stream-mode / --no-stream-mode No enabled Toggle realtime stream events
--debate-mode / --no-debate-mode No enabled Toggle reviewer-first debate/precheck stage
--provider-model No Per-provider model override in provider=model format (repeatable)
--provider-model-param No Per-provider extra args in provider=args format (repeatable)
--claude-team-agents No 0 1 enables Claude --agents mode for Claude participants
--auto-start No false Start immediately after creation

decide — Submit Author Decision

Used in manual mode to approve or reject a proposal at waiting_manual state.

# Approve and immediately start
py -m awe_agentcheck.cli decide <task-id> --approve --auto-start

# Approve without auto-start (task goes to queued)
py -m awe_agentcheck.cli decide <task-id> --approve

# Reject (task is canceled)
py -m awe_agentcheck.cli decide <task-id>

# Approve with a note
py -m awe_agentcheck.cli decide <task-id> --approve --note "Looks good, proceed" --auto-start

status — Get Task Details

py -m awe_agentcheck.cli status <task-id>

Returns the full task object as JSON, including status, rounds completed, gate reason, etc.

tasks — List All Tasks

py -m awe_agentcheck.cli tasks --limit 20

stats — Show Aggregated Statistics

py -m awe_agentcheck.cli stats

Returns pass rates, failure buckets, provider error counts, and average task duration.

analytics — Show Advanced Analytics

py -m awe_agentcheck.cli analytics --limit 300

Returns failure taxonomy/trend and reviewer drift metrics for observability analysis.

policy-templates — Get Recommended Policy Presets

py -m awe_agentcheck.cli policy-templates --workspace-path "."

Returns repo profile and suggested task-control presets by size/risk.

benchmark — Run Fixed A/B Benchmark Harness

py -m awe_agentcheck.cli benchmark `
  --workspace-path "." `
  --variant-a-name "baseline" `
  --variant-b-name "candidate" `
  --reviewer "claude#review-B"

Runs the fixed benchmark pack and writes JSON/Markdown reports under .agents/benchmarks/.

github-summary — Generate PR-Ready Summary

py -m awe_agentcheck.cli github-summary <task-id>

Returns markdown summary and artifact links suitable for GitHub PR description.

start — Start an Existing Task

py -m awe_agentcheck.cli start <task-id>
py -m awe_agentcheck.cli start <task-id> --background

cancel — Cancel a Task

py -m awe_agentcheck.cli cancel <task-id>

force-fail — Force-Fail a Task

py -m awe_agentcheck.cli force-fail <task-id> --reason "Manual abort: wrong branch"

promote-round — Promote One Round Snapshot (Manual Multi-Round Mode)

py -m awe_agentcheck.cli promote-round <task-id> --round 2 --merge-target-path "."

Use when max_rounds>1 and auto_merge=0. Promotes one selected round snapshot into target path.

events — List Task Events

py -m awe_agentcheck.cli events <task-id>

Returns the full event timeline for a task (discussions, reviews, verifications, gate results, etc.).

tree — Show Workspace File Tree

py -m awe_agentcheck.cli tree --workspace-path "." --max-depth 4

Usage Examples

Example 1: Safe Manual Review (Recommended for First Use)

The most conservative approach — sandbox execution with manual approval:

py -m awe_agentcheck.cli run `
  --task "Improve error handling in the API layer" `
  --author "claude#author-A" `
  --reviewer "codex#review-B" `
  --reviewer "claude#review-C" `
  --workspace-path "." `
  --auto-start

What happens:

  1. System creates an isolated sandbox workspace (awe-agentcheck-lab/20260213-...)
  2. Reviewers precheck and challenge the proposal first (reviewer-first stage)
  3. Author revises proposal, reviewers re-check for consensus
  4. Task pauses at waiting_manual — you review in the web UI
  5. You approve → system runs implementation → reviewers review code → tests + lint → gate decision
  6. If passed: changes auto-merge back to your main workspace with a changelog

Example 2: Fully Autonomous Overnight Run

For unattended operation (make sure you trust the safety controls):

py -m awe_agentcheck.cli run `
  --task "Overnight continuous improvement" `
  --author "codex#author-A" `
  --reviewer "claude#review-B" `
  --sandbox-mode 1 `
  --self-loop-mode 1 `
  --max-rounds 5 `
  --workspace-path "." `
  --auto-start

What happens:

  1. Codex (author) goes directly into the workflow loop — no manual checkpoint
  2. Each round: discussion → implementation → review → verify → gate
  3. If gate passes: done. If fails: retries up to 5 rounds
  4. Results auto-merge back on pass

Example 3: No Auto-Merge (Keep Results in Sandbox)

When you want to review changes manually before merging:

py -m awe_agentcheck.cli run `
  --task "Experimental refactoring" `
  --author "claude#author-A" `
  --reviewer "codex#review-B" `
  --workspace-path "." `
  --no-auto-merge `
  --auto-start

What happens:

  1. Everything runs as normal, but on pass, changes stay in the sandbox
  2. You can manually review the sandbox directory and merge changes yourself

Example 4: Direct Main Workspace (No Sandbox)

When you want changes applied directly to your main workspace:

py -m awe_agentcheck.cli run `
  --task "Quick fix: typo in README" `
  --author "claude#author-A" `
  --reviewer "codex#review-B" `
  --sandbox-mode 0 `
  --self-loop-mode 1 `
  --workspace-path "." `
  --auto-start

Warning

With sandbox_mode=0, changes are made directly in your workspace. Use this only for low-risk tasks, or when git history lets you revert easily.


API Reference

All endpoints are served at http://localhost:8000. Request/response bodies are JSON.

Create Task

POST /api/tasks
Request body
{
  "title": "Fix login validation bug",
  "description": "The email validator accepts invalid formats",
  "author_participant": "claude#author-A",
  "reviewer_participants": ["codex#review-B"],
  "conversation_language": "en",
  "provider_models": {
    "claude": "claude-opus-4-6",
    "codex": "gpt-5.3-codex"
  },
  "provider_model_params": {
    "codex": "-c model_reasoning_effort=xhigh"
  },
  "claude_team_agents": false,
  "sandbox_mode": true,
  "self_loop_mode": 0,
  "auto_merge": true,
  "workspace_path": ".",
  "max_rounds": 3,
  "test_command": "py -m pytest -q",
  "lint_command": "py -m ruff check .",
  "auto_start": true
}
Response (201)
{
  "task_id": "task-abc123",
  "title": "Fix login validation bug",
  "status": "queued",
  "sandbox_mode": true,
  "self_loop_mode": 0,
  "auto_merge": true,
  "rounds_completed": 0,
  ...
}
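The request above can be issued from a script with nothing beyond the standard library. This is a sketch: build_create_task_payload is a hypothetical helper, the payload fields mirror the documented request body, and the POST is only sent when the API server is actually running.

```python
import json
import urllib.request

def build_create_task_payload(title: str, author: str, reviewers: list[str], **overrides) -> dict:
    """Assemble a POST /api/tasks body with the safe defaults from the docs."""
    payload = {
        "title": title,
        "author_participant": author,
        "reviewer_participants": list(reviewers),
        "sandbox_mode": True,   # sandbox-first default
        "self_loop_mode": 0,    # manual approval
        "auto_merge": True,
        "auto_start": False,
    }
    payload.update(overrides)
    return payload

def create_task(api_base: str, payload: dict) -> dict:
    req = urllib.request.Request(
        f"{api_base}/api/tasks",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_create_task_payload(
    "Fix login validation bug", "claude#author-A", ["codex#review-B"]
)
# create_task("http://127.0.0.1:8000", payload)  # requires a running server
```

If AWE_API_TOKEN is configured, add the matching bearer header (AWE_API_TOKEN_HEADER, default Authorization) to the request.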

All Endpoints

Method Endpoint Description
POST /api/tasks Create a new task
GET /api/tasks List all tasks (?limit=100)
GET /api/tasks/{id} Get task details
POST /api/tasks/{id}/start Start a task ({"background": true} for async)
POST /api/tasks/{id}/cancel Request task cancellation
POST /api/tasks/{id}/force-fail Force-fail with {"reason": "..."}
POST /api/tasks/{id}/promote-round Promote one selected round into merge target (requires max_rounds>1 and auto_merge=0)
POST /api/tasks/{id}/author-decision Approve/reject in manual mode: {"approve": true, "auto_start": true}
GET /api/tasks/{id}/events Get full event timeline
POST /api/tasks/{id}/gate Submit manual gate result
GET /api/provider-models Get provider model catalog for UI dropdowns
GET /api/policy-templates Get workspace profile and recommended control presets
GET /api/analytics Get failure taxonomy/trends and reviewer drift analytics
GET /api/tasks/{id}/github-summary Build GitHub/PR-ready markdown summary
GET /api/project-history Project-level history records (core_findings, revisions, disputes, next_steps)
POST /api/project-history/clear Clear scoped history records (optionally includes matching live tasks)
GET /api/workspace-tree File tree (?workspace_path=.&max_depth=4)
GET /api/stats Aggregated statistics (pass rates, durations, failure buckets)
GET /healthz Health check

Feature Matrix

Capability Description Status
Sandbox-first execution Default sandbox_mode=1, runs in *-lab workspace with auto-generated per-task isolation GA
Author-approval gate Default self_loop_mode=0, enters waiting_manual after reviewer-first proposal consensus rounds GA
Autonomous self-loop self_loop_mode=1 for unattended operation GA
Auto fusion On pass: merge + CHANGELOG.auto.md + snapshot GA
Provider model pinning Set model per provider (claude / codex / gemini) per task GA
Claude team-agents mode Per-task toggle to enable Claude --agents behavior GA
Multi-provider role model provider#alias participants (cross-provider or same-provider multi-session) GA
Web monitor console Project tree, roles/sessions, avatar-based chat, task controls, drag-and-drop GA
Project history ledger Cross-task timeline with findings/revisions/disputes/next-steps by project GA
Multi-theme UI Neon Grid, Terminal Pixel, Executive Glass GA
Observability stack OpenTelemetry, Prometheus, Loki, Tempo, Grafana GA
Overnight supervisor Timeout watchdog, provider fallback, cooldown, single-instance lock GA

How the Workflow Works

Manual Mode (self_loop_mode=0 — Default)

This is the recommended mode for most use cases:

  1. Create task → status becomes queued
  2. Start task → system runs proposal-consensus rounds:
    • if debate_mode=1, reviewers precheck first (proposal_precheck_review)
    • author replies with a revised proposal based on reviewer feedback
    • reviewers evaluate proposal quality/alignment (proposal_review)
  3. Consensus rule:
    • one round is counted only when all required reviewers return pass-level consensus
    • same-round retries continue until alignment, guarded by a 10-retry stall limit (proposal_consensus_stalled_in_round)
    • repeated same-issue consensus across rounds is guarded by a 4-round stall limit (proposal_consensus_stalled_across_rounds)
    • stall details are surfaced in Project History under Disputes and Next Steps (not hidden in backend-only logs)
  4. Wait for human → after required consensus rounds are complete, status becomes waiting_manual
  5. Author decides:
    • Approve → status becomes queued (with author_approved reason), then immediately re-starts into the full workflow
    • Reject → status becomes canceled
  6. Full workflow runs: reviewer-first debate (optional) → author discussion → author implementation → reviewer review → verify (test + lint) → gate
  7. Gate result:
    • Pass → passed → Auto Fusion (merge + changelog + snapshot + sandbox cleanup)
    • Fail → retry next round; limit by Evolve Until when set, otherwise by max_rounds, then failed_gate

Autonomous Mode (self_loop_mode=1)

For unattended operation:

  1. Create task → queued
  2. Start task → immediately enters the full workflow (no manual checkpoint)
  3. Round 1..N: reviewer-first debate (optional) → author discussion → author implementation → reviewer review → verify → gate
  4. Gate result:
    • Pass → passed → Auto Fusion
    • Fail → retry until deadline (Evolve Until) or max_rounds (when no deadline), then failed_gate

Auto-Fusion Details

When a task passes and auto_merge=1:

  1. Changed files are copied from sandbox to your main workspace
  2. CHANGELOG.auto.md is appended with a summary
  3. A snapshot is saved to .agents/snapshots/
  4. The auto-generated sandbox is cleaned up (if system-generated)
  5. An auto_merge_summary.json artifact is written
Sandbox lifecycle details
  1. Without explicit sandbox_workspace_path, the system creates a unique per-task sandbox: <project>-lab/<timestamp>-<id>/
  2. The sandbox is a filtered copy of your project (excludes .git, .venv, node_modules, __pycache__, etc.)
  3. When task passes and auto-fusion completes, system-generated sandboxes are auto-cleaned
  4. If you specified a custom sandbox_workspace_path, it is retained by default
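The filtered copy in step 2 can be sketched with the standard library. The exclusion list comes from the docs ("etc." means it is not exhaustive); the helper names and layout are illustrative.

```python
import shutil
from pathlib import Path

# Directories excluded from the sandbox copy per the docs (non-exhaustive).
EXCLUDES = {".git", ".venv", "node_modules", "__pycache__"}

def should_copy(name: str) -> bool:
    """True when a top-level entry survives the sandbox filter."""
    return name not in EXCLUDES

def make_sandbox(project: Path, lab_root: Path, task_id: str) -> Path:
    """Copy the project into a per-task sandbox, skipping heavy/VCS dirs."""
    dest = lab_root / task_id
    shutil.copytree(project, dest, ignore=shutil.ignore_patterns(*EXCLUDES))
    return dest
```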

Roadmap

2026 Q1 (complete)

  • Sandbox-first default policy
  • Author-approval gate
  • Auto-fusion + changelog + snapshot
  • Role/session monitor with multi-theme UI

2026 Q2 (complete)

  • Richer GitHub/PR integration (change summary linking to task artifacts)
  • Policy templates by repo size/risk profile
  • Pluggable participant adapters beyond built-in Claude/Codex/Gemini

2026 Q3 (complete)

  • Branch-aware auto promotion pipeline (sandbox -> main with policy guard)
  • Advanced visual analytics (failure taxonomy trends, reviewer drift signals)

Documentation

Document Description
README.zh-CN.md Chinese documentation
docs/RUNBOOK.md Operations guide & commands
docs/ARCHITECTURE_FLOW.md System architecture deep dive
docs/API_EXPOSURE_AUDIT.md Localhost/public API exposure audit and guardrails
docs/TESTING_TARGET_POLICY.md Testing approach & policy
docs/GITHUB_ABOUT.md Suggested GitHub About/description copy (EN/CN)
docs/SESSION_HANDOFF.md Session handoff notes

Development

# Lint
py -m ruff check .

# Test
py -m pytest -q

Contributing

Contributions are welcome! Please ensure:

  1. Code passes ruff check . with no warnings
  2. All tests pass with pytest -q
  3. New features include appropriate test coverage

License

MIT



Built for teams that demand structured, observable, and safe multi-model code review workflows.
