AWE-AgentForge




Reviewer-first control tower for vibe coders: orchestrate Codex, Claude, Gemini, and more in one evidence-gated loop.
Run memory-aware, multi-agent consensus workflows to find real bugs, ship safer fixes, and continuously evolve your codebase with full observability.

Brand mode (low-risk rename): display name = AWE-AgentForge, runtime/package IDs stay awe-agentcheck / awe_agentcheck.

🇨🇳 中文文档  ·   Runbook  ·   Architecture  ·   Contributing  ·   Security  ·   Dashboard Guide  ·   Stars  ·   Quick Start




Latest Update (Daily Summary)

Date Daily Summary
2026-02-22 Added anti-drift hardening for autonomous loops: auto-merge scope guard now blocks meta-only policy/doc gate relaxations in discovery mode, structural mode now requires touching architecture-violation scope, and prompt guidance now explicitly forbids “policy bypass instead of code fix.”
2026-02-21 Landed integrated memory/runtime controls and then hardened proposal-review contracts end-to-end: reviewer issue IDs, author structured issue responses, reviewer issue-check closure gate, new observability events/artifacts, and full verification pass (ruff/mypy/pytest/bandit/pytest-cov).
2026-02-20 Adapter strategy/factory split, service-layer package split, prompt/template + LangGraph round-flow upgrades, dashboard modularization, and CI/governance/security baseline hardening.
2026-02-19 Reviewer-first/manual-consensus stabilization, preflight/precompletion/resume guardrails, benchmark + analytics loops, and project history/PR summary integrations.

Detailed timeline is maintained in CHANGELOG.auto.md.

Why AWE-AgentForge?

Multi-Agent Collaboration

Run cross-agent workflows where one model authors, others review, and sessions challenge each other until the result is defensible.

Bug Resolution Engine

Turn vague failures into structured rounds: reproduce, patch, review, verify, and gate. Built for real bug-fixing throughput, not demo chats.

Continuous Self-Evolution

Run guided or proactive evolution loops so agents can propose, test, and refine improvements beyond the immediate bug ticket.

Human + Policy Control

Manual author approval, medium-gate decisions, and force-fail controls keep operators in charge when risk is high.

Live Operations Console

Monitor project tree, role sessions, and conversation flow in real time, then execute task controls from a single surface.

Reliability + Observability

Use watchdog timeouts, provider fallback, cooldowns, metrics, logs, and traces to keep long-running automation measurable and stable.


Architecture

system architecture


Visual Overview

Monitor Dashboard (Terminal Pixel Theme)

terminal pixel dashboard preview with multi-role sessions

Preview focus:

  1. Terminal pixel visual style.
  2. High-density multi-role session panel (not only 2-3 roles).
  3. Conversation-centric layout with operational controls visible.

Runtime Flow (Clean Lanes, No Arrow Crossing Through Bubbles)

workflow flow




Core Concepts

Before diving into usage, here are the key concepts:

Participants

Every task has one author (who writes the code) and one or more reviewers (who evaluate it). Participants are identified using the provider#alias format:

Format Meaning
claude#author-A Claude CLI acting as author, alias "author-A"
codex#review-B Codex CLI acting as reviewer, alias "review-B"
gemini#review-C Gemini CLI acting as second reviewer, alias "review-C"

The provider determines which CLI tool is invoked (claude, codex, or gemini). The alias is a human-readable label for identification in the web console and logs.
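A minimal sketch of parsing the provider#alias format described above. The provider names come from the docs; the validation rules are an illustration, not the project's actual parser.

```python
# Parse 'provider#alias' participant specs (sketch; assumed validation rules).
KNOWN_PROVIDERS = {"claude", "codex", "gemini"}  # built-in CLIs per the docs

def parse_participant(spec: str) -> tuple[str, str]:
    """Split 'provider#alias' into (provider, alias), validating both parts."""
    provider, sep, alias = spec.partition("#")
    if not sep or not provider or not alias:
        raise ValueError(f"expected 'provider#alias', got {spec!r}")
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider {provider!r}")
    return provider, alias

print(parse_participant("claude#author-A"))  # ('claude', 'author-A')
```

Extra providers registered via AWE_PROVIDER_ADAPTERS_JSON would extend the known set at runtime.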

Task Lifecycle

Every task follows this lifecycle:

queued → running → passed / failed_gate / failed_system / canceled

In manual mode (self_loop_mode=0), an extra state is inserted:

queued → running → waiting_manual → (approve) → queued → running → passed/failed
                                  → (reject)  → canceled

In manual mode, running is a proposal-consensus stage (reviewer-first when debate_mode=1) before the task pauses at waiting_manual.
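The lifecycle above can be sketched as a transition table. State names match the docs; the table itself is an illustration of the documented flow, not the project's state machine.

```python
# Task lifecycle sketch: states from the docs, transitions assumed from the diagrams.
TRANSITIONS = {
    "queued": {"running"},
    "running": {"waiting_manual", "passed", "failed_gate", "failed_system", "canceled"},
    "waiting_manual": {"queued", "canceled"},  # approve -> queued, reject -> canceled
}
TERMINAL = {"passed", "failed_gate", "failed_system", "canceled"}

def can_transition(src: str, dst: str) -> bool:
    """True when dst is a documented next state of src."""
    return dst in TRANSITIONS.get(src, set())

assert can_transition("waiting_manual", "queued")  # manual approval path
```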

Three Controls

Control Values Default What It Does
sandbox_mode 0 / 1 1 1 = run in an isolated *-lab copy of the workspace; 0 = run directly in main workspace
self_loop_mode 0 / 1 0 0 = run proposal consensus rounds, then pause for approval; 1 = run autonomous implementation/review loops
auto_merge 0 / 1 1 1 = on pass, auto-merge changes back + generate changelog; 0 = keep results in sandbox only

Tip

Recommended defaults for safety: sandbox_mode=1 + self_loop_mode=0 + auto_merge=1 — sandbox execution with human sign-off and automatic artifact fusion on pass.

Runtime Controls (New)

Control Values Default What It Does
memory_mode off / basic / strict basic Controls memory recall/persistence aggressiveness for proposal/discussion/implementation/review prompts
phase_timeout_seconds JSON map {} Optional per-phase timeout override for proposal, discussion, implementation, review, command

CLI mirrors these controls via --memory-mode and repeatable --phase-timeout phase=seconds.
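A sketch of turning repeatable --phase-timeout phase=seconds options into the JSON map that phase_timeout_seconds expects. The phase names come from the docs; the parsing behavior is an assumption about the CLI.

```python
# Parse repeatable 'phase=seconds' options into a phase-timeout map (sketch).
VALID_PHASES = {"proposal", "discussion", "implementation", "review", "command"}

def parse_phase_timeouts(options: list[str]) -> dict[str, int]:
    timeouts: dict[str, int] = {}
    for opt in options:
        phase, sep, seconds = opt.partition("=")
        if not sep or phase not in VALID_PHASES:
            raise ValueError(f"expected phase=seconds with a known phase, got {opt!r}")
        timeouts[phase] = int(seconds)
    return timeouts

print(parse_phase_timeouts(["review=600", "command=120"]))  # {'review': 600, 'command': 120}
```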


Quick Start

Prerequisites

  • Python 3.10+
  • Claude CLI installed and authenticated (for Claude participants)
  • Codex CLI installed and authenticated (for Codex participants)
  • Gemini CLI installed and authenticated (for Gemini participants)
  • PostgreSQL (optional — falls back to in-memory database if unavailable)

Step 1: Install

git clone https://github.com/cloveric/awe-agentforge.git
cd awe-agentforge
pip install -e .[dev]
# Optional: copy baseline env and adjust
cp .env.example .env

Step 2: Configure Environment

The system needs to know where your tools are and how to connect. Set the following environment variables (PowerShell syntax shown; use export VAR=value on Linux/macOS):

# Required: tell Python where the source is
$env:PYTHONPATH="src"

# Optional: database connection (omit for in-memory mode)
$env:AWE_DATABASE_URL="postgresql+psycopg://postgres:postgres@localhost:5432/awe_agentcheck?connect_timeout=2"

# Optional: where task artifacts (logs, reports, events) are stored
$env:AWE_ARTIFACT_ROOT=".agents"

# Optional: workflow orchestrator backend (langgraph/classic)
$env:AWE_WORKFLOW_BACKEND="langgraph"

Or use .env.example as a starter and export the values in your shell.

All environment variables reference
Variable Default Description
PYTHONPATH (none) Must include src/ directory
AWE_DATABASE_URL postgresql+psycopg://...?...connect_timeout=2 PostgreSQL connection string. If the database is unavailable, the connection attempt fails fast and the system falls back to in-memory storage
AWE_ARTIFACT_ROOT .agents Directory for task artifacts (threads, events, reports)
AWE_CLAUDE_COMMAND claude -p --dangerously-skip-permissions --effort low --model claude-opus-4-6 Command template for invoking Claude CLI
AWE_CODEX_COMMAND codex exec --skip-git-repo-check ... -c model_reasoning_effort=xhigh Command template for invoking Codex CLI
AWE_GEMINI_COMMAND gemini --yolo Command template for invoking Gemini CLI
AWE_PARTICIPANT_TIMEOUT_SECONDS 3600 Max seconds a single participant (Claude/Codex/Gemini) can run per step
AWE_COMMAND_TIMEOUT_SECONDS 300 Max seconds for test/lint commands
AWE_PARTICIPANT_TIMEOUT_RETRIES 1 Retry count when a participant times out
AWE_MAX_CONCURRENT_RUNNING_TASKS 1 How many tasks can run simultaneously
AWE_WORKFLOW_BACKEND langgraph Workflow backend (langgraph preferred, classic fallback)
AWE_ARCH_AUDIT_MODE (auto by evolution level) Architecture audit enforcement mode: off, warn, hard
AWE_ARCH_PYTHON_FILE_LINES_MAX 1200 Override max lines for a Python file in architecture audit
AWE_ARCH_FRONTEND_FILE_LINES_MAX 2500 Override max lines for frontend files in architecture audit
AWE_ARCH_RESPONSIBILITY_KEYWORDS_MAX 10 Override mixed-responsibility keyword threshold for large Python files
AWE_ARCH_SERVICE_FILE_LINES_MAX 4500 Override max lines for src/awe_agentcheck/service.py
AWE_ARCH_WORKFLOW_FILE_LINES_MAX 2600 Override max lines for src/awe_agentcheck/workflow.py
AWE_ARCH_DASHBOARD_JS_LINES_MAX 3800 Override max lines for web/assets/dashboard.js
AWE_ARCH_PROMPT_BUILDER_COUNT_MAX 14 Override prompt-builder hotspot threshold
AWE_ARCH_ADAPTER_RUNTIME_RAISE_MAX 0 Max allowed raw RuntimeError raises in adapter runtime path
AWE_PROVIDER_ADAPTERS_JSON (none) JSON map for extra providers, e.g. {"qwen":"qwen-cli --yolo"}
AWE_PROMOTION_GUARD_ENABLED true Enable promotion guard checks before auto-merge/promote-round
AWE_PROMOTION_ALLOWED_BRANCHES (empty) Optional comma-separated allowed branches (empty = allow any branch)
AWE_PROMOTION_REQUIRE_CLEAN false Require clean git worktree for promotion when guard is enabled
AWE_SANDBOX_USE_PUBLIC_BASE false Use shared/public sandbox root only when explicitly set to 1/true
AWE_API_ALLOW_REMOTE false Allow non-loopback API access (false keeps local-only default)
AWE_API_TOKEN (none) Optional bearer token for API protection
AWE_API_TOKEN_HEADER Authorization Header name used for API token validation
AWE_API_RATE_LIMIT_PER_MINUTE 120 Per-client/per-path API quota for /api/* (0 disables quota)
AWE_DRY_RUN false When true, participants are not actually invoked
AWE_SERVICE_NAME awe-agentcheck Service name for observability
AWE_OTEL_EXPORTER_OTLP_ENDPOINT (none) OpenTelemetry collector endpoint

Note

If AWE_DATABASE_URL is unset and you start via the provided scripts, the runtime defaults to local SQLite (.agents/runtime/awe-agentcheck.sqlite3) so history survives restarts. Direct custom startup paths may still fall back to in-memory storage.

Step 3: Start the API Server

# Windows (PowerShell)
pwsh -NoProfile -ExecutionPolicy Bypass -File "scripts/start_api.ps1" -ForceRestart

# Linux/macOS
bash scripts/start_api.sh --force-restart

Health check:

(Invoke-WebRequest -UseBasicParsing "http://127.0.0.1:8000/healthz").Content

Expected:

{"status":"ok"}

Stop API safely:

# Windows (PowerShell)
pwsh -NoProfile -ExecutionPolicy Bypass -File "scripts/stop_api.ps1"

# Linux/macOS
bash scripts/stop_api.sh

Step 4: Open the Web Monitor

Open your browser and navigate to:

http://localhost:8000/

You'll see the monitor dashboard with:

  • Left panel: project file tree + roles/sessions
  • Right panel: task controls, conversation stream, and task creation form

Beginner Dashboard Guide (Button-by-Button)

If this is your first time, operate in this exact order:

  1. Confirm API is online (API: ONLINE at the top right).
  2. Click Refresh.
  3. In Dialogue Scope, choose Project and Task.
  4. Read Conversation first, then decide start/approve/reject.
  5. Use Force Fail only when a task is stuck and cannot recover.

Top Bar

Control What it means When to use
Refresh Pull latest tasks/stats/tree/events immediately Any time data looks stale
Auto Poll: OFF/ON Toggle periodic refresh Turn ON during active runs
Theme Switch visual style (Neon Grid, Terminal Pixel, Executive Glass) Personal preference
API: ONLINE/RETRY(n) Backend health indicator If RETRY, check server logs first

Left Panel: Project Structure

Control What it means When to use
Expand Open all currently loaded folders in the tree Get full repository context quickly
Collapse Close all folders Reduce noise when tree is too dense
Tree node ([D] / [F]) Directory or file item for selected project Verify target repo and key files

Left Panel: Roles / Sessions

Control What it means When to use
all roles Show full mixed conversation stream Default view for global context
provider#alias role row Filter conversation to a single role/session Debug one participant's behavior

Right Panel: Dialogue Scope + Task Controls

Control What it means When to use
Project Active project scope Switch when multiple repos are tracked
Task Active task scope Move between tasks in selected project
Force-fail reason Reason text sent if force-failing a task Fill before pressing Force Fail
Start Start selected queued task Normal start action
Approve + Queue Approve proposal in waiting_manual, leave task queued Approve now, start later
Approve + Start Approve proposal and immediately run Fast path after proposal review
Reject Reject proposal in waiting_manual and cancel task Proposal is risky or low quality
Cancel Cancel current running/queued task Stop work intentionally
Force Fail Mark task failed_system with your reason Last resort for stuck/hung tasks
Reload Dialogue Force re-fetch event stream for selected task Dialogue appears incomplete

Conversation Panel

Area What it means How to read
Actor label (e.g. claude#author-A) Who sent the event Track accountability by role
Event kind (e.g. discussion, review) Workflow stage marker Detect where failures happen
Message body Raw or summarized event payload Validate claims before approving

Create Task Form (Every Input)

Field Meaning Recommended beginner value
Title Task name shown everywhere Clear and short
Workspace path Repository root path Your actual project path
Author Implementing participant claude#author-A / codex#author-A / gemini#author-A
Reviewers One or more reviewers, comma-separated At least 1 reviewer
Claude Model / Codex Model / Gemini Model Per-provider model pinning (dropdown + editable) Start from defaults (claude-opus-4-6, gpt-5.3-codex, gemini-3-pro-preview)
Claude/Codex/Gemini Model Params Optional extra args per provider For Codex use -c model_reasoning_effort=xhigh
Policy Template Preset execution posture (applies multiple controls at once) Start with deep-discovery-first; use frontier-evolve for aggressive idea/framework/UI exploration
Claude Team Agents Enable/disable Claude --agents mode 0 (disabled)
Evolution Level 0 fix-only, 1 guided evolve, 2 proactive evolve, 3 frontier/aggressive evolve Start with 0
Repair Mode minimal / balanced / structural Start with balanced
Max Rounds self_loop_mode=0: required consensus rounds; self_loop_mode=1: retry cap fallback when no deadline 1
Evolve Until Optional deadline (YYYY-MM-DD HH:MM) Empty unless running overnight
Max Rounds + Evolve Until Priority rule If Evolve Until is set, deadline wins; if empty, Max Rounds is used
Conversation Language Prompt language for agent outputs (en / zh) English for logs, 中文 for Chinese collaboration
Plain Mode Beginner-friendly readable output (1 on / 0 off) Start with 1
Stream Mode Realtime stream chunks from participant stdout/stderr (1 on / 0 off) Start with 1
Debate Mode Enable reviewer-first debate/precheck stage (1 on / 0 off) Start with 1
Sandbox Mode 1 sandbox / 0 main workspace Keep 1 for safety
Sandbox Workspace Path Optional custom sandbox path Leave blank (auto per-task path)
Self Loop Mode 0 manual approval / 1 autonomous Start with 0
Auto Merge 1 auto-fusion on pass / 0 disable Keep 1 initially
Merge Target Path Where pass results are merged Project root
Description Detailed requirement text Include acceptance criteria

UI policy note: when Sandbox Mode = 0, the dashboard forces Auto Merge = 0 and locks that selector.

Policy template quick map:

  • deep-discovery-first (default): audit-first, broad discovery, evolution_level=2.
  • frontier-evolve: aggressive proactive evolution, evolution_level=3.
  • deep-evolve: deep structural refactor posture, auto_merge=0.
  • safe-review: conservative risk-first/manual bias.
  • rapid-fix: fastest small-patch posture.

Create Buttons

Button Behavior Use case
Create Create task only (stays queued) You want to review settings first
Create + Start Create and start immediately You already trust current settings

Safe Beginner Preset

Use this default stack for lowest risk:

  • Sandbox Mode = 1
  • Self Loop Mode = 0
  • Auto Merge = 1
  • Reviewer count >= 1

Then run this rhythm: Create + Start -> wait for waiting_manual -> inspect Conversation -> Approve + Start or Reject.


Step 5: Create Your First Task

You can create a task via the Web UI (use the "Create Task" form at the bottom of the dashboard) or via the CLI:

py -m awe_agentcheck.cli run `
  --task "Fix the login validation bug" `
  --author "codex#author-A" `
  --reviewer "claude#review-B" `
  --conversation-language en `
  --workspace-path "." `
  --auto-start

This will:

  1. Create a task with title "Fix the login validation bug"
  2. Assign Codex as the author and Claude as the reviewer
  3. Use default policies (sandbox_mode=1, self_loop_mode=0, auto_merge=1)
  4. Automatically start the task (--auto-start)
  5. Since self_loop_mode=0, the system will run reviewer-first proposal consensus rounds, then pause at waiting_manual for your approval

Step 6: Approve and Execute (Manual Mode)

After the system pauses at waiting_manual, review the proposal in the web UI or via CLI, then approve:

# Approve the proposal and immediately start execution
py -m awe_agentcheck.cli decide <task-id> --approve --auto-start

Or reject:

# Reject the proposal (task will be canceled)
py -m awe_agentcheck.cli decide <task-id>

Important

In manual mode, the task will not proceed to implementation until you explicitly approve. This is by design — it ensures you have full control over what gets implemented.


CLI Reference

The CLI communicates with the API server over HTTP. Make sure the server is running before using any CLI command.

py -m awe_agentcheck.cli [--api-base URL] <command> [options]

Global option: --api-base (default: http://127.0.0.1:8000) — the API server URL.

run — Create a New Task

Creates a task and optionally starts it immediately.

py -m awe_agentcheck.cli run `
  --task "Task title" `
  --description "Detailed description of what to do" `
  --author "claude#author-A" `
  --reviewer "codex#review-B" `
  --reviewer "claude#review-C" `
  --conversation-language en `
  --sandbox-mode 1 `
  --self-loop-mode 0 `
  --auto-merge `
  --workspace-path "C:/path/to/your/project" `
  --max-rounds 3 `
  --test-command "py -m pytest -q" `
  --lint-command "py -m ruff check ." `
  --auto-start
Flag Required Default Description
--task Yes Task title (shown in UI and logs)
--description No same as --task Detailed description for the AI participants
--author Yes Author participant in provider#alias format
--reviewer Yes Reviewer participant (repeatable for multiple reviewers)
--sandbox-mode No 1 1 = sandbox, 0 = main workspace
--sandbox-workspace-path No auto-generated Custom sandbox directory path
--self-loop-mode No 0 0 = manual approval, 1 = autonomous
--auto-merge / --no-auto-merge No enabled Enable/disable auto-fusion on pass
--merge-target-path No project root Where to merge changes back to
--workspace-path No . Path to the target repository
--max-rounds No 3 Manual mode: required consensus rounds. Autonomous mode: max gate retries when no deadline
--test-command No py -m pytest -q Command to run tests
--lint-command No py -m ruff check . Command to run linter
--evolution-level No 0 0 = fix-only, 1 = guided evolve, 2 = proactive evolve, 3 = frontier/aggressive evolve
--repair-mode No balanced Repair policy (minimal / balanced / structural)
--evolve-until No Deadline for evolution (e.g. 2026-02-13 06:00)
--conversation-language No en Agent output language (en or zh)
--plain-mode / --no-plain-mode No enabled Toggle beginner-readable output mode
--stream-mode / --no-stream-mode No enabled Toggle realtime stream events
--debate-mode / --no-debate-mode No enabled Toggle reviewer-first debate/precheck stage
--provider-model No Per-provider model override in provider=model format (repeatable)
--provider-model-param No Per-provider extra args in provider=args format (repeatable)
--claude-team-agents No 0 1 enables Claude --agents mode for Claude participants
--auto-start No false Start immediately after creation

decide — Submit Author Decision

Used in manual mode to approve or reject a proposal at waiting_manual state.

# Approve and immediately start
py -m awe_agentcheck.cli decide <task-id> --approve --auto-start

# Approve without auto-start (task goes to queued)
py -m awe_agentcheck.cli decide <task-id> --approve

# Reject (task is canceled)
py -m awe_agentcheck.cli decide <task-id>

# Approve with a note
py -m awe_agentcheck.cli decide <task-id> --approve --note "Looks good, proceed" --auto-start

status — Get Task Details

py -m awe_agentcheck.cli status <task-id>

Returns the full task object as JSON, including status, rounds completed, gate reason, etc.

tasks — List All Tasks

py -m awe_agentcheck.cli tasks --limit 20

stats — Show Aggregated Statistics

py -m awe_agentcheck.cli stats

Returns pass rates, failure buckets, provider error counts, and average task duration.

analytics — Show Advanced Analytics

py -m awe_agentcheck.cli analytics --limit 300

Returns failure taxonomy/trend and reviewer drift metrics for observability analysis.

policy-templates — Get Recommended Policy Presets

py -m awe_agentcheck.cli policy-templates --workspace-path "."

Returns repo profile and suggested task-control presets by size/risk.

benchmark — Run Fixed A/B Benchmark Harness

py -m awe_agentcheck.cli benchmark `
  --workspace-path "." `
  --variant-a-name "baseline" `
  --variant-b-name "candidate" `
  --reviewer "claude#review-B"

Runs the fixed benchmark pack and writes JSON/Markdown reports under .agents/benchmarks/.

github-summary — Generate PR-Ready Summary

py -m awe_agentcheck.cli github-summary <task-id>

Returns markdown summary and artifact links suitable for GitHub PR description.

start — Start an Existing Task

py -m awe_agentcheck.cli start <task-id>
py -m awe_agentcheck.cli start <task-id> --background

cancel — Cancel a Task

py -m awe_agentcheck.cli cancel <task-id>

force-fail — Force-Fail a Task

py -m awe_agentcheck.cli force-fail <task-id> --reason "Manual abort: wrong branch"

promote-round — Promote One Round Snapshot (Manual Multi-Round Mode)

py -m awe_agentcheck.cli promote-round <task-id> --round 2 --merge-target-path "."

Use when max_rounds>1 and auto_merge=0. Promotes one selected round snapshot into target path.

events — List Task Events

py -m awe_agentcheck.cli events <task-id>

Returns the full event timeline for a task (discussions, reviews, verifications, gate results, etc.).

tree — Show Workspace File Tree

py -m awe_agentcheck.cli tree --workspace-path "." --max-depth 4

Usage Examples

Example 1: Safe Manual Review (Recommended for First Use)

The most conservative approach — sandbox execution with manual approval:

py -m awe_agentcheck.cli run `
  --task "Improve error handling in the API layer" `
  --author "claude#author-A" `
  --reviewer "codex#review-B" `
  --reviewer "claude#review-C" `
  --workspace-path "." `
  --auto-start

What happens:

  1. System creates an isolated sandbox workspace (awe-agentcheck-lab/20260213-...)
  2. Reviewers precheck and challenge the proposal first (reviewer-first stage)
  3. Author revises proposal, reviewers re-check for consensus
  4. Task pauses at waiting_manual — you review in the web UI
  5. You approve → system runs implementation → reviewers review code → tests + lint → gate decision
  6. If passed: changes auto-merge back to your main workspace with a changelog

Example 2: Fully Autonomous Overnight Run

For unattended operation (make sure you trust the safety controls):

py -m awe_agentcheck.cli run `
  --task "Overnight continuous improvement" `
  --author "codex#author-A" `
  --reviewer "claude#review-B" `
  --sandbox-mode 1 `
  --self-loop-mode 1 `
  --max-rounds 5 `
  --workspace-path "." `
  --auto-start

What happens:

  1. Codex (author) goes directly into the workflow loop — no manual checkpoint
  2. Each round: discussion → implementation → review → verify → gate
  3. If gate passes: done. If fails: retries up to 5 rounds
  4. Results auto-merge back on pass

Example 3: No Auto-Merge (Keep Results in Sandbox)

When you want to review changes manually before merging:

py -m awe_agentcheck.cli run `
  --task "Experimental refactoring" `
  --author "claude#author-A" `
  --reviewer "codex#review-B" `
  --workspace-path "." `
  --no-auto-merge `
  --auto-start

What happens:

  1. Everything runs as normal, but on pass, changes stay in the sandbox
  2. You can manually review the sandbox directory and merge changes yourself

Example 4: Direct Main Workspace (No Sandbox)

When you want changes applied directly to your main workspace:

py -m awe_agentcheck.cli run `
  --task "Quick fix: typo in README" `
  --author "claude#author-A" `
  --reviewer "codex#review-B" `
  --sandbox-mode 0 `
  --self-loop-mode 1 `
  --workspace-path "." `
  --auto-start

Warning

With sandbox_mode=0, changes are made directly in your workspace. Use this only for low-risk tasks, or when git history lets you revert easily.


API Reference

All endpoints are served at http://localhost:8000. Request/response bodies are JSON.

Create Task

POST /api/tasks
Request body
{
  "title": "Fix login validation bug",
  "description": "The email validator accepts invalid formats",
  "author_participant": "claude#author-A",
  "reviewer_participants": ["codex#review-B"],
  "conversation_language": "en",
  "provider_models": {
    "claude": "claude-opus-4-6",
    "codex": "gpt-5.3-codex"
  },
  "provider_model_params": {
    "codex": "-c model_reasoning_effort=xhigh"
  },
  "claude_team_agents": false,
  "sandbox_mode": true,
  "self_loop_mode": 0,
  "auto_merge": true,
  "workspace_path": ".",
  "max_rounds": 3,
  "test_command": "py -m pytest -q",
  "lint_command": "py -m ruff check .",
  "auto_start": true
}
Response (201)
{
  "task_id": "task-abc123",
  "title": "Fix login validation bug",
  "status": "queued",
  "sandbox_mode": true,
  "self_loop_mode": 0,
  "auto_merge": true,
  "rounds_completed": 0,
  ...
}
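The request above can be issued from a script with nothing beyond the standard library. This is a sketch: build_create_task_payload is a hypothetical helper, the payload fields mirror the documented request body, and the POST is only sent when the API server is actually running.

```python
import json
import urllib.request

def build_create_task_payload(title: str, author: str, reviewers: list[str], **overrides) -> dict:
    """Assemble a POST /api/tasks body with the safe defaults from the docs."""
    payload = {
        "title": title,
        "author_participant": author,
        "reviewer_participants": list(reviewers),
        "sandbox_mode": True,   # sandbox-first default
        "self_loop_mode": 0,    # manual approval
        "auto_merge": True,
        "auto_start": False,
    }
    payload.update(overrides)
    return payload

def create_task(api_base: str, payload: dict) -> dict:
    req = urllib.request.Request(
        f"{api_base}/api/tasks",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_create_task_payload(
    "Fix login validation bug", "claude#author-A", ["codex#review-B"]
)
# create_task("http://127.0.0.1:8000", payload)  # requires a running server
```

If AWE_API_TOKEN is configured, add the matching bearer header (AWE_API_TOKEN_HEADER, default Authorization) to the request.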

All Endpoints

Method Endpoint Description
POST /api/tasks Create a new task
GET /api/tasks List all tasks (?limit=100)
GET /api/tasks/{id} Get task details
POST /api/tasks/{id}/start Start a task ({"background": true} for async)
POST /api/tasks/{id}/cancel Request task cancellation
POST /api/tasks/{id}/force-fail Force-fail with {"reason": "..."}
POST /api/tasks/{id}/promote-round Promote one selected round into merge target (requires max_rounds>1 and auto_merge=0)
POST /api/tasks/{id}/author-decision Approve/reject in manual mode: {"approve": true, "auto_start": true}
GET /api/tasks/{id}/events Get full event timeline
POST /api/tasks/{id}/gate Submit manual gate result
GET /api/provider-models Get provider model catalog for UI dropdowns
GET /api/policy-templates Get workspace profile and recommended control presets
GET /api/analytics Get failure taxonomy/trends and reviewer drift analytics
GET /api/tasks/{id}/github-summary Build GitHub/PR-ready markdown summary
GET /api/project-history Project-level history records (core_findings, revisions, disputes, next_steps)
POST /api/project-history/clear Clear scoped history records (optionally includes matching live tasks)
GET /api/workspace-tree File tree (?workspace_path=.&max_depth=4)
GET /api/stats Aggregated statistics (pass rates, durations, failure buckets)
GET /healthz Health check

Feature Matrix

Capability Description Status
Sandbox-first execution Default sandbox_mode=1, runs in *-lab workspace with auto-generated per-task isolation GA
Author-approval gate Default self_loop_mode=0, enters waiting_manual after reviewer-first proposal consensus rounds GA
Autonomous self-loop self_loop_mode=1 for unattended operation GA
Auto fusion On pass: merge + CHANGELOG.auto.md + snapshot GA
Provider model pinning Set model per provider (claude / codex / gemini) per task GA
Claude team-agents mode Per-task toggle to enable Claude --agents behavior GA
Multi-provider role model provider#alias participants (cross-provider or same-provider multi-session) GA
Web monitor console Project tree, roles/sessions, avatar-based chat, task controls, drag-and-drop GA
Project history ledger Cross-task timeline with findings/revisions/disputes/next-steps by project GA
Multi-theme UI Neon Grid, Terminal Pixel, Executive Glass GA
Observability stack OpenTelemetry, Prometheus, Loki, Tempo, Grafana GA
Overnight supervisor Timeout watchdog, provider fallback, cooldown, single-instance lock GA

How the Workflow Works

Manual Mode (self_loop_mode=0 — Default)

This is the recommended mode for most use cases:

  1. Create task → status becomes queued
  2. Start task → system runs proposal-consensus rounds:
    • if debate_mode=1, reviewers precheck first (proposal_precheck_review)
    • author replies with a revised proposal based on reviewer feedback
    • reviewers evaluate proposal quality/alignment (proposal_review)
  3. Consensus rule:
    • one round is counted only when all required reviewers return pass-level consensus
    • same-round retries continue until alignment, guarded by a 10-retry stall limit (proposal_consensus_stalled_in_round)
    • repeated same-issue consensus across rounds is guarded by a 4-round stall limit (proposal_consensus_stalled_across_rounds)
    • stall details are surfaced in Project History under Disputes and Next Steps (not hidden in backend-only logs)
  4. Wait for human → after required consensus rounds are complete, status becomes waiting_manual
  5. Author decides:
    • Approve → status becomes queued (with author_approved reason), then immediately re-starts into the full workflow
    • Reject → status becomes canceled
  6. Full workflow runs: reviewer-first debate (optional) → author discussion → author implementation → reviewer review → verify (test + lint) → gate
  7. Gate result:
    • Pass → passed → Auto Fusion (merge + changelog + snapshot + sandbox cleanup)
    • Fail → retry next round; limit by Evolve Until when set, otherwise by max_rounds, then failed_gate

Autonomous Mode (self_loop_mode=1)

For unattended operation:

  1. Create task → queued
  2. Start task → immediately enters the full workflow (no manual checkpoint)
  3. Round 1..N: reviewer-first debate (optional) → author discussion → author implementation → reviewer review → verify → gate
  4. Gate result:
    • Pass → passed → Auto Fusion
    • Fail → retry until deadline (Evolve Until) or max_rounds (when no deadline), then failed_gate

Auto-Fusion Details

When a task passes and auto_merge=1:

  1. Changed files are copied from sandbox to your main workspace
  2. CHANGELOG.auto.md is appended with a summary
  3. A snapshot is saved to .agents/snapshots/
  4. The auto-generated sandbox is cleaned up (if system-generated)
  5. An auto_merge_summary.json artifact is written
Sandbox lifecycle details
  1. Without explicit sandbox_workspace_path, the system creates a unique per-task sandbox: <project>-lab/<timestamp>-<id>/
  2. The sandbox is a filtered copy of your project (excludes .git, .venv, node_modules, __pycache__, etc.)
  3. When task passes and auto-fusion completes, system-generated sandboxes are auto-cleaned
  4. If you specified a custom sandbox_workspace_path, it is retained by default
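The filtered copy in step 2 can be sketched with the standard library. The exclusion list comes from the docs ("etc." means it is not exhaustive); the helper names and layout are illustrative.

```python
import shutil
from pathlib import Path

# Directories excluded from the sandbox copy per the docs (non-exhaustive).
EXCLUDES = {".git", ".venv", "node_modules", "__pycache__"}

def should_copy(name: str) -> bool:
    """True when a top-level entry survives the sandbox filter."""
    return name not in EXCLUDES

def make_sandbox(project: Path, lab_root: Path, task_id: str) -> Path:
    """Copy the project into a per-task sandbox, skipping heavy/VCS dirs."""
    dest = lab_root / task_id
    shutil.copytree(project, dest, ignore=shutil.ignore_patterns(*EXCLUDES))
    return dest
```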

Roadmap

2026 Q1 (complete)

  • Sandbox-first default policy
  • Author-approval gate
  • Auto-fusion + changelog + snapshot
  • Role/session monitor with multi-theme UI

2026 Q2 (complete)

  • Richer GitHub/PR integration (change summary linking to task artifacts)
  • Policy templates by repo size/risk profile
  • Pluggable participant adapters beyond built-in Claude/Codex/Gemini

2026 Q3 (complete)

  • Branch-aware auto promotion pipeline (sandbox -> main with policy guard)
  • Advanced visual analytics (failure taxonomy trends, reviewer drift signals)

Documentation

Document Description
README.zh-CN.md Chinese documentation
docs/RUNBOOK.md Operations guide & commands
docs/ARCHITECTURE_FLOW.md System architecture deep dive
docs/API_EXPOSURE_AUDIT.md Localhost/public API exposure audit and guardrails
docs/TESTING_TARGET_POLICY.md Testing approach & policy
docs/GITHUB_ABOUT.md Suggested GitHub About/description copy (EN/CN)
docs/SESSION_HANDOFF.md Session handoff notes

Development

# Lint
py -m ruff check .

# Test
py -m pytest -q

Contributing

Contributions are welcome! Please ensure:

  1. Code passes ruff check . with no warnings
  2. All tests pass with pytest -q
  3. New features include appropriate test coverage

License

MIT



Built for teams that demand structured, observable, and safe multi-model code review workflows.
