A governed, session-centered control plane for long-running agents.
Run coding agents under zero-trust execution, durable session history, isolated capabilities, and explicit promotion gates instead of handing repository ownership to a single runtime.
Autonomous Agent Stack (AAS) is a governed control plane for long-running agent execution.
It separates durable session history, execution capabilities, orchestration policies, and promotion authority so that no single model runtime gets to discover work, edit code, approve its own output, and publish it.
AAS is not a generic AI agent demo. It is built for teams that want to integrate tools such as OpenHands, Codex, or custom agents without collapsing trust boundaries. In AAS, those tools are execution surfaces, not the system of record.
Today, AAS is focused on a high-value vertical: governed repository changes. Long term, the same control-plane model is intended to support broader agent work across heterogeneous runtimes, tools, and environments.
AAS is evolving toward a more Agent OS-like control layer, but today it should first be understood as a governed control plane for long-running agents.
Over time, agent distribution may look increasingly app-like, with installable and removable agent packages, tools, or skills. But that is the distribution layer. AAS is concerned with the system layer beneath it: session, capability, policy, and promotion.
In federated settings, agents are not just app-like packages. They also behave like dispatched workers: scoped, leased, auditable, and recallable across trust boundaries. Capabilities may look like apps, agents behave more like workers, and AAS exists as the control plane that governs both.
As agents take on work that spans many context windows, the hard problem is no longer just "can the model code?"
The hard problem is:
- Can the system preserve progress across sessions?
- Can it recover state after failure or handoff?
- Can it isolate capabilities without making one runtime the trusted core?
- Can it promote privileged changes explicitly instead of implicitly?
Most agent stacks hard-code temporary model limitations into permanent architecture. AAS takes the opposite approach: keep the system abstractions stable, and keep the harness replaceable.
Session -> policy -> isolated capability -> validation -> promotion
Current implementation focus:
Planner -> isolated worker -> validation gate -> promotion gate -> patch artifact or draft PR
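As a rough illustration of the flow above, the sketch below models each stage as an explicit, recorded state transition. The `Stage` names, the `Run` class, and the `advance` method are hypothetical and are not taken from the AAS codebase; the only point is that a run cannot reach promotion without passing through execution and validation first.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names mirroring the pipeline above.
    PLANNED = "planned"
    EXECUTED = "executed"
    VALIDATED = "validated"
    PROMOTED = "promoted"
    REJECTED = "rejected"


# Legal transitions: every stage may be rejected, none may be skipped.
ALLOWED = {
    Stage.PLANNED: {Stage.EXECUTED, Stage.REJECTED},
    Stage.EXECUTED: {Stage.VALIDATED, Stage.REJECTED},
    Stage.VALIDATED: {Stage.PROMOTED, Stage.REJECTED},
    Stage.PROMOTED: set(),
    Stage.REJECTED: set(),
}


@dataclass
class Run:
    stage: Stage = Stage.PLANNED
    # Audit trail of (from_stage, to_stage, actor) tuples.
    history: list[tuple[str, str, str]] = field(default_factory=list)

    def advance(self, target: Stage, actor: str) -> None:
        """Move to target if the transition is legal, recording who did it."""
        if target not in ALLOWED[self.stage]:
            raise ValueError(
                f"illegal transition {self.stage.value} -> {target.value}"
            )
        self.history.append((self.stage.value, target.value, actor))
        self.stage = target


run = Run()
run.advance(Stage.EXECUTED, actor="isolated_worker")
run.advance(Stage.VALIDATED, actor="validation_gate")
run.advance(Stage.PROMOTED, actor="promotion_gate")
```

In this sketch, calling `Run().advance(Stage.PROMOTED, actor="isolated_worker")` raises `ValueError`, because a worker cannot jump from planning straight to promotion.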
Core invariants:
- Patch-only by default
- Deny-wins policy merging
- Single-writer promotion for mutable state
- Runtime artifacts never promote into source
- Clean-base checks before draft PR promotion
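To make the deny-wins invariant concrete, here is a minimal sketch of one way such a merge could behave. The `Policy` shape, the capability names, and both helper functions are assumptions made for illustration; the actual policy model is described in docs/agent-execution-protocol.md.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy shape: allow and deny sets of capability names."""
    allow: frozenset[str] = field(default_factory=frozenset)
    deny: frozenset[str] = field(default_factory=frozenset)


def merge_deny_wins(*policies: Policy) -> Policy:
    """Merge policy layers so a deny from any layer overrides every allow."""
    allow: set[str] = set()
    deny: set[str] = set()
    for policy in policies:
        allow |= policy.allow
        deny |= policy.deny
    return Policy(allow=frozenset(allow - deny), deny=frozenset(deny))


def is_allowed(policy: Policy, capability: str) -> bool:
    """Default-deny: a capability must be allowed and must not be denied."""
    return capability in policy.allow and capability not in policy.deny


# Illustrative layers: an organization baseline and a per-job request.
org = Policy(
    allow=frozenset({"read_repo", "write_patch"}),
    deny=frozenset({"push_branch"}),
)
job = Policy(allow=frozenset({"write_patch", "push_branch"}))
merged = merge_deny_wins(org, job)
```

Here the job layer asks for `push_branch`, but the organization layer denies it, so the merged policy does not allow it regardless of the order in which the layers are passed.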
Deep implementation details live in ARCHITECTURE.md and the RFC index in docs/rfc/README.en.md.
A durable execution history, not a mirror of the context window.
Sandboxes, remote workers, MCP servers, browsers, and git proxies treated as isolated hands.
Replaceable orchestration rules for context assembly, retries, evaluation, escalation, and routing.
Explicit, auditable state transitions for any privileged change.
| Traditional agent stack | AAS |
|---|---|
| Agent gets repository write authority | Worker produces a bounded patch candidate |
| Planning, execution, and merge authority live in one runtime | Policy, execution, and promotion are separated |
| Validation is optional or ad hoc | Validation and promotion rules are on the main path |
| External tools become the de facto control plane | Tools plug into a governed control plane |
| Runtime state leaks into source changes | Runtime artifacts and source promotion are isolated |
| Trust is implicit | Zero-trust invariants are explicit and auditable |
- Do not turn temporary model weaknesses into permanent system architecture.
- Do not let a single runtime become the trusted core.
- Do not rely on the model being "not clever enough" for security.
- Keep orchestration replaceable.
- Keep privileged changes explicit.
- Preserve recoverable history outside the context window.
Requirements:
- Python 3.11+
- make
- Docker or Colima for ai-lab and sandbox-backed flows (optional for basic local startup)
git clone https://github.com/srxly888-creator/autonomous-agent-stack.git
cd autonomous-agent-stack
make setup
make doctor
make start
Environment files: make setup creates .env from .env.example when .env is missing. Prefer .env.local for secrets (gitignored). Do not commit real tokens.
Open after startup:
- API docs: http://127.0.0.1:8001/docs
- Admin panel: http://127.0.0.1:8001/panel
- Health check: http://127.0.0.1:8001/health
Validate the local setup:
make test-quick
make smoke-local
make hygiene-check
For detailed setup and troubleshooting, read docs/QUICK_START.md. For remote or multi-machine execution, start with docs/linux-remote-worker.md. If you want to run Hermes on Windows through WSL2 and let the base control plane take over, read docs/windows-wsl2-hermes-control-plane.md.
Native Windows support is currently limited to the minimal local control-plane path:
make setup, make doctor, and make start. Other targets still assume Bash and/or macOS/Linux tooling.
v0.1.0-stable establishes a verified baseline for running AAS on a single machine without external dependencies.
The default mode is minimal (stable), which:
- Starts reliably with core features only
- Makes optional routers non-blocking
- Disables experimental features by default
- Is suitable for local development and testing
# Default: minimal mode (stable)
AUTORESEARCH_MODE=minimal make start
# Full mode: all features (experimental)
AUTORESEARCH_MODE=full make start
| Feature | Status |
|---|---|
| FastAPI application | ✅ Starts at http://127.0.0.1:8001 |
| SQLite control plane | ✅ artifacts/api/*.sqlite3 |
| AEP runner (mock) | ✅ End-to-end execution |
| Worker schedules | ✅ APScheduler-backed once / interval schedules via /api/v1/worker-schedules |
| Runtime artifact exclusion | ✅ Patch hygiene enforced |
| Health/docs endpoints | ✅ All respond correctly |
- Distributed execution (requires queue infrastructure)
- Telegram integration (requires bot token)
- WebAuthn (requires additional setup)
- Cluster mode (distributed coordination only)
- Complex cron syntax and multi-node scheduling
See STATUS_AND_RELEASE_NOTES.md for complete details.
Branch: feat/single-machine-aas-ready-for-req4
Status: ✅ Engineering Scaffold Complete - NOT Production Complete
This branch provides a complete engineering scaffold for requirement #4 (Excel commission processing). All preparation is done; business logic implementation can start as soon as the required assets arrive.
| Component | File | Status |
|---|---|---|
| Commission Engine | src/autoresearch/core/services/commission_engine.py | ✅ Deterministic interface |
| Excel Jobs Repository | src/autoresearch/core/repositories/excel_jobs.py | ✅ SQLite-backed |
| Excel Ops Service | src/autoresearch/core/services/excel_ops.py | ✅ Orchestration layer |
| Excel Ops Router | src/autoresearch/api/routers/excel_ops.py | ✅ REST API |
| Models & Contracts | src/autoresearch/shared/excel_ops_models.py | ✅ Schemas defined |
| Contract Tests | tests/test_excel_ops_service.py | ✅ Verify blocking |
| Validation Script | scripts/validate_stable_baseline.sh | ✅ make validate-req4 |
| Asset | Purpose | Location |
|---|---|---|
| Excel contracts | File schemas, column mappings | tests/fixtures/requirement4_contracts/ |
| Ambiguity checklist | 7 categories of edge case decisions | tests/fixtures/requirement4_contracts/ |
| Sample Excel files | Real input data for testing | tests/fixtures/requirement4_samples/ |
| Golden outputs | Expected calculation results | tests/fixtures/requirement4_golden/ |
# Validate requirement #4 readiness
make validate-req4
# Run contract tests
pytest tests/test_excel_ops_service.py -v
# Check readiness status
cat docs/requirement4/IMPLEMENTATION_READY_CHECKLIST.md
- No Silent Calculations: Blocks without valid contracts
- Deterministic Only: No LLM reasoning in production path
- Audit Trail: Job state tracked in SQLite
- Runtime Artifact Exclusion: Patches exclude .masfactory_runtime/, logs/, memory/
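The runtime artifact exclusion in the list above can be pictured as a path filter applied to a patch candidate before promotion. The sketch below is an assumption about how such a filter might look, not the project's actual hygiene check: the function names are hypothetical, and it matches only top-level directories, which may be narrower or broader than the real rule.

```python
from pathlib import PurePosixPath

# Directory names taken from the list above.
EXCLUDED_DIRS = (".masfactory_runtime", "logs", "memory")


def is_runtime_artifact(path: str) -> bool:
    """True when the repo-relative path sits under an excluded top-level dir."""
    parts = PurePosixPath(path).parts
    return bool(parts) and parts[0] in EXCLUDED_DIRS


def filter_patch_paths(paths: list[str]) -> list[str]:
    """Keep only paths that are eligible to appear in a patch."""
    return [p for p in paths if not is_runtime_artifact(p)]


candidate = [
    "src/app.py",
    "logs/run.log",
    ".masfactory_runtime/state.json",
    "memory/notes.md",
    "src/logs/handler.py",
]
eligible = filter_patch_paths(candidate)
```

Note that `src/logs/handler.py` stays eligible in this sketch because only the first path component is checked.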
See: docs/requirement4/ for complete preparation details.
For implementation:
- English: docs/requirement4/CLAUDE_CODE_BEST_PRACTICES.md
- Chinese: docs/requirement4/CLAUDE_CODE_BEST_PRACTICES_ZH.md
- Action plan for when assets arrive (Chinese): docs/requirement4/ACTION_PLAN_WHEN_ASSETS_ARRIVE_ZH.md ⭐ Recommended: includes detailed descriptions and examples of the 4 required assets
AAS is designed to integrate agent runtimes without turning them into the trusted core:
- OpenHands as a constrained worker behind patch-only contracts and promotion gates
- Codex and custom adapters through controlled execution and AEP-style job specs
- Remote workers for machine-specific capabilities, credentials, or isolated execution surfaces
- GitHub and chat-triggered workflows routed back into the same control plane
See docs/openhands-cli-integration.md, docs/agent-execution-protocol.md, and docs/linux-remote-worker.md.
Start here:
- WHY_AAS.md: project motivation and design direction
- docs/QUICK_START.md: detailed setup and troubleshooting
- CONTRIBUTING.md: contribution workflow and expectations
Go deeper:
- ARCHITECTURE.md: canonical current architecture
- docs/agent-execution-protocol.md: execution contract and policy model
- docs/api-reference.md: API surface
Explore integrations and evolution:
- docs/openhands-cli-integration.md: OpenHands as a controlled worker
- docs/github-assistant-quickstart.md: GitHub assistant flows
- docs/rfc/README.en.md: RFC index and design process
A stable single-repo control plane with isolated execution and promotion checks.
- Session-first recovery and replay
- Capability registry for heterogeneous workers and tools
- Policy seams for orchestration strategies
- Fast policy router vs slow orchestration (butler: rules-first, model fill-in, Hermes for heavy work); see docs/decisions/fast-policy-router-and-slow-orchestration-v1.md
- Distributed execution with durable queues, leases, and heartbeats
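Since leases and heartbeats are still roadmap items, the following is only a sketch of the general idea, not a description of planned AAS code. It shows an in-memory lease table where a worker keeps a job by sending heartbeats, and another worker can take over once the lease expires. All class and method names are hypothetical, and timestamps are passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    worker_id: str
    ttl_seconds: float
    last_heartbeat: float

    def expired(self, now: float) -> bool:
        """A lease expires when no heartbeat arrived within the TTL."""
        return now - self.last_heartbeat > self.ttl_seconds


class LeaseTable:
    """In-memory stand-in for what would need to be a durable store."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._leases: dict[str, Lease] = {}

    def acquire(self, job_id: str, worker_id: str, now: float) -> bool:
        """Grant the job unless another worker holds a live lease on it."""
        current = self._leases.get(job_id)
        if (
            current is not None
            and current.worker_id != worker_id
            and not current.expired(now)
        ):
            return False
        self._leases[job_id] = Lease(worker_id, self._ttl, now)
        return True

    def heartbeat(self, job_id: str, worker_id: str, now: float) -> bool:
        """Extend a live lease; refuse if it is lost, expired, or not ours."""
        current = self._leases.get(job_id)
        if (
            current is None
            or current.worker_id != worker_id
            or current.expired(now)
        ):
            return False
        current.last_heartbeat = now
        return True
```

A real implementation would need durable storage and atomic updates; the in-memory dictionary here only illustrates the acquire, heartbeat, and takeover rules.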
A governed runtime substrate for long-running agent work across multiple models, multiple hands, and multiple trust boundaries.
AAS is for teams that want:
- autonomous execution without repository ownership
- durable progress across long-running tasks
- zero-trust safety boundaries
- auditable promotion workflows
- multi-runtime interoperability without surrendering control
If you want to contribute, start with CONTRIBUTING.md and ARCHITECTURE.md. Small documentation fixes and focused bug fixes are good first contributions. Architectural changes should start as an RFC in docs/rfc/.
A typical local loop is:
make review-setup
make test-quick
make hygiene-check
make review-gates-local
make review-setup installs mypy, bandit, and semgrep into .venv-review so the main .venv can stay aligned with make setup.
Open an issue or discussion if you want to validate a design direction before implementation.