Autonomous Agent Stack

A governed, session-centered control plane for long-running agents.

Run coding agents under zero-trust execution, durable session history, isolated capabilities, and explicit promotion gates instead of handing repository ownership to a single runtime.



What AAS Is

Autonomous Agent Stack (AAS) is a governed control plane for long-running agent execution.

It separates durable session history, execution capabilities, orchestration policies, and promotion authority so that no single model runtime gets to discover work, edit code, approve its own output, and publish it.

AAS is not a generic AI agent demo. It is built for teams that want to integrate tools such as OpenHands, Codex, or custom agents without collapsing trust boundaries. In AAS, those tools are execution surfaces, not the system of record.

Today, AAS is focused on a high-value vertical: governed repository changes. Long term, the same control-plane model is intended to support broader agent work across heterogeneous runtimes, tools, and environments.

AAS is evolving toward a more Agent OS-like control layer, but today it should first be understood as a governed control plane for long-running agents.

Over time, agent distribution may look increasingly app-like, with installable and removable agent packages, tools, or skills. But that is the distribution layer. AAS is concerned with the system layer beneath it: session, capability, policy, and promotion.

In federated settings, agents are not just app-like packages. They also behave like dispatched workers: scoped, leased, auditable, and recallable across trust boundaries. Capabilities may look like apps, agents behave more like workers, and AAS exists as the control plane that governs both.

Why This Matters

As agents take on work that spans many context windows, the hard problem is no longer just "can the model code?"

The harder questions are:

  • Can the system preserve progress across sessions?
  • Can it recover state after failure or handoff?
  • Can it isolate capabilities without making one runtime the trusted core?
  • Can it promote privileged changes explicitly instead of implicitly?

Most agent stacks hard-code temporary model limitations into permanent architecture. AAS takes the opposite approach: keep the system abstractions stable, and keep the harness replaceable.

Core Model

Session -> policy -> isolated capability -> validation -> promotion

Current implementation focus:

Planner -> isolated worker -> validation gate -> promotion gate -> patch artifact or draft PR

Core invariants:

  • Patch-only by default
  • Deny-wins policy merging
  • Single-writer promotion for mutable state
  • Runtime artifacts never promote into source
  • Clean-base checks before draft PR promotion
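As an illustration of the deny-wins invariant, here is a minimal sketch of how layered policies could be merged. The `Policy` class, the action names, and `merge_policies` are hypothetical and are not taken from the AAS codebase; the only property the sketch is meant to show is that a deny in any layer overrides an allow in every other layer, regardless of merge order.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy layer: explicit allow and deny sets of action names."""
    allow: frozenset[str] = field(default_factory=frozenset)
    deny: frozenset[str] = field(default_factory=frozenset)


def merge_policies(*layers: Policy) -> Policy:
    """Union all layers; a deny in any layer removes the action from allow."""
    allow: set[str] = set()
    deny: set[str] = set()
    for layer in layers:
        allow |= layer.allow
        deny |= layer.deny
    return Policy(allow=frozenset(allow - deny), deny=frozenset(deny))


def is_allowed(policy: Policy, action: str) -> bool:
    # Default-deny: an action must be explicitly allowed and never denied.
    return action in policy.allow and action not in policy.deny


# Illustrative action names only.
base = Policy(allow=frozenset({"read_repo", "write_patch", "open_draft_pr"}))
worker = Policy(
    allow=frozenset({"write_patch", "push_branch"}),
    deny=frozenset({"open_draft_pr"}),
)
merged = merge_policies(base, worker)
```

Because the merge is built from set unions, the result does not depend on the order in which layers are applied, which keeps a later, more permissive layer from silently re-enabling a denied action.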

Deep implementation details live in ARCHITECTURE.md and the RFC index in docs/rfc/README.en.md.

Stable Abstractions

Session

A durable execution history, not a mirror of the context window.
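One way to picture a session that outlives the context window is an append-only event log that any process can replay. The sketch below is an assumption for illustration: the `SessionEvent` fields, the event kinds, and the JSON Lines file are hypothetical, and the actual AAS storage layer may look quite different.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class SessionEvent:
    seq: int
    kind: str  # hypothetical kinds, e.g. "plan", "patch_candidate"
    payload: dict


class SessionLog:
    """Append-only JSON Lines log; state is rebuilt by replaying events."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def replay(self) -> list[SessionEvent]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as fh:
            return [SessionEvent(**json.loads(line)) for line in fh if line.strip()]

    def append(self, kind: str, payload: dict) -> SessionEvent:
        # The next sequence number is derived from the log, not from memory.
        event = SessionEvent(seq=len(self.replay()) + 1, kind=kind, payload=payload)
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(asdict(event)) + "\n")
        return event


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "session.jsonl"
    SessionLog(log_path).append("plan", {"goal": "fix failing test"})
    SessionLog(log_path).append("patch_candidate", {"files": 1})
    # A fresh instance (for example after a crash or handoff) sees the same history.
    recovered = [event.kind for event in SessionLog(log_path).replay()]
```

The point of the design is that nothing about the session depends on what a model currently holds in context; a new worker only needs the log.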

Capability

Sandboxes, remote workers, MCP servers, browsers, and git proxies treated as isolated hands.

Policy

Replaceable orchestration rules for context assembly, retries, evaluation, escalation, and routing.

Promotion

Explicit, auditable state transitions for any privileged change.
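A promotion gate of this kind can be sketched as a small state machine with an audit trail. The stage names, the transition table, and the `Promotion` class below are hypothetical and only illustrate the idea that a privileged state is reachable solely through explicit, recorded transitions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    CANDIDATE = "candidate"
    VALIDATED = "validated"
    PROMOTED = "promoted"
    REJECTED = "rejected"


# Hypothetical transition table: promotion is only reachable through validation.
ALLOWED = {
    Stage.CANDIDATE: {Stage.VALIDATED, Stage.REJECTED},
    Stage.VALIDATED: {Stage.PROMOTED, Stage.REJECTED},
    Stage.PROMOTED: set(),
    Stage.REJECTED: set(),
}


@dataclass
class Promotion:
    patch_id: str
    stage: Stage = Stage.CANDIDATE
    # Each audit entry is (actor, from_stage, to_stage).
    audit: list[tuple[str, str, str]] = field(default_factory=list)

    def transition(self, target: Stage, actor: str) -> None:
        if target not in ALLOWED[self.stage]:
            raise ValueError(f"{self.stage.value} -> {target.value} is not allowed")
        self.audit.append((actor, self.stage.value, target.value))
        self.stage = target


promotion = Promotion("patch-001")
promotion.transition(Stage.VALIDATED, actor="validation-gate")
promotion.transition(Stage.PROMOTED, actor="promotion-gate")
```

In this sketch a worker that tries to move a candidate straight to `PROMOTED` gets a `ValueError`, and every successful transition leaves an entry naming the actor that performed it.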

What Makes AAS Different

| Traditional agent stack | AAS |
| --- | --- |
| Agent gets repository write authority | Worker produces a bounded patch candidate |
| Planning, execution, and merge authority live in one runtime | Policy, execution, and promotion are separated |
| Validation is optional or ad hoc | Validation and promotion rules are on the main path |
| External tools become the de facto control plane | Tools plug into a governed control plane |
| Runtime state leaks into source changes | Runtime artifacts and source promotion are isolated |
| Trust is implicit | Zero-trust invariants are explicit and auditable |

Design Principles

  • Do not turn temporary model weaknesses into permanent system architecture.
  • Do not let a single runtime become the trusted core.
  • Do not rely on the model being "not clever enough" for security.
  • Keep orchestration replaceable.
  • Keep privileged changes explicit.
  • Preserve recoverable history outside the context window.

Quick Start

Requirements:

  • Python 3.11+
  • make
  • Docker or Colima for ai-lab and sandbox-backed flows (optional for basic local startup)

git clone https://github.com/srxly888-creator/autonomous-agent-stack.git
cd autonomous-agent-stack

make setup
make doctor
make start

Environment files: make setup creates .env from .env.example when .env is missing. Prefer .env.local for secrets (gitignored). Do not commit real tokens.

Open after startup:

  • API docs: http://127.0.0.1:8001/docs
  • Admin panel: http://127.0.0.1:8001/panel
  • Health check: http://127.0.0.1:8001/health
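If you prefer to check these endpoints from a script rather than a browser, a standard-library sketch along the following lines should work. It assumes only the default address listed above and that each endpoint answers a plain GET with HTTP 200 once the server is running; the helper names `endpoint` and `is_up` are made up for this example.

```python
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:8001"  # default local address from the list above


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join a base URL and a path without doubling or dropping slashes."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def is_up(path: str = "/health", timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(endpoint(path), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


if __name__ == "__main__":
    for path in ("/health", "/docs", "/panel"):
        print(path, "up" if is_up(path) else "down")
```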

Validate the local setup:

make test-quick
make smoke-local
make hygiene-check

For detailed setup and troubleshooting, read docs/QUICK_START.md. For remote or multi-machine execution, start with docs/linux-remote-worker.md. If you want to run Hermes on Windows through WSL2 and let the base control plane take over, read docs/windows-wsl2-hermes-control-plane.md.

Native Windows support is currently limited to the minimal local control-plane path: make setup, make doctor, and make start. Other targets still assume Bash and/or macOS/Linux tooling.

Stable Single-Machine Mode

v0.1.0-stable establishes a verified baseline for running AAS on a single machine without external dependencies.

The default mode is minimal (stable), which:

  • Starts reliably with core features only
  • Makes optional routers non-blocking
  • Disables experimental features by default
  • Is suitable for local development and testing

# Default: minimal mode (stable)
AUTORESEARCH_MODE=minimal make start

# Full mode: all features (experimental)
AUTORESEARCH_MODE=full make start

What Works in Stable Mode

| Feature | Status |
| --- | --- |
| FastAPI application | ✅ Starts at http://127.0.0.1:8001 |
| SQLite control plane | artifacts/api/*.sqlite3 |
| AEP runner (mock) | ✅ End-to-end execution |
| Worker schedules | ✅ APScheduler-backed once / interval schedules via /api/v1/worker-schedules |
| Runtime artifact exclusion | ✅ Patch hygiene enforced |
| Health/docs endpoints | ✅ All respond correctly |

What's Explicitly Out of Scope

  • Distributed execution (requires queue infrastructure)
  • Telegram integration (requires bot token)
  • WebAuthn (requires additional setup)
  • Cluster mode (distributed coordination only)
  • Complex cron syntax and multi-node scheduling

See STATUS_AND_RELEASE_NOTES.md for complete details.

Requirement #4 Ready Baseline

Branch: feat/single-machine-aas-ready-for-req4

Status: ✅ Engineering scaffold complete, NOT production complete

This branch provides a complete engineering scaffold for requirement #4 (Excel commission processing). All preparation is done; business logic implementation can start as soon as the required business assets arrive.

⚠️ This is a "stable single-machine requirement-4 ready baseline": the engineering scaffold is complete and verified, but business logic implementation is blocked until the business assets arrive.

What's Ready

| Component | File | Status |
| --- | --- | --- |
| Commission Engine | src/autoresearch/core/services/commission_engine.py | ✅ Deterministic interface |
| Excel Jobs Repository | src/autoresearch/core/repositories/excel_jobs.py | ✅ SQLite-backed |
| Excel Ops Service | src/autoresearch/core/services/excel_ops.py | ✅ Orchestration layer |
| Excel Ops Router | src/autoresearch/api/routers/excel_ops.py | ✅ REST API |
| Models & Contracts | src/autoresearch/shared/excel_ops_models.py | ✅ Schemas defined |
| Contract Tests | tests/test_excel_ops_service.py | ✅ Verify blocking |
| Validation Script | scripts/validate_stable_baseline.sh | make validate-req4 |

Awaiting Business Assets

| Asset | Purpose | Location |
| --- | --- | --- |
| Excel contracts | File schemas, column mappings | tests/fixtures/requirement4_contracts/ |
| Ambiguity checklist | 7 categories of edge-case decisions | tests/fixtures/requirement4_contracts/ |
| Sample Excel files | Real input data for testing | tests/fixtures/requirement4_samples/ |
| Golden outputs | Expected calculation results | tests/fixtures/requirement4_golden/ |

Validate Scaffold

# Validate requirement #4 readiness
make validate-req4

# Run contract tests
pytest tests/test_excel_ops_service.py -v

# Check readiness status
cat docs/requirement4/IMPLEMENTATION_READY_CHECKLIST.md

Safety Guarantees

  • No Silent Calculations: Blocks without valid contracts
  • Deterministic Only: No LLM reasoning in production path
  • Audit Trail: Job state tracked in SQLite
  • Runtime Artifact Exclusion: Patches exclude .masfactory_runtime/, logs/, memory/
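The runtime artifact exclusion rule can be illustrated with a small path filter. The three directory names come from the list above; everything else in the sketch is assumed, including the function names, the example file paths, and the choice to match only top-level directories. The actual hygiene check in AAS may use different matching rules.

```python
from pathlib import PurePosixPath

# Directories named in the list above; treated here as top-level path prefixes.
RUNTIME_DIRS = (".masfactory_runtime", "logs", "memory")


def is_runtime_artifact(path: str) -> bool:
    """True when the first path component is one of the runtime directories."""
    parts = PurePosixPath(path).parts
    return bool(parts) and parts[0] in RUNTIME_DIRS


def split_patch_paths(paths: list[str]) -> tuple[list[str], list[str]]:
    """Return (promotable, excluded) paths, preserving input order."""
    keep = [p for p in paths if not is_runtime_artifact(p)]
    drop = [p for p in paths if is_runtime_artifact(p)]
    return keep, drop


# Hypothetical changed-file list for a patch candidate.
changed = [
    "src/autoresearch/api/main.py",
    "logs/run.log",
    ".masfactory_runtime/state.json",
    "docs/memory/notes.md",
]
promotable, excluded = split_patch_paths(changed)
```

Under the top-level-only assumption, `docs/memory/notes.md` stays promotable because `memory` is not its first path component.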

See docs/requirement4/ for complete preparation details.

For implementation:


Controlled Integrations

AAS is designed to integrate agent runtimes without turning them into the trusted core:

  • OpenHands as a constrained worker behind patch-only contracts and promotion gates
  • Codex and custom adapters through controlled execution and AEP-style job specs
  • Remote workers for machine-specific capabilities, credentials, or isolated execution surfaces
  • GitHub and chat-triggered workflows routed back into the same control plane

See docs/openhands-cli-integration.md, docs/agent-execution-protocol.md, and docs/linux-remote-worker.md.

Documentation

Start here:

Go deeper:

Explore integrations and evolution:

Roadmap

Now

A stable single-repo control plane with isolated execution and promotion checks.

Next

  • Session-first recovery and replay
  • Capability registry for heterogeneous workers and tools
  • Policy seams for orchestration strategies
  • Fast policy router vs slow orchestration (butler: rules-first, model fill-in, Hermes for heavy work); see docs/decisions/fast-policy-router-and-slow-orchestration-v1.md
  • Distributed execution with durable queues, leases, and heartbeats

Long Term

A governed runtime substrate for long-running agent work across multiple models, multiple hands, and multiple trust boundaries.

Who This Is For

AAS is for teams that want:

  • autonomous execution without repository ownership
  • durable progress across long-running tasks
  • zero-trust safety boundaries
  • auditable promotion workflows
  • multi-runtime interoperability without surrendering control

Contributing

If you want to contribute, start with CONTRIBUTING.md and ARCHITECTURE.md. Small documentation fixes and focused bug fixes are good first contributions. Architectural changes should start as an RFC in docs/rfc/.

A typical local loop is:

make review-setup
make test-quick
make hygiene-check
make review-gates-local

make review-setup installs mypy, bandit, and semgrep into .venv-review so the main .venv can stay aligned with make setup.

Open an issue or discussion if you want to validate a design direction before implementation.

License

MIT