Multi-agent development pipeline for OpenClaw — 6 AI agents autonomously plan, implement, test, review, and create PRs. SURGE x OpenClaw Hackathon Track 3.

ExpertVagabond/antfarm-devpipe

Antfarm -- Multi-Agent Development Pipeline for OpenClaw

SURGE x OpenClaw Hackathon Track 3: Developer Infrastructure | License: MIT

Give it a task. Get back a tested, reviewed PR. Six AI agents handle the rest.

Antfarm is a multi-agent development pipeline built on OpenClaw that takes a task description and autonomously plans, implements, tests, reviews, and creates pull requests. Each agent runs in an isolated session with a defined role, communicating through structured KEY: VALUE pairs and a shared progress log.
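To make the KEY: VALUE convention concrete, here is a minimal Python sketch of how one agent's output might be parsed by the next step. The helper name `parse_agent_output` and the `FEEDBACK` key are assumptions for illustration (only `STATUS: retry` appears in this README); Antfarm's actual parser may work differently.

```python
import re

# Hypothetical helper: extract KEY: VALUE pairs from an agent's text output.
# Assumes keys are upper-case identifiers at the start of a line.
_PAIR = re.compile(r"^([A-Z][A-Z0-9_]*):[ \t]*(.*)$")

def parse_agent_output(text: str) -> dict[str, str]:
    pairs: dict[str, str] = {}
    for line in text.splitlines():
        match = _PAIR.match(line.strip())
        if match:
            key, value = match.groups()
            pairs[key] = value.strip()
    return pairs

sample = """Finished the story.
STATUS: retry
FEEDBACK: test asserts on name, requirement was displayName
"""
result = parse_agent_output(sample)
```

Free-form prose lines are ignored, so an agent can explain itself in plain language while still emitting fields a downstream step can read.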


What It Does

You describe what you want built. Antfarm decomposes the work into user stories, then six specialized agents execute a pipeline: planning the stories, preparing the environment, implementing each story with tests, verifying correctness, running integration tests, creating a PR, and reviewing the code. The entire process runs autonomously -- you get a tested, reviewed pull request at the end.


Architecture

  Task Description
        |
        v
  +-------------+
  |   Planner   |  Decomposes task into ordered user stories (max 20)
  +-------------+
        |
        v
  +-------------+
  |    Setup    |  Creates branch, discovers build/test commands, establishes baseline
  +-------------+
        |
        v
  +------------------------------------------+
  |         Story Execution Loop             |
  |                                          |
  |   +-------------+     +-------------+   |
  |   |  Developer  | --> |  Verifier   |   |
  |   +-------------+     +-------------+   |
  |         ^                    |           |
  |         |   STATUS: retry    |           |
  |         +--------------------+           |
  |                                          |
  |   For each story:                        |
  |     Developer implements + writes tests  |
  |     Verifier checks acceptance criteria  |
  |     Pass -> next story                   |
  |     Fail -> Developer retries (max 2)    |
  +------------------------------------------+
        |
        v
  +-------------+
  |   Tester    |  Integration/E2E testing across all stories
  +-------------+
        |
        v
  +-------------+
  |     PR      |  Creates pull request with summary
  +-------------+
        |
        v
  +-------------+
  |  Reviewer   |  Reviews PR, approves or requests changes
  +-------------+
        |
        v
      Done

The Developer and Verifier form a tight loop: the Developer implements a single story and writes tests, then the Verifier checks each acceptance criterion, runs the test suite, and either approves or sends specific feedback back for a retry. Each story gets a fresh agent session -- no accumulated context drift.


Agent Roster

Agent     | Role Type    | Description
----------+--------------+---------------------------------------------------------------------------------
Planner   | analysis     | Decomposes tasks into ordered user stories with verifiable acceptance criteria
Setup     | coding       | Creates branch, discovers build/test commands, establishes baseline
Developer | coding       | Implements stories one at a time, writes tests, commits with structured messages
Verifier  | verification | Checks each story against acceptance criteria, runs tests, security checks
Tester    | testing      | Integration and E2E testing after all stories are implemented
Reviewer  | analysis     | Reviews the PR, approves or requests changes with actionable feedback

Each agent has its own workspace with three identity files:

  • AGENTS.md -- operational instructions and process
  • SOUL.md -- personality and decision-making style
  • IDENTITY.md -- name and role declaration

See docs/agent-roles.md for full documentation of each agent.


Real Evidence

This pipeline is not theoretical. It is running RIGHT NOW on three Solana ecosystem repos as part of the Solana Graveyard Hackathon:

Repository                  | Branch                 | Stories | What Antfarm Did
----------------------------+------------------------+---------+------------------------------------------------------------------------
ExpertVagabond/tribeca-dao  | revival/graveyard-hack | 9       | Anchor 0.30 migration, IDL regeneration, TS SDK migration, demo lifecycle
ExpertVagabond/grape-art    | revival/graveyard-hack | 11      | Full dependency modernization, Parcel build fix, marketplace demo
ExpertVagabond/port-lending | revival/graveyard-hack | 13      | Rust dependency updates, cargo build-sbf, test restoration, TS SDK

33 stories total across 3 repositories, all autonomously planned and implemented by Antfarm agents over 48 hours of continuous execution starting February 16, 2026.

See docs/evidence.md for full details and verification instructions.


Quick Start

# Prerequisites: OpenClaw installed and running

# 1. Clone the repo
git clone https://github.com/ExpertVagabond/antfarm-devpipe.git
cd antfarm-devpipe

# 2. Configure environment
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY and GITHUB_TOKEN

# 3. Install the workflow
antfarm workflow install feature-dev

# 4. Run on any task
antfarm workflow run feature-dev "Add user authentication with JWT tokens"

# 5. Monitor progress
antfarm workflow status

How It Works

Step 1: Planning

The Planner agent explores the target codebase, understands the stack and conventions, and decomposes the task into ordered user stories (max 20). Each story is sized to fit in a single agent context window. Stories are ordered by dependency: schema/DB first, then backend, then frontend, then integration. Every acceptance criterion is mechanically verifiable.
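The ordering and size rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Planner's actual data model: the `Story` fields, the layer labels, and `order_stories` are hypothetical names.

```python
from dataclasses import dataclass, field

MAX_STORIES = 20  # cap stated in the planning step
# Dependency layers in the order described above; the labels are assumptions.
LAYER_ORDER = {"schema": 0, "backend": 1, "frontend": 2, "integration": 3}

@dataclass
class Story:
    id: str
    title: str
    layer: str
    acceptance_criteria: list[str] = field(default_factory=list)

def order_stories(stories: list[Story]) -> list[Story]:
    if len(stories) > MAX_STORIES:
        raise ValueError(f"plan has {len(stories)} stories; max is {MAX_STORIES}")
    for story in stories:
        if not story.acceptance_criteria:
            raise ValueError(f"story {story.id} has no acceptance criteria")
    # sorted() is stable, so stories within a layer keep the planner's order.
    return sorted(stories, key=lambda s: LAYER_ORDER[s.layer])

plan = order_stories([
    Story("S2", "Login form", "frontend", ["form posts to /login"]),
    Story("S1", "Users table", "schema", ["migration creates users table"]),
    Story("S3", "JWT endpoint", "backend", ["POST /login returns a token"]),
])
```

Rejecting a story with no acceptance criteria up front reflects the requirement that every story be mechanically verifiable later in the pipeline.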

Step 2: Setup

The Setup agent creates a feature branch, reads package.json / Cargo.toml / pyproject.toml and CI configs to discover build and test commands, ensures .gitignore exists, and runs the build and test suite to establish a baseline. Downstream agents receive the discovered BUILD_CMD and TEST_CMD.
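As a rough sketch of the discovery step, the Python below maps a manifest file to a `BUILD_CMD` / `TEST_CMD` pair. The mapping and the function name `discover_commands` are assumptions: the commands shown are common ecosystem defaults, not values Antfarm is known to emit, and the real agent also reads CI configs.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping from manifest file to (BUILD_CMD, TEST_CMD).
# The commands are common defaults, not values Antfarm is known to use.
MANIFEST_COMMANDS = {
    "package.json": ("npm run build", "npm test"),
    "Cargo.toml": ("cargo build", "cargo test"),
    "pyproject.toml": ("python -m build", "python -m pytest"),
}

def discover_commands(repo: Path) -> dict[str, str]:
    """Return build/test commands for the first known manifest found in repo."""
    for manifest, (build_cmd, test_cmd) in MANIFEST_COMMANDS.items():
        if (repo / manifest).is_file():
            return {"BUILD_CMD": build_cmd, "TEST_CMD": test_cmd}
    raise FileNotFoundError(f"no known manifest found in {repo}")

# Demo against a throwaway directory containing only a Cargo.toml.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Cargo.toml").write_text('[package]\nname = "demo"\n')
    commands = discover_commands(Path(tmp))
```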

Step 3: Story Execution Loop

For each story in order, the pipeline spawns a fresh Developer session. The Developer reads progress.txt for codebase patterns discovered by previous stories, implements the story, writes tests, runs the build and test suite, commits, and appends to the progress log. Then the Verifier checks every acceptance criterion, runs tests, performs security checks, and either approves or sends specific retry feedback. Max 2 retries per story before escalating to a human.
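The control flow of this loop can be sketched as follows. It is a simplified model, assuming the Developer and Verifier are plain callables and that a passing verdict is reported as `STATUS: done`; only `STATUS: retry` is named in this README, so `done`, `FEEDBACK`, and `run_story` are hypothetical.

```python
from typing import Callable, Optional

MAX_RETRIES = 2  # retries allowed per story before escalating to a human

def run_story(
    story_id: str,
    develop: Callable[[str, Optional[str]], None],
    verify: Callable[[str], dict],
) -> str:
    """Return "done" if the story passes, or "escalated" once retries run out."""
    feedback = None
    for _attempt in range(1 + MAX_RETRIES):  # first attempt plus retries
        develop(story_id, feedback)          # stands in for a fresh Developer session
        verdict = verify(story_id)           # e.g. {"STATUS": "retry", "FEEDBACK": "..."}
        if verdict["STATUS"] == "done":
            return "done"
        feedback = verdict.get("FEEDBACK")
    return "escalated"

# Demo: the Verifier rejects the first attempt, then approves the retry.
calls = []
verdicts = iter([
    {"STATUS": "retry", "FEEDBACK": "checks name, not displayName"},
    {"STATUS": "done"},
])
outcome = run_story("S1", lambda sid, fb: calls.append(fb), lambda sid: next(verdicts))
```

Passing the Verifier's feedback into the next Developer attempt is what makes the retry targeted rather than a blind rerun.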

Step 4: Integration Testing

After all stories pass verification, the Tester agent runs the full test suite and checks for integration issues between stories, cross-cutting concerns, and E2E flows.

Step 5: PR Creation

The Developer agent creates a pull request with a clear title, description of changes, and test results.

Step 6: Code Review

The Reviewer agent reads the PR diff, checks for code quality, bugs, test coverage, and convention adherence. It posts its review directly to GitHub -- either approving or requesting changes with specific, actionable feedback.


Key Features

Story-Based Execution

Each story gets a fresh agent session. No accumulated context drift. No 200K-token conversations that lose the plot. One story, one session, one commit.

Verification Loops

The Verifier checks every acceptance criterion mechanically. If something fails, it sends specific feedback: "The test asserts on the wrong field -- it checks name but the requirement was about displayName." The Developer retries with that feedback. Max 2 retries before escalating.

Progress Persistence

Agents share knowledge through progress.txt. The Codebase Patterns section at the top captures reusable discoveries: "This project uses node:sqlite DatabaseSync, not async." Each new Developer session reads this before starting, so story 8 benefits from patterns discovered in story 1.
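A minimal sketch of reading and appending to the progress log might look like this. The file layout is an assumption: the heading text `## Codebase Patterns`, the `- ` bullet prefix, and both function names are hypothetical, since this README does not specify the format of progress.txt.

```python
import tempfile
from pathlib import Path

PATTERNS_HEADER = "## Codebase Patterns"  # assumed heading text

def read_patterns(progress_file: Path) -> list[str]:
    """Return bullet lines under the patterns heading, up to the next heading."""
    patterns, in_section = [], False
    for line in progress_file.read_text().splitlines():
        if line.strip() == PATTERNS_HEADER:
            in_section = True
        elif line.startswith("## "):
            in_section = False
        elif in_section and line.startswith("- "):
            patterns.append(line[2:].strip())
    return patterns

def append_entry(progress_file: Path, story_id: str, summary: str) -> None:
    """Append a per-story section to the end of the log."""
    with progress_file.open("a") as handle:
        handle.write(f"\n## {story_id}\n- {summary}\n")

# Demo: a later session sees the pattern recorded by an earlier one.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "progress.txt"
    log.write_text(f"{PATTERNS_HEADER}\n- Uses node:sqlite DatabaseSync, not async\n")
    append_entry(log, "S1", "Added users table migration")
    patterns = read_patterns(log)
    log_text = log.read_text()
```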

Self-Healing Medic

A medic watchdog runs on a 5-minute cron interval, checking for stalled steps. If an agent is stuck (no progress for too long), the medic resets the step and alerts the main agent. No silent failures.
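The stall check itself reduces to comparing each step's last-progress timestamp against a threshold. The sketch below assumes a 15-minute threshold and a simple name-to-timestamp mapping; the README does not state the real threshold or how step state is stored, so `STALL_THRESHOLD` and `find_stalled` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

STALL_THRESHOLD = timedelta(minutes=15)  # assumed; the source does not give a value

def find_stalled(steps: dict[str, datetime], now: datetime) -> list[str]:
    """Return the names of steps whose last progress is older than the threshold."""
    return sorted(name for name, last in steps.items() if now - last > STALL_THRESHOLD)

now = datetime(2026, 2, 16, 12, 0, tzinfo=timezone.utc)
stalled = find_stalled(
    {
        "developer": now - timedelta(minutes=40),  # no progress for 40 minutes
        "verifier": now - timedelta(minutes=3),    # recently active
    },
    now,
)
```

A watchdog run every 5 minutes would call a check like this and then reset and report each stalled step.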

Staggered Cron Polling

Agents fire in sequence on staggered intervals (0s / 60s / 120s / 180s / 240s / 300s) within a 5-minute cron cycle. This prevents resource contention and ensures orderly pipeline progression.
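The offsets listed above follow from a fixed 60-second step per agent. The sketch below reproduces that arithmetic; assigning offsets in roster order is an assumption, as the README does not say which agent gets which slot.

```python
# Agent names from the roster; the order of assignment is assumed.
AGENTS = ["planner", "setup", "developer", "verifier", "tester", "reviewer"]
STAGGER_SECONDS = 60
CYCLE_SECONDS = 300  # 5-minute cron cycle

def stagger_offsets(agents: list[str], step: int = STAGGER_SECONDS) -> dict[str, int]:
    """Map each agent to its start offset, in seconds, within one cycle."""
    return {agent: index * step for index, agent in enumerate(agents)}

offsets = stagger_offsets(AGENTS)
```

With six agents the last offset equals the cycle length (300s), so whether that agent fires at the end of one cycle or alongside the start of the next depends on how the scheduler treats the boundary.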

Security-First Verification

The Verifier blocks sensitive files (.env, *.key, *.pem, credentials), checks .gitignore exists, and scans diffs for hardcoded credentials. Security failures are always a rejection, regardless of whether the code works.
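A simplified version of these checks is sketched below. The file patterns come from the list above (with `credentials*` as an assumed glob), while the secret-detection regex and the function name `security_violations` are illustrative; a production scanner would use a much broader rule set.

```python
import re
from fnmatch import fnmatch

# Patterns taken from the list above; "credentials*" is an assumed glob.
BLOCKED_PATTERNS = [".env", "*.key", "*.pem", "credentials*"]
# Illustrative detector for quoted values assigned to secret-like names.
SECRET_RE = re.compile(
    r"(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
)

def security_violations(changed_files: list[str], diff_text: str) -> list[str]:
    violations = []
    for path in changed_files:
        name = path.rsplit("/", 1)[-1]  # match on the file name only
        if any(fnmatch(name, pattern) for pattern in BLOCKED_PATTERNS):
            violations.append(f"blocked file: {path}")
    for line in diff_text.splitlines():
        # Only inspect added lines, skipping the "+++ b/file" header.
        if line.startswith("+") and not line.startswith("+++") and SECRET_RE.search(line):
            violations.append("possible hardcoded credential in diff")
            break
    return violations

found = security_violations(
    ["src/app.ts", "config/.env"],
    '+++ b/src/app.ts\n+const apiKey = "sk-1234567890abcdef";\n',
)
```

Any non-empty result would be treated as a rejection, matching the rule that security failures override working code.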


Extending

Antfarm workflows are defined in YAML. You can create custom workflows with different agent configurations, step sequences, and loop structures.

See docs/workflow-authoring.md for a complete guide to writing custom workflows.

See docs/architecture.md for deep technical details on the pipeline, cron system, and communication protocol.


Built With

  • OpenClaw -- Multi-agent orchestration and cron scheduling
  • Antfarm -- Workflow engine and agent runtime
  • Claude (Anthropic) -- AI model powering all agents
  • GitHub CLI -- PR creation and code review

SURGE x OpenClaw Hackathon

Track 3: Developer Infrastructure

Antfarm DevPipe demonstrates how multi-agent pipelines can autonomously handle the full software development lifecycle -- from task decomposition through code review -- using OpenClaw's orchestration primitives. The pipeline is not a demo: it is actively processing real codebases with real PRs.


Author

Matthew Karsten -- Purple Squirrel Media


License

MIT -- see LICENSE for details.
