feature: add swarm mode for multi-instance collaboration #117

Open
Zhaoyikaiii wants to merge 3 commits into sipeed:main from Zhaoyikaiii:feature/swarm-mode
Conversation

@Zhaoyikaiii
Collaborator

@Zhaoyikaiii Zhaoyikaiii commented Feb 13, 2026

Summary

  • Add pkg/swarm/ — a NATS-based multi-instance collaboration system that lets multiple PicoClaw nodes discover each other, distribute tasks by capability, and coordinate work through a coordinator/worker architecture
  • Introduce SwarmConfig in pkg/config/ with NATS and optional Temporal settings, plus picoclaw swarm CLI subcommands (start, status, nodes)
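As a rough illustration of the second bullet, the sketch below shows what a swarm configuration with NATS and optional Temporal settings might look like, together with the kind of `Validate()` method a later commit in this PR adds. Every field name, the role values, and the validation rules are assumptions made for illustration; the actual `SwarmConfig` in `pkg/config/` may be shaped differently.

```go
package main

import (
	"errors"
	"strings"
)

// NATSConfig is a hypothetical shape for the NATS settings.
type NATSConfig struct {
	URL      string // e.g. "nats://127.0.0.1:4222"
	Embedded bool   // assumed flag: run the in-process server instead of dialing URL
}

// TemporalConfig is a hypothetical shape for the optional Temporal settings.
type TemporalConfig struct {
	Enabled  bool
	HostPort string
}

// SwarmConfig mirrors the idea described in the summary; field names are assumed.
type SwarmConfig struct {
	Enabled      bool
	Role         string // assumed values: "coordinator", "worker", "specialist"
	Capabilities []string
	NATS         NATSConfig
	Temporal     TemporalConfig
}

// Validate rejects configurations that cannot start a swarm node.
// A disabled swarm is always considered valid.
func (c SwarmConfig) Validate() error {
	if !c.Enabled {
		return nil
	}
	switch c.Role {
	case "coordinator", "worker", "specialist":
		// known role, nothing to do
	default:
		return errors.New("swarm: unknown role " + c.Role)
	}
	if !c.NATS.Embedded && !strings.HasPrefix(c.NATS.URL, "nats://") {
		return errors.New("swarm: NATS URL must start with nats://")
	}
	if c.Temporal.Enabled && c.Temporal.HostPort == "" {
		return errors.New("swarm: Temporal host:port is required when Temporal is enabled")
	}
	return nil
}
```

Keeping Temporal behind its own `Enabled` flag matches the "optional" wording above: a node with Temporal switched off only needs a reachable (or embedded) NATS server to pass validation.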

Architecture

Coordinator ──NATS──> Worker A (capability: code)
     │                Worker B (capability: research)
     │                Specialist C (capability: ml)
     └── Discovery (heartbeat, node registry, load-based selection)
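The diagram's "load-based selection" can be pictured with a small sketch: among the nodes that advertise the requested capability, pick the one with the lowest load. This is only one plausible policy, not necessarily the one in `discovery.go`; the trimmed `NodeInfo` fields and the `SelectWorker` helper are hypothetical.

```go
package main

// NodeInfo is a trimmed, hypothetical version of the type listed in types.go.
type NodeInfo struct {
	ID           string
	Capabilities []string
	TasksRunning int
	MaxTasks     int
}

// has reports whether the node advertises the given capability.
func (n NodeInfo) has(capability string) bool {
	for _, c := range n.Capabilities {
		if c == capability {
			return true
		}
	}
	return false
}

// load is the fraction of capacity in use; a node without capacity counts as full.
func (n NodeInfo) load() float64 {
	if n.MaxTasks <= 0 {
		return 1
	}
	return float64(n.TasksRunning) / float64(n.MaxTasks)
}

// SelectWorker returns the least-loaded node that advertises the capability.
// The boolean is false when no node matches or every match is at capacity.
func SelectWorker(nodes []NodeInfo, capability string) (NodeInfo, bool) {
	var best NodeInfo
	found := false
	for _, n := range nodes {
		if !n.has(capability) || n.load() >= 1 {
			continue
		}
		if !found || n.load() < best.load() {
			best, found = n, true
		}
	}
	return best, found
}
```

When `SelectWorker` reports no match, a coordinator could fall back to running the task locally, which is the behavior the component list calls "local fallback".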

What's in the box

| Component | File | What it does |
| --- | --- | --- |
| Types | types.go | NodeInfo, SwarmTask, TaskResult, Heartbeat, etc. |
| NATS Bridge | nats.go | Pub/sub, queue groups, discovery queries |
| Embedded NATS | embedded.go | In-process NATS server for dev/testing |
| Discovery | discovery.go | Heartbeat loop, stale node cleanup, capability-based worker selection |
| Coordinator | coordinator.go | Task routing (direct/broadcast/workflow), local fallback |
| Worker | worker.go | Concurrent task execution, progress reporting, load tracking |
| Temporal | temporal.go | Optional workflow engine (graceful degradation if unavailable) |
| Workflows | workflows.go | Decompose → parallel execute → synthesize pipeline |
| Manager | manager.go | Top-level orchestrator, startup/shutdown sequencing |
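The "decompose → parallel execute → synthesize" pipeline listed for `workflows.go` can be sketched with plain goroutines. This is a simplified stand-in that ignores NATS and Temporal entirely; `RunPipeline` and its three callback parameters are hypothetical names chosen for the example.

```go
package main

import "sync"

// RunPipeline splits a task into subtasks, runs each subtask in its own
// goroutine, and merges the results in their original order.
func RunPipeline(
	task string,
	decompose func(string) []string,
	execute func(string) string,
	synthesize func([]string) string,
) string {
	subtasks := decompose(task)
	results := make([]string, len(subtasks))

	var wg sync.WaitGroup
	for i, st := range subtasks {
		wg.Add(1)
		go func(i int, st string) {
			defer wg.Done()
			// Each goroutine writes to its own index, so no lock is needed.
			results[i] = execute(st)
		}(i, st)
	}
	wg.Wait()

	return synthesize(results)
}
```

In a swarm setting, `execute` would presumably hand the subtask to a worker node rather than run it in-process, while the ordering guarantee lets `synthesize` rely on results lining up with the subtasks it was given.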

The tests cover type serialization, embedded NATS lifecycle, pub/sub mechanics, discovery registry, coordinator dispatch (direct + local fallback + timeout), worker execution, and full coordinator↔worker integration round-trips.

Introduce a NATS-based swarm system that allows multiple PicoClaw
instances to discover each other, distribute tasks by capability,
and coordinate work through a coordinator/worker architecture.

- Add pkg/swarm/ with 9 source files: types, nats bridge, embedded
  NATS server, discovery, coordinator, worker, temporal client,
  workflows, and manager
- Add SwarmConfig to pkg/config with NATS and Temporal settings
- Add `picoclaw swarm` CLI subcommands (start/status/nodes)
- Add 8 test files with 32 tests and 83+ subtests (all passing)
- Add swarm feature discussion document
@cosmic-gao

I want multiple picoclaw nodes to coordinate their work, divided into:

  1. Task planning node (https://www.vibekanban.com/vibe-guide#solve-dev-servers)
  2. Opencode coding node
  3. Requirements design node
  4. Review and analysis node

The task planning for all nodes will be displayed on vibekanban.

@Zhaoyikaiii
Collaborator Author

> I want multiple picoclaw nodes to coordinate their work, divided into:
>
>   1. Task planning node (https://www.vibekanban.com/vibe-guide#solve-dev-servers)
>   2. Opencode coding node
>   3. Requirements design node
>   4. Review and analysis node
>
> The task planning for all nodes will be displayed on vibekanban.

We are adhering to a minimalist implementation to ensure the swarm remains lightweight. The goal is to prove the coordination mechanism works—specifically via NATS—while leaving advanced features like dynamic load balancing or LLM-driven decomposition for subsequent iterations.

@Leeaandrob
Collaborator

Why not use libp2p for this? I was implementing it with libp2p. What do you think?

yk and others added 2 commits February 14, 2026 05:35
- Fix race condition in worker.go using atomic operations for tasksRunning
- Fix memory leak in discovery.go by GC long-dead nodes (10x timeout)
- Add SwarmConfig.Validate() for configuration validation
- Add input validation in handleNodeJoin to prevent nil/invalid nodes
- Fix log level: discovery query failure now uses Warn instead of Debug
- Update tests to match new behavior
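The first fix in this list, replacing an unsynchronized counter with atomic operations, could look roughly like the sketch below. It is not the PR's `worker.go`; the `Worker` fields and the `TryAcquire`/`Release` helpers are hypothetical, and `atomic.Int32` requires Go 1.19 or newer.

```go
package main

import "sync/atomic"

// Worker tracks how many tasks are in flight without a mutex.
type Worker struct {
	tasksRunning atomic.Int32
	maxTasks     int32
}

// NewWorker returns a worker that accepts at most maxTasks concurrent tasks.
func NewWorker(maxTasks int32) *Worker {
	return &Worker{maxTasks: maxTasks}
}

// TryAcquire reserves a task slot. The compare-and-swap loop ensures two
// goroutines cannot both take the last free slot.
func (w *Worker) TryAcquire() bool {
	for {
		cur := w.tasksRunning.Load()
		if cur >= w.maxTasks {
			return false
		}
		if w.tasksRunning.CompareAndSwap(cur, cur+1) {
			return true
		}
	}
}

// Release frees a slot previously reserved with TryAcquire.
func (w *Worker) Release() {
	w.tasksRunning.Add(-1)
}

// Running reports the number of tasks currently in flight.
func (w *Worker) Running() int32 {
	return w.tasksRunning.Load()
}
```

A plain `tasksRunning++` from several goroutines is a data race; the check-then-increment in `TryAcquire` is done as a single compare-and-swap so the capacity limit holds under concurrency.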

Fixes issues from review:
1. Race condition - worker.go:108
2. Memory leak - discovery.go:220
3. Config validation missing
4. Input validation missing

Resolved conflicts in:
- cmd/picoclaw/main.go: kept both 'state' and 'swarm' imports
- go.mod: merged dependencies (added github copilot sdk, kept temporal mocks)
- go.sum: regenerated via go mod tidy
- pkg/config/config.go: kept both 'Devices' and 'Swarm' config fields

This merge brings in upstream changes including:
- LINE channel support
- OneBot channel support
- GitHub Copilot provider
- Device monitoring service
- Hardware tools (I2C, SPI)
- New workspace file structure

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@Leeaandrob
Collaborator

@Zepan This PR addresses roadmap issue #284 (Swarm Mode — priority: medium, status: Todo). At +5473 lines, it's the largest open PR.

Important consideration: The roadmap defines a clear dependency chain: #294 (Base Multi-agent Framework) should land first, then #295 (Model Routing), and finally #284 (Swarm Mode). Swarm Mode builds on top of the multi-agent foundation.

PR #131 currently addresses #294 (base multi-agent framework). It would be cleaner to merge #131 first, then evolve this PR to build on that foundation rather than implementing everything from scratch.

Recommendation: Defer until #294 (multi-agent base) is merged. The scope (+5473 lines) is very large and would benefit from being built on top of the base framework rather than as a standalone implementation.

3 participants