
docs: hiclaw-controller 重构与 K8s 部署设计方案 || docs: hiclaw-controller reconstruction and K8s deployment design plan #551

Merged
johnlanni merged 1 commit into agentscope-ai:main from johnlanni:docs/hiclaw-controller-refactor-design
Apr 3, 2026


@johnlanni johnlanni commented Apr 3, 2026

Summary

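  • Adds the complete design document for the hiclaw-controller refactoring and K8s deployment (1,626 lines)
  • Covers the core designs: running the controller as a standalone container, incluster mode, optional Manager deployment, Team Leader enhancements, DebugWorker, the Helm Chart, and smooth upgrades
  • Key focus areas: smooth upgrades (Skill hot updates plus independent per-Worker/Team image upgrades), separation of Manager responsibilities, Team Leader permission isolation, and self-service debugging on K8s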

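Design highlights

  • hiclaw-controller is split out into a standalone container (merged with docker-proxy) and supports both embedded and incluster modes
  • The WorkerBackend abstraction layer unifies the Docker, K8s, and ACK backends
  • The Reconciler drops its scripts in favor of a pure Go implementation (MatrixClient + OSSClient + HigressClient)
  • The Manager Agent becomes an optional deployment; resource-management skills all switch to hiclaw CLI calls
  • Team Leader gains a Heartbeat mechanism and Worker lifecycle management (with permission isolation)
  • DebugWorker CRD: live mounting of the target members' working directories plus a built-in debug-analysis skill
  • Smooth upgrades: Skills and configuration are hot-updated through OSS without restarts; images are upgraded independently per Worker/Team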
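
To make the WorkerBackend idea concrete, here is a minimal Go sketch of what such an abstraction layer could look like. It is not taken from the design document: the method set (`Name`, `Start`, `Stop`), the `selectBackend` helper, and the in-memory `fakeBackend` are all hypothetical, chosen only so the example runs without Docker or a cluster.

```go
package main

import (
	"errors"
	"fmt"
)

// WorkerBackend is a hypothetical interface. The method set is an assumption
// made for illustration, not the one defined in the design document.
type WorkerBackend interface {
	Name() string
	Start(worker string) error
	Stop(worker string) error
}

// fakeBackend keeps state in memory so the sketch runs without Docker or K8s.
type fakeBackend struct {
	name    string
	running map[string]bool
}

func newFakeBackend(name string) *fakeBackend {
	return &fakeBackend{name: name, running: map[string]bool{}}
}

func (b *fakeBackend) Name() string { return b.name }

// Start marks the worker as running.
func (b *fakeBackend) Start(worker string) error {
	b.running[worker] = true
	return nil
}

// Stop fails if the worker was never started on this backend.
func (b *fakeBackend) Stop(worker string) error {
	if !b.running[worker] {
		return errors.New("worker not running: " + worker)
	}
	delete(b.running, worker)
	return nil
}

// selectBackend maps a backend kind to an implementation. A real controller
// would return distinct Docker, K8s, and ACK implementations here.
func selectBackend(kind string) (WorkerBackend, error) {
	switch kind {
	case "docker", "k8s", "ack":
		return newFakeBackend(kind), nil
	default:
		return nil, fmt.Errorf("unknown backend kind %q", kind)
	}
}

func main() {
	backend, err := selectBackend("k8s")
	if err != nil {
		panic(err)
	}
	if err := backend.Start("alpha-dev"); err != nil {
		panic(err)
	}
	fmt.Println("started alpha-dev on", backend.Name())
}
```

The point of the shape is that reconcilers depend only on the interface, so the embedded and incluster modes can differ in which implementation `selectBackend` returns rather than in reconciler logic.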

Test plan

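  • Team review of the design document
  • Confirm the deployment topology of the standalone controller container in embedded mode
  • Confirm the Team Leader permission isolation scheme
  • Confirm the DebugWorker live working-directory mount scheme
  • Confirm the two modes of the upgrade mechanism (push configuration only vs. push and update everything)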

🤖 Generated with Claude Code



Design document covering:
- hiclaw-controller separation as independent container (merged with docker-proxy)
- WorkerBackend abstraction layer (Docker/K8s/ACK)
- Pure Go reconcilers replacing bash scripts
- Manager Agent optional deployment with clean responsibility separation
- Team Leader heartbeat mechanism and worker lifecycle management with permission isolation
- Manager/DebugWorker CRD definitions
- DebugWorker with real-time workspace mounting and built-in debug-analysis skill
- Helm Chart structure and values
- Smooth upgrade mechanism: per-Worker/Team image upgrades, config hot-push via OSS
- hiclaw CLI incluster mode with dual-mode ResourceClient

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot changed the title from "docs: hiclaw-controller 重构与 K8s 部署设计方案" to "docs: hiclaw-controller 重构与 K8s 部署设计方案 || docs: hiclaw-controller reconstruction and K8s deployment design plan" on Apr 3, 2026
@johnlanni johnlanni merged commit 271306c into agentscope-ai:main Apr 3, 2026
1 check passed
Comment on lines +540 to +544
# Team Leader wakes a Worker (own Team only)
hiclaw worker wake --name alpha-dev --team alpha-team

# Team Leader puts a Worker to sleep (own Team only)
hiclaw worker sleep --name alpha-dev --team alpha-team

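What are the use cases for waking and sleeping a worker? My understanding is that a worker just sits there all the time waiting for the manager to assign tasks, doesn't it?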

Comment on lines +754 to +790
### 6.3 DebugWorker Core Design

The core capability of DebugWorker is live access to the working directories of every member of the debug target; its built-in debug skill then generates debug logs and analyzes problems against the source code.

Live working-directory mounts:

```
Directory layout inside the DebugWorker container:

/root/debug/
├── workspaces/                  # Target members' working directories, synced live (via mc mirror)
│   ├── alpha-lead/              # Team Leader's complete working directory
│   │   ├── SOUL.md
│   │   ├── AGENTS.md
│   │   ├── team-state.json
│   │   ├── skills/
│   │   ├── sessions/            # LLM request/response logs
│   │   └── memory/
│   ├── alpha-dev/               # Worker's complete working directory
│   │   ├── SOUL.md
│   │   ├── openclaw.json
│   │   ├── skills/
│   │   ├── sessions/
│   │   └── memory/
│   └── alpha-qa/
│       └── ...
├── matrix-export/               # Matrix message export (generated on demand)
│   ├── team-room.json
│   ├── alpha-lead-room.json
│   └── alpha-dev-room.json
├── hiclaw-source/               # hiclaw source code at the specified version
│   ├── manager/
│   ├── hiclaw-controller/
│   └── ...
└── output/                      # Analysis reports generated by the debug skill
    └── debug-report-20260403.md
```

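I wonder whether a dedicated DebugWorker CRD for troubleshooting is really necessary. Could we instead expose the directories needed for debugging directly to the manager and build the debug skill into the manager as well? Then, whenever a problem needs investigating, you could simply ask the manager.

If we also want to keep the debugging capability when the manager itself is down, I think we could provide an additional skill or plugin that lets Claude Code step in automatically to investigate.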
