feat(orchestrator): unified Worker lifecycle service replacing docker-proxy #451
Draft
Jing-ze wants to merge 12 commits into agentscope-ai:main
d9c3198 to 7139003
CI failure is expected — the Local |
b4d969d to 325c57d
…le service
Rename docker-proxy/ to orchestrator/ and restructure into a multi-package Go service that exposes both a unified Worker lifecycle REST API and the existing Docker API passthrough.
- Add WorkerBackend/GatewayBackend interfaces for pluggable backends
- Implement DockerBackend (Create/Delete/Start/Stop/Status/List via socket)
- Add /workers/* REST API with proper HTTP status mapping (409/404/503)
- Add /gateway/* API stubs (501, Phase 2 will implement APIG backend)
- Preserve Docker API passthrough with SecurityValidator for backward compat
- Add Backend Registry with auto-detection (Docker first, SAE in Phase 2)
- Update Makefile, CI workflows, install scripts with new names
- Comprehensive test coverage: backend, registry, handler, security
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rvice
Phase 2 of the orchestrator refactoring. Transforms the service from a Docker-only proxy into a full cloud-capable control plane.
- SAE Backend: manage worker lifecycle via Alibaba Cloud SAE API (Go SDK v4)
- APIG Backend: manage AI Gateway consumers (Go SDK v6)
- Auth middleware: two-tier auth with static manager key + per-worker API keys
- STS Token Service: centralized credential issuance with per-worker OSS policy
- OSS key persistence: worker API keys stored in OSS for recovery across restarts
- Worker shell rewrite: oss-credentials.sh now uses orchestrator-mediated STS refresh
- Shared httputil package: consolidated writeJSON/writeError across packages
Workers have no OIDC capability; the orchestrator is the sole credential issuer.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 3: replace direct Docker API calls and Python/Shell SAE wrappers with a thin orchestrator REST API client.
- Rewrite container-api.sh: worker_backend_* now call orchestrator /workers/* API
- Simplify gateway-api.sh: cloud path calls orchestrator /gateway/* API
- Simplify create-worker.sh Step 9: unified orchestrator call, no Docker/SAE split
- Delete aliyun-sae.sh and aliyun-api.py (replaced by orchestrator Go backends)
- Remove Python SDK dependencies from Dockerfile.aliyun
Net deletion: ~1100 lines of shell/Python code.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. Unify HICLAW_CONTAINER_API → HICLAW_ORCHESTRATOR_URL (single env var)
2. Remove HICLAW_RUNTIME from create-worker.sh (orchestrator decides)
3. Make image optional in worker create API (backend provides default)
4. Add Timestamp to STS AssumeRoleWithOIDC call
5. SAEBackend.Create() auto-injects HICLAW_RUNTIME=aliyun into worker env
6. oss-credentials.sh: support dual path (RRSA direct + orchestrator mediated)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SAEBackend.Create() polls DescribeApplicationStatus until RUNNING (max 120s)
- New POST /workers/{name}/ready endpoint for worker self-reporting
- GET /workers/{name} merges readiness: running + reported ready = "ready"
- Worker entrypoints (openclaw + copaw) report ready to orchestrator in background
- New worker_backend_wait_ready() in container-api.sh for unified readiness polling
- create-worker.sh Step 9 uses unified wait instead of Docker exec-based polling
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eliminate all backend-specific logic from handler and main layers:
- Add NeedsCredentialInjection() to WorkerBackend interface
- Move credential injection (API key, orchestrator URL, HICLAW_RUNTIME) into SAEBackend.Create(); handler no longer checks b.Name()
- Replace cfg.Runtime == "aliyun" checks with config-driven backend registration (buildBackends function)
- Delete IsAliyunRuntime() global function
- Delete Config.Runtime field
- Backend Available() now checks own config, not global env var
"aliyun" string now only exists inside sae.go (backend internal). Handler and main layers are fully backend-agnostic.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
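Config-driven registration might be sketched as follows. Only buildBackends, Available(), NeedsCredentialInjection() and the HICLAW_SAE_WORKER_IMAGE setting are named in the PR; the Config fields, the detect helper, the Docker availability check, and the Docker backend returning false for credential injection are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical subset of the orchestrator configuration.
// SAEWorkerImage stands in for HICLAW_SAE_WORKER_IMAGE.
type Config struct {
	DockerSocket   string
	SAEWorkerImage string
}

// WorkerBackend shows only the capability methods discussed in this commit.
type WorkerBackend interface {
	Name() string
	Available() bool
	NeedsCredentialInjection() bool
}

type dockerBackend struct{ socket string }

func (d dockerBackend) Name() string                   { return "docker" }
func (d dockerBackend) Available() bool                { return d.socket != "" }
func (d dockerBackend) NeedsCredentialInjection() bool { return false } // assumed

type saeBackend struct{ image string }

func (s saeBackend) Name() string                   { return "sae" }
func (s saeBackend) Available() bool                { return s.image != "" }
func (s saeBackend) NeedsCredentialInjection() bool { return true }

// buildBackends registers every backend; each one decides from its own
// config whether it is usable, so no global runtime string is consulted.
func buildBackends(cfg Config) []WorkerBackend {
	return []WorkerBackend{
		dockerBackend{socket: cfg.DockerSocket},
		saeBackend{image: cfg.SAEWorkerImage},
	}
}

// detect returns the first available backend in registration order.
func detect(backends []WorkerBackend) (WorkerBackend, error) {
	for _, b := range backends {
		if b.Available() {
			return b, nil
		}
	}
	return nil, errors.New("no worker backend available")
}

func main() {
	b, err := detect(buildBackends(Config{SAEWorkerImage: "example/worker:latest"}))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(b.Name(), b.NeedsCredentialInjection()) // sae true
}
```

Callers only query capabilities through the interface, which is what lets the handler avoid matching on b.Name().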
Each backend now declares its deployment mode ("local" or "cloud") via
the DeploymentMode() interface method. The API response includes a new
deployment_mode field, eliminating the backend-name-to-mode translation
in create-worker.sh (5 lines → 1 line).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…g after rebase
Upstream refactor(network) replaced ExtraHosts with Docker network aliases on the manager container. Remove leftover ExtraHosts injection in create-worker.sh and duplicate hiclaw-net setup in install scripts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add ensureImage() to auto-pull missing images before container create
- Handle 409 Conflict by deleting existing container and retrying once
- Add ExposedPorts/PortBindings support for CoPaw console port mapping with port conflict retry (up to 10 attempts)
- Pass complete env vars (FS credentials, orchestrator URL) when recreating workers in lifecycle-worker.sh and start-manager-agent.sh
- Pass HICLAW_WORKER_IMAGE and HICLAW_COPAW_WORKER_IMAGE to orchestrator container in install scripts so it knows which images to use
- Extract console_host_port from orchestrator response in create-worker.sh and enable-worker-console.sh
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… persist docs, delete bug
- Worker/CoPaw readiness reporters now heartbeat every 60s after initial ready, so orchestrator restarts self-heal without persistence
- Add comment documenting persist-outside-lock trade-off in keys.go
- Fix _detect_worker_backend call in lifecycle-worker.sh action_delete (function was removed in refactor, replaced with container_api_available)
- Add backward-compat env var fallback for HICLAW_INSTALL_DOCKER_PROXY_IMAGE
- Update stale comment in copaw-worker-entrypoint.sh
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
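To show why a periodic heartbeat lets the orchestrator recover without persistence, here is a small sketch of an in-memory readiness tracker. The tracker type and its methods are hypothetical; the actual reporters are worker entrypoint scripts, and the orchestrator's real data structure is not described in the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// tracker keeps readiness reports in memory only, so its contents are
// lost whenever the process restarts.
type tracker struct {
	mu    sync.Mutex
	ready map[string]bool
}

func newTracker() *tracker {
	return &tracker{ready: make(map[string]bool)}
}

// Report records a readiness report (initial or heartbeat) from a worker.
func (t *tracker) Report(name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.ready[name] = true
}

// Ready tells whether the worker has reported since this process started.
func (t *tracker) Ready(name string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.ready[name]
}

func main() {
	tr := newTracker()
	tr.Report("alice")
	fmt.Println(tr.Ready("alice")) // true

	tr = newTracker()              // simulated orchestrator restart: state is gone
	fmt.Println(tr.Ready("alice")) // false

	tr.Report("alice")             // the next 60s heartbeat arrives
	fmt.Println(tr.Ready("alice")) // true
}
```

With a 60s heartbeat, a restarted orchestrator would see each live worker as ready again within about one interval.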
b52bd41 to 90095d7
…or rename)
Merge origin/main into feature branch, combining the new manager-copaw build targets from main with the docker-proxy → orchestrator rename from this branch.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
This PR replaces the docker-proxy (a simple Docker API security proxy) with hiclaw-orchestrator, a unified Worker lifecycle service that abstracts away the underlying compute platform. The orchestrator exposes a single REST API that Manager and Workers interact with, regardless of whether workers run as local Docker containers or cloud SAE applications.
Architecture
What changed
Phase 1 (Restructure): Renamed docker-proxy/ to orchestrator/, restructured into proxy/, backend/, api/ packages. Defined WorkerBackend and GatewayBackend interfaces. Implemented DockerBackend. Added unified /workers/* REST API while preserving Docker API passthrough for exec/logs.
Phase 2 (Cloud backends): Implemented SAEBackend (Alibaba Cloud SAE) and APIGBackend (AI Gateway consumer management) using Go SDKs, replacing aliyun-api.py and aliyun-sae.sh. Added two-tier auth (static manager key + per-worker API keys). Added centralized STS token service: workers no longer need OIDC credentials; the orchestrator issues scoped OSS tokens via POST /credentials/sts. Worker API keys persisted to OSS for recovery across restarts.
Phase 3 (Shell simplification): Rewrote container-api.sh as a thin orchestrator API client (~170 lines, down from ~730). Simplified gateway-api.sh cloud path. Simplified create-worker.sh Step 9 into a single unified orchestrator call. Deleted aliyun-api.py (527 lines) and aliyun-sae.sh (81 lines). Removed Python SDK dependencies from Dockerfile.aliyun.
Readiness detection: SAE Create() polls until the application reaches RUNNING state. Workers self-report readiness via POST /workers/{name}/ready after agent initialization. GET /workers/{name} merges backend status with readiness: running + reported ready = ready. Unified worker_backend_wait_ready replaces Docker exec-based health polling.
Backend abstraction: WorkerBackend interface includes NeedsCredentialInjection() capability method. All backend-specific logic (credential injection, runtime env vars) is encapsulated inside each backend's Create(). Handler and main layers are fully backend-agnostic: no runtime string checks, no backend name matching. Adding a new backend (K8s, ACS) only requires implementing the interface.
Key design decisions
- HICLAW_ORCHESTRATOR_URL replaces both HICLAW_CONTAINER_API and the old HICLAW_ORCHESTRATOR_URL (unified)
- Backend registration is config-driven (HICLAW_SAE_WORKER_IMAGE enables SAE), not runtime-string-driven
Test plan
- cd orchestrator && go test ./... : all packages pass
- grep -r '"aliyun"' orchestrator/ only appears in sae.go (backend internal)
- grep -r 'IsAliyunRuntime' orchestrator/ returns 0 results
- grep -r 'b.Name()' orchestrator/api/ returns 0 results
- make build-orchestrator : Docker image builds
🤖 Generated with Claude Code