feat: gatekeeper sidecar — Rust implementation + design doc + CI #9
masami-agent left a comment
Bug: Mutex held during Telegram approval (up to 60s)
`fetch_with_approval` is called while holding the state `Mutex` lock (line ~78: `let mut s = state.lock().await`). Inside `fetch_with_approval`, `request_approval` polls Telegram for up to 60 seconds, meaning the lock is held for the entire approval window.
During this time, every other incoming socket connection blocks on `state.lock().await`, making the gatekeeper effectively single-threaded and unresponsive.
Fix: release the lock before waiting for Telegram approval. Extract the fields needed (bot token, chat id, rate-limit state) before dropping the lock, then re-acquire only to update `last_request` after the approval resolves:
```rust
async fn handle(stream: UnixStream, state: Arc<Mutex<State>>) -> Result<()> {
    // ...parse req...

    // 1. Check rate limit and extract config — hold the lock briefly.
    let (bot_token, chat_id, secret_name) = {
        let mut s = state.lock().await;
        if let Some(last) = s.last_request {
            if last.elapsed() < Duration::from_secs(RATE_LIMIT_SECS) {
                // write error response and return early
            }
        }
        s.last_request = Some(Instant::now());
        (s.tg_bot_token.clone(), s.tg_chat_id.clone(), req.name.clone())
    }; // lock released here

    // 2. Wait for Telegram approval — no lock held.
    let approved = request_approval(&bot_token, &chat_id, &secret_name).await?;

    // 3. Fetch from AWS if approved — lock not needed (sm_client is Send + Sync).
    // ...
}
```

This also requires making `sm_client` accessible without the `Mutex` (it's already `Clone + Send + Sync`), or wrapping it in its own `Arc`.
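The scoped-lock pattern can be demonstrated with a minimal, runnable sketch. Here `std::sync::Mutex` and OS threads stand in for the tokio primitives purely to keep the example self-contained; the `State` fields mirror the review's names, and the sleep simulates the Telegram approval wait:

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

// Illustration only: std Mutex/threads in place of tokio.
struct State {
    tg_bot_token: String,
    last_request: Option<Instant>,
}

fn handle(state: Arc<Mutex<State>>) -> String {
    // Hold the lock only long enough to stamp the rate limit and
    // clone out what the slow path needs.
    let bot_token = {
        let mut s = state.lock().unwrap();
        s.last_request = Some(Instant::now());
        s.tg_bot_token.clone()
    }; // guard dropped here; other handlers are no longer blocked

    // Simulate the long approval wait WITHOUT holding the lock.
    thread::sleep(Duration::from_millis(50));
    bot_token
}

fn main() {
    let state = Arc::new(Mutex::new(State {
        tg_bot_token: "tg-token".to_string(),
        last_request: None,
    }));
    let start = Instant::now();
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let st = Arc::clone(&state);
            thread::spawn(move || handle(st))
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), "tg-token");
    }
    // Four 50 ms waits overlap instead of serializing to 200 ms.
    assert!(start.elapsed() < Duration::from_millis(150));
    println!("handlers ran concurrently");
}
```

The same shape carries over to the async version: the braces around `state.lock().await` make the guard drop before the first `.await` on the approval call.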
Code Review

The sidecar code, design doc, and CI workflow are all complete, but this PR only contains the gatekeeper itself; Helm chart integration is still needed before it can actually be deployed. Suggested additions:

1. Missing

| Environment | Approach |
|---|---|
| EKS | IRSA or EKS Pod Identity (SA annotation) |
| Non-EKS K8s / K3s | Inject `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` env vars via `gatekeeper.secretRef` |
| Not using AWS SM | Other backends (Vault, GCP SM) can be added later; for now, inject all required env vars via `secretRef` |
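For the non-EKS row, the `secretRef` injection could surface in the chart roughly like this (a sketch; the Secret name and template placement are assumptions, not part of this PR):

```yaml
# values.yaml (illustrative)
gatekeeper:
  secretRef: openclaw-gatekeeper-env   # K8s Secret holding AWS_* and GATEKEEPER_TG_* vars

# deployment template fragment (illustrative)
containers:
  - name: gatekeeper
    {{- with .Values.gatekeeper.secretRef }}
    envFrom:
      - secretRef:
          name: {{ . }}
    {{- end }}
```

Injecting via `envFrom` on the sidecar container only keeps the credentials out of the main container's environment.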
`values` can include a generic `serviceAccount` block so users decide for themselves whether to add an annotation:
```yaml
gatekeeper:
  serviceAccount:
    create: false   # not created by default; use the Pod's SA
    name: ""
    annotations: {} # EKS users can add eks.amazonaws.com/role-arn
```

This way EKS IRSA, static keys, and other clouds' workload identity are all supported.
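A matching ServiceAccount template could look like the following (a sketch; the `"gatekeeper"` default name is an assumption):

```yaml
{{- if .Values.gatekeeper.serviceAccount.create }}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ .Values.gatekeeper.serviceAccount.name | default "gatekeeper" }}
  {{- with .Values.gatekeeper.serviceAccount.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
{{- end }}
```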
5. Caveats
- Socket path conflict: the main container already mounts an emptyDir at `/tmp`, and gatekeeper also writes `/tmp/gatekeeper.sock`. Recommend moving the socket to `/var/run/gatekeeper/gatekeeper.sock`, with the main container and sidecar sharing that emptyDir, to avoid mixing with the main container's tmp. The `SOCKET_PATH` constant in `main.rs` must be updated accordingly.
- Telegram token source: gatekeeper needs `GATEKEEPER_TG_BOT_TOKEN` and `GATEKEEPER_TG_CHAT_ID`. These are bootstrap secrets (they must exist before gatekeeper starts), so they cannot be fetched from AWS SM; they need to be injected from a K8s Secret via `gatekeeper.secretRef`.
- The SA is Pod-scoped: Kubernetes `serviceAccountName` applies to the Pod, not the container, so the main container and the sidecar cannot use different SAs. For IAM isolation, either (1) in non-EKS environments, use `secretRef` to inject credentials only into the sidecar container's env, so the main container never sees them; or (2) on EKS, use Pod Identity's container-level credentials.
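The bootstrap loading described above can be sketched in Rust. The two `GATEKEEPER_TG_*` names come from the review; the `GATEKEEPER_SOCKET_PATH` override and the default path constant are assumptions for illustration (the PR uses a hard-coded `SOCKET_PATH`):

```rust
// Sketch only: env var lookup is passed in as a closure for testability.
const DEFAULT_SOCKET_PATH: &str = "/var/run/gatekeeper/gatekeeper.sock";

fn load_bootstrap(
    get: impl Fn(&str) -> Option<String>,
) -> Result<(String, String, String), String> {
    // Optional override for the socket path (hypothetical knob).
    let socket_path = get("GATEKEEPER_SOCKET_PATH")
        .unwrap_or_else(|| DEFAULT_SOCKET_PATH.to_string());
    // Bootstrap secrets: required before the sidecar can do anything,
    // so they cannot come from AWS SM.
    let bot_token = get("GATEKEEPER_TG_BOT_TOKEN")
        .ok_or_else(|| "GATEKEEPER_TG_BOT_TOKEN not set (bootstrap secret)".to_string())?;
    let chat_id = get("GATEKEEPER_TG_CHAT_ID")
        .ok_or_else(|| "GATEKEEPER_TG_CHAT_ID not set (bootstrap secret)".to_string())?;
    Ok((socket_path, bot_token, chat_id))
}

fn main() {
    match load_bootstrap(|k| std::env::var(k).ok()) {
        Ok((sock, _, _)) => println!("gatekeeper listening on {sock}"),
        Err(e) => eprintln!("bootstrap error: {e}"),
    }
}
```

Failing fast with a clear error when the bootstrap Secret is missing makes the `secretRef` misconfiguration obvious in pod logs.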
Suggested approach

You can keep adding the Helm integration commits to this PR, or merge the current code first and handle chart integration in a follow-up PR.
Problem
OpenClaw is an AI agent gateway — the agent can execute arbitrary bash, Python, and HTTP tools as part of its reasoning loop. This creates a fundamental security tension: the agent needs credentials to do useful work, yet any secret the agent can read can also be leaked or misused by a compromised or prompt-injected agent.
With naive approaches (env vars, mounted K8s Secrets), the agent and its secrets share the same trust boundary. There is no audit trail and no human in the loop.
Solution
Move secrets entirely out of the agent's trust boundary using a Gatekeeper sidecar.
Pod-level view (container isolation):
Full 3-tier view (end-to-end flow):
Three layers of protection:
Changes
- `docs/gatekeeper.md` — full design doc: problem statement, architecture, threat model, Helm values design
- `gatekeeper/src/main.rs` — Rust implementation: Unix socket server, Telegram approval gate, AWS Secrets Manager fetch, `zeroize` memory cleanup, rate limiting
- `gatekeeper/Cargo.toml` — dependencies + release profile (size-optimized)
- `gatekeeper/Dockerfile` — musl static build → Alpine (~10MB image)
- `.github/workflows/gatekeeper-image.yml` — CI: build & push to ghcr.io on changes to `gatekeeper/**`
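The trigger and push target described for the workflow could be sketched as follows. This is not the PR's actual file; the job steps, action versions, and tag scheme are assumptions:

```yaml
# Illustrative sketch of .github/workflows/gatekeeper-image.yml
name: gatekeeper-image
on:
  push:
    branches: [main]
    paths: ["gatekeeper/**"]
jobs:
  build-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: gatekeeper
          push: true
          tags: ghcr.io/${{ github.repository }}/gatekeeper:latest
```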