
feat: gatekeeper sidecar — Rust implementation + design doc + CI#9

Open
thepagent wants to merge 3 commits into main from docs/gatekeeper-design

Conversation

@thepagent
Owner

@thepagent thepagent commented Mar 14, 2026

Problem

OpenClaw is an AI agent gateway — the agent can execute arbitrary bash, Python, and HTTP tools as part of its reasoning loop. This creates a fundamental security tension:

Giving the agent access to its own secrets means a compromised or misbehaving agent can exfiltrate them.

With naive approaches (env vars, mounted K8s Secrets), the agent and its secrets share the same trust boundary. There is no audit trail and no human in the loop.

Solution

Move secrets entirely out of the agent's trust boundary using a Gatekeeper sidecar.

Pod-level view (container isolation):

┌─────────────────── K8s Pod ───────────────────────┐
│                                                   │
│  ┌─────────────────┐    ┌──────────────────────┐  │
│  │  main (OpenClaw) │    │  gatekeeper (sidecar) │  │
│  │                 │    │                      │  │
│  │  ❌ no secrets   │───►│  ✅ holds secrets     │  │
│  │  ❌ no IAM       │    │  ✅ IAM Role (AWS SM) │  │
│  │                 │◄───│  ✅ Telegram approval │  │
│  └─────────────────┘    └──────────────────────┘  │
│         Unix socket on shared emptyDir             │
└───────────────────────────────────────────────────┘

Full 3-tier view (end-to-end flow):

┌─── Tier 1 ───────┐   ┌─── Tier 2 ──────────────────────────┐   ┌─── Tier 3 ───┐
│  AWS Secrets     │   │  K8s Pod                            │   │  Operator    │
│  Manager         │   │                                     │   │  📱 Telegram │
│                  │   │  ┌────────────┐  ┌───────────────┐  │   │              │
│  openclaw/tokens │   │  │    main    │  │  gatekeeper   │  │   │  [✅ Approve] │
│  - TG_TOKEN      │   │  │ (OpenClaw) │  │   (sidecar)   │  │   │  [❌ Deny]   │
│  - GW_TOKEN      │   │  │            │  │               │  │   │              │
│                  │   │  │ 1. request ──► 2. notify ──────────►  3. operator  │
│                  │   │  │            │  │               │  │   │     taps     │
│                  │◄──────────────────── 5. fetch  ◄──────────── 4. approved  │
│                  │   │  │ 6. secret ◄── 6. return  │    │   │              │
│                  │   │  │  in memory │  │               │  │   │              │
│                  │   │  └────────────┘  └───────────────┘  │   └──────────────┘
│  CloudTrail logs │   │  no IAM, no secrets                 │
│  every access    │   └─────────────────────────────────────┘
└──────────────────┘

Three layers of protection:

  1. AWS Secrets Manager — secrets never stored in K8s etcd or on disk; every access logged in CloudTrail
  2. Gatekeeper sidecar — only container with IAM access; agent cannot exec into it; filesystem isolated
  3. Telegram approval gate — operator must tap Approve on their phone before any secret is returned
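The transport in layer 2 (the Unix socket on the shared emptyDir) can be sketched as a toy round trip. The newline-delimited JSON wire format and field names below are assumptions for illustration, not the PR's actual protocol:

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::thread;

// Hypothetical one-line JSON request; the real gatekeeper protocol may differ.
fn encode_request(name: &str) -> String {
    format!("{{\"name\":\"{}\"}}", name)
}

// Client side (main container): connect, send one request, read one reply.
fn request_secret(socket_path: &std::path::Path, name: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket_path)?;
    writeln!(stream, "{}", encode_request(name))?;
    let mut line = String::new();
    BufReader::new(stream).read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("gatekeeper-demo.sock");
    let _ = std::fs::remove_file(&path);
    let listener = UnixListener::bind(&path)?;

    // Stand-in for the gatekeeper side: read one request, reply with a dummy value.
    let server = thread::spawn(move || {
        let (stream, _) = listener.accept().unwrap();
        let mut reader = BufReader::new(stream.try_clone().unwrap());
        let mut req = String::new();
        reader.read_line(&mut req).unwrap();
        let mut writer = stream;
        writeln!(writer, "{{\"ok\":true,\"value\":\"dummy\"}}").unwrap();
    });

    let resp = request_secret(&path, "GW_TOKEN")?;
    println!("{resp}"); // prints the stand-in reply
    server.join().unwrap();
    Ok(())
}
```

Because the socket lives on a shared emptyDir, the main container needs no network access or IAM to reach the gatekeeper, only a volume mount.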

Changes

  • docs/gatekeeper.md — full design doc: problem statement, architecture, threat model, Helm values design
  • gatekeeper/src/main.rs — Rust implementation: Unix socket server, Telegram approval gate, AWS Secrets Manager fetch, zeroize memory cleanup, rate limiting
  • gatekeeper/Cargo.toml — dependencies + release profile (size-optimized)
  • gatekeeper/Dockerfile — musl static build → Alpine (~10MB image)
  • .github/workflows/gatekeeper-image.yml — CI: build & push to ghcr.io on changes to gatekeeper/**

@thepagent thepagent changed the title docs: add gatekeeper sidecar design doc feat: gatekeeper sidecar — Rust implementation + design doc + CI Mar 14, 2026

@masami-agent masami-agent left a comment


Bug: Mutex held during Telegram approval (up to 60s)

fetch_with_approval is called while holding the state Mutex lock (line ~78: let mut s = state.lock().await). Inside fetch_with_approval, request_approval polls Telegram for up to 60 seconds — meaning the lock is held for the entire approval window.

During this time, every other incoming socket connection blocks on state.lock().await, making the gatekeeper effectively single-threaded and unresponsive.

Fix: release the lock before waiting for Telegram approval. Extract the fields needed (bot token, chat id, rate limit state) before dropping the lock, then re-acquire only to update last_request after the approval resolves:

async fn handle(stream: UnixStream, state: Arc<Mutex<State>>) -> Result<()> {
    // ...parse req...

    // 1. check rate limit and extract config — hold lock briefly
    let (bot_token, chat_id, secret_name) = {
        let mut s = state.lock().await;
        if let Some(last) = s.last_request {
            if last.elapsed() < Duration::from_secs(RATE_LIMIT_SECS) {
                // write error response and return early
            }
        }
        s.last_request = Some(Instant::now());
        (s.tg_bot_token.clone(), s.tg_chat_id.clone(), req.name.clone())
    }; // lock released here

    // 2. wait for Telegram approval — no lock held
    let approved = request_approval(&bot_token, &chat_id, &secret_name).await?;

    // 3. fetch from AWS if approved — lock not needed (sm_client is Send+Sync)
    // ...
}

This also requires making sm_client accessible without the Mutex (it's already Clone + Send + Sync), or wrapping it in its own Arc.
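A minimal, runnable illustration of the same scoping pattern, using std::sync::Mutex and a sleep in place of tokio and the Telegram poll (field names follow the snippet above; the timings are stand-ins):

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

struct State {
    tg_bot_token: String,
    tg_chat_id: String,
    last_request: Option<Instant>,
}

// Clone what you need inside a short block, let the guard drop at the
// closing brace, then do the slow work with no lock held.
fn handle(state: &Arc<Mutex<State>>, secret_name: &str) -> String {
    let (bot_token, chat_id) = {
        let mut s = state.lock().unwrap();
        s.last_request = Some(Instant::now());
        (s.tg_bot_token.clone(), s.tg_chat_id.clone())
    }; // guard dropped here: other handlers can take the lock now

    // Stands in for the up-to-60s Telegram poll; the mutex is NOT held,
    // so concurrent handlers are no longer serialized behind it.
    thread::sleep(Duration::from_millis(50));
    format!("approved {secret_name} via bot {bot_token} in chat {chat_id}")
}

fn main() {
    let state = Arc::new(Mutex::new(State {
        tg_bot_token: "bot123".into(),
        tg_chat_id: "chat456".into(),
        last_request: None,
    }));
    // Two handlers run concurrently; with the lock held across the wait,
    // the second would block for the first one's entire approval window.
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let st = Arc::clone(&state);
            thread::spawn(move || handle(&st, "GW_TOKEN"))
        })
        .collect();
    for h in handles {
        println!("{}", h.join().unwrap());
    }
}
```

The same brace-scoped guard works with tokio::sync::Mutex; the key point is that the guard must not live across the long await.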

@auto-machine
Contributor

Code Review

The sidecar code, design doc, and CI workflow are all solid, but this PR only contains the gatekeeper itself; Helm chart integration is still missing before it can actually be deployed. Suggested additions below:

1. Missing Cargo.lock

Line 4 of the Dockerfile runs COPY Cargo.toml Cargo.lock ./, but the PR does not include Cargo.lock, so the build will fail. Run cargo generate-lockfile first and commit it alongside.

2. values.yaml — add a gatekeeper block

gatekeeper:
  enabled: false                  # opt-in
  image:
    repository: ghcr.io/thepagent/openclaw-gatekeeper
    tag: latest
    pullPolicy: IfNotPresent
  aws:
    region: ap-northeast-1
    secretsManagerPath: openclaw/tokens
  telegram:
    approvalTimeoutSeconds: 60
    rateLimitMinutes: 5
  # secrets the gatekeeper itself needs (TG bot token, chat ID);
  # can be injected from an existing K8s Secret, or set directly in values
  secretRef: ""                   # reference an existing K8s Secret
  resources:
    limits:
      cpu: 100m
      memory: 64Mi
    requests:
      cpu: 50m
      memory: 32Mi

3. deployment.yaml — conditionally inject the sidecar

When gatekeeper.enabled=true:

  • Add the gatekeeper sidecar container with its env vars (GATEKEEPER_TG_BOT_TOKEN and GATEKEEPER_TG_CHAT_ID)
  • Add a shared emptyDir volume gatekeeper-sock, mounted at a dedicated path (see the first caveat below)
  • Mount the same volume in the main container as well
{{- if .Values.gatekeeper.enabled }}
- name: gatekeeper
  image: "{{ .Values.gatekeeper.image.repository }}:{{ .Values.gatekeeper.image.tag }}"
  volumeMounts:
  - name: gatekeeper-sock
    mountPath: /var/run/gatekeeper
  envFrom:
  {{- if .Values.gatekeeper.secretRef }}
  - secretRef:
      name: {{ .Values.gatekeeper.secretRef }}
  {{- end }}
  resources:
    {{- toYaml .Values.gatekeeper.resources | nindent 10 }}
{{- end }}

4. AWS credential injection

The design doc assumes IRSA (EKS), but not everyone deploys on EKS. Suggest supporting several options:

| Environment | Approach |
| --- | --- |
| EKS | IRSA or EKS Pod Identity (SA annotation) |
| Non-EKS K8s / K3s | Inject AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars via gatekeeper.secretRef |
| No AWS SM | Other backends (Vault, GCP SM) can be added later; for now inject all required env vars via secretRef |

A generic serviceAccount block can be added to values so users decide whether to attach an annotation:

gatekeeper:
  serviceAccount:
    create: false                 # not created by default; use the Pod's SA
    name: ""
    annotations: {}               # EKS users can add eks.amazonaws.com/role-arn

This way EKS IRSA, static keys, and other clouds' workload identity mechanisms are all supported.

5. Caveats

  • Socket path conflict: the main container already mounts an emptyDir at /tmp, and the gatekeeper also writes its socket to /tmp/gatekeeper.sock. Suggest moving the socket to /var/run/gatekeeper/gatekeeper.sock, with the main container and the sidecar sharing that emptyDir, to keep it separate from the main container's tmp. The SOCKET_PATH constant in main.rs must be updated to match.
  • Telegram token source: the gatekeeper needs GATEKEEPER_TG_BOT_TOKEN and GATEKEEPER_TG_CHAT_ID. These are bootstrap secrets (they must exist before the gatekeeper starts) and therefore cannot be fetched from AWS SM; they have to be injected from a K8s Secret via gatekeeper.secretRef.
  • The SA is Pod-scoped: Kubernetes serviceAccountName applies to the Pod, not to individual containers, so the main container and the sidecar cannot use different SAs. For IAM isolation, either (1) on non-EKS clusters, use secretRef to inject credentials only into the sidecar container's env so the main container never sees them, or (2) on EKS, use Pod Identity's container-level credentials.
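For the socket-path caveat, the hard-coded constant could instead become an environment lookup with the proposed path as the default. A minimal sketch; GATEKEEPER_SOCKET_PATH is a suggested variable name, not something the PR defines:

```rust
use std::env;

// Reads the socket path from the environment, falling back to the proposed
// default under /var/run/gatekeeper. The variable name is hypothetical.
fn socket_path() -> String {
    env::var("GATEKEEPER_SOCKET_PATH")
        .unwrap_or_else(|_| "/var/run/gatekeeper/gatekeeper.sock".to_string())
}

fn main() {
    println!("socket path: {}", socket_path());
}
```

This lets the Helm chart pin the path in one place (the container env) instead of keeping main.rs and deployment.yaml in sync by hand.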

Recommendation

Either keep adding Helm-integration commits to this PR, or merge the current code first and handle the chart integration in a follow-up PR.

