You want Roo/Qoder (running on a laptop over public internet) to work with:
- Azure Codex via the Responses API (not Chat Completions)
- Azure embeddings for codebase indexing
Roo/Qoder currently struggles with Azure model/operation mismatches. A gateway normalizes the surface to OpenAI-compatible endpoints and handles Azure-specific routing.
- Provide a single, stable OpenAI-compatible base URL per environment.
- Support:
  - `POST /v1/responses`, routed to the Azure Responses endpoint for a configurable model (default: `gpt-5.3-codex`).
  - `POST /v1/embeddings`, routed to the Azure embeddings deployment.
- Enable multiple environments (dev/staging/prod) and multiple downstream projects.
- Infrastructure managed with Terraform.
- CI/CD via GitHub Actions using Azure OIDC (no long-lived secrets).
- “Get it working” first; hardening follows.
- Private-only networking end-to-end (phase 2)
- Fine-grained org-wide chargeback and per-user quotas (phase 2)
- Complex policy engine/redaction pipeline (phase 2)
- `dev`
- `staging`
- `prod`
Each env is independently deployable.
Use: `pvc-{env}-{projname}-{resourcetype}-{location}`
- `{projname}` = `aigateway`
- `{location}` default = `san` (`southafricanorth`)
Examples:
- Resource group: `pvc-dev-aigateway-rg-san`
- Log Analytics: `pvc-dev-aigateway-law-san`
- Container Apps env: `pvc-dev-aigateway-cae-san`
- Container App: `pvc-dev-aigateway-ca-san`
- Key Vault: `pvc-dev-aigateway-kv-san`
- Storage (tfstate): `pvc-dev-aigateway-st-san` (Azure storage account names accept only lowercase letters and digits, so the deployed name drops the hyphens: `pvcdevaigatewaystsan`)
- App Insights (optional): `pvc-dev-aigateway-ai-san`
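The naming pattern can be checked mechanically before Terraform runs. The following is a minimal sketch, not part of the plan itself: the function name `resource_name` and the `MAX_LENGTH` table are illustrative assumptions. It encodes two Azure naming rules that interact with this pattern: storage account names may contain only lowercase letters and digits, and both storage account and Key Vault names are limited to 24 characters.

```python
import re

PREFIX = "pvc"
PROJECT = "aigateway"

# Resource types whose Azure length limit is tight enough to matter here.
MAX_LENGTH = {"kv": 24, "st": 24}


def resource_name(env: str, resource_type: str, location: str = "san") -> str:
    """Build a name following pvc-{env}-{projname}-{resourcetype}-{location}."""
    name = "-".join([PREFIX, env, PROJECT, resource_type, location]).lower()
    if resource_type == "st":
        # Storage account names allow only lowercase letters and digits.
        name = re.sub(r"[^a-z0-9]", "", name)
    limit = MAX_LENGTH.get(resource_type)
    if limit is not None and len(name) > limit:
        raise ValueError(
            f"{name!r} is {len(name)} characters; '{resource_type}' allows {limit}"
        )
    return name


print(resource_name("dev", "rg"))  # pvc-dev-aigateway-rg-san
print(resource_name("dev", "st"))  # pvcdevaigatewaystsan
```

Under these rules the Key Vault name fits for `dev` (exactly 24 characters) but the same pattern exceeds the limit for `staging` and `prod`, so a shorter variant would be needed for those environments.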
- You (developer) using Roo/Qoder from a laptop over the public internet.
- Later: CI agents, teammates, other internal tools.
Gateway must expose:
- `POST /v1/responses`
- `POST /v1/embeddings`
- `/v1/responses` → Azure `.../openai/responses?api-version=<var>` using `<model_var>`
- `/v1/embeddings` → Azure `.../openai/deployments/<embed>/embeddings?api-version=<var>`
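The routing rule above can be sketched as a small path-to-URL mapping. This is an illustration only, not how LiteLLM implements routing: the function name `upstream_url`, the example hostname, and the `YYYY-MM-DD` API version are placeholders. It also assumes the Responses model name travels in the request body rather than in the URL.

```python
from urllib.parse import urlencode


def upstream_url(
    path: str, azure_endpoint: str, api_version: str, embed_deployment: str
) -> str:
    """Map an OpenAI-style gateway path to the Azure URL it is routed to."""
    base = azure_endpoint.rstrip("/")
    query = urlencode({"api-version": api_version})
    if path == "/v1/responses":
        # Assumption: the model is sent in the request body, so the URL
        # carries no deployment segment.
        return f"{base}/openai/responses?{query}"
    if path == "/v1/embeddings":
        return f"{base}/openai/deployments/{embed_deployment}/embeddings?{query}"
    raise ValueError(f"unsupported gateway path: {path}")


# Hypothetical endpoint and deployment name, for illustration only.
print(upstream_url("/v1/embeddings", "https://example.openai.azure.com", "YYYY-MM-DD", "embed"))
```

Any other path raises, which matches the requirement that the gateway expose only these two operations.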
- Simple shared secret header (fastest): `x-gateway-key: <secret>` or `Authorization: Bearer <secret>` (LiteLLM standard).
- Reject requests without the header.
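The shared-secret check can be sketched as follows. In practice the gateway software is expected to perform this check itself; the function name `is_authorized` is hypothetical and the code only illustrates the accept/reject rule for the two header forms named above.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_secret: str) -> bool:
    """Return True when either accepted header carries the shared secret."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    candidates = []
    if "x-gateway-key" in lowered:
        candidates.append(lowered["x-gateway-key"])
    auth = lowered.get("authorization", "")
    if auth.startswith("Bearer "):
        candidates.append(auth[len("Bearer "):])
    # compare_digest runs in constant time, so timing does not reveal
    # how much of a guessed secret was correct.
    return any(
        hmac.compare_digest(c.encode(), expected_secret.encode())
        for c in candidates
    )


print(is_authorized({"Authorization": "Bearer s3cret"}, "s3cret"))  # True
print(is_authorized({}, "s3cret"))  # False
```

A request with neither header yields no candidates and is rejected, which is the behavior the requirement asks for.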
- Store Azure API keys in Key Vault.
- Inject into Container App as secrets/env vars.
- Basic rate limit to prevent indexing storms.
- Retry on transient 429/5xx with bounded backoff.
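"Bounded backoff" for transient 429/5xx responses can be sketched as exponential delays with a cap and a fixed attempt budget. The names and default values below (`call_with_retry`, four attempts, 0.5 s base, 8 s cap) are assumptions for illustration, not settings taken from this plan.

```python
import random
import time
from typing import Callable, Tuple


def is_transient(status: int) -> bool:
    """429 and any 5xx are treated as retryable upstream failures."""
    return status == 429 or 500 <= status <= 599


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-transient status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        status, body = send()
        if not is_transient(status) or attempt == max_attempts:
            # Either success, a non-retryable error, or the budget is spent.
            return status, body
        # Exponential backoff capped at max_delay; random jitter spreads out
        # clients that were throttled at the same moment.
        ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, ceiling))
        attempt += 1
```

Capping both the delay and the number of attempts keeps an indexing run from stalling indefinitely when the upstream stays throttled; the last transient response is returned to the client unchanged.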
- Each env has its own gateway URL and secrets.
- Target 99% availability for v1 (it’s a dev tool, but it should not be flaky).
- No secrets in repo.
- Keys stored in Key Vault.
- Gateway enforces client auth header.
- Minimal logging of request bodies (avoid storing source code prompts).
- Central logging in Log Analytics.
- Track: request counts, latency, 4xx/5xx, 429, upstream failures.
- Scale-to-zero or low minimum scale.
- Optional concurrency limits.
Azure Container Apps (ACA)
- Low ops
- Good revision/rollback
- Built-in scaling
Phase 1 (Get it working): External ingress
- Required because client is laptop on public internet.
- Mitigate with:
  - Gateway auth header
  - Optional IP allowlist (if your egress IP is stable)
Phase 2 (Harden):
- Front Door + WAF, or private ingress/VNET if you move clients inside Azure.
- Resource group
- Log Analytics workspace
- Container Apps Environment
- Container App (LiteLLM)
- Key Vault
- Storage Account for Terraform state (or shared central tfstate)
Repo Structure:
- `docs/` - Documentation.
- `infra/`
  - `modules/aigateway_aca` - Core Terraform module.
  - `env/dev|staging|prod` - Environment-specific configurations.
- `.github/workflows/` - CI/CD pipelines.
- `scripts/` - Helper scripts (bootstrap).
Phase 0: Bootstrap
- Script to create Azure Storage Account for Terraform state backend.
- Script to configure Azure OIDC (App Registration, Service Principal, Federated Credentials) for GitHub Actions.
Phase 1: Terraform & CI/CD
- Terraform defines infra.
- GitHub Actions deploys using Azure OIDC.
- Dev auto-apply on merge; Staging/Prod gated with environment approvals.
- Roo/Qoder can use the gateway for coding with the configured model (default `gpt-5.3-codex`) without hitting the `chatCompletion operation does not work` error.
- Codebase indexing completes using embeddings through the gateway.
- Dev/Staging/Prod are reproducible via Terraform + Actions.
- No secrets committed.
- Public ingress risk → auth header + (optional) IP allowlist + minimal logs.
- Azure API-version drift → pin versions in config, add smoke tests in pipeline.
- Roo endpoint expectations → keep gateway strictly OpenAI-compatible.
- M0: Repo setup, Bootstrap scripts (OIDC, State Backend).
- M1: Dev env deployed; smoke tests pass; Roo works.
- M2: Staging + Prod; environment approvals.
- M3: Hardening (Front Door/WAF, Entra auth).