Fix/terra alignment by JustAGhosT · Pull Request #69 · phoenixvc/sluice

JustAGhosT · 2026-03-15T12:26:13Z

Pull Request Checklist

Summary

What changed?
Why was it needed?

Validation

Local checks run (if applicable)
Relevant workflow/jobs observed

Deployment Notes

No environment/config changes required
Environment/config changes required (describe below)

Staging Toggle (PRs to `main`)

Add label run-staging to this PR to enable staging deployment (deploy-staging).
Remove label run-staging to skip staging deployment.

Risk / Rollback

Risk level: low / medium / high
Rollback plan:

Summary by CodeRabbit

Documentation
- Added Prometheus telemetry and retention guidance, new API routing docs for Azure OpenAI via LiteLLM, Webhook Auth in gateway docs, multiple formatting/clarity updates across architecture references, and a new AGENTS guide.
Improvements
- Enhanced deployment workflows with per-environment selection, configurable smoke-test attempt count, expanded optional state/dashboard and monitoring configuration, and support for mapping additional credentials and environment variables.

- Add C3 connection in container architecture flowchart - Rename "Azure Monitor" to "Application Insights" in telemetry diagram - Add Prometheus as a telemetry sink with implementation details - Document retention policies for Application Insights - Update matrix gateway JSON example with new field names - Fix confidence threshold in matrix-gateway from 0.70 to 0.75 - Update SOP suggestion threshold from 0.78 to 0.8 - Fix code block formatting in multiple files

Add environment input parameter to deploy-environment.yaml workflow to specify GitHub environment, improving deployment control and security. Replace hardcoded environment settings in deploy.yaml with the new parameter. Also fix code fences in documentation to use text format and update various documentation details.

…ttempts and update default value

…ttempts and update default value # Pull Request Checklist ## Summary - What changed? - Why was it needed? ## Validation - [ ] Local checks run (if applicable) - [ ] Relevant workflow/jobs observed ## Deployment Notes - [ ] No environment/config changes required - [ ] Environment/config changes required (describe below) ## UAT Toggle (PRs to `main`) - Add label `run-uat` to this PR to enable UAT deployment (`deploy-uat`). - Remove label `run-uat` to skip UAT deployment. ## Risk / Rollback - Risk level: low / medium / high - Rollback plan:

blocksorg · 2026-03-15T12:26:16Z

Mention Blocks like a regular teammate with your question or request:

@blocks review this pull request
@blocks make the following changes ...
@blocks create an issue from what was mentioned in the following comment ...
@blocks explain the following code ...
@blocks are there any security or performance concerns?

Run @blocks /help for more information.

Workspace settings | Disable this message

chatgpt-codex-connector · 2026-03-15T12:26:18Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

coderabbitai · 2026-03-15T12:26:29Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Updates CI/CD workflows to add new inputs, per-job environments, secrets, and TF_VAR mappings (Azure auth, Terraform backend, state/dashboard images, Grafana, EXPECTED_AOAI_ENDPOINT_HOST), renames smoke test wait parameter, and applies related infra and documentation adjustments.

Changes

Cohort / File(s)	Summary
GitHub Actions: deploy workflows `\.github/workflows/deploy-environment.yaml`, `\.github/workflows/deploy.yaml`	Added inputs (`smoke_models_wait_attempts`, `environment`, state/dashboard TF_VARs), new per-job secrets (AZURE_, TF_BACKEND_, EXPECTED_AOAI_ENDPOINT_HOST, STATE_SERVICE_*, DASHBOARD/GRAFANA), mapped env vars/TF_VARs, set job-level environment, and updated smoke-test step to use the new attempts input.
Infra — Application Insights output `infra/modules/aigateway_aca/outputs.tf`	Replaced sensitive `application_insights_connection_string` output with non-sensitive `application_insights_name`; guidance updated to retrieve connection string from Key Vault.
Infra — state/dashboard token trim change `infra/modules/dashboard_aca/main.tf`, `infra/modules/state_service_aca/main.tf`	Changed whitespace trimming from `trim` to `trimspace` for `use_shared_token` checks, altering inclusion logic for shared-token secrets/env.
Infra env vars `infra/env/dev/terraform.tfvars`	Added `state_service_container_image`, `state_service_registry_username`, `state_service_registry_password` assignments for dev environment.
Docs — container & observability `docs/architecture/02-container-architecture.md`, `docs/architecture/04-observability-telemetry.md`	Added Webhook Auth node/edges to gateway diagram; introduced Prometheus as a telemetry sink with /metrics callbacks and retention guidance for Application Insights (Prod 90d, Non-prod 30d).
Docs — systems & routing `docs/architecture/systems/*`, `docs/architecture/systems/ai-gateway.md`	Standardized code fence language to `text` and added a v1 API routing section describing Azure OpenAI via LiteLLM and routing rules for `/v1/responses` and `/v1/embeddings`.
Docs — reference & policies `docs/architecture/reference/*`, `docs/architecture/reference/strategic/07-deployment-model.md`	Updated example payload fields (`request_id`, `label`, `recommended_tier`), added migration note, tightened thresholds (0.70→0.75, 0.78→0.80), expanded fallback pattern, and converted code fences to `text`.
Docs — planning `docs/planning/request_to_token_attribution.md`	Terminology and labeling updates (Required→Recommended, Method A → Preferred), clarified Application Insights wiring via Key Vault, and updated action items.
Docs — minor formatting/others `docs/architecture/reference/slm-`, `docs/architecture/systems/`, `docs/architecture/systems/codeflow-engine.md`	Converted several code fences to `text`, small formatting tweaks, and minor content adjustments (escaping braces, wording changes).
Ops guide `AGENTS.md`	Added AGENTS.md with developer guidance for AI coding agents, tech stack, build/test commands, and style conventions.
Repo docs & README `README.md`, `CLAUDE.md`	Project rename references updated (ai-gateway → sluice), README expanded with components, observability, CI/CD, and ecosystem links; CLAUDE onboarding links updated.
Misc — workflow per-env inputs `\.github/workflows/deploy.yaml` (per-env blocks)	Removed inline environment lines in favor of explicit `environment:` entries per job and added per-environment input defaults for smoke attempts and optional state/dashboard values.

Sequence Diagram(s)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Fix/grafana env local #54: Overlaps on dashboard/Grafana and state_service wiring (TF_VARs and workflow dashboard steps).
Feat/comprehensive cicd multi env 810071297942926866 #9: Related edits to GitHub Actions deploy workflows and per-environment inputs/secrets.
feat: add Azure OpenAI endpoint validation and deployment discovery to smoke test #23: Adds/uses EXPECTED_AOAI_ENDPOINT_HOST and smoke-test AOAI validation—directly connected to the new secret here.

Poem

🐇 I hop through YAML, neat and spry,
Secrets tucked where pipelines lie,
Grafana hums and metrics chime,
Key Vault keeps the strings in time,
I nibble bugs and push — goodbye! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description is entirely a blank template with unchecked items and no substantive content. The "What changed?" and "Why was it needed?" prompts are unanswered, leaving no context for reviewers.	Fill in the Summary section with concrete details about what was changed (workflow updates, Terraform vars, documentation) and why (alignment, security, feature support). Complete or check relevant validation and deployment items.
Title check	❓ Inconclusive	The title "Fix/terra alignment" is vague and overly generic. It uses a file/area reference ("terra") without clearly describing what the actual change is or why it matters.	Revise the title to be more specific and descriptive of the main change, such as "Add state service configuration to deployment workflows" or "Update Terraform variables for state service integration".

✅ Passed checks (1 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/terra-alignment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

docs/architecture/reference/matrix-gateway.md (1)
48-49: Document migration impact for renamed response fields.

recommended_tier/cacheable look consistent here, but this is a contract change for clients. Please add a short migration note (old → new fields, deprecation window/version) in this doc section to reduce integration risk.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/architecture/reference/matrix-gateway.md` around lines 48 - 49, Add a
short migration note in the Matrix Gateway reference section describing the
rename/contract change for the response fields: explicitly state the old field
names (e.g., if any) and the new fields recommended_tier and cacheable, show
example mapping (old → new), and declare the deprecation window/version and
recommended upgrade steps for clients; update the section near the response
schema so consumers know the change, timeline, and how to handle backwards
compatibility (e.g., fallback logic or versioned endpoints).
docs/architecture/02-container-architecture.md (1)
60-60: Model webhook verification explicitly in the flow.

Line 60 introduces direct GitHub Webhook ingress; consider adding a distinct “signature verification/auth” step in the diagram so trust boundaries are explicit.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/architecture/02-container-architecture.md` at line 60, The diagram
currently connects C3 directly to G1 via "C3 --> G1"; insert an explicit
verification/auth step between them (e.g., create a node like "W_Verify" or
"WebhookAuth" and change the flow to "C3 --> WebhookAuth --> G1") and label that
node to indicate signature validation/authorization so the trust boundary for
GitHub webhook ingress is clear in the diagram and documentation.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/deploy-environment.yaml:
- Around line 31-35: Add a missing input declaration for smoke_models_wait_sleep
to the reusable workflow inputs so inputs.smoke_models_wait_sleep is defined;
specifically add an input block named smoke_models_wait_sleep (e.g., required:
false, type: string, default: "10", description: Sleep seconds between model
availability checks) alongside the existing smoke_models_wait_attempts
declaration so callers that pass smoke_models_wait_sleep continue to work.

In `@docs/architecture/04-observability-telemetry.md`:
- Line 79: The description reverses Prometheus' role: update the sentence that
says "The Prometheus exporter exposes a `/metrics` endpoint that scrapes
application metrics" to indicate that Prometheus scrapes the `/metrics` endpoint
(e.g., "The Prometheus exporter exposes a `/metrics` endpoint which is scraped
by Prometheus") and ensure the surrounding text still references
`success_callback`/`failure_callback` containing "prometheus" and the container
config at `infra/modules/aigateway_aca/main.tf:95-113`.

---

Nitpick comments:
In `@docs/architecture/02-container-architecture.md`:
- Line 60: The diagram currently connects C3 directly to G1 via "C3 --> G1";
insert an explicit verification/auth step between them (e.g., create a node like
"W_Verify" or "WebhookAuth" and change the flow to "C3 --> WebhookAuth --> G1")
and label that node to indicate signature validation/authorization so the trust
boundary for GitHub webhook ingress is clear in the diagram and documentation.

In `@docs/architecture/reference/matrix-gateway.md`:
- Around line 48-49: Add a short migration note in the Matrix Gateway reference
section describing the rename/contract change for the response fields:
explicitly state the old field names (e.g., if any) and the new fields
recommended_tier and cacheable, show example mapping (old → new), and declare
the deprecation window/version and recommended upgrade steps for clients; update
the section near the response schema so consumers know the change, timeline, and
how to handle backwards compatibility (e.g., fallback logic or versioned
endpoints).

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e51ac272-e68f-4e8e-a52c-d8a36251d57f

📥 Commits

Reviewing files that changed from the base of the PR and between 2bc7860 and 0e4fff2.

📒 Files selected for processing (16)

.github/workflows/deploy-environment.yaml
.github/workflows/deploy.yaml
docs/architecture/02-container-architecture.md
docs/architecture/04-observability-telemetry.md
docs/architecture/reference/matrix-gateway.md
docs/architecture/reference/matrix-rooivalk.md
docs/architecture/reference/slm-implementation-matrix.md
docs/architecture/reference/slm-management-plan.md
docs/architecture/reference/strategic/07-deployment-model.md
docs/architecture/systems/agentkit-forge.md
docs/architecture/systems/ai-gateway.md
docs/architecture/systems/codeflow-engine.md
docs/architecture/systems/cognitive-mesh.md
docs/architecture/systems/phoenix-rooivalk.md
docs/planning/request_to_token_attribution.md
infra/modules/aigateway_aca/outputs.tf

.github/workflows/deploy-environment.yaml

docs/architecture/04-observability-telemetry.md

- Add new workflow parameter to control sleep between model availability checks - Update container architecture diagram to include Webhook Auth component - Improve Prometheus metrics documentation clarity - Add migration note for Matrix Gateway response contract changes in v1.0.0

coderabbitai

🧹 Nitpick comments (4)

infra/modules/state_service_aca/main.tf (1)
46-52: Consider trimming the secret value for consistency.

The use_shared_token check now correctly uses trimspace() to detect non-empty tokens, but the secret value itself (line 50) still stores the raw var.state_service_shared_token which may contain trailing newlines. This could cause authentication mismatches if the consuming service performs a strict comparison.
♻️ Proposed fix to trim the secret value
   dynamic "secret" {
     for_each = local.use_shared_token ? [1] : []
     content {
       name  = "state-service-shared-token"
-      value = var.state_service_shared_token
+      value = trimspace(var.state_service_shared_token)
     }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/modules/state_service_aca/main.tf` around lines 46 - 52, The secret
value in the dynamic "secret" block uses the raw var.state_service_shared_token
which may include trailing whitespace/newlines; update the content.value to pass
the variable through trimspace (i.e., use
trimspace(var.state_service_shared_token)) so the stored secret is normalized,
matching the earlier use of trimspace() in local.use_shared_token and avoiding
auth mismatches.
infra/modules/dashboard_aca/main.tf (1)
37-43: Same concern: trim the secret value for consistency.

As noted in infra/modules/state_service_aca/main.tf, the secret value should also be trimmed to avoid storing whitespace-padded tokens.
♻️ Proposed fix to trim the secret value
   dynamic "secret" {
     for_each = local.use_shared_token ? [1] : []
     content {
       name  = "state-service-shared-token"
-      value = var.state_service_shared_token
+      value = trimspace(var.state_service_shared_token)
     }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/modules/dashboard_aca/main.tf` around lines 37 - 43, The dynamic
"secret" block that creates the "state-service-shared-token" currently uses
value = var.state_service_shared_token and may store whitespace-padded tokens;
update the content to use Terraform's trimspace function (value =
trimspace(var.state_service_shared_token)) so the stored secret is normalized;
this change should be made inside the dynamic "secret" block that is conditional
on local.use_shared_token to match the fix in the state_service_aca module.
.github/workflows/deploy-environment.yaml (2)
109-109: Hardcoded secrets expiration date may require future maintenance.

The value "2027-03-31T00:00:00Z" is hardcoded in the workflow. While currently valid (per the Terraform validation in infra/env/dev/variables.tf:88-95), this will require a workflow update before March 2027 to avoid deployment failures when the date is no longer in the future.

Consider parameterizing this as an input or secret to allow environment-specific configuration without workflow changes.
♻️ Suggested parameterization
+      secrets_expiration_date:
+        required: false
+        type: string
+        default: "2027-03-31T00:00:00Z"
+        description: Expiration date for Key Vault secrets (ISO-8601 UTC format)
...
-  TF_VAR_secrets_expiration_date: "2027-03-31T00:00:00Z"
+  TF_VAR_secrets_expiration_date: ${{ inputs.secrets_expiration_date }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/deploy-environment.yaml at line 109, The workflow
hardcodes TF_VAR_secrets_expiration_date which will become invalid in 2027;
change it to a configurable input or secret so it can be updated without editing
the workflow: replace the literal "2027-03-31T00:00:00Z" with a GitHub Actions
input (e.g., workflow_dispatch/input) or a repo/organization secret and fall
back to a computed value if not provided; update any usages referencing
TF_VAR_secrets_expiration_date accordingly so the workflow reads the value from
inputs or secrets instead of the hardcoded string.
110-110: Consider pinning dashboard image version in fallback.

The fallback ghcr.io/phoenixvc/ai-gateway-dashboard:latest uses the :latest tag, which can lead to unpredictable behavior if the image changes between deployments. While this is only used when DASHBOARD_CONTAINER_IMAGE secret is not configured, consider using a pinned version or SHA digest for reproducibility.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/deploy-environment.yaml at line 110, The fallback for
TF_VAR_dashboard_container_image currently uses the mutable tag ':latest';
update the fallback value used when DASHBOARD_CONTAINER_IMAGE is not provided to
a pinned image tag or SHA digest (e.g., replace
'ghcr.io/phoenixvc/ai-gateway-dashboard:latest' with a specific version tag or
`@sha256` digest) so deployments are reproducible; modify the
TF_VAR_dashboard_container_image assignment in the workflow to reference the
pinned tag/digest instead of ':latest' while keeping the secret override via
DASHBOARD_CONTAINER_IMAGE.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @.github/workflows/deploy-environment.yaml:
- Line 109: The workflow hardcodes TF_VAR_secrets_expiration_date which will
become invalid in 2027; change it to a configurable input or secret so it can be
updated without editing the workflow: replace the literal "2027-03-31T00:00:00Z"
with a GitHub Actions input (e.g., workflow_dispatch/input) or a
repo/organization secret and fall back to a computed value if not provided;
update any usages referencing TF_VAR_secrets_expiration_date accordingly so the
workflow reads the value from inputs or secrets instead of the hardcoded string.
- Line 110: The fallback for TF_VAR_dashboard_container_image currently uses the
mutable tag ':latest'; update the fallback value used when
DASHBOARD_CONTAINER_IMAGE is not provided to a pinned image tag or SHA digest
(e.g., replace 'ghcr.io/phoenixvc/ai-gateway-dashboard:latest' with a specific
version tag or `@sha256` digest) so deployments are reproducible; modify the
TF_VAR_dashboard_container_image assignment in the workflow to reference the
pinned tag/digest instead of ':latest' while keeping the secret override via
DASHBOARD_CONTAINER_IMAGE.

In `@infra/modules/dashboard_aca/main.tf`:
- Around line 37-43: The dynamic "secret" block that creates the
"state-service-shared-token" currently uses value =
var.state_service_shared_token and may store whitespace-padded tokens; update
the content to use Terraform's trimspace function (value =
trimspace(var.state_service_shared_token)) so the stored secret is normalized;
this change should be made inside the dynamic "secret" block that is conditional
on local.use_shared_token to match the fix in the state_service_aca module.

In `@infra/modules/state_service_aca/main.tf`:
- Around line 46-52: The secret value in the dynamic "secret" block uses the raw
var.state_service_shared_token which may include trailing whitespace/newlines;
update the content.value to pass the variable through trimspace (i.e., use
trimspace(var.state_service_shared_token)) so the stored secret is normalized,
matching the earlier use of trimspace() in local.use_shared_token and avoiding
auth mismatches.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cfbc39a6-c9cd-4b1a-819a-6b28a9ebaa94

📥 Commits

Reviewing files that changed from the base of the PR and between 0e4fff2 and 19759df.

📒 Files selected for processing (6)

.github/workflows/deploy-environment.yaml
docs/architecture/02-container-architecture.md
docs/architecture/04-observability-telemetry.md
docs/architecture/reference/matrix-gateway.md
infra/modules/dashboard_aca/main.tf
infra/modules/state_service_aca/main.tf

🚧 Files skipped from review as they are similar to previous changes (2)

docs/architecture/reference/matrix-gateway.md
docs/architecture/02-container-architecture.md

Add comprehensive documentation for AI coding agents working in this repository, including: - Project overview and tech stack - Build/lint/test commands for each component - Code style guidelines for Python, JavaScript, and Terraform - Architecture overview and key files - Prerequisites and pre-commit checks

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AGENTS.md`:
- Around line 186-197: The fenced code block containing the directory tree
starting with "docs/architecture/" is missing a language identifier and will
fail MD040; edit the block in AGENTS.md that begins with the triple backticks
before "docs/architecture/" and add a language tag (e.g., text) so the fence
reads ```text, leaving the directory content unchanged, then save the file to
satisfy the linter.
- Around line 78-79: The guidance contradicts itself: the sentence "Use absolute
imports within packages" conflicts with the example that uses a relative import
(from .routes import router). Update AGENTS.md so the rule and example
match—either change the rule to recommend relative imports within packages and
keep the example referencing .routes and router, or replace the example with an
absolute import form that references the package-level routes module and router
symbol; ensure the grouping guideline (stdlib → third-party → local) remains
consistent with the chosen import style.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d30d3a92-61be-43e3-bc4b-f5538a5e6c10

📥 Commits

Reviewing files that changed from the base of the PR and between 19759df and 9be91e5.

📒 Files selected for processing (1)

AGENTS.md

AGENTS.md

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

infra/env/dev/terraform.tfvars (1)

28-28: Avoid using :latest tag for container images in infrastructure code.

The :latest tag is non-deterministic and can lead to unexpected behavior when the image is updated. Consider pinning to a specific version or SHA digest for reproducible deployments.

♻️ Suggested improvement

-state_service_container_image   = "ghcr.io/phoenixvc/ai-gateway-state-service:latest"
+state_service_container_image   = "ghcr.io/phoenixvc/ai-gateway-state-service:v1.0.0"  # Pin to specific version

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@infra/env/dev/terraform.tfvars` at line 28, The image pinning uses a
non-deterministic :latest tag for state_service_container_image; replace the
"ghcr.io/phoenixvc/ai-gateway-state-service:latest" value in terraform.tfvars
with a specific immutable reference (either a versioned tag like v1.2.3 or a SHA
digest e.g. `@sha256`:...) to ensure reproducible deployments and update any
deployment automation that consumes state_service_container_image to accept the
pinned tag; reference the state_service_container_image variable when making the
change.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@infra/env/dev/terraform.tfvars`:
- Around line 29-30: Remove the hardcoded credentials by deleting the
state_service_registry_username and state_service_registry_password assignments
from terraform.tfvars (they are currently set to "phoenixvc" and "ghp_xxx");
rely on the module's conditional auth flag use_registry_auth and supply
credentials at runtime via environment variables (e.g., export
TF_VAR_state_service_registry_username and
TF_VAR_state_service_registry_password in CI/CD) so no secrets are committed to
the repo.

---

Nitpick comments:
In `@infra/env/dev/terraform.tfvars`:
- Line 28: The image pinning uses a non-deterministic :latest tag for
state_service_container_image; replace the
"ghcr.io/phoenixvc/ai-gateway-state-service:latest" value in terraform.tfvars
with a specific immutable reference (either a versioned tag like v1.2.3 or a SHA
digest e.g. `@sha256`:...) to ensure reproducible deployments and update any
deployment automation that consumes state_service_container_image to accept the
pinned tag; reference the state_service_container_image variable when making the
change.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d83fd7b1-d5b5-4046-87d4-65b3a4222721

📥 Commits

Reviewing files that changed from the base of the PR and between 9be91e5 and 72c004c.

📒 Files selected for processing (1)

infra/env/dev/terraform.tfvars

coderabbitai · 2026-03-15T15:27:08Z

infra/env/dev/terraform.tfvars

+state_service_registry_username = "phoenixvc"
+state_service_registry_password = "ghp_xxx"


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Search for potential hardcoded credentials in tfvars files # Expected: No matches for credential patterns like ghp_, password =, token =, secret = echo "=== Checking for GitHub PAT patterns ===" rg -n 'ghp_[a-zA-Z0-9_]+' --glob '*.tfvars' echo "=== Checking for password/token assignments with non-empty values ===" rg -nP '(password|token|secret)\s*=\s*"[^"]+[^"]+"' --glob '*.tfvars' | grep -v '""'

Repository: phoenixvc/ai-gateway

Length of output: 330

🏁 Script executed:

#!/bin/bash # Find all .tf files in infra directory echo "=== Finding all .tf files in infra directory ===" find infra -name "*.tf" -type f echo "" echo "=== Searching for state_service_registry_password variable definition ===" rg -n 'state_service_registry_password' --type tf -B 2 -A 2 echo "" echo "=== Searching for state_service_registry_username variable definition ===" rg -n 'state_service_registry_username' --type tf -B 2 -A 2

Repository: phoenixvc/ai-gateway

Length of output: 5517

🏁 Script executed:

#!/bin/bash echo "=== Reading dev/variables.tf for state_service_registry variables ===" sed -n '189,210p' infra/env/dev/variables.tf echo "" echo "=== Checking if there are any defaults in these variable blocks ===" rg -A 10 'variable "state_service_registry_(username|password)"' infra/env/dev/variables.tf

Repository: phoenixvc/ai-gateway

Length of output: 1330

🏁 Script executed:

#!/bin/bash echo "=== Reading state_service_aca module main.tf to see credential usage ===" cat -n infra/modules/state_service_aca/main.tf echo "" echo "=== Checking state_service_aca variables.tf ===" cat -n infra/modules/state_service_aca/variables.tf

Repository: phoenixvc/ai-gateway

Length of output: 6408

Remove hardcoded credentials from terraform.tfvars.

The state_service_registry_password value (ghp_xxx) appears to be a GitHub Personal Access Token placeholder. Hardcoding credentials in tfvars files committed to version control is a security risk:

The pattern may be mistaken for a real (redacted) token

Developers may replace it with real credentials and accidentally commit

These variables have defaults of "" and the module gracefully handles empty credentials via conditional logic (use_registry_auth), so they can be safely omitted

Remove these values from the tfvars file and provide them at runtime via CI/CD environment variables (e.g., TF_VAR_state_service_registry_password).

Proposed fix

# State Service state_service_container_image = "ghcr.io/phoenixvc/ai-gateway-state-service:latest" -state_service_registry_username = "phoenixvc" -state_service_registry_password = "ghp_xxx" +# Registry credentials should be provided via TF_VAR_* environment variables +# state_service_registry_username = "" # Set via TF_VAR_state_service_registry_username +# state_service_registry_password = "" # Set via TF_VAR_state_service_registry_password (sensitive)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

state_service_registry_username = "phoenixvc"

state_service_registry_password = "ghp_xxx"

# State Service

state_service_container_image = "ghcr.io/phoenixvc/ai-gateway-state-service:latest"

# Registry credentials should be provided via TF_VAR_* environment variables

# state_service_registry_username = "" # Set via TF_VAR_state_service_registry_username

# state_service_registry_password = "" # Set via TF_VAR_state_service_registry_password (sensitive)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@infra/env/dev/terraform.tfvars` around lines 29 - 30, Remove the hardcoded credentials by deleting the state_service_registry_username and state_service_registry_password assignments from terraform.tfvars (they are currently set to "phoenixvc" and "ghp_xxx"); rely on the module's conditional auth flag use_registry_auth and supply credentials at runtime via environment variables (e.g., export TF_VAR_state_service_registry_username and TF_VAR_state_service_registry_password in CI/CD) so no secrets are committed to the repo.

JustAGhosT · 2026-03-15T15:28:32Z

@claude help here Verify each finding against the current code and only fix it if needed.

Inline comments:
In @infra/env/dev/terraform.tfvars:

Around line 29-30: Remove the hardcoded credentials by deleting the
state_service_registry_username and state_service_registry_password assignments
from terraform.tfvars (they are currently set to "phoenixvc" and "ghp_xxx");
rely on the module's conditional auth flag use_registry_auth and supply
credentials at runtime via environment variables (e.g., export
TF_VAR_state_service_registry_username and
TF_VAR_state_service_registry_password in CI/CD) so no secrets are committed to
the repo.

Nitpick comments:
In @infra/env/dev/terraform.tfvars:

Line 28: The image pinning uses a non-deterministic :latest tag for
state_service_container_image; replace the
"ghcr.io/phoenixvc/ai-gateway-state-service:latest" value in terraform.tfvars
with a specific immutable reference (either a versioned tag like v1.2.3 or a SHA
digest e.g. @sha256:...) to ensure reproducible deployments and update any
deployment automation that consumes state_service_container_image to accept the
pinned tag; reference the state_service_container_image variable when making the
change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Lead with what sluice is and why it exists (central AI data plane vs scattered direct Azure OpenAI calls) - Document all three components: Gateway, State Service, Dashboard - Add endpoint tables for gateway and state service - Add observability signals table (Prometheus, OTEL, Langfuse, Log Analytics) - Add environment table with default models per env - Move prerequisites to bottom; tighten secrets table - Add architecture ADRs to documentation table - Add retort to ecosystem table Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Around line 9-13: The fenced code block in README.md is missing a language tag
(MD040); update the opening fence that currently shows the ASCII diagram (lines
containing "your code  →  POST /v1/responses      ─┐" etc.) to use the "text"
language identifier (i.e., change "```" to "```text") so the diagram remains
readable and linter-clean while leaving the block contents unchanged.
- Line 174: The README and Terraform modules currently require "Terraform >=
1.14.0"; relax this to a lower minimum (e.g., ">= 1.10.0" or ">= 1.8.0") by
updating the README entry and every Terraform block that sets required_version
(look for terraform { required_version = ... } or a required_version setting) in
infra/env/{dev,prod,staging}/main.tf and infra/modules/*/main.tf so the
constraint matches the new minimum and aligns with azurerm provider
compatibility.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 68b242ca-edb5-4dbb-8e53-3e4d84e68adc

📥 Commits

Reviewing files that changed from the base of the PR and between 72c004c and b737f84.

📒 Files selected for processing (2)

CLAUDE.md
README.md

✅ Files skipped from review due to trivial changes (1)

CLAUDE.md

coderabbitai · 2026-03-31T22:27:40Z

README.md

+```
+your code  →  POST /v1/responses      ─┐
+               POST /v1/embeddings    ─┤  sluice (LiteLLM)  →  Azure OpenAI
+               GET  /metrics          ─┘
+```


⚠️ Potential issue | 🟡 Minor

Add a language tag to the fenced code block.

Line 9 opens a fenced block without a language identifier (MD040). Use text to keep the diagram readable and lint-clean.

Proposed fix

-``` +```text your code → POST /v1/responses ─┐ POST /v1/embeddings ─┤ sluice (LiteLLM) → Azure OpenAI GET /metrics ─┘

</details> <details> <summary>🧰 Tools</summary> <details> <summary>🪛 markdownlint-cli2 (0.22.0)</summary> [warning] 9-9: Fenced code blocks should have a language specified (MD040, fenced-code-language) </details> </details> <details> <summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @README.md around lines 9 - 13, The fenced code block in README.md is missing
a language tag (MD040); update the opening fence that currently shows the ASCII
diagram (lines containing "your code → POST /v1/responses ─┐" etc.) to
use the "text" language identifier (i.e., change "" to "text") so the
diagram remains readable and linter-clean while leaving the block contents
unchanged.

</details>  

coderabbitai · 2026-03-31T22:27:40Z

README.md

+## Prerequisites
+
+- [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) — `az login`
+- [Terraform](https://www.terraform.io/downloads) >= 1.14.0


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

What is the latest stable Terraform version, and are there known features in Terraform 1.14.0 that are required compared with 1.8+ for typical AzureRM workflows?

💡 Result:

The latest stable Terraform version is 1.14.8. Terraform 1.14.0 introduces new features like List Resources (in *.tfquery.hcl files), the 'terraform query' command, GenerateResourceConfiguration RPC for providers during import, and top-level Actions blocks for provider-defined imperative operations outside CRUD. These are minor enhancements and experimental in nature, with no breaking changes affecting compatibility. For typical AzureRM workflows (using resources like azurerm_resource_group, azurerm_virtual_network, azurerm_linux_virtual_machine, etc.), no features from Terraform 1.14.0 are required compared to Terraform 1.8+. The AzureRM provider (latest ~4.66.0) uses the Terraform Plugin Framework and works with Terraform 1.x versions including 1.8+. Provider upgrade guides recommend the latest Terraform but do not mandate 1.14-specific features. Standard CRUD operations, imports, plans, and applies function identically across these versions without needing 1.14-exclusive capabilities.

Citations:

1: https://github.com/hashicorp/terraform/releases

2: https://developer.hashicorp.com/terraform/install

3: https://releases.hashicorp.com/terraform/

4: https://github.com/hashicorp/terraform/blob/v1.14.8/CHANGELOG.md

5: https://developer.hashicorp.com/terraform/language/upgrade-guides

6: https://learn.microsoft.com/en-us/azure/developer/terraform/provider-version-history-azurerm-4-0-0-to-current

7: https://releases.hashicorp.com/terraform-provider-azurerm

8: https://github.com/terraform-providers/terraform-provider-azurerm

9: https://registry.terraform.io/providers/hashicorp/Azurerm/4.64.0/docs/guides/4.0-overview

🏁 Script executed:

# Check README.md context around line 174 sed -n '160,190p' README.md

Repository: phoenixvc/sluice

Length of output: 1134

🏁 Script executed:

# Search for any references to Terraform 1.14+ specific features # (terraform query, tfquery.hcl, Actions blocks, GenerateResourceConfiguration) rg -i "terraform query|tfquery|actions\s*{|generateresource" --type-list | head -20

Repository: phoenixvc/sluice

Length of output: 449

🏁 Script executed:

# Look for terraform/infrastructure files to understand what features are actually used fd -e "tf$" -e "tfvars" --type f | head -20

Repository: phoenixvc/sluice

Length of output: 157

🏁 Script executed:

# Find all .tf files in the infra directory fd -e "tf" infra/ --type f

Repository: phoenixvc/sluice

Length of output: 389

🏁 Script executed:

# Search for terraform version constraints and provider configurations rg "required_version|required_providers|terraform\s*{" infra/ -A 5

Repository: phoenixvc/sluice

Length of output: 221

🏁 Script executed:

# Check for any 1.14-specific features in the infra code rg "terraform query|tfquery|actions\s*\{|GenerateResourceConfiguration" infra/

Repository: phoenixvc/sluice

Length of output: 42

🏁 Script executed:

# Find all .tf files in infra directory fd . 'infra/' -e tf

Repository: phoenixvc/sluice

Length of output: 741

🏁 Script executed:

# Search for terraform version requirements - simpler pattern rg "required_version" infra/

Repository: phoenixvc/sluice

Length of output: 495

🏁 Script executed:

# Search for provider blocks rg "terraform\s*\{" infra/ -A 3

Repository: phoenixvc/sluice

Length of output: 1612

🏁 Script executed:

# Look at the main terraform files to understand the configuration find infra -name "*.tf" -exec wc -l {} + | sort -rn | head -5

Repository: phoenixvc/sluice

Length of output: 231

🏁 Script executed:

# Look at one of the main.tf files to see all required providers head -50 infra/env/dev/main.tf

Repository: phoenixvc/sluice

Length of output: 1350

🏁 Script executed:

# Check for any use of 1.14-specific features (terraform query command, Actions blocks, etc.) rg "actions\s*{|terraform query|for_each.*=\s*data\.terraform_remote_state" infra/ -A 2

Repository: phoenixvc/sluice

Length of output: 223

🏁 Script executed:

# Check provider versions and see if any require 1.14+ rg "source.*=" infra/ | grep -E "(azurerm|azuread|grafana)" | head -10

Repository: phoenixvc/sluice

Length of output: 735

🏁 Script executed:

# Search for any comments explaining the 1.14 requirement rg "1\.14|version" infra/env/dev/main.tf -B 2 -A 2

Repository: phoenixvc/sluice

Length of output: 461

Lower the Terraform minimum version constraint to reduce unnecessary friction for contributors.

The infrastructure only uses standard AzureRM resources and providers (azurerm >= 4.62.0, grafana), with no usage of Terraform 1.14+ features (terraform query, Actions blocks, or GenerateResourceConfiguration). The AzureRM provider supports Terraform 1.8+, making the >= 1.14.0 requirement unnecessarily restrictive. Consider relaxing this to >= 1.10.0 or >= 1.8.0 across all terraform modules (infra/env/{dev,prod,staging}/main.tf and infra/modules/*/main.tf) to lower the barrier for contributors.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@README.md` at line 174, The README and Terraform modules currently require "Terraform >= 1.14.0"; relax this to a lower minimum (e.g., ">= 1.10.0" or ">= 1.8.0") by updating the README entry and every Terraform block that sets required_version (look for terraform { required_version = ... } or a required_version setting) in infra/env/{dev,prod,staging}/main.tf and infra/modules/*/main.tf so the constraint matches the new minimum and aligns with azurerm provider compatibility.

…uage - Fix 'absolute imports' → 'relative imports' (the example already used relative syntax, the label was wrong) - Add 'text' language identifier to architecture overview code fence Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

JustAGhosT added 4 commits March 15, 2026 13:29

fix(workflows): rename smoke_models_wait_sleep to smoke_models_wait_a…

02a8bb7

…ttempts and update default value

coderabbitai bot reviewed Mar 15, 2026

View reviewed changes

.github/workflows/deploy-environment.yaml Show resolved Hide resolved

docs/architecture/04-observability-telemetry.md Outdated Show resolved Hide resolved

JustAGhosT had a problem deploying to dev March 15, 2026 12:48 — with GitHub Actions Failure

fix: spacing issues in yaml

19759df

JustAGhosT temporarily deployed to dev March 15, 2026 13:00 — with GitHub Actions Inactive

JustAGhosT had a problem deploying to dev March 15, 2026 13:02 — with GitHub Actions Failure

coderabbitai bot reviewed Mar 15, 2026

View reviewed changes

JustAGhosT temporarily deployed to dev March 15, 2026 14:37 — with GitHub Actions Inactive

JustAGhosT had a problem deploying to dev March 15, 2026 14:38 — with GitHub Actions Failure

coderabbitai bot reviewed Mar 15, 2026

View reviewed changes

AGENTS.md Outdated Show resolved Hide resolved

AGENTS.md Outdated Show resolved Hide resolved

feat(infra): add state service configuration to dev environment

72c004c

JustAGhosT deployed to dev March 15, 2026 15:23 — with GitHub Actions Active

JustAGhosT had a problem deploying to dev March 15, 2026 15:24 — with GitHub Actions Failure

coderabbitai bot reviewed Mar 15, 2026

View reviewed changes

JustAGhosT and others added 2 commits March 21, 2026 04:42

docs(readme): rename ai-gateway references to sluice

89b6560

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: add ecosystem section linking to phoenixvc platform repos

b84ee04

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

JustAGhosT had a problem deploying to dev March 28, 2026 12:32 — with GitHub Actions Failure

JustAGhosT had a problem deploying to dev March 31, 2026 22:24 — with GitHub Actions Failure

coderabbitai bot reviewed Mar 31, 2026

View reviewed changes

JustAGhosT had a problem deploying to dev March 31, 2026 22:32 — with GitHub Actions Failure

JustAGhosT merged commit d2b30c4 into dev Mar 31, 2026
5 of 6 checks passed

JustAGhosT deleted the fix/terra-alignment branch March 31, 2026 22:50

		state_service_registry_username = "phoenixvc"
		state_service_registry_password = "ghp_xxx"

-state_service_registry_username = "phoenixvc"
-state_service_registry_password = "ghp_xxx"
+# State Service
+state_service_container_image   = "ghcr.io/phoenixvc/ai-gateway-state-service:latest"
+# Registry credentials should be provided via TF_VAR_* environment variables
+# state_service_registry_username = ""  # Set via TF_VAR_state_service_registry_username
+# state_service_registry_password = ""  # Set via TF_VAR_state_service_registry_password (sensitive)

Conversation

JustAGhosT commented Mar 15, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Checklist

Summary

Validation

Deployment Notes

Staging Toggle (PRs to main)

Risk / Rollback

Summary by CodeRabbit

Uh oh!

blocksorg bot commented Mar 15, 2026

Uh oh!

chatgpt-codex-connector bot commented Mar 15, 2026

Uh oh!

coderabbitai bot commented Mar 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 15, 2026

Choose a reason for hiding this comment

Uh oh!

JustAGhosT commented Mar 15, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

JustAGhosT commented Mar 15, 2026 •

edited by coderabbitai bot

Loading

Staging Toggle (PRs to `main`)

coderabbitai bot commented Mar 15, 2026 •

edited

Loading