Skip to content

Fix/terra alignment#69

Merged
JustAGhosT merged 12 commits intodevfrom
fix/terra-alignment
Mar 31, 2026
Merged

Fix/terra alignment#69
JustAGhosT merged 12 commits intodevfrom
fix/terra-alignment

Conversation

@JustAGhosT
Copy link
Copy Markdown
Contributor

@JustAGhosT JustAGhosT commented Mar 15, 2026

Pull Request Checklist

Summary

  • What changed?
  • Why was it needed?

Validation

  • Local checks run (if applicable)
  • Relevant workflow/jobs observed

Deployment Notes

  • No environment/config changes required
  • Environment/config changes required (describe below)

Staging Toggle (PRs to main)

  • Add label run-staging to this PR to enable staging deployment (deploy-staging).
  • Remove label run-staging to skip staging deployment.

Risk / Rollback

  • Risk level: low / medium / high
  • Rollback plan:

Summary by CodeRabbit

  • Documentation

    • Added Prometheus telemetry and retention guidance, new API routing docs for Azure OpenAI via LiteLLM, Webhook Auth in gateway docs, multiple formatting/clarity updates across architecture references, and a new AGENTS guide.
  • Improvements

    • Enhanced deployment workflows with per-environment selection, configurable smoke-test attempt count, expanded optional state/dashboard and monitoring configuration, and support for mapping additional credentials and environment variables.

- Add C3 connection in container architecture flowchart
- Rename "Azure Monitor" to "Application Insights" in telemetry diagram
- Add Prometheus as a telemetry sink with implementation details
- Document retention policies for Application Insights
- Update matrix gateway JSON example with new field names
- Fix confidence threshold in matrix-gateway from 0.70 to 0.75
- Update SOP suggestion threshold from 0.78 to 0.8
- Fix code block formatting in multiple files
Add environment input parameter to deploy-environment.yaml workflow to specify GitHub environment, improving deployment control and security. Replace hardcoded environment settings in deploy.yaml with the new parameter. Also fix code fences in documentation to use text format and update various documentation details.
…ttempts and update default value

# Pull Request Checklist

## Summary

- What changed?
- Why was it needed?

## Validation

- [ ] Local checks run (if applicable)
- [ ] Relevant workflow/jobs observed

## Deployment Notes

- [ ] No environment/config changes required
- [ ] Environment/config changes required (describe below)

## UAT Toggle (PRs to `main`)

- Add label `run-uat` to this PR to enable UAT deployment (`deploy-uat`).
- Remove label `run-uat` to skip UAT deployment.

## Risk / Rollback

- Risk level: low / medium / high
- Rollback plan:
@blocksorg
Copy link
Copy Markdown

blocksorg bot commented Mar 15, 2026

Mention Blocks like a regular teammate with your question or request:

@blocks review this pull request
@blocks make the following changes ...
@blocks create an issue from what was mentioned in the following comment ...
@blocks explain the following code ...
@blocks are there any security or performance concerns?

Run @blocks /help for more information.

Workspace settings | Disable this message

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Updates CI/CD workflows to add new inputs, per-job environments, secrets, and TF_VAR mappings (Azure auth, Terraform backend, state/dashboard images, Grafana, EXPECTED_AOAI_ENDPOINT_HOST), renames smoke test wait parameter, and applies related infra and documentation adjustments.

Changes

Cohort / File(s) Summary
GitHub Actions: deploy workflows
\.github/workflows/deploy-environment.yaml, \.github/workflows/deploy.yaml
Added inputs (smoke_models_wait_attempts, environment, state/dashboard TF_VARs), new per-job secrets (AZURE_, TF_BACKEND_, EXPECTED_AOAI_ENDPOINT_HOST, STATE_SERVICE_*, DASHBOARD/GRAFANA), mapped env vars/TF_VARs, set job-level environment, and updated smoke-test step to use the new attempts input.
Infra — Application Insights output
infra/modules/aigateway_aca/outputs.tf
Replaced sensitive application_insights_connection_string output with non-sensitive application_insights_name; guidance updated to retrieve connection string from Key Vault.
Infra — state/dashboard token trim change
infra/modules/dashboard_aca/main.tf, infra/modules/state_service_aca/main.tf
Changed whitespace trimming from trim to trimspace for use_shared_token checks, altering inclusion logic for shared-token secrets/env.
Infra env vars
infra/env/dev/terraform.tfvars
Added state_service_container_image, state_service_registry_username, state_service_registry_password assignments for dev environment.
Docs — container & observability
docs/architecture/02-container-architecture.md, docs/architecture/04-observability-telemetry.md
Added Webhook Auth node/edges to gateway diagram; introduced Prometheus as a telemetry sink with /metrics callbacks and retention guidance for Application Insights (Prod 90d, Non-prod 30d).
Docs — systems & routing
docs/architecture/systems/*, docs/architecture/systems/ai-gateway.md
Standardized code fence language to text and added a v1 API routing section describing Azure OpenAI via LiteLLM and routing rules for /v1/responses and /v1/embeddings.
Docs — reference & policies
docs/architecture/reference/*, docs/architecture/reference/strategic/07-deployment-model.md
Updated example payload fields (request_id, label, recommended_tier), added migration note, tightened thresholds (0.70→0.75, 0.78→0.80), expanded fallback pattern, and converted code fences to text.
Docs — planning
docs/planning/request_to_token_attribution.md
Terminology and labeling updates (Required→Recommended, Method A → Preferred), clarified Application Insights wiring via Key Vault, and updated action items.
Docs — minor formatting/others
docs/architecture/reference/slm-*, docs/architecture/systems/*, docs/architecture/systems/codeflow-engine.md
Converted several code fences to text, small formatting tweaks, and minor content adjustments (escaping braces, wording changes).
Ops guide
AGENTS.md
Added AGENTS.md with developer guidance for AI coding agents, tech stack, build/test commands, and style conventions.
Repo docs & README
README.md, CLAUDE.md
Project rename references updated (ai-gateway → sluice), README expanded with components, observability, CI/CD, and ecosystem links; CLAUDE onboarding links updated.
Misc — workflow per-env inputs
\.github/workflows/deploy.yaml (per-env blocks)
Removed inline environment lines in favor of explicit environment: entries per job and added per-environment input defaults for smoke attempts and optional state/dashboard values.

Sequence Diagram(s)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐇 I hop through YAML, neat and spry,
Secrets tucked where pipelines lie,
Grafana hums and metrics chime,
Key Vault keeps the strings in time,
I nibble bugs and push — goodbye! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is entirely a blank template with unchecked items and no substantive content. The "What changed?" and "Why was it needed?" prompts are unanswered, leaving no context for reviewers. Fill in the Summary section with concrete details about what was changed (workflow updates, Terraform vars, documentation) and why (alignment, security, feature support). Complete or check relevant validation and deployment items.
Title check ❓ Inconclusive The title "Fix/terra alignment" is vague and overly generic. It uses a file/area reference ("terra") without clearly describing what the actual change is or why it matters. Revise the title to be more specific and descriptive of the main change, such as "Add state service configuration to deployment workflows" or "Update Terraform variables for state service integration".
✅ Passed checks (1 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/terra-alignment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
docs/architecture/reference/matrix-gateway.md (1)

48-49: Document migration impact for renamed response fields.

recommended_tier/cacheable look consistent here, but this is a contract change for clients. Please add a short migration note (old → new fields, deprecation window/version) in this doc section to reduce integration risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/architecture/reference/matrix-gateway.md` around lines 48 - 49, Add a
short migration note in the Matrix Gateway reference section describing the
rename/contract change for the response fields: explicitly state the old field
names (e.g., if any) and the new fields recommended_tier and cacheable, show
example mapping (old → new), and declare the deprecation window/version and
recommended upgrade steps for clients; update the section near the response
schema so consumers know the change, timeline, and how to handle backwards
compatibility (e.g., fallback logic or versioned endpoints).
docs/architecture/02-container-architecture.md (1)

60-60: Model webhook verification explicitly in the flow.

Line 60 introduces direct GitHub Webhook ingress; consider adding a distinct “signature verification/auth” step in the diagram so trust boundaries are explicit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/architecture/02-container-architecture.md` at line 60, The diagram
currently connects C3 directly to G1 via "C3 --> G1"; insert an explicit
verification/auth step between them (e.g., create a node like "W_Verify" or
"WebhookAuth" and change the flow to "C3 --> WebhookAuth --> G1") and label that
node to indicate signature validation/authorization so the trust boundary for
GitHub webhook ingress is clear in the diagram and documentation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/deploy-environment.yaml:
- Around line 31-35: Add a missing input declaration for smoke_models_wait_sleep
to the reusable workflow inputs so inputs.smoke_models_wait_sleep is defined;
specifically add an input block named smoke_models_wait_sleep (e.g., required:
false, type: string, default: "10", description: Sleep seconds between model
availability checks) alongside the existing smoke_models_wait_attempts
declaration so callers that pass smoke_models_wait_sleep continue to work.

In `@docs/architecture/04-observability-telemetry.md`:
- Line 79: The description reverses Prometheus' role: update the sentence that
says "The Prometheus exporter exposes a `/metrics` endpoint that scrapes
application metrics" to indicate that Prometheus scrapes the `/metrics` endpoint
(e.g., "The Prometheus exporter exposes a `/metrics` endpoint which is scraped
by Prometheus") and ensure the surrounding text still references
`success_callback`/`failure_callback` containing "prometheus" and the container
config at `infra/modules/aigateway_aca/main.tf:95-113`.

---

Nitpick comments:
In `@docs/architecture/02-container-architecture.md`:
- Line 60: The diagram currently connects C3 directly to G1 via "C3 --> G1";
insert an explicit verification/auth step between them (e.g., create a node like
"W_Verify" or "WebhookAuth" and change the flow to "C3 --> WebhookAuth --> G1")
and label that node to indicate signature validation/authorization so the trust
boundary for GitHub webhook ingress is clear in the diagram and documentation.

In `@docs/architecture/reference/matrix-gateway.md`:
- Around line 48-49: Add a short migration note in the Matrix Gateway reference
section describing the rename/contract change for the response fields:
explicitly state the old field names (e.g., if any) and the new fields
recommended_tier and cacheable, show example mapping (old → new), and declare
the deprecation window/version and recommended upgrade steps for clients; update
the section near the response schema so consumers know the change, timeline, and
how to handle backwards compatibility (e.g., fallback logic or versioned
endpoints).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e51ac272-e68f-4e8e-a52c-d8a36251d57f

📥 Commits

Reviewing files that changed from the base of the PR and between 2bc7860 and 0e4fff2.

📒 Files selected for processing (16)
  • .github/workflows/deploy-environment.yaml
  • .github/workflows/deploy.yaml
  • docs/architecture/02-container-architecture.md
  • docs/architecture/04-observability-telemetry.md
  • docs/architecture/reference/matrix-gateway.md
  • docs/architecture/reference/matrix-rooivalk.md
  • docs/architecture/reference/slm-implementation-matrix.md
  • docs/architecture/reference/slm-management-plan.md
  • docs/architecture/reference/strategic/07-deployment-model.md
  • docs/architecture/systems/agentkit-forge.md
  • docs/architecture/systems/ai-gateway.md
  • docs/architecture/systems/codeflow-engine.md
  • docs/architecture/systems/cognitive-mesh.md
  • docs/architecture/systems/phoenix-rooivalk.md
  • docs/planning/request_to_token_attribution.md
  • infra/modules/aigateway_aca/outputs.tf

- Add new workflow parameter to control sleep between model availability checks
- Update container architecture diagram to include Webhook Auth component
- Improve Prometheus metrics documentation clarity
- Add migration note for Matrix Gateway response contract changes in v1.0.0
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
infra/modules/state_service_aca/main.tf (1)

46-52: Consider trimming the secret value for consistency.

The use_shared_token check now correctly uses trimspace() to detect non-empty tokens, but the secret value itself (line 50) still stores the raw var.state_service_shared_token which may contain trailing newlines. This could cause authentication mismatches if the consuming service performs a strict comparison.

♻️ Proposed fix to trim the secret value
   dynamic "secret" {
     for_each = local.use_shared_token ? [1] : []
     content {
       name  = "state-service-shared-token"
-      value = var.state_service_shared_token
+      value = trimspace(var.state_service_shared_token)
     }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/modules/state_service_aca/main.tf` around lines 46 - 52, The secret
value in the dynamic "secret" block uses the raw var.state_service_shared_token
which may include trailing whitespace/newlines; update the content.value to pass
the variable through trimspace (i.e., use
trimspace(var.state_service_shared_token)) so the stored secret is normalized,
matching the earlier use of trimspace() in local.use_shared_token and avoiding
auth mismatches.
infra/modules/dashboard_aca/main.tf (1)

37-43: Same concern: trim the secret value for consistency.

As noted in infra/modules/state_service_aca/main.tf, the secret value should also be trimmed to avoid storing whitespace-padded tokens.

♻️ Proposed fix to trim the secret value
   dynamic "secret" {
     for_each = local.use_shared_token ? [1] : []
     content {
       name  = "state-service-shared-token"
-      value = var.state_service_shared_token
+      value = trimspace(var.state_service_shared_token)
     }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/modules/dashboard_aca/main.tf` around lines 37 - 43, The dynamic
"secret" block that creates the "state-service-shared-token" currently uses
value = var.state_service_shared_token and may store whitespace-padded tokens;
update the content to use Terraform's trimspace function (value =
trimspace(var.state_service_shared_token)) so the stored secret is normalized;
this change should be made inside the dynamic "secret" block that is conditional
on local.use_shared_token to match the fix in the state_service_aca module.
.github/workflows/deploy-environment.yaml (2)

109-109: Hardcoded secrets expiration date may require future maintenance.

The value "2027-03-31T00:00:00Z" is hardcoded in the workflow. While currently valid (per the Terraform validation in infra/env/dev/variables.tf:88-95), this will require a workflow update before March 2027 to avoid deployment failures when the date is no longer in the future.

Consider parameterizing this as an input or secret to allow environment-specific configuration without workflow changes.

♻️ Suggested parameterization
+      secrets_expiration_date:
+        required: false
+        type: string
+        default: "2027-03-31T00:00:00Z"
+        description: Expiration date for Key Vault secrets (ISO-8601 UTC format)
...
-  TF_VAR_secrets_expiration_date: "2027-03-31T00:00:00Z"
+  TF_VAR_secrets_expiration_date: ${{ inputs.secrets_expiration_date }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/deploy-environment.yaml at line 109, The workflow
hardcodes TF_VAR_secrets_expiration_date which will become invalid in 2027;
change it to a configurable input or secret so it can be updated without editing
the workflow: replace the literal "2027-03-31T00:00:00Z" with a GitHub Actions
input (e.g., workflow_dispatch/input) or a repo/organization secret and fall
back to a computed value if not provided; update any usages referencing
TF_VAR_secrets_expiration_date accordingly so the workflow reads the value from
inputs or secrets instead of the hardcoded string.

110-110: Consider pinning dashboard image version in fallback.

The fallback ghcr.io/phoenixvc/ai-gateway-dashboard:latest uses the :latest tag, which can lead to unpredictable behavior if the image changes between deployments. While this is only used when DASHBOARD_CONTAINER_IMAGE secret is not configured, consider using a pinned version or SHA digest for reproducibility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/deploy-environment.yaml at line 110, The fallback for
TF_VAR_dashboard_container_image currently uses the mutable tag ':latest';
update the fallback value used when DASHBOARD_CONTAINER_IMAGE is not provided to
a pinned image tag or SHA digest (e.g., replace
'ghcr.io/phoenixvc/ai-gateway-dashboard:latest' with a specific version tag or
`@sha256` digest) so deployments are reproducible; modify the
TF_VAR_dashboard_container_image assignment in the workflow to reference the
pinned tag/digest instead of ':latest' while keeping the secret override via
DASHBOARD_CONTAINER_IMAGE.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @.github/workflows/deploy-environment.yaml:
- Line 109: The workflow hardcodes TF_VAR_secrets_expiration_date which will
become invalid in 2027; change it to a configurable input or secret so it can be
updated without editing the workflow: replace the literal "2027-03-31T00:00:00Z"
with a GitHub Actions input (e.g., workflow_dispatch/input) or a
repo/organization secret and fall back to a computed value if not provided;
update any usages referencing TF_VAR_secrets_expiration_date accordingly so the
workflow reads the value from inputs or secrets instead of the hardcoded string.
- Line 110: The fallback for TF_VAR_dashboard_container_image currently uses the
mutable tag ':latest'; update the fallback value used when
DASHBOARD_CONTAINER_IMAGE is not provided to a pinned image tag or SHA digest
(e.g., replace 'ghcr.io/phoenixvc/ai-gateway-dashboard:latest' with a specific
version tag or `@sha256` digest) so deployments are reproducible; modify the
TF_VAR_dashboard_container_image assignment in the workflow to reference the
pinned tag/digest instead of ':latest' while keeping the secret override via
DASHBOARD_CONTAINER_IMAGE.

In `@infra/modules/dashboard_aca/main.tf`:
- Around line 37-43: The dynamic "secret" block that creates the
"state-service-shared-token" currently uses value =
var.state_service_shared_token and may store whitespace-padded tokens; update
the content to use Terraform's trimspace function (value =
trimspace(var.state_service_shared_token)) so the stored secret is normalized;
this change should be made inside the dynamic "secret" block that is conditional
on local.use_shared_token to match the fix in the state_service_aca module.

In `@infra/modules/state_service_aca/main.tf`:
- Around line 46-52: The secret value in the dynamic "secret" block uses the raw
var.state_service_shared_token which may include trailing whitespace/newlines;
update the content.value to pass the variable through trimspace (i.e., use
trimspace(var.state_service_shared_token)) so the stored secret is normalized,
matching the earlier use of trimspace() in local.use_shared_token and avoiding
auth mismatches.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cfbc39a6-c9cd-4b1a-819a-6b28a9ebaa94

📥 Commits

Reviewing files that changed from the base of the PR and between 0e4fff2 and 19759df.

📒 Files selected for processing (6)
  • .github/workflows/deploy-environment.yaml
  • docs/architecture/02-container-architecture.md
  • docs/architecture/04-observability-telemetry.md
  • docs/architecture/reference/matrix-gateway.md
  • infra/modules/dashboard_aca/main.tf
  • infra/modules/state_service_aca/main.tf
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/architecture/reference/matrix-gateway.md
  • docs/architecture/02-container-architecture.md

Add comprehensive documentation for AI coding agents working in this repository, including:
- Project overview and tech stack
- Build/lint/test commands for each component
- Code style guidelines for Python, JavaScript, and Terraform
- Architecture overview and key files
- Prerequisites and pre-commit checks
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AGENTS.md`:
- Around line 186-197: The fenced code block containing the directory tree
starting with "docs/architecture/" is missing a language identifier and will
fail MD040; edit the block in AGENTS.md that begins with the triple backticks
before "docs/architecture/" and add a language tag (e.g., text) so the fence
reads ```text, leaving the directory content unchanged, then save the file to
satisfy the linter.
- Around line 78-79: The guidance contradicts itself: the sentence "Use absolute
imports within packages" conflicts with the example that uses a relative import
(from .routes import router). Update AGENTS.md so the rule and example
match—either change the rule to recommend relative imports within packages and
keep the example referencing .routes and router, or replace the example with an
absolute import form that references the package-level routes module and router
symbol; ensure the grouping guideline (stdlib → third-party → local) remains
consistent with the chosen import style.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d30d3a92-61be-43e3-bc4b-f5538a5e6c10

📥 Commits

Reviewing files that changed from the base of the PR and between 19759df and 9be91e5.

📒 Files selected for processing (1)
  • AGENTS.md

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
infra/env/dev/terraform.tfvars (1)

28-28: Avoid using :latest tag for container images in infrastructure code.

The :latest tag is non-deterministic and can lead to unexpected behavior when the image is updated. Consider pinning to a specific version or SHA digest for reproducible deployments.

♻️ Suggested improvement
-state_service_container_image   = "ghcr.io/phoenixvc/ai-gateway-state-service:latest"
+state_service_container_image   = "ghcr.io/phoenixvc/ai-gateway-state-service:v1.0.0"  # Pin to specific version
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/env/dev/terraform.tfvars` at line 28, The image pinning uses a
non-deterministic :latest tag for state_service_container_image; replace the
"ghcr.io/phoenixvc/ai-gateway-state-service:latest" value in terraform.tfvars
with a specific immutable reference (either a versioned tag like v1.2.3 or a SHA
digest e.g. `@sha256`:...) to ensure reproducible deployments and update any
deployment automation that consumes state_service_container_image to accept the
pinned tag; reference the state_service_container_image variable when making the
change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@infra/env/dev/terraform.tfvars`:
- Around line 29-30: Remove the hardcoded credentials by deleting the
state_service_registry_username and state_service_registry_password assignments
from terraform.tfvars (they are currently set to "phoenixvc" and "ghp_xxx");
rely on the module's conditional auth flag use_registry_auth and supply
credentials at runtime via environment variables (e.g., export
TF_VAR_state_service_registry_username and
TF_VAR_state_service_registry_password in CI/CD) so no secrets are committed to
the repo.

---

Nitpick comments:
In `@infra/env/dev/terraform.tfvars`:
- Line 28: The image pinning uses a non-deterministic :latest tag for
state_service_container_image; replace the
"ghcr.io/phoenixvc/ai-gateway-state-service:latest" value in terraform.tfvars
with a specific immutable reference (either a versioned tag like v1.2.3 or a SHA
digest e.g. `@sha256`:...) to ensure reproducible deployments and update any
deployment automation that consumes state_service_container_image to accept the
pinned tag; reference the state_service_container_image variable when making the
change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d83fd7b1-d5b5-4046-87d4-65b3a4222721

📥 Commits

Reviewing files that changed from the base of the PR and between 9be91e5 and 72c004c.

📒 Files selected for processing (1)
  • infra/env/dev/terraform.tfvars

Comment on lines +29 to +30
state_service_registry_username = "phoenixvc"
state_service_registry_password = "ghp_xxx"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Search for potential hardcoded credentials in tfvars files
# Expected: No matches for credential patterns like ghp_, password =, token =, secret =

echo "=== Checking for GitHub PAT patterns ==="
rg -n 'ghp_[a-zA-Z0-9_]+' --glob '*.tfvars'

echo "=== Checking for password/token assignments with non-empty values ==="
rg -nP '(password|token|secret)\s*=\s*"[^"]+[^"]+"' --glob '*.tfvars' | grep -v '""'

Repository: phoenixvc/ai-gateway

Length of output: 330


🏁 Script executed:

#!/bin/bash

# Find all .tf files in infra directory
echo "=== Finding all .tf files in infra directory ==="
find infra -name "*.tf" -type f

echo ""
echo "=== Searching for state_service_registry_password variable definition ==="
rg -n 'state_service_registry_password' --type tf -B 2 -A 2

echo ""
echo "=== Searching for state_service_registry_username variable definition ==="
rg -n 'state_service_registry_username' --type tf -B 2 -A 2

Repository: phoenixvc/ai-gateway

Length of output: 5517


🏁 Script executed:

#!/bin/bash

echo "=== Reading dev/variables.tf for state_service_registry variables ==="
sed -n '189,210p' infra/env/dev/variables.tf

echo ""
echo "=== Checking if there are any defaults in these variable blocks ==="
rg -A 10 'variable "state_service_registry_(username|password)"' infra/env/dev/variables.tf

Repository: phoenixvc/ai-gateway

Length of output: 1330


🏁 Script executed:

#!/bin/bash

echo "=== Reading state_service_aca module main.tf to see credential usage ==="
cat -n infra/modules/state_service_aca/main.tf

echo ""
echo "=== Checking state_service_aca variables.tf ==="
cat -n infra/modules/state_service_aca/variables.tf

Repository: phoenixvc/ai-gateway

Length of output: 6408


Remove hardcoded credentials from terraform.tfvars.

The state_service_registry_password value (ghp_xxx) appears to be a GitHub Personal Access Token placeholder. Hardcoding credentials in tfvars files committed to version control is a security risk:

  1. The pattern may be mistaken for a real (redacted) token
  2. Developers may replace it with real credentials and accidentally commit
  3. These variables have defaults of "" and the module gracefully handles empty credentials via conditional logic (use_registry_auth), so they can be safely omitted

Remove these values from the tfvars file and provide them at runtime via CI/CD environment variables (e.g., TF_VAR_state_service_registry_password).

Proposed fix
 # State Service
 state_service_container_image   = "ghcr.io/phoenixvc/ai-gateway-state-service:latest"
-state_service_registry_username = "phoenixvc"
-state_service_registry_password = "ghp_xxx"
+# Registry credentials should be provided via TF_VAR_* environment variables
+# state_service_registry_username = ""  # Set via TF_VAR_state_service_registry_username
+# state_service_registry_password = ""  # Set via TF_VAR_state_service_registry_password (sensitive)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
state_service_registry_username = "phoenixvc"
state_service_registry_password = "ghp_xxx"
# State Service
state_service_container_image = "ghcr.io/phoenixvc/ai-gateway-state-service:latest"
# Registry credentials should be provided via TF_VAR_* environment variables
# state_service_registry_username = "" # Set via TF_VAR_state_service_registry_username
# state_service_registry_password = "" # Set via TF_VAR_state_service_registry_password (sensitive)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/env/dev/terraform.tfvars` around lines 29 - 30, Remove the hardcoded
credentials by deleting the state_service_registry_username and
state_service_registry_password assignments from terraform.tfvars (they are
currently set to "phoenixvc" and "ghp_xxx"); rely on the module's conditional
auth flag use_registry_auth and supply credentials at runtime via environment
variables (e.g., export TF_VAR_state_service_registry_username and
TF_VAR_state_service_registry_password in CI/CD) so no secrets are committed to
the repo.

@JustAGhosT
Copy link
Copy Markdown
Contributor Author

@claude help here Verify each finding against the current code and only fix it if needed.

Inline comments:
In @infra/env/dev/terraform.tfvars:

  • Around line 29-30: Remove the hardcoded credentials by deleting the
    state_service_registry_username and state_service_registry_password assignments
    from terraform.tfvars (they are currently set to "phoenixvc" and "ghp_xxx");
    rely on the module's conditional auth flag use_registry_auth and supply
    credentials at runtime via environment variables (e.g., export
    TF_VAR_state_service_registry_username and
    TF_VAR_state_service_registry_password in CI/CD) so no secrets are committed to
    the repo.

Nitpick comments:
In @infra/env/dev/terraform.tfvars:

  • Line 28: The image pinning uses a non-deterministic :latest tag for
    state_service_container_image; replace the
    "ghcr.io/phoenixvc/ai-gateway-state-service:latest" value in terraform.tfvars
    with a specific immutable reference (either a versioned tag like v1.2.3 or a SHA
    digest e.g. @sha256:...) to ensure reproducible deployments and update any
    deployment automation that consumes state_service_container_image to accept the
    pinned tag; reference the state_service_container_image variable when making the
    change.

JustAGhosT and others added 2 commits March 21, 2026 04:42
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Lead with what sluice is and why it exists (central AI data plane vs
  scattered direct Azure OpenAI calls)
- Document all three components: Gateway, State Service, Dashboard
- Add endpoint tables for gateway and state service
- Add observability signals table (Prometheus, OTEL, Langfuse, Log Analytics)
- Add environment table with default models per env
- Move prerequisites to bottom; tighten secrets table
- Add architecture ADRs to documentation table
- Add retort to ecosystem table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Around line 9-13: The fenced code block in README.md is missing a language tag
(MD040); update the opening fence that currently shows the ASCII diagram (lines
containing "your code  →  POST /v1/responses      ─┐" etc.) to use the "text"
language identifier (i.e., change "```" to "```text") so the diagram remains
readable and linter-clean while leaving the block contents unchanged.
- Line 174: The README and Terraform modules currently require "Terraform >=
1.14.0"; relax this to a lower minimum (e.g., ">= 1.10.0" or ">= 1.8.0") by
updating the README entry and every Terraform block that sets required_version
(look for terraform { required_version = ... } or a required_version setting) in
infra/env/{dev,prod,staging}/main.tf and infra/modules/*/main.tf so the
constraint matches the new minimum and aligns with azurerm provider
compatibility.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 68b242ca-edb5-4dbb-8e53-3e4d84e68adc

📥 Commits

Reviewing files that changed from the base of the PR and between 72c004c and b737f84.

📒 Files selected for processing (2)
  • CLAUDE.md
  • README.md
✅ Files skipped from review due to trivial changes (1)
  • CLAUDE.md

Comment on lines +9 to +13
```
your code → POST /v1/responses ─┐
POST /v1/embeddings ─┤ sluice (LiteLLM) → Azure OpenAI
GET /metrics ─┘
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a language tag to the fenced code block.

Line 9 opens a fenced block without a language identifier (MD040). Use text to keep the diagram readable and lint-clean.

Proposed fix
-```
+```text
 your code  →  POST /v1/responses      ─┐
                POST /v1/embeddings    ─┤  sluice (LiteLLM)  →  Azure OpenAI
                GET  /metrics          ─┘
</details>

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.22.0)</summary>

[warning] 9-9: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @README.md around lines 9 - 13, The fenced code block in README.md is missing
a language tag (MD040); update the opening fence that currently shows the ASCII
diagram (lines containing "your code → POST /v1/responses ─┐" etc.) to
use the "text" language identifier (i.e., change "" to "text") so the
diagram remains readable and linter-clean while leaving the block contents
unchanged.


</details>

<!-- fingerprinting:phantom:triton:hawk:c51e79e1-0301-404f-afa3-74b2f26ede7e -->

<!-- This is an auto-generated comment by CodeRabbit -->

## Prerequisites

- [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) — `az login`
- [Terraform](https://www.terraform.io/downloads) >= 1.14.0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

What is the latest stable Terraform version, and are there known features in Terraform 1.14.0 that are required compared with 1.8+ for typical AzureRM workflows?

💡 Result:

The latest stable Terraform version is 1.14.8. Terraform 1.14.0 introduces new features like List Resources (in *.tfquery.hcl files), the 'terraform query' command, GenerateResourceConfiguration RPC for providers during import, and top-level Actions blocks for provider-defined imperative operations outside CRUD. These are minor enhancements and experimental in nature, with no breaking changes affecting compatibility. For typical AzureRM workflows (using resources like azurerm_resource_group, azurerm_virtual_network, azurerm_linux_virtual_machine, etc.), no features from Terraform 1.14.0 are required compared to Terraform 1.8+. The AzureRM provider (latest ~4.66.0) uses the Terraform Plugin Framework and works with Terraform 1.x versions including 1.8+. Provider upgrade guides recommend the latest Terraform but do not mandate 1.14-specific features. Standard CRUD operations, imports, plans, and applies function identically across these versions without needing 1.14-exclusive capabilities.

Citations:


🏁 Script executed:

# Check README.md context around line 174
sed -n '160,190p' README.md

Repository: phoenixvc/sluice

Length of output: 1134


🏁 Script executed:

# Search for any references to Terraform 1.14+ specific features
# (terraform query, tfquery.hcl, Actions blocks, GenerateResourceConfiguration)
rg -i "terraform query|tfquery|actions\s*{|generateresource" --type-list | head -20

Repository: phoenixvc/sluice

Length of output: 449


🏁 Script executed:

# Look for terraform/infrastructure files to understand what features are actually used
fd -e "tf$" -e "tfvars" --type f | head -20

Repository: phoenixvc/sluice

Length of output: 157


🏁 Script executed:

# Find all .tf files in the infra directory
fd -e "tf" infra/ --type f

Repository: phoenixvc/sluice

Length of output: 389


🏁 Script executed:

# Search for terraform version constraints and provider configurations
rg "required_version|required_providers|terraform\s*{" infra/ -A 5

Repository: phoenixvc/sluice

Length of output: 221


🏁 Script executed:

# Check for any 1.14-specific features in the infra code
rg "terraform query|tfquery|actions\s*\{|GenerateResourceConfiguration" infra/

Repository: phoenixvc/sluice

Length of output: 42


🏁 Script executed:

# Find all .tf files in infra directory
fd . 'infra/' -e tf

Repository: phoenixvc/sluice

Length of output: 741


🏁 Script executed:

# Search for terraform version requirements - simpler pattern
rg "required_version" infra/

Repository: phoenixvc/sluice

Length of output: 495


🏁 Script executed:

# Search for provider blocks
rg "terraform\s*\{" infra/ -A 3

Repository: phoenixvc/sluice

Length of output: 1612


🏁 Script executed:

# Look at the main terraform files to understand the configuration
find infra -name "*.tf" -exec wc -l {} + | sort -rn | head -5

Repository: phoenixvc/sluice

Length of output: 231


🏁 Script executed:

# Look at one of the main.tf files to see all required providers
head -50 infra/env/dev/main.tf

Repository: phoenixvc/sluice

Length of output: 1350


🏁 Script executed:

# Check for any use of 1.14-specific features (terraform query command, Actions blocks, etc.)
rg "actions\s*{|terraform query|for_each.*=\s*data\.terraform_remote_state" infra/ -A 2

Repository: phoenixvc/sluice

Length of output: 223


🏁 Script executed:

# Check provider versions and see if any require 1.14+
rg "source.*=" infra/ | grep -E "(azurerm|azuread|grafana)" | head -10

Repository: phoenixvc/sluice

Length of output: 735


🏁 Script executed:

# Search for any comments explaining the 1.14 requirement
rg "1\.14|version" infra/env/dev/main.tf -B 2 -A 2

Repository: phoenixvc/sluice

Length of output: 461


Lower the Terraform minimum version constraint to reduce unnecessary friction for contributors.

The infrastructure only uses standard AzureRM resources and providers (azurerm >= 4.62.0, grafana), with no usage of Terraform 1.14+ features (terraform query, Actions blocks, or GenerateResourceConfiguration). The AzureRM provider supports Terraform 1.8+, making the >= 1.14.0 requirement unnecessarily restrictive. Consider relaxing this to >= 1.10.0 or >= 1.8.0 across all terraform modules (infra/env/{dev,prod,staging}/main.tf and infra/modules/*/main.tf) to lower the barrier for contributors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` at line 174, The README and Terraform modules currently require
"Terraform >= 1.14.0"; relax this to a lower minimum (e.g., ">= 1.10.0" or ">=
1.8.0") by updating the README entry and every Terraform block that sets
required_version (look for terraform { required_version = ... } or a
required_version setting) in infra/env/{dev,prod,staging}/main.tf and
infra/modules/*/main.tf so the constraint matches the new minimum and aligns
with azurerm provider compatibility.

…uage

- Fix 'absolute imports' → 'relative imports' (the example already used
  relative syntax, the label was wrong)
- Add 'text' language identifier to architecture overview code fence

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@JustAGhosT JustAGhosT merged commit d2b30c4 into dev Mar 31, 2026
5 of 6 checks passed
@JustAGhosT JustAGhosT deleted the fix/terra-alignment branch March 31, 2026 22:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant