Context file for Claude Code / Claude Agent SDK. It gives Claude a deep understanding of the entire codebase so that it can assist effectively.
- Name: Three Horizons Implementation Accelerator
- Version: 4.0.0
- License: MIT
- Author: Microsoft LATAM Platform Engineering
- Repository: agentic-devops-platform/
- Scale: 949+ files, ~105,000 lines of production-ready code
- Partnership: Microsoft + GitHub + Red Hat
An enterprise-grade Agentic DevOps Platform that combines Infrastructure as Code (Terraform), GitOps (ArgoCD), AI Agents (GitHub Copilot), and an Internal Developer Platform (Red Hat Developer Hub) to accelerate software delivery. It follows a Three Horizons maturity model:
- H1 Foundation: Core Azure infrastructure (AKS, networking, databases, security, DR)
- H2 Enhancement: Platform services (ArgoCD, RHDH, observability, Golden Paths, runners)
- H3 Innovation: AI capabilities (Microsoft Foundry, Copilot agents, MCP servers, Developer Lightspeed)
- Terraform >= 1.5.0 with AzureRM >= 3.75, AzureAD >= 2.45, AzAPI >= 1.9
- AKS (Kubernetes 1.29+) with Calico network policy, Workload Identity, auto-scaling
- Azure Container Registry (Basic/Standard/Premium by deployment mode)
- Azure Key Vault with RBAC, soft delete, purge protection
- PostgreSQL Flexible Server v16 with geo-redundant backup
- Redis Cache with TLS 1.2+ enforcement
- Microsoft Foundry with o3, gpt-4.1, gpt-4o, gpt-4o-mini, text-embedding-3-large, AI Search, Content Safety. Claude models (Opus 4.6, Sonnet 4.6) available via Marketplace in eastus2/swedencentral.
- Microsoft Purview for data governance
- Microsoft Defender for containers and cloud posture
- Azure Backup Vault for disaster recovery
- Red Hat Developer Hub (RHDH) 1.8 — Backstage-based IDP with Dynamic Plugins
- ArgoCD 5.51.0 — GitOps with App-of-Apps pattern and sync waves
- Helm — Package management for all K8s deployments
- OPA Gatekeeper — Policy enforcement with 5 constraint templates
- External Secrets Operator 0.9.9 — Azure Key Vault sync via Workload Identity
- Prometheus + Grafana + Alertmanager — Full observability stack
- Jaeger — Distributed tracing
- GitHub Actions Self-Hosted Runners on AKS (ARC)
- 18 GitHub Copilot Chat Agents (.agent.md format) with handoff orchestration
- 14 MCP Servers for AI-tool communication (azure, github, terraform, kubernetes, helm, docker, git, bash, filesystem, defender, purview, entra, copilot, engineering-intelligence)
- 19 Reusable Skills for CLI operations
- 4 Chat Modes: Architect, Reviewer, SRE, Engineering Intelligence
- 10 Reusable Prompts: deploy-platform, create-service, review-code, generate-tests, generate-docs, deploy-service, troubleshoot-incident, collect-metrics, generate-dashboard, audit-security-posture
- Developer Lightspeed — AI chat in RHDH using Llama Stack + RAG
- HCL (Terraform) — Infrastructure definitions
- Go — Terratest infrastructure tests
- Python 3.11+ — AI agents, automation scripts (FastAPI, Pydantic, structlog)
- Bash — Operational scripts (strict mode: set -euo pipefail)
- YAML — K8s manifests, Helm values, GitHub Actions, ArgoCD configs
- Rego — OPA policy definitions
- JSON — Grafana dashboards, configurations
- Markdown — Documentation, agent specs
agentic-devops-platform/
├── .github/ # GitHub config
│ ├── agents/ # 18 Copilot Chat agents (.agent.md)
│ ├── chatmodes/ # 4 chat modes (.chatmode.md)
│ ├── instructions/ # 4 code-gen instructions (terraform, kubernetes, python, engineering-intelligence)
│ ├── prompts/ # 10 reusable prompts (.prompt.md)
│ ├── skills/ # 19 operational skills
│ ├── workflows/ # 10 GitHub Actions workflows
│ ├── ISSUE_TEMPLATE/ # 28 issue templates
│ └── copilot-instructions.md # Global Copilot context
├── terraform/ # 15 IaC modules
│ ├── main.tf # Root orchestration (providers, locals, module calls)
│ ├── variables.tf # All input variables with validation
│ ├── outputs.tf # All outputs
│ └── modules/ # 15 modules (see below)
├── golden-paths/ # RHDH Software Templates
│ ├── h1-foundation/ # 6 basic templates
│ ├── h2-enhancement/ # 9 advanced templates
│ └── h3-innovation/ # 8 AI/Agent templates
├── argocd/ # GitOps configuration
│ ├── app-of-apps/root-application.yaml
│ ├── apps/ # Individual ArgoCD apps
│ ├── secrets/cluster-secret-store.yaml
│ ├── sync-policies.yaml # 5 sync presets
│ └── repo-credentials.yaml # Multi-repo support
├── deploy/helm/ # Helm values and K8s manifests
├── config/ # Platform config (apm.yml, sizing-profiles.yaml, region-availability.yaml)
├── grafana/dashboards/ # 3 JSON dashboards
├── prometheus/ # alerting-rules.yaml (50+ rules), recording-rules.yaml (40+ rules)
├── policies/ # OPA/Rego for Terraform + Gatekeeper for K8s
├── mcp-servers/ # MCP config (mcp-config.json) + USAGE.md
├── scripts/ # 15+ operational scripts
├── platform/ # Platform documentation and configuration
├── agents-templates/ # Agent specification templates
├── new-features/ # RHDH-specific features (AI agents, homepage, dynamic plugins)
├── tests/terraform/modules/ # 16 Go test files (Terratest)
├── docs/ # Research, best practices, official Red Hat PDFs
└── images-logos/ # Brand assets
All modules are in terraform/modules/ and follow consistent patterns: main.tf, variables.tf, outputs.tf, versions.tf, README.md.
| Module | Path | Purpose |
|---|---|---|
| naming | modules/naming/ | Resource naming convention: {customer}-{env}-{resource} |
| networking | modules/networking/ | VNet, subnets (AKS nodes, pods, PE, bastion, AppGW), NSGs, private DNS zones, route tables |
| aks-cluster | modules/aks-cluster/ | AKS with system + user node pools, Workload Identity, Azure Policy, Defender, auto-scaling |
| container-registry | modules/container-registry/ | ACR with geo-replication, AKS integration (AcrPull role) |
| databases | modules/databases/ | PostgreSQL Flexible Server + Redis Cache with private endpoints |
| security | modules/security/ | Key Vault (RBAC, soft delete), Managed Identities, Workload Identity Federation |
| argocd | modules/argocd/ | ArgoCD via Helm with HA, SSO, RBAC, ingress |
| observability | modules/observability/ | Log Analytics, Container Insights, Azure Managed Grafana, action groups |
| external-secrets | modules/external-secrets/ | ESO via Helm + ClusterSecretStore linked to Key Vault |
| github-runners | modules/github-runners/ | ARC (Actions Runner Controller) self-hosted runners |
| ai-foundry | modules/ai-foundry/ | Cognitive Account (OpenAI), AI Search, Content Safety, model deployments |
| purview | modules/purview/ | Microsoft Purview account with private endpoints |
| defender | modules/defender/ | Defender for Containers, Servers, Storage, Key Vault |
| cost-management | modules/cost-management/ | Budgets with alerts at 50/75/90/100% |
| disaster-recovery | modules/disaster-recovery/ | Backup Vault, geo-replication, cross-region DR |
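
The naming convention listed for the naming module, {customer}-{env}-{resource}, can be illustrated with a short Python sketch. The helper name `resource_name` and the `aks` suffix are hypothetical, and the validation rules are taken from the variable comments elsewhere in this file; the real module is written in HCL and may apply further per-resource rules.

```python
import re

# Rules taken from the variable comments in this file:
# customer_name is 3-20 lowercase alphanumeric, environment is dev/staging/prod.
CUSTOMER_RE = re.compile(r"[a-z0-9]{3,20}")
ENVIRONMENTS = {"dev", "staging", "prod"}


def resource_name(customer: str, env: str, resource: str) -> str:
    """Build a name in the {customer}-{env}-{resource} pattern (hypothetical helper)."""
    if not CUSTOMER_RE.fullmatch(customer):
        raise ValueError(f"invalid customer name: {customer!r}")
    if env not in ENVIRONMENTS:
        raise ValueError(f"invalid environment: {env!r}")
    return f"{customer}-{env}-{resource}"


print(resource_name("contoso", "dev", "aks"))  # contoso-dev-aks
```

Rejecting bad input before building the name mirrors the intent of Terraform validation blocks: a malformed customer name fails early instead of producing an invalid Azure resource name.
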
The root main.tf uses a deployment_mode variable with three presets:
| Mode | AKS Nodes | VM Size | HA | Monitoring | AI |
|---|---|---|---|---|---|
| express | 3 | Standard_D4s_v5 | No | Yes | No |
| standard | 5 | Standard_D4s_v5 | Yes | Yes | Yes |
| enterprise | 10 | Standard_D8s_v5 | Yes | Yes | Yes |
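
As a rough illustration of how the three presets could be looked up by mode, the sketch below encodes the table as data. The `ModePreset` class and `preset_for` function are hypothetical names; in the repository the presets live in Terraform locals, not Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePreset:
    aks_nodes: int
    vm_size: str
    ha: bool
    monitoring: bool
    ai: bool


# Values copied from the deployment mode table.
PRESETS = {
    "express": ModePreset(3, "Standard_D4s_v5", ha=False, monitoring=True, ai=False),
    "standard": ModePreset(5, "Standard_D4s_v5", ha=True, monitoring=True, ai=True),
    "enterprise": ModePreset(10, "Standard_D8s_v5", ha=True, monitoring=True, ai=True),
}


def preset_for(mode: str) -> ModePreset:
    """Return the preset for a deployment_mode value, rejecting unknown modes."""
    try:
        return PRESETS[mode]
    except KeyError:
        raise ValueError(f"unknown deployment_mode: {mode!r}") from None


print(preset_for("standard").aks_nodes)  # 5
```
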
customer_name # 3-20 lowercase alphanumeric, e.g. "contoso"
environment # "dev" | "staging" | "prod"
azure_subscription_id # Azure subscription
azure_tenant_id # Microsoft Entra ID tenant
admin_group_id # Microsoft Entra ID admin group
github_org # GitHub organization
github_token # GitHub PAT (sensitive)
enable_databases = true # PostgreSQL + Redis
enable_container_registry = true # ACR
enable_argocd = true # GitOps
enable_external_secrets = true # ESO
enable_observability = true # Prometheus/Grafana
enable_github_runners = false # Self-hosted runners
enable_ai_foundry = false # Microsoft Foundry (H3)
enable_defender = false # Microsoft Defender
enable_purview = false # Data governance
enable_cost_management = false # Budget alerts
enable_disaster_recovery = false # DR config
networking → security → aks-cluster → databases
→ container-registry
→ ai-foundry
→ observability → argocd
→ external-secrets
→ github-runners
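
One way to reason about this dependency chain is as a graph that can be sorted topologically. The sketch below uses Python's standard `graphlib`. The edge list is an assumed reading of the diagram: the networking → security → aks-cluster chain and observability → argocd are stated explicitly, while the parents of external-secrets and github-runners are guesses and are marked as such.

```python
from graphlib import TopologicalSorter

# Map each module to the modules it depends on.
DEPENDS_ON = {
    "networking": set(),
    "security": {"networking"},
    "aks-cluster": {"security"},
    "databases": {"aks-cluster"},
    "container-registry": {"aks-cluster"},
    "ai-foundry": {"aks-cluster"},
    "observability": {"aks-cluster"},
    "argocd": {"observability"},
    "external-secrets": {"aks-cluster"},  # assumed parent
    "github-runners": {"aks-cluster"},    # assumed parent
}

# static_order() yields every module after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order[:3])  # ['networking', 'security', 'aks-cluster']
```

Only the first three positions are fully determined by the graph; the modules after aks-cluster may appear in any order that keeps observability ahead of argocd.
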
Located in .github/agents/. Each agent uses YAML frontmatter with tools, infer, skills, handoffs and a three-tier boundary system (ALWAYS / ASK FIRST / NEVER).
| Agent | Invoke | Domain |
|---|---|---|
| @architect | @architect Design a microservice | System architecture, Microsoft Foundry, multi-agent design |
| @platform | @platform Register a Golden Path | RHDH portal, IDP, developer experience |
| @devops | @devops Set up GitOps | CI/CD, pipelines, GitOps, MLOps |
| @sre | @sre Create runbook | Observability, SLOs, incident response |
| @terraform | @terraform Create AKS module | Infrastructure as Code |
| @security | @security Scan for vulnerabilities | Compliance, policies, scanning |
| @reviewer | @reviewer Review this PR | Code review, quality gates |
| @deploy | @deploy Deploy to dev | End-to-end deployment orchestration |
| @test | @test Generate tests | Testing, validation, QA |
| @docs | @docs Generate API docs | Documentation |
| @onboarding | @onboarding Set up new team | Team onboarding guidance |
| @template-engineer | @template-engineer Create template | Golden Path / Software Template creation |
| @context-architect | @context-architect Plan changes | Multi-file change planning, dependency tracing |
| @github-integration | @github-integration Setup GHAS | GitHub App, org discovery, Actions, Packages |
| @ado-integration | @ado-integration Migrate from ADO | Azure DevOps PAT, repos, pipelines, boards |
| @hybrid-scenarios | @hybrid-scenarios Scenario A | GitHub + ADO coexistence (scenarios A/B/C) |
| @azure-portal-deploy | @azure-portal-deploy Provision AKS | Azure portal AKS, Key Vault, PostgreSQL, ACR |
| @engineering-intelligence | @engineering-intelligence Collect DORA metrics | DORA metrics, Copilot analytics, GHAS security posture, developer productivity |
| @rhdh-architect | @rhdh-architect Design a custom plugin | RHDH/Backstage plugin architecture, frontend wiring, component specs, ADRs |
Deployment: @onboarding → @architect → @terraform → @deploy → @sre
Security: @reviewer → @security → @devops (remediate) → @test
Templates: @platform → @template-engineer → @devops → @security
Multi-file: Any agent → @context-architect → @test → @docs
Hybrid: @github-integration + @ado-integration → @hybrid-scenarios → @deploy
Intelligence: @engineering-intelligence → @platform (RHDH dashboard) → @sre (SLO correlation)
RHDH Portal: @rhdh-architect → @platform → @deploy → @sre
Defined in mcp-servers/mcp-config.json. Access matrix in mcp-servers/USAGE.md.
Core (14): azure, github, terraform, kubernetes, helm, docker, git, bash, filesystem, defender, purview, entra, copilot, engineering-intelligence, backstage
New (2): openshift (oc CLI for ARO deployments), argocd (ArgoCD CLI for GitOps)
Read-only (always allowed): az resource list/show, kubectl get, oc get, gh pr/issue view, helm list, terraform state list/show, argocd app list/get
Requires confirmation: terraform apply/destroy, kubectl apply/delete, oc apply/delete, helm install/upgrade/uninstall, az resource delete, az aks scale, az aro create/delete, argocd app sync
Forbidden (never): kubectl delete namespace production, terraform destroy -auto-approve, kubectl get secret -o yaml, az keyvault secret show --query value, az role assignment create --role Owner
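
The three access tiers above can be sketched as a small classifier. This is an illustration only, not how the MCP servers enforce the matrix: the function names, the matching strategy (ordered-token match for forbidden patterns, prefix match for the other tiers), and the choice to treat unknown commands as requiring confirmation are all assumptions.

```python
import shlex

FORBIDDEN = [
    "kubectl delete namespace production",
    "terraform destroy -auto-approve",
    "kubectl get secret -o yaml",
    "az keyvault secret show --query value",
    "az role assignment create --role Owner",
]
CONFIRM = [
    "terraform apply", "terraform destroy", "kubectl apply", "kubectl delete",
    "oc apply", "oc delete", "helm install", "helm upgrade", "helm uninstall",
    "az resource delete", "az aks scale", "az aro create", "az aro delete",
    "argocd app sync",
]
READ_ONLY = [
    "az resource list", "az resource show", "kubectl get", "oc get",
    "gh pr view", "gh issue view", "helm list",
    "terraform state list", "terraform state show",
    "argocd app list", "argocd app get",
]


def _in_order(pattern: list[str], tokens: list[str]) -> bool:
    """True if every pattern token appears in tokens, in the same order."""
    remaining = iter(tokens)
    return all(tok in remaining for tok in pattern)


def _has_prefix(prefix: list[str], tokens: list[str]) -> bool:
    return tokens[: len(prefix)] == prefix


def classify(command: str) -> str:
    tokens = shlex.split(command)
    # Forbidden patterns are checked first so they override the broader tiers.
    if any(_in_order(p.split(), tokens) for p in FORBIDDEN):
        return "forbidden"
    if any(_has_prefix(p.split(), tokens) for p in CONFIRM):
        return "confirm"
    if any(_has_prefix(p.split(), tokens) for p in READ_ONLY):
        return "read-only"
    return "confirm"  # assumption: unknown commands fall into the cautious tier


print(classify("kubectl get pods -n dev"))             # read-only
print(classify("kubectl get secret db-creds -o yaml")) # forbidden
print(classify("terraform apply tfplan"))              # confirm
```

Checking the forbidden list before the others matters because a forbidden command such as reading a secret as YAML also starts with a read-only prefix.
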
- Terraform >= 1.5.0, always pin provider versions
- snake_case for variables and resources
- Tag ALL resources with: environment, project, owner, cost-center
- Use Workload Identity (never service principal secrets)
- Enable private endpoints for all PaaS services
- Validate variables with validation {} blocks
- Document every variable with description
- kebab-case for names and labels
- Always set resource requests and limits
- Run containers as non-root, read-only rootfs
- Use standard labels: app.kubernetes.io/{name,instance,version}
- Configure liveness and readiness probes
- Apply network policies per namespace
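
The Kubernetes conventions above lend themselves to a simple lint pass. The following sketch checks a Deployment manifest that has already been loaded into a Python dict; the function name `check_deployment` is hypothetical, and the check is simplified (for example, it only looks at container-level securityContext, although runAsNonRoot can also be set at pod level). Network policies are not covered.

```python
REQUIRED_LABELS = (
    "app.kubernetes.io/name",
    "app.kubernetes.io/instance",
    "app.kubernetes.io/version",
)


def check_deployment(manifest: dict) -> list[str]:
    """Return a list of convention violations for a Deployment manifest."""
    problems = []
    labels = manifest.get("metadata", {}).get("labels", {})
    for label in REQUIRED_LABELS:
        if label not in labels:
            problems.append(f"missing label {label}")

    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources", {})
        for key in ("requests", "limits"):
            if not resources.get(key):
                problems.append(f"{name}: resources.{key} not set")
        security = container.get("securityContext", {})
        for flag in ("runAsNonRoot", "readOnlyRootFilesystem"):
            if security.get(flag) is not True:
                problems.append(f"{name}: securityContext.{flag} not enabled")
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                problems.append(f"{name}: {probe} missing")
    return problems


example = {
    "metadata": {"labels": {"app.kubernetes.io/name": "demo"}},
    "spec": {"template": {"spec": {"containers": [{"name": "web"}]}}},
}
for problem in check_deployment(example):
    print(problem)
```
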
- Python 3.11+, FastAPI for APIs, Pydantic for validation, structlog for logging
- Follow PEP 8, use Black + isort + Flake8
- #!/usr/bin/env bash with set -euo pipefail
- Include usage instructions, validate inputs
- Use readonly for constants, meaningful variable names
<type>(<scope>): <description>
Types: feat, fix, docs, refactor, test, chore, ci, infra
Scopes: terraform, k8s, argocd, agents, golden-paths, scripts, docs
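
A commit subject in this format can be checked with a regular expression. The sketch below assumes the scope is mandatory and must be one of the listed values; the file does not say whether other scopes or scope-less commits are accepted, and `is_valid_commit` is a hypothetical helper.

```python
import re

TYPES = ("feat", "fix", "docs", "refactor", "test", "chore", "ci", "infra")
SCOPES = ("terraform", "k8s", "argocd", "agents", "golden-paths", "scripts", "docs")

_types = "|".join(TYPES)
_scopes = "|".join(SCOPES)
# <type>(<scope>): <description>, with a non-empty description.
COMMIT_RE = re.compile(rf"(?:{_types})\((?:{_scopes})\): \S.*")


def is_valid_commit(subject: str) -> bool:
    return COMMIT_RE.fullmatch(subject) is not None


print(is_valid_commit("feat(terraform): add AKS user node pool"))  # True
print(is_valid_commit("feature: add AKS user node pool"))          # False
```
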
- main — Protected, requires PR + approval
- feature/*, bugfix/*, hotfix/*, release/*
| File | Purpose |
|---|---|
| config/apm.yml | Agent Package Manager manifest (dependencies, instructions, prompts, agents, compilation targets for VSCode/Claude/Codex) |
| config/sizing-profiles.yaml | T-shirt sizing (Small/Medium/Large/XLarge) with detailed infra specs per profile |
| config/region-availability.yaml | Azure region matrix with Tier 1/2 support, AI model availability per region, quota requirements per deployment mode, and deployment patterns. Agents MUST consult before recommending any Azure service in a region. |
| .pre-commit-config.yaml | 14 pre-commit hooks (terraform, shell, K8s, YAML, markdown, secrets) |
| .tflint.hcl | TFLint rules (Azure-specific) |
| .yamllint.yml | YAML lint rules (200 char max, comments allowed) |
| .markdownlint.json | Markdown lint rules (proper names: Kubernetes, Azure, RHDH) |
| .secrets.baseline | detect-secrets baseline with 13+ detectors |
| .terraform-docs.yml | Auto-generated Terraform module documentation |
| CODEOWNERS | Code ownership (@platform-team, @infra-team, @security-team, @devops-team, @ai-team, @docs-team) |
Root Application in argocd/app-of-apps/root-application.yaml manages all child apps. Sync waves ensure correct deployment order:
- cert-manager, external-dns (Wave 1)
- ingress-nginx (Wave 2)
- prometheus, jaeger (Wave 3)
- Red Hat Developer Hub (Wave 4)
- Team namespaces, applications (Wave 5+)
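
Since ArgoCD applies lower-numbered sync waves first, the ordering above can be pictured as grouping apps by wave number. In this sketch the wave numbers come from the list; the `rhdh` key is shorthand for Red Hat Developer Hub, the function name is hypothetical, and the open-ended Wave 5+ entries are left out.

```python
from itertools import groupby

SYNC_WAVES = {
    "cert-manager": 1,
    "external-dns": 1,
    "ingress-nginx": 2,
    "prometheus": 3,
    "jaeger": 3,
    "rhdh": 4,
}


def deployment_stages(waves: dict[str, int]) -> list[tuple[int, list[str]]]:
    """Group app names by wave, lowest wave first, names sorted within a wave."""
    ordered = sorted(waves.items(), key=lambda item: (item[1], item[0]))
    return [
        (wave, [name for name, _ in group])
        for wave, group in groupby(ordered, key=lambda item: item[1])
    ]


for wave, apps in deployment_stages(SYNC_WAVES):
    print(wave, apps)
# 1 ['cert-manager', 'external-dns']
# 2 ['ingress-nginx']
# 3 ['jaeger', 'prometheus']
# 4 ['rhdh']
```
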
| Preset | Auto-Sync | Self-Heal | Prune | Use |
|---|---|---|---|---|
| dev-auto-sync | Yes | Yes | Yes | Dev environments |
| staging-auto-sync | Yes | Yes | Yes | Staging |
| prod-manual-sync | No | No | No | Production (manual approval) |
| infra-careful-sync | Yes | Yes | No | Critical infrastructure |
| preview-aggressive-sync | Yes | Yes | Yes | Ephemeral/preview environments |
- alerting-rules.yaml: 50+ alerts across infrastructure (CPU, memory, disk, nodes), applications (error rate, latency), AI agents (token usage, LLM latency), GitOps (sync failures), security (cert expiry, login failures), SLOs (burn rates at 5m/1h/24h/30d windows)
- recording-rules.yaml: 40+ pre-calculated metrics for cluster utilization, app RED metrics (p50/p90/p99), SLO availability, GitOps success rates, AI agent performance
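
The SLO burn-rate alerts mentioned above rest on a simple ratio: the observed error ratio divided by the error budget (1 minus the SLO target). The worked example below is a sketch; the 99.9% target and the sample error ratio are assumptions chosen for illustration, not values from alerting-rules.yaml.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than sustainable the error budget is being spent."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def days_to_exhaustion(rate: float, period_days: float = 30.0) -> float:
    """Days until the budget for the period is gone at a constant burn rate."""
    return period_days / rate


rate = burn_rate(error_ratio=0.0144, slo_target=0.999)  # assumed sample values
print(f"burn rate: {rate:.1f}")                          # burn rate: 14.4
print(f"budget lasts: {days_to_exhaustion(rate):.2f} days")
```

A burn rate of 1 spends the budget exactly over the period; pairing a short window with a long one (for example 5m with 1h) is a common way to alert quickly without firing on brief spikes.
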
- platform-overview.json — Cluster health, node status, pod distribution, resource usage
- cost-management.json — Budget utilization, cost trends, resource costs by tag
- golden-path-application.json — App RED metrics, latency, error rates, deployment frequency
Configured for: RHDH (:7007/metrics), ArgoCD (server, repo-server, controller), ingress-nginx (:10254), cert-manager (:9402), external-secrets (:8080)
In policies/terraform/azure.rego, enforced via Conftest in CI:
- Required tags (environment, project, owner, cost-center)
- TLS 1.2 enforcement on storage and PostgreSQL
- Encryption at rest for storage and Key Vault
- No public access on Storage, Key Vault, AKS
- HTTPS-only for storage accounts
- AKS must have RBAC, Managed Identity, Azure Policy, Defender
- Geo-redundant backups for PostgreSQL
- Warns on expensive VM sizes (Standard_E, Standard_M series)
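
To make the required-tags rule concrete, here is a Python sketch of the same idea applied to the JSON form of a Terraform plan (as produced by `terraform show -json`). It is not the Rego policy from azure.rego; the function name is hypothetical, and resources whose planned state has no `tags` attribute are skipped on the assumption that they do not support tagging.

```python
REQUIRED_TAGS = {"environment", "project", "owner", "cost-center"}


def missing_tags(plan: dict) -> dict[str, list[str]]:
    """Map each resource address to the required tags it lacks."""
    findings = {}
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after") or {}
        if "tags" not in after:
            continue  # assumed: resource type without a tags attribute
        tags = after["tags"] or {}
        missing = sorted(REQUIRED_TAGS - set(tags))
        if missing:
            findings[rc["address"]] = missing
    return findings


# Hand-written plan fragment for illustration.
plan = {
    "resource_changes": [
        {
            "address": "azurerm_storage_account.example",
            "change": {"after": {"tags": {"environment": "dev", "project": "demo"}}},
        }
    ]
}
print(missing_tags(plan))
# {'azurerm_storage_account.example': ['cost-center', 'owner']}
```
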
5 ConstraintTemplates in policies/kubernetes/constraint-templates/:
- K8sRequiredLabels — Mandatory labels with regex validation
- K8sContainerResources — CPU/memory requests and limits required
- K8sDenyPrivileged — Block privileged containers
- K8sRequireNonRoot — Enforce non-root execution
- K8sAllowedRegistries — Restrict to approved registries only
16 test files in tests/terraform/modules/, one per module plus integration_test.go. Each test:
- Runs t.Parallel()
- Defines Terraform variables
- Calls terraform.Init() → terraform.Validate() → terraform.Plan()
- Asserts plan outputs (resource names, configurations, properties)
10 GitHub Actions workflows in .github/workflows/:
| Workflow | Trigger |
|---|---|
| ci-cd.yml | Push/PR to main — full pipeline |
| ci.yml | Push/PR — lint, test, validate |
| cd.yml | Merge to main — deploy staging/prod |
| terraform-test.yml | Changes in terraform/ — Terratest |
| validate-agents.yml | Changes in agents/ — agent spec validation |
| release.yml | Tag creation — release automation |
| agent-router.yml | Issue creation — route to correct agent |
| issue-ops.yml | Issue events — automation |
| branch-protection.yml | Scheduled — enforce branch rules |
| engineering-intelligence.yml | Scheduled (6h) + manual — collect DORA, Copilot, GHAS metrics |
- scripts/deploy-full.sh — End-to-end: --environment dev --dry-run
- scripts/platform-bootstrap.sh — Platform setup (RHDH, ArgoCD, monitoring)
- scripts/bootstrap.sh — H1 infrastructure setup, --register-templates
- scripts/validate-prerequisites.sh — Check CLIs (az >= 2.50, terraform >= 1.5, kubectl >= 1.28, helm >= 3.12, gh >= 2.30)
- scripts/validate-config.sh — Config files (tfvars, sizing, regions)
- scripts/validate-deployment.sh — Post-deploy health (pods, services, endpoints)
- scripts/validate-agents.sh — Agent specs (YAML frontmatter, boundaries, handoffs)
- scripts/validate-docs.sh — Documentation (links, formatting, completeness)
- scripts/validate-substitutions.sh — Template substitution validation
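
The CLI minimums checked by validate-prerequisites.sh need a numeric comparison, because comparing version strings as text would rank 2.9 above 2.50. The sketch below shows one way to do that in Python; it is not the script's actual logic (which is Bash), and the helper names are hypothetical.

```python
import re

# Minimum versions as listed for validate-prerequisites.sh.
MINIMUMS = {"az": "2.50", "terraform": "1.5", "kubectl": "1.28", "helm": "3.12", "gh": "2.30"}


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted number from text, e.g. 'v1.29.3' -> (1, 29, 3)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(tool: str, installed: str) -> bool:
    return parse_version(installed) >= parse_version(MINIMUMS[tool])


print(meets_minimum("kubectl", "v1.29.3"))  # True
print(meets_minimum("az", "2.9.0"))         # False: 9 < 50 when compared as numbers
```
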
- scripts/setup-github-app.sh — GitHub App for RHDH/ArgoCD auth
- scripts/setup-identity-federation.sh — OIDC Workload Identity Federation
- scripts/setup-pre-commit.sh — Install pre-commit hooks (--install-tools for full setup)
- scripts/setup-branch-protection.sh — GitHub branch rules
- scripts/setup-terraform-backend.sh — Azure Storage Account for remote state
- scripts/setup-portal.sh — RHDH portal setup and configuration
- scripts/engineering-intelligence/collect-github-metrics.sh — PR cycle time, deployment frequency, contributor stats
- scripts/engineering-intelligence/collect-copilot-metrics.sh — Copilot acceptance rate, language/editor breakdown, seat utilization
- scripts/engineering-intelligence/collect-security-metrics.sh — GHAS code scanning, secret scanning, Dependabot aggregation
- scripts/migration/ado-to-github-migration.sh — ADO → GitHub migration in 6 phases
basic-cicd, security-baseline, documentation-site, web-application, new-microservice, infrastructure-provisioning
microservice (full), api-microservice, event-driven-microservice, data-pipeline, batch-job, api-gateway, gitops-deployment, ado-to-github-migration (6-phase), reusable-workflows
foundry-agent, sre-agent-integration, mlops-pipeline, multi-agent-system, copilot-extension, rag-application, ai-evaluation-pipeline, engineering-intelligence-dashboard
- Workload Identity for all AKS workloads (no static secrets)
- Managed Identity for all Azure services
- OIDC Federation for GitHub Actions
- Microsoft Entra ID SSO for RHDH, ArgoCD, Grafana
- GitHub OAuth for RHDH authentication
- Private endpoints for ALL PaaS services (Key Vault, PostgreSQL, Redis, ACR, AI Search, OpenAI)
- NSGs with deny-by-default
- VNet isolation with dedicated subnets
- No public access on any PaaS service in standard/enterprise mode
- Non-root execution enforced via Gatekeeper
- No privilege escalation
- Read-only root filesystem
- Approved container registries only
- Image scanning via Trivy
- Azure Key Vault as single source of truth
- External Secrets Operator syncs to K8s Secrets
- detect-secrets + gitleaks in pre-commit and CI
- Never store secrets in code, env vars, or Git
- Primary regions: centralus (main), eastus (DR), eastus2 (Microsoft Foundry)
- LGPD: Available as opt-in compliance when deploying to brazilsouth
- SOC 2: Audit trails, access controls, monitoring
- PCI-DSS: Network segmentation, encryption
- CIS Benchmarks: Azure + Kubernetes hardening
- Dynamic Plugins: Enable/disable via YAML, no rebuild needed
- Built-in RBAC: Admin, Developer, Viewer roles with CSV policies
- Developer Lightspeed: Native AI chat with Llama Stack + RAG + BYOM support
- Enterprise Support: Red Hat commercial backing
All 31 official RHDH 1.8 documentation PDFs are converted to Markdown and available in docs/official-docs/rhdh/markdown/. Agents MUST consult these docs before recommending, troubleshooting, executing, configuring, installing, customizing, or integrating any RHDH component.
6 domain-specific skills in .github/skills/ segment the docs by topic:
- rhdh-installation — Install, setup, first instance, sizing
- rhdh-configuration — app-config.yaml, branding, monitoring, telemetry, audit
- rhdh-plugins — Dynamic plugins, wiring, MCP tools, AI connectors, Orchestrator
- rhdh-auth-rbac — Authentication providers, RBAC policies, permissions
- rhdh-catalog-templates — Software Catalog, Templates, TechDocs, GitHub integration, Scorecards
- rhdh-operations — Release notes, GitOps patterns, DX best practices, upgrades
The platform supports deployment on both AKS and ARO. ARO-specific skill (aro-deployment) covers provisioning, RHDH Operator install, and ARO vs AKS differences. Additional CLIs: oc >= 4.14 (conditional), oras >= 1.1 (for custom plugins).
- configs/ — Dynamic plugin configurations, RBAC policies, Helm values
- foundry/ — Python-based AI agents for RHDH
- homepage/ — Customized RHDH homepage with quick actions, links, status widgets
# Option A: Agent-guided
@deploy Deploy the platform to dev environment
# Option B: Automated
./scripts/deploy-full.sh --environment dev --dry-run
./scripts/deploy-full.sh --environment dev
# Option C: Manual Terraform
cd terraform && terraform init
terraform plan -var-file=environments/dev.tfvars -out=tfplan
terraform apply tfplan
./scripts/validate-prerequisites.sh
pre-commit run --all-files # All hooks
pre-commit run terraform_fmt --all-files # Specific hook
cd tests/terraform && go test -v ./modules/...
./scripts/setup-portal.sh
./scripts/bootstrap.sh --register-templates
- APM compilation targets: The config/apm.yml explicitly lists CLAUDE.md, .claude/commands/, and .claude/skills/ as Claude compilation targets
- Deployment order matters: Terraform modules have strict dependency chains (networking → security → AKS → everything else)
- Feature flags control scope: Most modules are behind enable_* variables — don't assume everything is deployed
- Three deployment modes: express (minimal), standard (production), enterprise (HA/multi-zone) — affects sizing, HA, and which modules are enabled
- Security is non-negotiable: No public access, no static secrets, Workload Identity everywhere, TLS 1.2+ always
- Agent handoffs are key: The agents are designed to collaborate — understand the orchestration flows before suggesting changes
- Golden Paths are RHDH Software Templates: They follow Backstage template format with template.yaml, skeleton/, and parameters
- Observability is comprehensive: 50+ alert rules, 40+ recording rules, 3 dashboards — changes should maintain this coverage
- Policy as Code is enforced: OPA policies for Terraform (Conftest in CI) and Gatekeeper constraints for K8s runtime — all code must comply
- US-primary: Default region is centralus (Central US). DR region: eastus. Microsoft Foundry uses eastus2 for model availability. LGPD compliance available as opt-in with brazilsouth
- Engineering Intelligence is Faros AI-inspired: The @engineering-intelligence agent provides DORA metrics, Copilot analytics, GHAS security posture, and developer productivity dashboards — all sourced from GitHub APIs and displayed as RHDH dynamic plugin tabs