Not sure where to start? This guide maps the landscape and recommends tools by role.

| Layer | Open Source | Commercial | CNCF Projects |
|---|---|---|---|
| Coding Agents | Aider, Cline, Continue | Claude Code, Cursor, Copilot | - |
| Kubernetes | K8sGPT, kubectl-ai, Headlamp | Komodor, Robusta | K8sGPT, Kagent, KAITO |
| IaC and Terraform | OpenTofu, Infracost, Checkov | Spacelift, Env0, Firefly | - |
| Incident Response | HolmesGPT, IncidentFox, Tracecat | Rootly, PagerDuty AIOps | HolmesGPT |
| Monitoring | Grafana, Prometheus | Datadog, Dynatrace, Splunk | Prometheus |
| Security | Trivy, Falco, Checkov, Semgrep | Snyk, Wiz, Prisma Cloud | Falco |
| Cost and FinOps | OpenCost | Kubecost, CAST AI, Vantage, CloudZero | OpenCost |
| MCP Servers | MCP reference servers, Kubernetes MCP | AWS MCP, GitHub MCP | - |
| CI/CD | ArgoCD, Tekton, Dagger | GitLab Duo, Harness | ArgoCD |
| Platform Engineering | Backstage, Kratix | Port, Humanitec, Cortex | Backstage |
| GitOps | Flux, Kustomize, Helm | Weave GitOps, Codefresh | Flux, Helm |
| Chaos Engineering | Chaos Mesh, Litmus | Gremlin, Steadybit | Chaos Mesh, Litmus |

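The MCP Servers row refers to the Model Context Protocol, where an AI client talks to a tool server using JSON-RPC 2.0 messages. The sketch below uses only the Python standard library to build the messages a client sends to start a session, discover tools, and invoke one. It is illustrative, not a client implementation: the tool name `get_pod_logs` and its arguments are hypothetical, and in practice an MCP SDK handles the handshake, framing, and responses for you.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

# Monotonic request IDs; JSON-RPC pairs each response with its request by "id".
_next_id = count(1)


def request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request (the server is expected to reply)."""
    msg: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_next_id), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def notification(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 notification (no "id", so no reply is expected)."""
    msg: Dict[str, Any] = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def initialize(client_name: str, client_version: str) -> Dict[str, Any]:
    # First message of an MCP session; "2025-06-18" is one published spec revision.
    return request("initialize", {
        "protocolVersion": "2025-06-18",
        "capabilities": {},
        "clientInfo": {"name": client_name, "version": client_version},
    })


def list_tools() -> Dict[str, Any]:
    # Ask the server which tools it exposes (names, descriptions, input schemas).
    return request("tools/list")


def call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    # Invoke one tool with arguments matching its advertised input schema.
    return request("tools/call", {"name": name, "arguments": arguments})


def encode_stdio(msg: Dict[str, Any]) -> bytes:
    # The stdio transport sends one JSON message per line, with no embedded newlines.
    return (json.dumps(msg, separators=(",", ":")) + "\n").encode("utf-8")


# Hypothetical session: handshake, discovery, then a call to a made-up Kubernetes tool.
session = [
    initialize("devops-agent", "0.1.0"),
    notification("notifications/initialized"),
    list_tools(),
    call_tool("get_pod_logs", {"namespace": "default", "pod": "web-0"}),
]
wire = b"".join(encode_stdio(m) for m in session)
```

The server answers each request with a message carrying the same `id` and either a `result` or an `error`; the notification gets no reply.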
- Daily IaC work: Start with Claude Code or GitHub Copilot for writing Terraform, Kubernetes manifests, and Dockerfiles.
- Cluster troubleshooting: Add K8sGPT to scan clusters and explain issues in plain English.
- Cost visibility: Use Infracost for cost estimates in Terraform PRs.
- Incident investigation: HolmesGPT combines observability telemetry with LLM reasoning for root cause analysis.
- Observability: Grafana AI provides AI-assisted query generation and SRE agents.
- Resilience testing: Chaos Mesh for fault injection in Kubernetes.
- Developer portal: Backstage for service catalogs and templates.
- GitOps delivery: ArgoCD for continuous deployment to Kubernetes.
- Continuous reconciliation: Flux for automated image updates and Helm releases.
- Vulnerability scanning: Trivy for containers, IaC, and code.
- Runtime security: Falco for threat detection in containers.
- Supply chain: Docker Scout for image analysis and CVE remediation.
- Kubernetes costs: OpenCost for vendor-neutral cost monitoring.
- Terraform costs: Infracost for cost estimates in pull requests.
- Multi-cloud visibility: Vantage for recommendations across cloud providers.
- Agent framework: LangChain or CrewAI for building custom DevOps agents.
- Tool integrations: Explore Model Context Protocol (MCP) servers for connecting AI assistants to infrastructure tools.
- Orchestration: Temporal for durable execution of long-running workflows.
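For the Infracost recommendations, a CI job can turn the pull-request cost estimate into a pass/fail budget check. This is a minimal Python sketch. It assumes the JSON report written by `infracost diff --format json` exposes top-level `diffTotalMonthlyCost` and `currency` fields, so verify this against your Infracost version. The 100-unit threshold and the helper names are hypothetical.

```python
import json
from decimal import Decimal
from typing import Any, Dict, Tuple


def monthly_cost_delta(report: Dict[str, Any]) -> Decimal:
    """Monthly cost change from an Infracost JSON report.

    Assumes the top-level "diffTotalMonthlyCost" field, a decimal string that
    may be null or missing; both cases are treated as no change.
    """
    raw = report.get("diffTotalMonthlyCost")
    return Decimal(raw) if raw else Decimal("0")


def check_budget(report: Dict[str, Any], max_increase: Decimal) -> Tuple[bool, str]:
    """Pass if the monthly increase is at most max_increase (hypothetical policy)."""
    delta = monthly_cost_delta(report)
    currency = report.get("currency", "USD")
    ok = delta <= max_increase
    verdict = "within" if ok else "exceeds"
    message = (
        f"Monthly cost change {delta:+.2f} {currency} "
        f"{verdict} the {max_increase:.2f} {currency} limit"
    )
    return ok, message


def gate_from_file(path: str, max_increase: str = "100") -> Tuple[bool, str]:
    """Load a saved JSON report (hypothetical helper) and apply the budget check."""
    with open(path, encoding="utf-8") as f:
        return check_budget(json.load(f), Decimal(max_increase))
```

`Decimal` avoids float rounding on the currency strings in the report. A CI step could call `gate_from_file("infracost.json")`, print the message as a PR comment, and exit non-zero when the first element is `False`.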