Autonomous AI-Powered Kubernetes Incident Management Platform
Open-source AIOps: eBPF telemetry + causal graphs + LLM agents = automated incident lifecycle
Scorching automates the full incident lifecycle in Kubernetes: Observe → Analyze → Plan → Apply → Verify. It combines eBPF telemetry (Tetragon), causal graph analysis (Neo4j), and LLM decision-making (LangGraph + Ollama).
- Observe — eBPF kernel telemetry (Tetragon) + OpenTelemetry + Prometheus
- Analyze — Causal RCA with Neo4j, neuro-symbolic reasoning, business impact prediction
- Plan — LLM-generated remediation via LangGraph orchestrator
- Apply — Automated K8s remediation (scale, restart, rollback, canary via Argo Rollouts)
- Verify — Health checks with retry, alert suppression, model drift detection
- AI Chat — DevInfra agent with model selection (qwen3.5, deepseek-r1, llama3.2)
- Dashboard — Incidents, metrics, forecasts, audit trail
- Governance — GDPR/SOC2 compliance before autonomous actions
- Security — eBPF threat detection, NetworkPolicy, RBAC
- GitOps — ArgoCD ApplicationSet with auto-sync
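The five lifecycle stages listed above can be pictured as a small sequential pipeline in which only the Verify stage is retried. The following is a minimal sketch in plain Python, not the project's LangGraph orchestrator: the `Incident` record, the `run_lifecycle` function, the handler signatures, and the retry limit are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Stage names taken from the lifecycle described in this README.
STAGES = ["observe", "analyze", "plan", "apply", "verify"]


@dataclass
class Incident:
    """Hypothetical incident record passed from stage to stage."""
    service: str
    history: List[str] = field(default_factory=list)
    healthy: bool = False


Handler = Callable[[Incident], None]


def run_lifecycle(incident: Incident,
                  handlers: Dict[str, Handler],
                  max_verify_attempts: int = 3) -> Incident:
    """Run observe..apply once, then retry verify until the incident is healthy."""
    for stage in STAGES[:-1]:
        handlers[stage](incident)
        incident.history.append(stage)
    for attempt in range(1, max_verify_attempts + 1):
        handlers["verify"](incident)
        incident.history.append(f"verify#{attempt}")
        if incident.healthy:
            break
    return incident


def noop(incident: Incident) -> None:
    """Placeholder handler standing in for a real stage implementation."""


def make_flaky_verify(succeed_on: int) -> Handler:
    """Build a verify handler that reports healthy on the given attempt."""
    calls = {"n": 0}

    def verify(incident: Incident) -> None:
        calls["n"] += 1
        incident.healthy = calls["n"] >= succeed_on

    return verify


if __name__ == "__main__":
    handlers = {stage: noop for stage in STAGES}
    handlers["verify"] = make_flaky_verify(succeed_on=2)
    result = run_lifecycle(Incident("webui-backend"), handlers)
    print(result.history)
```

Keeping the retry loop confined to Verify means a failed health check does not re-apply a remediation; whether the real orchestrator behaves this way is an assumption of the sketch.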
| Feature | Commercial AIOps | Scorching |
|---|---|---|
| Cost | $50k–$500k/year | Free (open-source) |
| eBPF telemetry | Partial | Full (Tetragon) |
| Causal graph RCA | Proprietary | Neo4j (open) |
| LLM agent | Limited | LangGraph + local LLM |
| Self-hosted | Limited | Yes (kind/K8s) |
- Docker (8GB+ RAM, 4+ CPU cores)
- Linux kernel ≥ 5.8 (for eBPF/Tetragon)
- kubectl, helm, kind (auto-installed if missing)
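Before deploying, the prerequisites above can be checked with a short preflight script. This is a hedged sketch, not part of the repository: the function names are hypothetical, it assumes it runs on the Linux host (where `platform.release()` returns the kernel release), and it does not check RAM or CPU. Since kubectl, helm, and kind are auto-installed when missing, a missing tool other than Docker is informational only.

```python
import platform
import re
import shutil
from typing import Callable, List, Optional, Sequence, Tuple

MIN_KERNEL = (5, 8)  # minimum kernel version stated in the prerequisites
REQUIRED_TOOLS = ["docker", "kubectl", "helm", "kind"]


def parse_kernel(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_ok(release: str, minimum: Tuple[int, int] = MIN_KERNEL) -> bool:
    """Return True when the kernel is at least the minimum version."""
    return parse_kernel(release) >= minimum


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS,
                  which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the tools that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    release = platform.release()
    status = "ok" if kernel_ok(release) else "too old for eBPF/Tetragon"
    print(f"kernel {release}: {status}")
    print("missing tools:", missing_tools() or "none")
```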
git clone https://gitverse.ru/necrustulum/scorching-aiops.git
cd scorching-aiops
./deploy-all.sh
One command deploys: kind cluster → Kafka → ArgoCD → ClickHouse → Neo4j → Prometheus → Grafana → Tetragon → 12 microservices → WebUI → LLM model. The full deployment takes about 15–30 minutes.
./apps/webui/aiops.sh portforward
| Service | URL | Credentials |
|---|---|---|
| WebUI | http://localhost:9090 | — |
| Ingress | http://localhost | — |
| ArgoCD | https://localhost:8080 | admin / (see output) |
| Grafana | http://localhost:3000 | admin / aiops-admin |
| Neo4j | http://localhost:7474 | neo4j / neo4j-aiops-password |
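Once port forwarding is running, a quick way to confirm that the endpoints in the table are listening is a TCP connect check. The sketch below is an illustration under assumptions, not a script shipped with the project: the helper names are hypothetical, and a successful TCP connection only shows that the port is open, not that the service behind it is healthy.

```python
import socket
from typing import Callable, Dict, Tuple
from urllib.parse import urlparse

# URLs copied from the service table in this README.
ENDPOINTS = {
    "WebUI": "http://localhost:9090",
    "Ingress": "http://localhost",
    "ArgoCD": "https://localhost:8080",
    "Grafana": "http://localhost:3000",
    "Neo4j": "http://localhost:7474",
}


def host_port(url: str) -> Tuple[str, int]:
    """Return (host, port), falling back to the scheme's default port."""
    parsed = urlparse(url)
    default = 443 if parsed.scheme == "https" else 80
    return parsed.hostname, parsed.port or default


def is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_all(endpoints: Dict[str, str] = ENDPOINTS,
              probe: Callable[[str, int], bool] = is_open) -> Dict[str, bool]:
    """Probe every endpoint and return a name -> reachable mapping."""
    return {name: probe(*host_port(url)) for name, url in endpoints.items()}


if __name__ == "__main__":
    for name, reachable in check_all().items():
        print(f"{name:8} {'reachable' if reachable else 'unreachable'}")
```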
./deploy-all.sh --fresh
WebUI (Next.js) → Backend (FastAPI) → Neo4j + ClickHouse + Kafka
↓
aiops-orchestrator (LangGraph: Observe→Analyze→Plan→Apply→Verify)
├── causal-ai-correlator ├── remediation-controller (Go)
├── llm-router → Ollama ├── governance-agent
├── security-agent ├── business-impact-predictor
├── neuro-symbolic-reasoner ├── alert-suppression-service
├── model-maintenance └── Tetragon (eBPF)
| Service | Lang | Role |
|---|---|---|
| aiops-orchestrator | Python | LangGraph agent: full OODA-style loop (Observe→Analyze→Plan→Apply→Verify) |
| causal-ai-correlator | Python | Builds RCA graphs in Neo4j |
| remediation-controller | Go | Executes kubectl (scale/restart/rollback) |
| llm-router | Python | Routes to Ollama/vLLM/API |
| governance-agent | Python | Policy compliance checks |
| security-agent | Python | Threat detection |
| business-impact-predictor | Python | Revenue impact estimation |
| neuro-symbolic-reasoner | Python | Hybrid RCA |
| alert-suppression-service | Python | Dynamic noise reduction |
| model-maintenance-service | Python | Drift detection |
| webui-backend | Python | FastAPI backend (47+ endpoints) |
| webui-frontend | TypeScript | Next.js dashboard + AI chat |
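To make the remediation-controller row concrete, the three actions it lists (scale, restart, rollback) correspond to standard kubectl subcommands. The sketch below shows that mapping in Python for illustration only: the real controller is written in Go and may talk to the Kubernetes API directly rather than shelling out, and `kubectl_args` along with the example deployment name are hypothetical. Canary rollouts via Argo Rollouts are not covered here.

```python
from typing import List, Optional


def kubectl_args(action: str, deployment: str, namespace: str,
                 replicas: Optional[int] = None) -> List[str]:
    """Build the kubectl argument list for one remediation action."""
    target = f"deployment/{deployment}"
    if action == "scale":
        if replicas is None or replicas < 0:
            raise ValueError("scale requires a non-negative replica count")
        return ["kubectl", "scale", target, f"--replicas={replicas}", "-n", namespace]
    if action == "restart":
        # Triggers a rolling restart of the deployment's pods.
        return ["kubectl", "rollout", "restart", target, "-n", namespace]
    if action == "rollback":
        # Reverts the deployment to its previous revision.
        return ["kubectl", "rollout", "undo", target, "-n", namespace]
    raise ValueError(f"unsupported action: {action!r}")


if __name__ == "__main__":
    # The deployment name is assumed; the namespace appears in this README.
    for action, replicas in [("scale", 3), ("restart", None), ("rollback", None)]:
        print(" ".join(kubectl_args(action, "webui-backend", "platform-webui", replicas)))
```

Returning an argument list rather than a shell string lets a caller pass it to `subprocess.run` without shell quoting, and makes it easy to log or review the command before an autonomous action is applied.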
- Infrastructure: Kubernetes (kind) · Kafka (KRaft) · ArgoCD · Argo Rollouts · cert-manager
- Data: ClickHouse · Neo4j · OpenTelemetry Collector
- AI/ML: LangGraph · Ollama (qwen3.5:2b) · Neuro-symbolic reasoning
- Observability: Prometheus · Grafana · Tetragon (eBPF)
- Security: Tetragon TracingPolicy · NetworkPolicy · RBAC · Governance Agent
# Full platform verification
./verify-all.sh
# AI chat test
curl -s http://localhost/api/devinfra/chat \
-X POST -H "Content-Type: application/json" \
-d '{"message":"cluster status","namespace":"all","thinking":false}'
# Chaos test
kubectl delete pod -n platform-webui -l app=webui-backend
Scorching is a self-hosted AIOps platform for Kubernetes. It covers the full incident management cycle: Observe → Analyze → Plan → Apply → Verify. The stack is eBPF (Tetragon) + Neo4j (causal graph) + LangGraph (LLM agent) + Ollama (qwen3.5:2b).
git clone https://gitverse.ru/necrustulum/scorching-aiops.git
cd scorching-aiops
./deploy-all.sh
One command deploys: a kind cluster, Kafka, ArgoCD, ClickHouse, Neo4j, Prometheus, Grafana, Tetragon, 12 microservices, the WebUI, and the LLM model.
- Full AIOps cycle: Observe → Analyze → Plan → Apply → Verify
- eBPF: Tetragon DaemonSet for agentless collection of kernel events
- Causal graph: Neo4j for cause-and-effect relationships
- LLM agent: LangGraph orchestrator with a local model
- AI chat: DevInfra agent with model selection and a reasoning mode
- GitOps: ArgoCD with auto-sync and self-healing
- 12 microservices: each handles a specific AIOps subtask
See CONTRIBUTING.md
See SECURITY.md