Scorching AIOps

Autonomous AI-Powered Kubernetes Incident Management Platform

Open-source AIOps: eBPF telemetry + causal graphs + LLM agents = automated incident lifecycle

English · Русский · Quick Start · Architecture

Overview

Scorching automates the full incident lifecycle in Kubernetes: Observe → Analyze → Plan → Apply → Verify. It combines eBPF telemetry (Tetragon), causal graph analysis (Neo4j), and LLM decision-making (LangGraph + Ollama).
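
To make the five-stage loop concrete, it can be pictured as a chain of functions that each take and return an incident record. The sketch below is plain Python and is not the project's LangGraph orchestrator. The `Incident` fields, the stage bodies, and the "restart" action are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List

STAGES = ["observe", "analyze", "plan", "apply", "verify"]


@dataclass
class Incident:
    """Hypothetical incident record handed from stage to stage."""
    service: str
    signals: List[str] = field(default_factory=list)
    root_cause: str = ""
    action: str = ""
    applied: bool = False
    resolved: bool = False
    history: List[str] = field(default_factory=list)


def observe(inc: Incident) -> Incident:
    # Stand-in for telemetry collection (assumed signal name).
    inc.signals.append("pod_restart_count_high")
    return inc


def analyze(inc: Incident) -> Incident:
    # Toy rule in place of causal-graph analysis.
    hit = "pod_restart_count_high" in inc.signals
    inc.root_cause = "crash_loop" if hit else "unknown"
    return inc


def plan(inc: Incident) -> Incident:
    # Toy rule in place of LLM-generated remediation.
    inc.action = "restart" if inc.root_cause == "crash_loop" else "escalate"
    return inc


def apply(inc: Incident) -> Incident:
    # A real system would call the cluster here; this only records intent.
    inc.applied = inc.action != "escalate"
    return inc


def verify(inc: Incident) -> Incident:
    # Stand-in for a post-remediation health check.
    inc.resolved = inc.applied
    return inc


def run_lifecycle(inc: Incident) -> Incident:
    """Run the five stages in order and record which ones executed."""
    for name, stage in zip(STAGES, (observe, analyze, plan, apply, verify)):
        inc = stage(inc)
        inc.history.append(name)
    return inc


result = run_lifecycle(Incident(service="webui-backend"))
print(result.history, result.action, result.resolved)
```

Keeping every stage as a function with the same signature means stages can be swapped or unit-tested in isolation, which is the main point of the illustration.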

Features

Observe — eBPF kernel telemetry (Tetragon) + OpenTelemetry + Prometheus
Analyze — Causal RCA with Neo4j, neuro-symbolic reasoning, business impact prediction
Plan — LLM-generated remediation via LangGraph orchestrator
Apply — Automated K8s remediation (scale, restart, rollback, canary via Argo Rollouts)
Verify — Health checks with retry, alert suppression, model drift detection
AI Chat — DevInfra agent with model selection (qwen3.5, deepseek-r1, llama3.2)
Dashboard — Incidents, metrics, forecasts, audit trail
Governance — GDPR/SOC2 compliance before autonomous actions
Security — eBPF threat detection, NetworkPolicy, RBAC
GitOps — ArgoCD ApplicationSet with auto-sync
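
The Verify feature mentions health checks with retry. One common way to implement that pattern is a probe loop with exponential backoff, sketched below. The backoff policy, the parameter names, and the `check_with_retry` helper are assumptions for illustration; the README does not say how Scorching schedules its retries.

```python
import time
from typing import Callable, List


def check_with_retry(probe: Callable[[], bool],
                     attempts: int = 5,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe() up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:  # no point sleeping after the last try
            sleep(base_delay * 2 ** attempt)
    return False


# Usage with a fake probe that fails twice and then succeeds.
# Passing delays.append as `sleep` records the waits instead of sleeping.
outcomes = iter([False, False, True])
delays: List[float] = []
ok = check_with_retry(lambda: next(outcomes), attempts=5,
                      base_delay=0.5, sleep=delays.append)
print(ok, delays)  # True [0.5, 1.0]
```

Injecting the `sleep` function keeps the helper testable without real delays.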

Comparison

Feature	Commercial AIOps	Scorching
Cost	$50k–$500k/year	Free (open-source)
eBPF telemetry	Partial	Full (Tetragon)
Causal graph RCA	Proprietary	Neo4j (open)
LLM agent	Limited	LangGraph + local LLM
Self-hosted	Limited	Yes (kind/K8s)

Quick Start

Prerequisites

Docker (8GB+ RAM, 4+ CPU cores)
Linux kernel ≥ 5.8 (for eBPF/Tetragon)
kubectl, helm, kind (auto-installed if missing)
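
Before running the deploy script, the prerequisites above can be checked with a few lines of Python. This is a minimal sketch using only the standard library. The helper names are hypothetical, and it does not check RAM or CPU limits configured for Docker.

```python
import platform
import re
import shutil
from typing import List, Sequence, Tuple

MIN_KERNEL = (5, 8)  # minimum Linux kernel stated in the prerequisites
TOOLS = ("docker", "kubectl", "helm", "kind")


def parse_kernel(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_ok(release: str) -> bool:
    """True if the kernel release is at least MIN_KERNEL."""
    return parse_kernel(release) >= MIN_KERNEL


def missing_tools(tools: Sequence[str] = TOOLS) -> List[str]:
    """Return the tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    if platform.system() == "Linux":
        release = platform.release()
        status = "ok" if kernel_ok(release) else "too old for eBPF/Tetragon"
        print(f"kernel {release}: {status}")
    else:
        print("not running on Linux; kernel check skipped")
    print("missing tools:", missing_tools() or "none")
```

Since the README says kubectl, helm, and kind are auto-installed if missing, a missing entry for those three is informational rather than a blocker.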

Deploy

git clone https://gitverse.ru/necrustulum/scorching-aiops.git
cd scorching-aiops
./deploy-all.sh

This single command deploys the whole stack in order: kind cluster → Kafka → ArgoCD → ClickHouse → Neo4j → Prometheus → Grafana → Tetragon → 12 microservices → WebUI → LLM model. Expect it to take roughly 15–30 minutes.

Access

./apps/webui/aiops.sh portforward

Service	URL	Credentials
WebUI	http://localhost:9090	—
Ingress	http://localhost	—
ArgoCD	https://localhost:8080	admin / (see output)
Grafana	http://localhost:3000	admin / aiops-admin
Neo4j	http://localhost:7474	neo4j / neo4j-aiops-password
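
Once port forwarding is running, a quick way to see which of the endpoints above are reachable is a TCP connect test. The sketch below uses the URLs from the table. The helper names are hypothetical, and a successful connect only shows that something is listening on the port, not that the service behind it is healthy.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit

# URLs copied from the access table.
ENDPOINTS = {
    "WebUI": "http://localhost:9090",
    "Ingress": "http://localhost",
    "ArgoCD": "https://localhost:8080",
    "Grafana": "http://localhost:3000",
    "Neo4j": "http://localhost:7474",
}


def host_port(url: str) -> Tuple[str, int]:
    """Split a URL into (host, port), falling back to the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        host, port = host_port(url)
        state = "up" if is_listening(host, port) else "down"
        print(f"{name:8} {url:26} {state}")
```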

Fresh Reinstall

./deploy-all.sh --fresh

Architecture

WebUI (Next.js) → Backend (FastAPI) → Neo4j + ClickHouse + Kafka
                                              ↓
    aiops-orchestrator (LangGraph: Observe→Analyze→Plan→Apply→Verify)
         ├── causal-ai-correlator     ├── remediation-controller (Go)
         ├── llm-router → Ollama      ├── governance-agent
         ├── security-agent           ├── business-impact-predictor
         ├── neuro-symbolic-reasoner  ├── alert-suppression-service
         ├── model-maintenance        └── Tetragon (eBPF)

12 Microservices

Service	Lang	Role
aiops-orchestrator	Python	LangGraph agent: runs the full Observe → Analyze → Plan → Apply → Verify loop
causal-ai-correlator	Python	Builds RCA graphs in Neo4j
remediation-controller	Go	Executes kubectl (scale/restart/rollback)
llm-router	Python	Routes to Ollama/vLLM/API
governance-agent	Python	Policy compliance checks
security-agent	Python	Threat detection
business-impact-predictor	Python	Revenue impact estimation
neuro-symbolic-reasoner	Python	Hybrid RCA
alert-suppression-service	Python	Dynamic noise reduction
model-maintenance-service	Python	Drift detection
webui-backend	Python	FastAPI backend (47+ endpoints)
webui-frontend	TypeScript	Next.js dashboard + AI chat
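
As a rough illustration of what graph-based root cause analysis means, the sketch below walks a small in-memory dependency graph and keeps only the alerting services that have no alerting dependency of their own. This heuristic is an assumption chosen for the example. It is not the algorithm used by causal-ai-correlator, and it uses a plain dictionary instead of Neo4j. The edges mirror the first line of the architecture diagram (WebUI → Backend → Neo4j + ClickHouse + Kafka).

```python
from typing import Dict, List, Set

# service -> services it depends on (illustrative subset of the platform)
DEPENDS_ON: Dict[str, List[str]] = {
    "webui-frontend": ["webui-backend"],
    "webui-backend": ["neo4j", "clickhouse", "kafka"],
    "neo4j": [],
    "clickhouse": [],
    "kafka": [],
}


def root_cause_candidates(depends_on: Dict[str, List[str]],
                          alerting: Set[str]) -> List[str]:
    """Return alerting services none of whose direct dependencies are alerting.

    The idea: if a service and one of its dependencies both alert, the
    dependency is the more likely origin, so the dependent is filtered out.
    """
    candidates = []
    for service in sorted(alerting):
        deps = depends_on.get(service, [])
        if not any(dep in alerting for dep in deps):
            candidates.append(service)
    return candidates


alerting_now = {"webui-frontend", "webui-backend", "neo4j"}
print(root_cause_candidates(DEPENDS_ON, alerting_now))  # ['neo4j']
```

In a graph database the same question becomes a query over dependency edges, which is where a store like Neo4j helps once the graph grows past a handful of nodes.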

Tech Stack

Infrastructure: Kubernetes (kind) · Kafka (KRaft) · ArgoCD · Argo Rollouts · cert-manager
Data: ClickHouse · Neo4j · OpenTelemetry Collector
AI/ML: LangGraph · Ollama (qwen3.5:2b) · Neuro-symbolic reasoning
Observability: Prometheus · Grafana · Tetragon (eBPF)
Security: Tetragon TracingPolicy · NetworkPolicy · RBAC · Governance Agent

Testing

# Full platform verification
./verify-all.sh

# AI chat test
curl -s http://localhost/api/devinfra/chat \
  -X POST -H "Content-Type: application/json" \
  -d '{"message":"cluster status","namespace":"all","thinking":false}'

# Chaos test
kubectl delete pod -n platform-webui -l app=webui-backend
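
The AI chat test can also be scripted from Python. The sketch below rebuilds the same POST request as the curl command using only the standard library. The URL and JSON fields are taken from that command; the `build_chat_request` helper is a hypothetical name, and the response is printed as raw text because its schema is not described here.

```python
import json
import urllib.error
import urllib.request

CHAT_URL = "http://localhost/api/devinfra/chat"  # same endpoint as the curl test


def build_chat_request(message: str, namespace: str = "all",
                       thinking: bool = False,
                       url: str = CHAT_URL) -> urllib.request.Request:
    """Build the POST request with the same JSON body as the curl example."""
    payload = {"message": message, "namespace": namespace, "thinking": thinking}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request("cluster status")
    try:
        # Local models can be slow to answer, hence the generous timeout.
        with urllib.request.urlopen(request, timeout=60) as response:
            print(response.status, response.read().decode("utf-8", errors="replace"))
    except urllib.error.URLError as exc:
        print(f"chat endpoint not reachable: {exc}")
```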

Overview (translated from Russian)

Scorching is a self-hosted AIOps platform for Kubernetes. It covers the full incident management cycle: Observe → Analyze → Plan → Apply → Verify. It combines eBPF (Tetragon), Neo4j (causal graph), LangGraph (LLM agent), and Ollama (qwen3.5:2b).

Quick Start (translated from Russian)

git clone https://gitverse.ru/necrustulum/scorching-aiops.git
cd scorching-aiops
./deploy-all.sh

One command deploys a kind cluster, Kafka, ArgoCD, ClickHouse, Neo4j, Prometheus, Grafana, Tetragon, 12 microservices, the WebUI, and the LLM model.

Features (translated from Russian)

Full AIOps cycle: Observe → Analyze → Plan → Apply → Verify
eBPF: Tetragon DaemonSet for agentless collection of kernel events
Causal graph: Neo4j for cause-and-effect relationships
LLM agent: LangGraph orchestrator with a local model
AI chat: DevInfra agent with model selection and a reasoning mode
GitOps: ArgoCD with auto-sync and self-healing
12 microservices: each one handles a specific AIOps subtask

Contributing

See CONTRIBUTING.md

Security

See SECURITY.md

License

Apache License 2.0

Built with 🔥 by necrustulum
