Scorching AIOps

Autonomous AI-Powered Kubernetes Incident Management Platform

Open-source AIOps: eBPF telemetry + causal graphs + LLM agents = automated incident lifecycle

English · Русский · Quick Start · Architecture


Overview

Scorching automates the full incident lifecycle in Kubernetes: Observe → Analyze → Plan → Apply → Verify. It combines eBPF telemetry (Tetragon), causal graph analysis (Neo4j), and LLM decision-making (LangGraph + Ollama).
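To make the five-stage lifecycle concrete, here is a minimal sketch in plain Python. It does not use the LangGraph API or any project code: the `Incident` class, the `run_lifecycle` function, and the toy handlers are all hypothetical, and only the stage names come from this README.

```python
from dataclasses import dataclass, field
from typing import Callable

# Stage names come from the README; everything else here is hypothetical.
STAGES = ["observe", "analyze", "plan", "apply", "verify"]


@dataclass
class Incident:
    """Hypothetical state object passed from stage to stage."""
    name: str
    data: dict = field(default_factory=dict)
    history: list = field(default_factory=list)


def run_lifecycle(incident: Incident,
                  handlers: dict[str, Callable[[Incident], None]]) -> Incident:
    """Run the five stages in order, recording each completed stage."""
    for stage in STAGES:
        handlers[stage](incident)
        incident.history.append(stage)
    return incident


# Toy handlers standing in for real telemetry, RCA, LLM, and kubectl calls.
handlers = {
    "observe": lambda i: i.data.update(restarts=7),
    "analyze": lambda i: i.data.update(
        cause="crash loop" if i.data["restarts"] > 5 else "unknown"),
    "plan": lambda i: i.data.update(action="rollback"),
    "apply": lambda i: i.data.update(applied=True),
    "verify": lambda i: i.data.update(healthy=i.data["applied"]),
}

result = run_lifecycle(Incident("checkout-5xx"), handlers)
print(result.history)
```

Each stage reads what earlier stages wrote into the shared state, which is the same shape of data flow a graph-based orchestrator would express with nodes and edges.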

Features

  • Observe — eBPF kernel telemetry (Tetragon) + OpenTelemetry + Prometheus
  • Analyze — Causal RCA with Neo4j, neuro-symbolic reasoning, business impact prediction
  • Plan — LLM-generated remediation via LangGraph orchestrator
  • Apply — Automated K8s remediation (scale, restart, rollback, canary via Argo Rollouts)
  • Verify — Health checks with retry, alert suppression, model drift detection
  • AI Chat — DevInfra agent with model selection (qwen3.5, deepseek-r1, llama3.2)
  • Dashboard — Incidents, metrics, forecasts, audit trail
  • Governance — GDPR/SOC2 compliance before autonomous actions
  • Security — eBPF threat detection, NetworkPolicy, RBAC
  • GitOps — ArgoCD ApplicationSet with auto-sync
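The Verify feature mentions health checks with retry. A minimal sketch of that idea, assuming exponential backoff between attempts, might look like the following. The function name, parameters, and backoff policy are assumptions made for illustration and are not taken from the project.

```python
import time
from typing import Callable


def verify_with_retry(check: Callable[[], bool], attempts: int = 5,
                      delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True as soon as check() passes, or False after all attempts fail.

    The delay doubles after each failed attempt (exponential backoff).
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:  # no point sleeping after the final attempt
            sleep(delay)
            delay *= 2
    return False


# Usage with a fake check that passes on the third call.
calls = {"n": 0}


def fake_check() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


waits = []  # records requested sleeps instead of actually sleeping
ok = verify_with_retry(fake_check, attempts=5, delay=0.5, sleep=waits.append)
print(ok, waits)  # True [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; a real check would replace `fake_check` with a probe of the remediated workload.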

Comparison

| Feature | Commercial AIOps | Scorching |
|---|---|---|
| Cost | $50k–$500k/year | Free (open-source) |
| eBPF telemetry | Partial | Full (Tetragon) |
| Causal graph RCA | Proprietary | Neo4j (open) |
| LLM agent | Limited | LangGraph + local LLM |
| Self-hosted | Limited | Yes (kind/K8s) |

Quick Start

Prerequisites

  • Docker (8GB+ RAM, 4+ CPU cores)
  • Linux kernel ≥ 5.8 (for eBPF/Tetragon)
  • kubectl, helm, kind (auto-installed if missing)
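Before running the deploy script, it can help to check the prerequisites above. The following is a rough pre-flight sketch, not part of the repository: the helper names are hypothetical, and only the kernel threshold and tool list are taken from this README. It does not check Docker's RAM or CPU allocation.

```python
import platform
import re
import shutil

MIN_KERNEL = (5, 8)  # from the prerequisites above
TOOLS = ["docker", "kubectl", "helm", "kind"]


def parse_kernel(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_ok(release: str) -> bool:
    """True if the kernel release meets the minimum version."""
    return parse_kernel(release) >= MIN_KERNEL


def missing_tools(tools=TOOLS) -> list[str]:
    """Return the tools that are not on PATH."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    print("kernel ok:", kernel_ok(platform.release()))
    print("missing tools:", missing_tools())
```

A missing kubectl, helm, or kind is not fatal, since the README states these are auto-installed if missing.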

Deploy

git clone https://gitverse.ru/necrustulum/scorching-aiops.git
cd scorching-aiops
./deploy-all.sh

This single command deploys, in order: kind cluster → Kafka → ArgoCD → ClickHouse → Neo4j → Prometheus → Grafana → Tetragon → 12 microservices → WebUI → LLM model. Expect roughly 15–30 minutes.

Access

./apps/webui/aiops.sh portforward

| Service | URL | Credentials |
|---|---|---|
| WebUI | http://localhost:9090 | |
| Ingress | http://localhost | |
| ArgoCD | https://localhost:8080 | admin / (see output) |
| Grafana | http://localhost:3000 | admin / aiops-admin |
| Neo4j | http://localhost:7474 | neo4j / neo4j-aiops-password |
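After port-forwarding, a quick way to see which endpoints are up is to test whether each port accepts a TCP connection. The sketch below is illustrative and not part of the repository: the URLs are copied from the table above, while the helper names are hypothetical. It checks only that something is listening, not that the service behind it is healthy.

```python
import socket
from urllib.parse import urlsplit

# URLs copied from the access table above.
ENDPOINTS = {
    "WebUI": "http://localhost:9090",
    "Ingress": "http://localhost",
    "ArgoCD": "https://localhost:8080",
    "Grafana": "http://localhost:3000",
    "Neo4j": "http://localhost:7474",
}


def host_port(url: str) -> tuple[str, int]:
    """Split a URL into (host, port), falling back to the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_listening(url: str, timeout: float = 1.0) -> bool:
    """True if a TCP connection to the URL's host and port succeeds."""
    try:
        with socket.create_connection(host_port(url), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        print(f"{name:8} {url:24} {'up' if is_listening(url) else 'down'}")
```

A plain TCP probe sidesteps TLS certificate validation, which would otherwise complicate the check for the HTTPS ArgoCD endpoint if it uses a self-signed certificate.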

Fresh Reinstall

./deploy-all.sh --fresh

Architecture

WebUI (Next.js) → Backend (FastAPI) → Neo4j + ClickHouse + Kafka
                                              ↓
    aiops-orchestrator (LangGraph: Observe→Analyze→Plan→Apply→Verify)
         ├── causal-ai-correlator     ├── remediation-controller (Go)
         ├── llm-router → Ollama      ├── governance-agent
         ├── security-agent           ├── business-impact-predictor
         ├── neuro-symbolic-reasoner  ├── alert-suppression-service
         ├── model-maintenance        └── Tetragon (eBPF)
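The causal-ai-correlator in the diagram above performs root cause analysis over a graph stored in Neo4j. The project's actual algorithm and queries are not shown in this README, so the following is only a simplified illustration of the general idea, using an in-memory dictionary instead of Neo4j. The dependency edges mirror the top line of the diagram; the function name and the "unhealthy with no unhealthy dependency" rule are assumptions.

```python
# Hypothetical dependency graph: each service maps to the services it calls.
DEPENDS_ON = {
    "webui-frontend": ["webui-backend"],
    "webui-backend": ["neo4j", "clickhouse", "kafka"],
    "neo4j": [],
    "clickhouse": [],
    "kafka": [],
}


def root_cause_candidates(depends_on: dict[str, list[str]],
                          unhealthy: set[str]) -> set[str]:
    """Return unhealthy services whose own dependencies are all healthy.

    An unhealthy service with an unhealthy dependency is treated as a
    downstream symptom rather than a cause.
    """
    return {
        svc for svc in unhealthy
        if not any(dep in unhealthy for dep in depends_on.get(svc, []))
    }


alerts = {"webui-frontend", "webui-backend", "clickhouse"}
print(root_cause_candidates(DEPENDS_ON, alerts))  # {'clickhouse'}
```

Here three services are alerting, but only ClickHouse has no failing dependency of its own, so it is the single candidate and the other two alerts are treated as symptoms.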

12 Microservices

| Service | Lang | Role |
|---|---|---|
| aiops-orchestrator | Python | LangGraph agent: full OODA loop |
| causal-ai-correlator | Python | Builds RCA graphs in Neo4j |
| remediation-controller | Go | Executes kubectl (scale/restart/rollback) |
| llm-router | Python | Routes to Ollama/vLLM/API |
| governance-agent | Python | Policy compliance checks |
| security-agent | Python | Threat detection |
| business-impact-predictor | Python | Revenue impact estimation |
| neuro-symbolic-reasoner | Python | Hybrid RCA |
| alert-suppression-service | Python | Dynamic noise reduction |
| model-maintenance-service | Python | Drift detection |
| webui-backend | Python | FastAPI backend (47+ endpoints) |
| webui-frontend | TypeScript | Next.js dashboard + AI chat |
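As one example of what a service in this table might do, the sketch below shows a simple form of alert noise reduction: dropping repeats of the same alert within a time window. This is an assumption about the general technique, not the alert-suppression-service's actual logic, which this README describes only as "dynamic noise reduction". The `Alert` shape, the fixed 300-second window, and the sample alert names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert shape; the real service's schema is not shown here."""
    name: str
    target: str
    timestamp: float  # seconds


def suppress_duplicates(alerts: list[Alert], window: float = 300.0) -> list[Alert]:
    """Keep an alert only if no alert with the same (name, target) was kept
    within the preceding `window` seconds."""
    last_kept: dict[tuple[str, str], float] = {}
    kept = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.name, alert.target)
        previous = last_kept.get(key)
        if previous is None or alert.timestamp - previous >= window:
            kept.append(alert)
            last_kept[key] = alert.timestamp
    return kept


stream = [
    Alert("HighLatency", "webui-backend", 0),
    Alert("HighLatency", "webui-backend", 60),   # suppressed: same key, 60 s later
    Alert("PodCrashLoop", "llm-router", 90),     # kept: different key
    Alert("HighLatency", "webui-backend", 400),  # kept: window has elapsed
]
for a in suppress_duplicates(stream):
    print(a.timestamp, a.name, a.target)
```

A dynamic variant would adjust `window` per alert type based on observed noise, which is beyond this sketch.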

Tech Stack

  • Infrastructure: Kubernetes (kind) · Kafka (KRaft) · ArgoCD · Argo Rollouts · cert-manager
  • Data: ClickHouse · Neo4j · OpenTelemetry Collector
  • AI/ML: LangGraph · Ollama (qwen3.5:2b) · Neuro-symbolic reasoning
  • Observability: Prometheus · Grafana · Tetragon (eBPF)
  • Security: Tetragon TracingPolicy · NetworkPolicy · RBAC · Governance Agent


Testing

# Full platform verification
./verify-all.sh

# AI chat test
curl -s http://localhost/api/devinfra/chat \
  -X POST -H "Content-Type: application/json" \
  -d '{"message":"cluster status","namespace":"all","thinking":false}'

# Chaos test
kubectl delete pod -n platform-webui -l app=webui-backend
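
The AI chat test above can also be scripted. The following sketch builds the same POST request with Python's standard library. The endpoint path and the three JSON fields are taken from the curl example; the helper names are hypothetical, and because the response format is not documented in this README, the response is returned as raw text.

```python
import json
import urllib.request


def build_chat_request(message: str, namespace: str = "all",
                       thinking: bool = False,
                       base_url: str = "http://localhost") -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    payload = {"message": message, "namespace": namespace, "thinking": thinking}
    return urllib.request.Request(
        url=f"{base_url}/api/devinfra/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_chat_request("cluster status")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# print(send(request))  # uncomment once the platform is running
```

Separating request construction from sending makes it possible to inspect the payload without a running cluster.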

Overview

Scorching is a self-hosted AIOps platform for Kubernetes. It covers the full incident management cycle: Observe → Analyze → Plan → Apply → Verify. It is built on eBPF (Tetragon), Neo4j (causal graph), LangGraph (LLM agent), and Ollama (qwen3.5:2b).

Quick Start

git clone https://gitverse.ru/necrustulum/scorching-aiops.git
cd scorching-aiops
./deploy-all.sh

One command deploys a kind cluster, Kafka, ArgoCD, ClickHouse, Neo4j, Prometheus, Grafana, Tetragon, 12 microservices, the WebUI, and the LLM model.

Features

  • Full AIOps cycle: Observe → Analyze → Plan → Apply → Verify
  • eBPF: Tetragon DaemonSet for agentless collection of kernel events
  • Causal graph: Neo4j for cause-and-effect relationships
  • LLM agent: LangGraph orchestrator with a local model
  • AI chat: DevInfra agent with model selection and a reasoning mode
  • GitOps: ArgoCD with auto-sync and self-healing
  • 12 microservices: each handles a specific AIOps subtask

Contributing

See CONTRIBUTING.md

Security

See SECURITY.md

License

Apache License 2.0


Built with 🔥 by necrustulum
