AI-Powered Kubernetes Infrastructure Management Platform
Built on Agno · Pluggable MCP + Skills Architecture · K3s Native
Agentic Infra turns natural language into infrastructure operations. It uses LLM-powered agents to handle Kubernetes cluster deployment (Day 0/1) and intelligent operations (Day 2) — from bootstrapping a K3s cluster via SSH to diagnosing pod crashes through conversational AI.
- K3s Auto-Deploy — Provision lightweight Kubernetes clusters across bare-metal or VMs via SSH, no manual intervention
- Pluggable MCP Servers — 6 MCP servers (87+ tools) dynamically discovered and loaded from YAML config
- Pluggable Skills — Domain knowledge hot-loaded from filesystem, auto-matched to user queries
- Multi-Agent Teams — 5 specialized agents organized into 2 teams with routing and coordination
- GitOps Native — ArgoCD-driven deployments with human-approval gates
- Full Observability — VictoriaMetrics + kube-prometheus-stack + VictoriaLogs
┌─────────────────────────────────────────────────────────┐
│ Web Portal │
│ Next.js + AG-UI Protocol │
└──────────────────────┬──────────────────────────────────┘
│
┌──────────────────────▼──────────────────────────────────┐
│ AgentOS Runtime │
│ Agno Framework + FastAPI │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ OpsTeam │ │ InfraTeam │ │ Workflows │ │
│ │ (route) │ │ (coordinate) │ │ │ │
│ │ ┌─────────┐ │ │ ┌──────────┐ │ │ Bootstrap │ │
│ │ │Monitor │ │ │ │InfraDeploy│ │ │ ComponentDeploy│ │
│ │ │Logging │ │ │ │Middleware│ │ │ │ │
│ │ │Maintain │ │ │ │Monitor │ │ │ │ │
│ │ └─────────┘ │ │ └──────────┘ │ └───────────────┘ │
│ └─────────────┘ └──────────────┘ │
│ │ │ ┌──────────────┐ │
│ │ Skills │ │ PromptBuilder│ │
│ │ ┌──────────┐ │ └──────────────┘ │
│ │ │ k3s-ops │ │ │
│ │ │ k8s-diag │ │ │
│ │ │ monitor │ │ │
│ │ │ logging │ │ │
│ │ │ infra │ │ │
│ │ │ middleware│ │ │
│ │ └──────────┘ │ │
└─────────┼─────────────────┼─────────────────────────────┘
│ │
┌─────────▼─────────────────▼─────────────────────────────┐
│ MCP Servers │
│ │
│ ┌────────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ linux-mcp │ │ k8s-mcp │ │argocd-mcp│ │deploy-mcp│ │
│ │ SSH ops │ │apiserver │ │ GitOps │ │Helm/kubectl││
│ └────────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ ┌────────────┐ ┌──────────┐ │
│ │ gitops-mcp │ │aianalysis│ │
│ │ Git/manifests│ │Prometheus│ │
│ └────────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────┘
agentic-infra/
├── agent-os/ # AgentOS core service
│ ├── agents/ # 5 specialized agents
│ ├── teams/ # 2 team orchestrators (route + coordinate)
│ ├── workflows/ # Cluster bootstrap + component deploy
│ ├── tools/ # MCPRegistry, AgentFactory, PromptBuilder
│ ├── skills/ # 6 pluggable skill packs
│ │ ├── k3s-ops/ # K3s deploy & maintenance
│ │ ├── k8s-diagnostic/ # K8s troubleshooting
│ │ ├── monitoring-ops/ # VictoriaMetrics operations
│ │ ├── logging-ops/ # VictoriaLogs operations
│ │ ├── infra-deploy-guide/ # Infrastructure deployment guide
│ │ └── middleware-ops/ # Middleware operations
│ ├── prompts/ # Layered prompt builder
│ └── knowledge/ # Runbooks
├── mcp-servers/ # 6 MCP servers (87+ tools)
│ ├── linux-mcp/ # SSH remote ops (22 tools)
│ ├── k8s-mcp/ # K8s apiserver direct (19 tools)
│ ├── argocd-mcp/ # ArgoCD management (11 tools)
│ ├── deploy-mcp/ # Helm + kubectl (21 tools)
│ ├── gitops-mcp/ # Git + manifest gen (14 tools)
│ └── aianalysis-mcp/ # Observability queries
├── charts/ # Helm values templates
├── portal/ # Next.js web UI
└── deploy/ # Docker Compose + K8s manifests
- Python 3.12+
- Docker & Docker Compose
- At least one LLM API key (Anthropic or DeepSeek)
# Clone
git clone git@github.com:clcc2019/agentic-infra.git
cd agentic-infra
# Configure
cp .env.example .env
# Edit .env — set at least one LLM API key
# Launch full stack
cd deploy && docker-compose up -d
# Access
# Portal: http://localhost:3000
# AgentOS API: http://localhost:8000/docs

uv venv --python 3.12 && source .venv/bin/activate
uv pip install -e .
fastapi dev agent-os/app.py

| Agent | Model | Responsibility | MCP Tools |
|---|---|---|---|
| MonitorAgent | DeepSeek | Metrics queries, alert rules, dashboards | aianalysis-mcp |
| LoggingAgent | DeepSeek | Log search & analysis, collection config | aianalysis-mcp |
| MaintenanceAgent | Claude | Pod/node diagnostics, cluster health checks | aianalysis-mcp + k8s-mcp + linux-mcp |
| InfraDeployAgent | Claude | K3s deploy, infra bootstrap (ArgoCD/VM/VL) | linux-mcp + k8s-mcp + argocd-mcp + deploy-mcp + gitops-mcp |
| MiddlewareAgent | Claude | Middleware lifecycle (Redis/MySQL/Kafka) | argocd-mcp + deploy-mcp + gitops-mcp |
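
The agent-to-server mapping in the table above can be captured as plain data. The sketch below is illustrative only: `AGENTS` and `agents_using` are hypothetical names, not the project's code, and server names are written out in full with the `-mcp` suffix.

```python
# Hypothetical mapping transcribed from the agent table; not the project's actual code.
AGENTS = {
    "MonitorAgent": {"model": "DeepSeek", "servers": ["aianalysis-mcp"]},
    "LoggingAgent": {"model": "DeepSeek", "servers": ["aianalysis-mcp"]},
    "MaintenanceAgent": {
        "model": "Claude",
        "servers": ["aianalysis-mcp", "k8s-mcp", "linux-mcp"],
    },
    "InfraDeployAgent": {
        "model": "Claude",
        "servers": ["linux-mcp", "k8s-mcp", "argocd-mcp", "deploy-mcp", "gitops-mcp"],
    },
    "MiddlewareAgent": {
        "model": "Claude",
        "servers": ["argocd-mcp", "deploy-mcp", "gitops-mcp"],
    },
}


def agents_using(server: str) -> list[str]:
    """Return the agents that can reach the given MCP server, sorted by name."""
    return sorted(name for name, spec in AGENTS.items() if server in spec["servers"])


if __name__ == "__main__":
    # Lists every agent that is allowed to run SSH operations.
    print(agents_using("linux-mcp"))
```

A lookup like this makes it easy to see which agents are affected when a single MCP server is disabled or unavailable.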
| Server | Port | Tools | Description |
|---|---|---|---|
| linux-mcp | 8085 | 22 | SSH remote execution, system info, service management, network checks |
| k8s-mcp | 8086 | 19 | K8s apiserver direct — Pod/Node/Namespace/Resource CRUD, exec, logs |
| argocd-mcp | 8082 | 11 | ArgoCD app lifecycle, sync, rollback, repo management |
| deploy-mcp | 8083 | 21 | Helm install/upgrade/rollback, kubectl resource ops |
| gitops-mcp | 8084 | 14 | Git clone/commit/push, Helm values update, manifest gen |
| aianalysis-mcp | 8081 | — | Prometheus, VictoriaLogs, alerting, tracing |
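
As a rough sketch, the server table above could be mirrored in a small in-memory registry. `McpServer`, `endpoint`, and `known_tool_count` are hypothetical names used for illustration, and the `/mcp` URL path is an assumption borrowed from the custom-server example in this README rather than a confirmed route for every server.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class McpServer:
    name: str
    port: int
    tools: Optional[int]  # None where the table gives no tool count


# Values transcribed from the server table.
SERVERS = [
    McpServer("linux-mcp", 8085, 22),
    McpServer("k8s-mcp", 8086, 19),
    McpServer("argocd-mcp", 8082, 11),
    McpServer("deploy-mcp", 8083, 21),
    McpServer("gitops-mcp", 8084, 14),
    McpServer("aianalysis-mcp", 8081, None),
]


def endpoint(server: McpServer, host: str = "localhost") -> str:
    """Build the server URL; the /mcp path is an assumption, see the lead-in."""
    return f"http://{host}:{server.port}/mcp"


def known_tool_count(servers: list[McpServer]) -> int:
    """Sum the tool counts that the table actually states."""
    return sum(s.tools for s in servers if s.tools is not None)


if __name__ == "__main__":
    for s in SERVERS:
        print(s.name, endpoint(s))
    print("documented tools:", known_tool_count(SERVERS))
```

The five documented counts add up to 87, which is consistent with the "87+ tools" figure once the uncounted aianalysis-mcp tools are included.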
All MCP servers are declared in mcp_servers.yaml — add new servers without touching agent code:
servers:
  - name: my-custom-mcp
    url: http://localhost:9090/mcp
    description: My custom operations server
    enabled: true
    tags: [custom, ops]
    trigger_keywords: [custom, special]

Runtime management via Admin API:
# Register new MCP at runtime
curl -X POST http://localhost:8000/api/admin/mcp/register \
-H 'Content-Type: application/json' \
-d '{"name":"my-mcp","url":"http://localhost:9090/mcp"}'
# Check status
curl http://localhost:8000/api/admin/mcp/status

Skills are self-contained knowledge packs (Markdown + scripts + references) auto-discovered from filesystem:
skills/k3s-ops/
├── SKILL.md # Instructions + trigger keywords
├── scripts/
│ ├── install_k3s_server.sh
│ ├── install_k3s_agent.sh
│ ├── k3s_health_check.sh
│ └── k3s_backup.sh
└── references/
└── k3s_config_options.md
Progressive loading: Discovery (keyword match) → Activation (inject into prompt) → Execution (run scripts)
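
The first two stages of progressive loading can be sketched as follows. This is a minimal illustration, not the project's loader: the `Skill` class, `discover`, and `activate` are hypothetical, keywords are passed in directly instead of being parsed from `SKILL.md` (whose format is not shown here), and the Execution stage is left out.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    keywords: set[str]   # trigger keywords, assumed to be lowercase
    instructions: str    # body text that would come from SKILL.md


def discover(skills: list[Skill], query: str) -> list[Skill]:
    """Discovery: keep skills whose keywords overlap the words of the query."""
    words = set(query.lower().split())
    return [s for s in skills if s.keywords & words]


def activate(base_prompt: str, matched: list[Skill]) -> str:
    """Activation: append each matched skill's instructions to the prompt."""
    sections = [base_prompt]
    sections += [f"## Skill: {s.name}\n{s.instructions}" for s in matched]
    return "\n\n".join(sections)


if __name__ == "__main__":
    skills = [
        Skill("k3s-ops", {"k3s", "install"}, "Use the install scripts."),
        Skill("logging-ops", {"logs", "victorialogs"}, "Query VictoriaLogs."),
    ]
    matched = discover(skills, "Install K3s on three nodes")
    print(activate("You are an infrastructure agent.", matched))
```

Only matched skills are injected, so the prompt stays small when a query touches a single domain.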
Validate Cluster → Deploy ArgoCD → Deploy Monitoring → Deploy Logging → Global Verify
↑ ↑ ↑
Human Approval Human Approval Human Approval
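
A gated pipeline like the one above might be modeled as an ordered list of steps with an approval flag. The sketch assumes the three approval gates guard the three Deploy steps, which is an interpretation of the diagram; `STEPS`, `run_workflow`, and the `approve`/`execute` callbacks are hypothetical names rather than Agno workflow APIs.

```python
from typing import Callable

# (step name, requires human approval); gate placement is an assumption.
STEPS = [
    ("Validate Cluster", False),
    ("Deploy ArgoCD", True),
    ("Deploy Monitoring", True),
    ("Deploy Logging", True),
    ("Global Verify", False),
]


def run_workflow(
    steps: list[tuple[str, bool]],
    approve: Callable[[str], bool],
    execute: Callable[[str], None],
) -> list[str]:
    """Run steps in order and stop at the first gated step that is rejected."""
    completed = []
    for name, needs_approval in steps:
        if needs_approval and not approve(name):
            break  # a rejected gate halts everything downstream
        execute(name)
        completed.append(name)
    return completed


if __name__ == "__main__":
    done = run_workflow(
        STEPS,
        approve=lambda name: input(f"Approve '{name}'? [y/N] ").lower() == "y",
        execute=lambda name: print(f"running: {name}"),
    )
    print("completed:", done)
```

Stopping at the first rejected gate keeps later components from being deployed on top of a step the operator declined.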
User: "Deploy K3s cluster on 3 servers"
│
├─ linux_get_system_info → Pre-flight checks
├─ linux_upload_script → Upload install script
├─ linux_execute_command → Install K3s server
├─ linux_read_file → Read node token
├─ linux_execute_command → Join worker nodes
└─ nodes_list + pods_list → Verify cluster
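
The tool sequence above could be expanded into a concrete plan for a list of hosts. The sketch is hypothetical: `build_plan` is not a project function, the tool names are taken from the trace, and it assumes the first host becomes the K3s server while the remaining hosts join as workers. Tool arguments are omitted because their shapes are not documented here.

```python
def build_plan(hosts: list[str]) -> list[tuple[str, str]]:
    """Return ordered (tool, target) pairs for a K3s deployment.

    Assumption: hosts[0] is the K3s server and the rest are worker nodes.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    server, workers = hosts[0], hosts[1:]

    plan = [("linux_get_system_info", h) for h in hosts]  # pre-flight checks
    plan += [
        ("linux_upload_script", server),    # upload install script
        ("linux_execute_command", server),  # install K3s server
        ("linux_read_file", server),        # read node token
    ]
    plan += [("linux_execute_command", w) for w in workers]  # join worker nodes
    plan += [("nodes_list", "cluster"), ("pods_list", "cluster")]  # verify cluster
    return plan


if __name__ == "__main__":
    for tool, target in build_plan(["10.0.0.1", "10.0.0.2", "10.0.0.3"]):
        print(f"{tool:24s} -> {target}")
```

Building the plan as data before any call is made gives the operator something to review or approve ahead of execution.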
| Layer | Technology |
|---|---|
| Agent Framework | Agno — AgentOS + Teams + Workflows |
| LLM | Claude (complex reasoning) + DeepSeek (lightweight queries) |
| Tool Protocol | MCP — Model Context Protocol |
| Frontend | Next.js + AG-UI Protocol |
| K8s Distribution | K3s — Lightweight Kubernetes |
| IaC | Helm Charts + ArgoCD GitOps |
| Monitoring | VictoriaMetrics + kube-prometheus-stack |
| Logging | VictoriaLogs |