Agentic Infra

AI-Powered Kubernetes Infrastructure Management Platform

Built on Agno · Pluggable MCP + Skills Architecture · K3s Native

Overview

Agentic Infra turns natural language into infrastructure operations. It uses LLM-powered agents to handle Kubernetes cluster deployment (Day 0/1) and intelligent operations (Day 2) — from bootstrapping a K3s cluster via SSH to diagnosing pod crashes through conversational AI.

Key Features

  • K3s Auto-Deploy — Provision lightweight Kubernetes clusters across bare-metal or VMs via SSH, no manual intervention
  • Pluggable MCP Servers — 6 MCP servers (87+ tools) dynamically discovered and loaded from YAML config
  • Pluggable Skills — Domain knowledge hot-loaded from filesystem, auto-matched to user queries
  • Multi-Agent Teams — 5 specialized agents organized into 2 teams with routing and coordination
  • GitOps Native — ArgoCD-driven deployments with human-approval gates
  • Full Observability — VictoriaMetrics + kube-prometheus-stack + VictoriaLogs

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Web Portal                           │
│              Next.js + AG-UI Protocol                   │
└──────────────────────┬──────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────┐
│                 AgentOS Runtime                          │
│              Agno Framework + FastAPI                    │
│                                                         │
│  ┌─────────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │  OpsTeam    │  │  InfraTeam   │  │   Workflows   │  │
│  │  (route)    │  │ (coordinate) │  │               │  │
│  │ ┌─────────┐ │  │ ┌──────────┐ │  │ Bootstrap     │  │
│  │ │Monitor  │ │  │ │InfraDeploy│ │  │ ComponentDeploy│ │
│  │ │Logging  │ │  │ │Middleware│ │  │               │  │
│  │ │Maintain │ │  │ │Monitor   │ │  │               │  │
│  │ └─────────┘ │  │ └──────────┘ │  └───────────────┘  │
│  └─────────────┘  └──────────────┘                      │
│         │                  │          ┌──────────────┐   │
│         │    Skills        │          │ PromptBuilder│   │
│         │  ┌──────────┐   │          └──────────────┘   │
│         │  │ k3s-ops  │   │                             │
│         │  │ k8s-diag │   │                             │
│         │  │ monitor  │   │                             │
│         │  │ logging  │   │                             │
│         │  │ infra    │   │                             │
│         │  │ middleware│  │                             │
│         │  └──────────┘   │                             │
└─────────┼─────────────────┼─────────────────────────────┘
          │                 │
┌─────────▼─────────────────▼─────────────────────────────┐
│                    MCP Servers                           │
│                                                         │
│  ┌────────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│  │ linux-mcp  │ │ k8s-mcp  │ │argocd-mcp│ │deploy-mcp│ │
│  │  SSH ops   │ │apiserver │ │  GitOps   │ │Helm/kubectl││
│  └────────────┘ └──────────┘ └──────────┘ └──────────┘ │
│  ┌────────────┐ ┌──────────┐                            │
│  │ gitops-mcp │ │aianalysis│                            │
│  │ Git/manifests│ │Prometheus│                           │
│  └────────────┘ └──────────┘                            │
└─────────────────────────────────────────────────────────┘

Project Structure

agentic-infra/
├── agent-os/                  # AgentOS core service
│   ├── agents/                # 5 specialized agents
│   ├── teams/                 # 2 team orchestrators (route + coordinate)
│   ├── workflows/             # Cluster bootstrap + component deploy
│   ├── tools/                 # MCPRegistry, AgentFactory, PromptBuilder
│   ├── skills/                # 6 pluggable skill packs
│   │   ├── k3s-ops/           #   K3s deploy & maintenance
│   │   ├── k8s-diagnostic/    #   K8s troubleshooting
│   │   ├── monitoring-ops/    #   VictoriaMetrics operations
│   │   ├── logging-ops/       #   VictoriaLogs operations
│   │   ├── infra-deploy-guide/#   Infrastructure deployment guide
│   │   └── middleware-ops/    #   Middleware operations
│   ├── prompts/               # Layered prompt builder
│   └── knowledge/             # Runbooks
├── mcp-servers/               # 6 MCP servers (87+ tools)
│   ├── linux-mcp/             #   SSH remote ops (22 tools)
│   ├── k8s-mcp/               #   K8s apiserver direct (19 tools)
│   ├── argocd-mcp/            #   ArgoCD management (11 tools)
│   ├── deploy-mcp/            #   Helm + kubectl (21 tools)
│   ├── gitops-mcp/            #   Git + manifest gen (14 tools)
│   └── aianalysis-mcp/        #   Observability queries
├── charts/                    # Helm values templates
├── portal/                    # Next.js web UI
└── deploy/                    # Docker Compose + K8s manifests

Quick Start

Prerequisites

  • Python 3.12+
  • Docker & Docker Compose
  • At least one LLM API key (Anthropic or DeepSeek)

Local Development

# Clone
git clone git@github.com:clcc2019/agentic-infra.git
cd agentic-infra

# Configure
cp .env.example .env
# Edit .env — set at least one LLM API key

# Launch full stack
cd deploy && docker-compose up -d

# Access
# Portal:      http://localhost:3000
# AgentOS API: http://localhost:8000/docs
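
Before launching the stack, it can help to confirm that .env really contains a key. The following is a minimal sketch, not part of the repository: the variable names ANTHROPIC_API_KEY and DEEPSEEK_API_KEY are assumptions, so check .env.example for the names the project actually reads.

```python
from pathlib import Path

# Assumed variable names; the real ones are listed in .env.example.
LLM_KEY_NAMES = ("ANTHROPIC_API_KEY", "DEEPSEEK_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_llm_keys(text: str) -> list[str]:
    """Return the assumed LLM key names that have a non-empty value."""
    env = parse_env(text)
    return [name for name in LLM_KEY_NAMES if env.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")
    found = configured_llm_keys(env_file.read_text()) if env_file.exists() else []
    print("LLM keys configured:", found or "none")
```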

AgentOS Only (Dev Mode)

uv venv --python 3.12 && source .venv/bin/activate
uv pip install -e .
fastapi dev agent-os/app.py

Agent Capabilities

| Agent | Model | Responsibility | MCP Servers |
|---|---|---|---|
| MonitorAgent | DeepSeek | Metrics queries, alert rules, dashboards | aianalysis-mcp |
| LoggingAgent | DeepSeek | Log search & analysis, collection config | aianalysis-mcp |
| MaintenanceAgent | Claude | Pod/node diagnostics, cluster health checks | aianalysis-mcp, k8s-mcp, linux-mcp |
| InfraDeployAgent | Claude | K3s deploy, infra bootstrap (ArgoCD, VictoriaMetrics, VictoriaLogs) | linux-mcp, k8s-mcp, argocd-mcp, deploy-mcp, gitops-mcp |
| MiddlewareAgent | Claude | Middleware lifecycle (Redis/MySQL/Kafka) | argocd-mcp, deploy-mcp, gitops-mcp |

MCP Servers

| Server | Port | Tools | Description |
|---|---|---|---|
| linux-mcp | 8085 | 22 | SSH remote execution, system info, service management, network checks |
| k8s-mcp | 8086 | 19 | K8s apiserver direct: Pod/Node/Namespace/Resource CRUD, exec, logs |
| argocd-mcp | 8082 | 11 | ArgoCD app lifecycle, sync, rollback, repo management |
| deploy-mcp | 8083 | 21 | Helm install/upgrade/rollback, kubectl resource ops |
| gitops-mcp | 8084 | 14 | Git clone/commit/push, Helm values update, manifest gen |
| aianalysis-mcp | 8081 | - | Prometheus, VictoriaLogs, alerting, tracing |
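
To see how the agent and server tables fit together, the sketch below resolves the MCP endpoints one agent would need. The ports and assignments are copied from the tables above; the function name `endpoints_for`, the `localhost` default, and the `/mcp` path are assumptions for illustration, not the project's actual code.

```python
# Ports and agent-to-server assignments are copied from the tables above.
MCP_PORTS = {
    "aianalysis-mcp": 8081,
    "argocd-mcp": 8082,
    "deploy-mcp": 8083,
    "gitops-mcp": 8084,
    "linux-mcp": 8085,
    "k8s-mcp": 8086,
}

AGENT_SERVERS = {
    "MonitorAgent": ["aianalysis-mcp"],
    "LoggingAgent": ["aianalysis-mcp"],
    "MaintenanceAgent": ["aianalysis-mcp", "k8s-mcp", "linux-mcp"],
    "InfraDeployAgent": ["linux-mcp", "k8s-mcp", "argocd-mcp", "deploy-mcp", "gitops-mcp"],
    "MiddlewareAgent": ["argocd-mcp", "deploy-mcp", "gitops-mcp"],
}


def endpoints_for(agent: str, host: str = "localhost") -> list[str]:
    """Return the MCP URLs an agent would connect to.

    The /mcp path and the localhost default are assumptions.
    """
    return [f"http://{host}:{MCP_PORTS[name]}/mcp" for name in AGENT_SERVERS[agent]]


print(endpoints_for("MiddlewareAgent"))
```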

Pluggable MCP Config

All MCP servers are declared in mcp_servers.yaml — add new servers without touching agent code:

servers:
  - name: my-custom-mcp
    url: http://localhost:9090/mcp
    description: My custom operations server
    enabled: true
    tags: [custom, ops]
    trigger_keywords: [custom, special]
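
One plausible way such entries could be consumed is sketched below: enabled servers are selected when one of their trigger keywords appears in the user query. The field names follow the YAML above, but the class `MCPServerConfig`, the function `select_servers`, and the matching rule itself are assumptions; the project's MCPRegistry may work differently.

```python
from dataclasses import dataclass, field


@dataclass
class MCPServerConfig:
    """One entry under `servers:`; field names follow the YAML above."""
    name: str
    url: str
    description: str = ""
    enabled: bool = True
    tags: list[str] = field(default_factory=list)
    trigger_keywords: list[str] = field(default_factory=list)


def select_servers(servers: list[MCPServerConfig], query: str) -> list[MCPServerConfig]:
    """Keep enabled servers with at least one trigger keyword in the query."""
    text = query.lower()
    return [
        s for s in servers
        if s.enabled and any(k.lower() in text for k in s.trigger_keywords)
    ]


# The parsed form of the YAML entry above, plus a hypothetical disabled server.
servers = [
    MCPServerConfig(
        name="my-custom-mcp",
        url="http://localhost:9090/mcp",
        description="My custom operations server",
        tags=["custom", "ops"],
        trigger_keywords=["custom", "special"],
    ),
    MCPServerConfig(name="disabled-mcp", url="http://localhost:9091/mcp",
                    enabled=False, trigger_keywords=["custom"]),
]

print([s.name for s in select_servers(servers, "Run the custom cleanup job")])
```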

Runtime management via Admin API:

# Register new MCP at runtime
curl -X POST http://localhost:8000/api/admin/mcp/register \
  -H 'Content-Type: application/json' \
  -d '{"name":"my-mcp","url":"http://localhost:9090/mcp"}'

# Check status
curl http://localhost:8000/api/admin/mcp/status
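
The same registration call can be made from Python with the standard library. This is a sketch under the assumption that the endpoint accepts exactly the JSON body shown in the curl command; the helper names are hypothetical, and the response format is not documented here, so the body is returned as raw text.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # AgentOS API address from the Quick Start


def build_register_request(name: str, url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST as the curl command, without sending it."""
    body = json.dumps({"name": name, "url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/admin/mcp/register",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register_mcp(name: str, url: str) -> str:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(build_register_request(name, url), timeout=10) as resp:
        return resp.read().decode("utf-8")


req = build_register_request("my-mcp", "http://localhost:9090/mcp")
print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the payload easy to inspect without a running AgentOS instance.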

Skills System

Skills are self-contained knowledge packs (Markdown + scripts + references) that are auto-discovered from the filesystem:

skills/k3s-ops/
├── SKILL.md              # Instructions + trigger keywords
├── scripts/
│   ├── install_k3s_server.sh
│   ├── install_k3s_agent.sh
│   ├── k3s_health_check.sh
│   └── k3s_backup.sh
└── references/
    └── k3s_config_options.md

Progressive loading: Discovery (keyword match) → Activation (inject into prompt) → Execution (run scripts)
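
The three stages can be sketched as plain functions. Everything here is illustrative: the `Skill` class, the idea that keywords and instructions are read out of SKILL.md, and the prompt layout are assumptions, not the project's loader.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    keywords: list[str]            # assumed to come from SKILL.md
    instructions: str              # assumed: the body of SKILL.md
    scripts: list[str] = field(default_factory=list)


def discover(skills: list[Skill], query: str) -> list[Skill]:
    """Discovery: keep skills whose keywords appear in the query."""
    text = query.lower()
    return [s for s in skills if any(k.lower() in text for k in s.keywords)]


def activate(base_prompt: str, matched: list[Skill]) -> str:
    """Activation: append each matched skill's instructions to the prompt."""
    sections = [f"## Skill: {s.name}\n{s.instructions}" for s in matched]
    return "\n\n".join([base_prompt, *sections])


def scripts_to_run(matched: list[Skill]) -> list[str]:
    """Execution: list the scripts that become available to the agent."""
    return [f"skills/{s.name}/scripts/{script}" for s in matched for script in s.scripts]


k3s_ops = Skill(
    name="k3s-ops",
    keywords=["k3s", "backup"],
    instructions="Use the bundled scripts to install and check K3s.",
    scripts=["install_k3s_server.sh", "k3s_health_check.sh"],
)
matched = discover([k3s_ops], "Install K3s on node-1")
print(activate("You are an infrastructure agent.", matched))
print(scripts_to_run(matched))
```

Loading instructions only after a keyword match keeps unrelated skills out of the prompt, which is presumably the point of loading them progressively.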

Workflows

ClusterBootstrap — Full cluster initialization

Validate Cluster → Deploy ArgoCD → Deploy Monitoring → Deploy Logging → Global Verify
                      ↑                  ↑                  ↑
                 Human Approval     Human Approval     Human Approval
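
A gated pipeline like this can be sketched as an ordered list of steps with an approval callback. The step names come from the diagram above; the function `run_bootstrap`, the callback signatures, and the choice to stop the whole workflow when an approval is rejected are assumptions.

```python
from typing import Callable

# Step names follow the ClusterBootstrap diagram; True marks a gated step.
STEPS = [
    ("Validate Cluster", False),
    ("Deploy ArgoCD", True),
    ("Deploy Monitoring", True),
    ("Deploy Logging", True),
    ("Global Verify", False),
]


def run_bootstrap(
    run_step: Callable[[str], None],
    approve: Callable[[str], bool],
) -> list[str]:
    """Run steps in order; stop at the first gated step that is rejected."""
    completed = []
    for name, needs_approval in STEPS:
        if needs_approval and not approve(name):
            break
        run_step(name)
        completed.append(name)
    return completed


# Example: approve everything except the logging deployment.
done = run_bootstrap(
    run_step=lambda name: print(f"running: {name}"),
    approve=lambda name: name != "Deploy Logging",
)
print(done)
```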

K3s Deployment — End-to-end via SSH

User: "Deploy K3s cluster on 3 servers"
  │
  ├─ linux_get_system_info  →  Pre-flight checks
  ├─ linux_upload_script    →  Upload install script
  ├─ linux_execute_command  →  Install K3s server
  ├─ linux_read_file        →  Read node token
  ├─ linux_execute_command  →  Join worker nodes
  └─ nodes_list + pods_list →  Verify cluster
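
The call sequence above can be sketched as a small driver around a generic tool-calling function. The tool names are the ones shown in the flow; the argument names, the `sh install_k3s_server.sh` command, and the `call_tool` interface are assumptions. The token path and join command follow the K3s quick-start, though the project's own scripts may do this differently.

```python
from typing import Callable

ToolCall = Callable[[str, dict], str]  # (tool name, arguments) -> text result


def deploy_k3s(call_tool: ToolCall, server: str, workers: list[str]) -> str:
    """Drive the tool calls in the order shown in the flow above."""
    for host in [server, *workers]:
        call_tool("linux_get_system_info", {"host": host})  # pre-flight checks
    call_tool("linux_upload_script", {"host": server, "script": "install_k3s_server.sh"})
    call_tool("linux_execute_command", {"host": server, "command": "sh install_k3s_server.sh"})
    token = call_tool(
        "linux_read_file",
        {"host": server, "path": "/var/lib/rancher/k3s/server/node-token"},
    ).strip()
    join = f"curl -sfL https://get.k3s.io | K3S_URL=https://{server}:6443 K3S_TOKEN={token} sh -"
    for host in workers:
        call_tool("linux_execute_command", {"host": host, "command": join})
    call_tool("nodes_list", {})
    return call_tool("pods_list", {})


# A fake tool caller that records calls instead of touching real hosts.
calls: list[str] = []


def fake_call_tool(tool: str, args: dict) -> str:
    calls.append(tool)
    return "example-token\n" if tool == "linux_read_file" else "ok"


deploy_k3s(fake_call_tool, "10.0.0.1", ["10.0.0.2", "10.0.0.3"])
print(calls)
```

Passing `call_tool` in as a parameter lets the sequence be exercised with a fake, as here, before it is pointed at real MCP servers.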

Tech Stack

| Layer | Technology |
|---|---|
| Agent Framework | Agno (AgentOS + Teams + Workflows) |
| LLM | Claude (complex reasoning) + DeepSeek (lightweight queries) |
| Tool Protocol | MCP (Model Context Protocol) |
| Frontend | Next.js + AG-UI Protocol |
| K8s Distribution | K3s (lightweight Kubernetes) |
| IaC | Helm Charts + ArgoCD GitOps |
| Monitoring | VictoriaMetrics + kube-prometheus-stack |
| Logging | VictoriaLogs |

License

Apache License 2.0
