A sophisticated, multi-agent system for automated alert processing and incident response using Large Language Models (LLMs) and the Model Context Protocol (MCP).
The NOC Agent implements a supervisor + specialized agents pattern with MCP-based communication:
```
Alerts → Manager Agent → Metrics Evaluation → RCA Agent → Triage Agent → Notifications
              ↓                  ↓                ↓             ↓
         MCP Servers         Enrichment       Analysis      Ownership
```
- Manager Agent: Orchestrates workflow and makes routing decisions
- Metrics Evaluation Agent: Enriches alerts with metrics, logs, and traces
- RCA Agent: Performs root cause analysis using enriched data
- Triage Agent: Determines ownership, escalation, and notifications
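The supervisor pattern above can be sketched as a simple chain. The classes and functions below are illustrative stand-ins, not the project's actual agent API:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Alert:
    title: str
    severity: str
    context: dict = field(default_factory=dict)

# Hypothetical stand-ins for the specialized agents.
async def enrich(alert: Alert) -> Alert:    # Metrics Evaluation Agent
    alert.context["metrics"] = "cpu=97%"    # would query MCP servers
    return alert

async def analyze(alert: Alert) -> Alert:   # RCA Agent
    alert.context["root_cause"] = "runaway process"
    return alert

async def triage(alert: Alert) -> Alert:    # Triage Agent
    alert.context["owner"] = "platform-team"
    return alert

async def manager(alert: Alert) -> Alert:
    # The Manager Agent orchestrates the workflow; a real supervisor
    # could branch or short-circuit based on each stage's output.
    for stage in (enrich, analyze, triage):
        alert = await stage(alert)
    return alert

result = asyncio.run(manager(Alert("High CPU Usage", "high")))
print(result.context)
```

In the real system the stages are LangGraph nodes and the enrichment step talks to MCP servers rather than returning canned strings.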
- Framework: LangGraph for workflow orchestration
- LLMs: OpenAI GPT-4, Google Gemini
- Communication: Model Context Protocol (MCP)
- API: FastAPI with async support
- Storage: PostgreSQL + Redis
- Monitoring: Grafana integration via MCP
- Core AI (Yellow) - The "brain" of the system
- Integration (Green) - How data flows in/out
- Data (Purple) - Where information is stored/processed
- Infrastructure (Red) - Where everything runs
- Monitoring (Pink) - How system health is tracked
- Python 3.11+
- Docker & Docker Compose
- OpenAI API key
- Google Gemini API key
- Grafana instance (optional)
- Clone and set up:

  ```bash
  git clone <repository>
  cd noc
  cp .env.example .env
  ```

- Configure environment variables:

  ```bash
  # Edit .env with your API keys
  OPENAI_API_KEY=your_openai_key
  GEMINI_API_KEY=your_gemini_key
  GRAFANA_URL=your_grafana_url
  GRAFANA_API_KEY=your_grafana_key
  ```

- Install dependencies:

  ```bash
  pip install -e .
  ```

Run the full stack with Docker:

```bash
# Start all services
docker-compose up -d

# Check health
curl http://localhost:8000/health

# View logs
docker-compose logs -f noc-agent
```

Or run locally for development:

```bash
# Start dependencies
docker-compose up -d redis postgres

# Install dependencies
pip install -r requirements.txt

# Start the API server
noc-agent serve

# Or run directly
python -m noc_agent.api.main
```
Common CLI commands:

```bash
# Start the API server
noc-agent serve --host 0.0.0.0 --port 8000

# Process a single alert
noc-agent process-alert "High CPU Usage" "CPU usage exceeded threshold" --severity high

# Test individual agents
noc-agent test-agents

# Show configuration
noc-agent config
```

Submit an alert via the API:

```bash
curl -X POST http://localhost:8000/alerts \
  -H "Content-Type: application/json" \
  -d '{
    "title": "High Memory Usage",
    "description": "Memory usage exceeded 85%",
    "severity": "high",
    "source_system": "monitoring",
    "source_component": "web-server-01",
    "labels": {"service": "web", "team": "platform"}
  }'
```

Check alert status:

```bash
curl http://localhost:8000/alerts/{alert_id}/status
```

Get alert details:

```bash
curl http://localhost:8000/alerts/{alert_id}
```

Run the complete demo with sample alerts:

```bash
python examples/sample_alerts.py
```

This will:
- Send 5 different types of alerts
- Monitor their processing in real-time
- Display a summary of results
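The same alert could be submitted from Python. The `build_alert` helper below and its severity set are hypothetical; only the field names come from the curl example above:

```python
import json

# Assumed severity levels; the README only shows "high" and "critical".
VALID_SEVERITIES = {"low", "medium", "high", "critical"}

def build_alert(title, description, severity, source_system,
                source_component, labels=None):
    """Hypothetical helper: validate and assemble the POST /alerts body."""
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    return {
        "title": title,
        "description": description,
        "severity": severity,
        "source_system": source_system,
        "source_component": source_component,
        "labels": labels or {},
    }

payload = build_alert("High Memory Usage", "Memory usage exceeded 85%",
                      "high", "monitoring", "web-server-01",
                      {"service": "web", "team": "platform"})
body = json.dumps(payload).encode()

# To actually submit (requires a running instance):
#   import urllib.request
#   req = urllib.request.Request("http://localhost:8000/alerts", data=body,
#                                headers={"Content-Type": "application/json"})
#   urllib.request.urlopen(req)
```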
| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key | Required |
| `GEMINI_API_KEY` | Google Gemini API key | Required |
| `GRAFANA_URL` | Grafana instance URL | Required for metrics |
| `GRAFANA_API_KEY` | Grafana API key | Required for metrics |
| `DATABASE_URL` | PostgreSQL connection string | Local default |
| `REDIS_URL` | Redis connection string | Local default |
| `LOG_LEVEL` | Logging level | `INFO` |
| `DEBUG` | Enable debug mode | `false` |
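A minimal sketch of how these variables might be read at startup, using only the standard library. The `Settings` class and the local-default URLs are illustrative, not the project's actual config module:

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    openai_api_key: str
    gemini_api_key: str
    database_url: str
    redis_url: str
    log_level: str
    debug: bool

def load_settings() -> Settings:
    # Required keys fail fast; the rest fall back to local defaults.
    def require(name: str) -> str:
        value = os.getenv(name)
        if not value:
            raise RuntimeError(f"{name} must be set")
        return value

    return Settings(
        openai_api_key=require("OPENAI_API_KEY"),
        gemini_api_key=require("GEMINI_API_KEY"),
        # Assumed local defaults, for illustration only.
        database_url=os.getenv("DATABASE_URL", "postgresql://localhost/noc"),
        redis_url=os.getenv("REDIS_URL", "redis://localhost:6379"),
        log_level=os.getenv("LOG_LEVEL", "INFO"),
        debug=os.getenv("DEBUG", "false").lower() == "true",
    )

# Example (requires the two API keys to be set):
#   settings = load_settings()
```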
Configure which LLM models each agent uses:

```bash
MANAGER_AGENT_MODEL=gpt-4
METRICS_AGENT_MODEL=gemini-1.5-pro
RCA_AGENT_MODEL=gpt-4
TRIAGE_AGENT_MODEL=gpt-3.5-turbo
```

The system uses MCP servers for external data sources:
```json
{
  "mcpServers": {
    "staging-grafana": {
      "type": "stdio",
      "command": "docker",
      "args": ["run", "-i", "--rm", "-e", "GRAFANA_URL", "-e", "GRAFANA_API_KEY", "mcp/grafana:latest", "-t", "stdio"],
      "env": {
        "GRAFANA_URL": "https://grafana.internal.example.com",
        "GRAFANA_API_KEY": "your_api_key"
      }
    }
  }
}
```
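As a sketch of what this configuration drives, a client could turn an `mcpServers` entry into a stdio child process. The `build_launch` helper is illustrative and not the project's actual MCP client:

```python
import json

# Same shape as the mcpServers config above.
MCP_CONFIG = json.loads("""
{
  "mcpServers": {
    "staging-grafana": {
      "type": "stdio",
      "command": "docker",
      "args": ["run", "-i", "--rm", "-e", "GRAFANA_URL", "-e", "GRAFANA_API_KEY",
               "mcp/grafana:latest", "-t", "stdio"],
      "env": {"GRAFANA_URL": "https://grafana.internal.example.com",
              "GRAFANA_API_KEY": "your_api_key"}
    }
  }
}
""")

def build_launch(config: dict, name: str):
    """Turn an mcpServers entry into argv + env for a stdio child (illustrative)."""
    server = config["mcpServers"][name]
    if server["type"] != "stdio":
        raise ValueError("only stdio servers are handled in this sketch")
    return [server["command"], *server["args"]], dict(server.get("env", {}))

argv, env = build_launch(MCP_CONFIG, "staging-grafana")
# A real client would now run e.g.
#   subprocess.Popen(argv, stdin=PIPE, stdout=PIPE, env={**os.environ, **env})
# and speak MCP (JSON-RPC) over the child's stdin/stdout.
```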
```bash
# API health
curl http://localhost:8000/health

# System metrics
curl http://localhost:8000/metrics
```

Structured JSON logging is used throughout:
```bash
# View real-time logs
docker-compose logs -f noc-agent

# Filter by agent
docker-compose logs noc-agent | grep "manager"
```

Access Grafana at http://localhost:3000 (if using the monitoring profile).
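A structured JSON log line like the ones grepped above can be produced with only the standard library. The formatter and its field names are an illustrative sketch, not the project's actual logging setup:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Illustrative structured-log formatter; field names are assumptions."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "agent": record.name,      # logger name doubles as the agent name
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("manager")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("routing alert to RCA agent")
```

One-key-per-field output keeps `grep "manager"`-style filtering and machine parsing both cheap.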
```bash
# Run unit tests
pytest tests/

# Run integration tests
pytest tests/integration/

# Test individual agents
noc-agent test-agents

# Test with specific alert
python -c "
from noc_agent.agents.manager import ManagerAgent
import asyncio

async def test():
    agent = ManagerAgent()
    # Test implementation here

asyncio.run(test())
"
```

- API keys are never logged or exposed
- All external communications use HTTPS
- Input validation on all endpoints
- Rate limiting implemented
- Secrets management via environment variables
Typical processing times per alert:
- Simple alerts: 5-15 seconds
- Complex alerts: 15-45 seconds
- Critical alerts: < 30 seconds (prioritized)
- Horizontal scaling via multiple API instances
- Redis for shared state and queuing
- PostgreSQL for persistent storage
- Async processing throughout
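The async, queue-based processing described above can be sketched with stdlib `asyncio`; in the real system Redis backs the queue and `process_alert` runs the multi-agent pipeline, so all names here are illustrative:

```python
import asyncio

async def process_alert(alert: str) -> str:
    # Placeholder for the agent pipeline; the real work is I/O-bound
    # (LLM calls, MCP queries), which is why async helps throughput.
    await asyncio.sleep(0.01)
    return f"processed:{alert}"

async def worker(queue: asyncio.Queue, results: list):
    while True:
        alert = await queue.get()
        results.append(await process_alert(alert))
        queue.task_done()

async def main(alerts):
    queue, results = asyncio.Queue(), []
    # A small worker pool drains the shared queue concurrently.
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(4)]
    for a in alerts:
        queue.put_nowait(a)
    await queue.join()          # block until every alert is processed
    for w in workers:
        w.cancel()
    return results

print(asyncio.run(main([f"alert-{i}" for i in range(10)])))
```

Scaling out horizontally then amounts to running more API instances against the same shared queue.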
