sonali-rajput/noc-ai-agent
NOC Agent - Multi-LLM Network Operations Center

A sophisticated, multi-agent system for automated alert processing and incident response using Large Language Models (LLMs) and the Model Context Protocol (MCP).

πŸ—οΈ Architecture

The NOC Agent implements a supervisor + specialized agents pattern with MCP-based communication:

Alerts → Manager Agent → Metrics Evaluation → RCA Agent → Triage Agent → Notifications
                   ↓              ↓              ↓           ↓
               MCP Servers    Enrichment     Analysis    Ownership

Architecture Diagram

Agents

  • Manager Agent: Orchestrates workflow and makes routing decisions
  • Metrics Evaluation Agent: Enriches alerts with metrics, logs, and traces
  • RCA Agent: Performs root cause analysis using enriched data
  • Triage Agent: Determines ownership, escalation, and notifications
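The routing described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the agent names follow the list, but the function bodies are stubs, whereas the real project orchestrates these steps with LangGraph and LLM calls.

```python
from dataclasses import dataclass, field

@dataclass
class Alert:
    title: str
    severity: str
    notes: dict = field(default_factory=dict)

def metrics_agent(alert: Alert) -> Alert:
    # Enrich the alert with metrics, logs, and traces (stubbed here).
    alert.notes["enrichment"] = f"metrics for '{alert.title}'"
    return alert

def rca_agent(alert: Alert) -> Alert:
    # Root cause analysis over the enriched data (stubbed here).
    alert.notes["rca"] = "probable cause: resource exhaustion"
    return alert

def triage_agent(alert: Alert) -> Alert:
    # Decide ownership and notifications (stubbed here).
    alert.notes["owner"] = "platform" if alert.severity == "high" else "on-call"
    return alert

def manager_agent(alert: Alert) -> Alert:
    # The manager routes the alert through the specialists in order.
    for step in (metrics_agent, rca_agent, triage_agent):
        alert = step(alert)
    return alert

result = manager_agent(Alert("High CPU Usage", "high"))
```

In the real system the manager's routing decision is itself LLM-driven rather than a fixed loop, but the data flow is the same.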

Technology Stack

  • Framework: LangGraph for workflow orchestration
  • LLMs: OpenAI GPT-4, Google Gemini
  • Communication: Model Context Protocol (MCP)
  • API: FastAPI with async support
  • Storage: PostgreSQL + Redis
  • Monitoring: Grafana integration via MCP

Tech Stack Diagram

Logical Architecture Layers:

  1. Core AI (Yellow) - The "brain" of the system
  2. Integration (Green) - How data flows in/out
  3. Data (Purple) - Where information is stored/processed
  4. Infrastructure (Red) - Where everything runs
  5. Monitoring (Pink) - How system health is tracked

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • Docker & Docker Compose
  • OpenAI API key
  • Google Gemini API key
  • Grafana instance (optional)

Installation

  1. Clone and set up:
git clone <repository>
cd noc
cp .env.example .env
  2. Configure environment variables:
# Edit .env with your API keys
OPENAI_API_KEY=your_openai_key
GEMINI_API_KEY=your_gemini_key
GRAFANA_URL=your_grafana_url
GRAFANA_API_KEY=your_grafana_key
  3. Install dependencies:
pip install -e .

Running with Docker Compose

# Start all services
docker-compose up -d

# Check health
curl http://localhost:8000/health

# View logs
docker-compose logs -f noc-agent

Running Locally

# Start dependencies
docker-compose up -d redis postgres

# Install dependencies
pip install -r requirements.txt

# Start the API server
noc-agent serve

# Or run directly
python -m noc_agent.api.main

πŸ“ Usage

CLI Commands

# Start the API server
noc-agent serve --host 0.0.0.0 --port 8000

# Process a single alert
noc-agent process-alert "High CPU Usage" "CPU usage exceeded threshold" --severity high

# Test individual agents
noc-agent test-agents

# Show configuration
noc-agent config

API Endpoints

Create Alert

curl -X POST http://localhost:8000/alerts \
  -H "Content-Type: application/json" \
  -d '{
    "title": "High Memory Usage",
    "description": "Memory usage exceeded 85%",
    "severity": "high",
    "source_system": "monitoring",
    "source_component": "web-server-01",
    "labels": {"service": "web", "team": "platform"}
  }'

Check Processing Status

curl http://localhost:8000/alerts/{alert_id}/status

Get Alert Details

curl http://localhost:8000/alerts/{alert_id}
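The same request can be issued from Python for scripting. A minimal sketch using only the standard library; the endpoint and payload fields are taken from the curl example above, and the request is built but not sent, since sending requires a running server:

```python
import json
import urllib.request

def build_alert_request(base_url: str, alert: dict) -> urllib.request.Request:
    # Build a POST request matching the /alerts endpoint shown above.
    return urllib.request.Request(
        url=f"{base_url}/alerts",
        data=json.dumps(alert).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_alert_request("http://localhost:8000", {
    "title": "High Memory Usage",
    "description": "Memory usage exceeded 85%",
    "severity": "high",
    "source_system": "monitoring",
    "source_component": "web-server-01",
    "labels": {"service": "web", "team": "platform"},
})
# urllib.request.urlopen(req) would send it to a running server.
```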

Demo Workflow

Run the complete demo with sample alerts:

python examples/sample_alerts.py

This will:

  1. Send 5 different types of alerts
  2. Monitor their processing in real-time
  3. Display a summary of results
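The monitoring step can be sketched as a polling loop against the status endpoint. Everything here is illustrative: the function names are hypothetical, and `fetch_status` is injected so any HTTP client (or a test stub) can be used:

```python
import time

def monitor_alerts(alert_ids, fetch_status, poll_interval=0.0, max_polls=50):
    # Poll each alert's status until it reaches a terminal state.
    done, summary = set(), {}
    for _ in range(max_polls):
        for aid in alert_ids:
            if aid in done:
                continue
            status = fetch_status(aid)  # e.g. GET /alerts/{aid}/status
            if status in ("completed", "failed"):
                done.add(aid)
                summary[aid] = status
        if len(done) == len(alert_ids):
            break
        time.sleep(poll_interval)
    return summary
```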

🔧 Configuration

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| OPENAI_API_KEY | OpenAI API key | Required |
| GEMINI_API_KEY | Google Gemini API key | Required |
| GRAFANA_URL | Grafana instance URL | Required for metrics |
| GRAFANA_API_KEY | Grafana API key | Required for metrics |
| DATABASE_URL | PostgreSQL connection string | Local default |
| REDIS_URL | Redis connection string | Local default |
| LOG_LEVEL | Logging level | INFO |
| DEBUG | Enable debug mode | false |
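A loader for these settings might look like the following. This is a hypothetical sketch, not the project's actual code: the variable names match the table above, but the local defaults are illustrative placeholders.

```python
import os

DEFAULTS = {
    "DATABASE_URL": "postgresql://localhost:5432/noc",  # illustrative default
    "REDIS_URL": "redis://localhost:6379/0",            # illustrative default
    "LOG_LEVEL": "INFO",
    "DEBUG": "false",
}
REQUIRED = ("OPENAI_API_KEY", "GEMINI_API_KEY")

def load_config(env=os.environ) -> dict:
    # Fail fast on missing required keys, fall back to defaults otherwise.
    missing = [k for k in REQUIRED if k not in env]
    if missing:
        raise RuntimeError(f"missing required settings: {missing}")
    cfg = {k: env.get(k, v) for k, v in DEFAULTS.items()}
    cfg.update({k: env[k] for k in REQUIRED})
    cfg["DEBUG"] = cfg["DEBUG"].lower() == "true"
    return cfg
```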

Agent Models

Configure which LLM models each agent uses:

MANAGER_AGENT_MODEL=gpt-4
METRICS_AGENT_MODEL=gemini-1.5-pro
RCA_AGENT_MODEL=gpt-4
TRIAGE_AGENT_MODEL=gpt-3.5-turbo
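Per-agent model selection can be resolved from the environment with defaults mirroring the example above. A small sketch; the lookup function is hypothetical:

```python
import os

# Defaults taken from the example configuration above.
MODEL_DEFAULTS = {
    "manager": ("MANAGER_AGENT_MODEL", "gpt-4"),
    "metrics": ("METRICS_AGENT_MODEL", "gemini-1.5-pro"),
    "rca": ("RCA_AGENT_MODEL", "gpt-4"),
    "triage": ("TRIAGE_AGENT_MODEL", "gpt-3.5-turbo"),
}

def model_for(agent: str, env=os.environ) -> str:
    # Environment variable wins; otherwise fall back to the documented default.
    var, default = MODEL_DEFAULTS[agent]
    return env.get(var, default)
```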

MCP Configuration

The system uses MCP servers for external data sources:

{
  "mcpServers": {
    "staging-grafana": {
      "type": "stdio",
      "command": "docker",
      "args": ["run", "-i", "--rm", "-e", "GRAFANA_URL", "-e", "GRAFANA_API_KEY", "mcp/grafana:latest", "-t", "stdio"],
      "env": {
        "GRAFANA_URL": "https://grafana.internal.example.com",
        "GRAFANA_API_KEY": "your_api_key"
      }
    }
  }
}
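Reading such a configuration and assembling the launch command for a stdio server could look like this. A sketch only, following the JSON structure shown above; the function name is illustrative:

```python
import json

def stdio_command(config_text: str, server: str) -> list:
    # Parse the MCP config and build the subprocess command for one server.
    cfg = json.loads(config_text)
    entry = cfg["mcpServers"][server]
    if entry["type"] != "stdio":
        raise ValueError(f"{server} is not a stdio server")
    return [entry["command"], *entry["args"]]

SAMPLE = """
{
  "mcpServers": {
    "staging-grafana": {
      "type": "stdio",
      "command": "docker",
      "args": ["run", "-i", "--rm", "mcp/grafana:latest", "-t", "stdio"]
    }
  }
}
"""
```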

πŸ” Monitoring

Health Checks

# API health
curl http://localhost:8000/health

# System metrics
curl http://localhost:8000/metrics

Logging

Structured JSON logging is used throughout:

# View real-time logs
docker-compose logs -f noc-agent

# Filter by agent
docker-compose logs noc-agent | grep "manager"

Grafana Dashboards

Access Grafana at http://localhost:3000 (if using monitoring profile):

  • Username: admin
  • Password: admin

Dashboard Diagram

🧪 Testing

Unit Tests

pytest tests/

Integration Tests

pytest tests/integration/

Agent Testing

# Test individual agents
noc-agent test-agents

# Test with specific alert
python -c "
from noc_agent.agents.manager import ManagerAgent
import asyncio

async def test():
    agent = ManagerAgent()
    # Test implementation here

asyncio.run(test())
"

Deployment Diagram

🔒 Security

  • API keys are never logged or exposed
  • All external communications use HTTPS
  • Input validation on all endpoints
  • Rate limiting implemented
  • Secrets management via environment variables

📊 Performance

Benchmarks

Typical processing times per alert:

  • Simple alerts: 5-15 seconds
  • Complex alerts: 15-45 seconds
  • Critical alerts: < 30 seconds (prioritized)

Scaling

  • Horizontal scaling via multiple API instances
  • Redis for shared state and queuing
  • PostgreSQL for persistent storage
  • Async processing throughout
