CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Development Commands

Running the Application

# Start all services (recommended for development)
npm run start:all    # Starts Next.js and the Prometheus exporter, and watches for changes with nodemon

# Individual services
npm run dev          # Next.js development server (port 3000)
npm run exporter     # Prometheus metrics exporter (port 3002)
npm run prometheus   # Prometheus in Docker (port 9090)

# Production
npm run build        # Build Next.js for production
npm run start        # Start Next.js production server
docker-compose up    # Deploy all services via Docker

Testing

npm run test         # Run test environment with test endpoints

Architecture Overview

This is a three-tier monitoring system:

Prometheus Exporter (server/exporter.js): Express server that polls configured endpoints and exposes metrics
Prometheus Server: Stores time-series data (runs in Docker)
Next.js Frontend (app/): Displays real-time status and historical data

Key Architectural Patterns

Service Configuration: All monitored endpoints are defined in server/endpoints.json
Metrics Collection: The exporter polls services concurrently (3 at a time) with retry logic (3 attempts, exponential backoff)
Alert System: 5-minute grace period before Slack notifications to avoid alert fatigue from transient failures
Data Flow: Exporter → Prometheus → Next.js API routes → React components
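
The polling pattern described above (at most 3 checks in flight, up to 3 attempts each, exponential backoff between attempts) could be sketched roughly as follows. This is an illustration, not the code in server/exporter.js: the names pollAll, withRetry, backoffDelayMs and pause, and the 500 ms base delay, are all assumptions.

```typescript
// Hypothetical sketch: function names and the 500 ms base delay are assumptions,
// not taken from server/exporter.js.
type Check<T> = () => Promise<T>;
type Outcome<T> = { ok: true; value: T } | { ok: false; error: unknown };

const pause = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay after the Nth failed attempt (1-based): base, 2 * base, 4 * base, ...
function backoffDelayMs(failedAttempts: number, baseMs = 500): number {
  return baseMs * 2 ** (failedAttempts - 1);
}

// Run `check` up to `maxAttempts` times, sleeping with exponential backoff between failures.
async function withRetry<T>(check: Check<T>, maxAttempts = 3, baseMs = 500): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await check();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) await pause(backoffDelayMs(attempt, baseMs));
    }
  }
  throw lastError;
}

// Run every check with at most `limit` in flight; outcomes keep the input order.
async function pollAll<T>(checks: Check<T>[], limit = 3, baseMs = 500): Promise<Outcome<T>[]> {
  const outcomes: Outcome<T>[] = new Array(checks.length);
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < checks.length) {
      const i = next++; // claim the next unprocessed check
      try {
        outcomes[i] = { ok: true, value: await withRetry(checks[i], 3, baseMs) };
      } catch (error) {
        outcomes[i] = { ok: false, error };
      }
    }
  };
  const workerCount = Math.min(limit, checks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return outcomes;
}
```

A fixed pool of workers pulling from a shared index keeps the concurrency cap exact without an extra dependency. A failed service ends up as an { ok: false } outcome instead of rejecting the whole batch.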

Frontend Architecture

App Router: Uses Next.js 14 App Router with TypeScript
API Routes: Located in app/api/ for fetching Prometheus data
Custom Hooks: Reusable data fetching logic in hooks/
Component Library: Uses shadcn/ui components in components/ui/
State Management: React hooks for local state, no global state library

Important Implementation Details

Response Validation: Services can define expected response structure in endpoints.json
Downtime Tracking: Maintains in-memory state for downtime duration and alert status
Real-time Updates: Frontend polls every second for live status
Historical Data: 30-day uptime history with daily aggregation
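
To illustrate how in-memory downtime tracking and the alert grace period can fit together, here is a minimal sketch. The DowntimeState shape and the recordCheck function are hypothetical and are not taken from the exporter. The sketch assumes one state object per service and that a notification should fire once per outage.

```typescript
// Hypothetical sketch: the state shape and function name are assumptions.
interface DowntimeState {
  downSince: number | null; // epoch ms of the first failed check, null while healthy
  alertSent: boolean;       // true once a notification has gone out for this outage
}

const DEFAULT_GRACE_PERIOD_MS = 300_000; // 5 minutes

// Returns the next state and whether a notification should fire for this check.
function recordCheck(
  prev: DowntimeState,
  isUp: boolean,
  now: number,
  gracePeriodMs = DEFAULT_GRACE_PERIOD_MS,
): { state: DowntimeState; shouldAlert: boolean } {
  if (isUp) {
    // Recovery resets the outage so the next failure starts a fresh grace period.
    return { state: { downSince: null, alertSent: false }, shouldAlert: false };
  }
  const downSince = prev.downSince ?? now;
  const shouldAlert = !prev.alertSent && now - downSince >= gracePeriodMs;
  return {
    state: { downSince, alertSent: prev.alertSent || shouldAlert },
    shouldAlert,
  };
}
```

Passing the clock in as `now` keeps the function pure, so the grace-period behaviour can be tested without waiting five minutes. Because the state lives in memory, it is lost when the exporter restarts.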

Common Development Tasks

Adding a New Service to Monitor

Edit server/endpoints.json to add the service configuration
Include URL, name, group, and optionally expectedResponse/body/headers
Restart the exporter to pick up changes
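
As a rough guide to what an entry might contain, the sketch below models one endpoint as a TypeScript type and adds a small pre-flight check. The key names (url, name, group, expectedResponse, body, headers) are guessed from the field list above, so compare them with existing entries in server/endpoints.json before relying on them. The example service and the findEndpointProblems helper are hypothetical.

```typescript
// Assumed shape of one entry in server/endpoints.json; key names are guesses
// based on the field list above and may differ from the real file.
interface EndpointConfig {
  url: string;
  name: string;
  group: string;
  expectedResponse?: unknown;
  body?: unknown;
  headers?: Record<string, string>;
}

// Hypothetical example entry (not a real service).
const exampleEndpoint: EndpointConfig = {
  name: "Example API",
  group: "internal",
  url: "https://example.com/health",
  headers: { Accept: "application/json" },
  expectedResponse: { status: "ok" },
};

// Returns a list of problems; an empty list means the required fields look usable.
function findEndpointProblems(entry: Partial<EndpointConfig>): string[] {
  const problems: string[] = [];
  for (const key of ["url", "name", "group"] as const) {
    const value = entry[key];
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`missing or empty "${key}"`);
    }
  }
  const url = entry.url;
  if (typeof url === "string" && url.trim() !== "" && !/^https?:\/\//.test(url)) {
    problems.push('"url" should start with http:// or https://');
  }
  return problems;
}
```

Running a check like this before restarting the exporter can catch a missing field early.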

Modifying UI Components

Main components: components/UptimeMonitor.tsx, components/LatencyMonitor.tsx
Use existing UI components from components/ui/ (shadcn/ui)
Follow the established pattern of using TypeScript interfaces in types/

Working with Metrics

Metrics are exposed at http://localhost:3002/metrics
Two main metrics: service_up (1 = up, 0 = down) and service_response_time (in milliseconds)
Prometheus queries are constructed in the API routes and hooks
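
For orientation, a query against Prometheus's HTTP API could be built and parsed roughly as shown below. The /api/v1/query endpoint and the response layout are standard Prometheus. The helper names and the 30-day uptime expression are illustrative assumptions and may not match what the API routes and hooks actually do.

```typescript
// Sketch only: helper names and the example PromQL expression are illustrative.
const DEFAULT_PROMETHEUS_URL = "http://localhost:9090";

// Build an instant-query URL for the Prometheus HTTP API (/api/v1/query).
function buildInstantQueryUrl(promql: string, baseUrl = DEFAULT_PROMETHEUS_URL): string {
  const url = new URL("/api/v1/query", baseUrl);
  url.searchParams.set("query", promql); // URLSearchParams handles the escaping
  return url.toString();
}

// Layout of an instant-vector response from Prometheus.
interface InstantVectorResponse {
  status: string;
  data: {
    resultType: string;
    result: { metric: Record<string, string>; value: [number, string] }[];
  };
}

// Convert each sample to { labels, value }; Prometheus encodes sample values as strings.
function parseInstantVector(res: InstantVectorResponse): { labels: Record<string, string>; value: number }[] {
  if (res.status !== "success") {
    throw new Error(`Prometheus query failed with status "${res.status}"`);
  }
  return res.data.result.map((sample) => ({
    labels: sample.metric,
    value: Number(sample.value[1]),
  }));
}

// service_up is 0 or 1, so its average over a window is the fraction of time up.
const uptimeQuery = "avg_over_time(service_up[30d])";
```

A caller would fetch buildInstantQueryUrl(uptimeQuery), read the JSON body, and pass it to parseInstantVector.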

Environment Configuration

Key environment variables (set in .env.local for development):

SLACK_WEBHOOK_URL: For downtime notifications
PROMETHEUS_URL: Default is http://localhost:9090
GRACE_PERIOD_MS: Alert delay in milliseconds (default: 300000 ms = 5 minutes)
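
One way to read these variables with the documented defaults is sketched below. The MonitorConfig type and the loadConfig function are hypothetical. The fallback to the default for an unparsable GRACE_PERIOD_MS is an assumption and may differ from the project's behaviour.

```typescript
// Hypothetical sketch: type and function names are assumptions.
interface MonitorConfig {
  slackWebhookUrl: string | undefined; // undefined means Slack notifications are not configured
  prometheusUrl: string;
  gracePeriodMs: number;
}

const DEFAULT_PROM_URL = "http://localhost:9090";
const DEFAULT_GRACE_MS = 300000; // 5 minutes

// Accepts any env-like object so it can be called with process.env or a test fixture.
function loadConfig(env: Record<string, string | undefined>): MonitorConfig {
  const rawGrace = env.GRACE_PERIOD_MS;
  const parsedGrace = Number(rawGrace);
  const graceIsValid =
    rawGrace !== undefined && rawGrace.trim() !== "" &&
    Number.isFinite(parsedGrace) && parsedGrace >= 0;
  return {
    slackWebhookUrl: env.SLACK_WEBHOOK_URL || undefined,
    prometheusUrl: env.PROMETHEUS_URL || DEFAULT_PROM_URL,
    // Assumption: fall back to the default when the value is missing or invalid.
    gracePeriodMs: graceIsValid ? parsedGrace : DEFAULT_GRACE_MS,
  };
}
```

In the exporter this would typically be called once at startup as loadConfig(process.env).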
