Hancock ships with a full observability stack: structured logging, Prometheus metrics, Grafana dashboards, and alerting rules.
- Architecture Overview
- Prometheus Metrics
- Grafana Dashboards
- Alerting Rules
- Health Checks
- Structured Logging
- Local Stack Setup
```
Hancock (port 5000)
└── /metrics ──────────► Prometheus (port 9090)
                          ├──────────► Grafana (port 3000)
                          └──────────► Alertmanager
```
All monitoring code lives in `monitoring/`:

| File | Purpose |
|---|---|
| `monitoring/metrics_exporter.py` | Prometheus metric definitions and helpers |
| `monitoring/health_check.py` | Deep health checks with 30 s TTL caching |
| `monitoring/logging_config.py` | Structured JSON logging with request-ID correlation |
| `monitoring/prometheus_dashboard.py` | Programmatic Grafana dashboard generator |
| `monitoring/alerting_rules.yaml` | Prometheus alerting rule groups |
| `monitoring/grafana_dashboard.json` | Pre-built Grafana dashboard (generated) |
Metrics are exposed at GET `/metrics` and collected by `monitoring/metrics_exporter.py`. The `/metrics` endpoint exposes four core counters:
| Metric | Type | Labels | Description |
|---|---|---|---|
| `hancock_requests_total` | Counter | — | Total HTTP requests |
| `hancock_errors_total` | Counter | — | Total 4xx/5xx errors |
| `hancock_requests_by_endpoint` | Counter | `endpoint` | Requests per endpoint |
| `hancock_requests_by_mode` | Counter | `mode` | Requests per specialist mode |
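For illustration, the four core counters can be reproduced with the `prometheus_client` library. This is a hypothetical sketch (the `record` helper and registry wiring are assumptions, not the actual contents of `monitoring/metrics_exporter.py`):

```python
# Hypothetical sketch of the four core counters using prometheus_client;
# the real definitions in monitoring/metrics_exporter.py may differ.
from prometheus_client import CollectorRegistry, Counter, generate_latest

registry = CollectorRegistry()

REQUESTS_TOTAL = Counter(
    "hancock_requests_total", "Total HTTP requests", registry=registry)
ERRORS_TOTAL = Counter(
    "hancock_errors_total", "Total 4xx/5xx errors", registry=registry)
REQUESTS_BY_ENDPOINT = Counter(
    "hancock_requests_by_endpoint", "Requests per endpoint",
    ["endpoint"], registry=registry)
REQUESTS_BY_MODE = Counter(
    "hancock_requests_by_mode", "Requests per specialist mode",
    ["mode"], registry=registry)

def record(endpoint: str, mode: str, status: int) -> None:
    """Increment the counters for one handled request."""
    REQUESTS_TOTAL.inc()
    REQUESTS_BY_ENDPOINT.labels(endpoint=endpoint).inc()
    REQUESTS_BY_MODE.labels(mode=mode).inc()
    if status >= 400:
        ERRORS_TOTAL.inc()

record("/v1/chat", "soc", 200)
exposition = generate_latest(registry).decode()  # Prometheus text format
```

Note that `prometheus_client` appends a `_total` suffix to counter sample names, so labelled counters appear in the exposition as e.g. `hancock_requests_by_endpoint_total{endpoint="/v1/chat"}`.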
`monitoring/metrics_exporter.py` defines additional metrics (histograms, gauges) that become available when wired into the agent via middleware:
| Metric | Type | Labels | Description |
|---|---|---|---|
| `hancock_request_duration_seconds` | Histogram | `method`, `endpoint`, `status_code` | HTTP request latency |
| `hancock_model_response_time_seconds` | Histogram | `model`, `operation` | LLM model response time |
| `hancock_rate_limit_exceeded_total` | Counter | `endpoint`, `client_id` | Rate limit violations |
| `hancock_memory_usage_bytes` | Gauge | — | Process memory usage |
| `hancock_active_connections` | Gauge | — | Current active connections |
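A hedged sketch of how the latency histogram and memory gauge above could be defined and updated (the `timed` context manager is an assumption for illustration; the real middleware wiring may differ):

```python
# Hedged sketch: defining the labelled latency histogram and memory gauge.
import time
from contextlib import contextmanager
from prometheus_client import CollectorRegistry, Gauge, Histogram, generate_latest

registry = CollectorRegistry()

REQUEST_DURATION = Histogram(
    "hancock_request_duration_seconds", "HTTP request latency",
    ["method", "endpoint", "status_code"], registry=registry)
MEMORY_USAGE = Gauge(
    "hancock_memory_usage_bytes", "Process memory usage", registry=registry)

@contextmanager
def timed(method: str, endpoint: str, status_code: str):
    """Observe the wrapped block's wall time under the labelled histogram."""
    start = time.perf_counter()
    try:
        yield
    finally:
        REQUEST_DURATION.labels(
            method=method, endpoint=endpoint, status_code=status_code
        ).observe(time.perf_counter() - start)

with timed("POST", "/v1/chat", "200"):
    pass  # handle the request here

MEMORY_USAGE.set(512 * 1024 * 1024)  # real code might read this from psutil

exposition = generate_latest(registry).decode()
```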
Add Hancock to your `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'hancock'
    static_configs:
      - targets: ['hancock:5000']
    scrape_interval: 15s
    metrics_path: /metrics
```

The Kubernetes `service.yaml` includes Prometheus annotations so the Prometheus Kubernetes SD will auto-discover Hancock pods:
```yaml
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "5000"
  prometheus.io/path: "/metrics"
```

```python
from monitoring.metrics_exporter import track_request, track_model_call

# Track an HTTP request (context manager)
with track_request(endpoint="/chat", method="POST"):
    ...

# Track a model response
with track_model_call(model="llama3.1:8b", operation="pentest"):
    response = llm.chat(...)
```

The pre-built dashboard is at `monitoring/grafana_dashboard.json`. Import it directly into Grafana:
1. Open Grafana → Dashboards → Import
2. Upload `monitoring/grafana_dashboard.json`
3. Select your Prometheus data source
4. Click Import
The dashboard is generated from `monitoring/prometheus_dashboard.py`:

```shell
python monitoring/prometheus_dashboard.py
# Outputs monitoring/grafana_dashboard.json
```

The dashboard contains 10 panels:
| Panel | Visualization | Query |
|---|---|---|
| Request Rate | Time series | rate(hancock_requests_total[2m]) |
| Error Rate % | Time series | rate(hancock_errors_total[2m]) / rate(hancock_requests_total[2m]) * 100 |
| Requests by Endpoint | Time series | hancock_requests_by_endpoint |
| Requests by Mode | Time series | hancock_requests_by_mode |
| Memory Usage | Time series | hancock_memory_usage_bytes |
| Active Connections | Time series | hancock_active_connections |
| Total Requests (stat) | Stat | hancock_requests_total |
| Total Errors (stat) | Stat | hancock_errors_total |
| Current Memory (stat) | Stat | hancock_memory_usage_bytes |
| Active Connections (stat) | Stat | hancock_active_connections |
Dashboard refresh interval is 30 s.
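The generator script isn't reproduced here; a minimal sketch of the build-panels-and-dump-JSON approach it presumably takes (panel field names and the subset of queries below are illustrative, not the script's exact output):

```python
# Illustrative sketch of programmatic Grafana dashboard generation:
# build plain dicts for each panel, then serialise the whole dashboard.
import json

def timeseries_panel(title: str, expr: str, panel_id: int) -> dict:
    """A minimal Grafana time-series panel definition."""
    return {
        "id": panel_id,
        "title": title,
        "type": "timeseries",
        "targets": [{"expr": expr, "refId": "A"}],
    }

PANELS = [
    ("Request Rate", "rate(hancock_requests_total[2m])"),
    ("Error Rate %",
     "rate(hancock_errors_total[2m]) / rate(hancock_requests_total[2m]) * 100"),
    ("Memory Usage", "hancock_memory_usage_bytes"),
]

dashboard = {
    "title": "Hancock Observability",
    "refresh": "30s",  # matches the 30 s refresh interval noted above
    "panels": [timeseries_panel(t, q, i + 1) for i, (t, q) in enumerate(PANELS)],
}

dashboard_json = json.dumps(dashboard, indent=2)
```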
Alert rules are defined in `monitoring/alerting_rules.yaml` and organised into three groups.

```yaml
# prometheus.yml
rule_files:
  - /etc/prometheus/alerting_rules.yaml
```

| Alert | Condition | Severity | Description |
|---|---|---|---|
| `HancockHighErrorRate` | Error rate > 5% over 5 min | warning | Too many errors |

| Alert | Condition | Severity | Description |
|---|---|---|---|
| `HancockNoTraffic` | No requests for 5 min | critical | Service may be down |

| Alert | Condition | Severity | Description |
|---|---|---|---|
| `HancockMemoryGrowth` | Memory growing > 50 MiB/min over 10 min | warning | Possible memory leak |
| `HancockHighMemoryUsage` | Absolute memory > 1 GiB | critical | Memory ceiling breached |
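The exact rule definitions live in `monitoring/alerting_rules.yaml`; as a hedged illustration, a rule matching the `HancockHighErrorRate` row above could look like this (the group name, PromQL expression, and annotation text are assumptions, not the file's actual contents):

```yaml
groups:
  - name: hancock_errors   # group name is illustrative
    rules:
      - alert: HancockHighErrorRate
        expr: rate(hancock_errors_total[5m]) / rate(hancock_requests_total[5m]) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Hancock error rate above 5% for the last 5 minutes"
```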
Configure Alertmanager receivers to route alerts to Slack, PagerDuty, or email. Example routing:
```yaml
route:
  receiver: 'slack-critical'
  group_by: ['alertname', 'severity']
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty'
    - match:
        severity: warning
      receiver: 'slack-warnings'
```

`monitoring/health_check.py` provides deep health checks with 30 s TTL caching on the GET `/health` endpoint.
| Component | Check | Thresholds |
|---|---|---|
| Ollama | HTTP reachability + model list | — |
| NVIDIA NIM | API reachability | — |
| Memory | Available system memory | Warn < 512 MiB |
| Disk | Available disk space | Warn < 1 GiB |
| Prometheus | Metrics endpoint reachability | — |
```json
{
  "status": "ok",
  "checks": {
    "ollama": { "status": "ok", "latency_ms": 12 },
    "memory": { "status": "ok", "detail": "available_mb=4096" },
    "disk": { "status": "ok", "detail": "available_gb=42" }
  }
}
```

Statuses: `ok` | `degraded` | `error`.
HTTP status codes: 200 (ok/degraded), 503 (error).
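The 30 s TTL cache and the status-to-HTTP-code mapping can be sketched as follows; this is a simplified stand-in (names like `run_checks` and the cache layout are assumptions), not the actual structure of `monitoring/health_check.py`:

```python
# Hedged sketch of a 30 s TTL cache around deep health checks.
import time
from typing import Optional

TTL_SECONDS = 30.0
_cache: dict = {"expires": 0.0, "result": None}

def run_checks() -> dict:
    """Stand-in for the real deep checks (Ollama, NIM, memory, disk, ...)."""
    return {"status": "ok", "checks": {"memory": {"status": "ok"}}}

def health(now: Optional[float] = None) -> dict:
    """Return the cached result, re-running checks only after the TTL."""
    now = time.monotonic() if now is None else now
    if _cache["result"] is None or now >= _cache["expires"]:
        _cache["result"] = run_checks()
        _cache["expires"] = now + TTL_SECONDS
    return _cache["result"]

def http_status_for(result: dict) -> int:
    """200 for ok/degraded, 503 for error (the endpoint's contract)."""
    return 503 if result["status"] == "error" else 200
```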
`monitoring/logging_config.py` emits structured JSON logs with automatic request-ID injection.

```json
{
  "timestamp": "2024-01-15T10:23:45.123Z",
  "level": "INFO",
  "request_id": "req-a1b2c3d4",
  "message": "request_completed",
  "event": "request_completed",
  "method": "POST",
  "endpoint": "/v1/chat",
  "mode": "soc",
  "status": 200,
  "latency_ms": 142.31
}
```

```python
from monitoring.logging_config import configure_logging

configure_logging(app, log_level="INFO")
```

The `request_id` is generated per request and injected into every log line via `RequestIdFilter`. Noisy third-party libraries (`urllib3`, `werkzeug`, `httpx`) are silenced by default.
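A filter-plus-formatter pair of this kind can be sketched with the standard `logging` module; this is a minimal hypothetical version (the real `RequestIdFilter` and formatter emit more fields, as shown in the log line above):

```python
# Hedged sketch of a request-ID filter and JSON formatter using stdlib logging.
import json
import logging
import uuid

_current_request_id = "-"  # the real code tracks this per request

class RequestIdFilter(logging.Filter):
    """Attach the active request ID to every LogRecord."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = _current_request_id
        return True

class JsonFormatter(logging.Formatter):
    """Render records as single-line JSON objects."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "request_id": getattr(record, "request_id", "-"),
            "message": record.getMessage(),
        })

logger = logging.getLogger("hancock-demo")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.addFilter(RequestIdFilter())
logger.setLevel(logging.INFO)

_current_request_id = f"req-{uuid.uuid4().hex[:8]}"
logger.info("request_completed")  # emitted as one JSON line with request_id
```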
When Flask logging hooks are enabled with `init_flask_logging(app)`:

- Every inbound request reads `X-Request-ID` if present; otherwise Hancock creates a UUIDv4.
- `request_started` and `request_completed` JSON log events are emitted with `endpoint`, `mode` (from the JSON payload where available), `status` (after the request), `latency_ms` (after the request), and `request_id`.
- The active request ID is echoed in every API response header as `X-Request-ID`.
- All API error responses include both:

  ```json
  {
    "error": "message required",
    "request_id": "7b3bbf54-d0a4-4db2-9d79-b93f8f2cd67d"
  }
  ```

- Webhook failures (invalid signature, empty model response, Slack/Teams notification errors) are logged with explicit `request_id` and structured `event` fields for ingestion by Grafana/Loki/ELK.
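The reuse-or-mint behaviour for `X-Request-ID` can be sketched framework-free; the functions below are illustrative stand-ins for what `init_flask_logging(app)` wires into Flask's request hooks:

```python
# Hedged sketch of the request-ID flow: reuse an inbound X-Request-ID if
# present, otherwise mint a UUIDv4, and echo it back in the response.
import uuid

def resolve_request_id(headers: dict) -> str:
    """Reuse the caller's X-Request-ID if present, else create a UUIDv4."""
    return headers.get("X-Request-ID") or str(uuid.uuid4())

def handle(headers: dict) -> dict:
    """Simulate one request cycle and return the response headers."""
    request_id = resolve_request_id(headers)
    # ... emit request_started / request_completed events with request_id ...
    return {"X-Request-ID": request_id}
```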
Set via the `LOG_LEVEL` environment variable (`DEBUG`, `INFO`, `WARNING`, `ERROR`). Default: `INFO`.
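A minimal sketch of applying `LOG_LEVEL` with the documented `INFO` default (the `resolve_log_level` helper is illustrative, not the module's actual API):

```python
# Hedged sketch: map LOG_LEVEL to a logging constant, defaulting to INFO.
import logging
import os

def resolve_log_level(value):
    """Return the logging constant for a LOG_LEVEL string; INFO by default."""
    name = (value or "INFO").upper()
    level = getattr(logging, name, None)
    return level if isinstance(level, int) else logging.INFO

logging.getLogger("hancock").setLevel(
    resolve_log_level(os.environ.get("LOG_LEVEL")))
```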
The full observability stack (Hancock + Prometheus + Grafana) is defined in `deploy/docker-compose.yml`:

```shell
cd deploy
docker compose up -d

# Access
# Hancock:    http://localhost:5000
# Prometheus: http://localhost:9090
# Grafana:    http://localhost:3000 (admin / admin)
```

Import `monitoring/grafana_dashboard.json` into Grafana on first run to get the pre-built dashboard.