Hancock ships with a full observability stack: structured logging, Prometheus metrics, Grafana dashboards, and alerting rules.
- Architecture Overview
- Prometheus Metrics
- Grafana Dashboards
- Alerting Rules
- Health Checks
- Structured Logging
- Local Stack Setup
```
Hancock (port 5000)
└── /metrics ──────────► Prometheus (port 9090)
                          ├──────────► Grafana (port 3000)
                          └──────────► Alertmanager
```
All monitoring code lives in `monitoring/`:

| File | Purpose |
|---|---|
| `monitoring/metrics_exporter.py` | Prometheus metric definitions and helpers |
| `monitoring/health_check.py` | Deep health checks with 30 s TTL caching |
| `monitoring/logging_config.py` | Structured JSON logging with request-ID correlation |
| `monitoring/prometheus_dashboard.py` | Programmatic Grafana dashboard generator |
| `monitoring/alerting_rules.yaml` | Prometheus alerting rule groups |
| `monitoring/grafana_dashboard.json` | Pre-built Grafana dashboard (generated) |
Metrics are exposed at GET `/metrics` and collected by `monitoring/metrics_exporter.py`. The `/metrics` endpoint exposes four core counters:
| Metric | Type | Labels | Description |
|---|---|---|---|
| `hancock_requests_total` | Counter | — | Total HTTP requests |
| `hancock_errors_total` | Counter | — | Total 4xx/5xx errors |
| `hancock_requests_by_endpoint` | Counter | `endpoint` | Requests per endpoint |
| `hancock_requests_by_mode` | Counter | `mode` | Requests per specialist mode |
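For illustration, the four core counters can be reproduced with the `prometheus_client` library. This is a hypothetical sketch (the `record` helper and registry wiring are assumptions, not the actual contents of `monitoring/metrics_exporter.py`):

```python
# Hypothetical sketch of the four core counters using prometheus_client;
# the real definitions in monitoring/metrics_exporter.py may differ.
from prometheus_client import CollectorRegistry, Counter, generate_latest

registry = CollectorRegistry()

REQUESTS_TOTAL = Counter(
    "hancock_requests_total", "Total HTTP requests", registry=registry)
ERRORS_TOTAL = Counter(
    "hancock_errors_total", "Total 4xx/5xx errors", registry=registry)
REQUESTS_BY_ENDPOINT = Counter(
    "hancock_requests_by_endpoint", "Requests per endpoint",
    ["endpoint"], registry=registry)
REQUESTS_BY_MODE = Counter(
    "hancock_requests_by_mode", "Requests per specialist mode",
    ["mode"], registry=registry)

def record(endpoint: str, mode: str, status: int) -> None:
    """Increment the counters for one handled request."""
    REQUESTS_TOTAL.inc()
    REQUESTS_BY_ENDPOINT.labels(endpoint=endpoint).inc()
    REQUESTS_BY_MODE.labels(mode=mode).inc()
    if status >= 400:
        ERRORS_TOTAL.inc()

record("/v1/chat", "soc", 200)
exposition = generate_latest(registry).decode()  # Prometheus text format
```

Note that `prometheus_client` appends a `_total` suffix to counter sample names, so labelled counters appear in the exposition as e.g. `hancock_requests_by_endpoint_total{endpoint="/v1/chat"}`.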
`monitoring/metrics_exporter.py` defines additional metrics (histograms, gauges) that become available when wired into the agent via middleware:
| Metric | Type | Labels | Description |
|---|---|---|---|
| `hancock_request_duration_seconds` | Histogram | `method`, `endpoint`, `status_code` | HTTP request latency |
| `hancock_model_response_time_seconds` | Histogram | `model`, `operation` | LLM model response time |
| `hancock_rate_limit_exceeded_total` | Counter | `endpoint`, `client_id` | Rate limit violations |
| `hancock_memory_usage_bytes` | Gauge | — | Process memory usage |
| `hancock_active_connections` | Gauge | — | Current active connections |
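A hedged sketch of how the latency histogram and memory gauge above could be defined and updated (the `timed` context manager is an assumption for illustration; the real middleware wiring may differ):

```python
# Hedged sketch: defining the labelled latency histogram and memory gauge.
import time
from contextlib import contextmanager
from prometheus_client import CollectorRegistry, Gauge, Histogram, generate_latest

registry = CollectorRegistry()

REQUEST_DURATION = Histogram(
    "hancock_request_duration_seconds", "HTTP request latency",
    ["method", "endpoint", "status_code"], registry=registry)
MEMORY_USAGE = Gauge(
    "hancock_memory_usage_bytes", "Process memory usage", registry=registry)

@contextmanager
def timed(method: str, endpoint: str, status_code: str):
    """Observe the wrapped block's wall time under the labelled histogram."""
    start = time.perf_counter()
    try:
        yield
    finally:
        REQUEST_DURATION.labels(
            method=method, endpoint=endpoint, status_code=status_code
        ).observe(time.perf_counter() - start)

with timed("POST", "/v1/chat", "200"):
    pass  # handle the request here

MEMORY_USAGE.set(512 * 1024 * 1024)  # real code might read this from psutil

exposition = generate_latest(registry).decode()
```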
Add Hancock to your `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'hancock'
    static_configs:
      - targets: ['hancock:5000']
    scrape_interval: 15s
    metrics_path: /metrics
```

The Kubernetes `service.yaml` includes Prometheus annotations so the Prometheus Kubernetes SD will auto-discover Hancock pods:
```yaml
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "5000"
  prometheus.io/path: "/metrics"
```

```python
from monitoring.metrics_exporter import track_request, track_model_call

# Track an HTTP request (context manager)
with track_request(endpoint="/chat", method="POST"):
    ...

# Track a model response
with track_model_call(model="llama3.1:8b", operation="pentest"):
    response = llm.chat(...)
```

The pre-built dashboard is at `monitoring/grafana_dashboard.json`. Import it directly into Grafana:
1. Open Grafana → Dashboards → Import
2. Upload `monitoring/grafana_dashboard.json`
3. Select your Prometheus data source
4. Click Import
The dashboard is generated from `monitoring/prometheus_dashboard.py`:

```shell
python monitoring/prometheus_dashboard.py
# Outputs monitoring/grafana_dashboard.json
```

The dashboard contains 10 panels:
| Panel | Visualization | Query |
|---|---|---|
| Request Rate | Time series | rate(hancock_requests_total[2m]) |
| Error Rate % | Time series | rate(hancock_errors_total[2m]) / rate(hancock_requests_total[2m]) * 100 |
| Requests by Endpoint | Time series | hancock_requests_by_endpoint |
| Requests by Mode | Time series | hancock_requests_by_mode |
| Memory Usage | Time series | hancock_memory_usage_bytes |
| Active Connections | Time series | hancock_active_connections |
| Total Requests (stat) | Stat | hancock_requests_total |
| Total Errors (stat) | Stat | hancock_errors_total |
| Current Memory (stat) | Stat | hancock_memory_usage_bytes |
| Active Connections (stat) | Stat | hancock_active_connections |
Dashboard refresh interval is 30 s.
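The generator script isn't reproduced here; a minimal sketch of the build-panels-and-dump-JSON approach it presumably takes (panel field names and the subset of queries below are illustrative, not the script's exact output):

```python
# Illustrative sketch of programmatic Grafana dashboard generation:
# build plain dicts for each panel, then serialise the whole dashboard.
import json

def timeseries_panel(title: str, expr: str, panel_id: int) -> dict:
    """A minimal Grafana time-series panel definition."""
    return {
        "id": panel_id,
        "title": title,
        "type": "timeseries",
        "targets": [{"expr": expr, "refId": "A"}],
    }

PANELS = [
    ("Request Rate", "rate(hancock_requests_total[2m])"),
    ("Error Rate %",
     "rate(hancock_errors_total[2m]) / rate(hancock_requests_total[2m]) * 100"),
    ("Memory Usage", "hancock_memory_usage_bytes"),
]

dashboard = {
    "title": "Hancock Observability",
    "refresh": "30s",  # matches the 30 s refresh interval noted above
    "panels": [timeseries_panel(t, q, i + 1) for i, (t, q) in enumerate(PANELS)],
}

dashboard_json = json.dumps(dashboard, indent=2)
```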
Alert rules are defined in `monitoring/alerting_rules.yaml` and organised into three groups.

```yaml
# prometheus.yml
rule_files:
  - /etc/prometheus/alerting_rules.yaml
```

| Alert | Condition | Severity | Description |
|---|---|---|---|
| `HancockHighErrorRate` | Error rate > 5% over 5 min | warning | Too many errors |

| Alert | Condition | Severity | Description |
|---|---|---|---|
| `HancockNoTraffic` | No requests for 5 min | critical | Service may be down |

| Alert | Condition | Severity | Description |
|---|---|---|---|
| `HancockMemoryGrowth` | Memory growing > 50 MiB/min over 10 min | warning | Possible memory leak |
| `HancockHighMemoryUsage` | Absolute memory > 1 GiB | critical | Memory ceiling breached |
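The exact rule definitions live in `monitoring/alerting_rules.yaml`; as a hedged illustration, a rule matching the `HancockHighErrorRate` row above could look like this (the group name, PromQL expression, and annotation text are assumptions, not the file's actual contents):

```yaml
groups:
  - name: hancock_errors   # group name is illustrative
    rules:
      - alert: HancockHighErrorRate
        expr: rate(hancock_errors_total[5m]) / rate(hancock_requests_total[5m]) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Hancock error rate above 5% for the last 5 minutes"
```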
Configure Alertmanager receivers to route alerts to Slack, PagerDuty, or email. Example routing:
```yaml
route:
  receiver: 'slack-critical'
  group_by: ['alertname', 'severity']
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty'
    - match:
        severity: warning
      receiver: 'slack-warnings'
```

`monitoring/health_check.py` provides deep health checks with 30 s TTL caching on the GET `/health` endpoint.
| Component | Check | Thresholds |
|---|---|---|
| Ollama | HTTP reachability + model list | — |
| NVIDIA NIM | API reachability | — |
| Memory | Available system memory | Warn < 512 MiB |
| Disk | Available disk space | Warn < 1 GiB |
| Prometheus | Metrics endpoint reachability | — |
```json
{
  "status": "ok",
  "checks": {
    "ollama": { "status": "ok", "latency_ms": 12 },
    "memory": { "status": "ok", "detail": "available_mb=4096" },
    "disk": { "status": "ok", "detail": "available_gb=42" }
  }
}
```

Statuses: `ok` | `degraded` | `error`.
HTTP status codes: 200 (ok/degraded), 503 (error).
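The 30 s TTL cache and the status-to-HTTP-code mapping can be sketched as follows; this is a simplified stand-in (names like `run_checks` and the cache layout are assumptions), not the actual structure of `monitoring/health_check.py`:

```python
# Hedged sketch of a 30 s TTL cache around deep health checks.
import time
from typing import Optional

TTL_SECONDS = 30.0
_cache: dict = {"expires": 0.0, "result": None}

def run_checks() -> dict:
    """Stand-in for the real deep checks (Ollama, NIM, memory, disk, ...)."""
    return {"status": "ok", "checks": {"memory": {"status": "ok"}}}

def health(now: Optional[float] = None) -> dict:
    """Return the cached result, re-running checks only after the TTL."""
    now = time.monotonic() if now is None else now
    if _cache["result"] is None or now >= _cache["expires"]:
        _cache["result"] = run_checks()
        _cache["expires"] = now + TTL_SECONDS
    return _cache["result"]

def http_status_for(result: dict) -> int:
    """200 for ok/degraded, 503 for error (the endpoint's contract)."""
    return 503 if result["status"] == "error" else 200
```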
`monitoring/logging_config.py` emits structured JSON logs with automatic request-ID injection.

```json
{
  "timestamp": "2024-01-15T10:23:45.123Z",
  "level": "INFO",
  "request_id": "req-a1b2c3d4",
  "message": "request_completed",
  "event": "request_completed",
  "method": "POST",
  "endpoint": "/v1/chat",
  "mode": "soc",
  "status": 200,
  "latency_ms": 142.31
}
```

```python
from monitoring.logging_config import configure_logging

configure_logging(app, log_level="INFO")
```

The `request_id` is generated per request and injected into every log line via `RequestIdFilter`. Noisy third-party libraries (`urllib3`, `werkzeug`, `httpx`) are silenced by default.
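A filter-plus-formatter pair of this kind can be sketched with the standard `logging` module; this is a minimal hypothetical version (the real `RequestIdFilter` and formatter emit more fields, as shown in the log line above):

```python
# Hedged sketch of a request-ID filter and JSON formatter using stdlib logging.
import json
import logging
import uuid

_current_request_id = "-"  # the real code tracks this per request

class RequestIdFilter(logging.Filter):
    """Attach the active request ID to every LogRecord."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = _current_request_id
        return True

class JsonFormatter(logging.Formatter):
    """Render records as single-line JSON objects."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "request_id": getattr(record, "request_id", "-"),
            "message": record.getMessage(),
        })

logger = logging.getLogger("hancock-demo")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.addFilter(RequestIdFilter())
logger.setLevel(logging.INFO)

_current_request_id = f"req-{uuid.uuid4().hex[:8]}"
logger.info("request_completed")  # emitted as one JSON line with request_id
```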
When Flask logging hooks are enabled with `init_flask_logging(app)`:

- Every inbound request reads `X-Request-ID` if present; otherwise Hancock creates a UUIDv4.
- `request_started` and `request_completed` JSON log events are emitted with `endpoint`, `mode` (from the JSON payload where available), `status` (after the request), `latency_ms` (after the request), and `request_id`.
- The active request ID is echoed in every API response header as `X-Request-ID`.
- All API error responses include both:

  ```json
  {
    "error": "message required",
    "request_id": "7b3bbf54-d0a4-4db2-9d79-b93f8f2cd67d"
  }
  ```

- Webhook failures (invalid signature, empty model response, Slack/Teams notification errors) are logged with explicit `request_id` and structured `event` fields for ingestion by Grafana/Loki/ELK.
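The reuse-or-mint behaviour for `X-Request-ID` can be sketched framework-free; the functions below are illustrative stand-ins for what `init_flask_logging(app)` wires into Flask's request hooks:

```python
# Hedged sketch of the request-ID flow: reuse an inbound X-Request-ID if
# present, otherwise mint a UUIDv4, and echo it back in the response.
import uuid

def resolve_request_id(headers: dict) -> str:
    """Reuse the caller's X-Request-ID if present, else create a UUIDv4."""
    return headers.get("X-Request-ID") or str(uuid.uuid4())

def handle(headers: dict) -> dict:
    """Simulate one request cycle and return the response headers."""
    request_id = resolve_request_id(headers)
    # ... emit request_started / request_completed events with request_id ...
    return {"X-Request-ID": request_id}
```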
Set via the `LOG_LEVEL` environment variable (`DEBUG`, `INFO`, `WARNING`, `ERROR`). Default: `INFO`.
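A minimal sketch of applying `LOG_LEVEL` with the documented `INFO` default (the `resolve_log_level` helper is illustrative, not the module's actual API):

```python
# Hedged sketch: map LOG_LEVEL to a logging constant, defaulting to INFO.
import logging
import os

def resolve_log_level(value):
    """Return the logging constant for a LOG_LEVEL string; INFO by default."""
    name = (value or "INFO").upper()
    level = getattr(logging, name, None)
    return level if isinstance(level, int) else logging.INFO

logging.getLogger("hancock").setLevel(
    resolve_log_level(os.environ.get("LOG_LEVEL")))
```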
The full observability stack (Hancock + Prometheus + Grafana) is defined in `deploy/docker-compose.yml`:

```shell
cd deploy
docker compose up -d

# Access
# Hancock:    http://localhost:5000
# Prometheus: http://localhost:9090
# Grafana:    http://localhost:3000 (admin / admin)
```

Import `monitoring/grafana_dashboard.json` into Grafana on first run to get the pre-built dashboard.