Description
feat: Add Grafana dashboard for monitoring
Provide a pre-built Grafana dashboard for monitoring gatekeeperd instances, with automatic provisioning support for Kubernetes deployments.
Motivation
Gatekeeperd exposes Prometheus metrics (requests, latency, verification failures, relay stats, etc.) but there's no pre-built dashboard to visualize them. Users currently need to build dashboards from scratch.
Proposed Approach
Provide the dashboard in multiple ways to support different deployment scenarios. These options are complementary, not mutually exclusive:
| Layer | What it provides | Who uses it |
|---|---|---|
| Dashboard JSON file | Source of truth, manual import | Everyone |
| Helm ConfigMap | Auto-provisioning via Grafana sidecar | Kubernetes + Grafana Helm chart |
| ServiceMonitor | Auto-discovery of metrics endpoint | Kubernetes + Prometheus Operator |
A typical kube-prometheus-stack user would enable both Helm options. A Docker user would just grab the JSON file.
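For the kube-prometheus-stack case, the combination might look like the values override below. This is a sketch using the value keys proposed in sections 2 and 3 of this issue; the file name, the `monitoring` namespace, and the `release` label value are assumptions about a typical install, not something this chart defines.

```yaml
# values-observability.yaml (hypothetical file name)
grafana:
  dashboard:
    enabled: true
    # Assumed namespace where Grafana runs; leave "" to use the release namespace
    namespace: monitoring
serviceMonitor:
  enabled: true
  interval: 30s
  labels:
    # Assumption: Prometheus selects ServiceMonitors by the stack's release label
    release: kube-prometheus-stack
```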
1. Dashboard JSON file (all users)
Add a standalone dashboard JSON file that can be imported manually:
    dashboards/
      grafana-gatekeeperd.json
This works for any Grafana deployment (Kubernetes, Docker, bare metal).
2. Helm chart ConfigMap with sidecar label (Kubernetes users)
The standard Kubernetes pattern for Grafana dashboard provisioning uses a ConfigMap with a specific label. The Grafana Helm chart (and kube-prometheus-stack) includes a sidecar that watches for ConfigMaps labeled grafana_dashboard: "1" and automatically loads them.
Add to Helm values:
    grafana:
      # Create a ConfigMap with the dashboard for Grafana sidecar auto-discovery
      dashboard:
        enabled: false
        # Label for Grafana sidecar to discover the dashboard
        # Match your Grafana sidecar configuration (default: grafana_dashboard)
        sidecarLabel: grafana_dashboard
        # Namespace where Grafana is deployed (for cross-namespace discovery)
        # Leave empty to create in the release namespace
        namespace: ""
        # Additional labels for the ConfigMap
        labels: {}
        # Additional annotations for the ConfigMap
        annotations: {}

Add template charts/gatekeeperd/templates/grafana-dashboard.yaml:
    {{- if .Values.grafana.dashboard.enabled }}
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: {{ include "gatekeeperd.fullname" . }}-grafana-dashboard
      {{- if .Values.grafana.dashboard.namespace }}
      namespace: {{ .Values.grafana.dashboard.namespace }}
      {{- end }}
      labels:
        {{- include "gatekeeperd.labels" . | nindent 4 }}
        {{ .Values.grafana.dashboard.sidecarLabel }}: "1"
        {{- with .Values.grafana.dashboard.labels }}
        {{- toYaml . | nindent 4 }}
        {{- end }}
      {{- with .Values.grafana.dashboard.annotations }}
      annotations:
        {{- toYaml . | nindent 4 }}
      {{- end }}
    data:
      # .Files.Get resolves paths relative to the chart root, so the
      # dashboard JSON must be packaged inside charts/gatekeeperd/.
      gatekeeperd.json: |-
        {{- .Files.Get "dashboards/grafana-gatekeeperd.json" | nindent 4 }}
    {{- end }}

3. ServiceMonitor for Prometheus Operator (optional, related)
While we're adding observability features, consider also adding a ServiceMonitor for Prometheus Operator users. This is separate from the dashboard but often requested together:
    serviceMonitor:
      enabled: false
      # Namespace for the ServiceMonitor (defaults to release namespace)
      namespace: ""
      # Interval for scraping metrics
      interval: 30s
      # Additional labels for ServiceMonitor (e.g., for Prometheus selection)
      labels: {}

Dashboard Panels
The dashboard should include panels for:
Overview Row
- Request rate (total requests/sec)
- Success rate (2xx/3xx percentage)
- Error rate (4xx/5xx)
- Active relay clients
Request Metrics Row
- Requests by hostname (stacked area)
- Requests by status code (stacked bar)
- Request latency (p50, p95, p99)
- Request latency heatmap
Security Row
- Verification failures by verifier and reason
- IP filter denials by allowlist
- Validation failures
Relay Row (if relay is used)
- Webhooks queued vs delivered
- Relay delivery latency
- Delivery errors by reason
- Pending queue depth (Redis mode)
- Connected clients per token
System Row
- IP ranges loaded per allowlist
- IP range fetch errors
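To make the panel list more concrete, the queries behind a few Overview and Request Metrics panels could look like the Prometheus rule group below. All metric and label names (`gatekeeperd_requests_total`, `gatekeeperd_request_duration_seconds_bucket`, `status`) are hypothetical placeholders; the real names are whatever internal/metrics/metrics.go exports, and the latency metric is assumed to be a histogram.

```yaml
# Sketch only: metric and label names are hypothetical placeholders.
groups:
  - name: gatekeeperd-dashboard-queries
    rules:
      # Overview: total request rate (requests/sec)
      - record: gatekeeperd:requests:rate5m
        expr: sum(rate(gatekeeperd_requests_total[5m]))
      # Overview: share of requests answered with a 2xx/3xx status
      - record: gatekeeperd:requests:success_ratio5m
        expr: |
          sum(rate(gatekeeperd_requests_total{status=~"2..|3.."}[5m]))
          /
          sum(rate(gatekeeperd_requests_total[5m]))
      # Request Metrics: p95 latency; use 0.5 or 0.99 for the other series
      - record: gatekeeperd:request_duration_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(gatekeeperd_request_duration_seconds_bucket[5m]))
          )
```

The same expressions can be pasted directly into panel queries. Recording rules are used here only as a convenient way to write them down and are not part of the proposal.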
Variables
Dashboard should include template variables:
- datasource - Prometheus datasource selector
- hostname - Filter by webhook hostname
- namespace - Kubernetes namespace (for multi-tenant)
- instance - Pod instance selector
Alternatives Considered
Separate Helm chart for dashboard
- Overkill; a single ConfigMap doesn't warrant a separate chart
Grafana API provisioning
- Requires Grafana credentials
- Not the Kubernetes-native approach
- Less portable
Dashboard embedded in docs only
- Harder to keep in sync with metrics changes
- No automatic provisioning
Acceptance Criteria
- Dashboard JSON file at dashboards/grafana-gatekeeperd.json
- Dashboard covers all metrics from internal/metrics/metrics.go
- Helm values for enabling dashboard ConfigMap
- ConfigMap template with sidecar label
- Dashboard uses variables for datasource, hostname, namespace
- Documentation in README or docs/ explaining how to use
- (Optional) ServiceMonitor template for Prometheus Operator
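For the optional ServiceMonitor item, a template driven by the `serviceMonitor` values from section 3 might look roughly like this. It is a sketch, not the chart's actual template: the `gatekeeperd.selectorLabels` helper, the `metrics` port name, and the `/metrics` path are assumptions that would need to match the chart's helpers and Service definition.

```yaml
{{- if .Values.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "gatekeeperd.fullname" . }}
  {{- if .Values.serviceMonitor.namespace }}
  namespace: {{ .Values.serviceMonitor.namespace }}
  {{- end }}
  labels:
    {{- include "gatekeeperd.labels" . | nindent 4 }}
    {{- with .Values.serviceMonitor.labels }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
spec:
  selector:
    matchLabels:
      # Assumed helper name; must match the labels on the gatekeeperd Service
      {{- include "gatekeeperd.selectorLabels" . | nindent 6 }}
  # Needed when the ServiceMonitor lives in a different namespace than the Service
  namespaceSelector:
    matchNames:
      - {{ .Release.Namespace }}
  endpoints:
    - port: metrics    # assumed name of the Service port exposing metrics
      path: /metrics   # assumed metrics path
      interval: {{ .Values.serviceMonitor.interval }}
{{- end }}
```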
Notes
- The Grafana sidecar approach requires Grafana to be configured with sidecar enabled (this is the default in kube-prometheus-stack)
- For cross-namespace dashboard discovery, Grafana's sidecar needs RBAC to list ConfigMaps in other namespaces
- Dashboard JSON should be validated with Grafana's dashboard schema
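On the Grafana side, the sidecar settings these notes refer to could be sketched as the following values for the Grafana Helm chart. The keys shown are the chart's sidecar options as I understand them; verify them against the chart version in use, and treat `searchNamespace: ALL` as just one way to allow cross-namespace discovery.

```yaml
# Values for the Grafana Helm chart.
# Under kube-prometheus-stack, nest this block under the top-level `grafana:` key.
sidecar:
  dashboards:
    enabled: true
    # Must match grafana.dashboard.sidecarLabel in the gatekeeperd chart
    label: grafana_dashboard
    labelValue: "1"
    # Watch ConfigMaps in every namespace; requires cluster-wide read access
    searchNamespace: ALL
```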