feat: Add Grafana dashboard for monitoring #15

@nickmarden

Provide a pre-built Grafana dashboard for monitoring gatekeeperd instances, with automatic provisioning support for Kubernetes deployments.

Motivation

Gatekeeperd exposes Prometheus metrics (requests, latency, verification failures, relay stats, etc.) but there's no pre-built dashboard to visualize them. Users currently need to build dashboards from scratch.

Proposed Approach

Provide the dashboard in multiple ways to support different deployment scenarios. These options are complementary, not mutually exclusive:

| Layer | What it provides | Who uses it |
|---|---|---|
| Dashboard JSON file | Source of truth, manual import | Everyone |
| Helm ConfigMap | Auto-provisioning via Grafana sidecar | Kubernetes + Grafana Helm chart |
| ServiceMonitor | Auto-discovery of metrics endpoint | Kubernetes + Prometheus Operator |

A typical kube-prometheus-stack user would enable both Helm options. A Docker user would just grab the JSON file.

1. Dashboard JSON file (all users)

Add a standalone dashboard JSON file that can be imported manually:

dashboards/
  grafana-gatekeeperd.json

This works for any Grafana deployment (Kubernetes, Docker, bare metal).
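
For Docker or bare-metal setups that prefer not to import through the UI, Grafana's file-based provisioning can load the same JSON at startup. The following is a minimal sketch, not part of the proposal itself: the provider name, the file name, and the mount path are assumptions to adapt to your deployment.

```yaml
# /etc/grafana/provisioning/dashboards/gatekeeperd.yaml (file name is an assumption)
apiVersion: 1
providers:
  - name: gatekeeperd            # arbitrary provider name (assumption)
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30    # how often Grafana rescans the directory
    options:
      # Directory where grafana-gatekeeperd.json is mounted (assumed path)
      path: /var/lib/grafana/dashboards
```

With this in place, mounting the dashboard JSON into the configured directory should be enough for Grafana to pick it up without a manual import.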

2. Helm chart ConfigMap with sidecar label (Kubernetes users)

The standard Kubernetes pattern for Grafana dashboard provisioning uses a ConfigMap with a specific label. The Grafana Helm chart (and kube-prometheus-stack) includes a sidecar that watches for ConfigMaps labeled grafana_dashboard: "1" and automatically loads them.

Add to Helm values:

grafana:
  # Create a ConfigMap with the dashboard for Grafana sidecar auto-discovery
  dashboard:
    enabled: false
    # Label for Grafana sidecar to discover the dashboard
    # Match your Grafana sidecar configuration (default: grafana_dashboard)
    sidecarLabel: grafana_dashboard
    # Namespace where Grafana is deployed (for cross-namespace discovery)
    # Leave empty to create in the release namespace
    namespace: ""
    # Additional labels for the ConfigMap
    labels: {}
    # Additional annotations for the ConfigMap
    annotations: {}

Add template charts/gatekeeperd/templates/grafana-dashboard.yaml:

{{- if .Values.grafana.dashboard.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "gatekeeperd.fullname" . }}-grafana-dashboard
  {{- if .Values.grafana.dashboard.namespace }}
  namespace: {{ .Values.grafana.dashboard.namespace }}
  {{- end }}
  labels:
    {{- include "gatekeeperd.labels" . | nindent 4 }}
    {{ .Values.grafana.dashboard.sidecarLabel }}: "1"
    {{- with .Values.grafana.dashboard.labels }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
  {{- with .Values.grafana.dashboard.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
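data:
  # .Files.Get only reads files inside the chart directory, so the dashboard
  # JSON must be available at charts/gatekeeperd/dashboards/ when packaging.
  gatekeeperd.json: |-
    {{- .Files.Get "dashboards/grafana-gatekeeperd.json" | nindent 4 }}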
{{- end }}
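
As a usage sketch for the values and template above, a user running Grafana in a separate namespace might enable the ConfigMap with an override file like the one below. The file name and the `monitoring` namespace are assumptions for illustration.

```yaml
# values-override.yaml (hypothetical file name)
grafana:
  dashboard:
    enabled: true
    # Must match the label the Grafana sidecar is configured to watch
    sidecarLabel: grafana_dashboard
    # Namespace where Grafana runs (assumed); omit to use the release namespace
    namespace: monitoring
```

This could then be applied with `helm upgrade --install gatekeeperd charts/gatekeeperd -f values-override.yaml`, where the release name `gatekeeperd` is likewise an assumption.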

3. ServiceMonitor for Prometheus Operator (optional, related)

While we're adding observability features, consider also adding a ServiceMonitor for Prometheus Operator users. This is separate from the dashboard but often requested together:

serviceMonitor:
  enabled: false
  # Namespace for the ServiceMonitor (defaults to release namespace)
  namespace: ""
  # Interval for scraping metrics
  interval: 30s
  # Additional labels for ServiceMonitor (e.g., for Prometheus selection)
  labels: {}
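
A matching template could look roughly like the sketch below. It assumes a `gatekeeperd.selectorLabels` helper (as generated by `helm create`), a Service port named `metrics`, and a `/metrics` path; none of these are confirmed by the chart, so adjust them to what it actually defines.

```yaml
{{- if .Values.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "gatekeeperd.fullname" . }}
  {{- if .Values.serviceMonitor.namespace }}
  namespace: {{ .Values.serviceMonitor.namespace }}
  {{- end }}
  labels:
    {{- include "gatekeeperd.labels" . | nindent 4 }}
    {{- with .Values.serviceMonitor.labels }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
spec:
  # Point back at the release namespace in case the ServiceMonitor lives elsewhere
  namespaceSelector:
    matchNames:
      - {{ .Release.Namespace }}
  selector:
    matchLabels:
      {{- include "gatekeeperd.selectorLabels" . | nindent 6 }}  # assumed helper
  endpoints:
    - port: metrics     # assumed Service port name
      path: /metrics    # assumed metrics path
      interval: {{ .Values.serviceMonitor.interval }}
{{- end }}
```

The `labels` value matters in practice because many Prometheus Operator installations only select ServiceMonitors that carry a specific label, such as the Helm release name of the monitoring stack.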

Dashboard Panels

The dashboard should include panels for:

Overview Row

  • Request rate (total requests/sec)
  • Success rate (2xx/3xx percentage)
  • Error rate (4xx/5xx)
  • Active relay clients

Request Metrics Row

  • Requests by hostname (stacked area)
  • Requests by status code (stacked bar)
  • Request latency (p50, p95, p99)
  • Request latency heatmap

Security Row

  • Verification failures by verifier and reason
  • IP filter denials by allowlist
  • Validation failures

Relay Row (if relay is used)

  • Webhooks queued vs delivered
  • Relay delivery latency
  • Delivery errors by reason
  • Pending queue depth (Redis mode)
  • Connected clients per token

System Row

  • IP ranges loaded per allowlist
  • IP range fetch errors

Variables

Dashboard should include template variables:

  • datasource - Prometheus datasource selector
  • hostname - Filter by webhook hostname
  • namespace - Kubernetes namespace (for multi-tenant)
  • instance - Pod instance selector

Alternatives Considered

Separate Helm chart for dashboard

  • Overkill; a single ConfigMap doesn't warrant a separate chart

Grafana API provisioning

  • Requires Grafana credentials
  • Not the Kubernetes-native approach
  • Less portable

Dashboard embedded in docs only

  • Harder to keep in sync with metrics changes
  • No automatic provisioning

Acceptance Criteria

  • Dashboard JSON file at dashboards/grafana-gatekeeperd.json
  • Dashboard covers all metrics from internal/metrics/metrics.go
  • Helm values for enabling dashboard ConfigMap
  • ConfigMap template with sidecar label
  • Dashboard uses variables for datasource, hostname, namespace, instance
  • Documentation in README or docs/ explaining how to use
  • (Optional) ServiceMonitor template for Prometheus Operator

Notes

  • The Grafana sidecar approach requires the dashboards sidecar to be enabled in Grafana (it is enabled by default in kube-prometheus-stack, but must be turned on explicitly in the standalone Grafana chart)
  • For cross-namespace dashboard discovery, Grafana's sidecar needs RBAC to list ConfigMaps in other namespaces
  • Dashboard JSON should be validated with Grafana's dashboard schema
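
For the cross-namespace case noted above, the Grafana side also has to be told to search beyond its own namespace. The sketch below shows values for kube-prometheus-stack (not for the gatekeeperd chart); the keys follow the Grafana Helm chart's `sidecar.dashboards` settings as I understand them and should be verified against the chart version in use.

```yaml
# Values for kube-prometheus-stack, which nests the Grafana chart under "grafana"
grafana:
  sidecar:
    dashboards:
      enabled: true
      # Label key the sidecar watches; must match grafana.dashboard.sidecarLabel
      label: grafana_dashboard
      # Search every namespace for labeled ConfigMaps; requires cluster-wide
      # read access to ConfigMaps for the Grafana service account
      searchNamespace: ALL
```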
