mahowlin/saif-splunk-dashboard

SAIF Splunk Dashboard

Splunk Observability Cloud dashboards for monitoring the Secure AI Factory (SAIF) Platform.

Overview

This repository contains dashboard specifications, chart definitions, and import scripts for the SAIF Platform Splunk Observability dashboards. These dashboards provide real-time visibility into infrastructure health, AI workload performance, and security posture across all clusters.

Dashboards

Infrastructure

Cluster resource monitoring, including node CPU and memory utilization, storage capacity, and UCS hardware health metrics.

  • Location: dashboards/infrastructure/
  • Data source: Splunk OTEL Collector (DaemonSet metrics)

AI Workloads

GPU utilization, NIM inference performance, model latency, and throughput metrics for AI/ML workloads.

  • Location: dashboards/ai-workloads/
  • Data source: DCGM Exporter metrics via Splunk OTEL Collector

Secure AI Factory

Main overview dashboard with status tiles, Cilium/Hubble network flow metrics, Tetragon security event counts, and platform health summary.

  • Location: dashboards/secure-ai-factory/
  • Data sources: Splunk OTEL Collector metrics and CronJob text chart updates

Directory Structure

saif-splunk-dashboard/
├── charts/                    # Individual chart JSON definitions
├── dashboards/
│   ├── infrastructure/        # Infrastructure dashboard spec + import script
│   ├── ai-workloads/          # AI workloads dashboard
│   ├── secure-ai-factory/     # Main overview dashboard
│   └── isovalent-redesign/    # Cilium/Hubble-focused dashboard
├── docs/
│   ├── ARCHITECTURE.md        # Dashboard structure and data sources
│   └── EXECUTION_PLAN.md      # Implementation phases
└── README.md

Deployment

The dashboards are populated by two data paths:

  1. Metrics: Splunk OTEL Collector (deployed as a DaemonSet on each cluster) scrapes Prometheus endpoints and forwards metrics to Splunk Observability Cloud.

  2. Text charts: A CronJob queries cluster state and updates Splunk text charts via the Splunk API.
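To make the metrics path more concrete, the following is a minimal sketch of what a scrape yields: it parses a few lines of Prometheus text exposition format into (name, labels, value) samples. It is not the collector's implementation. The sample payload is illustrative, the `Hostname` label value is made up, and the parser deliberately ignores edge cases such as escaped quotes in label values and trailing timestamps.

```python
import re

# Matches sample lines of the form: metric_name{label="value",...} 12.5
# Simplification: label values containing escaped quotes are not handled.
_SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_exposition(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Illustrative payload only; real exporter output carries more labels.
EXAMPLE = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a"} 87
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a"} 12
"""

if __name__ == "__main__":
    for name, labels, value in parse_exposition(EXAMPLE):
        print(name, labels, value)
```

Each parsed sample corresponds roughly to one datapoint that the collector would forward, with the labels becoming dimensions on the metric.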

Both components are deployed via ArgoCD from the saif-gitops repository:

  • apps/splunk-otel/ -- OTEL Collector configuration
  • apps/splunk-reporter/ -- CronJob for text chart updates
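As a rough illustration of the text chart path, the sketch below builds (but does not send) the kind of HTTP request a reporter job might issue to update a text chart. This is not the code in apps/splunk-reporter/. The endpoint shape (`/v2/chart/{id}` on `api.<realm>.signalfx.com`), the `X-SF-TOKEN` header, and the `Text`/`markdown` chart options are assumptions based on the Splunk Observability Cloud REST API and should be checked against its reference. The realm, token, chart ID, and the `render_status` helper are hypothetical.

```python
import json
import urllib.request


def render_status(nodes_ready, nodes_total):
    """Hypothetical helper: turn cluster state into markdown for a text chart."""
    state = "Healthy" if nodes_ready == nodes_total else "Degraded"
    return f"**Cluster status:** {state} ({nodes_ready}/{nodes_total} nodes ready)"


def build_text_chart_update(realm, token, chart_id, name, markdown):
    """Build a PUT request that replaces a text chart's markdown body.

    Assumed API shape; verify against the Splunk Observability Cloud docs.
    """
    url = f"https://api.{realm}.signalfx.com/v2/chart/{chart_id}"
    body = {"name": name, "options": {"type": "Text", "markdown": markdown}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "X-SF-TOKEN": token},
    )


if __name__ == "__main__":
    # Placeholder values; a real job would read these from its environment.
    request = build_text_chart_update(
        "us1", "REPLACE_ME", "CHART_ID", "Cluster status", render_status(3, 3)
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test before the job runs against a live organization.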

Related Repositories

  • saif-gitops -- CronJob and OTEL Collector deployment
  • saif-platform -- Platform orchestration and SBOM

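Documentation

  • docs/ARCHITECTURE.md -- Dashboard structure and data sources
  • docs/EXECUTION_PLAN.md -- Implementation phases
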
License

This project is licensed under the Cisco Sample Code License, Version 1.1. See LICENSE for details.
