Skip to content

Conversation

@saswatamcode
Copy link
Member

This commit enables listing Perses dashboards from a cluster, getting a specific one, as well as returning out of the box dashboards that we ship with OpenShift platform.

The idea is to for the LLM to first look at Out of the box dashboards, to see if it can answer a user's question using panels from those. And if not search wider for any custom dashboards a user might have.

We also allow users (and ourselves!) to specify an LLM-friendly description of PersesDashboard objects using an annotation operator.perses.dev/mcp-help! This would help the LLM accurately filter and select dashboards that match a user's query.

This commit enables listing Perses dashboards from a cluster, getting a
specific one, as well as returning out of the box dashboards that we
ship with OpenShift platform.

The idea is to for the LLM to first look at Out of the box dashboards,
to see if it can answer a user's question using panels from those. And
if not search wider for any custom dashboards a user might have.

We also allow users (and ourselves!) to specify an LLM-friendly
description of PersesDashboard objects using an annotation
operator.perses.dev/mcp-help! This would help the LLM accurately filter
and select dashboards that match a user's query.

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
@saswatamcode saswatamcode requested a review from a team December 11, 2025 10:45
var guardrails = flag.String("guardrails", "all", "Guardrails configuration: 'all' (default), 'none', or comma-separated list of guardrails to enable (disallow-explicit-name-label, require-label-matcher, disallow-blanket-regex)")
var maxMetricCardinality = flag.Uint64("guardrails.max-metric-cardinality", 20000, "Maximum allowed series count per metric (0 = disabled)")
var maxLabelCardinality = flag.Uint64("guardrails.max-label-cardinality", 500, "Maximum allowed label value count for blanket regex (0 = always disallow blanket regex). Only takes effect if disallow-blanket-regex is enabled.")
var ootbDashboards = flag.String("ootb-dashboards", "", "Path to YAML file containing out-of-the-box PersesDashboard definitions")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be estimated dynamically from inside the cluster (based on certain OCP-specific annotations)?

I'm concerned about version skews leading to the MCP server thinking that something exists when it doesn't (or vice-versa)? Internal estimation would get us rid of having to update this file each release.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be feasible to reuse the listing capability introduced in this PR to get all such in-house dashboards, and prepare an "OOTB" structured output from those?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup that might actually be better, I guess we'll need some specific label/annotation!
Maybe operator.dev.perses/ocp-platform?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would operator.dev.perses/mcp-discovery make sense (keeping it in-line with the operator.dev.perses/mcp-help annotation introduced here)? No strong opinions, though.


const (
// PersesMCPHelpAnnotation is the annotation key for MCP help description
PersesMCPHelpAnnotation = "operator.perses.dev/mcp-help"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the MCP server will prompt the next-gen UI to draw out rather frequently used dashboards, i.e., Grafana ones as it won't be aware of them, which are shipped with CMO but not usable by the next-gen UI, as I believe that only works with Perses dashboards.

Just thought it was worth mentioning here. Should still be nice to be able to leverage most of the dashboards downstream that are based on Perses, though!

PS. Even if the next-gen UI doesn't work with Grafana dashboards directly, we'd benefit from "telling" (making aware?) the MCP server to redirect users to a dashboard that's relevant to their query? The URLs may be constructed dynamically from the dashboard ConfigMaps found in the openshift-config-managed namespace.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason I'm hesitant to be accounting for Grafana here, is that we seem to be converging on perses for everything, internally (including ACM).

Also I'm assuming this MCP will somehow be packaged with COO, instead of CMO, which should ideally ship with the default dashboards in Perses format.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. We can always come back to this discussion if that's not the case, but this sounds good.

@@ -0,0 +1,124 @@
package k8s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for making k8s/ dependant on perses/? I'd prefer keeping this in perses/.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I wanted to keep all k8s client code in k8s package.

But yeah, probably don't want to make it dependent, I'll move the type building elsewhere and just return the list directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3a07d75 does this.

Do you mean we shouldn't depend on Perses operator in this package?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I'm nitpicking at this point, but I'd push for Perses clients (perses.go) under perses/, and only any K8s-specific clients (for resources shipped with vanilla K8s) under k8s/.

)

// GetPersesKubeClient returns a controller-runtime client with Perses types registered
func GetPersesKubeClient() (client.Client, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func GetPersesKubeClient() (client.Client, error) {
func GetPersesClient() (client.Client, error) {

(nit)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a Perses client means something different to me, like a Perses SDK client object that allows me to interact with Perses APIs specifically. Which is why I wanted to specify the Kube.

Maybe instead we can rename to GetPersesCtrlClient?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Sounds good.

return "", "", nil, fmt.Errorf("failed to convert spec to map: %w", err)
}

return dashboard.Name, dashboard.Namespace, specMap, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So we are responsible for supplying the UI with the entire specMap for the dashboard? As such, any dashboard we want the UI to sketch out, we need to supply that as a whole to it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, not sure here, which is why I'm returning everything. I think we'll know more once we have greater idea of the UI side, and what is planned.

But, any eventual UI may only want to render a few panels and discard the rest. Might make sense then to pass back the entire object to an LLM, which can then call a "UI render" MCP tool, which would shave down this PersesDashboard object to just variables + required panels.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes a lot of sense. Right, I guess we'll have to wait for a better picture on the UI side of things to assess this.

Copy link
Member

@rexagod rexagod Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude tells me that we may want to refactor the returned value a bit here, based on what the UI side expects.

  Architecture Overview

  ┌─────────────────┐
  │  obs-mcp server │ ← PR #2 adds Perses dashboard tools
  └────────┬────────┘
           │ JSON-RPC (MCP protocol)
           ↓
  ┌─────────────────┐
  │ DashboardMCP    │ ← genie-plugin UI's MCP client
  │ Client          │
  └────────┬────────┘
           │ Events/Tool Results
           ↓
  ┌─────────────────┐
  │ WidgetRenderer  │ ← Expects DashboardWidget objects
  └────────┬────────┘
           │
           ↓
  ┌─────────────────┐
  │ Perses          │ ← PersesTimeSeries, PersesPieChart, etc.
  │ Components      │
  └─────────────────┘

  The Data Format Mismatch

  What PR #2 Returns (Perses Dashboard Spec):
  {
    "name": "namespace-by-pod",
    "namespace": "openshift-monitoring",
    "spec": {
      "display": { "name": "Network / Namespace by Pod" },
      "panels": {
        "panel-1": {
          "kind": "Panel",
          "spec": {
            "display": { "name": "Packet Loss" },
            "queries": [{
              "kind": "TimeSeriesQuery",
              "spec": {
                "query": "container_network_receive_packets_dropped_total{namespace=\"$namespace\"}"
              }
            }]
          }
        }
      },
      "variables": [...],
      "layouts": [...]
    }
  }

  What the UI Expects (DashboardWidget):
  {
    id: "widget-123",
    componentType: "chart",
    position: { x: 0, y: 0, w: 6, h: 4 },
    props: {
      title: "Packet Loss",
      query: "container_network_receive_packets_dropped_total{namespace=\"openshift-marketplace\"}",
      duration: "1h",
      step: "1m",
      persesComponent: "TimeSeriesChart"
    },
    breakpoint: "lg"
  }

I'll do some work as a part of GIE-228 to understand the bigger picture better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that sounds about right! What I'm wondering tho is how ephemeral would the responses be? Like will it just show widgets and not allow a user access to actual config?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, not sure what you mean by config here? Do you mean the actual dashboard or one of its widgets that gets rendered?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both, in case a user wants to "save" this dashboard to their gitops pipeline somehow

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nevermind. I see you referred to the ability to be able to copy YAMLs in the call. I'll look into all customizations offered by the UI to see if we can do something like that for panels.

Comment on lines 75 to 80
// Extract MCP help description from annotation if present
if annotations := db.GetAnnotations(); annotations != nil {
if description, ok := annotations[PersesMCPHelpAnnotation]; ok {
dbInfo.Description = description
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, keeping the list entries short while allowing the MCP server to Get a dashboard based on its description makes sense! We'd need to ensure the descriptions are up-to-date with the dashboards, and descriptive, but only as much as they need to be, since overlapping may confuse the LLM futher.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, which is why I'm thinking of pushing this responsibility upstream somehow. I might start a thread with Perses team about the same.

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants