-
Notifications
You must be signed in to change notification settings - Fork 7
Add tools for Perses dashboard #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This commit enables listing Perses dashboards from a cluster, getting a specific one, as well as returning out of the box dashboards that we ship with OpenShift platform. The idea is to for the LLM to first look at Out of the box dashboards, to see if it can answer a user's question using panels from those. And if not search wider for any custom dashboards a user might have. We also allow users (and ourselves!) to specify an LLM-friendly description of PersesDashboard objects using an annotation operator.perses.dev/mcp-help! This would help the LLM accurately filter and select dashboards that match a user's query. Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
| var guardrails = flag.String("guardrails", "all", "Guardrails configuration: 'all' (default), 'none', or comma-separated list of guardrails to enable (disallow-explicit-name-label, require-label-matcher, disallow-blanket-regex)") | ||
| var maxMetricCardinality = flag.Uint64("guardrails.max-metric-cardinality", 20000, "Maximum allowed series count per metric (0 = disabled)") | ||
| var maxLabelCardinality = flag.Uint64("guardrails.max-label-cardinality", 500, "Maximum allowed label value count for blanket regex (0 = always disallow blanket regex). Only takes effect if disallow-blanket-regex is enabled.") | ||
| var ootbDashboards = flag.String("ootb-dashboards", "", "Path to YAML file containing out-of-the-box PersesDashboard definitions") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be estimated dynamically from inside the cluster (based on certain OCP-specific annotations)?
I'm concerned about version skews leading to the MCP server thinking that something exists when it doesn't (or vice-versa)? Internal estimation would get us rid of having to update this file each release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be feasible to reuse the listing capability introduced in this PR to get all such in-house dashboards, and prepare an "OOTB" structured output from those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup that might actually be better, I guess we'll need some specific label/annotation!
Maybe operator.dev.perses/ocp-platform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would operator.dev.perses/mcp-discovery make sense (keeping it in-line with the operator.dev.perses/mcp-help annotation introduced here)? No strong opinions, though.
|
|
||
| const ( | ||
| // PersesMCPHelpAnnotation is the annotation key for MCP help description | ||
| PersesMCPHelpAnnotation = "operator.perses.dev/mcp-help" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the MCP server will prompt the next-gen UI to draw out rather frequently used dashboards, i.e., Grafana ones as it won't be aware of them, which are shipped with CMO but not usable by the next-gen UI, as I believe that only works with Perses dashboards.
Just thought it was worth mentioning here. Should still be nice to be able to leverage most of the dashboards downstream that are based on Perses, though!
PS. Even if the next-gen UI doesn't work with Grafana dashboards directly, we'd benefit from "telling" (making aware?) the MCP server to redirect users to a dashboard that's relevant to their query? The URLs may be constructed dynamically from the dashboard ConfigMaps found in the openshift-config-managed namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason I'm hesitant to be accounting for Grafana here, is that we seem to be converging on perses for everything, internally (including ACM).
Also I'm assuming this MCP will somehow be packaged with COO, instead of CMO, which should ideally ship with the default dashboards in Perses format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. We can always come back to this discussion if that's not the case, but this sounds good.
| @@ -0,0 +1,124 @@ | |||
| package k8s | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for making k8s/ dependant on perses/? I'd prefer keeping this in perses/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I wanted to keep all k8s client code in k8s package.
But yeah, probably don't want to make it dependent, I'll move the type building elsewhere and just return the list directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3a07d75 does this.
Do you mean we shouldn't depend on Perses operator in this package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid I'm nitpicking at this point, but I'd push for Perses clients (perses.go) under perses/, and only any K8s-specific clients (for resources shipped with vanilla K8s) under k8s/.
| ) | ||
|
|
||
| // GetPersesKubeClient returns a controller-runtime client with Perses types registered | ||
| func GetPersesKubeClient() (client.Client, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| func GetPersesKubeClient() (client.Client, error) { | |
| func GetPersesClient() (client.Client, error) { |
(nit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So a Perses client means something different to me, like a Perses SDK client object that allows me to interact with Perses APIs specifically. Which is why I wanted to specify the Kube.
Maybe instead we can rename to GetPersesCtrlClient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Sounds good.
| return "", "", nil, fmt.Errorf("failed to convert spec to map: %w", err) | ||
| } | ||
|
|
||
| return dashboard.Name, dashboard.Namespace, specMap, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So we are responsible for supplying the UI with the entire specMap for the dashboard? As such, any dashboard we want the UI to sketch out, we need to supply that as a whole to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, not sure here, which is why I'm returning everything. I think we'll know more once we have greater idea of the UI side, and what is planned.
But, any eventual UI may only want to render a few panels and discard the rest. Might make sense then to pass back the entire object to an LLM, which can then call a "UI render" MCP tool, which would shave down this PersesDashboard object to just variables + required panels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes a lot of sense. Right, I guess we'll have to wait for a better picture on the UI side of things to assess this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Claude tells me that we may want to refactor the returned value a bit here, based on what the UI side expects.
Architecture Overview
┌─────────────────┐
│ obs-mcp server │ ← PR #2 adds Perses dashboard tools
└────────┬────────┘
│ JSON-RPC (MCP protocol)
↓
┌─────────────────┐
│ DashboardMCP │ ← genie-plugin UI's MCP client
│ Client │
└────────┬────────┘
│ Events/Tool Results
↓
┌─────────────────┐
│ WidgetRenderer │ ← Expects DashboardWidget objects
└────────┬────────┘
│
↓
┌─────────────────┐
│ Perses │ ← PersesTimeSeries, PersesPieChart, etc.
│ Components │
└─────────────────┘
The Data Format Mismatch
What PR #2 Returns (Perses Dashboard Spec):
{
"name": "namespace-by-pod",
"namespace": "openshift-monitoring",
"spec": {
"display": { "name": "Network / Namespace by Pod" },
"panels": {
"panel-1": {
"kind": "Panel",
"spec": {
"display": { "name": "Packet Loss" },
"queries": [{
"kind": "TimeSeriesQuery",
"spec": {
"query": "container_network_receive_packets_dropped_total{namespace=\"$namespace\"}"
}
}]
}
}
},
"variables": [...],
"layouts": [...]
}
}
What the UI Expects (DashboardWidget):
{
id: "widget-123",
componentType: "chart",
position: { x: 0, y: 0, w: 6, h: 4 },
props: {
title: "Packet Loss",
query: "container_network_receive_packets_dropped_total{namespace=\"openshift-marketplace\"}",
duration: "1h",
step: "1m",
persesComponent: "TimeSeriesChart"
},
breakpoint: "lg"
}
I'll do some work as a part of GIE-228 to understand the bigger picture better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that sounds about right! What I'm wondering tho is how ephemeral would the responses be? Like will it just show widgets and not allow a user access to actual config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, not sure what you mean by config here? Do you mean the actual dashboard or one of its widgets that gets rendered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both, in case a user wants to "save" this dashboard to their gitops pipeline somehow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, nevermind. I see you referred to the ability to be able to copy YAMLs in the call. I'll look into all customizations offered by the UI to see if we can do something like that for panels.
pkg/k8s/perses.go
Outdated
| // Extract MCP help description from annotation if present | ||
| if annotations := db.GetAnnotations(); annotations != nil { | ||
| if description, ok := annotations[PersesMCPHelpAnnotation]; ok { | ||
| dbInfo.Description = description | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, keeping the list entries short while allowing the MCP server to Get a dashboard based on its description makes sense! We'd need to ensure the descriptions are up-to-date with the dashboards, and descriptive, but only as much as they need to be, since overlapping may confuse the LLM futher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, which is why I'm thinking of pushing this responsibility upstream somehow. I might start a thread with Perses team about the same.
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
This commit enables listing Perses dashboards from a cluster, getting a specific one, as well as returning out of the box dashboards that we ship with OpenShift platform.
The idea is to for the LLM to first look at Out of the box dashboards, to see if it can answer a user's question using panels from those. And if not search wider for any custom dashboards a user might have.
We also allow users (and ourselves!) to specify an LLM-friendly description of PersesDashboard objects using an annotation operator.perses.dev/mcp-help! This would help the LLM accurately filter and select dashboards that match a user's query.