Skip to content

Conversation

@bentito
Copy link

@bentito bentito commented Jan 23, 2026

WIP:
This PR is part of a spike to see if openshift-mcp-server will be a good home for the NIDS (NetEdge) toolset.
For background on the toolset effort, please see: https://docs.google.com/document/d/19hToi_iUN5wu4LBXdxMu4cPW7hOEFy8LIJ7vGv_DKOw/edit?tab=t.0

To have a concrete task to establish the toolset, this PR is implementing a first MCP tool, query_prometheus which will eventually handle querying Prometheus for data relevant to network edge diagnostics.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 23, 2026
@openshift-ci
Copy link

openshift-ci bot commented Jan 23, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Copy link

openshift-ci bot commented Jan 23, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bentito
Once this PR has been reviewed and has the lgtm label, please assign matzew for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Cali0707
Copy link

@bentito what is the difference between the query_prometheus tool coming in here and the prometheus_query tool seen in #117 (from @sakhoury )

Maybe we can align around prometheus tooling between the two to not have duplicate/similar tools?

@sakhoury
Copy link

@Cali0707 Thanks for flagging the overlap. I've reviewed both approaches and wanted to share what my PR #117 introduces:

My observability toolset (#117) provides:

  • prometheus_query - Instant PromQL queries against Thanos Querier
  • prometheus_query_range - Range queries with support for relative times (e.g., -1h, -30m, now) and configurable step resolution
  • alertmanager_alerts - Query active, silenced, and inhibited alerts with filter support

Key differences from the query_prometheus in the netedge toolset:

  1. Auto-discovery: My implementation discovers the Thanos Querier and Alertmanager endpoints via OpenShift routes, so no manual prometheus_url configuration is required
  2. Range queries: Support for historical/time-series analysis via prometheus_query_range
  3. Alertmanager integration: Direct access to cluster alerts

The goal of my PR is to establish a general-purpose observability toolset that can house all monitoring/alerting tools in one place. This aligns with how toolsets are organized in this project (e.g., core, helm, kiali).

Question for @bentito: Would the tools in my PR satisfy your use case for NetEdge diagnostics? If there are specific Prometheus query patterns or additional parameters your workflow needs that aren't covered, I'm open to extending the observability toolset to support them. Alternatively, if NetEdge needs a completely separate toolset with domain-specific logic beyond just Prometheus queries, we could potentially have your netedge toolset consume the shared observability infrastructure.

Let me know your thoughts - I'd rather we collaborate and avoid duplicating effort.

@bentito
Copy link
Author

bentito commented Jan 23, 2026

@bentito what is the difference between the query_prometheus tool coming in here and the prometheus_query tool seen in #117 (from @sakhoury )

Maybe we can align around prometheus tooling between the two to not have duplicate/similar tools?

Yeah we should definitely have shared prometheus machinery and then the separate tools can have unique promql to do the things needed. I am about to put up a commit that will add that specialization. I'll still put that up, and then let's consider the following:

@sakhoury what do think about combining to a single tool and then have a "mode" parameter flag? This allows a single set of code for prom querying then whatever promql you want can go into your mode. For instance we'd expect agents running the tool for our usage to run it as "query_prometheus --mode netedge", in the tool code we can branch in cmd, have a share pkg for setting up to call prometheus and separate packages holding the promql or other logic.

@bentito
Copy link
Author

bentito commented Jan 26, 2026

@sakhoury does my mode triggered by a flag like --mode netedge idea work? Maybe relocate the prometheus code to a /toolsets/util location just so it's obvious it's shared functionality and more than one group is responsible for maintaining the code?

@sakhoury
Copy link

@bentito Thanks for the follow-up. To answer your questions directly:

On the --mode netedge flag: I don't think a mode flag on a shared tool is the right approach. MCP tools take JSON parameters, not CLI flags, and coupling domain-specific logic (NetEdge queries) into a general tool would make it harder to maintain as more teams add their own modes.

That said, looking at your latest commit, what you actually implemented - a diagnostic_target parameter with pre-defined queries for ingress/dns - is a better pattern. It keeps the NetEdge-specific logic in the NetEdge toolset where it belongs. I think that approach makes more sense than a shared tool with modes.

On relocating prometheus code to /toolsets/util: I agree we should share the Prometheus client code. I'd suggest pkg/prometheus/ rather than /toolsets/util since it's infrastructure that could be used beyond just toolsets (and matches the existing layout like pkg/helm/, pkg/kubernetes/). But I'm not strongly attached to the location - happy to go with whatever the maintainers prefer.

So my proposal:

  1. Extract shared Prometheus HTTP client to a common package (pkg/prometheus/ or pkg/toolsets/util/)
  2. Keep separate toolsets: observability for general-purpose tools, netedge for domain-specific diagnostics
  3. Both toolsets import the shared client

Does that work for you?

@bentito
Copy link
Author

bentito commented Jan 26, 2026

So my proposal:

1. Extract shared Prometheus HTTP client to a common package (`pkg/prometheus/` or `pkg/toolsets/util/`)

2. Keep separate toolsets: observability for general-purpose tools, netedge for domain-specific diagnostics

3. Both toolsets import the shared client

Yes @sakhoury, that would be great, and yes, I meant CLI flag-like functionality within LLM calling tool via JSON

We'll avoid redundant code, yay!

Your suggestion for pkg/prometheus seems good. Since you have better general code for prom already, would you want to restructure your PR for that and I'll wait until it merges and then restructure mine to import and use it?

@sakhoury
Copy link

Your suggestion for pkg/prometheus seems good. Since you have better general code for prom already, would you want to restructure your PR for that and I'll wait until it merges and then restructure mine to import and use it?

Sounds good @bentito! I'll restructure my PR to extract the shared Prometheus client to pkg/prometheus/.

@bentito bentito force-pushed the feat/netedge-prometheus branch from cd03f62 to 2aa93ea Compare January 28, 2026 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants