@sakhoury

Introduces a new observability toolset that provides tools for querying OpenShift cluster monitoring data:

  • prometheus_query: Execute instant PromQL queries against Thanos Querier
  • prometheus_query_range: Execute range PromQL queries for time-series data
  • alertmanager_alerts: Query active, silenced, and inhibited alerts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Signed-off-by: Sharat Akhoury <sakhoury@redhat.com>
@openshift-ci

openshift-ci bot commented Jan 23, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sakhoury
Once this PR has been reviewed and has the lgtm label, please assign kaustubh-pande for approval. For more information see the Code Review Process.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bentito

bentito commented Jan 26, 2026

Please see the comment on my draft PR: #115 (comment)

Refactors the observability toolset to use a shared Prometheus client
package that can be reused by other toolsets.

The pkg/prometheus/ package provides:
- Client with functional options for flexible configuration
- Support for bearer token auth from REST config
- TLS configuration from REST config or custom CA
- Prometheus instant and range query methods
- Alertmanager alerts query methods
- Relative time conversion utilities

Signed-off-by: Sharat Akhoury <sakhoury@redhat.com>
@sakhoury
Author

/cc @bentito

@openshift-ci openshift-ci bot requested a review from bentito January 26, 2026 18:29
@sakhoury
Author

/cc @matzew

@openshift-ci openshift-ci bot requested a review from matzew January 26, 2026 21:02
@matzew
Member

matzew commented Jan 27, 2026

@nader-ziada can you take a look?

@nader-ziada nader-ziada left a comment

Great work, @sakhoury. Just a minor comment that could be addressed later.

return defaultMonitoringNamespace
}

// getRouteURL retrieves the URL for an OpenShift route.

Every Prometheus/Alertmanager query hits the Kubernetes API to resolve the route, even though routes rarely change.

Author

Good point @nader-ziada! You're right that hitting the k8s API on every query is inefficient since routes rarely change.

I can implement per-session caching: the route URL is resolved once on first use and then cached for the lifetime of the MCP server process. This eliminates the repeated API calls while keeping the auto-discovery behaviour. What do you think?

That would be great. Thank you!

One thing to note if you cache this: in an ACM setup, the Kubernetes instance you are given is configured to communicate with one specific cluster. You will need to be careful not to reuse a cached URL that is invalid for the cluster being queried.

Author

Thanks @Cali0707! I missed that. Updating now!

Author

Done! I've added per-session caching with support for multi-cluster environments.

@nader-ziada

@matzew can we run mcpchecker here?

@Cali0707

can we run mcpchecker here?

No, we need to figure out how to configure Prow to get that working...

Route URLs are now cached for the lifetime of the server process.
This eliminates redundant Kubernetes API calls on every Prometheus
or Alertmanager query, since OpenShift routes rarely change.

The cache key includes the API server host to ensure correct
behavior in multi-cluster (ACM) environments where different
clusters have different route URLs.

Signed-off-by: Sharat Akhoury <sakhoury@redhat.com>
@sakhoury sakhoury force-pushed the feature/observability-toolset branch from db4c64c to 6ecb87e Compare January 27, 2026 17:55
@openshift-ci

openshift-ci bot commented Jan 27, 2026

@sakhoury: all tests passed!

@nader-ziada

LGTM

@Cali0707 or @matzew want to take a look?

@matzew matzew merged commit 8f8c97d into openshift:main Jan 28, 2026
7 of 9 checks passed