Add Collector Model Context Protocol (MCP) project proposal #3128
pavolloffay wants to merge 3 commits into open-telemetry:main
Put me down as a contributing member! Looking forward to this SIG. Here's a related thread in the #otel-semantic-conventions Slack channel from a couple of weeks ago that may be of interest to this SIG.
* [pavolloffay/opentelemetry-mcp-server](https://github.com/pavolloffay/opentelemetry-mcp-server): Focuses on collector configuration.
* [austinlparker/otel-mcp](https://github.com/austinlparker/otel-mcp): Handles collector configuration and data profiling.
* [mottibec/otelcol-mcp](https://github.com/mottibec/otelcol-mcp): Focuses on collector configuration.
* [shiftyp/otel-mcp-server](https://github.com/shiftyp/otel-mcp-server): Provides data profiling, but requires OpenSearch.
Happy to answer any specific questions about this project!
|
@niwoerner @shiftyp @adrielp I have added you to the proposal. Thanks!
shiftyp
left a comment
Just some questions specifically related to the use case around data profiling, which I take to mean connecting telemetry itself to an agent flow, not just the configuration and troubleshooting use case for the OTEL stack itself.
projects/mcp-server.md
Outdated
- OpenTelemetry collector configuration

Phase 2: Data profiling via collector (Months 1-2)
- OpenTelemetry collector extension which provides API to query and profile the processed telemetry data
Just as context, this was the focus of my particular project: an MCP server that generated OpenSearch / ElasticSearch queries to feed relevant telemetry data into an agent (tested with Claude). I used it specifically with Claude Code to combine telemetry intelligence with code context to answer root-cause questions about incidents, analyze the performance of code changes, etc. This is probably where I could contribute the most in terms of thought partnership, although my effort could be used towards various goals.
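For anyone unfamiliar with what "generating OpenSearch queries" looks like in practice, here is a minimal sketch of a query body such a server might build. The helper name and the field names (`serviceName`, `status.code`, `startTime`) are assumptions for illustration; the real names depend on the index mapping, and this is not taken from any of the projects linked above. The value `2` follows the OTLP status code for errors.

```python
import json


def build_error_span_query(service, lookback="15m", limit=20):
    """Build an OpenSearch query-DSL body for recent error spans of one service.

    Field names are hypothetical and depend on how spans are indexed.
    """
    return {
        "size": limit,
        # Newest spans first, so the agent sees the latest failures.
        "sort": [{"startTime": {"order": "desc"}}],
        "query": {
            "bool": {
                # Filters do not affect scoring and can be cached.
                "filter": [
                    {"term": {"serviceName": service}},
                    {"term": {"status.code": 2}},  # 2 = error status in OTLP
                    {"range": {"startTime": {"gte": f"now-{lookback}"}}},
                ]
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_error_span_query("checkout"), indent=2))
```

Keeping the query construction in one small function makes it easy to cap `size`, which matters when the result is fed into an agent's context window.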
projects/mcp-server.md
Outdated
### Project Scope and Architecture

The scope of this project is to create OpenTelemetry MCP server(s) to simplify deployment and day-2 operations.
As we'll likely end up with different MCP servers for different use cases, I'd love to establish a unified interface as part of this project.
This would give us both streamlined development of (multiple) servers and a consistent way for end users to set up and configure the server(s).
There was a problem hiding this comment.
Agree on the consistent configuration and deployment. However, I am not sure what you mean by the streamlined development?
There was a problem hiding this comment.
I mean that the different kinds of OTel MCP servers would follow the same implementation/development patterns where possible, to avoid ending up with 5 different MCP servers with 5 different implementations.
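To make the "same implementation patterns" idea concrete, here is a rough sketch of a shared tool registry that several servers could reuse. `Tool` and `ToolRegistry` are hypothetical names, not an existing OTel or MCP SDK API. The only parts taken from the MCP specification are the JSON-RPC 2.0 envelope, the `tools/list` and `tools/call` method names, and the general result shapes.

```python
class Tool:
    """One MCP tool: metadata plus a handler that returns text. Hypothetical type."""

    def __init__(self, name, description, input_schema, handler):
        self.name = name
        self.description = description
        self.input_schema = input_schema  # JSON Schema for the tool arguments
        self.handler = handler            # callable(arguments: dict) -> str


class ToolRegistry:
    """Shared dispatch logic every OTel MCP server could reuse (assumed design)."""

    def __init__(self):
        self.tools = {}

    def register(self, tool):
        if tool.name in self.tools:
            raise ValueError(f"duplicate tool name: {tool.name}")
        self.tools[tool.name] = tool

    @staticmethod
    def _error(request, code, message):
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": code, "message": message}}

    def handle(self, request):
        """Answer a single JSON-RPC request dict for tools/list or tools/call."""
        method = request.get("method")
        if method == "tools/list":
            result = {"tools": [
                {"name": t.name, "description": t.description,
                 "inputSchema": t.input_schema}
                for t in self.tools.values()
            ]}
        elif method == "tools/call":
            params = request.get("params") or {}
            tool = self.tools.get(params.get("name"))
            if tool is None:
                return self._error(request, -32602, "unknown tool")
            text = tool.handler(params.get("arguments") or {})
            result = {"content": [{"type": "text", "text": text}],
                      "isError": False}
        else:
            return self._error(request, -32601, "method not found")
        return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


if __name__ == "__main__":
    registry = ToolRegistry()
    registry.register(Tool("echo", "Echo text back", {"type": "object"},
                           lambda args: args.get("text", "")))
    print(registry.handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}))
```

With something like this, each component-specific server would only contribute `Tool` definitions, while transport, error handling, and configuration stay in one place.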
|
Thanks for the review @niwoerner. I have updated the proposal based on your feedback.
|
A few notes --
I think it'd be totally appropriate to try and build a community around this independent of the main project and consider how different otel components could integrate MCP. I even think that there's smaller/point stuff (e.g., a config validator MCP for the collector) that should be addressed at the existing SIG level.
|
Question based on the discussion we had at the GC call today: would the Docs SIG be a good initial place for this SIG?
|
Note: I'm referring to
There's huge potential to simplify the adoption and implementation of OTel with AI tooling and, as mentioned, the scope of OTel components that could benefit from it is broad. While I agree that the concrete implementation should happen at the component level in communication with the affected SIG, a few challenges come with that. We'll end up with different implementations and inconsistencies across those tools, resulting in different user and developer experiences. Additionally, there might be redundant efforts on tooling with similar goals if there is no coordination across SIGs. Also, it would be great for users to have a common place to look up "Which official AI-Tooling is available in context of OTel today? What is currently developed and might be available soon?".
An "MCP project/SIG" is perhaps not the right term, but I believe what @pavolloffay and I are looking for is a shared place to track and align the development of AI tooling in the context of OTel. A cross-cutting SIG could be the right place to coordinate this development, as it would impact several different implementation/specification SIGs. I see the point that there is no critical need for a dedicated SIG to experiment with or develop this type of tool right now, so I understand the idea of placing this project into an existing SIG and, based on demand, eventually splitting it out at a later point. Could the
I fully support this view. It is vital that we coordinate to avoid redundancy, as overlapping tools will negatively impact the MCP's effectiveness.
Exactly. A key driver for this proposal is to establish a centralized forum for discussion, acknowledging that the actual implementation will be distributed across various components (such as the collector for config schema retrieval).
It works for me, if we can promote the MCP topics in that SIG and other people can join.
|
The GC supports this; however, this project will need TC sponsorship. We will consult with @open-telemetry/technical-committee.
austinlparker
left a comment
approved pending TC sponsor
projects/mcp-server.md
Outdated
* Provide context-optimized querying of the Semantic Conventions registry.
* Enable agents to assist with maintaining codebases to add and update semantic conventions, potentially integrating with [Weaver](https://github.com/open-telemetry/weaver).

#### Instrumentation & SDKs
I think there's a lot to be gained here - but I'd suggest we split some rules between weaver and SDKs. Particularly, this workstream, I think, needs a strong focus to succeed.
Weaver is supposed to help with the writing of instrumentation.
I think the major pain point, early, is discovering instrumentation libraries, configuring them, and testing to make sure it all works. I believe that's the focus of this workstream, but I think there will be some overlap with the semconv/weaver workstream over time.
I realize implicitly all these MCP servers need to work together to create a cohesive "agent-aided otel adoption", but I think calling out clear swimlanes for these is important.
This particular workstream seems to have less of a targeted "component" where an MCP server would live or play. It needs to handle all SDKs, not a specific one.
I'd suggest for the start of this to pick 1-2 SDKs to target an MCP server at for helping, a few clear use cases and then we iterate on what the MCP server has (vs. say agent-skills or other instructions).
There was a problem hiding this comment.
We should pick the language/SDK that is most often used directly (i.e., without auto-instrumentation). Maybe golang? We would then create a set of instrumentation use cases and, once we are successful, reuse the approach for other languages/SDKs.
I will reflect this in the proposal.
* Align distribution and installation of the components with the Agentic Workflows.
* Agentic workflow documentation will be part of the existing [OpenTelemetry documentation](https://opentelemetry.io/docs/) and will not duplicate any existing content.

### Non Goals
I don't see this called out, but I envision MCP as a piece of the solution. Would it make sense to also include agent skills that can leverage various MCP servers to complete tasks?
Personally, I'd love a "tight" set of "CUJ"s that we start with, and I imagine a lot of these will cross between MCP servers. I want this to be successful, so I think it may also make sense to release "skills" to compose these MCP servers.
I'd either list it as a non-goal, or include it as a workstream between these MCP servers.
There was a problem hiding this comment.
> I don't see this called out, but I envision MCP as a piece of the solution. Would it make sense to also include agent skills that can leverage various MCP servers to complete tasks?
This is what we already have in the proposal in Project Scope and Architecture section.
The goal of this SIG is to deliver an initial implementation of MCP server(s) and/or Agent Skills for the OpenTelemetry project in coordination with existing SIGs
jpkrohling
left a comment
I'm giving my approval as I'm supportive of this. However, this proposal still needs a TC sponsor before it gets accepted.
There was a problem hiding this comment.
It doesn’t look like we’re able to find a TC sponsor for this project at the moment.
Please don’t take this as a sign that the project isn’t valuable or interesting to OpenTelemetry. The challenge is not about relevance, but about our capacity to support it in its current scope.
I believe this effort could continue and deliver strong value if it were structured a bit differently. For example:
- Scope it to a single SIG and a single artifact (Collector, docs, Weaver, instrumentations, a specific SDK, etc.) where MCP/skills can provide concrete improvements. [UPDATE] DevEx SIG is not one of them - it does not ship artifacts that MCP could help with
- Ensure the corresponding SIG is ready to actively embrace the effort as a sub-project, review PRs, and collaborate on improvements to the underlying component. We should optimize usability directly within those components wherever possible.
- Host MCP/skills artifacts either in the corresponding SIG repository or in DevEx repository.
- Hold discussions within the SIG’s regular meetings; additional discussion can happen within DevEx (since the DevEx SIG is comfortable with this setup) or during a dedicated time for MCP, which we can also set up.
In its current form, the project aims to work across multiple SIGs, even if in a phased approach, and to build a shared solution for all of them. While that could be a valid long-term direction, it requires significant cross-SIG coordination and alignment. Realistically, this effort is likely to require guidance and active TC sponsorship, which we do not currently have the capacity for.
A focused, SIG-by-SIG approach would allow the work to move forward incrementally. Once there is a successful engagement with one SIG, the outcomes could be shared with the broader community and then continue with additional SIGs.
This approach would not block a common look and feel for OTel AI tooling. Instead, it would ensure that the tooling evolves in close connection with the underlying components and addresses real usability challenges that affect both humans and AI systems today. Having a pilot run with an interested SIG would provide valuable input and help inform future phases.
[UPDATE] I'm happy to dismiss my review if there is an @open-telemetry/technical-committee member that's ready to support it
|
I have updated the proposal with agreement from @niwoerner to scope this to the collector only. I believe that having this as a project is beneficial from multiple aspects:
A dedicated project will better facilitate our needs.
|
Thanks for the update, @pavolloffay! I'm dismissing my review and tagging @open-telemetry/collector-maintainers, @open-telemetry/collector-contrib-maintainers, and the TC members who work on the collector (@bogdandrutu @jmacd @dashpole) for review.
Can you update the PR title and the markdown title to make this clearer? e.g. "OpenTelemetry Collector MCP ..."
|
@trask I have changed the PR title and the title in the doc. |
* Enable agents to read and write valid Collector configuration.
* Enable agents to handle API breaking changes (e.g. deprecations, removals, renaming) in the configuration and collector Golang API.
* Enable agents to upgrade collector.
* Enable agents to write valid OpenTelemetry Transformation Language (OTTL).
* Enable agents to troubleshoot collector issues.
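As an illustration of the kind of check that could back the "valid Collector configuration" goal, here is a small sketch that flags pipeline components that are referenced but never declared. It assumes the YAML has already been parsed into a dict and covers only this one rule; the function name is hypothetical, and the Collector's own validation does far more than this.

```python
PIPELINE_ROLES = ("receivers", "processors", "exporters")


def undefined_pipeline_components(config):
    """Return messages for pipeline components that no top-level section declares.

    `config` is an already-parsed Collector configuration (a plain dict).
    Connectors are accepted as receivers or exporters, since a pipeline
    may use them in either position.
    """
    connectors = set(config.get("connectors") or {})
    pipelines = (config.get("service") or {}).get("pipelines") or {}
    problems = []
    for pipeline_name, pipeline in pipelines.items():
        for role in PIPELINE_ROLES:
            declared = set(config.get(role) or {})
            if role != "processors":
                declared |= connectors
            for component in (pipeline or {}).get(role) or []:
                if component not in declared:
                    problems.append(
                        f"{pipeline_name}: {role} '{component}' is not defined"
                    )
    return sorted(problems)


if __name__ == "__main__":
    example = {
        "receivers": {"otlp": {}},
        "processors": {"batch": {}},
        "exporters": {"debug": {}},
        "service": {"pipelines": {"traces": {
            "receivers": ["otlp"],
            "processors": ["batch", "memory_limiter"],  # never declared above
            "exporters": ["debug"],
        }}},
    }
    for problem in undefined_pipeline_components(example):
        print(problem)
```

Returning plain messages rather than raising keeps the result easy to hand back to an agent as tool output, so it can fix every issue in one pass.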
Paraphrasing from private DM conversation: I don't think there is a need to spell out everything now but I think it would be valuable to list what things you intend to work on to make these possible. For example, you are already working on the configuration schema. Is there an intention to work on similar aspects to enable agents in other use cases?
I think it would be much easier to show value to the Collector SIG if we know more about those intermediate aspects that would enable agents, since that way we can see the value not only for building the MCP but in general for all ways of interacting with the Collector.
There was a problem hiding this comment.
I have added some clarification below this line. Most likely, we will make improvements to the existing collector documentation and collector config schema (which has already started).
We would like to start working in the devx repository, which should be created for this project (#3198). I don't expect the collector SIG maintainers to be working with us in that repository.
There was a problem hiding this comment.
Creating opentelemetry-mcp-servers repo seems out of scope for this proposal now. If you need a temporary repo to prototype things before opening PR(s), I'd recommend using forks of the repos where the code will ultimately live (e.g. the collector/demo/website/configuration repos).
There was a problem hiding this comment.
We scoped down the proposal to the collector only for now, based on @lmolkova's feedback and this comment:
> Host MCP/skills artifacts either in the corresponding SIG repository or in DevEx repository.
Our intention was to work on the collector use-cases now and then move to another SIG. One of the goals of the proposal is to have a unified experience of installing and managing MCPs/agentic skills.
The proposal was presented in the collector SIG with the intention of building this in a separate repository.
There was a problem hiding this comment.
Sorry, I realize you're getting a lot of mixed messages; I don't think there's consensus in the GC/TC about this yet. If we need a centralized installer, could that be added later?
codeboten
left a comment
Thanks for the proposal @pavolloffay, would love to help where possible with the work needed here
Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
dmitryax
left a comment
I can help with this effort from the perspective of a Collector maintainer. I won’t be able to actively work on the implementation, but I can help move things forward on the Collector side and provide technical guidance.
mx-psi
left a comment
Approving as the current Collector GC liaison. I can act as a liaison but won't be otherwise much involved on the dev side :)
### SIG

This effort will be hosted in the existing Collector SIG.
Can we add Dmitrii and Alex Boten here?
Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
Resolves #3129
Related to open-telemetry/opentelemetry.io#8331