Add Collector Model Context Protocol (MCP) project proposal #3128
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,147 @@ | ||
| # OpenTelemetry Collector Agentic Workflows | ||
|
|
||
| ## Background and description | ||
|
|
||
| The OpenTelemetry project consists of a large number of components, including the Collector, SDKs, and instrumentation libraries, which are often configured and managed separately. This distribution of components poses a major operational challenge that is widely recognized by the community [1](https://opentelemetry.io/blog/2025/otel-rocks/), [2](https://www.youtube.com/watch?v=xEu8_Aeo_-o). | ||
|
|
||
| Large language models (LLMs) and Agentic Workflows present a significant opportunity to simplify the adoption, implementation, and management of the OpenTelemetry stack. An AI agent could, for example, facilitate configuration changes, resolve deployment issues, or assist and simplify the instrumentation process. | ||
|
|
||
| At the moment, the OpenTelemetry project does not have official support for these workflows. This has led to the creation of several independent, open-source projects (MCP servers) to fill the gap. | ||
| The [Can AI instrument OpenTelemetry](https://quesma.com/benchmarks/otel/) benchmark demonstrates the complexity of the instrumentation process and shows how far AI agents still are from working successfully with OpenTelemetry. | ||
|
|
||
| As AI tooling becomes a standard part of developer workflows, users who want to extend their agents with tooling optimized for OpenTelemetry have no easy way to discover what's available in the ecosystem. There's no central place to learn which MCP servers or other tools exist, what capabilities they offer, or where to file issues and requests. | ||
|
|
||
| This project is also motivated by the need to support the [Stability Proposal](https://opentelemetry.io/blog/2025/stability-proposal-announcement/) and [[Graduation] OpenTelemetry Graduation Application](https://github.com/cncf/toc/issues/1739). While the [OTEP: Stable by Default](https://github.com/open-telemetry/opentelemetry-specification/pull/4813) initiative aims to default to stable components, a large portion of the ecosystem—including the majority of collector components—remains in alpha or beta, creating complexity for users around potential breaking changes. This project aims to bridge this gap without adding core functionality or duplicating documentation. Instead, it focuses on making OpenTelemetry easier to use and more stable by enriching the ecosystem with new agentic workflows. | ||
|
|
||
| ### Existing OpenTelemetry MCP Servers | ||
|
||
|
|
||
| The proliferation of these projects demonstrates strong community interest and the clear potential of this technology: | ||
|
|
||
| * [open-telemetry/weaver](https://github.com/open-telemetry/weaver): MCP server for OpenTelemetry Weaver. | ||
| * [pavolloffay/opentelemetry-mcp-server](https://github.com/pavolloffay/opentelemetry-mcp-server): Focuses on collector configuration. | ||
| * [austinlparker/otel-mcp](https://github.com/austinlparker/otel-mcp): Handles collector configuration and data profiling. | ||
| * [mottibec/otelcol-mcp](https://github.com/mottibec/otelcol-mcp): Focuses on collector configuration. | ||
| * [shiftyp/otel-mcp-server](https://github.com/shiftyp/otel-mcp-server): Provides data profiling, but requires OpenSearch. | ||
|
Happy to answer any specific questions about this project! |
||
| * [liatrio-labs/otel-instrumentation-mcp](https://github.com/liatrio-labs/otel-instrumentation-mcp): Manages instrumentation. | ||
| * [traceloop/opentelemetry-mcp-server](https://github.com/traceloop/opentelemetry-mcp-server): Provides data profiling by connecting to Jaeger, Tempo and Traceloop. | ||
|
|
||
| Each of these servers uses a different approach, particularly for collector configuration and data profiling. | ||
| This fragmentation creates confusion for users regarding installation and configuration. Furthermore, using multiple competing tools is inefficient as they consume the context window with overlapping functionality. | ||
|
|
||
| ### Current challenges | ||
|
|
||
| Adopting OpenTelemetry presents several significant challenges. Many users lack deep observability expertise, and enabling it is often treated as an afterthought. | ||
|
|
||
| The sheer size and velocity of the OpenTelemetry ecosystem add to this difficulty. The project encompasses instrumentation for over 12 languages and includes diverse components like the Collector, OpAMP, and Weaver. Each component is released independently with its own setup requirements and release schedule. For example, the Collector is released bi-weekly, while auto-instrumentation libraries follow different schedules. | ||
|
|
||
| Maintenance is also complex. The ecosystem evolves rapidly, introducing frequent breaking changes. Our analysis of the Collector changelogs indicates that approximately 29% of changes are breaking. Keeping up with these updates requires significant manual effort to review release notes, update configuration files, and modify code. | ||
|
Member
Author
To support this, I briefly looked at the changelogs and categorized the changes: https://github.com/pavolloffay/community/blob/mcp-changelog-analysis/FINAL_CHANGELOG_REPORT.md |
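The breaking-change share cited in the maintenance paragraph above could be approximated with a short script. The following is a minimal sketch, not the method used to produce the 29% figure: it assumes a changelog in Markdown where each release groups its entries under headings such as "Breaking changes", and it counts only top-level bullets. The heading keyword, the function name `breaking_share`, and the sample changelog are all hypothetical.

```python
import re

# Assumption: breaking entries live under a heading containing this phrase.
BREAKING_KEYWORD = "breaking changes"


def breaking_share(changelog_md: str) -> float:
    """Return the fraction of top-level bullet entries listed under
    'Breaking changes' headings. Returns 0.0 for a changelog with no entries."""
    total = 0
    breaking = 0
    in_breaking_section = False
    for line in changelog_md.splitlines():
        if line.lstrip().startswith("#"):
            # Every heading (version or category) starts a new section.
            in_breaking_section = BREAKING_KEYWORD in line.lower()
        elif re.match(r"^[-*] ", line):
            # Only bullets at column 0 count as entries; nested bullets are details.
            total += 1
            if in_breaking_section:
                breaking += 1
    return breaking / total if total else 0.0


# Hypothetical sample, not taken from a real release.
SAMPLE_CHANGELOG = """\
## v0.2.0
### Breaking changes
- `exporter/foo`: remove deprecated option
### Enhancements
- `receiver/bar`: add new option
- `processor/baz`: improve performance
## v0.1.0
### Bug fixes
- `receiver/bar`: fix crash
"""

share = breaking_share(SAMPLE_CHANGELOG)
```

An agent with access to this kind of summary could decide which releases need a careful configuration review before an upgrade.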
||
|
|
||
| ## Project Scope and Architecture | ||
|
|
||
| The scope of this project is to enable **Agentic Workflows** to simplify deployment, configuration, and day-2 operations for the OpenTelemetry collector. | ||
| Additional components (e.g., SDKs, instrumentation, semantic conventions) could be added in a phased approach in the future. | ||
|
|
||
| To support this workflow, a standardized interface is required for Agents and LLMs to interact with the OpenTelemetry ecosystem. The project will focus on [The Model Context Protocol (MCP)](https://modelcontextprotocol.io/) and [Agent Skills](https://agentskills.io/home) concepts to provide this interface for agents to interact with the OpenTelemetry project. | ||
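To make the "standardized interface" concrete: MCP messages are JSON-RPC 2.0 requests, and a server exposes tools through the `tools/list` and `tools/call` methods. The sketch below handles just those two methods in plain Python, without an MCP SDK and without the initialization handshake a real server needs. The tool name `validate_collector_config`, its schema, and its behavior are hypothetical placeholders, not part of this proposal.

```python
# Hypothetical tool registry; a real server would register Collector-specific tools.
TOOLS = {
    "validate_collector_config": {
        "description": "Check a Collector configuration (placeholder behavior).",
        "inputSchema": {
            "type": "object",
            "properties": {"config": {"type": "string"}},
            "required": ["config"],
        },
    }
}


def _error(request_id, code: int, message: str) -> dict:
    """Build a JSON-RPC 2.0 error response."""
    return {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}


def handle(request: dict) -> dict:
    """Dispatch a single JSON-RPC request to the matching MCP method."""
    request_id = request.get("id")
    method = request.get("method")
    if method == "tools/list":
        result = {"tools": [{"name": name, **spec} for name, spec in TOOLS.items()]}
    elif method == "tools/call":
        params = request.get("params", {})
        name = params.get("name")
        if name not in TOOLS:
            return _error(request_id, -32602, f"Unknown tool: {name}")
        config_text = params.get("arguments", {}).get("config", "")
        # Placeholder: only reports the input size instead of validating anything.
        text = f"received {len(config_text)} characters of configuration"
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return _error(request_id, -32601, "Method not found")
    return {"jsonrpc": "2.0", "id": request_id, "result": result}


listing = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
call = handle({
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "validate_collector_config", "arguments": {"config": "receivers: {}"}},
})
```

Because the agent discovers tools through `tools/list`, overlapping tools from several servers all end up in its context, which is one reason a single coordinated server may be preferable.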
|
|
||
| The goal of this project is to deliver an initial implementation of MCP server(s) and/or Agent Skills for the OpenTelemetry collector in coordination with the collector SIG. | ||
|
|
||
| ### Goals, objectives, and requirements | ||
|
||
|
|
||
| #### Collector | ||
|
|
||
| The Collector follows a fast two-week release cadence, which requires constant maintenance to stay up to date and avoid breaking changes. Additionally, configuring the Collector correctly and writing valid OTTL statements are important for effective usage, but both require domain expertise and are not always trivial. General-purpose coding agents struggle here because they lack up-to-date knowledge of recent releases and aren't specialized for Collector workflows. | ||
|
|
||
| * Enable agents to read and write valid Collector configuration. | ||
|
Member
Author
An example of how an agentic workflow can help with maintaining collector docs: https://gist.github.com/pavolloffay/c78595721676576b64768c247d1e22c5 |
||
| * Enable agents to handle API breaking changes (e.g. deprecations, removals, renaming) in the configuration and collector Golang API. | ||
| * Enable agents to upgrade the collector. | ||
| * Enable agents to write valid OpenTelemetry Transformation Language (OTTL). | ||
| * Enable agents to troubleshoot collector issues. | ||
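As an illustration of the first goal, one check a tool could run before an agent writes a configuration is that every component referenced in `service.pipelines` is actually defined. The sketch below assumes the configuration has already been parsed from YAML into a dictionary; the function name `undefined_pipeline_components` and the sample configuration are hypothetical, and a real validator would cover far more (component options, extensions, telemetry settings).

```python
COMPONENT_KINDS = ("receivers", "processors", "exporters")


def undefined_pipeline_components(config: dict) -> list[str]:
    """List pipeline references that have no matching top-level definition."""
    problems = []
    # Connectors may appear as an exporter in one pipeline and a receiver in another.
    connectors = set(config.get("connectors") or {})
    pipelines = (config.get("service") or {}).get("pipelines") or {}
    for pipeline_name, pipeline in pipelines.items():
        for kind in COMPONENT_KINDS:
            defined = set(config.get(kind) or {})
            if kind in ("receivers", "exporters"):
                defined |= connectors
            for component_id in pipeline.get(kind) or []:
                if component_id not in defined:
                    problems.append(
                        f"pipeline {pipeline_name!r}: {kind[:-1]} {component_id!r} is not defined"
                    )
    return problems


# Hypothetical configuration with one mistake: 'memory_limiter' is used but never defined.
sample_config = {
    "receivers": {"otlp": {"protocols": {"grpc": {}}}},
    "processors": {"batch": {}},
    "exporters": {"debug": {}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch", "memory_limiter"],
                "exporters": ["debug"],
            }
        }
    },
}

problems = undefined_pipeline_components(sample_config)
```

Returning plain, human-readable messages keeps the result usable both as tool output for an agent and as feedback for a person reviewing the change.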
|
Comment on lines +54 to +58
Member
Paraphrasing from private DM conversation: I don't think there is a need to spell out everything now but I think it would be valuable to list what things you intend to work on to make these possible. For example, you are already working on the configuration schema. Is there an intention to work on similar aspects to enable agents in other use cases? I think it would be much easier to show value to the Collector SIG if we know more about those intermediate aspects that would enable agents, since that way we can see the value not only for building the MCP but in general for all ways of interacting with the Collector.
Member
Author
I have added some clarification below this line. Most likely, we will make improvements to the existing collector documentation and collector config schema (which has already started). We would like to start working in devx repository, which should be created for this project #3198. I don't expect the collector SIG maintainers to be working with us in that repository.
Member
Creating opentelemetry-mcp-servers repo seems out of scope for this proposal now. If you need a temporary repo to prototype things before opening PR(s), I'd recommend using forks of the repos where the code will ultimately live (e.g. the collector/demo/website/configuration repos).
Member
Author
We scoped down the proposal for the collector only for now based on @lmolkova feedback. and comment:
Our intention was to work on the collector use-cases now and then move to another SIG. One of the goals of the proposal is to have a unified experience of installing and managing MCPs/agentic skills. The proposal was presented in the collector SIG with the intention of building this in a separate repository.
Member
Sorry, realize you're getting a lot of mixed messages, I don't think there's consensus in the GC/TC about this yet. If we need a centralized installer, could that be added later? |
||
|
|
||
| The goals above might require enhancements in the collector repositories. We expect to make improvements to the documentation, as it is the primary source for building skills and a knowledge base for the agents. | ||
| Another example is improvements in the collector configuration schema which is already being worked on in the collector SIG. | ||
|
|
||
| #### Documentation and distribution | ||
|
|
||
| Coherent documentation and distribution of the agentic workflows are required to enable users to install and manage them. | ||
|
|
||
| * Introduce documentation for the Agentic Workflows. | ||
| * Align distribution and installation of the components with the Agentic Workflows. | ||
| * Agentic workflow documentation will be part of the existing [OpenTelemetry documentation](https://opentelemetry.io/docs/) and will not duplicate any existing content. | ||
|
|
||
| ### Non Goals | ||
|
Contributor
I don't see this called out, but I envision MCP as a piece of the solution. Would it make sense to also include agent skills that can leverage various MCP servers to complete tasks? Personally, I'd love a "tight" set of "CUJ"s that we start with, and I imagine a lot of these will cross between MCP servers. I want this to be successful, so I think it may also make sense to release "skills" to compose these MCP servers. I'd either list it as a non-goal, or include it as a workstream between these MCP servers.
Member
Author
This is what we already have in the proposal in
|
||
|
|
||
| * The project will not implement any telemetry backends. | ||
| * The project will not maintain a separate documentation knowledge base; it will leverage existing OpenTelemetry documentation. | ||
|
|
||
| ## Deliverables | ||
|
|
||
| The following deliverables can change based on the project progress, community feedback and validation of the agentic workflows. | ||
| The deliverables are listed in the priority order assigned by the project team. | ||
|
|
||
| * MCP server or agentic skill to facilitate deployment, configuration and day-2 operations of the collector. | ||
| * MCP server or agentic skill to troubleshoot collector issues. | ||
|
|
||
| ## Staffing / Help Wanted | ||
|
|
||
| This project requires a blend of expertise in the OpenTelemetry collector, documentation, and instrumentation, as well as in building MCP server(s). | ||
|
Member
given that collector and instrumentation + semantic conventions cases are different, work on top of different otel tooling and require different staffing and work with different SIGs, does it make sense for this project to cover all of them? We've been pushing for smaller, better scoped projects and, given this one targets collector as the first phase, would it make sense to limit this to collector only? If upon completion, the group has appetite to cover other cases, a Phase 2 covering instrumentations / conventions can be started.
Member
Author
yes it does make sense to cover all these parts of the ecosystem in a single MCP project. We might actually start with the instrumentation configuration use-cases given there is almost "stable" configuration schema for it.
Member
I think you need to get support from @open-telemetry/specs-semconv-maintainers and @open-telemetry/weaver-maintainers for this. My opinion is that these efforts would be best covered within the SIG they belong to. I.e. collector and semconv/weaver. /cc @jerbly who showed MCP server prototype for weaver in the SIG call today. If there is something in common between collector MCP server and the instrumentation one there could be some level of coordination, but the core parts and target personas are generally different.
Member
big +1 to what @lmolkova asks here, if the goal of this project is to implement an MCP server (or skill or ) for other parts of the project, these other SIGs need to be involved here. |
||
|
|
||
| ### SIG | ||
|
|
||
| This effort will be hosted in the existing Collector SIG. | ||
|
Member
Can we add Dmitrii and Alex Boten here?
Member
Author
added |
||
|
|
||
| Sponsors for this effort are: | ||
|
|
||
| * [@dmitryax](https://github.com/dmitryax) (Splunk) | ||
| * [@codeboten](https://github.com/codeboten) (Honeycomb) | ||
|
|
||
| ### Required staffing | ||
|
|
||
| #### Project Lead(s) | ||
|
|
||
| * [@pavolloffay](https://github.com/pavolloffay) (Red Hat) | ||
| * [@niwoerner](https://github.com/niwoerner) (OllyGarden) | ||
|
|
||
| #### TC Sponsor | ||
|
|
||
| Existing Collector SIG TC sponsor. | ||
|
|
||
| #### GC Liaison | ||
|
|
||
| Existing Collector SIG liaison. | ||
|
|
||
| #### Engineers | ||
|
Member
Since this project targets collector side first, it would be great to have some active collector contributors onboard. Do we have any @open-telemetry/collector-approvers or @open-telemetry/collector-contrib-approvers interested in helping out?
Member
Author
Not directly. @mx-psi is interested in resolving the collector JSON schema issue, e.g. open-telemetry/opentelemetry-collector#9769. There is already a PR for it: open-telemetry/opentelemetry-collector#14288. Eventually we might need a sponsor for the MCP component in the collector if we decide that is the best approach.
Member
I can participate in this as a collector maintainer. However, I won’t have capacity for active development. @jkoronaAtCisco can help with the JSON schema for collector components configuration, he is actively working on it already.
Member
I'm not an active collector contributor at the moment, but I'd be happy to jump in and contribute to the development! Count me in as an extra pair of hands to help move this forward.
Member
I am happy to participate in the JSON schema/other parts that are impactful to both the MCP server and high priority Collector areas
Contributor
Can you list which projects these active engineers represent? I want to see if we have coverage / communication channels with all the SIGs you need to interact with.
Hi my team would like to help contribute. We currently don't have any active collector contributors.
Member
Author
I have added you @nr-nfajardo |
||
|
|
||
| * [@adrielp](https://github.com/adrielp) | ||
| * [@shiftyp](https://github.com/shiftyp) | ||
| * [@johannaojeling](https://github.com/johannaojeling) | ||
| * [@vitorvasc](https://github.com/vitorvasc) | ||
| * [@nr-nfajardo](https://github.com/nr-nfajardo) | ||
|
|
||
|
||
| #### Other Staffing | ||
|
|
||
| ### Industry outreach (Optional) | ||
|
|
||
| The following users have built OpenTelemetry MCP servers: | ||
|
|
||
| * [@austinlparker](https://github.com/austinlparker) - author of [otel-mcp](https://github.com/austinlparker/otel-mcp) | ||
| * [@mottibec](https://github.com/mottibec) - author of [otelcol-mcp](https://github.com/mottibec/otelcol-mcp) | ||
| * [@shiftyp](https://github.com/shiftyp) - author of [otel-mcp-server](https://github.com/shiftyp/otel-mcp-server) | ||
|
|
||
| An [OpenTelemetry MCP call for contributors post](https://github.com/open-telemetry/opentelemetry.io/pull/8629) will be published to promote the project. | ||
|
|
||
| ## Expected Timeline | ||
|
|
||
| This timeline assumes project approval and resource allocation as outlined in the staffing section. Until staffing is | ||
| confirmed and expected time commitments are known, this timeline is in flux. | ||
|
|
||
| ## Labels | ||
|
|
||
| `agentic-workflow`, `mcp` for all PRs and issues related to this project. | ||
|
|
||
| ## GitHub Project (Post-Approval) | ||
|
|
||
| TBD | ||
|
|
||
| ## SIG Meetings, Roadmap, and Other Info (Post-Approval) | ||
|
|
||
| All communication will be done in the existing Collector SIG. | ||