6 changes: 6 additions & 0 deletions .cspell.yaml
@@ -18,11 +18,13 @@ ignoreRegExpList:
- GitHub Handle in YML
words:
- Abinet
- agentic
- Alain
- Alff
- Arize
- Aronoff
- Ashpole
- austinlparker
- automations
- Baeyens
- calendar-localization-ptbr
@@ -194,6 +196,7 @@ words:
- mjwolf
- mkorbi
- molkova
- mottibec
- msomasu
- MSYS
- PATHCONV
@@ -216,6 +219,7 @@ words:
- opentelemetrybot
- ossf
- otel
- otelcol
- otel-agentmanwg
- otel-comms
- otel-ebpf
@@ -230,6 +234,7 @@ words:
- Prometheus
- paixão
- pająk
- pavolloffay
- passcodes
- poncelow
- proto
@@ -256,6 +261,7 @@ words:
- severin
- sguyon
- sharma
- shiftyp
- shkuro
- sigelman
- signup
147 changes: 147 additions & 0 deletions projects/agentic-workflow.md
@@ -0,0 +1,147 @@
# OpenTelemetry Collector Agentic Workflows

## Background and description

The OpenTelemetry project consists of a large number of components, including the Collector, SDKs, and instrumentation libraries, which are often configured and managed separately. This distribution of components poses a major operational challenge that is widely recognized by the community [1](https://opentelemetry.io/blog/2025/otel-rocks/), [2](https://www.youtube.com/watch?v=xEu8_Aeo_-o).

Large language models (LLMs) and Agentic Workflows present a significant opportunity to simplify the adoption, implementation, and management of the OpenTelemetry stack. An AI agent could, for example, facilitate configuration changes, resolve deployment issues, or assist and simplify the instrumentation process.

At the moment, the OpenTelemetry project does not have official support for these workflows. This has led to the creation of several independent, open-source projects (MCP servers) to fill the gap.
The [Can AI instrument OpenTelemetry](https://quesma.com/benchmarks/otel/) benchmark demonstrates the complexity of the instrumentation process and shows how far AI agents still are from using OpenTelemetry successfully.

As AI tooling becomes a standard part of developer workflows, users who want to extend their agents with tooling optimized for OpenTelemetry have no easy way to discover what's available in the ecosystem. There's no central place to learn which MCP servers or other tools exist, what capabilities they offer, or where to file issues and requests.

This project is also motivated by the need to support the [Stability Proposal](https://opentelemetry.io/blog/2025/stability-proposal-announcement/) and [[Graduation] OpenTelemetry Graduation Application](https://github.com/cncf/toc/issues/1739). While the [OTEP: Stable by Default](https://github.com/open-telemetry/opentelemetry-specification/pull/4813) initiative aims to default to stable components, a large portion of the ecosystem—including the majority of collector components—remains in alpha or beta, creating complexity for users around potential breaking changes. This project aims to bridge this gap without adding core functionality or duplicating documentation. Instead, it focuses on making OpenTelemetry easier to use and more stable by enriching the ecosystem with new agentic workflows.

### Existing OpenTelemetry MCP Servers

The proliferation of these projects demonstrates strong community interest and the clear potential of this technology:

* [open-telemetry/weaver](https://github.com/open-telemetry/weaver): MCP server for OpenTelemetry Weaver.
* [pavolloffay/opentelemetry-mcp-server](https://github.com/pavolloffay/opentelemetry-mcp-server): Focuses on collector configuration.
* [austinlparker/otel-mcp](https://github.com/austinlparker/otel-mcp): Handles collector configuration and data profiling.
* [mottibec/otelcol-mcp](https://github.com/mottibec/otelcol-mcp): Focuses on collector configuration.
* [shiftyp/otel-mcp-server](https://github.com/shiftyp/otel-mcp-server): Provides data profiling, but requires OpenSearch.

Happy to answer any specific questions about this project!

* [liatrio-labs/otel-instrumentation-mcp](https://github.com/liatrio-labs/otel-instrumentation-mcp): Manages instrumentation.
* [traceloop/opentelemetry-mcp-server](https://github.com/traceloop/opentelemetry-mcp-server): Provides data profiling by connecting to Jaeger, Tempo and Traceloop.

Each of these servers uses a different approach, particularly for collector configuration and data profiling.
This fragmentation creates confusion for users regarding installation and configuration. Furthermore, using multiple competing tools is inefficient as they consume the context window with overlapping functionality.

### Current challenges

Adopting OpenTelemetry presents several significant challenges. Many users lack deep observability expertise, and enabling it is often treated as an afterthought.

The sheer size and velocity of the OpenTelemetry ecosystem add to this difficulty. The project encompasses instrumentation for over 12 languages and includes diverse components like the Collector, OpAMP, and Weaver. Each component is released independently with its own setup requirements and release schedule. For example, the Collector is released bi-weekly, while auto-instrumentation libraries follow different schedules.

Maintenance is also complex. The ecosystem evolves rapidly, introducing frequent breaking changes. Our analysis of the Collector changelogs indicates that approximately 29% of changes are breaking. Keeping up with these updates requires significant manual effort to review release notes, update configuration files, and modify code.

Member Author

To support this, I briefly reviewed the changelogs and categorized the changes: https://github.com/pavolloffay/community/blob/mcp-changelog-analysis/FINAL_CHANGELOG_REPORT.md
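
A categorization like the one described above could be approximated with a short script. The following is a minimal sketch, not the analysis behind the linked report: it assumes a changelog whose entries are bullet points grouped under `###` category headings, and the sample changelog, component names, and helper functions are all hypothetical. It does not reproduce the 29% figure.

```python
import re
from collections import Counter

# Hypothetical changelog excerpt, made up for illustration only.
SAMPLE_CHANGELOG = """\
## v0.0.2

### Breaking changes

- `examplereceiver`: Rename the `endpoint_url` setting to `endpoint`

### Enhancements

- `exampleexporter`: Add retry settings
- `exampleprocessor`: Reduce allocations

### Bug fixes

- `examplereceiver`: Fix shutdown race
"""


def count_by_category(changelog: str) -> Counter:
    """Count bullet entries under each `###` category heading."""
    counts = Counter()
    category = None
    for line in changelog.splitlines():
        heading = re.match(r"^###\s+(.*\S)\s*$", line)
        if heading:
            # Drop decorative characters (such as emoji) and normalize case.
            category = re.sub(r"[^A-Za-z ]", "", heading.group(1)).strip().lower()
        elif line.startswith("## "):
            # A new release section starts; wait for its first category.
            category = None
        elif category and re.match(r"^[-*]\s+", line):
            counts[category] += 1
    return counts


def breaking_share(counts: Counter) -> float:
    """Fraction of all counted entries filed under 'breaking changes'."""
    total = sum(counts.values())
    return counts["breaking changes"] / total if total else 0.0


counts = count_by_category(SAMPLE_CHANGELOG)
print(dict(counts))
print(f"breaking share: {breaking_share(counts):.0%}")
```

Counting entries per heading is only a rough proxy: it treats every entry as equally significant and depends on changes being filed under the right heading.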


## Project Scope and Architecture

The scope of this project is to enable **Agentic Workflows** to simplify deployment, configuration, and day-2 operations for the OpenTelemetry collector.
Additional components (e.g., SDKs, instrumentation, semantic conventions) could be added in a phased approach in the future.

To support this workflow, a standardized interface is required for Agents and LLMs to interact with the OpenTelemetry ecosystem. The project will focus on [The Model Context Protocol (MCP)](https://modelcontextprotocol.io/) and [Agent Skills](https://agentskills.io/home) concepts to provide this interface for agents to interact with the OpenTelemetry project.

The goal of this project is to deliver an initial implementation of MCP server(s) and/or Agent Skills for the OpenTelemetry collector in coordination with the collector SIG.
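
To make the interface concrete: MCP clients invoke server tools through JSON-RPC 2.0 requests using the `tools/call` method. The sketch below builds such a request in Python. The tool name `validate_collector_config` and its `config_yaml` argument are hypothetical and are not part of this proposal or of any existing server.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request ids.
_request_ids = count(1)


def make_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool and argument names, used only for illustration.
request = make_tool_call(
    "validate_collector_config",
    {"config_yaml": "receivers:\n  otlp:\n"},
)
print(json.dumps(request, indent=2))
```

An agent would normally discover the available tools first (via `tools/list`) and then send requests like this one over the transport the server supports.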

### Goals, objectives, and requirements

#### Collector

The Collector follows a fast two-week release cadence, which requires constant maintenance to stay up to date and avoid breaking changes. Additionally, configuring the Collector correctly and writing valid OTTL statements are important for effective usage, but they require domain expertise and aren't always trivial. General-purpose coding agents struggle here because they lack up-to-date knowledge of recent releases and aren't specialized for Collector workflows.

* Enable agents to read and write valid Collector configuration.

Member Author

An example of how an agentic workflow can help with maintaining collector docs: https://gist.github.com/pavolloffay/c78595721676576b64768c247d1e22c5

* Enable agents to handle breaking API changes (e.g. deprecations, removals, renaming) in the configuration and the Collector Go API.
* Enable agents to upgrade the Collector.
* Enable agents to write valid OpenTelemetry Transformation Language (OTTL).
* Enable agents to troubleshoot collector issues.
Comment on lines +54 to +58
Member

Paraphrasing from private DM conversation: I don't think there is a need to spell out everything now but I think it would be valuable to list what things you intend to work on to make these possible. For example, you are already working on the configuration schema. Is there an intention to work on similar aspects to enable agents in other use cases?

I think it would be much easier to show value to the Collector SIG if we know more about those intermediate aspects that would enable agents, since that way we can see the value not only for building the MCP but in general for all ways of interacting with the Collector.

Member Author

I have added some clarification below this line. Most likely, we will make improvements to the existing collector documentation and collector config schema (which has already started).

We would like to start working in devx repository, which should be created for this project #3198. I don't expect the collector SIG maintainers to be working with us in that repository.

Member

Creating opentelemetry-mcp-servers repo seems out of scope for this proposal now. If you need a temporary repo to prototype things before opening PR(s), I'd recommend using forks of the repos where the code will ultimately live (e.g. the collector/demo/website/configuration repos).

Member Author

We scoped the proposal down to the collector only for now, based on @lmolkova's feedback and this comment:

> Host MCP/skills artifacts either in the corresponding SIG repository or in DevEx repository.

Our intention was to work on the collector use-cases now and then move to another SIG. One of the goals of the proposal is to have a unified experience of installing and managing MCPs/agentic skills.

The proposal was presented in the collector SIG with the intention of building this in a separate repository.

Member

Sorry, realize you're getting a lot of mixed messages, I don't think there's consensus in the GC/TC about this yet. If we need a centralized installer, could that be added later?


These goals might require enhancements in the collector repositories. We expect to improve the documentation, as it is the primary source for building skills and a knowledge base for the agents.
Another example is the ongoing work on the collector configuration schema in the collector SIG.
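
As an illustration of the first goal above (reading and writing valid Collector configuration), the sketch below shows the kind of structural check a tool could expose to an agent: every component referenced in a `service.pipelines` entry must be defined in the matching top-level section (or under `connectors` for receivers and exporters). This is a simplified, assumed check on an already-parsed configuration; the function name is hypothetical, and it does not replace the Collector's own validation or a full configuration schema.

```python
PIPELINE_SECTIONS = ("receivers", "processors", "exporters")


def undefined_pipeline_components(config: dict) -> list[str]:
    """Return one message per pipeline reference that has no definition."""
    problems = []
    # Connectors may act as a receiver in one pipeline and an exporter in another.
    connectors = set(config.get("connectors") or {})
    pipelines = (config.get("service") or {}).get("pipelines") or {}
    for pipeline_name, pipeline in pipelines.items():
        for section in PIPELINE_SECTIONS:
            defined = set(config.get(section) or {})
            if section != "processors":
                defined |= connectors
            for component_id in pipeline.get(section) or []:
                if component_id not in defined:
                    problems.append(
                        f"pipeline {pipeline_name!r} references undefined "
                        f"{section[:-1]} {component_id!r}"
                    )
    return problems


# Parsed configuration (for example, loaded from YAML) with one mistake:
# `memory_limiter` is used in the pipeline but never defined.
config = {
    "receivers": {"otlp": {"protocols": {"grpc": {}}}},
    "processors": {"batch": {}},
    "exporters": {"debug": {}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch", "memory_limiter"],
                "exporters": ["debug"],
            }
        }
    },
}

for problem in undefined_pipeline_components(config):
    print(problem)
```

Returning a list of messages rather than raising on the first error lets an agent see and fix all dangling references in one pass.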

#### Documentation and distribution

Coherent documentation and distribution of the agentic workflows are required to enable users to install and manage them.

* Introduce documentation for the Agentic Workflows.
* Align distribution and installation of the components with the Agentic Workflows.
* Agentic workflow documentation will be part of the existing [OpenTelemetry documentation](https://opentelemetry.io/docs/) and will not duplicate any existing content.

### Non Goals

Contributor

I don't see this called out, but I envision MCP as a piece of the solution. Would it make sense to also include agent skills that can leverage various MCP servers to complete tasks?

Personally, I'd love a "tight" set of "CUJ"s that we start with, and I imagine a lot of these will cross between MCP servers. I want this to be successful, so I think it may also make sense to release "skills" to compose these MCP servers.

I'd either list it as a non-goal, or include it as a workstream between these MCP servers.

Member Author

> I don't see this called out, but I envision MCP as a piece of the solution. Would it make sense to also include agent skills that can leverage various MCP servers to complete tasks?

This is what we already have in the proposal, in the Project Scope and Architecture section:

> The goal of this SIG is to deliver an initial implementation of MCP server(s) and/or Agent Skills for the OpenTelemetry project in coordination with existing SIGs


* The project will not implement any telemetry backends.
* The project will not maintain a separate documentation knowledge base; it will leverage existing OpenTelemetry documentation.

## Deliverables

The following deliverables can change based on the project progress, community feedback and validation of the agentic workflows.
The deliverables are listed in the priority order set by the project team.

* MCP server or agentic skill to facilitate deployment, configuration and day-2 operations of the collector.
* MCP server or agentic skill to troubleshoot collector issues.

## Staffing / Help Wanted

This project requires a blend of expertise in the OpenTelemetry Collector, documentation, and instrumentation, as well as in building MCP server(s).

Member

given that collector and instrumentation + semantic conventions cases are different, work on top of different otel tooling and require different staffing and work with different SIGs, does it make sense for this project to cover all of them?

We've been pushing for smaller, better scoped projects and, given this one targets collector as the first phase, would it make sense to limit this to collector only? If upon completion, the group has appetite to cover other cases, a Phase 2 covering instrumentations / conventions can be started.

Member Author

> given that collector and instrumentation + semantic conventions cases are different, work on top of different otel tooling and require different staffing and work with different SIGs, does it make sense for this project to cover all of them?

Yes, it does make sense to cover all these parts of the ecosystem in a single MCP project.

We might actually start with the instrumentation configuration use-cases given there is almost "stable" configuration schema for it.

Member

> We might actually start with the instrumentation configuration use-cases given there is almost "stable" configuration schema for it.

I think you need to get support from @open-telemetry/specs-semconv-maintainers and @open-telemetry/weaver-maintainers for this.

My opinion is that these efforts would be best covered within the SIG they belong to, i.e. collector and semconv/weaver.

/cc @jerbly who showed MCP server prototype for weaver in the SIG call today.

If there is something in common between collector MCP server and the instrumentation one there could be some level of coordination, but the core parts and target personas are generally different.

Member

big +1 to what @lmolkova asks here, if the goal of this project is to implement an MCP server (or skill or ) for other parts of the project, these other SIGs need to be involved here.


### SIG

This effort will be hosted in the existing Collector SIG.

Member

Can we add Dmitrii and Alex Boten here?

Member Author

added


Sponsors for this effort are:

* [@dmitryax](https://github.com/dmitryax) (Splunk)
* [@codeboten](https://github.com/codeboten) (Honeycomb)

### Required staffing

#### Project Lead(s)

* [@pavolloffay](https://github.com/pavolloffay) (Red Hat)
* [@niwoerner](https://github.com/niwoerner) (OllyGarden)

#### TC Sponsor

Existing Collector SIG TC sponsor.

#### GC Liaison

Existing Collector SIG liaison.

#### Engineers

Member

Since this project targets collector side first, it would be great to have some active collector contributors onboard. Do we have any @open-telemetry/collector-approvers or @open-telemetry/collector-contrib-approvers interested in helping out?

Member Author

Not directly. @mx-psi is interested in resolving the collector JSON schema issue, e.g. open-telemetry/opentelemetry-collector#9769. There is already a PR for it: open-telemetry/opentelemetry-collector#14288

Eventually we might need a sponsor for the MCP component in the collector, if we decide that is the best approach for it.

Member

@dmitryax, Dec 24, 2025

I can participate in this as a collector maintainer. However, I won’t have capacity for active development. @jkoronaAtCisco can help with the JSON schema for collector components configuration, he is actively working on it already.

Member

I'm not an active collector contributor at the moment, but I'd be happy to jump in and contribute to the development! Count me in as an extra pair of hands to help move this forward.

Member

I am happy to participate in the JSON schema/other parts that are impactful to both the MCP server and high priority Collector areas

Contributor

Can you list which projects these active engineers represent? I want to see if we have coverage / communication channels with all the SIGs you need to interact with.

Hi my team would like to help contribute. We currently don't have any active collector contributors.

Member Author

I have added you @nr-nfajardo


* [@adrielp](https://github.com/adrielp)
* [@shiftyp](https://github.com/shiftyp)
* [@johannaojeling](https://github.com/johannaojeling)
* [@vitorvasc](https://github.com/vitorvasc)
* [@nr-nfajardo](https://github.com/nr-nfajardo)

#### Other Staffing

### Industry outreach (Optional)

The following users have built OpenTelemetry MCP servers:

* [@austinlparker](https://github.com/austinlparker) - author of [otel-mcp](https://github.com/austinlparker/otel-mcp)
* [@mottibec](https://github.com/mottibec) - author of [otelcol-mcp](https://github.com/mottibec/otelcol-mcp)
* [@shiftyp](https://github.com/shiftyp) - author of [otel-mcp-server](https://github.com/shiftyp/otel-mcp-server)

There will be an [OpenTelemetry MCP call for contributors post](https://github.com/open-telemetry/opentelemetry.io/pull/8629) to promote the project.

## Expected Timeline

This timeline assumes project approval and resource allocation as outlined in the staffing section. Until staffing is
confirmed and expected time commitments are known, this timeline is in flux.

## Labels

`agentic-workflow`, `mcp` for all PRs and issues related to this project.

## GitHub Project (Post-Approval)

TBD

## SIG Meetings, Roadmap, and Other Info (Post-Approval)

All communication will be done in the existing Collector SIG.