Support spec v1.11.0: OpenTelemetry OTLP tracing configuration #3187

@lpcox

Context

gh-aw#24602 was merged, extending the MCP Gateway Specification from v1.10.0 to v1.11.0. It adds an optional opentelemetry configuration object to the gateway config section. When configured, the gateway must emit distributed tracing spans for each MCP tool invocation using OTLP/HTTP.

Related: #3177 — our existing feature issue for OTLP tracing (architecture analysis + solution proposal). This issue focuses specifically on spec compliance with v1.11.0.

What the spec requires (§4.1.3.6)

Config schema (opentelemetry object in gateway)

| Field | Type | Required | Description |
|---|---|---|---|
| endpoint | string | Yes | OTLP/HTTP collector URL. MUST be HTTPS. Supports ${VAR} expansion. |
| headers | object | No | HTTP headers for export requests (e.g., auth tokens). Values support ${VAR}. |
| traceId | string | No | Parent trace ID (32-char lowercase hex, W3C format). Supports ${VAR}. |
| spanId | string | No | Parent span ID (16-char lowercase hex, W3C format). Ignored without traceId. Supports ${VAR}. |
| serviceName | string | No | service.name resource attribute. Default: "mcp-gateway". |

Required tracing behavior

When opentelemetry is configured, the gateway MUST:

  1. Create a root span for the gateway process lifetime with service.name set to serviceName
  2. Create a child span per MCP tool invocation with attributes:
    • mcp.server — server name from config
    • mcp.method — JSON-RPC method (e.g., tools/call)
    • mcp.tool — tool name
    • http.status_code — HTTP status of proxied response
  3. Record accurate start/end timestamps
  4. Export via OTLP/HTTP to the configured endpoint
  5. Apply configured headers to every export request
  6. Propagate W3C traceparent when traceId/spanId provided

Failure handling

  • Gateway MUST NOT fail to start if collector is unreachable
  • Export failures SHOULD be logged as warnings, MUST NOT affect MCP processing
  • SHOULD implement exponential backoff retry

Validation rules

  • endpoint required when opentelemetry present
  • endpoint MUST be HTTPS
  • traceId must be 32-char lowercase hex (or ${VAR})
  • spanId must be 16-char lowercase hex (or ${VAR})
  • spanId without traceId → log warning, ignore

Gap analysis: current gateway vs spec

Existing TracingConfig (from recent work)

type TracingConfig struct {
    Endpoint    string   `toml:"endpoint" json:"endpoint,omitempty"`
    ServiceName string   `toml:"service_name" json:"service_name,omitempty"`
    SampleRate  *float64 `toml:"sample_rate" json:"sample_rate,omitempty"`
}

What needs to change

| Area | Current State | Required by Spec | Work |
|---|---|---|---|
| Config fields | endpoint, service_name, sample_rate | Add headers, traceId, spanId | Add 3 fields to TracingConfig |
| TOML field name | [gateway.tracing] | Spec uses opentelemetry | Add TOML alias or rename |
| JSON stdin config | Not wired | opentelemetry in StdinGatewayConfig | Add StdinOpenTelemetryConfig + conversion |
| Endpoint validation | None | MUST be HTTPS | Add validation in validation.go |
| traceId/spanId validation | N/A | 32/16-char hex regex | Add validation |
| Variable expansion | All fields support ${VAR} via existing expansion | Spec requires it | Already supported ✅ |
| W3C trace context | Not implemented | Construct traceparent from traceId+spanId | New: build parent context |
| Headers | Not supported | Pass to OTLP exporter | Thread through exporter config |
| Root span | Not implemented | Process-lifetime root span | New: create at startup |
| Tool call spans | Not implemented | Per-invocation with mcp.* attributes | New: instrument callBackendTool() |
| OTLP export | Not implemented | OTLP/HTTP to endpoint | New: init TracerProvider with OTLP exporter |
| Failure isolation | Not implemented | Export errors must not affect MCP | Use batch processor + noop fallback |
| SampleRate | In config | Not in spec (but harmless) | Keep as extension field |
| Compliance tests | None | T-OTEL-001 through T-OTEL-010 | 10 new tests |

Proposed implementation plan

1. Config alignment (~80 lines)

Update TracingConfig to match spec §4.1.3.6:

type TracingConfig struct {
    Endpoint    string            `toml:"endpoint" json:"endpoint,omitempty"`
    Headers     map[string]string `toml:"headers" json:"headers,omitempty"`
    TraceID     string            `toml:"trace_id" json:"traceId,omitempty"`
    SpanID      string            `toml:"span_id" json:"spanId,omitempty"`
    ServiceName string            `toml:"service_name" json:"serviceName,omitempty"`
    SampleRate  *float64          `toml:"sample_rate" json:"sampleRate,omitempty"`
}

Add opentelemetry alias in TOML (the spec uses opentelemetry, our current config uses tracing). Support both for backward compatibility.

Wire into StdinGatewayConfig + convertStdinConfig().
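A rough sketch of the stdin conversion, assuming a StdinOpenTelemetryConfig shape that mirrors the spec's JSON field names. The struct layouts, the convertOpenTelemetry helper, and the sample endpoint are all hypothetical; the real convertStdinConfig() may be organised differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StdinOpenTelemetryConfig is a hypothetical JSON shape following spec §4.1.3.6.
type StdinOpenTelemetryConfig struct {
	Endpoint    string            `json:"endpoint"`
	Headers     map[string]string `json:"headers,omitempty"`
	TraceID     string            `json:"traceId,omitempty"`
	SpanID      string            `json:"spanId,omitempty"`
	ServiceName string            `json:"serviceName,omitempty"`
}

// TracingConfig is a trimmed stand-in for the internal struct.
type TracingConfig struct {
	Endpoint    string
	Headers     map[string]string
	TraceID     string
	SpanID      string
	ServiceName string
}

// defaultServiceName is the default given by the spec for serviceName.
const defaultServiceName = "mcp-gateway"

// convertOpenTelemetry maps the stdin object onto TracingConfig. A nil input
// means the opentelemetry object was omitted, so tracing stays disabled.
func convertOpenTelemetry(in *StdinOpenTelemetryConfig) *TracingConfig {
	if in == nil {
		return nil
	}
	out := &TracingConfig{
		Endpoint:    in.Endpoint,
		TraceID:     in.TraceID,
		SpanID:      in.SpanID,
		ServiceName: in.ServiceName,
	}
	if out.ServiceName == "" {
		out.ServiceName = defaultServiceName
	}
	if len(in.Headers) > 0 {
		// Copy so later edits to the stdin struct cannot alter the config.
		out.Headers = make(map[string]string, len(in.Headers))
		for k, v := range in.Headers {
			out.Headers[k] = v
		}
	}
	return out
}

func main() {
	raw := []byte(`{"endpoint":"https://collector.example.com/v1/traces",` +
		`"headers":{"Authorization":"Bearer ${OTEL_TOKEN}"}}`)
	var in StdinOpenTelemetryConfig
	if err := json.Unmarshal(raw, &in); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *convertOpenTelemetry(&in))
}
```

Variable expansion is deliberately left out here, on the assumption that the existing ${VAR} expansion runs as a separate step.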

2. Validation (~60 lines)

In validation.go:

  • endpoint must be HTTPS (when present)
  • traceId must match ^[0-9a-f]{32}$ (after variable expansion)
  • spanId must match ^[0-9a-f]{16}$ (after variable expansion)
  • spanId without traceId → warning
  • endpoint required when opentelemetry object is present
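The rules above could be sketched roughly as follows. The otelFields type and validateOpenTelemetry function are hypothetical names, and the sketch assumes it runs after ${VAR} expansion. Checking the spanId-without-traceId case before the format checks is a judgment call the spec text quoted here does not settle.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/url"
	"regexp"
)

var (
	traceIDRe = regexp.MustCompile(`^[0-9a-f]{32}$`)
	spanIDRe  = regexp.MustCompile(`^[0-9a-f]{16}$`)
)

// otelFields holds only the fields that need validation (hypothetical type).
type otelFields struct {
	Endpoint string
	TraceID  string
	SpanID   string
}

// validateOpenTelemetry returns an error for hard failures and logs a warning
// for the one soft case (spanId without traceId), clearing the span ID.
func validateOpenTelemetry(f *otelFields) error {
	if f == nil {
		return nil // opentelemetry omitted: tracing disabled, nothing to check
	}
	if f.Endpoint == "" {
		return errors.New("opentelemetry.endpoint is required")
	}
	u, err := url.Parse(f.Endpoint)
	if err != nil || u.Scheme != "https" || u.Host == "" {
		return fmt.Errorf("opentelemetry.endpoint must be an HTTPS URL, got %q", f.Endpoint)
	}
	if f.SpanID != "" && f.TraceID == "" {
		log.Printf("warning: opentelemetry.spanId ignored because traceId is not set")
		f.SpanID = ""
	}
	if f.TraceID != "" && !traceIDRe.MatchString(f.TraceID) {
		return fmt.Errorf("opentelemetry.traceId must be 32 lowercase hex characters, got %q", f.TraceID)
	}
	if f.SpanID != "" && !spanIDRe.MatchString(f.SpanID) {
		return fmt.Errorf("opentelemetry.spanId must be 16 lowercase hex characters, got %q", f.SpanID)
	}
	return nil
}

func main() {
	cfg := &otelFields{Endpoint: "http://collector.example.com/v1/traces"}
	fmt.Println(validateOpenTelemetry(cfg))
}
```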

3. OTLP exporter + TracerProvider (~100 lines)

New internal/tracing/ package:

  • InitTracerProvider(cfg *config.TracingConfig) → returns *sdktrace.TracerProvider or noop
  • Uses OTLP/HTTP exporter with configured endpoint + headers
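  • Batch span processor for asynchronous export; retry with backoff comes from the OTLP/HTTP exporter's retry settings rather than from the processor itself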
  • Constructs W3C parent context from traceId+spanId if provided
  • Graceful shutdown via TracerProvider.Shutdown(ctx)
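To make the parent-context step concrete, here is a standard-library sketch of how the configured IDs map onto a W3C traceparent value. buildTraceparent and newSpanID are hypothetical helpers. Version 00 with the sampled flag 01 is an assumption, and a real implementation would more likely build a span context through the OpenTelemetry trace API than format the header by hand.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newSpanID returns 8 random bytes encoded as 16 lowercase hex characters.
func newSpanID() (string, error) {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return hex.EncodeToString(b[:]), nil
}

// buildTraceparent formats "version-traceid-parentid-flags". It returns ""
// when no traceId is configured, and generates a random span ID when only
// traceId is given (the case covered by T-OTEL-008).
// Assumption: version "00" and trace-flags "01" (sampled).
func buildTraceparent(traceID, spanID string) (string, error) {
	if traceID == "" {
		return "", nil
	}
	if spanID == "" {
		id, err := newSpanID()
		if err != nil {
			return "", err
		}
		spanID = id
	}
	return fmt.Sprintf("00-%s-%s-01", traceID, spanID), nil
}

func main() {
	tp, err := buildTraceparent("4bf92f3577b34da6a3ce929d0e0e4736", "")
	if err != nil {
		panic(err)
	}
	fmt.Println(tp)
}
```

The sketch assumes both IDs have already passed validation, so it does not re-check their format.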

4. Instrumentation (~80 lines)

In callBackendTool() (unified.go):

ctx, span := tracer.Start(ctx, "mcp.tool_call",
    trace.WithAttributes(
        attribute.String("mcp.server", serverID),
        attribute.String("mcp.method", "tools/call"),
        attribute.String("mcp.tool", toolName),
    ))
defer span.End()

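Root span created at startup in cmd/root.go. The http.status_code attribute required by §4.1.3.6 is not known when the span starts, so it has to be set on the span once the proxied response returns.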

5. Compliance tests (~200 lines)

T-OTEL-001 through T-OTEL-010 as described in spec §10.1.10.

New dependencies

go.opentelemetry.io/otel
go.opentelemetry.io/otel/sdk
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp

Compliance test mapping

| Test ID | Description | Type |
|---|---|---|
| T-OTEL-001 | Gateway starts when opentelemetry omitted | Config |
| T-OTEL-002 | Gateway starts with valid endpoint | Config |
| T-OTEL-003 | Reject missing endpoint | Validation |
| T-OTEL-004 | Reject non-HTTPS endpoint | Validation |
| T-OTEL-005 | Span per tool call with required attributes | Integration |
| T-OTEL-006 | Headers sent with OTLP export | Integration |
| T-OTEL-007 | W3C traceparent with traceId + spanId | Integration |
| T-OTEL-008 | Random spanId when only traceId provided | Integration |
| T-OTEL-009 | Export failure doesn't affect MCP | Resilience |
| T-OTEL-010 | serviceName in service.name attribute | Integration |
