feat: OTLP trace export from agent runtime #24373

@jaroslawgajewski

Problem

gh-aw already depends on go.opentelemetry.io/otel (v1.39.0+) but does not
actively instrument execution paths or export traces. The OpenTelemetry
dependency is present but dormant.

Without OTLP export, there is no way to get structured, per-LLM-call span data
from the agent runtime. All observability is limited to post-run aggregate
metadata, which misses the rich execution timeline.

Proposed solution

Activate OpenTelemetry instrumentation for key code paths and allow users to
configure an OTLP exporter endpoint:

# In workflow frontmatter or env
observability:
  otlp:
    endpoint: ${{ secrets.OTLP_ENDPOINT }}

Or via standard OTel env vars:

  • OTEL_EXPORTER_OTLP_ENDPOINT
  • OTEL_SERVICE_NAME=gh-aw
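
A minimal sketch of how the two configuration sources could be merged, using only the standard library. The `OTLPConfig` type and `resolveOTLPConfig` function are hypothetical names, and the precedence (frontmatter value first, then the env var) is an assumption that this proposal does not yet settle.

```go
package main

import (
	"fmt"
	"os"
)

// OTLPConfig holds the resolved exporter settings (hypothetical type).
type OTLPConfig struct {
	Endpoint    string // empty means tracing stays disabled
	ServiceName string
}

// resolveOTLPConfig merges the frontmatter value with the standard OTel
// environment variables. getenv is injected so the logic can be tested
// without touching the real process environment.
// Assumption: a frontmatter endpoint takes precedence over the env var.
func resolveOTLPConfig(frontmatterEndpoint string, getenv func(string) string) OTLPConfig {
	cfg := OTLPConfig{Endpoint: frontmatterEndpoint, ServiceName: "gh-aw"}
	if cfg.Endpoint == "" {
		cfg.Endpoint = getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
	}
	if name := getenv("OTEL_SERVICE_NAME"); name != "" {
		cfg.ServiceName = name
	}
	return cfg
}

func main() {
	cfg := resolveOTLPConfig("", os.Getenv)
	if cfg.Endpoint == "" {
		fmt.Println("tracing disabled: no OTLP endpoint configured")
		return
	}
	fmt.Printf("exporting as %q to %s\n", cfg.ServiceName, cfg.Endpoint)
}
```

An empty `Endpoint` is the signal to fall back to the no-op tracer described under the implementation notes.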

Spans to instrument

| Code path              | Span name            | Key attributes          |
|------------------------|----------------------|-------------------------|
| Agent execution        | gh-aw.agent.execute  | workflow, engine, model |
| Each agent turn        | gh-aw.agent.turn     | turn_number, tokens, cost |
| MCP tool call          | gh-aw.mcp.call       | tool, server, status    |
| Safe-output processing | gh-aw.output.process | type, validation        |
| AWF network request    | gh-aw.awf.request    | domain, allowed/blocked |

Trace hierarchy

Trace: workflow-run-{call_id}
  └─ Span: gh-aw.agent.execute
       ├─ Span: gh-aw.agent.turn (turn 1)
       │    ├─ Span: gh-aw.mcp.call (get_file_contents)
       │    └─ Span: gh-aw.mcp.call (search_code)
       ├─ Span: gh-aw.agent.turn (turn 2)
       │    └─ Span: gh-aw.mcp.call (create_pull_request)
       └─ Span: gh-aw.output.process (safe-output)
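
The nesting above can be modeled with a small toy structure. This sketch does not use the OpenTelemetry SDK: `Span`, `StartChild`, `Render`, and `buildExampleTrace` are hypothetical stand-ins that only illustrate the parent/child relationships. In a real implementation the parent would presumably be carried through a `context.Context` instead.

```go
package main

import (
	"fmt"
	"strings"
)

// Span is a toy stand-in for an OpenTelemetry span. It records only a
// name, a human-readable label, and its child spans.
type Span struct {
	Name     string
	Label    string
	Children []*Span
}

// StartChild creates a span nested under s and returns it.
func (s *Span) StartChild(name, label string) *Span {
	child := &Span{Name: name, Label: label}
	s.Children = append(s.Children, child)
	return child
}

// Render returns the subtree as an indented outline, one span per line,
// with two spaces of indentation per nesting level.
func (s *Span) Render(depth int) string {
	var b strings.Builder
	b.WriteString(strings.Repeat("  ", depth) + s.Name)
	if s.Label != "" {
		b.WriteString(" (" + s.Label + ")")
	}
	b.WriteString("\n")
	for _, c := range s.Children {
		b.WriteString(c.Render(depth + 1))
	}
	return b.String()
}

// buildExampleTrace reproduces the hierarchy sketched in this issue.
func buildExampleTrace() *Span {
	root := &Span{Name: "gh-aw.agent.execute"}
	t1 := root.StartChild("gh-aw.agent.turn", "turn 1")
	t1.StartChild("gh-aw.mcp.call", "get_file_contents")
	t1.StartChild("gh-aw.mcp.call", "search_code")
	t2 := root.StartChild("gh-aw.agent.turn", "turn 2")
	t2.StartChild("gh-aw.mcp.call", "create_pull_request")
	root.StartChild("gh-aw.output.process", "safe-output")
	return root
}

func main() {
	fmt.Print(buildExampleTrace().Render(0))
}
```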

Use cases unlocked

  • Any OTLP-compatible backend: Langfuse, Datadog, Honeycomb, Jaeger,
    Grafana Tempo, New Relic — all accept OTLP natively.
  • Per-call latency profiling: See which LLM call or tool call is slowest.
  • Distributed tracing: Correlate gh-aw spans with downstream service spans.
  • No vendor lock-in: Standard protocol, any backend, any visualization.

Implementation notes

  • The OTLP endpoint must be auto-added to the AWF firewall allowlist.
  • Use a no-op tracer when OTLP is not configured, so that instrumentation
    adds negligible overhead.
  • Graceful degradation: exporter errors must never fail the workflow.
  • Batching: use the OTel SDK's built-in batch span processor.
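
Two of these notes can be sketched with the standard library alone: deriving the firewall allowlist entry from the endpoint URL, and containing exporter failures so they never fail the workflow. `allowlistHost` and `exportSafely` are hypothetical helpers, not existing gh-aw or OTel SDK functions, and how the host is actually handed to AWF is left open here.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/url"
)

// allowlistHost extracts the hostname that would need to be added to the
// firewall allowlist for a given OTLP endpoint URL.
func allowlistHost(endpoint string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	if u.Hostname() == "" {
		return "", errors.New("OTLP endpoint has no host: " + endpoint)
	}
	return u.Hostname(), nil
}

// exportSafely runs an export function, logs any error or panic, and
// reports success as a bool instead of propagating the failure.
func exportSafely(export func() error) (ok bool) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("otlp export panicked (ignored): %v", r)
			ok = false
		}
	}()
	if err := export(); err != nil {
		log.Printf("otlp export failed (ignored): %v", err)
		return false
	}
	return true
}

func main() {
	host, err := allowlistHost("https://otel.example.com:4318/v1/traces")
	if err != nil {
		log.Printf("tracing disabled: %v", err)
		return
	}
	fmt.Println("allowlist:", host)

	ok := exportSafely(func() error { return errors.New("connection refused") })
	fmt.Println("export ok:", ok, "(workflow continues)")
}
```

Returning a bool rather than an error keeps the call site from accidentally bubbling an exporter failure up into the workflow result.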
