
feat: native OTLP export via configure_otlp() #54

Open
ellucas-creator wants to merge 1 commit into main from feat/otlp-native-support

Conversation

@ellucas-creator
Collaborator

Native OTLP Support

Adds zero-breaking-change native OTLP export to bmasterai. Call configure_otlp() once before your first monitor call, and all agent events automatically flow to any OTel-compatible backend.

Usage

from bmasterai import configure_otlp, get_monitor

configure_otlp(endpoint="http://localhost:4317", service_name="my-agent")

monitor = get_monitor()
monitor.track_agent_start("researcher")
monitor.track_llm_call("researcher", "claude-3-5-sonnet", tokens_used=1200, duration_ms=1840)
monitor.track_agent_stop("researcher")
# → spans + metrics automatically sent to your collector

What gets exported

Spans (traces)

| Span | Trigger |
| --- | --- |
| agent.<agent_id> | track_agent_start / track_agent_stop |
| llm.call | track_llm_call — includes model, tokens, latency, reasoning_steps |
| task.<task_name> | track_task_duration |
| Error events | track_error — added to active agent span |

Metrics

| Metric | Type |
| --- | --- |
| bmasterai.llm.tokens_used | Counter (agent_id, model) |
| bmasterai.llm.call_duration | Histogram (ms) |
| bmasterai.task.duration | Histogram (ms) |
| bmasterai.agent.errors | Counter (agent_id, error_type) |
| bmasterai.custom.metric | Counter (passthrough labels) |

Install

pip install 'bmasterai[otlp]'       # gRPC (Jaeger, Grafana, local collector)
pip install 'bmasterai[otlp-http]'  # HTTP/protobuf (Honeycomb, New Relic, Grafana Cloud)

Zero install impact for existing users — opentelemetry-sdk is optional. If not installed, all OTLP calls are no-ops.

Supported backends

Grafana Tempo, Jaeger, Honeycomb, Datadog, New Relic, Prometheus (OTLP bridge), any OTel collector.

Files changed

  • src/bmasterai/otlp.py — new OTLP module (hooks, span management, instruments)
  • src/bmasterai/monitoring.py — hook calls added to all track_* methods
  • src/bmasterai/__init__.py — configure_otlp exported from top level
  • pyproject.toml — [otlp] and [otlp-http] optional dependency groups
  • examples/otlp-export/ — working example + README with backend configs

Add bmasterai/otlp.py — a zero-breaking-change OTLP layer that hooks
into AgentMonitor and emits OTel spans + metrics to any compatible backend.

What's new:
- bmasterai.configure_otlp(endpoint, service_name, headers, use_http, ...)
- Wraps track_agent_start/stop   → root span per agent lifecycle
- Wraps track_llm_call           → child span with token/model/latency attrs
- Wraps track_task_duration      → child span per task
- Wraps track_error              → counter + span event
- Wraps record_custom_metric     → OTel counter

Metrics exported:
  bmasterai.llm.tokens_used      (counter, labels: agent_id, model)
  bmasterai.llm.call_duration    (histogram ms)
  bmasterai.task.duration        (histogram ms)
  bmasterai.agent.errors         (counter, labels: agent_id, error_type)
  bmasterai.custom.metric        (counter, passthrough labels)

Optional dependencies (zero install required for existing users):
  pip install 'bmasterai[otlp]'       # gRPC transport
  pip install 'bmasterai[otlp-http]'  # HTTP/protobuf transport

Supported backends: Grafana Tempo, Jaeger, Honeycomb, Datadog,
New Relic, Prometheus (OTLP bridge), any OTel collector.

Examples:
  examples/otlp-export/agent_with_otlp.py
  examples/otlp-export/README.md
@claude

claude bot commented Mar 20, 2026

Claude finished @ellucas-creator's task


Code Review Complete

I've thoroughly reviewed this PR that adds native OTLP export functionality to bmasterai. Here's my comprehensive analysis:

Todo List

  • Read and analyze all changed files
  • Review code quality and best practices
  • Check for potential bugs or issues
  • Assess performance considerations
  • Evaluate security concerns
  • Review test coverage
  • Provide comprehensive feedback

🔍 Overall Assessment

This is a well-structured PR that adds valuable OpenTelemetry integration. The implementation is clean, follows good practices, and maintains backward compatibility. However, there are several areas for improvement.

✅ Strengths

  1. Zero-breaking-change design - Optional dependencies with graceful fallback
  2. Clean architecture - Separation of concerns between otlp.py and monitoring.py
  3. Comprehensive instrumentation - Covers spans, metrics, and error tracking
  4. Good documentation - Clear examples and configuration options
  5. Proper resource management - Uses OpenTelemetry best practices

⚠️ Issues & Recommendations

Critical - Missing Test Coverage

  • NO TESTS for the new OTLP functionality (src/bmasterai/otlp.py, 323 lines of code)
  • Tests only exist for existing functionality in tests/test_enhanced_functionality.py
  • Recommendation: Add comprehensive tests for:
    • configure_otlp() with different transports (gRPC/HTTP)
    • Mock OpenTelemetry dependencies unavailable scenarios
    • All instrumentation hooks (on_agent_start, on_llm_call, etc.)
    • Error handling when OTLP setup fails

Security Concerns

  1. Headers exposure (src/bmasterai/otlp.py:79, src/bmasterai/otlp.py:138, src/bmasterai/otlp.py:166)

    headers: Optional[Dict[str, str]] = None,

    API keys/auth headers could be logged or exposed. Add sanitization for logging.

  2. Insecure default (src/bmasterai/otlp.py:81)

    insecure: bool = True,

    Defaulting to an insecure transport is dangerous for production. Consider making False the default, or emit a warning whenever insecure transport is used.
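The header sanitization suggested above could be as small as this sketch; the SENSITIVE_HEADERS set and function name are assumptions, not bmasterai code:

```python
# Hypothetical helper: redact secret header values before they reach logs.
SENSITIVE_HEADERS = {"authorization", "x-honeycomb-team", "api-key", "dd-api-key"}

def sanitize_headers(headers):
    """Return a copy of headers with sensitive values masked for logging."""
    return {
        k: ("***" if k.lower() in SENSITIVE_HEADERS else v)
        for k, v in (headers or {}).items()
    }
```

Logging sanitize_headers(headers) instead of the raw dict keeps API keys out of log aggregators.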

Code Quality Issues

  1. Global state management (src/bmasterai/otlp.py:58-67)

    _otlp_configured = False
    _tracer: Optional[Any] = None
    _meter: Optional[Any] = None
    _agent_spans: Dict[str, Any] = {}

    No cleanup/reset mechanism. Add reset_otlp() function for testing.

  2. Exception handling (src/bmasterai/otlp.py:209-212)

    except Exception as exc:
        logger.warning("Metrics OTLP setup failed (traces still active): %s", exc)
        _meter = None

    Too broad exception handling masks specific issues.

  3. Type safety (src/bmasterai/otlp.py:59-60)

    _tracer: Optional[Any] = None
    _meter: Optional[Any] = None

    Using Any type reduces type safety. Consider proper OpenTelemetry type imports.

Performance Considerations

  1. Span storage (src/bmasterai/otlp.py:63)

    _agent_spans: Dict[str, Any] = {}

    No cleanup mechanism for long-running agents - potential memory leak.

  2. Synchronous span operations (src/bmasterai/otlp.py:268-272)
    Operations are synchronous and could add latency to agent operations.

Potential Bugs

  1. Version mismatch (src/bmasterai/otlp.py:77)

    service_version: str = "0.2.3",

    Hardcoded version should match package version dynamically.

  2. Missing null checks (src/bmasterai/otlp.py:311-313)

    span = _agent_spans.get(agent_id)
    if span:
        span.add_event("error", {"bmasterai.error_type": error_type})

    Should verify span is still active before adding events.

  3. Import error handling (src/bmasterai/monitoring.py:13-16)

    try:
        from bmasterai import otlp as _otlp
    except ImportError:
        _otlp = None

    Circular import potential since otlp.py doesn't exist in base package.

📋 Recommended Actions

  1. Add comprehensive tests - This is the highest priority
  2. Implement security improvements:
    • Add header sanitization for logs
    • Make insecure=False the default, with proper warnings
    • Document security implications
  3. Add cleanup mechanisms:
    • reset_otlp() function for testing
    • Automatic span cleanup for long-running agents
  4. Improve error handling with more specific exceptions
  5. Fix version synchronization with package version
  6. Consider async operations for better performance

🎯 Minor Suggestions

  • Add type hints instead of Any for better type safety
  • Consider adding configuration validation
  • Document resource requirements and performance impact
  • Add integration examples for popular backends

This is a solid foundation for OTLP integration, but the lack of tests and several security/reliability issues prevent it from being production-ready without these improvements.

