
[OTAGENT-823] bootstrap Dogtel extension#47532

Open
songy23 wants to merge 10 commits into main from yang.song/OTAGENT-824


@songy23
Member

@songy23 songy23 commented Mar 6, 2026

What does this PR do?

Adds standalone mode support to the otel-agent (DD_OTEL_STANDALONE=true) and introduces the dogtelextension OTel Collector extension, which provides Datadog Agent functionality in standalone mode.

Key changes:

  • dogtelextension (comp/otelcol/dogtelextension/): New OTel Collector extension providing a tagger gRPC server, host metadata submission, and secrets resolution when the otel-agent runs without a core Datadog Agent.
  • Standalone/connected FX split (cmd/otel-agent/subcommands/run/command.go): Refactors otel-agent startup into commonAgentFxOptions plus mode-specific standaloneAgentFxOptions / connectedAgentFxOptions. Standalone mode wires the local hostname, real secrets backend, local tagger, and host metadata runner. Connected mode keeps the remote hostname, remote tagger, and on-init config sync from the core agent.
  • K8s tag enrichment (comp/core/workloadmeta/collectors/catalog-otel/): New catalog-otel workloadmeta catalog (kubelet, containerd, docker, ECS, crio, podman). Added kubelet to OTEL_AGENT_TAGS. In standalone mode the infraattributes processor enriches spans/metrics/logs with K8s tags (kube_deployment, kube_namespace, pod_name, etc.) via the local tagger.
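The standalone/connected split can be pictured as composing one shared option list with a mode-specific one. The sketch below is a stdlib-only illustration under stated assumptions: it does not use go.uber.org/fx, `agentOption`, `commonOptions`, `standaloneOptions`, `connectedOptions`, and `optionsFor` are hypothetical stand-ins for commonAgentFxOptions, standaloneAgentFxOptions, and connectedAgentFxOptions, and the common entries ("config", "log", "otel-collector") are placeholders invented for the example. Only the mode-specific component names come from the description above.

```go
package main

import "fmt"

// agentOption is a hypothetical stand-in for an fx.Option; here it is only a label.
type agentOption string

// commonOptions models commonAgentFxOptions (placeholder entries, not the PR's list).
func commonOptions() []agentOption {
	return []agentOption{"config", "log", "otel-collector"}
}

// standaloneOptions models standaloneAgentFxOptions as described in the PR.
func standaloneOptions() []agentOption {
	return []agentOption{"local-hostname", "secrets-backend", "local-tagger", "host-metadata-runner"}
}

// connectedOptions models connectedAgentFxOptions as described in the PR.
func connectedOptions() []agentOption {
	return []agentOption{"remote-hostname", "remote-tagger", "config-sync-on-init"}
}

// optionsFor returns the common options followed by the mode-specific ones.
func optionsFor(standalone bool) []agentOption {
	opts := commonOptions() // fresh slice on every call, so appends never alias
	if standalone {
		return append(opts, standaloneOptions()...)
	}
	return append(opts, connectedOptions()...)
}

func main() {
	fmt.Println(optionsFor(true))
	fmt.Println(optionsFor(false))
}
```

Keeping the shared part in one function means a component added to both modes is wired once, and each mode's list stays a short, reviewable diff.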

Motivation

Enable a standalone Dogtel agent: the otel-agent can run independently, without a core Datadog Agent.

Describe how you validated your changes

  • Deployed otel-agent in standalone mode on a kind cluster with DD_OTEL_STANDALONE=true, DD_KUBERNETES_KUBELET_HOST=status.hostIP, and DD_KUBELET_TLS_VERIFY=false.
  • Sent a test trace with k8s.pod.uid and confirmed via the debug exporter that the infraattributes processor enriched it with kube_deployment, kube_namespace, pod_name, kube_replica_set, pod_phase, and UST tags.
  • Added unit tests for dogtelextension, plus fxutil.TestRun tests for both the standalone and connected FX graphs.
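The enrichment checked in the second step (a span carrying k8s.pod.uid gaining kube_deployment, kube_namespace, pod_name, ...) can be sketched as a lookup against the local tagger. This is a simplified illustration, not the infraattributes processor's code: the `tagger` interface, the `staticTagger` fake, the `kubernetes_pod_uid://` entity-ID format, and the "don't overwrite existing attributes" policy are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// tagger is a hypothetical minimal view of a tagger: entity ID -> "key:value" tags.
type tagger interface {
	Tags(entityID string) []string
}

// staticTagger is an in-memory fake used for illustration.
type staticTagger map[string][]string

func (s staticTagger) Tags(entityID string) []string { return s[entityID] }

// enrich copies the tagger's tags for the pod identified by the k8s.pod.uid
// attribute into attrs. Existing attributes are not overwritten.
func enrich(attrs map[string]string, t tagger) {
	uid, ok := attrs["k8s.pod.uid"]
	if !ok || uid == "" {
		return // nothing to look up
	}
	// Assumed entity-ID format for this example.
	for _, tag := range t.Tags("kubernetes_pod_uid://" + uid) {
		key, value, found := strings.Cut(tag, ":")
		if !found {
			continue // skip tags that are not key:value
		}
		if _, exists := attrs[key]; !exists {
			attrs[key] = value
		}
	}
}

func main() {
	t := staticTagger{
		"kubernetes_pod_uid://abc-123": {"kube_deployment:web", "kube_namespace:default", "pod_name:web-7d9f"},
	}
	attrs := map[string]string{"k8s.pod.uid": "abc-123"}
	enrich(attrs, t)
	fmt.Println(attrs["kube_deployment"], attrs["kube_namespace"], attrs["pod_name"])
}
```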

Additional Notes

Deployments using infraattributes in standalone mode require:

  1. DD_KUBERNETES_KUBELET_HOST set to status.hostIP (via the Kubernetes downward API)
  2. DD_KUBELET_TLS_VERIFY=false (or a kubelet CA cert)
  3. RBAC: get permission on nodes/proxy for the otel-agent ServiceAccount

@songy23 songy23 added this to the 7.78.0 milestone Mar 6, 2026
@songy23 songy23 added changelog/no-changelog No changelog entry needed qa/done QA done before merge and regressions are covered by tests team/opentelemetry-agent labels Mar 6, 2026
@github-actions github-actions bot added the long review PR is complex, plan time to review it label Mar 6, 2026
@dd-octo-sts dd-octo-sts bot added internal Identify a non-fork PR team/agent-configuration labels Mar 6, 2026
@agent-platform-auto-pr
Contributor

agent-platform-auto-pr bot commented Mar 6, 2026

Go Package Import Differences

Baseline: a6298b3
Comparison: 1659ddd

binary os arch change
otel-agent linux amd64 +83, -0
+code.cloudfoundry.org/garden
+code.cloudfoundry.org/garden/client
+code.cloudfoundry.org/garden/client/connection
+code.cloudfoundry.org/garden/routes
+code.cloudfoundry.org/garden/transport
+code.cloudfoundry.org/lager
+github.com/DataDog/datadog-agent/comp/core/hostname
+github.com/DataDog/datadog-agent/comp/core/hostname/hostnameimpl
+github.com/DataDog/datadog-agent/comp/core/tagger/collectors
+github.com/DataDog/datadog-agent/comp/core/tagger/common
+github.com/DataDog/datadog-agent/comp/core/tagger/fx
+github.com/DataDog/datadog-agent/comp/core/tagger/impl
+github.com/DataDog/datadog-agent/comp/core/tagger/k8s_metadata
+github.com/DataDog/datadog-agent/comp/core/tagger/mock
+github.com/DataDog/datadog-agent/comp/core/tagger/proto
+github.com/DataDog/datadog-agent/comp/core/tagger/server
+github.com/DataDog/datadog-agent/comp/core/tagger/subscriber
+github.com/DataDog/datadog-agent/comp/core/tagger/taglist
+github.com/DataDog/datadog-agent/comp/core/tagger/tagstore
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/baseimpl
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/catalog
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/fx
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/impl
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/impl/parse
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/program
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/proto
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/telemetry
+github.com/DataDog/datadog-agent/comp/core/workloadmeta/collectors/catalog-otel
+github.com/DataDog/datadog-agent/comp/core/workloadmeta/collectors/util
+github.com/DataDog/datadog-agent/comp/dogstatsd/packets
+github.com/DataDog/datadog-agent/comp/metadata/host
+github.com/DataDog/datadog-agent/comp/metadata/host/hostimpl
+github.com/DataDog/datadog-agent/comp/metadata/host/hostimpl/hosttags
+github.com/DataDog/datadog-agent/comp/metadata/host/hostimpl/utils
+github.com/DataDog/datadog-agent/comp/metadata/inventoryhost
+github.com/DataDog/datadog-agent/comp/metadata/inventoryhost/inventoryhostimpl
+github.com/DataDog/datadog-agent/comp/metadata/packagesigning/utils
+github.com/DataDog/datadog-agent/comp/metadata/resources
+github.com/DataDog/datadog-agent/comp/otelcol/dogtelextension/def
+github.com/DataDog/datadog-agent/comp/otelcol/dogtelextension/impl
+github.com/DataDog/datadog-agent/comp/otelcol/dogtelextension/impl/metrics
+github.com/DataDog/datadog-agent/pkg/collector/python
+github.com/DataDog/datadog-agent/pkg/gohai
+github.com/DataDog/datadog-agent/pkg/gohai/cpu
+github.com/DataDog/datadog-agent/pkg/gohai/filesystem
+github.com/DataDog/datadog-agent/pkg/gohai/memory
+github.com/DataDog/datadog-agent/pkg/gohai/network
+github.com/DataDog/datadog-agent/pkg/gohai/platform
+github.com/DataDog/datadog-agent/pkg/gohai/processes
+github.com/DataDog/datadog-agent/pkg/gohai/processes/gops
+github.com/DataDog/datadog-agent/pkg/gohai/utils
+github.com/DataDog/datadog-agent/pkg/gpu/tags
+github.com/DataDog/datadog-agent/pkg/logs/status
+github.com/DataDog/datadog-agent/pkg/logs/tailers
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/alibaba
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/cloudfoundry
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/ibm
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/kubernetes
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/oracle
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/tencent
+github.com/DataDog/datadog-agent/pkg/util/containers/cri
+github.com/DataDog/datadog-agent/pkg/util/containers/metadata
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/containerd
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/cri
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/docker
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/ecsfargate
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/ecsmanagedinstances
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/kubelet
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/provider
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/system
+github.com/DataDog/datadog-agent/pkg/util/gpu
+github.com/DataDog/datadog-agent/pkg/util/kubernetes/cloudprovider
+github.com/DataDog/datadog-agent/pkg/util/kubernetes/clusterinfo
+github.com/DataDog/datadog-agent/pkg/util/net
+github.com/DataDog/datadog-agent/pkg/util/procfilestats
+github.com/DataDog/datadog-agent/pkg/util/size
+github.com/DataDog/datadog-agent/pkg/util/tags
+github.com/DataDog/datadog-agent/pkg/util/tmplvar
+github.com/DataDog/datadog-agent/pkg/util/trie
+github.com/bmizerany/pat
+github.com/tedsuo/rata
otel-agent linux arm64 +83, -0
+code.cloudfoundry.org/garden
+code.cloudfoundry.org/garden/client
+code.cloudfoundry.org/garden/client/connection
+code.cloudfoundry.org/garden/routes
+code.cloudfoundry.org/garden/transport
+code.cloudfoundry.org/lager
+github.com/DataDog/datadog-agent/comp/core/hostname
+github.com/DataDog/datadog-agent/comp/core/hostname/hostnameimpl
+github.com/DataDog/datadog-agent/comp/core/tagger/collectors
+github.com/DataDog/datadog-agent/comp/core/tagger/common
+github.com/DataDog/datadog-agent/comp/core/tagger/fx
+github.com/DataDog/datadog-agent/comp/core/tagger/impl
+github.com/DataDog/datadog-agent/comp/core/tagger/k8s_metadata
+github.com/DataDog/datadog-agent/comp/core/tagger/mock
+github.com/DataDog/datadog-agent/comp/core/tagger/proto
+github.com/DataDog/datadog-agent/comp/core/tagger/server
+github.com/DataDog/datadog-agent/comp/core/tagger/subscriber
+github.com/DataDog/datadog-agent/comp/core/tagger/taglist
+github.com/DataDog/datadog-agent/comp/core/tagger/tagstore
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/baseimpl
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/catalog
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/fx
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/impl
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/impl/parse
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/program
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/proto
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/telemetry
+github.com/DataDog/datadog-agent/comp/core/workloadmeta/collectors/catalog-otel
+github.com/DataDog/datadog-agent/comp/core/workloadmeta/collectors/util
+github.com/DataDog/datadog-agent/comp/dogstatsd/packets
+github.com/DataDog/datadog-agent/comp/metadata/host
+github.com/DataDog/datadog-agent/comp/metadata/host/hostimpl
+github.com/DataDog/datadog-agent/comp/metadata/host/hostimpl/hosttags
+github.com/DataDog/datadog-agent/comp/metadata/host/hostimpl/utils
+github.com/DataDog/datadog-agent/comp/metadata/inventoryhost
+github.com/DataDog/datadog-agent/comp/metadata/inventoryhost/inventoryhostimpl
+github.com/DataDog/datadog-agent/comp/metadata/packagesigning/utils
+github.com/DataDog/datadog-agent/comp/metadata/resources
+github.com/DataDog/datadog-agent/comp/otelcol/dogtelextension/def
+github.com/DataDog/datadog-agent/comp/otelcol/dogtelextension/impl
+github.com/DataDog/datadog-agent/comp/otelcol/dogtelextension/impl/metrics
+github.com/DataDog/datadog-agent/pkg/collector/python
+github.com/DataDog/datadog-agent/pkg/gohai
+github.com/DataDog/datadog-agent/pkg/gohai/cpu
+github.com/DataDog/datadog-agent/pkg/gohai/filesystem
+github.com/DataDog/datadog-agent/pkg/gohai/memory
+github.com/DataDog/datadog-agent/pkg/gohai/network
+github.com/DataDog/datadog-agent/pkg/gohai/platform
+github.com/DataDog/datadog-agent/pkg/gohai/processes
+github.com/DataDog/datadog-agent/pkg/gohai/processes/gops
+github.com/DataDog/datadog-agent/pkg/gohai/utils
+github.com/DataDog/datadog-agent/pkg/gpu/tags
+github.com/DataDog/datadog-agent/pkg/logs/status
+github.com/DataDog/datadog-agent/pkg/logs/tailers
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/alibaba
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/cloudfoundry
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/ibm
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/kubernetes
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/oracle
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/tencent
+github.com/DataDog/datadog-agent/pkg/util/containers/cri
+github.com/DataDog/datadog-agent/pkg/util/containers/metadata
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/containerd
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/cri
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/docker
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/ecsfargate
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/ecsmanagedinstances
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/kubelet
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/provider
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/system
+github.com/DataDog/datadog-agent/pkg/util/gpu
+github.com/DataDog/datadog-agent/pkg/util/kubernetes/cloudprovider
+github.com/DataDog/datadog-agent/pkg/util/kubernetes/clusterinfo
+github.com/DataDog/datadog-agent/pkg/util/net
+github.com/DataDog/datadog-agent/pkg/util/procfilestats
+github.com/DataDog/datadog-agent/pkg/util/size
+github.com/DataDog/datadog-agent/pkg/util/tags
+github.com/DataDog/datadog-agent/pkg/util/tmplvar
+github.com/DataDog/datadog-agent/pkg/util/trie
+github.com/bmizerany/pat
+github.com/tedsuo/rata

@agent-platform-auto-pr
Contributor

agent-platform-auto-pr bot commented Mar 6, 2026

Files inventory check summary

File check results against ancestor 5f07fe85:

Results for datadog-agent_7.78.0~devel.git.767.1659ddd.pipeline.103728784-1_amd64.deb:

No change detected

@agent-platform-auto-pr
Contributor

agent-platform-auto-pr bot commented Mar 6, 2026

Static quality checks

✅ Please find below the results from static quality gates
Comparison made with ancestor a6298b3
📊 Static Quality Gates Dashboard
🔗 SQG Job

Successful checks

Info

Quality gate Change Size in MiB (prev → curr → max)
agent_rpm_arm64_fips +8.0 KiB (0.00% increase) 689.008 → 689.015 → 694.440
agent_suse_arm64_fips +8.0 KiB (0.00% increase) 689.008 → 689.015 → 694.440
29 successful checks with minimal change (< 2 KiB)
Quality gate Current Size
agent_deb_amd64 750.699 MiB
agent_deb_amd64_fips 707.631 MiB
agent_heroku_amd64 313.003 MiB
agent_msi 604.238 MiB
agent_rpm_amd64 750.682 MiB
agent_rpm_amd64_fips 707.614 MiB
agent_rpm_arm64 729.016 MiB
agent_suse_amd64 750.682 MiB
agent_suse_amd64_fips 707.614 MiB
agent_suse_arm64 729.016 MiB
docker_agent_amd64 811.017 MiB
docker_agent_arm64 814.164 MiB
docker_agent_jmx_amd64 1001.933 MiB
docker_agent_jmx_arm64 993.858 MiB
docker_cluster_agent_amd64 205.172 MiB
docker_cluster_agent_arm64 219.549 MiB
docker_cws_instrumentation_amd64 7.142 MiB
docker_cws_instrumentation_arm64 6.689 MiB
docker_dogstatsd_amd64 39.215 MiB
docker_dogstatsd_arm64 37.445 MiB
dogstatsd_deb_amd64 29.855 MiB
dogstatsd_deb_arm64 28.007 MiB
dogstatsd_rpm_amd64 29.855 MiB
dogstatsd_suse_amd64 29.855 MiB
iot_agent_deb_amd64 43.218 MiB
iot_agent_deb_arm64 40.273 MiB
iot_agent_deb_armhf 41.017 MiB
iot_agent_rpm_amd64 43.219 MiB
iot_agent_suse_amd64 43.219 MiB
On-wire sizes (compressed)
Quality gate Change Size in MiB (prev → curr → max)
agent_deb_amd64 -24.52 KiB (0.01% reduction) 174.707 → 174.683 → 177.700
agent_deb_amd64_fips +13.93 KiB (0.01% increase) 165.265 → 165.278 → 172.230
agent_heroku_amd64 -4.3 KiB (0.01% reduction) 74.945 → 74.941 → 79.970
agent_msi -16.0 KiB (0.01% reduction) 138.246 → 138.230 → 146.220
agent_rpm_amd64 -15.89 KiB (0.01% reduction) 177.585 → 177.569 → 180.780
agent_rpm_amd64_fips +41.13 KiB (0.02% increase) 167.313 → 167.353 → 173.370
agent_rpm_arm64 +30.64 KiB (0.02% increase) 159.535 → 159.565 → 161.610
agent_rpm_arm64_fips +8.46 KiB (0.01% increase) 151.251 → 151.259 → 155.910
agent_suse_amd64 -15.89 KiB (0.01% reduction) 177.585 → 177.569 → 180.780
agent_suse_amd64_fips +41.13 KiB (0.02% increase) 167.313 → 167.353 → 173.370
agent_suse_arm64 +30.64 KiB (0.02% increase) 159.535 → 159.565 → 161.610
agent_suse_arm64_fips +8.46 KiB (0.01% increase) 151.251 → 151.259 → 155.910
docker_agent_amd64 neutral 267.882 MiB → 271.240
docker_agent_arm64 +5.8 KiB (0.00% increase) 255.086 → 255.092 → 259.800
docker_agent_jmx_amd64 neutral 336.527 MiB → 339.870
docker_agent_jmx_arm64 +7.28 KiB (0.00% increase) 319.726 → 319.733 → 324.390
docker_cluster_agent_amd64 neutral 71.886 MiB → 72.920
docker_cluster_agent_arm64 +3.17 KiB (0.00% increase) 67.458 → 67.461 → 68.220
docker_cws_instrumentation_amd64 neutral 2.999 MiB → 3.330
docker_cws_instrumentation_arm64 neutral 2.729 MiB → 3.090
docker_dogstatsd_amd64 neutral 15.161 MiB → 15.820
docker_dogstatsd_arm64 neutral 14.479 MiB → 14.830
dogstatsd_deb_amd64 neutral 7.887 MiB → 8.790
dogstatsd_deb_arm64 neutral 6.773 MiB → 7.710
dogstatsd_rpm_amd64 neutral 7.897 MiB → 8.800
dogstatsd_suse_amd64 neutral 7.897 MiB → 8.800
iot_agent_deb_amd64 neutral 11.387 MiB → 12.040
iot_agent_deb_arm64 -2.92 KiB (0.03% reduction) 9.695 → 9.692 → 10.450
iot_agent_deb_armhf neutral 9.929 MiB → 10.620
iot_agent_rpm_amd64 neutral 11.404 MiB → 12.060
iot_agent_suse_amd64 neutral 11.404 MiB → 12.060

@cit-pr-commenter-54b7da

cit-pr-commenter-54b7da bot commented Mar 6, 2026

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: f3b3333d-083a-4de6-af57-7df75a4a3ec3

Baseline: a6298b3
Comparison: 1659ddd
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf experiment goal Δ mean % Δ mean % CI trials links
docker_containers_cpu % cpu utilization -4.40 [-7.39, -1.40] 1 Logs

Fine details of change detection per experiment

perf experiment goal Δ mean % Δ mean % CI trials links
quality_gate_logs % cpu utilization +1.47 [-0.16, +3.10] 1 Logs bounds checks dashboard
ddot_metrics_sum_cumulative memory utilization +1.20 [+1.06, +1.34] 1 Logs
quality_gate_metrics_logs memory utilization +0.64 [+0.39, +0.89] 1 Logs bounds checks dashboard
ddot_metrics_sum_cumulativetodelta_exporter memory utilization +0.47 [+0.24, +0.69] 1 Logs
otlp_ingest_metrics memory utilization +0.37 [+0.21, +0.53] 1 Logs
quality_gate_idle_all_features memory utilization +0.31 [+0.27, +0.35] 1 Logs bounds checks dashboard
ddot_logs memory utilization +0.23 [+0.17, +0.29] 1 Logs
uds_dogstatsd_20mb_12k_contexts_20_senders memory utilization +0.12 [+0.07, +0.18] 1 Logs
file_to_blackhole_500ms_latency egress throughput +0.03 [-0.37, +0.42] 1 Logs
tcp_dd_logs_filter_exclude ingress throughput +0.00 [-0.11, +0.11] 1 Logs
uds_dogstatsd_to_api ingress throughput +0.00 [-0.19, +0.20] 1 Logs
file_tree memory utilization -0.01 [-0.06, +0.05] 1 Logs
uds_dogstatsd_to_api_v3 ingress throughput -0.01 [-0.20, +0.19] 1 Logs
ddot_metrics memory utilization -0.02 [-0.20, +0.15] 1 Logs
ddot_metrics_sum_delta memory utilization -0.03 [-0.21, +0.14] 1 Logs
file_to_blackhole_100ms_latency egress throughput -0.03 [-0.12, +0.05] 1 Logs
file_to_blackhole_0ms_latency egress throughput -0.04 [-0.55, +0.47] 1 Logs
file_to_blackhole_1000ms_latency egress throughput -0.06 [-0.48, +0.36] 1 Logs
quality_gate_idle memory utilization -0.21 [-0.26, -0.16] 1 Logs bounds checks dashboard
docker_containers_memory memory utilization -0.36 [-0.43, -0.28] 1 Logs
tcp_syslog_to_blackhole ingress throughput -0.67 [-0.81, -0.52] 1 Logs
otlp_ingest_logs memory utilization -0.86 [-0.97, -0.75] 1 Logs
docker_containers_cpu % cpu utilization -4.40 [-7.39, -1.40] 1 Logs

Bounds Checks: ✅ Passed

perf experiment bounds_check_name replicates_passed observed_value links
docker_containers_cpu simple_check_run 10/10 719 ≥ 26
docker_containers_memory memory_usage 10/10 276.81MiB ≤ 370MiB
docker_containers_memory simple_check_run 10/10 706 ≥ 26
file_to_blackhole_0ms_latency memory_usage 10/10 0.19GiB ≤ 1.20GiB
file_to_blackhole_0ms_latency missed_bytes 10/10 0B = 0B
file_to_blackhole_1000ms_latency memory_usage 10/10 0.23GiB ≤ 1.20GiB
file_to_blackhole_1000ms_latency missed_bytes 10/10 0B = 0B
file_to_blackhole_100ms_latency memory_usage 10/10 0.20GiB ≤ 1.20GiB
file_to_blackhole_100ms_latency missed_bytes 10/10 0B = 0B
file_to_blackhole_500ms_latency memory_usage 10/10 0.21GiB ≤ 1.20GiB
file_to_blackhole_500ms_latency missed_bytes 10/10 0B = 0B
quality_gate_idle intake_connections 10/10 3 = 3 bounds checks dashboard
quality_gate_idle memory_usage 10/10 173.44MiB ≤ 175MiB bounds checks dashboard
quality_gate_idle_all_features intake_connections 10/10 2 ≤ 3 bounds checks dashboard
quality_gate_idle_all_features memory_usage 10/10 494.94MiB ≤ 550MiB bounds checks dashboard
quality_gate_logs intake_connections 10/10 3 ≤ 6 bounds checks dashboard
quality_gate_logs memory_usage 10/10 203.59MiB ≤ 220MiB bounds checks dashboard
quality_gate_logs missed_bytes 10/10 0B = 0B bounds checks dashboard
quality_gate_metrics_logs cpu_usage 10/10 367.43 ≤ 2000 bounds checks dashboard
quality_gate_metrics_logs intake_connections 10/10 3 ≤ 6 bounds checks dashboard
quality_gate_metrics_logs memory_usage 10/10 407.76MiB ≤ 475MiB bounds checks dashboard
quality_gate_metrics_logs missed_bytes 10/10 0B = 0B bounds checks dashboard

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
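The three criteria above combine with a logical AND. A minimal sketch of that decision rule follows; the `experimentResult` type and `isRegression` function are illustrative names, not the detector's implementation, and the 5.00% tolerance is taken from the report. Applied to the docker_containers_cpu row (-4.40 [-7.39, -1.40], listed as erratic), the rule does not flag it: its CI excludes zero, but |Δ mean %| is below 5%.

```go
package main

import (
	"fmt"
	"math"
)

// experimentResult holds one row of the regression detector table.
type experimentResult struct {
	name      string
	deltaMean float64 // Δ mean %
	ciLow     float64 // lower bound of the 90% CI for Δ mean %
	ciHigh    float64 // upper bound of the 90% CI for Δ mean %
	erratic   bool    // experiment configured with erratic: true
}

// effectSizeTolerance is the |Δ mean %| threshold stated in the report.
const effectSizeTolerance = 5.0

// isRegression applies all three criteria: a large enough effect,
// a confidence interval that excludes zero, and a non-erratic experiment.
func isRegression(r experimentResult) bool {
	bigEnough := math.Abs(r.deltaMean) >= effectSizeTolerance
	ciExcludesZero := r.ciLow > 0 || r.ciHigh < 0
	return bigEnough && ciExcludesZero && !r.erratic
}

func main() {
	rows := []experimentResult{
		{"docker_containers_cpu", -4.40, -7.39, -1.40, true},
		{"quality_gate_logs", 1.47, -0.16, 3.10, false},
	}
	for _, r := range rows {
		fmt.Printf("%s regression=%v\n", r.name, isRegression(r))
	}
}
```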

CI Pass/Fail Decision

Passed. All Quality Gates passed.

  • quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.

@songy23 songy23 force-pushed the yang.song/OTAGENT-824 branch from 0f93f0b to 5d620f6 Compare March 9, 2026 16:48
@dd-octo-sts dd-octo-sts bot added team/container-platform The Container Platform Team team/agent-devx labels Mar 10, 2026
@songy23 songy23 requested a review from truthbk March 10, 2026 20:35
Introduces the dogtelextension OTel Collector extension and refactors
otel-agent startup to support standalone mode (DD_OTEL_STANDALONE=true),
enabling the otel-agent to run independently without a core Datadog Agent.

Key changes:

- dogtelextension (comp/otelcol/dogtelextension): New OTel Collector
  extension providing a tagger gRPC server, host metadata submission,
  and secrets resolution for standalone mode.

- Standalone/connected FX split (cmd/otel-agent/subcommands/run):
  Refactors otel-agent startup into commonAgentFxOptions plus mode-
  specific standaloneAgentFxOptions / connectedAgentFxOptions. Standalone
  mode wires local hostname, real secrets backend, local tagger, host
  metadata runner, and disables on-init config sync. Connected mode
  keeps remote hostname, remote tagger, and core-agent config sync.

- K8s tag enrichment (comp/core/workloadmeta/collectors/catalog-otel):
  New catalog-otel workloadmeta catalog (kubelet, containerd, docker,
  ECS, crio, podman) compiled into otel-agent via the new kubelet build
  tag. In standalone mode the infraattributes processor enriches spans,
  metrics, and logs with K8s tags (kube_deployment, kube_namespace,
  pod_name, etc.) via the local tagger.

Deployments require DD_KUBERNETES_KUBELET_HOST=status.hostIP,
DD_KUBELET_TLS_VERIFY=false (or CA cert), and nodes/proxy RBAC on the
otel-agent ServiceAccount for K8s tag enrichment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@songy23 songy23 force-pushed the yang.song/OTAGENT-824 branch from 0dcb28d to 3d7b219 Compare March 10, 2026 21:12
Member

@truthbk truthbk left a comment


Super clean bootstrap! Also love how you were able to bring in the best of both worlds with fx + actual otel extension interfaces; and that resolves the extension configuration issue very cleanly. This is awesome.

We have to talk about what the otel-agent should default to, but this is a great start.

)
}

if acfg.GetBool("otel_standalone") {
Member


So, I have some doubts about this: should we instead consider a check on otel_bundled? Or !acfg.GetBool("otel_standalone")?

On one hand this is better because it's backward compatible with our operator and helm charts. On the other hand, it's not ideal because we'd have to set an env var when deploying with the otel operator/helm. We really do want to make a strong attempt to minimize the number of steps our OTel customers need to take on tooling we don't have full control over. Let's discuss this.

Member Author


Customers would have to set env vars in the otel operator/helm already, e.g. DD_OTELCOLLECTOR_ENABLED. Setting one more env var is probably fine.

Member


I think we should optimize to minimize the number of options a customer needs to set on the OpenTelemetry operator/helm chart. I feel like we can get away with a lot more of that transparently on the DD side.

I'm fine with merging this as-is, but I also think there's a chance we'll want to revisit this specifically.

songy23 and others added 2 commits March 13, 2026 15:54
…andalone mode

- Apply dogtelextension settings to DD agent pkgconfig only when
  otel_standalone=true; connected mode leaves core agent config untouched.
- Make EnableMetadataCollection a *bool (like KubeletTLSVerify) so absence
  preserves the agent default rather than forcing false.
- Document the MetadataInterval default (1800 s) in its comment.
- Gate standalone block with pkgconfig.GetBool("otel_standalone").
- Add TestDogtelExtensionConfig_ConnectedModeIgnored to assert dogtelextension
  fields are no-ops in connected mode.
- Tests use DD_OTEL_STANDALONE=true env var for standalone test cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
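The commit above changes two behaviors: dogtelextension settings reach the Agent config only when otel_standalone is true, and EnableMetadataCollection becomes a *bool so that an unset field keeps the Agent default instead of forcing false. A minimal sketch of that pattern follows. The `extensionConfig` struct, the `applyToAgentConfig` function, the map-backed config store, and the mapping of fields to the `enable_metadata_collection` / `kubelet_tls_verify` keys are hypothetical simplifications, not the PR's pkgconfig code.

```go
package main

import "fmt"

// extensionConfig is a hypothetical subset of the dogtel extension config.
// A nil pointer means "not set by the user".
type extensionConfig struct {
	EnableMetadataCollection *bool
	KubeletTLSVerify         *bool
}

// applyToAgentConfig copies explicitly set fields into agentCfg,
// and only in standalone mode.
func applyToAgentConfig(agentCfg map[string]any, ext extensionConfig, standalone bool) {
	if !standalone {
		return // connected mode: leave the core agent config untouched
	}
	if ext.EnableMetadataCollection != nil {
		agentCfg["enable_metadata_collection"] = *ext.EnableMetadataCollection
	}
	if ext.KubeletTLSVerify != nil {
		agentCfg["kubelet_tls_verify"] = *ext.KubeletTLSVerify
	}
}

func main() {
	cfg := map[string]any{"enable_metadata_collection": true, "kubelet_tls_verify": true}
	off := false
	applyToAgentConfig(cfg, extensionConfig{KubeletTLSVerify: &off}, true)
	fmt.Println(cfg["enable_metadata_collection"], cfg["kubelet_tls_verify"]) // true false
}
```

With a plain bool, the zero value false would be indistinguishable from "explicitly disabled"; the pointer keeps the two cases apart.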
@songy23 songy23 force-pushed the yang.song/OTAGENT-824 branch from 60768d2 to 4c5b322 Compare March 13, 2026 20:56
@songy23 songy23 marked this pull request as ready for review March 16, 2026 08:36
@songy23 songy23 requested a review from a team as a code owner March 16, 2026 08:36
@songy23 songy23 requested review from a team as code owners March 16, 2026 08:36
@songy23 songy23 added the ask-review Ask required teams to review this PR label Mar 16, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4c5b322a08

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +314 to +318
for name, val := range extensions {
	if !strings.HasPrefix(name, "dogtel") {
		continue
	}
	extcfg := &dogtelextensionimpl.Config{}


P2 Badge Pick dogtel extension config deterministically

This loop returns the first dogtel* entry encountered in a Go map, but map iteration order is randomized, so configs with multiple dogtel instances (for example dogtel plus dogtel/custom) can apply different overrides across runs. In standalone mode that can silently switch hostname/secrets/kubelet settings to the wrong extension instance, especially when only one instance is actually enabled in service.extensions.


Member


This should be a singleton, so we should be OK, but this is a valid concern for bad manual configurations. Maybe we should log something at the DEBUG level to state explicitly which extension instance defined in the config is being used.

Member Author


There should be at most one dogtel extension in the config; otherwise, behavior is nondeterministic. I added a check that errors out when there are multiple dogtel extensions.

Contributor

@jeremy-hanna jeremy-hanna left a comment


👍 for agent-runtime owned files

Member

@truthbk truthbk left a comment


Looks good to me. Added a couple of nits you can feel free to ignore. I do think for the actual standalone vs connected default path we may have to make some changes, but we can do that later once we take on the deployment question more specifically. At that point we'll have a better understanding of what's better.

songy23 and others added 2 commits March 20, 2026 14:26
…er stream subscribers

- getDogtelExtensionConfig now returns an error when multiple dogtel*
  extension entries are found instead of silently picking one
- stopTaggerServer replaces unbounded GracefulStop() with a 5-second
  timeout that falls back to Stop(), preventing long-lived
  TaggerStreamEntities subscribers from blocking otel-agent termination
- Add unit tests for both changes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@songy23
Member Author

songy23 commented Mar 20, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 90662a7436


songy23 and others added 3 commits March 30, 2026 13:26
…ders list

Setting metadata_interval in the dogtel extension config was replacing
metadata_providers wholesale with a single {name: host} entry, silently
dropping any other providers (e.g. "resources") configured in datadog.yaml.

Read the existing providers first, update the host entry in place (or
append it if absent), then write back the merged list. Handle both
map[string]interface{} and the map[interface{}]interface{} type that YAML
v2 produces for maps inside sequences.

Add a regression test that pre-seeds a "resources" provider in datadog.yaml
and asserts it survives alongside the updated host interval.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add TestFxRun_NoDatadogExporter_Standalone and its config fixture to
cover the case where the otel-agent runs in standalone mode with no
datadog exporter in the pipeline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
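The metadata_providers fix in the commit above amounts to an in-place merge over a list whose elements may be either `map[string]interface{}` or `map[interface{}]interface{}` (the shape gopkg.in/yaml.v2 produces for maps nested in sequences). A dependency-free sketch follows; the function name `mergeHostProvider` and the `interval` key are assumptions for illustration, and the PR's code may differ in details.

```go
package main

import "fmt"

// mergeHostProvider sets the interval of the "host" entry in providers,
// appending a new entry if none exists, and leaves other providers intact.
func mergeHostProvider(providers []interface{}, interval int) []interface{} {
	for _, p := range providers {
		switch m := p.(type) {
		case map[string]interface{}:
			if m["name"] == "host" {
				m["interval"] = interval // update in place
				return providers
			}
		case map[interface{}]interface{}: // shape produced by yaml.v2 inside sequences
			if m["name"] == "host" {
				m["interval"] = interval
				return providers
			}
		}
	}
	return append(providers, map[string]interface{}{"name": "host", "interval": interval})
}

func main() {
	providers := []interface{}{
		map[interface{}]interface{}{"name": "resources", "interval": 300},
	}
	providers = mergeHostProvider(providers, 1800)
	fmt.Println(len(providers)) // 2
}
```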

Labels

ask-review Ask required teams to review this PR changelog/no-changelog No changelog entry needed internal Identify a non-fork PR long review PR is complex, plan time to review it qa/done QA done before merge and regressions are covered by tests team/agent-configuration team/agent-devx team/agent-runtimes team/container-platform The Container Platform Team team/opentelemetry-agent


5 participants