Add CSI registry allow list check to admission webhook #48011
Conversation
When using CSI-based library injection, the admission webhook now checks the registry allow list before adding CSI volumes to pods. If a library's registry is not in the allow list, the webhook skips adding the CSI volume entirely. This provides defense in depth alongside the CSI driver's own registry check — the webhook gives clean UX (no unnecessary volumes), while the CSI driver acts as a security backstop. The config key admission_controller.auto_instrumentation.csi_registry_allow_list is set via the Helm chart from the same value used for the CSI driver's DD_REGISTRY_ALLOW_LIST.
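The core of the webhook-side check can be sketched as a simple registry match against the configured allow list. This is an illustrative stand-in (the function name and shape are hypothetical, not the webhook's actual API); the real logic lives in the admission webhook's mutator:

```go
package main

import (
	"fmt"
	"strings"
)

// registryAllowed reports whether an image's registry appears in the
// allow list. An empty allow list permits every registry, which is the
// backward-compatible default described above.
func registryAllowed(image string, allowList []string) bool {
	if len(allowList) == 0 {
		return true
	}
	// The registry is the component before the first "/" in a fully
	// qualified image reference, e.g. "gcr.io/datadoghq/dd-lib-python-init:v2".
	registry := image
	if i := strings.Index(image, "/"); i >= 0 {
		registry = image[:i]
	}
	for _, allowed := range allowList {
		if registry == allowed {
			return true
		}
	}
	return false
}

func main() {
	allow := []string{"gcr.io", "registry.datadoghq.com"}
	fmt.Println(registryAllowed("gcr.io/datadoghq/dd-lib-python-init:v2", allow)) // true
	fmt.Println(registryAllowed("fake.registry.invalid/lib:v1", allow))          // false
	fmt.Println(registryAllowed("fake.registry.invalid/lib:v1", nil))            // true (empty list)
}
```

When the check fails, the webhook simply skips adding the CSI volume rather than rejecting the pod, which is why it pairs well with the CSI driver's own backstop check.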
Files inventory check summary
File checks results against ancestor 1eed9e23: Results for datadog-agent_7.79.0~devel.git.285.683994c.pipeline.105039631-1_amd64.deb: No change detected
Static quality checks
✅ Please find below the results from static quality gates
Successful checks
On-wire sizes (compressed)
Force-pushed from 6dfd81c to 321c129
Use ParseEnvAsStringSlice (the established pattern in the codebase) to register an env var transformer that splits comma-separated values. This replaces the ad-hoc splitStringSlice wrapper and is consistent with how other string slice configs handle the Viper limitation (e.g. apm_config.features, apm_config.ignore_resources).
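The transformation that `ParseEnvAsStringSlice` registers can be sketched generically as a comma splitter with whitespace trimming. This is illustrative only; the agent's actual helper lives in its config package and its exact signature may differ:

```go
package main

import (
	"fmt"
	"strings"
)

// splitCSV mimics the transformation applied to comma-separated env vars
// such as DD_APM_FEATURES: split on commas, trim whitespace, and drop
// empty entries, working around Viper's lack of native slice env vars.
func splitCSV(raw string) []string {
	var out []string
	for _, part := range strings.Split(raw, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	fmt.Println(splitCSV("gcr.io, registry.datadoghq.com,,docker.io "))
	// [gcr.io registry.datadoghq.com docker.io]
}
```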
Force-pushed from 321c129 to f46dad4
Regression Detector Results
Metrics dashboard · Baseline: d3d4c0e
Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +4.31 | [+1.25, +7.37] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +4.31 | [+1.25, +7.37] | 1 | Logs |
| ➖ | quality_gate_logs | % cpu utilization | +0.71 | [-0.89, +2.30] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_idle | memory utilization | +0.56 | [+0.51, +0.61] | 1 | Logs bounds checks dashboard |
| ➖ | file_tree | memory utilization | +0.35 | [+0.29, +0.40] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | +0.22 | [+0.07, +0.36] | 1 | Logs |
| ➖ | otlp_ingest_metrics | memory utilization | +0.08 | [-0.08, +0.24] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | +0.06 | [-0.46, +0.58] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | +0.03 | [-0.40, +0.46] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | +0.01 | [-0.20, +0.21] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | +0.00 | [-0.20, +0.20] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.00 | [-0.11, +0.11] | 1 | Logs |
| ➖ | quality_gate_idle_all_features | memory utilization | -0.02 | [-0.06, +0.02] | 1 | Logs bounds checks dashboard |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | -0.06 | [-0.46, +0.35] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | -0.09 | [-0.17, -0.01] | 1 | Logs |
| ➖ | docker_containers_memory | memory utilization | -0.18 | [-0.25, -0.11] | 1 | Logs |
| ➖ | ddot_metrics_sum_delta | memory utilization | -0.29 | [-0.45, -0.12] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | -0.30 | [-0.53, -0.08] | 1 | Logs |
| ➖ | otlp_ingest_logs | memory utilization | -0.40 | [-0.51, -0.29] | 1 | Logs |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | -0.51 | [-0.66, -0.35] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | -0.57 | [-0.63, -0.50] | 1 | Logs |
| ➖ | ddot_metrics | memory utilization | -0.60 | [-0.77, -0.42] | 1 | Logs |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | -0.63 | [-0.69, -0.57] | 1 | Logs |
| ➖ | quality_gate_metrics_logs | memory utilization | -0.77 | [-1.01, -0.53] | 1 | Logs bounds checks dashboard |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | 674 ≥ 26 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | 272.38MiB ≤ 370MiB | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | 693 ≥ 26 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | 0.19GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_0ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | 0.23GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_1000ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | 0.20GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_100ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | 0.21GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_500ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | 3 = 3 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | 174.81MiB ≤ 175MiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | 3 = 3 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | 494.24MiB ≤ 550MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | 205.27MiB ≤ 220MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | 364.58 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | 3 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | 417.75MiB ≤ 475MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we flag a change in performance as a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that, if our statistical model is accurate, there is at least a 90.00% chance of a difference in performance between the baseline and comparison variants.
- Its configuration does not mark it "erratic".
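The three criteria above combine into a single decision rule, sketched here for illustration (the actual Regression Detector implementation is not part of this PR):

```go
package main

import "fmt"

// isRegression applies the criteria listed above: effect size of at
// least 5%, a 90% confidence interval that excludes zero, and the
// experiment not being marked erratic.
func isRegression(deltaMeanPct, ciLow, ciHigh float64, erratic bool) bool {
	const tolerance = 5.0
	bigEnough := deltaMeanPct >= tolerance || deltaMeanPct <= -tolerance
	ciExcludesZero := ciLow > 0 || ciHigh < 0
	return bigEnough && ciExcludesZero && !erratic
}

func main() {
	// docker_containers_cpu from the table above: +4.31 [+1.25, +7.37].
	// The CI excludes zero, but the effect is below the 5% tolerance,
	// so it is not flagged.
	fmt.Println(isRegression(4.31, 1.25, 7.37, false)) // false
}
```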
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
The allow-list check now lives in the admission webhook (mutator.go) and covers all injection modes (init-container, image-volume, CSI), not just CSI. Rename the config key accordingly.
Test three cases:
- Empty allow list permits injection from any registry (backward compat)
- Registry in allow list permits injection
- Registry not in allow list blocks injection and sets error annotation
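The three cases above map naturally onto a table-driven check. A minimal sketch, using a hypothetical `allowInjection` stand-in for the webhook's decision (the real unit test exercises the webhook's actual mutator):

```go
package main

import (
	"fmt"
	"strings"
)

// allowInjection is a hypothetical stand-in for the webhook's
// allow-list decision, used only to illustrate the three test cases.
func allowInjection(registry string, allowList []string) bool {
	if len(allowList) == 0 {
		return true // backward compat: empty list permits everything
	}
	for _, a := range allowList {
		if strings.EqualFold(registry, a) {
			return true
		}
	}
	return false
}

func main() {
	cases := []struct {
		name      string
		registry  string
		allowList []string
		want      bool
	}{
		{"empty allow list permits any registry", "fake.registry.invalid", nil, true},
		{"registry in allow list permits injection", "gcr.io", []string{"gcr.io"}, true},
		{"registry not in allow list blocks injection", "fake.registry.invalid", []string{"gcr.io"}, false},
	}
	for _, c := range cases {
		got := allowInjection(c.registry, c.allowList)
		fmt.Printf("%s: got=%v want=%v\n", c.name, got, c.want)
	}
}
```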
…-allow-list-webhook
Replace the global.containerRegistryAllowList approach with setting DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_CONTAINER_REGISTRY_ALLOW_LIST directly via clusterAgent.env, so the test works with the current Helm chart version without needing the helm-charts PR to merge first. Two test cases:
- TestRegistryAllowListBlocked: allow list = fake.registry.invalid; injection is blocked and the error annotation is set
- TestRegistryAllowListAllowed: allow list = registry.datadoghq.com (the injector's registry); injection proceeds and traces arrive
test/new-e2e/tests/ssi/ssi_test.go
func (v *ssiSuite) TestRegistryAllowListBlocked() {
It would be great to merge both scenarios into a single test with two sub-tests, since setting up the stack takes about 5 minutes per scenario.
Or maybe InjectionAllowedByAllowList could be tested as part of another existing scenario. I know it's not perfect, but I think performance matters, and we should try to limit the number of scenarios.
Ahh, yeah, for sure. I didn't realize an UpdateEnv caused a full stack setup.
These tests should be mergeable, let me do that.
Deploy two apps in the same cluster with allow list = registry.datadoghq.com:
- registry-allow-list-allowed: uses the default injector; injection proceeds
- registry-allow-list-blocked: a pod annotation overrides the injector image to fake.registry.invalid (not in the allow list); injection is blocked
This avoids a second UpdateEnv call (and second cluster setup) by using the admission.datadoghq.com/apm-inject.custom-image annotation to point one pod at an injector registry that is not in the allow list.
The previous check only validated the injector image registry. A user could bypass the allow list by annotating a pod with admission.datadoghq.com/python-lib.custom-image pointing to an arbitrary registry. Now InjectAPMLibraries checks every library's registry against the allow list in addition to the injector's registry. Adds a unit test for the library registry case and a third e2e scenario (LibraryRegistryBlockedByAllowList) that uses a python-lib.custom-image annotation pointing to fake.registry.invalid to verify the check fires.
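The fix described above amounts to validating the injector image and every library image through the same check, so a custom-image annotation on any one library can no longer slip past. A minimal sketch with hypothetical helper names (the real logic is in InjectAPMLibraries):

```go
package main

import (
	"fmt"
	"strings"
)

// imageRegistry extracts the registry component (everything before the
// first "/") from a fully qualified image reference.
func imageRegistry(image string) string {
	if i := strings.Index(image, "/"); i >= 0 {
		return image[:i]
	}
	return image
}

// checkAllRegistries validates the injector image and every library
// image against the allow list. An empty allow list permits everything.
func checkAllRegistries(injectorImage string, libImages []string, allow map[string]bool) error {
	for _, img := range append([]string{injectorImage}, libImages...) {
		if len(allow) > 0 && !allow[imageRegistry(img)] {
			return fmt.Errorf("registry %q for image %q is not in the allow list", imageRegistry(img), img)
		}
	}
	return nil
}

func main() {
	allow := map[string]bool{"registry.datadoghq.com": true}
	err := checkAllRegistries(
		"registry.datadoghq.com/apm-inject:latest",
		[]string{"fake.registry.invalid/dd-lib-python-init:v2"},
		allow,
	)
	fmt.Println(err) // non-nil: the python library's registry is not allowed
}
```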
What Does This Do
Adds a registry allow list for CSI-based APM library injection. When the CSI injection mode is used, the admission controller webhook now checks the configured registry allow list before adding CSI volumes to pods. If a library's registry is not permitted, the webhook skips the CSI volume entirely — the pod starts cleanly without blocked volumes.
This complements the CSI driver's own registry check (see DataDog/datadog-csi-driver#36). The webhook provides clean UX (no unnecessary volumes added), while the CSI driver acts as a security backstop (returns success on rejection so pods aren't blocked).
How to Test
The Helm chart (DataDog/helm-charts, branch knusbaum/csi-driver-registry-allow-list) passes datadog-csi-driver.apm.registryAllowList to the cluster-agent as DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_CSI_REGISTRY_ALLOW_LIST.
I have several injector-dev scenarios, yet to be published.
Related PRs
Jira