fix: resolve AgentRuntime by Deployment name and inject trace env vars into agent containers#245
Conversation
5a3408e to
b53162c
Compare
Two fixes for the AuthBridge webhook injection: 1. ReadAgentRuntimeOverrides now strips the pod-template-hash suffix from the ReplicaSet name to match the Deployment name in targetRef. Without this, the webhook couldn't find the AgentRuntime CR because it received "myapp-7d4f8b9c5" but targetRef.name is "myapp". 2. Inject OTEL trace env vars (OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_PROTOCOL, OTEL_TRACES_SAMPLER_ARG) from AgentRuntime spec.trace into the user's agent containers, not the sidecar. These configure the agent's own telemetry (e.g., MLflow, LangChain). Existing env vars are not overwritten — developer values take precedence. Sidecar containers are skipped. Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
b53162c to
7596b7d
Compare
cwiklik
left a comment
There was a problem hiding this comment.
Review Summary
Clean, well-scoped PR fixing two related issues: ReplicaSet-to-Deployment name resolution and OTEL trace env var injection into user containers. The code is correct, no-overwrite semantics are sound, and test coverage is thorough. The exact-match-first approach for name resolution handles edge cases well.
Areas reviewed: Go, Tests
Commits: 1 commit, signed-off: yes, proper Assisted-By trailer
CI status: DCO passing
Excellent PR description with clear design decisions and a completed test plan.
| // Derive the Deployment name by stripping the pod-template-hash suffix. | ||
| // ReplicaSet names follow the pattern "<deployment>-<pod-template-hash>". | ||
| deploymentName := workloadName | ||
| if idx := strings.LastIndex(workloadName, "-"); idx > 0 { |
There was a problem hiding this comment.
suggestion: The LastIndex stripping is correct for the common case and the exact-match-first approach handles ambiguity well. One edge case worth noting: a Deployment literally named with a hash-like suffix (e.g. my-agent-7d4f8b9c5) would be ambiguous, but exact match takes priority so it still works. Might be worth a brief comment noting this precedence for future readers.
| // Inject OTEL trace env vars into the user's (non-sidecar) containers. | ||
| // These come from the AgentRuntime spec.trace overrides and configure | ||
| // the agent's own telemetry (e.g., MLflow, LangChain, OpenTelemetry SDK). | ||
| if resolved != nil { |
There was a problem hiding this comment.
suggestion: Trace env vars are only injected when PerWorkloadConfigResolution is enabled (since resolved is nil on the legacy path). This seems intentional — a brief comment noting "trace injection requires the resolved config path" would clarify this for future maintainers.
| // injectTraceEnvVars adds OTEL trace env vars to all user (non-sidecar) | ||
| // containers. Existing env vars are not overwritten — if the developer already | ||
| // set OTEL_EXPORTER_OTLP_ENDPOINT, the webhook respects their value. | ||
| func injectTraceEnvVars(podSpec *corev1.PodSpec, resolved *ResolvedConfig) { |
There was a problem hiding this comment.
nit: The log "vars", len(traceEnv) always logs the total count of candidate trace vars, not the number actually injected (some may be skipped due to existing vars). Consider logging the injected count, or rename to "candidateVars".
| EnvoyProxyContainerName: true, | ||
| ProxyInitContainerName: true, | ||
| SpiffeHelperContainerName: true, | ||
| ClientRegistrationContainerName: true, |
There was a problem hiding this comment.
👍 Good approach — centralizing sidecar names in a map with a MAINTENANCE comment is clean and maintainable. The no-overwrite semantics are well-implemented.
| Type: agentv1alpha1.RuntimeTypeAgent, | ||
| TargetRef: agentv1alpha1.TargetRef{ | ||
| APIVersion: "apps/v1", | ||
| Kind: "Deployment", |
There was a problem hiding this comment.
👍 Thorough test coverage for both single-hyphen and multi-hyphen deployment names. Good test structure.
Summary
Two fixes for AuthBridge webhook injection:
ReplicaSet name resolution:
ReadAgentRuntimeOverridesnow strips the pod-template-hash suffix from the ReplicaSet name to match the Deployment name intargetRef. Without this, the webhook couldn't find the AgentRuntime CR because it receivedmyapp-7d4f8b9c5buttargetRef.nameismyapp.Trace env var injection into agent containers: Injects
OTEL_EXPORTER_OTLP_ENDPOINT,OTEL_EXPORTER_OTLP_PROTOCOL, andOTEL_TRACES_SAMPLER_ARGfrom AgentRuntimespec.traceinto the user's agent containers (not sidecars). This enables agent-level telemetry (e.g., MLflow, LangChain, OpenTelemetry SDK). Developer-set env vars are never overwritten.Design decisions
sidecarContainerNamesmap.OTEL_EXPORTER_OTLP_ENDPOINTon their container, the webhook respects their value.Changes
injector/agentruntime_config.goinjector/agentruntime_config_test.goinjector/pod_mutator.goinjectTraceEnvVars()into user containers; hoistresolvedvariableinjector/pod_mutator_test.goinjector/container_builder.goTest plan
agentcontainer, not onenvoy-proxy🤖 Generated with Claude Code