proposal for adding a sidecar to kafka pods #185
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,194 @@ | ||
| # Proposal: Native Sidecar Container Support for Strimzi Components | ||
|
|
||
| ## Summary | ||
| Provide a brief summary of the feature you are proposing to add to Strimzi. | ||
|
|
||
| This proposal adds first‑class, **declarative sidecar container** support to Strimzi-managed workloads (starting with Kafka brokers), configured via CRDs under `spec.kafka.template.pod.sidecarContainers`. Users can attach monitoring, logging, security, and networking sidecars without webhooks or forks. The design introduces a stable CRD schema (`SidecarContainer`), a validation and conversion pipeline, and generic interfaces so other Strimzi components (KafkaConnect, KafkaBridge, MirrorMaker2, etc.) can adopt it. | ||
|
|
||
| ## Current situation | ||
| Describe the current capability Strimzi has in this area. | ||
|
|
||
| Strimzi currently lacks native sidecar support. Users rely on: | ||
| 1. Mutating webhooks/admission controllers | ||
| 2. Forking and patching Strimzi code | ||
| 3. Ephemeral containers for ad‑hoc troubleshooting | ||
|
|
||
| Limitations of these approaches include fragile upgrades, higher operational complexity, lack of declarative control, and weak integration with Strimzi lifecycle and security primitives. | ||
|
Member
Maybe you can elaborate more on this? I think that:
But for those I have at least an idea what you might mean. For the rest - |
||
|
|
||
| ## Motivation | ||
| Explain the motivation why this should be added, and what value it brings. | ||
|
|
||
| **Business value** | ||
| - Operational flexibility: let users add sidecars tailored to their own needs | ||
|
Member
I'm not sure flexibility on its own is a value. It is a problem because it makes things unsupportable and unmaintainable. So there needs to be some goal to justify the flexibility. |
||
| - Simplified ops: manage sidecars natively via Strimzi CRDs and GitOps | ||
| - Cloud‑native patterns: support log forwarders, APM agents, service‑mesh proxies | ||
|
Member
Leaving service mesh aside, these are not cloud-native patterns to me, and they have better solutions, for example a DaemonSet for collecting logs. Service mesh is an interesting use case. However, service meshes seem to be moving away from expensive sidecars. They also rely on injecting sidecars rather than having them preconfigured. And the things I mention below on proxies would apply to them. And there is no service mesh supporting Kafka as far as I know. So frankly, I do not think this proposal helps with it in any way. |
||
|
|
||
| **Representative use cases** | ||
| - Monitoring/Observability: Prometheus exporters, APM agents (Datadog, New Relic) | ||
| - Logging/Auditing: Fluent Bit/Filebeat; audit collectors | ||
| - Security/Compliance: Vault agent; cert rotation helpers | ||
| - Networking: traffic analyzers/proxies | ||
| - Data tooling: lightweight data checks or protocol helpers | ||
|
Comment on lines
+26
to
+31
Member
I'm not sure how much these use cases really justify the proposal ...
There is built-in support for Prometheus metrics. The future is likely OpenTelemetry support. These are generally recognized things supported by Strimzi as well as the majority of observability platforms. You do not need any additional Prometheus exporters and stuff like that.
The cloud-native solution is through DaemonSets and container logging. That provides the performance and scale needed. Using file logs and sidecars is not the right pattern.
This is not something a sidecar can help with. This would be something that would need much deeper integration into Strimzi.
I think these are the valid points. For things such as Kafka-aware proxies, you would need some kind of support from Strimzi. But that support cannot consist only of adding a sidecar container. You need to handle the proper routing, advertised listeners, authentication / encryption offloading. So while this is the use case where this proposal would make most sense, it is also the one where it is in my opinion completely insufficient. And moreover, adopting this proposal might prevent these use cases in the future because it fixes the API. |
||
|
|
||
| ## Proposal | ||
| Provide an introduction to the proposal. Use sub sections to call out considerations, possible delivery mechanisms etc. | ||
|
|
||
| This proposal introduces a **custom `SidecarContainer` CRD model** (stable, Fabric8‑independent) and a **component‑agnostic sidecar pipeline**: | ||
| 1) Schema & API surface (`SidecarContainer`) kept minimal and serializable | ||
| 2) Validation in CR processing; conversion to Fabric8 `Container` at runtime | ||
| 3) Component interface (`SidecarInterface`) + `SidecarUtils` for reuse | ||
| 4) PodTemplate placement for per‑pool (KRaft) configuration | ||
| 5) Automatic NetworkPolicy enrichment for declared sidecar ports | ||
| 6) Volume access model supporting Kafka data (read‑only), ConfigMaps/Secrets, and `emptyDir` | ||
|
|
||
| ### Rationale for Custom `SidecarContainer` Abstraction | ||
| We use a custom `SidecarContainer` instead of Fabric8 `io.fabric8.kubernetes.api.model.Container` in CRDs. | ||
|
|
||
| **Technical justification** | ||
| - **Problem – CRD generation incompatibility**: Fabric8's `Container` exposes `IntOrString` fields (e.g., the `port` of `httpGet` and `tcpSocket` probe handlers). Strimzi's CRD generator cannot reliably serialize `IntOrString` to OpenAPI v3, breaking validation and schema generation. | ||
| - **Solution – Custom abstraction**: `SidecarContainer` mirrors essential fields with simple types (String, List, standard objects). Probes use a Strimzi type (`SidecarProbe`) instead of Fabric8 `Probe`. | ||
|
|
||
| **Supported configuration (concise)** | ||
| - Core: `name`, `image`, `imagePullPolicy`, `command`, `args` | ||
| - Env & storage: `env` (ContainerEnvVar), `volumeMounts` | ||
| - Networking: `ports` (ContainerPort) | ||
| - Resources: `resources` (ResourceRequirements) **mandatory** | ||
| - Security: `securityContext` (SecurityContext) | ||
| - Health: `livenessProbe`, `readinessProbe` (custom `SidecarProbe`) | ||
|
|
||
| **Benefits** | ||
| - **CRD compatibility**: Ensures generated CRDs are valid and parseable by Kubernetes | ||
| - **Simplified API surface**: Exposes only the configuration options relevant to Strimzi sidecars | ||
| - **Maintainability**: Decouples Strimzi's API from potential breaking changes in Fabric8's container model | ||
| - **Type safety**: Avoids runtime serialization errors caused by ambiguous field types | ||
|
|
||
| ### API design (small snippet) | ||
| ```java | ||
| @Description("Sidecar container alongside main component") | ||
| public class SidecarContainer { | ||
| private String name; | ||
| private String image; | ||
| private List<ContainerPort> ports; | ||
| private List<ContainerEnvVar> env; | ||
| private ResourceRequirements resources; // mandatory | ||
| private List<VolumeMount> volumeMounts; | ||
| private SecurityContext securityContext; | ||
| } | ||
| ``` | ||
|
|
||
| **CRD location in PodTemplate (Kafka example)** | ||
| ```yaml | ||
| spec: | ||
| kafka: | ||
| template: | ||
| pod: | ||
| sidecarContainers: | ||
|
Member
We have the container template outside of the Pod template. So likely the sidecars would also end outside of the Pod template and not inside it. |
||
| - name: fluent-bit | ||
| image: fluent/fluent-bit:2.2.0 | ||
| resources: | ||
| limits: { cpu: "200m", memory: "256Mi" } | ||
| ``` | ||
|
|
||
| ### Validation & conversion pipeline | ||
| **Hierarchy**: (1) CRD schema → (2) Strimzi API validation → (3) component rules → (4) pod creation. | ||
| **Key rules**: unique names; no conflict with Strimzi-managed names; resource limits required; ports must not collide with component‑reserved ports; volume mounts must exist and avoid conflicts. | ||
|
|
||
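The key rules listed above could be checked along the following lines. This is only a minimal sketch under stated assumptions: the `Sidecar` record is a hypothetical, simplified stand-in for the proposed `SidecarContainer`, and the reserved names and ports passed in `main` are illustrative values, not the operator's actual lists. Volume mount checks are left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SidecarValidationSketch {
    /** Hypothetical, simplified stand-in for the proposed SidecarContainer type. */
    public record Sidecar(String name, List<Integer> ports, boolean hasResourceLimits) { }

    /** Returns human-readable violations; an empty list means all sidecars passed. */
    public static List<String> validate(List<Sidecar> sidecars,
                                        Set<String> reservedNames,
                                        Set<Integer> reservedPorts) {
        List<String> errors = new ArrayList<>();
        Set<String> seenNames = new HashSet<>();
        for (Sidecar s : sidecars) {
            // Rule: sidecar names must be unique within the pod.
            if (!seenNames.add(s.name())) {
                errors.add("duplicate sidecar name: " + s.name());
            }
            // Rule: no conflict with names managed by the operator.
            if (reservedNames.contains(s.name())) {
                errors.add("name reserved by the operator: " + s.name());
            }
            // Rule: resource limits are required.
            if (!s.hasResourceLimits()) {
                errors.add("missing resource limits: " + s.name());
            }
            // Rule: ports must not collide with component-reserved ports.
            for (int port : s.ports()) {
                if (reservedPorts.contains(port)) {
                    errors.add("port " + port + " of " + s.name() + " collides with a reserved port");
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Illustrative inputs only; the real reserved sets would come from the component.
        List<String> errors = validate(
            List.of(new Sidecar("fluent-bit", List.of(2020), true),
                    new Sidecar("kafka", List.of(9092), false)),
            Set.of("kafka"),
            Set.of(9091, 9092));
        errors.forEach(System.out::println);
    }
}
```

Collecting all violations rather than failing on the first one is a design choice of this sketch; it lets a user fix a custom resource in one pass.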
| **Interface & utils (signatures only)** | ||
| ```java | ||
| public interface SidecarInterface { | ||
| void validateSidecarContainers(...); | ||
| List<Container> createSidecarContainers(PodTemplate pod); | ||
| } | ||
|
|
||
| // Conversion | ||
| List<Container> convertSidecarContainers(PodTemplate pod); | ||
| ``` | ||
|
|
||
| ### Network policies & volumes | ||
| - **NetworkPolicy**: Extract sidecar ports from all pools; add ingress rules for those ports by default. Users can further restrict via standard policies. | ||
|
Member
You cannot restrict the Network Policy rules once they are opened by the operator. So this does not sound like a good approach. You would either need to make it fully configurable or leave it up to the users. |
||
| - **Volumes**: Support read‑only Kafka data mounts for log access, user‑defined `volumes` in `pod.template`, Secrets/ConfigMaps, and `emptyDir`. | ||
|
Member
The Kafka data mounts should not be used for any logs. And ensuring that the mounts are read-only might not be simple. |
||
|
|
||
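One possible reading of the "extract sidecar ports from all pools" step is a simple de-duplicating collection across node pools. The sketch below assumes a hypothetical map from pool name to the sidecar ports declared in that pool; neither the method name nor the input shape comes from the proposal.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class SidecarPortsSketch {
    /**
     * Collects the distinct sidecar ports declared across all pools.
     * The input shape (pool name -> declared ports) is an assumption of this sketch.
     */
    public static SortedSet<Integer> collectPorts(Map<String, List<Integer>> sidecarPortsByPool) {
        SortedSet<Integer> ports = new TreeSet<>();
        sidecarPortsByPool.values().forEach(ports::addAll);
        return ports;
    }

    public static void main(String[] args) {
        // Pool names and port numbers are illustrative only.
        SortedSet<Integer> ports = collectPorts(Map.of(
            "brokers", List.of(2020, 9404),
            "controllers", List.of(2020)));
        System.out.println(ports);
    }
}
```

Whether the operator should then open these ports automatically is a separate question, and it is the point the review comment on the NetworkPolicy bullet disputes.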
| ### SidecarProbe (overview) | ||
| A Strimzi probe type used by sidecars to avoid `IntOrString` in CRDs and keep schemas simple. | ||
|
Member
Why not reuse the Fabric8 classes? |
||
|
|
||
| **Compact class sketch** | ||
| ```java | ||
| @JsonInclude(NON_NULL) | ||
| @JsonPropertyOrder({ "execCommand", "httpGetPath", "httpGetPort", "httpGetScheme", | ||
| "tcpSocketPort", "initialDelaySeconds", "timeoutSeconds", "periodSeconds", | ||
| "successThreshold", "failureThreshold" }) | ||
| public class SidecarProbe implements UnknownPropertyPreserving { | ||
| private List<String> execCommand; | ||
| private String httpGetPath, httpGetPort, httpGetScheme, tcpSocketPort; | ||
| private int initialDelaySeconds = 15, timeoutSeconds = 5, periodSeconds = 10; | ||
| private int successThreshold = 1, failureThreshold = 3; | ||
| private Map<String,Object> additionalProperties; | ||
| } | ||
| ``` | ||
| **Highlights** | ||
| - Supports `exec`, `httpGet`, `tcpSocket` styles via simple String fields (no `IntOrString`). | ||
| - Sensible defaults: `initialDelaySeconds=15`, `timeoutSeconds=5`, `periodSeconds=10`. | ||
| - Min validations (e.g., timeout/period/success/failure thresholds ≥ 1). | ||
| - Preserves unknown properties for forward compatibility. | ||
|
|
||
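Because `httpGetPort` and `tcpSocketPort` are plain strings in the sketch above, something has to decide at conversion time whether a value is a port number or a named port. The following is a hedged illustration of that step, not the proposal's implementation: `ResolvedPort` and `resolve` are hypothetical names, and the 1 to 65535 range check is the standard TCP port range rather than a rule stated in the proposal.

```java
public class ProbePortSketch {
    /** Hypothetical result type: exactly one of number or name is non-null. */
    public record ResolvedPort(Integer number, String name) { }

    /** Interprets a string port as a numeric port when it is all digits, else as a named port. */
    public static ResolvedPort resolve(String raw) {
        if (raw == null || raw.isBlank()) {
            throw new IllegalArgumentException("probe port must not be empty");
        }
        String value = raw.trim();
        if (value.chars().allMatch(Character::isDigit)) {
            int number;
            try {
                number = Integer.parseInt(value);
            } catch (NumberFormatException e) {
                // Digit strings too long for an int end up here.
                throw new IllegalArgumentException("probe port out of range: " + value, e);
            }
            if (number < 1 || number > 65535) {
                throw new IllegalArgumentException("probe port out of range: " + value);
            }
            return new ResolvedPort(number, null);
        }
        // Anything else is treated as a reference to a named container port.
        return new ResolvedPort(null, value);
    }

    public static void main(String[] args) {
        System.out.println(resolve("8080"));
        System.out.println(resolve("http"));
    }
}
```

A resolver of this kind is what would let the CRD schema stay string-only while the generated pod still carries either a numeric or a named port.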
| ### Minimal examples | ||
| **Single sidecar (log forwarder)** | ||
| ```yaml | ||
| spec: | ||
| kafka: | ||
| template: | ||
| pod: | ||
| sidecarContainers: | ||
| - name: fluent-bit | ||
| image: fluent/fluent-bit:2.2.0 | ||
| resources: | ||
| limits: { cpu: "200m", memory: "256Mi" } | ||
| ``` | ||
| **Per‑pool configuration (KRaft)** | ||
| ```yaml | ||
| kind: KafkaNodePool | ||
| spec: | ||
| roles: [broker] | ||
| template: | ||
| pod: | ||
| sidecarContainers: | ||
| - name: log-forwarder | ||
| image: fluent/fluent-bit:2.2.0 | ||
| resources: | ||
| limits: { cpu: "200m", memory: "256Mi" } | ||
| ``` | ||
|
|
||
| ## Affected/not affected projects | ||
| Call out the projects in the Strimzi organisation that are/are not affected by this proposal. | ||
|
|
||
| **Affected (Phase 1)** | ||
| - `strimzi-kafka-operator` (api: new `SidecarContainer`; operator: `SidecarInterface`, `SidecarUtils`, KafkaCluster integration; crd‑generator updates) | ||
|
|
||
| **Future (Phase 2+)** | ||
| - KafkaConnect, KafkaBridge, MirrorMaker2, EntityOperator (implement interface; add reserved ports/names) | ||
|
Comment on lines
+164
to
+168
Member
What are these phases? You never talked about them before. |
||
|
|
||
| **Not affected** | ||
| - strimzi-kafka-oauth, test-container, drain-cleaner, access-operator (no changes) | ||
|
|
||
| ## Compatibility | ||
| Call out any future or backwards compatibility considerations this proposal has accounted for. | ||
|
|
||
| - **Backward compatible**: Field is optional; no change for existing CRs. | ||
| - **Forward compatible**: Stable custom model avoids Fabric8 `IntOrString` issues in schema generation. | ||
| - **Rolling behavior**: Add/modify/remove sidecars via standard rolling updates. | ||
| - **Versioning**: Introduce in Strimzi ≥ 0.50.0; other components can adopt incrementally. | ||
|
|
||
| ## Rejected alternatives | ||
| Call out options that were considered while creating this proposal, but then later rejected, along with reasons why. | ||
|
|
||
| 1) **Expose Fabric8 `Container` directly** — rejected due to OpenAPI/CRD stability and `IntOrString` schema issues. | ||
| 2) **Separate `KafkaSidecar` CRD** — rejected for not having a generic placeholder for future expandability. | ||
| 3) **Operator‑level auto‑injection** — rejected to keep explicit user control and avoid risky implicit coupling. | ||
| 4) **InitContainer/DaemonSet patterns** — do not satisfy long‑running, pod‑local sidecar use cases. | ||
| 5) **Helm‑only customization** — not portable; breaks on operator‑managed rollouts. | ||
|
Comment on lines
+184
to
+188
Member
I do not follow what these mean. You should probably elaborate on them. |
||
|
|
||
| ## References | ||
| References (informative) | ||
| - Strimzi Pod Templates (docs) | ||
| - Kubernetes multi‑container pods (docs) | ||
| - Implementation PR: strimzi/strimzi‑kafka‑operator#12121 | ||
What does "a validation and conversion pipeline" mean? We do not have anything like this.