316 changes: 316 additions & 0 deletions 133-multi-namespace-topic-operator.md
# Multi-namespace watching for the Topic Operator

Issue [#1206](https://github.com/strimzi/strimzi-kafka-operator/issues/1206)

## Current situation

The Unidirectional Topic Operator (UTO, [proposal 051](https://github.com/strimzi/proposals/blob/main/051-unidirectional-topic-operator.md)) watches `KafkaTopic` resources in a single namespace — typically the namespace where the `Kafka` resource is deployed. The UTO reconciles topic state unidirectionally from Kubernetes to Kafka, uses `creationTimestamp`-based conflict resolution when multiple `KafkaTopic` CRs reference the same Kafka topic, and employs finalizer-based deletion.

The UTO is deployed as a container within the Entity Operator pod, which the Cluster Operator manages. The `STRIMZI_NAMESPACE` environment variable controls which namespace the UTO watches.

## Motivation

In Kubernetes clusters, it is common practice to isolate applications in separate namespaces. When a Kafka cluster is shared among multiple applications or teams, each application typically lives in its own namespace. Currently, all `KafkaTopic` resources must be created in the Kafka cluster's namespace, which creates several problems:

- **Access control**: Teams that need to create topics must have RBAC permissions in the Kafka namespace, which may also grant access to other infrastructure resources they should not touch.
- **Resource management**: Kubernetes namespace-level quotas and RBAC cannot be used to separate topic ownership per team.
- **GitOps workflows**: Application teams want to declare their Kafka topics alongside their application manifests in their own namespace, not in a shared infrastructure namespace.
- **Namespace hygiene**: Hundreds of `KafkaTopic` resources from different applications co-mingle in a single namespace, making management and auditing harder.

Allowing the Topic Operator to watch `KafkaTopic` resources across multiple namespaces solves these problems while maintaining the existing UTO semantics.

## Proposal

This proposal extends the UTO to watch `KafkaTopic` resources in multiple namespaces. It builds on the UTO's existing mechanisms for conflict resolution, deletion, and status reporting.

### Configuration

A new field `watchedNamespaces` is added to the Topic Operator configuration in the `Kafka` custom resource:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  namespace: kafka
spec:
  entityOperator:
    topicOperator:
      watchedNamespaces:
        - team-a
        - team-b
        - team-c
```

The `watchedNamespaces` field accepts:
- A list of specific namespace names (e.g., `["team-a", "team-b"]`)
- A wildcard `["*"]` to watch all namespaces

The Topic Operator's own namespace (where the `Kafka` resource lives) is **always** watched implicitly. It does not need to be listed in `watchedNamespaces`. When `watchedNamespaces` is not set or is empty, the behavior is unchanged — the TO watches only its own namespace.

For the standalone Topic Operator deployment, the existing `STRIMZI_NAMESPACE` environment variable is extended to accept a comma-separated list of namespaces or `*`, consistent with how the Cluster Operator already supports multi-namespace watching.
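As a rough illustration of the comma-separated format, the following Python sketch parses a `STRIMZI_NAMESPACE`-style value. The function name `parse_watched_namespaces` is hypothetical, and treating a wildcard anywhere in the list as "all namespaces" is an assumption that the proposal does not spell out.

```python
from typing import Set


def parse_watched_namespaces(raw: str) -> Set[str]:
    """Parse a STRIMZI_NAMESPACE-style value into a set of namespace names.

    Returns {"*"} when the wildcard is present, meaning "all namespaces".
    """
    # Split on commas, trim whitespace, and drop empty entries.
    names = {part.strip() for part in raw.split(",") if part.strip()}
    if "*" in names:
        # Assumption: a wildcard anywhere in the list overrides explicit names.
        return {"*"}
    return names
```

A single namespace such as `kafka` parses to a one-element set, so the existing single-namespace configuration remains a valid input.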

**Deprecation of `watchedNamespace` (singular):** The existing `spec.entityOperator.topicOperator.watchedNamespace` field is deprecated in favor of `watchedNamespaces` (plural). When `watchedNamespaces` is set, `watchedNamespace` is ignored and a deprecation warning is logged. Users should migrate by converting `watchedNamespace: team-a` to `watchedNamespaces: [team-a]`.
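The precedence between the two fields could be sketched as follows. This is only an illustration: `effective_namespaces` and the logger name are hypothetical, and the branch that keeps the singular field working on its own (watching only that namespace) is an assumption about retained existing behavior, not something this proposal states.

```python
import logging
from typing import List, Optional, Set

log = logging.getLogger("topic-operator-config")  # hypothetical logger name


def effective_namespaces(
    own_namespace: str,
    watched_namespaces: Optional[List[str]] = None,
    watched_namespace: Optional[str] = None,
) -> Set[str]:
    """Resolve which namespaces the Topic Operator should watch."""
    if watched_namespaces:
        if watched_namespace is not None:
            # The plural field wins; the singular one is ignored with a warning.
            log.warning("watchedNamespace is deprecated and ignored; "
                        "use watchedNamespaces instead")
        if "*" in watched_namespaces:
            return {"*"}
        # The operator's own namespace is always watched implicitly.
        return {own_namespace, *watched_namespaces}
    if watched_namespace:
        # Assumption: the deprecated field alone keeps its existing meaning.
        return {watched_namespace}
    # Neither field set (or plural empty): unchanged single-namespace behavior.
    return {own_namespace}
```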

### RBAC

Watching additional namespaces requires the Topic Operator to have Kubernetes RBAC permissions in those namespaces. The Cluster Operator is responsible for creating and managing these RBAC resources.

**For explicit namespace list:**

The Cluster Operator creates a `Role` and `RoleBinding` in each watched namespace, granting the Entity Operator's `ServiceAccount` permissions to get, list, watch, and modify `KafkaTopic` resources and their status. The example below lists the exact verbs.

The Role and RoleBinding names include the Kafka cluster's namespace and name to ensure uniqueness: `strimzi-topic-operator-<kafka-namespace>-<kafka-name>`. This prevents two Kafka clusters watching the same tenant namespace from overwriting each other's RBAC resources.

> **Member:** The naming here, as well as below, does not work because it is not unique. You need to create a unique naming mechanism (likely cluster name, namespace + some suffix).
>
> **Author:** Thanks, I updated the proposal accordingly to ensure uniqueness.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: strimzi-topic-operator-kafka-my-cluster
  namespace: team-a
  labels:
    strimzi.io/cluster: my-cluster
rules:
  - apiGroups: ["kafka.strimzi.io"]
    resources: ["kafkatopics", "kafkatopics/status"]
    verbs: ["get", "list", "watch", "create", "patch", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: strimzi-topic-operator-kafka-my-cluster
  namespace: team-a
  labels:
    strimzi.io/cluster: my-cluster
subjects:
  - kind: ServiceAccount
    name: my-cluster-entity-operator
    namespace: kafka
roleRef:
  kind: Role
  name: strimzi-topic-operator-kafka-my-cluster
  apiGroup: rbac.authorization.k8s.io
```
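To make the naming scheme concrete, here is a minimal Python sketch that builds the `Role` and `RoleBinding` manifests for one watched namespace as plain dictionaries. The helper names `rbac_name` and `role_and_binding` are hypothetical, and the `<kafka-name>-entity-operator` service account name is inferred from the example above rather than stated as a rule.

```python
from typing import Tuple

VERBS = ["get", "list", "watch", "create", "patch", "update", "delete"]


def rbac_name(kafka_namespace: str, kafka_name: str) -> str:
    """Name shared by the Role and RoleBinding, per the scheme in the text."""
    return f"strimzi-topic-operator-{kafka_namespace}-{kafka_name}"


def role_and_binding(kafka_namespace: str, kafka_name: str,
                     watched_namespace: str) -> Tuple[dict, dict]:
    """Build Role and RoleBinding manifests for a single watched namespace."""
    name = rbac_name(kafka_namespace, kafka_name)

    def metadata() -> dict:
        # Fresh dict per resource so the two manifests share no state.
        return {"name": name, "namespace": watched_namespace,
                "labels": {"strimzi.io/cluster": kafka_name}}

    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": metadata(),
        "rules": [{"apiGroups": ["kafka.strimzi.io"],
                   "resources": ["kafkatopics", "kafkatopics/status"],
                   "verbs": list(VERBS)}],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": metadata(),
        "subjects": [{"kind": "ServiceAccount",
                      # Assumption: service account name follows the example.
                      "name": f"{kafka_name}-entity-operator",
                      "namespace": kafka_namespace}],
        "roleRef": {"kind": "Role", "name": name,
                    "apiGroup": "rbac.authorization.k8s.io"},
    }
    return role, binding
```

Calling `role_and_binding("kafka", "my-cluster", "team-a")` yields dictionaries with the same fields as the YAML example for `team-a`.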

**For wildcard (`*`):**

The Cluster Operator creates a `ClusterRole` and `ClusterRoleBinding` instead:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: strimzi-topic-operator-my-cluster
  labels:
    strimzi.io/cluster: my-cluster
rules:
  - apiGroups: ["kafka.strimzi.io"]
    resources: ["kafkatopics", "kafkatopics/status"]
    verbs: ["get", "list", "watch", "create", "patch", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: strimzi-topic-operator-my-cluster
  labels:
    strimzi.io/cluster: my-cluster
subjects:
  - kind: ServiceAccount
    name: my-cluster-entity-operator
    namespace: kafka
roleRef:
  kind: ClusterRole
  name: strimzi-topic-operator-my-cluster
  apiGroup: rbac.authorization.k8s.io
```

When `watchedNamespaces` is updated (namespaces added or removed), the Cluster Operator reconciles the RBAC resources: creating new `Role`/`RoleBinding` pairs for added namespaces and deleting them for removed namespaces. RBAC resources created by the Cluster Operator are labeled with `strimzi.io/cluster` so they can be identified and cleaned up.
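The reconciliation described here amounts to a set difference between the previous and the desired namespace lists. A minimal sketch, assuming explicit lists only (transitions to or from the `*` wildcard are not covered) and using the hypothetical function name `rbac_changes`:

```python
from typing import Iterable, List, Tuple


def rbac_changes(previous: Iterable[str],
                 desired: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (namespaces needing new RBAC, namespaces whose RBAC is removed)."""
    prev, want = set(previous), set(desired)
    to_create = sorted(want - prev)  # added namespaces: create Role/RoleBinding
    to_delete = sorted(prev - want)  # removed namespaces: delete Role/RoleBinding
    return to_create, to_delete
```

Namespaces present in both lists appear in neither result, so their existing RBAC resources are left alone.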

### Cross-namespace conflict resolution

The UTO already maintains an in-memory map of *resolved topic name* → (`namespace`, `name`) pairs for conflict detection within a single namespace. This proposal extends that map to span all watched namespaces.

The **resolved topic name** is `spec.topicName` if set, otherwise `metadata.name`. This is critical — two `KafkaTopic` CRs in different namespaces can reference the same Kafka topic through different mechanisms:

- Namespace A: `KafkaTopic` with `metadata.name: my-topic` (no `spec.topicName`, resolves to `my-topic`)
- Namespace B: `KafkaTopic` with `metadata.name: app-topic`, `spec.topicName: my-topic` (resolves to `my-topic`)

Both reference the same Kafka topic. The conflict resolution rules are:

1. The `KafkaTopic` with the **oldest** `metadata.creationTimestamp` is the owner, regardless of which namespace it is in.
2. All other `KafkaTopic` CRs referencing the same Kafka topic get:

```yaml
status:
  conditions:
    - type: Ready
      status: "False"
      reason: ResourceConflict
      message: "Managed by team-a/my-topic"
      lastTransitionTime: "2024-11-24T17:00:00Z"
```

This is identical to how the UTO handles conflicts within a single namespace today, extended across namespace boundaries.
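The rules above can be sketched in Python as follows. This is an illustration rather than the operator's actual code: the function names are hypothetical, resources are modeled as plain dictionaries, and the tie-break on namespace and name for identical timestamps is an assumption, since the proposal does not define one.

```python
from datetime import datetime
from typing import List


def resolved_topic_name(kt: dict) -> str:
    """spec.topicName if set, otherwise metadata.name."""
    return kt.get("spec", {}).get("topicName") or kt["metadata"]["name"]


def _created(kt: dict) -> datetime:
    # Kubernetes serializes creationTimestamp as RFC 3339 in UTC.
    return datetime.strptime(kt["metadata"]["creationTimestamp"],
                             "%Y-%m-%dT%H:%M:%SZ")


def pick_owner(candidates: List[dict]) -> dict:
    """Oldest creationTimestamp wins, regardless of namespace."""
    return min(
        candidates,
        # Assumption: break timestamp ties by namespace, then name.
        key=lambda kt: (_created(kt), kt["metadata"]["namespace"],
                        kt["metadata"]["name"]),
    )


def conflict_condition(owner: dict) -> dict:
    """Ready=False condition written to every non-owning KafkaTopic."""
    meta = owner["metadata"]
    return {"type": "Ready", "status": "False", "reason": "ResourceConflict",
            "message": f"Managed by {meta['namespace']}/{meta['name']}"}
```

With the two resources from the example (`team-a/my-topic` created first, `team-b/app-topic` with `spec.topicName: my-topic` created later), both resolve to `my-topic` and `pick_owner` returns the `team-a` resource.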

### Deletion semantics

The UTO uses Kubernetes finalizers to ensure that deleting a `KafkaTopic` also deletes the corresponding Kafka topic. This behavior is unchanged in the multi-namespace case.

When the **owning** `KafkaTopic` (the oldest one) is deleted:
1. The finalizer logic deletes the Kafka topic from the Kafka cluster.
2. Any other `KafkaTopic` CRs in other namespaces that referenced the same Kafka topic will have their status updated to reflect that the topic no longer exists. They can then be cleaned up by their respective namespace owners, or if left in place, they will attempt to recreate the topic on their next reconciliation (becoming the new owner since no conflict exists).

When a **non-owning** `KafkaTopic` (one with `ResourceConflict` status) is deleted:
1. The finalizer is removed without deleting the Kafka topic (same as existing UTO behavior for conflicting resources).
2. No impact on the owning `KafkaTopic` or the Kafka topic itself.

The existing `strimzi.io/managed: false` annotation continues to work as before — it allows a `KafkaTopic` to be deleted without deleting the Kafka topic, enabling ownership transfers.
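The deletion cases above reduce to a small decision function. The sketch below is a simplified model: `should_delete_kafka_topic` is a hypothetical name, ownership is passed in as a precomputed flag, and treating a missing annotation as "managed" is an assumption.

```python
def should_delete_kafka_topic(kt: dict, is_owner: bool) -> bool:
    """Decide whether finalizing this KafkaTopic also deletes the Kafka topic."""
    annotations = kt["metadata"].get("annotations") or {}
    # Assumption: a missing annotation means the resource is managed.
    managed = str(annotations.get("strimzi.io/managed", "true")).lower() != "false"
    # Only a managed owner deletes the topic; in every other case the
    # finalizer is simply removed and the Kafka topic is left untouched.
    return is_owner and managed
```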

### Identifying ownership via `strimzi.io/cluster`

The existing `strimzi.io/cluster` label already identifies which Kafka cluster a `KafkaTopic` belongs to. This proposal extends it to eliminate ambiguity when multiple Kafka clusters share the same name across different namespaces.

**Single-namespace mode (unchanged):** Users continue to label topics with the Kafka cluster name:

```yaml
labels:
  strimzi.io/cluster: my-cluster
```

**Multi-namespace mode (new):** Users label topics with the fully-qualified `<namespace>/<name>` form:

```yaml
labels:
  strimzi.io/cluster: kafka/my-cluster
```

The TO filters its watch using this label. In multi-namespace mode, the TO only reconciles resources whose `strimzi.io/cluster` label matches its own `<namespace>/<name>` identity (e.g., `kafka/my-cluster`). It also accepts the unqualified form `my-cluster` for backward compatibility when there is no ambiguity — but the qualified form `kafka/my-cluster` is required when multiple Kafka clusters share the same name across namespaces.

Because each TO instance selects only topics carrying its own qualified identity, two TOs watching overlapping namespaces do not conflict: `kafka/my-cluster` and `staging/my-cluster` are distinct label values and each TO ignores resources labeled for the other.

If a `KafkaTopic` is in scope for a TO but lacks the expected `strimzi.io/cluster` label entirely (e.g., due to a misconfiguration), the TO skips it without updating its status and logs a warning. No status writes are performed on resources the TO does not own, which prevents reconciliation fights between competing instances.
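The selection rule can be sketched as a predicate over a resource's labels. The function name `is_selected` and the `accept_unqualified` switch are hypothetical: the proposal says the unqualified form is accepted "when there is no ambiguity" but does not say how the TO determines that, so the sketch leaves the decision to the caller.

```python
from typing import Mapping, Optional

CLUSTER_LABEL = "strimzi.io/cluster"


def is_selected(labels: Optional[Mapping[str, str]], own_namespace: str,
                own_name: str, accept_unqualified: bool = True) -> bool:
    """Return True if this TO instance should reconcile the resource."""
    value = (labels or {}).get(CLUSTER_LABEL)
    if value is None:
        # Missing label: skip the resource (the caller logs a warning).
        return False
    if value == f"{own_namespace}/{own_name}":
        return True
    # Backward-compatible unqualified form, only when the caller allows it.
    return accept_unqualified and value == own_name
```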

### Namespace lifecycle considerations

**Adding a namespace to `watchedNamespaces`:**
The Cluster Operator creates RBAC resources in the new namespace and restarts/reconfigures the TO. The TO starts watching the new namespace and reconciles any existing `KafkaTopic` resources in it.

**Removing a namespace from `watchedNamespaces`:**

> **Member:** That does not work because once you remove the namespace, it is beyond the operator's reach, so it cannot remove the finalizers.
>
> **Author:** I think you mean the case where someone else deletes the namespace, not the case where someone removes a namespace from the `watchedNamespaces` list. In the first case you are right and manual intervention is required. But if I remove a namespace from the operator's configuration, I can perform actions before I remove the RBAC. I updated the proposal to make sure this is clear.
>
> **Member:** Sorry, maybe my comment was too confusing.
>
> - The Cluster Operator does not touch any topics. It should not remove any finalizers from them. (I am not 100% sure whether "CO" here is a typo or intentional.)
> - The Topic Operator would never know that a namespace was removed, because the removal is applied through a restart. It starts with the new namespace configuration and has no idea that a namespace was removed.
> - Even if it knew about it, it would not have the RBAC rights to do anything with the topics, because the RBAC rights are managed independently by the CO and there is no synchronization point.
> - In addition, we in general do not delete user-facing custom resources.
>
> I do not think you can really fix it, to be honest. All that can be done here is to make it the user's responsibility to either delete the topics first or do it manually later.

The Cluster Operator must follow this sequence to ensure finalizers can be cleaned up before RBAC is revoked:

1. While RBAC still exists, the CO removes the `strimzi.io/topic-operator` finalizer from all `KafkaTopic` resources in the namespace being removed.
2. The CO updates the TO configuration to remove the namespace from `watchedNamespaces`.
3. The CO deletes the RBAC resources (`Role`/`RoleBinding`) for that namespace.

This ordering is critical: deleting RBAC before removing finalizers would leave the TO unable to reach the `KafkaTopic` resources, causing finalizers to block any subsequent namespace deletion.

**Watched namespace is deleted:**
If the TO is running, it processes the finalizers normally — deleting Kafka topics as appropriate. If the TO is not running (or the namespace was already removed from the watch list), `KafkaTopic` resources with finalizers will block namespace deletion. In this case, manual finalizer removal is necessary (e.g., `kubectl patch kafkatopic <name> -n <ns> --type=json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'`). This is an existing limitation of the UTO that applies equally to the single-namespace case, and should be documented as operational guidance.
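The `kubectl patch` command above removes every finalizer on the resource. Where other finalizers must be preserved, a narrower JSON Patch can target only the Topic Operator's entry. The sketch below builds such a patch; `finalizer_removal_patch` is a hypothetical helper, and the leading `test` operation is an added safeguard (a standard JSON Patch operation) so the patch fails instead of removing the wrong entry if the list changed in the meantime.

```python
import json
from typing import List


def finalizer_removal_patch(finalizers: List[str],
                            finalizer: str = "strimzi.io/topic-operator") -> list:
    """Build a JSON Patch that removes a single finalizer by index."""
    if finalizer not in finalizers:
        return []  # nothing to remove
    index = finalizers.index(finalizer)
    path = f"/metadata/finalizers/{index}"
    return [
        # Guard: only proceed if the entry at this index is still ours.
        {"op": "test", "path": path, "value": finalizer},
        {"op": "remove", "path": path},
    ]


def as_kubectl_argument(patch: list) -> str:
    """Serialize the patch for use with `kubectl patch --type=json -p=...`."""
    return json.dumps(patch)
```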

**TO restart:**
On startup, the TO scans all watched namespaces to rebuild its in-memory conflict resolution map before beginning reconciliation. This is the same process used today for a single namespace, extended to multiple namespaces. Startup time increases linearly with the number of watched namespaces and `KafkaTopic` resources.
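A minimal sketch of the startup scan, assuming the `KafkaTopic` resources have already been listed per namespace and are modeled as plain dictionaries. The function name `build_conflict_index` and the tuple layout are hypothetical. The scan itself visits each resource once, which matches the linear growth described above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Ref = Tuple[str, str, str]  # (creationTimestamp, namespace, name)


def build_conflict_index(
    topics_by_namespace: Dict[str, List[dict]]
) -> Dict[str, List[Ref]]:
    """Map each resolved topic name to its referencing resources, oldest first."""
    index: Dict[str, List[Ref]] = defaultdict(list)
    for namespace, topics in topics_by_namespace.items():
        for kt in topics:
            meta = kt["metadata"]
            resolved = kt.get("spec", {}).get("topicName") or meta["name"]
            index[resolved].append(
                (meta["creationTimestamp"], namespace, meta["name"]))
    for refs in index.values():
        # Same-format UTC RFC 3339 strings sort chronologically as text,
        # so the first entry in each list is the owner.
        refs.sort()
    return dict(index)
```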

### Example

A Kafka cluster in namespace `kafka`, with topics managed by teams in `team-a` and `team-b`:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  namespace: kafka
spec:
  entityOperator:
    topicOperator:
      watchedNamespaces:
        - team-a
        - team-b
```

> **Member:** How does this handle compatibility with the existing `watchedNamespace` field?
>
> **Author:** I updated the proposal. I went with deprecating this field in favor of the plural one.
>
> **Member:** Deprecation is fine. But we do not abandon fields from one day to the next. We do things in a backwards-compatible way.

Team A creates a topic in their namespace, using the qualified `strimzi.io/cluster` label:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: orders
  namespace: team-a
  labels:
    strimzi.io/cluster: kafka/my-cluster
spec:
  partitions: 12
  replicas: 3
  config:
    retention.ms: "604800000"
```

The TO reconciles this topic and creates it in Kafka. The resource then carries the finalizer and a `Ready` status:

```yaml
metadata:
  name: orders
  namespace: team-a
  labels:
    strimzi.io/cluster: kafka/my-cluster
  finalizers:
    - strimzi.io/topic-operator
status:
  topicName: orders
  conditions:
    - type: Ready
      status: "True"
      lastTransitionTime: "2024-11-24T17:00:00Z"
```

If Team B creates a `KafkaTopic` also named `orders` in their namespace, it receives a conflict error:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: orders
  namespace: team-b
status:
  conditions:
    - type: Ready
      status: "False"
      reason: ResourceConflict
      message: "Managed by team-a/orders"
      lastTransitionTime: "2024-11-24T17:05:00Z"
```

## Affected/not affected projects

Affected:
- **Topic Operator**: Extended to watch multiple namespaces, cross-namespace conflict resolution
- **Cluster Operator**: Creates RBAC resources for watched namespaces, passes `watchedNamespaces` configuration to the Entity Operator

Not affected:
- User Operator, Kafka Connect operator, Kafka MirrorMaker operator
- KafkaTopic CRD schema (no changes to spec fields)

## Compatibility

- The `watchedNamespaces` field is optional with an empty default, preserving existing single-namespace behavior.
- No changes to the `KafkaTopic` CRD spec schema.
- `strimzi.io/cluster` label: the existing unqualified form (e.g., `my-cluster`) continues to work in single-namespace deployments. In multi-namespace mode, the qualified `<namespace>/<name>` form (e.g., `kafka/my-cluster`) is required when multiple Kafka clusters share the same name across namespaces. The TO accepts both forms for backward compatibility.
- `watchedNamespace` (singular) is deprecated. When `watchedNamespaces` (plural) is set, the singular field is ignored with a deprecation warning logged. The migration path is: `watchedNamespace: team-a` → `watchedNamespaces: [team-a]`.
- Existing `KafkaTopic` resources continue to work without modification.

## Future work

- Multi-namespace support for the User Operator (`KafkaUser` resources) — this will require a separate proposal addressing the security implications of cross-namespace ACL management (see [PR #137 discussion](https://github.com/strimzi/proposals/pull/137)).
- Namespace-level policies to restrict which topic configurations can be set from a given namespace (e.g., allowing infra teams to retain control over certain topic-level configs).

## Rejected alternatives

### New `KafkaNamespaceTopic` CRD

[Issue #1206](https://github.com/strimzi/strimzi-kafka-operator/issues/1206) proposed a `KafkaNamespaceTopic` CRD that would encode the Kubernetes namespace in the Kafka topic name (e.g., `team-a.orders`). This approach was rejected because:

1. **CRD maintenance cost**: Each custom resource is expensive to maintain. A new CRD requires its own controller logic, status management, validation webhooks, documentation, and migration tooling.
2. **Forces naming convention**: Encoding the namespace in the topic name forces a specific naming convention (`<namespace>.<topic>`) that may not match existing topic naming patterns or be desirable for all users.
3. **Dual CRD confusion**: Having both `KafkaTopic` and `KafkaNamespaceTopic` with different sync semantics (the original proposal suggested unidirectional sync for the new CRD) creates confusion about which to use when.
4. **Unnecessary complexity**: Extending the existing UTO to watch multiple namespaces achieves the same goal without introducing a new resource type, by building on the conflict resolution and lifecycle mechanisms the UTO already has.