Conversation
Signed-off-by: Paolo Patierno <ppatierno@live.com>
132-dynamic-controller-quorum.md
Outdated
> Without the bootstrap snapshot containing the `VotersRecord`, initial controllers face a kind of deadlock: they cannot participate in elections or become candidates unless they know who the voters are, but the voters set information comes from the `VotersRecord` which must be written to the replicated log by a leader.
> However, no leader can exist without successful elections, and elections cannot happen without controllers knowing the voters set.
> This circular dependency means that if all controllers start without a pre-written `VotersRecord` in their bootstrap snapshot, they would all have an empty voters set, preventing any of them from becoming candidates or holding elections, leaving the cluster unable to bootstrap.
> The only way to break this deadlock is to pre-write the `VotersRecord` into the bootstrap snapshot during initial formatting using the `--initial-controllers` parameter, ensuring all initial controllers know the voters set from the beginning before the cluster even starts.
This is a really good explanation about why the bootstrap snapshot is needed. This part is not explained in the original KIP-853. Nice!
132-dynamic-controller-quorum.md
Outdated
> The operator also adds a `cluster.new` field to the ConfigMap, which indicates whether this is a brand new cluster (`true`) or an existing cluster (`false`).
> This field is determined based on the presence of the `clusterId` in the `Kafka` custom resource status:
>
> * A cluster is considered "new" if the status is null or the `clusterId` is null/empty.
This is confusing. From the description below, the status is used to determine if it's a static or dynamic quorum, and the `clusterId` is used to determine if this cluster is new or existing. Am I right?
Referring to the Kafka custom resource status: when it's null, the cluster is new anyway. With regards to being static-quorum based, it's the `controllers` field in the status that is null instead, while the status itself still exists.
132-dynamic-controller-quorum.md
Outdated
> * Reads the node role from the `process.roles` property in `/tmp/strimzi.properties` into a `PROCESS_ROLES` variable.
> * If the node is a broker only, it formats with the `-N` option (works for both new cluster creation and broker scale-up).
> * If the node is a controller, it reads the `cluster.new` field from the ConfigMap and proceeds based on whether this is a new cluster or an existing one:
Question:
- So we cannot use the `clusterId` as an identifier to know if this cluster is new or existing at this point in time?
- When will `cluster.new` be set to false?
> So we cannot use the `clusterId` as an identifier to know if this cluster is new or existing at this point in time?

We can't use it because, at this point, even a new cluster gets assigned a `clusterId` by the operator for formatting the starting nodes.

> When will `cluster.new` be set to false?

It's based on the condition that the status is set together with a `clusterId` within it.
Can you maybe provide the full flow of how the cluster ID is generated (or not generated) and therefore how these new fields behave? I found it confusing.
I wonder if it would be helpful to explain the flow something like the following, if my understanding is correct:

When a new Kafka CR is created, therefore it's a new cluster:
- A unique `clusterId` is generated since the `Kafka` status is null. The `Kafka` CR status is not updated yet at this point. (Is this when the `Kafka` CR status is populated with the `controllers`?)
- The ConfigMap is generated with the fields and is set to be volume mounted.
  - Create the `cluster.new` field for the ConfigMap, as the `Kafka` status is null or has a null `clusterId`. (Is this when we populate the `controllers` string for the ConfigMap based on the CR?)
- The cluster is started by running the script which checks the `cluster.new` field from the ConfigMap (at this point the cluster ID exists, just not in the status yet).
- The `Kafka` CR status is updated with the `clusterId` retrieved via the Admin API.

When reconciling an existing Kafka cluster:
- The `clusterId` is retrieved from the `Kafka` CR status, as it already contains a valid `clusterId`.
- The ConfigMap is generated with the fields and is set to be volume mounted.
  - Do not create the `cluster.new` field for the ConfigMap, as the `Kafka` status has a `clusterId`. (Is this when we populate the `controllers` string for the ConfigMap based on the CR?)
- The cluster is started by running the script which checks that `cluster.new` does not exist in the ConfigMap.
I think the flow is somehow described around line 200.

> Do not create the `cluster.new` field for the ConfigMap, as the `Kafka` status has a `clusterId`.

`cluster.new` is created anyway, but it's `false`.
Although this comment made me think that the formatting flow could be simplified.
I can simplify and not distinguish between new and existing cluster; the main difference would just be "is the current node present in the controllers list?": yes -> use `-I`, no -> use `-N`.
The only exception would be a node that is in the controllers list but had a metadata disk change: it needs `-N` (which is anyway independent from the cluster being new or existing). I will work on my POC to validate this simplification.
@tinaselenge FYI I updated the proposal based on the above idea. The flow in the run script isn't based on new or existing cluster anymore, so the `cluster.new` field was also removed. You can check it around lines 227 and forward.
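The simplified decision discussed in this thread can be sketched roughly as follows (a hypothetical helper for illustration, not the actual run-script code; the `-I`/`-N` options and the controllers list come from the proposal, all names here are assumptions):

```python
def format_option(node_id, controllers, metadata_disk_changed=False):
    """Pick the kafka-storage format option for a node, per the simplified flow.

    controllers: node IDs listed in the `controllers` status field.
    A node in the list normally formats with -I (initial controllers);
    a node not in the list (e.g. scale-up), or one whose metadata disk
    was replaced, formats with -N (no initial controllers).
    """
    if node_id in controllers and not metadata_disk_changed:
        return "-I"
    return "-N"

# New cluster: node 3 is one of the initial controllers
print(format_option(3, [3, 4, 5]))                                   # -I
# Scale-up: node 6 is not in the controllers list yet
print(format_option(6, [3, 4, 5]))                                   # -N
# Existing controller whose metadata disk was replaced
print(format_option(3, [3, 4, 5], metadata_disk_changed=True))       # -N
```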
> * BEGIN RECONCILIATION
> * ... (other reconcile operations) ...
> * **KRaft quorum reconciliation**: analyze current quorum state, unregister and register controllers as needed (typically unregisters controllers being scaled down).
Does this mean the unregistered controller node will be deleted after this reconciliation? If not, then when?

Deletion of the unregistered controller nodes is part of the "scale down controllers" step; it happens after the controllers are unregistered, but still within the same reconciliation.
> * **KRaft quorum reconciliation**: analyze current quorum state, unregister and register controllers as needed (typically unregisters controllers being scaled down).
> * scale down controllers.
> * ... (other reconcile operations) ...
> * **KRaft quorum reconciliation** (rolling): for each controller pod restart, `KafkaRoller` invokes a "single-controller reconciliation" to handle metadata disk changes immediately (unregister old voter with stale directory ID, register new observer with current directory ID).
So the single-controller reconciliation process is only needed for metadata disk changes, right?

Yes, because it reconciles the participation in the KRaft quorum only for that specific node; it's not a full KRaft quorum reconciliation, which takes all controllers into account (the code in my POC is shared between the two anyway, because reconciling the KRaft quorum as a whole needs to check quorum participation for each controller).
> Analyze desired vs. actual state, for each "desired" controller node:
> - Checks if the controller pod has been rolled with the controller role (verified by the presence of the `strimzi.io/controller-role` label on the pod).
> - Compares the controller's current state in the quorum (voter, observer, or absent) with the expected state based on the `controllers` status field.
Question: if controllers are scaled up from 3 -> 4 and the newly added controller node has a network issue that prevents it from talking to the active controller to register itself, the Admin API might still not report the 4th node as observer or voter. What will we do in this situation?

In this case the KRaft quorum reconciler logic won't do anything: nothing to register and nothing to unregister. On the next reconciliation, it will still detect that the 4th node is in the "desired" controllers list, get the quorum metadata to compare, and if the node is an observer it will be registered, otherwise it is still skipped.
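The desired-vs-actual comparison described in this thread can be sketched as follows (hypothetical names; the real reconciler works off the quorum metadata returned by the Kafka Admin API):

```python
def plan_registrations(desired, voters, observers):
    """Decide which controllers to register as voters.

    desired: node IDs from the `controllers` status field.
    voters/observers: node IDs currently reported by the quorum metadata.
    A desired node that shows up as an observer is registered as a voter;
    a desired node that is absent (e.g. unreachable) is skipped and will
    be re-evaluated on the next reconciliation.
    """
    to_register = []
    for node_id in desired:
        if node_id in voters:
            continue                  # already a voter, nothing to do
        if node_id in observers:
            to_register.append(node_id)
        # absent from the quorum: skipped, retried next reconciliation
    return to_register

# Scale-up 3 -> 4: node 6 started but cannot reach the active controller
print(plan_registrations([3, 4, 5, 6], voters=[3, 4, 5], observers=[]))   # []
# Next reconciliation: node 6 is now fetching metadata as an observer
print(plan_registrations([3, 4, 5, 6], voters=[3, 4, 5], observers=[6]))  # [6]
```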
> * Add support for dynamic controller quorum to the Strimzi Cluster Operator and using it by default for any newly created Apache Kafka cluster.
> * Add support for controllers scaling by leveraging the dynamic controller quorum.
> * Add migration from static to dynamic controller quorum.
Are these going to be implemented via separate PRs or all in one go? Or is there a plan to break down the implementation in a way that makes it easier to review?

The first two bullets will come together for sure because they are strictly related.
The migration one can be done in a separate PR if we want. Meanwhile, the operator just skips the KRaft quorum reconciliation for any "static" quorum cluster (where the `controllers` field in the status is null).
The "Compatibility" section seems to imply that auto migration will be released at the same time.
"will be released at the same time" is different from "will be merged as two different PRs".
We'll release the overall dynamic quorum support feature in one go but the controller scaling and migration could come in two separate PRs anyway (to simplify the review process).
132-dynamic-controller-quorum.md
Outdated
> They use the `controller.quorum.bootstrap.servers` configuration to contact an existing controller from where fetching the metadata log containing the `VotersRecord` for the voters set.
> The `VotersRecord` contains critical information including voter IDs, directory IDs (unique UUIDs), endpoints, and supported `kraft.version` ranges.
>
> Without the bootstrap snapshot containing the `VotersRecord`, initial controllers face a kind of deadlock: they cannot participate in elections or become candidates unless they know who the voters are, but the voters set information comes from the `VotersRecord` which must be written to the replicated log by a leader.
Suggested change:

```diff
- Without the bootstrap snapshot containing the `VotersRecord`, initial controllers face a kind of deadlock: they cannot participate in elections or become candidates unless they know who the voters are, but the voters set information comes from the `VotersRecord` which must be written to the replicated log by a leader.
+ Without the bootstrap snapshot containing the `VotersRecord`, initial controllers face a deadlock: they cannot participate in elections unless they know who the voters are, but this information must be written to the replicated log by an elected leader.
```
I will push a slightly different version of your suggestion.
> Once this standalone controller is running, additional controllers can be formatted with `--no-initial-controllers`, started as observers, and then dynamically added to the quorum using the standard add controller operations.
> While this approach simplifies the initial bootstrap by avoiding the need to coordinate directory IDs for all controllers upfront, it means the cluster starts with no fault tolerance since a single-controller quorum cannot tolerate any failures.
> However, once additional controllers are added and registered as voters, the cluster achieves the desired redundancy and fault tolerance.
> This approach can't cope with how Strimzi starts up the cluster nodes all together and doesn't have the possibility to do a rolling start one by one on cluster creation.
Are you saying that, if a quorum is started with a single controller, the rolling cannot happen?

What I am saying is that if you have a KafkaNodePool for controllers with 3 replicas, the operator doesn't start the first one only, then move to the second when it's ready, and finally to the third. The operator just creates the StrimziPodSets which create all the 3 pods in parallel. So multiple controllers always start in parallel; there is no sequential start on cluster creation. The operator only provides sequential pod restarts during rolling (thanks to the KafkaRoller).
> * builds the broker and controller configuration by setting the `controller.quorum.bootstrap.servers` field (instead of the `controller.quorum.voters` one).
> * generates a random directory ID, as a UUID, for each controller.
> * saves the `controllers` field (list of `KafkaControllerStatus` objects) within the `Kafka` custom resource status, containing the controller IDs and their corresponding directory IDs.
> * builds the controllers string from the controllers status list, and stores it as `controllers` field within the node ConfigMap, to be loaded by the `kafka_run.sh` script where it's needed for formatting the storage properly.
`controller.quorum.bootstrap.servers` already exists in the ConfigMap at this point, so can it not be used by the script, if the format of the string is the same?
The string is not the same. The string for storage formatting (with `--initial-controllers`) has a different format, including `nodeId` and `directoryId` for each controller, but those are not part of `controller.quorum.bootstrap.servers`. For example:

The `controllers` field in the ConfigMap for the format purposes:

```
controllers: 3@my-cluster-controller-3.my-cluster-kafka-brokers.myproject.svc:9090:IQf-4YZ6Qx20wzUPuf8Oeg,4@my-cluster-controller-4.my-cluster-kafka-brokers.myproject.svc:9090:PtjEWItSR6W4q1JtnTPXaw,5@my-cluster-controller-5.my-cluster-kafka-brokers.myproject.svc:9090:hLPNF3NWSC2laduRehT0Dw
```

versus the `controller.quorum.bootstrap.servers` configuration within the controller node:

```
controller.quorum.bootstrap.servers=my-cluster-controller-3.my-cluster-kafka-brokers.myproject.svc:9090,my-cluster-controller-4.my-cluster-kafka-brokers.myproject.svc:9090,my-cluster-controller-5.my-cluster-kafka-brokers.myproject.svc:9090
```
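To make the difference between the two strings concrete, here is a small sketch (a hypothetical helper, not operator code) that derives the `controller.quorum.bootstrap.servers` value from the formatting string above by dropping the `nodeId@` prefix and the trailing directory ID:

```python
def bootstrap_servers_from_controllers(controllers):
    """Convert a `controllers` formatting string
    (nodeId@host:port:directoryId, comma separated) into the
    controller.quorum.bootstrap.servers value (host:port, comma separated)."""
    servers = []
    for entry in controllers.split(","):
        _, rest = entry.split("@", 1)        # strip the leading "nodeId@"
        host_port = rest.rsplit(":", 1)[0]   # strip the trailing ":directoryId"
        servers.append(host_port)
    return ",".join(servers)

controllers = (
    "3@my-cluster-controller-3.my-cluster-kafka-brokers.myproject.svc:9090:IQf-4YZ6Qx20wzUPuf8Oeg,"
    "4@my-cluster-controller-4.my-cluster-kafka-brokers.myproject.svc:9090:PtjEWItSR6W4q1JtnTPXaw,"
    "5@my-cluster-controller-5.my-cluster-kafka-brokers.myproject.svc:9090:hLPNF3NWSC2laduRehT0Dw"
)
print(bootstrap_servers_from_controllers(controllers))
```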
> In the controllers scaling scenario, the operator handles the registration as well (not only the unregistration).
>
> In addition to handling the registration and unregistration of controllers, the Strimzi Cluster Operator has to check the health of the KRaft quorum before allowing the scale down of the controllers.
> At the beginning of the `KafkaReconciler` reconciliation process, if scaling down controllers could break the quorum (by losing the majority of voters needed for consensus), it should be blocked and reverted back by the operator.
Is the quorum health checked before unregistering the node that is about to be scaled down?
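The majority check mentioned above could look roughly like this — a sketch under stated assumptions, not the actual `KafkaReconciler` logic; the proposal only says the scale-down should be blocked if it could break the quorum majority, so the health criterion here is illustrative:

```python
def scale_down_is_safe(voters, to_remove, unhealthy):
    """Block a controller scale-down that would lose quorum.

    voters: current voter node IDs; to_remove: voters being scaled down;
    unhealthy: voters currently not reachable/caught up. The remaining
    healthy voters must still form a majority of the remaining voter set.
    """
    remaining = [v for v in voters if v not in to_remove]
    healthy = [v for v in remaining if v not in unhealthy]
    majority = len(remaining) // 2 + 1
    return len(healthy) >= majority

# 5 -> 3 with all remaining voters healthy: safe
print(scale_down_is_safe([0, 1, 2, 3, 4], to_remove=[3, 4], unhealthy=[]))      # True
# 5 -> 3 but two of the remaining voters are unhealthy: blocked
print(scale_down_is_safe([0, 1, 2, 3, 4], to_remove=[3, 4], unhealthy=[1, 2])) # False
```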
132-dynamic-controller-quorum.md
Outdated
> - Find if this node ID is part of voters and/or observers in the quorum
> - If multiple incarnations detected (same node ID with different directory IDs):
>   - Read actual directory ID from pod via Kafka Agent HTTP endpoint
>   - Add voter with wrong directory ID to `toUnregister` list
do you mean the old directory id, rather than wrong?
Yeah, it's better to stick with "old" even if it's technically "wrong" as well.
132-dynamic-controller-quorum.md
Outdated
> - If multiple incarnations detected (same node ID with different directory IDs):
>   - Read actual directory ID from pod via Kafka Agent HTTP endpoint
>   - Add voter with wrong directory ID to `toUnregister` list
>   - Add observer matching actual directory ID to `toRegister` list
Do you mean the new directory ID? How are we differentiating the two?

"Actual" would mean the real one coming from reading the Kafka Agent HTTP endpoint, and it would technically be the "new" one. I will change it to "new", noting that it's the one actually in the `meta.properties`.
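The multiple-incarnation handling described in this thread can be sketched as follows (hypothetical names; in the real flow the actual directory ID comes from the Kafka Agent HTTP endpoint):

```python
def reconcile_incarnations(node_id, quorum_voters, quorum_observers, actual_dir_id):
    """Handle a node appearing twice in the quorum (same node ID,
    different directory IDs), e.g. after a metadata disk replacement.

    quorum_voters / quorum_observers: {node_id: directory_id} maps from
    the quorum metadata. Returns (to_unregister, to_register) as lists
    of (node_id, directory_id) pairs.
    """
    to_unregister, to_register = [], []
    voter_dir = quorum_voters.get(node_id)
    observer_dir = quorum_observers.get(node_id)
    if voter_dir is not None and voter_dir != actual_dir_id:
        # old incarnation still registered as a voter: unregister it
        to_unregister.append((node_id, voter_dir))
        if observer_dir == actual_dir_id:
            # new incarnation joined as an observer: register it as a voter
            to_register.append((node_id, observer_dir))
    return to_unregister, to_register

# Node 4 was re-formatted: old dir "B" is still a voter, new dir "B2" is an observer
print(reconcile_incarnations(4, {3: "A", 4: "B", 5: "C"}, {4: "B2"}, "B2"))
```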
132-dynamic-controller-quorum.md
Outdated
> *Reconciliation:*
> - Status before reconciliation: [{3:A}, {4:B}, {5:C}]
> - Analysis: Node 6 in desired but NOT in status, operator generates random UUID "D"
How do we know whether we need to generate the directory ID?

It's a leftover of the rejected approach; I will change it and review the rest for the same issue.
FYI, it will be generated by the Kafka storage format tool when the node starts, by using the famous `-N` option, because it's a scale-up.
132-dynamic-controller-quorum.md
Outdated
> * removes/unregisters the controllers from the quorum by using the `removeRaftVoter` method in the Kafka Admin API.
> * scales down the controllers by deleting the corresponding pods.
>
> If getting the quorum information or unregistering the controllers fails (the Apache Kafka returns an error), the reconciliation will fail to avoid the controllers, not unregistered correctly, to be shutten down.
Suggested change:

```diff
- If getting the quorum information or unregistering the controllers fails (the Apache Kafka returns an error), the reconciliation will fail to avoid the controllers, not unregistered correctly, to be shutten down.
+ If getting the quorum information or unregistering the controllers fails (the Apache Kafka returns an error), the reconciliation will fail to avoid the controllers, not unregistered correctly, to be shut down.
```
132-dynamic-controller-quorum.md
Outdated
> * registers them as voters using the `addRaftVoter` API, following the same sequential registration process described in the scale-up section.
>
> The `strimzi.io/controller-role` label check is critical for resilience: if the operator crashes after some nodes have been rolled but before registration completes, on restart the operator can detect which nodes have already been rolled as controllers (by checking the label) and only register those, skipping nodes that haven't been rolled yet.
> This prevents attempting to register nodes that are desired to be controllers but haven't yet been actualized as controllers.
Suggested change:

```diff
- This prevents attempting to register nodes that are desired to be controllers but haven't yet been actualized as controllers.
+ This prevents attempting to register nodes that are desired to be controllers but haven't yet been restarted as controllers.
```
132-dynamic-controller-quorum.md
Outdated
> At the beginning of the reconciliation cycle, the KRaft quorum reconciliation:
>
> * analyzes the quorum state and identifies controllers that need to be removed (those in the current voters but no longer having the controller role in the desired configuration).
> * unregisters them from the quorum using the `removeRaftVoter` API, ensuring they are removed before the rolling update begins.
Suggested change:

```diff
- * unregisters them from the quorum using the `removeRaftVoter` API, ensuring they are removed before the rolling update begins.
+ * unregisters them from the quorum using the `removeRaftVoter` API, ensuring they are removed from the voter list before the rolling update begins.
```
132-dynamic-controller-quorum.md
Outdated
> After unregistration completes:
>
> * the operator builds the new node configuration without controller-specific settings.
Suggested change:

```diff
- * the operator builds the new node configuration without controller-specific settings.
+ * the operator builds configurations of the affected nodes without controller-specific settings.
```
> What is the impact of it on downgrading the operator to a previous release where there is no support for dynamic quorum?
>
> When downgraded to an older release, the operator reconfigures the nodes with the old `controller.quorum.voters` parameter and roll them.
So `controller.quorum.voters` will be added back to the configuration but not actually used for anything, right?

Yes, exactly. This is how Kafka works for backward compatibility. @showuon can confirm it.
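As an illustration (hostnames borrowed from the examples earlier in this conversation, so this exact value is an assumption), the static-quorum setting an older operator would write back looks like this; per the discussion above, a dynamic-quorum cluster carries it for backward compatibility but doesn't actually use it:

```properties
# static quorum configuration: nodeId@host:port, comma separated
controller.quorum.voters=3@my-cluster-controller-3.my-cluster-kafka-brokers.myproject.svc:9090,4@my-cluster-controller-4.my-cluster-kafka-brokers.myproject.svc:9090,5@my-cluster-controller-5.my-cluster-kafka-brokers.myproject.svc:9090
```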
132-dynamic-controller-quorum.md
Outdated
> A new controller formatted with `-I` containing all current controllers can still join the quorum as an observer and be registered as a voter.
> However, several issues make this approach unsuitable:
>
> - Unnecessary checkpoint file: Formatting with `-I` creates a bootstrap snapshot/checkpoint file on the scaled-up controller's disk. This file is redundant because new controllers don't use it to discover the quorum but they fetch the `VotersRecord` from the leader's metadata log instead. This can be consider a minimal issue.
Suggested change:

```diff
- - Unnecessary checkpoint file: Formatting with `-I` creates a bootstrap snapshot/checkpoint file on the scaled-up controller's disk. This file is redundant because new controllers don't use it to discover the quorum but they fetch the `VotersRecord` from the leader's metadata log instead. This can be consider a minimal issue.
+ - Unnecessary checkpoint file: Formatting with `-I` creates a bootstrap snapshot/checkpoint file on the scaled-up controller's disk. This file is redundant because new controllers don't use it to discover the quorum but they fetch the `VotersRecord` from the leader's metadata log instead. This can be considered a minor issue since the file is quite small and doesn't have any impact.
```
> However, several issues make this approach unsuitable:
>
> - Unnecessary checkpoint file: Formatting with `-I` creates a bootstrap snapshot/checkpoint file on the scaled-up controller's disk. This file is redundant because new controllers don't use it to discover the quorum but they fetch the `VotersRecord` from the leader's metadata log instead. This can be consider a minimal issue.
> - Undocumented behavior: Most critically, this approach is not documented in the official Apache Kafka documentation or KIP-853. Relying on undocumented behavior creates a risk that future Kafka versions could change or break this functionality without notice.
If this gets documented officially, does using `-I` all the time significantly simplify the process and code? Maybe we could mention here what exactly gets simplified, e.g. it removes the need for the `cluster.new` field in the ConfigMap.
If it gets documented officially after the proposal is accepted and implemented, would you consider changing it? How difficult would that be to refactor?

There could be an update about this and I need to discuss with @showuon further, because it might not be usable anymore after some investigation he made.
> ```shell
> bin/kafka-metadata-quorum.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 describe --status
> ```
I agree that this should be rejected, but maybe we can add a short sentence or two on why it was rejected? Do we need to include all these commands users have to run? Maybe we can just say that this requires users to run many commands manually in a specific order, and explain how that is not user friendly, error prone, etc.?

I wanted to leave the commands to make the process more understandable, because this is actually what you would do with a Kafka cluster running on bare metal/VMs, so I guess it helps to understand more about the automation.
I will add a couple of sentences about why it was rejected.
> ```shell
> kubectl patch kafka my-cluster -n myproject --type=merge --subresource=status -p '{"status":{"controllers":[{"id":3,"directoryId":"U3fHvCoMVWiCVYa2ri_K5w"},{"id":4,"directoryId":"g3OMYG2gvmLCeE9Nv-Cz5Q"},{"id":5,"directoryId":"2K7pPIanujBKY1Tsxr-gWg"}]}}'
> ```
>
> Patching the `Kafka` custom resource status will trigger nodes rolling and the operator will reconfigure them with the `controller.quorum.bootstrap.servers` field for using the dynamic quorum.
Similar to the above: do we need these commands? Can we add why this was rejected?
fvaleri left a comment:

@ppatierno thanks for the proposal and examples.
I left a few comments, but the approach LGTM.
> 3. Analyze current voters and identify unwanted ones:
>    - For each voter not in desired controllers:
>      - Add to `toUnregister` list (handles scale-down scenarios)
What's the `toUnregister` list order? In other words, which controller is scaled down first when multiple controllers need to be removed? This is crucial to minimize leadership gaps where no metadata changes can be processed.
I think we scale down from the highest-numbered controller pod, right? IMO the operator should prefer removing non-leaders first to minimize these leadership gaps and make scale-down more efficient.
Example:
- The user scales from 5 to 3.
- The operator starts removing controller-4 which happens to be the leader.
- Controller-4 resigns after commit causing a hopefully brief leadership gap.
- New leader is elected among {0, 1, 2, 3}.
- Now it's the turn of controller-3, that was just elected as the new leader.
- The same leader-removal dance happens again.
I can think more about this, but take into account that unregistration is pretty fast and happens before scaling down, so I would expect that controllers 4 and 3 would be unregistered one after the other very quickly and then shut down. At that point, if 4 was the leader, the new election will start among the remaining 0, 1, 2 (so skipping the dance with 3 being elected).
> unregistration is pretty fast

This is assuming that the active controller is always fast, which may not be the case. We can consider this an optimization and do a follow-up, but I would at least mention it in the proposal.
I agree on doing this. I changed the reconciliation algorithms by adding the following:

Phase 1: unregister all controllers in the `toUnregister` list (stale/unwanted controllers). If one of the controllers is the leader, it will be unregistered last, to avoid useless leader elections in between multiple controller unregistrations.
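Phase 1's ordering rule can be sketched as follows (a hypothetical helper; in the real flow the leader ID would come from the quorum metadata):

```python
def unregister_order(to_unregister, leader_id):
    """Order controllers to unregister so the current leader, if present,
    goes last, avoiding a leader election between each removal."""
    non_leaders = [n for n in to_unregister if n != leader_id]
    leader = [n for n in to_unregister if n == leader_id]
    return non_leaders + leader

# Scale down 5 -> 3 while controller 4 is the leader:
print(unregister_order([4, 3], leader_id=4))  # [3, 4]
# Leader not among the removed controllers: order unchanged
print(unregister_order([4, 3], leader_id=0))  # [4, 3]
```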
> This means that while all controllers start up in a single reconciliation, registering them all as voters may require multiple reconciliation cycles.
>
> Furthermore, monitoring that the controller has caught up with the active ones is not necessary.
> If the controller hasn't caught up yet and the operator runs the registration, the active controller returns an error and the reconciliation will fail, allowing the registration to be re-tried in the next reconciliation cycle.
Are we going to intercept this error code and log a useful message?

Yeah, it's logged by the operator. Not sure we want to have some warning conditions in the status.
fvaleri left a comment:

LGTM. Thanks for addressing my comments.
This proposal is about adding the support for dynamic quorum and controllers scaling to the Strimzi Cluster Operator.
It replaces #190.
I have already been working on a POC to validate what is currently written within this proposal.
I also added some scenarios of dynamic quorum and controller scaling usage with both happy paths and failures.
It is also possible to try it by deploying a Strimzi Cluster Operator but using the following images in the Deployment file:
- `quay.io/ppatierno/operator:dynamic-quorum`
- `quay.io/ppatierno/kafka:dynamic-quorum-kafka-4.1.1`
- `quay.io/ppatierno/kafka:dynamic-quorum-kafka-4.2.0`