Add cluster mode support #51

Open
qjsoq wants to merge 21 commits into valkey-io:main from qjsoq:cluster-enable-branch
@qjsoq qjsoq commented Oct 24, 2025

This pull request introduces a new, comprehensive Helm chart named valkey-cluster for deploying a clustered Valkey instance on Kubernetes.
This chart is designed to handle the complexities of a clustered setup by leveraging a StatefulSet and an intelligent initialization script.

What This PR Does:

  • StatefulSet Deployment: Ensures stable network identifiers and ordered deployment, which is critical for a clustered database.
  • Introduces Dynamic Per-Pod PVCs: Implements volumeClaimTemplates to automatically provision a unique PersistentVolumeClaim for each cluster node, ensuring data persistence and isolation.
  • Automated Cluster Initialization: A new init-cluster.sh script (managed via cluster_config.yaml) is introduced. This script handles complex cluster bootstrapping, node discovery, and dynamic joining logic for new or replacement pods.
  • Graceful Master Failover: Adds a default preStop lifecycle hook. When a master pod is scheduled for termination (e.g., during a rolling update), the hook first triggers a controlled failover to one of its replicas before the pod shuts down, minimizing downtime.
  • Adds Headless Service: Creates the necessary headless service for stable network discovery between pods within the StatefulSet.
  • Exposes Metrics Exporter: Adds a dedicated Kubernetes Service for the Prometheus metrics exporter sidecar. This makes cluster metrics easily discoverable and scrape-able for monitoring systems.
  • Adds a PodDisruptionBudget to protect the cluster during voluntary disruptions.
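
As a rough illustration of the preStop failover described in the list above, a hook along the following lines would do the job. This is a sketch and not the script shipped in this PR: the function names, the `VALKEY_CLI` override, and the assumption that `CLUSTER NODES` reports roles with the `master`/`slave` flags are all made up or assumed for illustration. The only behavior it relies on is that `CLUSTER FAILOVER` has to be sent to a replica of the primary being drained, and that IPv4 or hostname addresses are used.

```shell
#!/bin/sh
# Sketch only: not the hook shipped in the PR.

# Print host:port of the first non-failed replica of the given primary ID,
# reading CLUSTER NODES output from stdin.
pick_replica_addr() {
  awk -v pid="$1" '
    $4 == pid && $3 ~ /slave/ && $3 !~ /fail/ {
      split($2, addr, "@")      # drop the "@<bus-port>" suffix
      print addr[1]
      exit
    }'
}

# Hand over to a replica if the local node is a primary.
# VALKEY_CLI is a hypothetical override so the CLI can be stubbed out.
prestop_failover() {
  cli="${VALKEY_CLI:-valkey-cli}"
  nodes="$($cli cluster nodes)" || return 1
  # The local node is the line whose flags contain "myself".
  my_id="$(printf '%s\n' "$nodes" | awk '$3 ~ /myself/ && $3 ~ /master/ {print $1}')"
  [ -n "$my_id" ] || return 0   # replicas need no failover
  target="$(printf '%s\n' "$nodes" | pick_replica_addr "$my_id")"
  [ -n "$target" ] || return 0  # no healthy replica to promote
  # CLUSTER FAILOVER must be sent to the replica, not to the primary.
  $cli -h "${target%:*}" -p "${target##*:}" cluster failover
}
```

A real hook would probably also wait until the local node reports itself as a replica before letting the container exit; that part is left out here.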

How the Auto-Clustering Works
The core of the cluster logic resides in the init-cluster.sh script, which runs as a background process in the main container:

Peer Discovery: Each pod uses the Kubernetes headless service to discover its peers.

Joining an Existing Cluster: If a new pod starts and discovers an already healthy cluster, it uses the CLUSTER MEET command to join. It then intelligently finds a master with a deficit of replicas and assigns itself as a replica using CLUSTER REPLICATE.
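
To make the "master with a deficit of replicas" step concrete, here is a minimal sketch of how such a selection could be done from `CLUSTER NODES` output. It is an assumption about the approach, not the logic in init-cluster.sh: the function name is hypothetical, ties are broken by the lexically smallest node ID purely to keep the result deterministic, and a node counts as a candidate primary only if it owns at least one slot range (nine or more fields in its line).

```shell
#!/bin/sh
# Print the ID of the slot-owning primary with the fewest healthy replicas,
# reading CLUSTER NODES output from stdin. The local node ("myself") is
# skipped so that a freshly joined, empty node never selects itself.
primary_with_fewest_replicas() {
  awk '
    $3 ~ /master/ && $3 !~ /fail/ && $3 !~ /myself/ && NF >= 9 { primary[$1] = 1 }
    $3 ~ /slave/  && $3 !~ /fail/                              { replicas[$4]++ }
    END {
      best = ""
      for (id in primary) {
        n = replicas[id] + 0
        if (best == "" || n < min || (n == min && id < best)) {
          best = id
          min = n
        }
      }
      if (best != "") print best
    }'
}
```

The joining pod would then run something like `valkey-cli cluster replicate "$(valkey-cli cluster nodes | primary_with_fewest_replicas)"` once its `CLUSTER MEET` handshake has propagated.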

Initial Cluster Creation: If no cluster is found, the first pod (-0) takes on the role of the initiator. It waits for all other pods to become ready and then bootstraps the entire cluster using the valkey-cli --cluster create command with the appropriate number of replicas.
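
For readers unfamiliar with the bootstrap step, the sketch below shows one way the initiator pod could assemble the `valkey-cli --cluster create` invocation from the StatefulSet's stable pod names. The StatefulSet and service names are placeholders, the command is only printed rather than executed, and `--cluster-yes` is used on the assumption that the installed valkey-cli accepts it to skip the confirmation prompt. Depending on the version, the peer hostnames may also need to be resolved to IP addresses first.

```shell
#!/bin/sh
# Print one "<pod>.<service>:<port>" address per pod of a StatefulSet.
# All argument names are placeholders chosen for this sketch.
peer_addresses() {
  sts="$1"; svc="$2"; count="$3"; port="${4:-6379}"
  i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s-%s.%s:%s\n' "$sts" "$i" "$svc" "$port"
    i=$((i + 1))
  done
}

# Assemble the bootstrap command as a single line (printed, not executed).
# Usage: cluster_create_command <replicas-per-primary> <sts> <svc> <count> [port]
cluster_create_command() {
  replicas_per_primary="$1"; shift
  printf 'valkey-cli --cluster create %s --cluster-replicas %s --cluster-yes\n' \
    "$(peer_addresses "$@" | paste -s -d ' ' -)" "$replicas_per_primary"
}
```

For the 6-pod deployment shown further down, `cluster_create_command 1 valkey valkey-headless 6` would print a command listing `valkey-0` through `valkey-5` with `--cluster-replicas 1`.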

Resiliency: The script handles pod restarts by attempting to re-join the cluster and, if necessary, forgetting the pod's old failed entry.

Here are some screenshots demonstrating the chart in action after a successful deployment.
For deployment these settings were set:

replicaCount: 6
metrics:
  enabled: true
dataStorage:
  enabled: true
  requestedSize: "8Gi"

1. Cluster Nodes Output
image

This shows the successful formation of a 6-node cluster (3 masters, 3 replicas).

2. Running Pods and PVCs
image image

This confirms that the StatefulSet has successfully created all pods and their corresponding PersistentVolumeClaims.

3. Services
image

@qjsoq qjsoq changed the title Cluster enable branch Add cluster mode support Oct 24, 2025
@mk-raven mk-raven requested review from mk-raven and sgissi October 24, 2025 14:49
qjsoq added 14 commits October 24, 2025 17:49
Signed-off-by: Dmytro Artamonov <artamonovdima355@gmail.com>
@mk-raven mk-raven added the enhancement New feature or request label Oct 24, 2025
@qjsoq qjsoq force-pushed the cluster-enable-branch branch from 4691cd5 to e5d63e6 Compare October 24, 2025 14:49
@mk-raven mk-raven self-assigned this Oct 29, 2025
@gugu

gugu commented Nov 2, 2025

This PR also changes the Deployment to a StatefulSet for non-cluster use cases. While this is not a backward-compatible change, I believe a StatefulSet is more suitable for Valkey workloads.

@sgissi
Collaborator

sgissi commented Nov 3, 2025

Thanks for submitting the PR. I would prefer a single chart to make it more consistent, but I understand that Valkey Cluster is quite different. I'll take a closer look at this chart this week.

Signed-off-by: Dmytro Artamonov <artamonovdima355@gmail.com>
@qjsoq qjsoq force-pushed the cluster-enable-branch branch from 8e084de to 93ac771 Compare November 25, 2025 11:35
@macropin

macropin commented Dec 1, 2025

We've been kicking the tyres on this PR, and it looks very promising. We'd love to see this PR accepted.

A few thoughts (not all related specifically to this PR):

  • Reliability of failover with a preStop hook remains to be tested.
  • Seems to work fine with v9.0.0
  • There are about half a dozen sleep x commands in the init scripts. While these are sometimes unavoidable, they are nonetheless a code smell.
  • valkey-cli seems rough: having to echo yes to the tool on cluster creation is awkward. It should not expect input when run non-interactively.
  • It's disappointing that "cluster mode" does not support a single primary with multiple replicas.

During testing I refactored this PR locally to configure with

cluster:
  # Number of primary nodes in the cluster
  primaries: 3
  # Number of replicas per primary node
  replicasPerPrimary: 1

However, until cluster mode supports more arbitrary configurations, I don't think there is any benefit to this separation.
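
For reference, the two values in the refactored layout above map back to a single pod count with simple arithmetic, which is presumably why the separation adds little for now. A minimal sketch (the function name is hypothetical):

```shell
#!/bin/sh
# Total pods = primaries * (1 + replicas per primary).
total_nodes() {
  echo $(( $1 * (1 + $2) ))
}

# primaries: 3, replicasPerPrimary: 1 gives 6 pods, the same as the
# replicaCount: 6 used for the screenshots in the PR description.
total_nodes 3 1
```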

Signed-off-by: Dmytro Artamonov <artamonovdima355@gmail.com>
@qjsoq
Author

qjsoq commented Dec 3, 2025

@sgissi
I just wanted to follow up about any updates regarding the review.
Also, I’ve recently added a few fixes for the readiness probe and cluster scaling up. Let me know if you have any questions about this PR!

@sgissi
Collaborator

sgissi commented Dec 3, 2025

Hi @qjsoq, currently we are working on Replication and Sentinel, with Cluster being the next target. Ideally, I would like to keep a single chart to keep consistency and avoid repeating work across the two charts. At this point, I don't know yet if that is feasible or not. I estimate it will be a few weeks still until we are ready to review Cluster mode.

@lyatanski

Wouldn't it be slightly more optimal to set podManagementPolicy: Parallel in the spec? On initial installation this should improve cluster creation time.

@qjsoq qjsoq force-pushed the cluster-enable-branch branch from bb7102b to e8afc2d Compare January 26, 2026 15:41
Labels

enhancement New feature or request

6 participants