feat(cache): add Redis Cluster support for HA deployments #698
krhoward-amd wants to merge 1 commit into rackslab:main
Implements Redis Cluster client support in the slurm-web agent to enable
high-availability caching across distributed Redis clusters.
## Problem
Slurm-web currently supports only standalone Redis instances for caching.
In high-availability deployments with Redis Cluster (3+ node clustered Redis),
slurm-web agents fail to connect because they use the standard redis.Redis()
client instead of the cluster-aware redis.cluster.RedisCluster() client.
## Solution
This commit adds optional Redis Cluster support while maintaining full
backwards compatibility with standalone Redis deployments.
### Core Changes
**slurmweb/cache.py**:
- Import RedisCluster and ClusterNode from redis.cluster
- Add cluster_mode and cluster_nodes optional parameters to CachingService
- Implement cluster mode initialization with RedisCluster client
- Parse cluster_nodes from "host:port" string format
- Add connection validation with fail-fast error handling
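
The parsing step could look roughly like the sketch below. It is an illustration under assumptions, not the code from this commit: the function name `parse_cluster_nodes` and the error message are hypothetical, and it returns plain tuples so it runs without the redis package installed.

```python
from typing import List, Tuple


def parse_cluster_nodes(entries: List[str]) -> List[Tuple[str, int]]:
    """Turn "host:port" strings into (host, port) tuples, rejecting bad input early."""
    nodes = []
    for entry in entries:
        # rpartition splits on the last colon, so the port is always the final field.
        host, sep, port = entry.strip().rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"invalid cluster node {entry!r}, expected 'host:port'")
        nodes.append((host, int(port)))
    return nodes


nodes = parse_cluster_nodes(["10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"])
print(nodes)
```

Each tuple could then be turned into a `ClusterNode(host, port)` from `redis.cluster` before being handed to the cluster client.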
**slurmweb/apps/agent.py**:
- Pass cluster_mode and cluster_nodes parameters to CachingService
- Use getattr() with defaults for backwards compatibility
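
A minimal sketch of the `getattr()` fallback, assuming the cache settings are exposed as an attribute-style object. The `SimpleNamespace` objects and the `cluster_options` helper are hypothetical stand-ins, not the agent's actual settings classes.

```python
from types import SimpleNamespace

# Hypothetical settings loaded from a configuration written before this change:
# the cluster parameters are simply absent.
legacy_cache = SimpleNamespace(enabled=True, host="localhost", port=6379)

# Hypothetical settings from a configuration that opts in to cluster mode.
cluster_cache = SimpleNamespace(
    enabled=True, cluster_mode=True, cluster_nodes=["10.0.0.1:6379"]
)


def cluster_options(cache_settings):
    """Read the new parameters, falling back to standalone defaults when missing."""
    return {
        "cluster_mode": getattr(cache_settings, "cluster_mode", False),
        "cluster_nodes": getattr(cache_settings, "cluster_nodes", None),
    }


print(cluster_options(legacy_cache))
print(cluster_options(cluster_cache))
```

With this pattern an older settings object never raises `AttributeError`; it just yields the standalone defaults.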
**conf/vendor/agent.yml**:
- Add cluster_mode boolean parameter (default: false)
- Add cluster_nodes list parameter with string content type
- Document configuration with examples
## Features
- **Opt-in design**: Cluster mode disabled by default (cluster_mode=false)
- **Automatic failover**: Cluster continues if a Redis node fails
- **Load distribution**: Requests distributed across cluster nodes
- **Backwards compatible**: Existing standalone configurations work unchanged
- **Fail-fast validation**: Connection tested at initialization
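
The fail-fast idea can be sketched as follows. This is an assumption-laden illustration: `CacheConnectionError`, `validate_connection` and `FakeClient` are hypothetical names, and the only client behaviour relied on is a `ping()` method, which both the standalone and cluster Redis clients provide.

```python
class CacheConnectionError(RuntimeError):
    """Hypothetical error type raised when the cache backend is unreachable."""


def validate_connection(client) -> None:
    """Call ping() once at startup and surface any failure immediately."""
    try:
        ok = client.ping()
    except Exception as err:  # a real client would raise its own connection error
        raise CacheConnectionError(f"cache backend unreachable: {err}") from err
    if not ok:
        raise CacheConnectionError("cache backend did not answer ping")


class FakeClient:
    """Test double that only implements ping()."""

    def __init__(self, alive: bool):
        self.alive = alive

    def ping(self) -> bool:
        if not self.alive:
            raise ConnectionError("connection refused")
        return True


validate_connection(FakeClient(alive=True))  # returns silently when the backend answers
```

Failing at initialization means a misconfigured node list shows up when the agent starts rather than on the first cached request.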
## Configuration Example
```ini
[cache]
enabled = yes
cluster_mode = yes
cluster_nodes =
  10.0.0.1:6379
  10.0.0.2:6379
  10.0.0.3:6379
jobs = 30
nodes = 30
```
## Testing
Tested in a production environment:
- Slurm-web 6.0.0
- Redis cluster: 3 nodes
- Slurm controllers: 2 nodes
- OS: Ubuntu 24.04
- Verified backward compatibility with standalone mode
## Implementation Notes
- Uses "host:port" string format for RFL schema compatibility (list content type must be str, not dict)
- skip_full_coverage_check=True allows partial cluster visibility
- decode_responses=False maintains pickle serialization compatibility
- Connection validated with ping() at initialization
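
To illustrate the `decode_responses=False` note: pickled values are binary, so a client configured to decode replies to `str` would fail on them. The sketch below uses only the standard library, and the cached payload is a made-up example.

```python
import pickle

jobs = [{"job_id": 1234, "state": "RUNNING"}]  # hypothetical cached payload
blob = pickle.dumps(jobs)  # bytes as they would be stored in Redis

# Binary pickle protocols start with byte 0x80, which is not valid UTF-8,
# so decoding the stored value as text fails.
try:
    blob.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False

# Keeping the reply as raw bytes lets it be unpickled unchanged.
restored = pickle.loads(blob)
print(decodable, restored == jobs)
```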
Closes: #[issue-number]
I have read the CLA Document and I hereby sign the CLA
Kris Howard
Hello @krhoward-amd, thank you very much for your interest in Slurm-web and your contribution! Unfortunately, it seems the CLA assistant bot failed miserably to parse your message sent by email 😕 Could you please copy and paste the line into a new comment posted through the GitHub web interface to sign the CLA?