Conversation

@Ahmad-cercli
Contributor

Summary

Implements multi-process worker support by storing task execution details in Redis instead of in-memory dictionaries.

Problem

The current implementation stores task payloads (func, args, kwargs) in process-local memory (worker_pool._tasks), which breaks when running multiple worker processes:

  • Process A submits task → stores in Process A's memory
  • Process B's worker picks up task via ZPOPMIN → can't find execution details in Process B's memory
  • Task orphaned with status "queued" but never processed
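The failure mode is easy to reproduce with two plain dicts standing in for each process's worker_pool._tasks (the dict names and the task tuple shape here are illustrative, not Kew's actual internals):

```python
# Two dicts stand in for the process-local worker_pool._tasks of two worker processes.
process_a_tasks = {}
process_b_tasks = {}

# Process A submits a task: the payload lands only in A's memory.
process_a_tasks["task-1"] = (print, ("hello",), {})

# Process B's worker wins the ZPOPMIN on the shared Redis queue, then looks
# for the execution details in its own memory and finds nothing.
payload = process_b_tasks.get("task-1")
assert payload is None  # task-1 stays "queued" forever: orphaned
```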

This makes Kew incompatible with common production patterns like Uvicorn/Gunicorn with --workers N > 1.

Solution

Store task payloads in Redis using cloudpickle serialization, enabling any worker process to execute any task.

Key Changes

1. Task Payload Storage (submit_task)

  • Serialize func/args/kwargs with cloudpickle.dumps()
  • Store in Redis as task_payload:{task_id} with base64 encoding (for decode_responses=True compatibility)
  • Expires after 24 hours (configurable via TASK_EXPIRY_SECONDS)
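A minimal sketch of the storage step. Assumptions for self-containment: stdlib pickle stands in for cloudpickle, a plain dict stands in for the Redis client, and the helper name store_task_payload is hypothetical:

```python
import base64
import pickle  # the PR uses cloudpickle; stdlib pickle keeps this sketch dependency-free

TASK_PAYLOAD_PREFIX = "task_payload:"
TASK_EXPIRY_SECONDS = 24 * 60 * 60  # payloads expire after 24 hours

def store_task_payload(store, task_id, func, args, kwargs):
    """Serialize the callable and its arguments so any worker process can load them."""
    raw = pickle.dumps((func, args, kwargs))
    # base64 keeps the stored value a plain string, which matters for Redis
    # clients created with decode_responses=True
    encoded = base64.b64encode(raw).decode("ascii")
    # with a real client this would be: redis.setex(key, TASK_EXPIRY_SECONDS, encoded)
    store[f"{TASK_PAYLOAD_PREFIX}{task_id}"] = encoded
    return encoded

store = {}
store_task_payload(store, "task-1", max, (3, 7), {})
```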

2. Task Execution (_process_queue)

  • Load payload from Redis instead of worker_pool._tasks
  • Deserialize with cloudpickle.loads()
  • Clean up payload after task completes
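The execution side can be sketched the same way: load, deserialize, run, then delete the payload. As above, pickle replaces cloudpickle and a dict replaces the Redis client; run_task_from_payload is a hypothetical name, not Kew's actual _process_queue:

```python
import asyncio
import base64
import pickle  # standing in for cloudpickle in this sketch

TASK_PAYLOAD_PREFIX = "task_payload:"

async def run_task_from_payload(store, task_id):
    """Load a payload from shared storage, execute it, and clean up afterwards."""
    key = f"{TASK_PAYLOAD_PREFIX}{task_id}"
    encoded = store.get(key)  # real client: await redis.get(key)
    if encoded is None:
        raise LookupError(f"no payload for {task_id}; it may have expired")
    func, args, kwargs = pickle.loads(base64.b64decode(encoded))
    try:
        result = func(*args, **kwargs)
        if asyncio.iscoroutine(result):  # support async task functions too
            result = await result
        return result
    finally:
        # payload is deleted even if the task raised, so keys don't accumulate
        store.pop(key, None)  # real client: await redis.delete(key)
```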

3. Circuit Breaker Configuration

  • Added max_circuit_breaker_failures parameter to QueueConfig
  • Allows effectively disabling the circuit breaker for multi-process setups, where process-local circuit breakers don't coordinate
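Roughly what this looks like in configuration. The dataclass below is a simplified stand-in for Kew's QueueConfig; the field names other than max_circuit_breaker_failures and the default values are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class QueueConfig:  # simplified stand-in for Kew's QueueConfig
    name: str
    max_workers: int = 4  # illustrative field/default, not the real signature
    max_circuit_breaker_failures: int = 5  # new parameter; having a default keeps old configs valid

# Each process has its own breaker, so failure counts are per process and never
# coordinated. Raising the threshold very high effectively disables the breaker
# in multi-process deployments.
multi_process_config = QueueConfig(name="ocr", max_circuit_breaker_failures=1_000_000)
```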

4. Cleanup Improvements

  • Automatic deletion of task_payload:* keys after task completion
  • Added TASK_PAYLOAD_PREFIX constant for consistency

Backward Compatibility

  • ✅ Single-process deployments work unchanged
  • ✅ Existing QueueConfig parameters compatible (new param has default)
  • ✅ No breaking API changes
  • ✅ Existing tests pass

Testing

Tested in production with:

  • 4 Uvicorn workers processing 100+ concurrent tasks
  • No duplicate task executions observed
  • Tasks correctly distributed across all worker processes
  • OCR and AI screening workloads running stably for one week

Dependencies

  • Added cloudpickle>=3.0.0 for robust function/closure serialization

Checklist

  • ✅ Tests pass
  • ✅ Code formatted with Black
  • ✅ Imports sorted with isort
  • ✅ No linting errors
  • ✅ CHANGELOG.md updated

Related Issues

Implements roadmap item: "Distributed workers with coordination (locks) for multi-process scaling"

@justrach
Owner

Hey! The CI is failing due to deprecated GitHub Actions versions in the workflow file. I've just pushed a fix to main that updates them to v4/v5. Could you rebase your branch on main to pick up the fix? Thanks!

AhmadElsaeedd and others added 4 commits January 20, 2026 09:55
- Serialize task functions/args/kwargs to Redis using cloudpickle
- Workers load payloads from Redis instead of in-memory storage
- Enables distributed workers across multiple processes/machines
- Add max_circuit_breaker_failures config to disable circuit breaker
- Implement automatic cleanup of task payloads after completion

Implements roadmap item: 'Distributed workers with coordination for multi-process scaling'
@Ahmad-cercli Ahmad-cercli force-pushed the fix/multi-process-redis-storage branch from b725191 to d00e32d Compare January 20, 2026 05:55
@justrach
Owner

LGTM

@justrach justrach merged commit 008da37 into justrach:main Jan 20, 2026
6 checks passed