A scalable, distributed priority queue system designed to efficiently handle high-concurrency workloads. This project highlights key concepts in distributed systems, fault tolerance, and load balancing, drawing inspiration from real-world applications like Facebook's priority queue.
- Utilises Protocol Buffers (Protobuf) for efficient and language-agnostic serialization of messages.
- gRPC is used for communication between services, ensuring fast and reliable bi-directional streaming.
- Paxos ensures fault tolerance within the leader-follower replication model, maintaining consistency and availability even if nodes or leaders fail.

- Leader-Follower Model: A leader node handles write operations, while follower nodes replicate data for fault tolerance.
- Quorum-Based Approach: Requires a majority of nodes to acknowledge a change before it’s considered committed, ensuring data consistency.
- A gRPC-based weighted round-robin load balancer that dynamically adjusts node weights using real-time health metrics such as CPU usage, queue depth, and task processing rates.

- Long Polling allows consumers to pull jobs from the queue only when they are available, optimizing resource usage and reducing idle time.
- Pull Model ensures that consumers only retrieve jobs when needed, improving overall efficiency.
The system uses PostgreSQL for persistent job storage.
The schema for the job table:
CREATE TABLE jobs (
job_id BIGSERIAL PRIMARY KEY, -- Auto-incrementing job_id
priority INT CHECK (priority BETWEEN 1 AND 5), -- Validate priority (1-5)
payload BYTEA, -- Payload as a byte array
created_at TIMESTAMPTZ DEFAULT now(), -- Timestamp for creation
);
- Error Logs: Tracks and reports errors with relevant details.
- Request Logs: Logs details about requests received by each node. Example log entries: Error Log:
2024-12-27 12:34:56.789 [ERROR] <error_route>: <error_message>
Request Log:
2024-12-27 12:45:00.123 [INFO] index: Received <request_method> request for <request_uri> from IP: <client_IP>
The system consists of four main components, each with distinct responsibilities:
- Role: Provides an API for job submission, buffering incoming tasks, and distributing them to nodes.
- Functionality:
- Buffers tasks temporarily before distributing them based on priority and node availability.
- The Round-Robin Load Balancer distributes tasks according to weighted priority, ensuring high-priority tasks are prioritized but maintaining fairness across all nodes.
- Role: The leader node processes job enqueue requests and manages data replication across followers.
- Functionality:
- Receives job submission requests from the enqueue manager.
- Inserts new jobs into the database after the Paxos consensus protocol ensures data consistency.
- Maintains fault tolerance by managing leader-follower replication.
- Role: The follower nodes contain local priority queues and implement long-polling mechanisms for pulling jobs.
- Functionality:
- Each follower node maintains a local queue and processes jobs based on their priority.
- Long-polling ensures that a consumer only retrieves jobs when they are available, optimizing system resources and reducing idle time.
- Role: The consumer pulls jobs from the distributed queue, processes them, and acknowledges their completion.
- Functionality:
- Uses long-polling to efficiently wait for and fetch jobs, ensuring resources are not wasted while waiting for tasks.
Facebook Engineering (2021) FOQS scaling a distributed priority queue. Available at: https://engineering.fb.com/2021/02/22/production-engineering/foqs-scaling-a-distributed-priority-queue/ (Accessed: 15 December 2024).
Jordan has no life (2020) 31: Distributed Priority Queue | Systems Design Interview Questions With Ex-Google SWE [YouTube]. Available at: https://www.youtube.com/watch?v=PFsjVT-XwmA (Accessed: 15 December 2024).
Stack Overflow (2011) How to prevent low-priority messages on an ActiveMQ prioritized queue from being consumed indefinitely?. Available at: https://stackoverflow.com/questions/6393135/how-to-prevent-low-priority-messages-on-an-activemq-prioritized-queue-from-being (Accessed: 20 December 2024).
Guru99 (2021) Round-robin scheduling example. Available at: https://www.guru99.com/round-robin-scheduling-example.html (Accessed: 20 December 2024).
Towards Data Science (2020) Priority queues. Available at: https://towardsdatascience.com/course-2-data-structure-part-2-priority-queues-and-disjoint-set-ed11a0383011 (Accessed: 20 December 2024).
Educative (n.d.) Leader and follower replication. Available at: https://www.educative.io/answers/leader-and-follower-replication (Accessed: 21 December 2024).
Educative (n.d.) What is quorum in distributed systems. Available at: https://www.educative.io/answers/what-is-quorum-in-distributed-systems (Accessed: 21 December 2024).
