A high-performance, resilient, and distributed task scheduling system built with Spring Boot, RabbitMQ, and PostgreSQL.
This project is an Enterprise-Grade Distributed System designed to handle background job processing at scale. Unlike simple in-memory schedulers, this system completely decouples the Producer (API) from the Consumer (Worker) using a reliable message broker (RabbitMQ). This allows the system to:
- Handle spikes in traffic without crashing (Backpressure).
- Process jobs in parallel across multiple worker nodes (Scalability).
- Guarantee job execution even if the server restarts (Persistence).
- Recover gracefully from failures (Resilience).
The system follows a microservices-ready architecture:
- Clients: Submit jobs via REST API or the Real-Time WebSocket Dashboard.
- API Layer (Producer): Validates requests, saves metadata to PostgreSQL (`QUEUED`), and pushes tasks to RabbitMQ.
- Message Broker (RabbitMQ): Routes messages to the correct queue based on Priority (`High`, `Medium`, `Low`).
- Worker Layer (Consumer): Listens to queues, executes the job logic, updates status in PostgreSQL (`RUNNING` -> `COMPLETED`), and broadcasts real-time events.
- Database (PostgreSQL): Serves as the persistent source of truth for job history and state.
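The status flow described above (`QUEUED` on submission, `RUNNING` once a worker picks the job up, then `COMPLETED`) can be sketched as a small state machine. This is a minimal illustration, not the project's `Job` entity: the `JobStatus` enum, its `next()` and `canTransitionTo()` methods, and the treatment of `COMPLETED` and `FAILED` as terminal states are assumptions made for the example.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the job lifecycle; the project's real Job entity may differ.
enum JobStatus {
    QUEUED, RUNNING, COMPLETED, FAILED;

    // Allowed next states (assumption: COMPLETED and FAILED are terminal).
    Set<JobStatus> next() {
        switch (this) {
            case QUEUED:
                return EnumSet.of(RUNNING);
            case RUNNING:
                return EnumSet.of(COMPLETED, FAILED);
            default:
                return EnumSet.noneOf(JobStatus.class);
        }
    }

    boolean canTransitionTo(JobStatus target) {
        return next().contains(target);
    }
}

class JobLifecycleDemo {
    public static void main(String[] args) {
        System.out.println(JobStatus.QUEUED.canTransitionTo(JobStatus.RUNNING));   // true
        System.out.println(JobStatus.QUEUED.canTransitionTo(JobStatus.COMPLETED)); // false
    }
}
```

Guarding status updates with a check like this is one way to keep a late or duplicate worker event from moving a finished job back to an earlier state.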
- Idempotency Guard: Ensures "Exactly-Once" processing. If a client sends the same request twice (same `idempotencyKey`), the system detects the duplicate and returns the existing job without re-running it.
- Persistent Storage: All data is reliably stored in PostgreSQL. Nothing is lost on restart.
- Dead Letter Queues (DLQ): Jobs that fail repeatedly are not deleted; they are moved to a special "morgue" queue for manual inspection.
- Priority Queues: Critical tasks skip the line. Supports `HIGH`, `MEDIUM`, and `LOW` priority lanes.
- Concurrency Tuning: Configured to process 10+ jobs in parallel per worker node. Tuned for 1000+ concurrent jobs.
- WebSockets (Real-Time): The dashboard updates via `STOMP` push events, so status changes show up in near real time without polling.
- Delayed Execution: Schedule jobs to run in the future (e.g., "Send email in 30 seconds").
- Exponential Backoff: If a job fails, the system waits (2s, 4s, 8s) before retrying, preventing cascading failures.
- Zombie Job Killer: A background monitor (`TimeoutMonitor`) automatically detects and fails jobs that hang for more than 5 minutes.
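The retry schedule and dead-letter behaviour listed above can be sketched in plain Java. This is an illustration under stated assumptions, not the project's retry code: the `RetryPolicy` class, the `delayBeforeRetry` method, and the cap of three retries before dead-lettering are hypothetical; only the 2s, 4s, 8s progression comes from the description.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical retry policy; constants mirror the 2s, 4s, 8s schedule described above.
class RetryPolicy {
    static final Duration BASE_DELAY = Duration.ofSeconds(2);
    static final int MAX_RETRIES = 3; // assumption: three retries, then dead-letter

    // Delay before retry number `attempt` (1-based): 2s, 4s, 8s.
    // An empty result means retries are exhausted and the job should go to the DLQ.
    static Optional<Duration> delayBeforeRetry(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        if (attempt > MAX_RETRIES) {
            return Optional.empty();
        }
        return Optional.of(BASE_DELAY.multipliedBy(1L << (attempt - 1)));
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= MAX_RETRIES + 1; attempt++) {
            String outcome = delayBeforeRetry(attempt)
                    .map(d -> "wait " + d.getSeconds() + "s")
                    .orElse("route to dead letter queue");
            System.out.println("retry " + attempt + ": " + outcome);
        }
    }
}
```

Doubling the delay on each attempt gives a struggling downstream dependency progressively more time to recover, which is why this pattern helps avoid cascading failures.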
| Component | Technology | Description |
|---|---|---|
| Language | Java 17 | Core programming language. |
| Framework | Spring Boot 3 | Dependency Injection, Web MVC, AMQP. |
| Messaging | RabbitMQ | Advanced Message Queuing Protocol (AMQP) broker. |
| Database | PostgreSQL | Relational database for persistent storage. |
| Real-Time | WebSocket / STOMP | Full-duplex communication channel for UI updates. |
| Container | Docker | Containerization of Database and Broker. |
| Build Tool | Maven | Dependency management. |
- Java 17+ installed (`java -version`).
- Docker Desktop installed and running (`docker -v`).
Use Docker Compose to spin up Postgres and RabbitMQ immediately.

`docker-compose up -d`

Wait 30s for the containers to initialize.
We have a custom script that handles environment setup.

`.\run.ps1`

The app will start on port 8080.
How to verify that everything works correctly.
Go to http://localhost:8080.
- You will see the Real-Time Feed and Live Stats.
- The "Connection" status should be healthy.
- In the Email Notification box, enter `test@example.com`.
- Click Send.
- Watch the Table: You will see a row appear instantly with status `QUEUED`.
- Watch the Update: Without refreshing, it will flip to `RUNNING` (Yellow) -> `COMPLETED` (Green).
- Verification: If the row updates without you pressing F5, WebSockets are working.
- Submit an Image Upload (Simulated). This is a `DEFAULT` (Low) priority task.
- Immediately submit a Page Indexer task. This is `HIGH` priority.
- Observation: The High priority task should often finish before the earlier Low priority task if the system is under load.
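Why a later High priority job can overtake an earlier Low priority one is easier to see in a simplified in-memory model. The sketch below is not the project's RabbitMQ setup: the `PriorityLanes` class, its `submit` and `poll` methods, and the strict "drain higher lanes first" rule are assumptions chosen for illustration. A real broker with separate consumers per queue behaves less strictly, which fits the "often" in the observation.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;

// In-memory model of priority lanes; not the project's RabbitMQ configuration.
class PriorityLanes {
    enum Priority { HIGH, MEDIUM, LOW } // declaration order is the polling order

    private final Map<Priority, Queue<String>> lanes = new EnumMap<>(Priority.class);

    PriorityLanes() {
        for (Priority p : Priority.values()) {
            lanes.put(p, new ArrayDeque<>());
        }
    }

    void submit(Priority priority, String job) {
        lanes.get(priority).add(job);
    }

    // Returns the next job, always draining higher-priority lanes first.
    Optional<String> poll() {
        for (Priority p : Priority.values()) {
            String job = lanes.get(p).poll();
            if (job != null) {
                return Optional.of(job);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        PriorityLanes lanes = new PriorityLanes();
        lanes.submit(Priority.LOW, "image-upload");   // submitted first
        lanes.submit(Priority.HIGH, "page-indexer");  // submitted second
        System.out.println(lanes.poll().orElseThrow()); // page-indexer
        System.out.println(lanes.poll().orElseThrow()); // image-upload
    }
}
```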
- Enter an email `unique@test.com`.
- In the Idempotency Key field, enter `key-123`.
- Click Send. Job ID #X is created.
- Click Send AGAIN (same key `key-123`).
- Observation: You will NOT see a new row. The backend detected the duplicate key and returned the original Job ID #X. This shows that repeated submissions do not create duplicate jobs.
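The duplicate detection exercised in this test can be sketched with an atomic "insert if absent" lookup keyed by the idempotency key. This is a minimal in-memory illustration, not the project's `JobService`: the `IdempotencyGuard` class and its `submit` method are hypothetical, and the real system stores jobs in PostgreSQL rather than in a map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory idempotency guard; the real system persists jobs in PostgreSQL.
class IdempotencyGuard {
    private final ConcurrentMap<String, Long> jobIdByKey = new ConcurrentHashMap<>();
    private final AtomicLong nextJobId = new AtomicLong(1);

    // Returns the existing job ID for a known key, or atomically creates a new one.
    long submit(String idempotencyKey) {
        return jobIdByKey.computeIfAbsent(idempotencyKey, key -> nextJobId.getAndIncrement());
    }

    public static void main(String[] args) {
        IdempotencyGuard guard = new IdempotencyGuard();
        long first = guard.submit("key-123");
        long second = guard.submit("key-123"); // duplicate request
        long other = guard.submit("key-456");
        System.out.println(first + " " + second + " " + other); // 1 1 2
    }
}
```

In a database-backed version, a unique constraint on the key column would typically play the role that `computeIfAbsent` plays here, so two concurrent requests with the same key cannot both create a job.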
- Enter an email `future@test.com`.
- In the Delay (s) field, enter `15`.
- Click Send.
- Observation: The job is accepted, but it will NOT appear in the `QUEUED` state or the worker logs for about 15 seconds. Then it will appear and be processed.
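The timing in this test comes down to comparing the current time against a scheduled run time. The sketch below shows only that comparison and makes no claim about how the project implements delays internally: the `DelayedJob` class, its fields, and the `isDue` method are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical delayed job; field and method names are illustrative only.
class DelayedJob {
    final String payload;
    final Instant runAt;

    DelayedJob(String payload, Instant submittedAt, Duration delay) {
        this.payload = payload;
        this.runAt = submittedAt.plus(delay);
    }

    // A job is due once the current time has reached its scheduled time.
    boolean isDue(Instant now) {
        return !now.isBefore(runAt);
    }

    public static void main(String[] args) {
        Instant submitted = Instant.parse("2024-01-01T00:00:00Z");
        DelayedJob job = new DelayedJob("future@test.com", submitted, Duration.ofSeconds(15));
        System.out.println(job.isDue(submitted.plusSeconds(14))); // false
        System.out.println(job.isDue(submitted.plusSeconds(15))); // true
    }
}
```

Passing `now` in as a parameter, rather than reading the system clock inside `isDue`, keeps the check easy to test without waiting.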
Advanced Test:
- The system is configured to kill jobs running > 5 minutes.
- To test this quickly, you would typically lower the timeout in `TimeoutMonitor.java` to 10 seconds, but assuming default config:
- If a job is stuck in `RUNNING` forever (simulated infinite loop), after 5 minutes the `TimeoutMonitor` will force-update it to `FAILED` with status `TIMEOUT`.
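A periodic sweep of this kind can be sketched as follows. This is not the source of `TimeoutMonitor`: the `TimeoutSweep` class, the nested `Job` type, and the separate `failureReason` field used to record `TIMEOUT` are assumptions made for the example; only the 5-minute threshold and the `RUNNING` to `FAILED` change come from the description above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a stuck-job sweep; not the project's TimeoutMonitor source.
class TimeoutSweep {
    static final Duration MAX_RUNTIME = Duration.ofMinutes(5);

    static class Job {
        final long id;
        String status;        // e.g. "RUNNING" or "FAILED"
        String failureReason; // assumption: a separate field records why a job failed
        final Instant startedAt;

        Job(long id, String status, Instant startedAt) {
            this.id = id;
            this.status = status;
            this.startedAt = startedAt;
        }
    }

    // Marks every RUNNING job older than MAX_RUNTIME as FAILED and returns the affected IDs.
    static List<Long> sweep(List<Job> jobs, Instant now) {
        List<Long> timedOut = new ArrayList<>();
        for (Job job : jobs) {
            boolean stuck = "RUNNING".equals(job.status)
                    && Duration.between(job.startedAt, now).compareTo(MAX_RUNTIME) > 0;
            if (stuck) {
                job.status = "FAILED";
                job.failureReason = "TIMEOUT";
                timedOut.add(job.id);
            }
        }
        return timedOut;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-01T00:10:00Z");
        List<Job> jobs = List.of(
                new Job(1, "RUNNING", now.minus(Duration.ofMinutes(6))),
                new Job(2, "RUNNING", now.minus(Duration.ofMinutes(1))));
        System.out.println(sweep(jobs, now)); // [1]
    }
}
```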
- Open 3 browser tabs.
- Spam the "Send" button in all tabs.
- The system will ingest all of them, RabbitMQ will buffer them, and the consumers will process them 5 at a time (as per config).
- Check Metrics API: Go to `http://localhost:8080/api/metrics` to see the raw counters flying up.
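The buffering behaviour in this load test (many submissions accepted, only a fixed number executing at once) can be modelled with a fixed-size thread pool. This is an in-process analogy rather than the project's RabbitMQ listener configuration: the `BoundedWorker` class and `runBurst` method are hypothetical, and the limit of 5 is only an example value.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a worker with a fixed number of consumer threads.
class BoundedWorker {
    // Runs `jobCount` simulated jobs on `concurrency` threads and returns the
    // highest number of jobs observed running at the same time.
    static int runBurst(int jobCount, int concurrency) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(jobCount);
        for (int i = 0; i < jobCount; i++) {
            // Excess jobs wait in the pool's internal queue, like messages in a broker.
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(10); // simulated work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    done.countDown();
                }
            });
        }
        try {
            done.await(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return peak.get();
    }

    public static void main(String[] args) {
        int peak = runBurst(50, 5);
        System.out.println("peak concurrency <= 5: " + (peak <= 5)); // true
    }
}
```

Because the pool never has more than `concurrency` threads, a burst of submissions raises queue depth rather than parallelism, which is the same trade-off the broker makes for the workers.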
src/
├── main/
│ ├── java/com/example/taskscheduler/
│ │ ├── config/ # RabbitMQ & WebSocket Config
│ │ ├── controller/ # REST APIs (Job, Metrics)
│ │ ├── dto/ # Data Transfer Objects
│ │ ├── model/ # JPA Entities (Job.java)
│ │ ├── processor/ # Job Logic (Email, Image, etc.)
│ │ ├── repository/ # DB Access (JobRepository)
│ │ ├── service/ # Business Logic (JobService)
│ │ └── worker/ # RabbitMQ Consumer (JobWorker)
│ └── resources/
│ ├── static/ # Frontend (index.html, JS, CSS)
│ └── application.properties
├── docker-compose.yml # Infra Definition
└── pom.xml # Dependencies