A server-intensive, computation-heavy backend system for autonomous wall-finishing robots. This project focuses on path planning, async task execution, persistence, observability, and stress resilience.
Video Walkthrough
This repository is accompanied by a structured video walkthrough that explains the problem statement, system design, implementation details, and performance characteristics end-to-end.
Meet Toby: The Wall-Painting Robot (Problem Statement & Motivation)
- Introduces the real-world inspiration behind the project and clearly defines the problem being solved.
- Video: https://drive.google.com/file/d/1wJMI0MsLGbiqv69CxK_AT0k06hciiRcb/view?usp=drive_link
System Architecture: API -> RabbitMQ -> Worker -> Database -> Metrics
- Explains the asynchronous architecture, message flow, and responsibilities of each component.
- Video: https://drive.google.com/file/d/1rE2ljH71nbIuctpqmEB1yLKEPChVOfWp/view?usp=drive_link
Code Walkthrough: Path Planning, Workers, and Persistence Layer
- Detailed walkthrough of the core codebase, including domain logic, workers, and data persistence.
- Video: https://drive.google.com/file/d/1RmAdAJqYt9HVRqGP0hrKGYpaRMF8Ghkr/view?usp=drive_link
End-to-End Execution: From API Request to Completed Trajectories
- Demonstrates a real API request flowing through RabbitMQ, workers, and into the database.
- Video: https://drive.google.com/file/d/1E78aM8C6nCbQwo3UaUUBobHi4GzliEKh/view?usp=drive_link
Observability in Action: Prometheus with Concurrent Requests
- Shows live metrics collection under concurrent load using Prometheus.
- Video: https://drive.google.com/file/d/1LubXzp6cnhF0Pq7z0J50b5wbmPJMiHxE/view?usp=drive_link
High Concurrency Demo: Handling 100 Parallel Requests
- Demonstrates system behavior and stability under high request concurrency.
- Video: https://drive.google.com/file/d/19Pn5g9Dzhq8l27ftD2BZmo7FiiztQBkt/view?usp=drive_link
System Visualization: Real-Time Monitoring with Grafana
- Explains dashboards for latency, throughput, and system health using Grafana.
- Video: https://drive.google.com/file/d/1ecBWPrB9zE7M4pZMw4O5l8CkPN1GVnQw/view?usp=drive_link
Load & Stress Testing: Large Grids and High-Volume Requests
- Pushes the system with large grids and heavy traffic to identify bottlenecks and limits.
- Video: https://drive.google.com/file/d/1P7rbEagE5ZQsXxx6jJB3pBTzRxJu5XRQ/view?usp=drive_link
Given a wall grid (a 2D matrix where 1 = obstacle, 0 = free cell), the system:
- Accepts the grid via an API
- Enqueues the task to RabbitMQ
- A worker:
  - Computes optimized coverage paths (DFS, BFS)
  - Stores wall configuration and trajectories
- Exposes metrics for performance analysis
- Supports stress testing & scalability analysis
This mimics a real autonomous robot backend where computation, IO, and persistence are decoupled.
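The DFS coverage computation can be sketched as follows. This is a minimal illustration under assumptions, not the repository's actual planner: the function name is invented, and it returns a visit order (the real planner may produce contiguous trajectories with explicit backtracking).

```python
from typing import List, Tuple

def coverage_path_dfs(grid: List[List[int]]) -> List[Tuple[int, int]]:
    """Illustrative coverage planner: visit every reachable free cell (0)
    with an iterative DFS; 1-cells are obstacles and are skipped.
    Returns the visit order, which may jump on backtracking."""
    rows, cols = len(grid), len(grid[0])
    # Find the first free cell to start from; an all-obstacle grid has no path.
    start = next(((r, c) for r in range(rows) for c in range(cols)
                  if grid[r][c] == 0), None)
    if start is None:
        return []
    visited, path, stack = set(), [], [start]
    while stack:
        r, c = stack.pop()
        if (r, c) in visited:
            continue
        visited.add((r, c))
        path.append((r, c))
        # Push the four orthogonal neighbours that are in-bounds and free.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in visited:
                stack.append((nr, nc))
    return path
```

On the example grid from the API payload below, this visits all seven free cells exactly once.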
```text
Client/API Call
      |
      v
    Django
      |
      v
RabbitMQ (task queue)
      |
      v
Worker Container (path planning, DB writes)
      |
      v
PostgreSQL / Redis
      |
      v
Prometheus / Grafana (metrics & dashboards)
```
| Layer | Tech |
|---|---|
| Language | Python 3.14.0 |
| Web | Django + Django REST Framework |
| Messaging | RabbitMQ |
| Database | PostgreSQL |
| Metrics, Monitoring | Prometheus, Grafana |
| Testing | pytest |
| Load Testing | custom async/threaded scripts |
| Deployment | Docker / docker-compose |
Represents a unique wall layout.
Fields: `rows`, `cols`, `obstacles` (stored as coordinates)

Represents a path planning result for a given method.
Fields: `wall`, `method`, `path`, `steps`

Tracks request lifecycle.
Fields: `request_id`, `status`, `wall`
```json
{
  "grid": [
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0]
  ]
}
```

- No rows/cols/obstacles needed
- The backend derives everything

Invalid input returns HTTP 400.
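A 400 implies validation of the payload along these lines on the server side. This is an illustrative check, not the actual view code; the function name and error messages are assumptions.

```python
def validate_grid(payload: dict):
    """Return (ok, error_message) for a wall-grid payload.
    Illustrative of the checks that would yield HTTP 400."""
    grid = payload.get("grid")
    if not isinstance(grid, list) or not grid:
        return False, "grid must be a non-empty 2D list"
    width = None
    for row in grid:
        if not isinstance(row, list) or not row:
            return False, "each row must be a non-empty list"
        if width is None:
            width = len(row)
        elif len(row) != width:
            return False, "rows must all have the same length"
        if any(cell not in (0, 1) for cell in row):
            return False, "cells must be 0 (free) or 1 (obstacle)"
    return True, ""
```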
```bash
git clone git@github.com:ajinzrathod/robotwall.git
cd robot-wall-coverage-system
python -m venv env
source env/bin/activate
pip install -r requirements.txt
```

Create a `.env` file like this:
```
# persistence database
POSTGRES_DB=wall_coverage
POSTGRES_USER=wall_user
POSTGRES_PASSWORD=secret_password
POSTGRES_HOST=127.0.0.1
POSTGRES_PORT=5432

# Django
DJANGO_SECRET_KEY=super-secret-key
DJANGO_DEBUG=True

# Redis
REDIS_URL=redis://127.0.0.1:6379/1

# RabbitMQ
RABBITMQ_HOST=127.0.0.1
RABBITMQ_PORT=5672
RABBITMQ_USER=guest
RABBITMQ_PASSWORD=guest
RABBITMQ_TASK_QUEUE=wall.compute
RABBITMQ_RESULT_QUEUE=wall.results
```

Start the infrastructure:

```bash
docker-compose up -d
```

Verify:
- Postgres running
- Redis running
- RabbitMQ UI: http://localhost:15672
Run migrations:

```bash
python manage.py migrate
```

Start the API:

```bash
python manage.py runserver
```

Start a worker from a Django shell (`python manage.py shell`):

```python
from path_planner.workers.path_planning_worker import start_worker
start_worker()
```

Ctrl+C to stop.
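Conceptually, the worker's consume loop hands each queued message body to a handler. The sketch below is illustrative, not the repository's `start_worker`: the real worker also runs path planning and persists results, and the `pika` wiring is shown only as comments since it needs a live broker.

```python
import json

def handle_task(body: bytes) -> dict:
    """Parse one queued task and produce a result record.
    Illustrative: the real worker plans paths and writes to PostgreSQL."""
    task = json.loads(body)
    grid = task["grid"]
    # Derive the sparse representation the database stores.
    obstacles = [(r, c) for r, row in enumerate(grid)
                 for c, cell in enumerate(row) if cell == 1]
    return {"rows": len(grid), "cols": len(grid[0]), "obstacles": obstacles}

# Wiring the handler to RabbitMQ might look like this (not run here):
# import pika
# conn = pika.BlockingConnection(pika.ConnectionParameters("127.0.0.1"))
# ch = conn.channel()
# ch.queue_declare(queue="wall.compute", durable=True)
# ch.basic_consume(queue="wall.compute",
#                  on_message_callback=lambda ch, m, p, body: handle_task(body),
#                  auto_ack=True)
# ch.start_consuming()
```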
Check Prometheus targets at:
http://localhost:9090/targets?search=

Metrics include:
- `task_processing_seconds`
- `db_write_seconds`
- `task_failures_total`

Scrape them with Prometheus.

Worker metrics:
http://localhost:9001/metrics

Django metrics:
http://localhost:8000/path_planner/metrics
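Exposing counters like the ones listed above from a worker might look like the following `prometheus_client` sketch. The metric names mirror the README's list, but the wrapper function and actual registration code in the repo are assumptions.

```python
import time
from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric definitions matching the names listed above.
TASK_SECONDS = Histogram("task_processing_seconds",
                         "Time spent processing one task")
FAILURES = Counter("task_failures_total",
                   "Tasks that raised an exception")

def process_with_metrics(fn, *args):
    """Run a task callable, timing it and counting failures."""
    with TASK_SECONDS.time():
        try:
            return fn(*args)
        except Exception:
            FAILURES.inc()
            raise

# In a worker process you would expose the metrics endpoint once:
# start_http_server(9001)  # then Prometheus scrapes :9001/metrics
```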

Grafana
http://localhost:3000
Import the provided dashboard, `Robotwall Dashboard.json`.
pytest -v
Expected traffic under normal conditions:
- 100–200 concurrent requests
- Medium grids (≤ 25x25)
Push until things break:
- 200+ concurrency
- Large grids (100–500 rows/cols)
- Observe:
  - timeouts
  - DB contention

Timeouts are not crashes; they define the system's limits.
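A threaded load script in the spirit of the "custom async/threaded scripts" mentioned above could look like this. `send_request` is a placeholder for whatever callable performs one HTTP POST; the helper name and return shape are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fire(send_request, n: int, concurrency: int):
    """Fire n requests with bounded concurrency and collect latencies.
    send_request stays pluggable so the script can be dry-run offline."""
    def one(_):
        t0 = time.perf_counter()
        send_request()
        return time.perf_counter() - t0

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(one, range(n)))
    return min(latencies), max(latencies), sum(latencies) / n

# Example dry run against a stub instead of the real API:
# lo, hi, avg = fire(lambda: time.sleep(0.001), n=100, concurrency=20)
```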
Question: Why We Store Only Obstacles (Not the Entire Grid)
- Storing the full grid is wasteful, slow, and scales badly.
What actually matters long-term is where obstacles exist, not every empty cell.
Storing the Full Grid Is a Bad Idea
Example:
- For large grids (e.g. 500×500):
- 250,000 cells per request
- Mostly zeros (empty space)
- Larger DB rows -> slower writes and slower reads (impacts performance)
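The obstacle-only representation is easy to sketch, and it loses nothing: the full grid is reconstructible on demand. Both helper names below are illustrative.

```python
def grid_to_obstacles(grid):
    """Keep only dimensions plus obstacle coordinates: for a sparse
    500x500 grid this is a handful of tuples instead of 250,000 cells."""
    return {
        "rows": len(grid),
        "cols": len(grid[0]) if grid else 0,
        "obstacles": [(r, c) for r, row in enumerate(grid)
                      for c, cell in enumerate(row) if cell == 1],
    }

def obstacles_to_grid(spec):
    """Rebuild the dense grid from the sparse record when needed."""
    grid = [[0] * spec["cols"] for _ in range(spec["rows"])]
    for r, c in spec["obstacles"]:
        grid[r][c] = 1
    return grid
```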
RabbitMQ is asynchronous, but that alone does not make the system async end-to-end.
By default, Django is synchronous:
- One request = one worker thread
- Heavy CPU work blocks that worker
- Large grids slow down everything behind them
- 1 large grid -> the request hangs (every request queued behind it waits)
RabbitMQ only decouples when work is done, not how it is processed.
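One common way to keep Django responsive is to have the view validate, enqueue, and return immediately with 202, leaving the CPU work to the workers. The sketch below is illustrative (the `build_task` helper and publish wiring are assumptions, not the repo's code):

```python
import json
import uuid

def build_task(grid) -> str:
    """Package a request into the message a worker consumes, so the
    HTTP handler can return 202 instead of planning the path inline."""
    return json.dumps({"request_id": str(uuid.uuid4()), "grid": grid})

# Publishing from the Django view might look like this (illustrative):
# import pika
# conn = pika.BlockingConnection(pika.ConnectionParameters("127.0.0.1"))
# conn.channel().basic_publish(exchange="", routing_key="wall.compute",
#                              body=build_task(grid))
# return JsonResponse({"status": "queued"}, status=202)
```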
Worker 1:

```bash
WORKER_METRICS_PORT=9001 \
python manage.py shell -c "from path_planner.workers.path_planning_worker import start_worker; start_worker()"
```

Worker 2:

```bash
WORKER_METRICS_PORT=9002 \
python manage.py shell -c "from path_planner.workers.path_planning_worker import start_worker; start_worker()"
```
RabbitMQ automatically distributes tasks across workers.
RabbitMQ handles burst traffic, workers handle computation, Django stays responsive.