Robot Wall Coverage System

A server-intensive, computation-heavy backend system for autonomous wall-finishing robots. This project focuses on path planning, async task execution, persistence, observability, and stress resilience.

Video Walkthrough

This repository is accompanied by a structured video walkthrough that explains the problem statement, system design, implementation details, and performance characteristics end-to-end.

Meet Toby: The Wall-Painting Robot (Problem Statement & Motivation)

Introduces the real-world inspiration behind the project and clearly defines the problem being solved.
Video: https://drive.google.com/file/d/1wJMI0MsLGbiqv69CxK_AT0k06hciiRcb/view?usp=drive_link

System Architecture: API -> RabbitMQ -> Worker -> Database -> Metrics

Explains the asynchronous architecture, message flow, and responsibilities of each component.
Video: https://drive.google.com/file/d/1rE2ljH71nbIuctpqmEB1yLKEPChVOfWp/view?usp=drive_link

Code Walkthrough: Path Planning, Workers, and Persistence Layer

Detailed walkthrough of the core codebase, including domain logic, workers, and data persistence.
Video: https://drive.google.com/file/d/1RmAdAJqYt9HVRqGP0hrKGYpaRMF8Ghkr/view?usp=drive_link

End-to-End Execution: From API Request to Completed Trajectories

Demonstrates a real API request flowing through RabbitMQ, workers, and into the database.
Video: https://drive.google.com/file/d/1E78aM8C6nCbQwo3UaUUBobHi4GzliEKh/view?usp=drive_link

Observability in Action: Prometheus with Concurrent Requests

Shows live metrics collection under concurrent load using Prometheus.
Video: https://drive.google.com/file/d/1LubXzp6cnhF0Pq7z0J50b5wbmPJMiHxE/view?usp=drive_link

High Concurrency Demo: Handling 100 Parallel Requests

Demonstrates system behavior and stability under high request concurrency.
Video: https://drive.google.com/file/d/19Pn5g9Dzhq8l27ftD2BZmo7FiiztQBkt/view?usp=drive_link

System Visualization: Real-Time Monitoring with Grafana

Explains dashboards for latency, throughput, and system health using Grafana.
Video: https://drive.google.com/file/d/1ecBWPrB9zE7M4pZMw4O5l8CkPN1GVnQw/view?usp=drive_link

Load & Stress Testing: Large Grids and High-Volume Requests

Pushes the system with large grids and heavy traffic to identify bottlenecks and limits.
Video: https://drive.google.com/file/d/1P7rbEagE5ZQsXxx6jJB3pBTzRxJu5XRQ/view?usp=drive_link

1. What This System Does (High Level)

Given a wall grid (a 2D matrix where 1 = obstacle and 0 = free cell), the system:

Accepts the grid via an API
Enqueues the task to RabbitMQ
Runs a worker that:
- Computes optimized coverage paths (DFS, BFS)
- Stores the wall configuration and trajectories
Exposes metrics for performance analysis
Supports stress testing and scalability analysis

This mimics a real autonomous robot backend where computation, IO, and persistence are decoupled.
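
As a rough illustration of the DFS and BFS traversals mentioned above, the sketch below visits every free cell reachable from a start cell. It is not the repository's planner: the function names are hypothetical, the start cell is assumed to be free, and the result is a visit order rather than a continuous robot trajectory.

```python
from collections import deque

def free_neighbors(grid, cell):
    """Yield the 4-connected neighbors of `cell` that are inside the grid and free (0)."""
    rows, cols = len(grid), len(grid[0])
    r, c = cell
    for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
            yield (nr, nc)

def bfs_order(grid, start):
    """Visit reachable free cells level by level (breadth-first)."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        order.append(cell)
        for nxt in free_neighbors(grid, cell):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

def dfs_order(grid, start):
    """Visit reachable free cells depth-first using an explicit stack."""
    seen = set()
    order = []
    stack = [start]
    while stack:
        cell = stack.pop()
        if cell in seen:
            continue
        seen.add(cell)
        order.append(cell)
        # Push in reverse so the first neighbor direction is explored first.
        for nxt in reversed(list(free_neighbors(grid, cell))):
            if nxt not in seen:
                stack.append(nxt)
    return order

grid = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]
print(bfs_order(grid, (0, 0)))
print(dfs_order(grid, (0, 0)))
```

Both functions cover the same set of cells; they differ only in the order of visits, which is what makes the two methods worth comparing per wall.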

2. Architecture Overview

Client/API Call
       |
       v
   Django
       |
       v
  RabbitMQ (task queue)
       |
       v
Worker Container (path planning, DB writes)
       |
       v
PostgreSQL / Redis
       |
       v
Prometheus / Grafana (metrics & dashboards)
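
The flow in the diagram can be sketched in a few lines. This is only a stand-in, not the project's code: an in-process queue.Queue replaces RabbitMQ, a dict replaces the database, and the status values "PENDING" and "DONE" are hypothetical names chosen for illustration.

```python
import queue
import threading
import uuid

task_queue = queue.Queue()  # stand-in for the RabbitMQ task queue
results = {}                # stand-in for the database

def submit(grid):
    """API side: record the task, enqueue it, and return immediately."""
    request_id = str(uuid.uuid4())
    results[request_id] = {"status": "PENDING"}
    task_queue.put({"request_id": request_id, "grid": grid})
    return request_id

def worker_loop():
    """Worker side: consume tasks until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        if task is None:
            task_queue.task_done()
            break
        # Placeholder for path planning: just count the free cells.
        free = sum(row.count(0) for row in task["grid"])
        results[task["request_id"]] = {"status": "DONE", "free_cells": free}
        task_queue.task_done()

worker = threading.Thread(target=worker_loop, daemon=True)
worker.start()

rid = submit([[0, 1, 0], [0, 0, 0], [1, 0, 0]])
task_queue.join()    # wait until the worker has processed the task
print(results[rid])  # {'status': 'DONE', 'free_cells': 7}

task_queue.put(None)  # stop the worker
worker.join()
```

The point of the design is that submit() returns before any computation happens, so the API layer never waits on path planning.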

3. Tech Stack

Layer	Tech
Language	Python 3.14.0
Web	Django + Django REST Framework
Messaging	RabbitMQ
Database	PostgreSQL
Metrics, Monitoring	Prometheus, Grafana
Testing	pytest
Load Testing	custom async/threaded scripts
Deployment	Docker / docker-compose

4. Core Concepts & Models

WallConfig

Represents a unique wall layout.

rows
cols
obstacles (stored as coordinates)

Trajectory

Represents a path planning result for a given method.

wall
method
path
steps

CoverageTask

Tracks request lifecycle.

request_id
status
wall
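
To make the relationships between the three models concrete, here is a minimal sketch using plain dataclasses. These are not the repository's Django models: the field types, the default status value, and the reading of steps as the path length are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from uuid import uuid4

@dataclass(frozen=True)
class WallConfig:
    rows: int
    cols: int
    obstacles: tuple  # assumed: a tuple of (row, col) coordinates

@dataclass
class Trajectory:
    wall: WallConfig
    method: str  # e.g. "dfs" or "bfs"
    path: list   # assumed: ordered list of (row, col) cells
    steps: int   # assumed here to be the path length

@dataclass
class CoverageTask:
    wall: WallConfig
    request_id: str = field(default_factory=lambda: str(uuid4()))
    status: str = "PENDING"  # hypothetical status value

wall = WallConfig(rows=3, cols=3, obstacles=((0, 1), (2, 0)))
task = CoverageTask(wall=wall)
trajectory = Trajectory(wall=wall, method="bfs", path=[(0, 0), (1, 0)], steps=2)
print(task.status, trajectory.method, len(wall.obstacles))
```

One wall can have several trajectories (one per method), which is why Trajectory points at WallConfig rather than the other way around.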

5. Input Format

The API expects only a grid:

{
  "grid": [
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0]
  ]
}

No rows, cols, or obstacles fields are needed; the backend derives them from the grid.

Invalid input returns HTTP 400.
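
The derivation and validation described above could look roughly like the sketch below. The function name parse_grid and the exact validation rules are assumptions; in a view, the ValueError would be translated into an HTTP 400 response.

```python
def parse_grid(payload):
    """Return (rows, cols, obstacles) for a valid payload, else raise ValueError."""
    grid = payload.get("grid") if isinstance(payload, dict) else None
    if not isinstance(grid, list) or not grid:
        raise ValueError("'grid' must be a non-empty list of rows")
    if not all(isinstance(row, list) and row for row in grid):
        raise ValueError("each row must be a non-empty list")
    cols = len(grid[0])
    if any(len(row) != cols for row in grid):
        raise ValueError("all rows must have the same length")
    for row in grid:
        for cell in row:
            # type() check rejects booleans and floats such as True or 1.0
            if type(cell) is not int or cell not in (0, 1):
                raise ValueError("cells must be the integers 0 or 1")
    obstacles = [
        (r, c)
        for r, row in enumerate(grid)
        for c, cell in enumerate(row)
        if cell == 1
    ]
    return len(grid), cols, obstacles

payload = {"grid": [[0, 1, 0], [0, 0, 0], [1, 0, 0]]}
print(parse_grid(payload))  # (3, 3, [(0, 1), (2, 0)])
```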

6. Running Locally (Dev Mode)

6.1 Clone & Setup

git clone git@github.com:ajinzrathod/robotwall.git
cd robotwall
python -m venv env
source env/bin/activate
pip install -r requirements.txt

Create a .env file like this:

# persistence database
POSTGRES_DB=wall_coverage
POSTGRES_USER=wall_user
POSTGRES_PASSWORD=secret_password
POSTGRES_HOST=127.0.0.1
POSTGRES_PORT=5432

# Django
DJANGO_SECRET_KEY=super-secret-key
DJANGO_DEBUG=True

# Redis
REDIS_URL=redis://127.0.0.1:6379/1

# RabbitMQ
RABBITMQ_HOST=127.0.0.1
RABBITMQ_PORT=5672
RABBITMQ_USER=guest
RABBITMQ_PASSWORD=guest
RABBITMQ_TASK_QUEUE=wall.compute
RABBITMQ_RESULT_QUEUE=wall.results

6.2 Start Infrastructure

docker-compose up -d

Verify:

Postgres running
Redis running
RabbitMQ UI: http://localhost:15672

6.3 Migrate DB

python manage.py migrate

6.4 Start API

python manage.py runserver

6.5 Start Worker (Local)

python manage.py shell

from path_planner.workers.path_planning_worker import start_worker
start_worker()

Press Ctrl+C to stop the worker.

7. Metrics & Observability

Worker Metrics

The worker exposes Prometheus metrics, including:

task_processing_seconds
db_write_seconds
task_failures_total

Metrics endpoints:

Worker metrics: http://localhost:9001/metrics

Django metrics: http://localhost:8000/path_planner/metrics

Scrape these endpoints with Prometheus, then check that the targets are up at:

http://localhost:9090/targets?search=

Grafana: http://localhost:3000. Import the dashboard from robotwall_grafana_dashboard.json.
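
For a quick check outside Prometheus, the worker endpoint can be read directly. The sketch below parses the Prometheus text format with the standard library only; it assumes samples carry no trailing timestamp, and the SAMPLE text (help string and values) is made up for illustration, not real output from this project.

```python
from urllib.request import urlopen

def parse_metrics(text):
    """Map each sample name (including any {labels}) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples

def fetch_metrics(url="http://localhost:9001/metrics", timeout=5):
    """Fetch and parse a metrics endpoint (requires the worker to be running)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))

# Made-up sample in the text exposition format.
SAMPLE = """\
# HELP task_failures_total Number of failed tasks
# TYPE task_failures_total counter
task_failures_total 2.0
task_processing_seconds_count 10.0
task_processing_seconds_sum 4.5
"""

metrics = parse_metrics(SAMPLE)
mean_seconds = metrics["task_processing_seconds_sum"] / metrics["task_processing_seconds_count"]
print(metrics["task_failures_total"], mean_seconds)
```

With a running worker, fetch_metrics() returns the same kind of dictionary from the live endpoint.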

8. Testing Strategy

8.1 Unit Tests

pytest -v

9. Load vs Stress Testing (Important) (Depends on System)

Load Testing

Expected traffic under normal conditions

100–200 concurrent requests
Medium grids (≤ 25x25)

Stress Testing

Push until things break

200+ concurrency
Large grids (100–500 rows/cols)
Observe:
- timeouts
- DB contention

A timeout is not a crash. Timeouts show where the system's limits are.
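
A threaded load script along these lines can drive both kinds of test. This is a sketch, not the repository's script: API_URL is a hypothetical endpoint path (the README does not state the real one), and the outcome labels are illustrative.

```python
import json
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from urllib import error, request

API_URL = "http://localhost:8000/path_planner/coverage"  # hypothetical endpoint path

def post_grid(grid, timeout=10):
    """POST one grid and classify the outcome as a short label."""
    body = json.dumps({"grid": grid}).encode("utf-8")
    req = request.Request(API_URL, data=body, headers={"Content-Type": "application/json"})
    try:
        with request.urlopen(req, timeout=timeout) as resp:
            return f"http_{resp.status}"
    except error.HTTPError as exc:  # server answered with an error status
        return f"http_{exc.code}"
    except OSError:                 # timeouts, refused connections, resets
        return "timeout_or_unreachable"

def run_load(send, grid, total, concurrency):
    """Call send(grid) `total` times with at most `concurrency` in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        outcomes = pool.map(lambda _: send(grid), range(total))
        return Counter(outcomes)

if __name__ == "__main__":
    medium_grid = [[0] * 25 for _ in range(25)]
    print(run_load(post_grid, medium_grid, total=100, concurrency=100))
```

Raising total, concurrency, and the grid size moves the same script from load testing into stress testing; the Counter makes it easy to see when timeouts start to appear.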

10. Storing Obstacles Instead of Full Grids

Question: Why We Store Only Obstacles (Not the Entire Grid)

Storing the full grid is wasteful, slow, and scales badly.

What actually matters long-term is where obstacles exist, not every empty cell.

Storing the Full Grid Is a Bad Idea

Example:

For large grids (e.g. 500×500):
- 250,000 cells per request
- Mostly zeros (empty space)
- Larger DB rows -> slower writes and slower reads (impacts performance)
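
The size difference is easy to see with a small experiment. The sketch below converts a grid to an obstacle list and back, then compares the JSON-encoded sizes; the JSON encoding, the helper names, and the diagonal obstacle layout are assumptions for illustration and may not match how the repository stores walls.

```python
import json

def grid_to_obstacles(grid):
    """Keep only the coordinates of obstacle cells."""
    return [[r, c] for r, row in enumerate(grid) for c, cell in enumerate(row) if cell == 1]

def obstacles_to_grid(rows, cols, obstacles):
    """Rebuild the full grid from its dimensions and obstacle coordinates."""
    grid = [[0] * cols for _ in range(rows)]
    for r, c in obstacles:
        grid[r][c] = 1
    return grid

rows = cols = 500
grid = [[0] * cols for _ in range(rows)]
for i in range(0, 500, 50):  # 10 obstacle cells on the diagonal (made-up layout)
    grid[i][i] = 1

obstacles = grid_to_obstacles(grid)
full_size = len(json.dumps(grid))
sparse_size = len(json.dumps({"rows": rows, "cols": cols, "obstacles": obstacles}))
print(full_size, sparse_size)
```

Because rows, cols, and the obstacle list are enough to rebuild the grid exactly, nothing is lost by dropping the empty cells.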

11. The Core Problem: RabbitMQ Is Async, Django Is Not

RabbitMQ is asynchronous, but that alone does not make the system async end-to-end.

By default, Django is synchronous:

One request = one worker thread
Heavy CPU work blocks that worker
Large grids slow down everything behind them

Problem?

1 large grid -> its task occupies the only worker, and every other task in the queue waits behind it

RabbitMQ only decouples when work is done, not how it is processed.

Solution: Run Multiple Workers Locally

Worker 1:

WORKER_METRICS_PORT=9001 \
python manage.py shell -c "from path_planner.workers.path_planning_worker import start_worker; start_worker()"

Worker 2:

WORKER_METRICS_PORT=9002 \
python manage.py shell -c "from path_planner.workers.path_planning_worker import start_worker; start_worker()"

RabbitMQ automatically distributes tasks across the connected workers (round-robin by default).

Result:

RabbitMQ handles burst traffic, workers handle computation, Django stays responsive.
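
The effect of adding a second worker can be estimated with a small simulation. It assumes each worker handles one task at a time and takes the next queued task as soon as it is free (comparable to a consumer prefetch of 1; whether the repository's worker sets a prefetch limit is not stated here). The durations are made-up units.

```python
import heapq

def finish_times(durations, workers):
    """Finish time of each task when every free worker takes the next queued task."""
    free_at = [0.0] * workers  # min-heap of times at which each worker becomes free
    heapq.heapify(free_at)
    finished = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker picks up the task
        end = start + duration
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished

tasks = [10, 1, 1, 1]  # one large grid followed by three small ones
print(finish_times(tasks, workers=1))  # [10.0, 11.0, 12.0, 13.0]
print(finish_times(tasks, workers=2))  # [10.0, 1.0, 2.0, 3.0]
```

With one worker the small tasks wait behind the large grid; with two, they finish while the large grid is still being processed.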
