ajinzrathod/robotwall

Robot Wall Coverage System

A server-intensive, computation-heavy backend system for autonomous wall-finishing robots. This project focuses on path planning, async task execution, persistence, observability, and stress resilience.


Video Walkthrough

This repository is accompanied by a structured video walkthrough that explains the problem statement, system design, implementation details, and performance characteristics end-to-end.

Meet Toby: The Wall-Painting Robot (Problem Statement & Motivation)

System Architecture: API -> RabbitMQ -> Worker -> Database -> Metrics

Code Walkthrough: Path Planning, Workers, and Persistence Layer

End-to-End Execution: From API Request to Completed Trajectories

Observability in Action: Prometheus with Concurrent Requests

High Concurrency Demo: Handling 100 Parallel Requests

System Visualization: Real-Time Monitoring with Grafana

Load & Stress Testing: Large Grids and High-Volume Requests


1. What This System Does (High Level)

Given a wall grid (2D matrix where 1 = obstacle, 0 = free cell):

  1. Accepts the grid via an API
  2. Enqueues the task to RabbitMQ
  3. A worker:
    • Computes optimized coverage paths (DFS, BFS)
    • Stores wall configuration and trajectories
  4. Exposes metrics for performance analysis
  5. Supports stress testing & scalability analysis

This mimics a real autonomous robot backend where computation, IO, and persistence are decoupled.
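As a rough illustration of the worker's path-planning step, here is a minimal sketch of visiting every reachable free cell in DFS or BFS order. The function names (`free_neighbors`, `coverage_order`) and the choice of start cell are assumptions for this example, not the repository's actual code. The result is a visiting order; a real trajectory would also need the moves between consecutive cells.

```python
from collections import deque

Grid = list[list[int]]
Cell = tuple[int, int]


def free_neighbors(grid: Grid, cell: Cell):
    """Yield the 4-connected neighbours of `cell` that are free (value 0)."""
    r, c = cell
    for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
            yield (nr, nc)


def coverage_order(grid: Grid, start: Cell, method: str = "dfs") -> list[Cell]:
    """Return the free cells reachable from `start` in DFS or BFS visiting order."""
    if method not in ("dfs", "bfs"):
        raise ValueError("method must be 'dfs' or 'bfs'")
    if grid[start[0]][start[1]] != 0:
        raise ValueError("start cell is an obstacle")

    frontier = deque([start])
    visited: set[Cell] = set()
    order: list[Cell] = []
    while frontier:
        # DFS takes the newest cell (stack); BFS takes the oldest (queue).
        cell = frontier.pop() if method == "dfs" else frontier.popleft()
        if cell in visited:
            continue
        visited.add(cell)
        order.append(cell)
        frontier.extend(free_neighbors(grid, cell))
    return order


example_grid = [
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]
for method in ("dfs", "bfs"):
    print(method, coverage_order(example_grid, (0, 0), method))
```

Both methods cover the same set of cells; they differ only in the order, which is what changes the number of steps a robot would take.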


2. Architecture Overview

Client/API Call
       |
       v
   Django
       |
       v
  RabbitMQ (task queue)
       |
       v
Worker Container (path planning, DB writes)
       |
       v
PostgreSQL / Redis
       |
       v
Prometheus / Grafana (metrics & dashboards)

3. Tech Stack

Layer                 Tech
Language              Python 3.14.0
Web                   Django + Django REST Framework
Messaging             RabbitMQ
Database              PostgreSQL
Metrics, Monitoring   Prometheus, Grafana
Testing               pytest
Load Testing          Custom async/threaded scripts
Deployment            Docker / docker-compose

4. Core Concepts & Models

WallConfig

Represents a unique wall layout.

  • rows
  • cols
  • obstacles (stored as coordinates)

Trajectory

Represents a path planning result for a given method.

  • wall
  • method
  • path
  • steps

CoverageTask

Tracks request lifecycle.

  • request_id
  • status
  • wall
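
The three models above can be pictured with plain dataclasses. This is only a sketch of the data shapes, not the project's Django models: the field types, the `"PENDING"` status value, and the UUID-based `request_id` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from uuid import uuid4

Cell = tuple[int, int]


@dataclass(frozen=True)
class WallConfig:
    """A wall layout: dimensions plus obstacle coordinates."""
    rows: int
    cols: int
    obstacles: tuple[Cell, ...]


@dataclass
class Trajectory:
    """One path-planning result for a wall and a method (e.g. "dfs")."""
    wall: WallConfig
    method: str
    path: list[Cell]
    steps: int


@dataclass
class CoverageTask:
    """Tracks one request from submission to completion."""
    wall: WallConfig
    status: str = "PENDING"  # hypothetical status value
    request_id: str = field(default_factory=lambda: str(uuid4()))


wall = WallConfig(rows=3, cols=3, obstacles=((0, 1), (2, 0)))
task = CoverageTask(wall=wall)
print(task.request_id, task.status)
```

Making `WallConfig` frozen means two walls with the same dimensions and obstacles compare equal, which matches the idea of a unique wall layout.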

5. Input Format

The API expects only a grid:

{
  "grid": [
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0]
  ]
}
  • No rows/cols/obstacles needed
  • Backend derives everything

Invalid input returns HTTP 400.
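
To make "backend derives everything" concrete, here is a minimal sketch of validating a payload and deriving rows, cols, and obstacles from it. The function name `parse_grid` and the exact validation rules (non-empty, rectangular, only 0 or 1) are assumptions; the project's real checks may differ. A view could catch the `ValueError` and respond with HTTP 400.

```python
def parse_grid(payload: dict) -> tuple[int, int, list[tuple[int, int]]]:
    """Validate {"grid": [[...], ...]} and return (rows, cols, obstacles)."""
    grid = payload.get("grid") if isinstance(payload, dict) else None
    if not isinstance(grid, list) or not grid:
        raise ValueError("'grid' must be a non-empty list of rows")
    if not all(isinstance(row, list) and row for row in grid):
        raise ValueError("each row must be a non-empty list")

    cols = len(grid[0])
    obstacles: list[tuple[int, int]] = []
    for r, row in enumerate(grid):
        if len(row) != cols:
            raise ValueError("all rows must have the same length")
        for c, value in enumerate(row):
            # Reject booleans, floats, and anything other than the ints 0 and 1.
            if type(value) is not int or value not in (0, 1):
                raise ValueError(f"cell ({r}, {c}) must be 0 or 1")
            if value == 1:
                obstacles.append((r, c))
    return len(grid), cols, obstacles


payload = {"grid": [[0, 1, 0], [0, 0, 0], [1, 0, 0]]}
print(parse_grid(payload))
```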


6. Running Locally (Dev Mode)

6.1 Clone & Setup

git clone git@github.com:ajinzrathod/robotwall.git
cd robotwall
python -m venv env
source env/bin/activate
pip install -r requirements.txt

Create a .env file like this:

# persistence database
POSTGRES_DB=wall_coverage
POSTGRES_USER=wall_user
POSTGRES_PASSWORD=secret_password
POSTGRES_HOST=127.0.0.1
POSTGRES_PORT=5432

# Django
DJANGO_SECRET_KEY=super-secret-key
DJANGO_DEBUG=True

# redis
REDIS_URL=redis://127.0.0.1:6379/1

# rabbitMQ
RABBITMQ_HOST=127.0.0.1
RABBITMQ_PORT=5672
RABBITMQ_USER=guest
RABBITMQ_PASSWORD=guest
RABBITMQ_TASK_QUEUE=wall.compute
RABBITMQ_RESULT_QUEUE=wall.results

6.2 Start Infrastructure

docker-compose up -d

Verify:


6.3 Migrate DB

python manage.py migrate

6.4 Start API

python manage.py runserver

6.5 Start Worker (Local)

python manage.py shell
from path_planner.workers.path_planning_worker import start_worker
start_worker()

Press Ctrl+C to stop the worker.


7. Metrics & Observability

Worker Metrics

Check the Prometheus scrape targets at:

http://localhost:9090/targets?search=

Metrics include:

  • task_processing_seconds
  • db_write_seconds
  • task_failures_total

Scrape with Prometheus.

Worker metrics: http://localhost:9001/metrics

Django metrics: http://localhost:8000/path_planner/metrics

Grafana: http://localhost:3000 (import the included dashboard, Robotwall Dashboard.json)

8. Testing Strategy

8.1 Unit Tests

pytest -v

9. Load vs Stress Testing (Important) (Depends on System)

Load Testing

Expected traffic under normal conditions

  • 100–200 concurrent requests
  • Medium grids (≤ 25x25)

Stress Testing

Push until things break

  • 200+ concurrency

  • Large grids (100–500 rows/cols)

  • Observe:

    • timeouts
    • DB contention

Timeouts != crash. They define system limits.
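
A threaded load script along these lines can drive both scenarios. This is a minimal sketch, not the repository's script: `run_load` and `fake_send` are hypothetical names, and `send` stands in for whatever function posts a grid to the API (for example with a request timeout, so that timeouts are counted as failures rather than crashing the run).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(send, total_requests: int, concurrency: int) -> dict:
    """Call send(i) for each request using `concurrency` threads; summarise results."""

    def timed_call(i: int) -> tuple[bool, float]:
        started = time.perf_counter()
        try:
            send(i)
            ok = True
        except Exception:
            # Timeouts and HTTP errors count as failures, not as a crashed test run.
            ok = False
        return ok, time.perf_counter() - started

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = [elapsed for ok, elapsed in results if ok]
    return {
        "sent": total_requests,
        "failed": sum(1 for ok, _ in results if not ok),
        "max_latency": max(latencies) if latencies else None,
        "mean_latency": sum(latencies) / len(latencies) if latencies else None,
    }


def fake_send(i: int) -> None:
    """Stand-in for a real HTTP call: sleeps briefly, fails every 10th request."""
    time.sleep(0.01)
    if i % 10 == 0:
        raise TimeoutError("simulated timeout")


print(run_load(fake_send, total_requests=20, concurrency=5))
```

Raising `concurrency` and the grid size in the real `send` function is what turns a load test into a stress test.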

10. Storing Obstacles Instead of Full Grids

Question: Why store only obstacles and not the entire grid?

  • Storing the full grid is wasteful and slow, and it scales badly.

What matters long-term is where the obstacles are, not every empty cell.

Storing the Full Grid Is a Bad Idea

Example:

  • For large grids (e.g. 500×500):
    • 250,000 cells per request
    • Mostly zeros (empty space)
    • Larger DB rows -> slower writes and slower reads (impacts performance)
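
The size difference is easy to check. The sketch below builds a mostly empty 500×500 grid, converts it to an obstacle list, and compares the JSON sizes. The helper names and the JSON layout are assumptions for illustration; the project's actual storage format is not shown here.

```python
import json


def grid_to_obstacles(grid: list[list[int]]) -> list[list[int]]:
    """Return [row, col] for every cell whose value is 1."""
    return [[r, c] for r, row in enumerate(grid) for c, v in enumerate(row) if v == 1]


def obstacles_to_grid(rows: int, cols: int, obstacles: list[list[int]]) -> list[list[int]]:
    """Rebuild the full grid from its dimensions and obstacle coordinates."""
    grid = [[0] * cols for _ in range(rows)]
    for r, c in obstacles:
        grid[r][c] = 1
    return grid


rows = cols = 500
grid = [[0] * cols for _ in range(rows)]
for i in range(0, 500, 50):  # place 10 obstacles on the diagonal
    grid[i][i] = 1

obstacles = grid_to_obstacles(grid)
full_bytes = len(json.dumps(grid))
sparse_bytes = len(json.dumps({"rows": rows, "cols": cols, "obstacles": obstacles}))
print(f"full grid: {full_bytes} bytes, obstacles only: {sparse_bytes} bytes")
```

The grid can always be rebuilt from rows, cols, and obstacles, so nothing is lost. The saving depends on the wall being mostly free space; a wall that is mostly obstacles could make the coordinate list larger than the grid.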

11. The Core Problem: RabbitMQ Is Async, Django Is Not

RabbitMQ is asynchronous, but that alone does not make the system async end-to-end.

By default, Django is synchronous:

  • One request = one worker thread
  • Heavy CPU work blocks that worker
  • Large grids slow down everything behind them

Problem?

  • One large grid -> the request hangs, and every other task in the queue waits

RabbitMQ only decouples when work is done, not how it is processed.

Solution: Run Multiple Workers Locally

Worker 1:

WORKER_METRICS_PORT=9001 \
python manage.py shell -c "from path_planner.workers.path_planning_worker import start_worker; start_worker()"

Worker 2:

WORKER_METRICS_PORT=9002 \
python manage.py shell -c "from path_planner.workers.path_planning_worker import start_worker; start_worker()"

RabbitMQ automatically distributes tasks across workers.
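
The idea of several workers pulling from one queue can be tried without RabbitMQ. The sketch below is an in-process analogy using Python's standard `queue` and `threading` modules; it is not how the project talks to RabbitMQ, and `run_workers` is a hypothetical name. How a real broker splits tasks also depends on its consumer settings.

```python
import queue
import threading


def run_workers(tasks: list[int], worker_count: int) -> dict[str, list[int]]:
    """Let `worker_count` threads drain one shared queue; return who handled what."""
    task_queue: queue.Queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    handled: dict[str, list[int]] = {f"worker-{i + 1}": [] for i in range(worker_count)}

    def consume(name: str) -> None:
        while True:
            try:
                task = task_queue.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker stops
            handled[name].append(task)  # each worker writes only to its own list
            task_queue.task_done()

    threads = [threading.Thread(target=consume, args=(name,)) for name in handled]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled


result = run_workers(list(range(20)), worker_count=2)
for name, done in result.items():
    print(name, len(done), "tasks")
```

Every task is handled exactly once, by whichever worker is free next, so one slow task no longer holds up the rest of the queue.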

Result:

RabbitMQ handles burst traffic, workers handle computation, Django stays responsive.

