A lightweight, real-time vehicle detection pipeline. It captures streams from RTSP sources or webcams, runs object detection via YOLOv8, and serves a live feed with traffic metrics through a WebSocket API.
We built this mainly to explore real-time inference constraints and multithreaded Python applications. We wanted to understand:
- How to handle RTSP streams without blocking the rest of the pipeline on frame reads
- How to manage producer-consumer patterns in a latency-sensitive context
- How to wire up a FastAPI backend to serve annotated video streams
- How to degrade gracefully when the model falls behind the camera feed
The system is split into three decoupled components:
- Camera Module: Dedicated threads for reading frames. It handles reconnection logic independently to ensure the stream never dies, even if the feed drops temporarily.
- Inference Engine: A separate worker that pulls the latest frame from a circular buffer. If the model (YOLOv8) is slower than the camera, we drop frames intelligently to stay "real-time" rather than building up lag.
- API & Dashboard: A FastAPI server that exposes a WebSocket endpoint for the stream and REST endpoints for system health.
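The "latest frame wins" handoff between the camera thread and the inference worker could be sketched roughly as follows. This is a minimal illustration, not the repo's actual API; the class and method names are invented for the example:

```python
import threading


class LatestFrameBuffer:
    """Single-slot buffer: the camera thread overwrites the slot, so the
    inference worker always sees the freshest frame and stale frames are
    dropped rather than queued up as lag."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._dropped = 0  # frames overwritten before the model could run

    def put(self, frame):
        """Called by the camera thread for every decoded frame."""
        with self._cond:
            if self._frame is not None:
                self._dropped += 1  # model fell behind; drop the stale frame
            self._frame = frame
            self._cond.notify()

    def get(self, timeout=None):
        """Called by the inference worker; blocks until a frame arrives
        (or the timeout expires, returning None)."""
        with self._cond:
            if self._frame is None:
                self._cond.wait(timeout)
            frame, self._frame = self._frame, None
            return frame
```

`put()` never blocks the camera thread, which is what keeps the reader responsive during reconnects, and the `_dropped` counter makes the "intelligent dropping" observable as a metric.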
- CPU Bound: We are currently running this on CPU for portability. While YOLOv8n is fast, you might see lower FPS depending on your hardware. It supports CUDA if available.
- Python Threads: We use Python `threading`, which is fine for I/O-bound work (camera reading), but inference is still subject to the GIL. We mitigate this by keeping the inference step fast and decoupled.
The frontend dashboard (static/) was "vibe coded" to quickly visualize the data. It's a simple HTML/JS page that connects to the WebSocket. It's not a production React app, but it gets the job done for the demo.
- Install dependencies (a conda environment is recommended): `pip install -r requirements.txt`
- Configure your camera in `config.yaml`. By default, it looks for a local webcam (`0`).
- Start the server: `python main.py`
- Open the dashboard at `http://localhost:8000`.
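The exact schema of `config.yaml` isn't shown here, but a typical layout, with purely illustrative key names, might look like:

```yaml
camera:
  source: 0              # webcam index, or an RTSP URL string
  reconnect_delay: 2.0   # seconds to wait before retrying a dropped feed

model:
  weights: yolov8n.pt
  device: cpu            # "cuda" if a GPU is available

server:
  host: 0.0.0.0
  port: 8000
```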
