Fullstack Computer Vision CCTV Analytics Platform for Traffic Monitoring
Real-time traffic analytics system that processes CCTV streams using deep learning, built with Clean Architecture principles. Designed for scalability with a polyglot microservices approach — Python for computer vision & AI, Go for high-performance data processing.
flowchart LR
subgraph Input
HLS[/"HLS Stream<br/>(CCTV Source)"/]
end
subgraph Infrastructure
FFmpeg["FFmpeg<br/>GPU Transcoding"]
MTX["MediaMTX<br/>RTSP Server"]
end
subgraph Services
PD["Python Detector<br/>YOLO + OpenCV"]
GB["Go Backend<br/>High Concurrency"]
AI["AI Agent<br/>SQL Chatbot"]
end
subgraph Storage
PG[("PostgreSQL")]
end
HLS --> FFmpeg --> MTX --> PD
PD -->|Detection Data| GB --> PG
PG <-->|Natural Language Query| AI
| Service | Language | Rationale |
|---|---|---|
| Detector | Python | Rich ML/CV ecosystem (Ultralytics, OpenCV, PyTorch) |
| Backend | Go | Superior concurrency with goroutines for handling multiple CCTV streams, lower memory footprint |
| AI Agent | Python | LLM libraries and agentic frameworks availability |
- Real-time Vehicle Detection — YOLO-based object detection with GPU acceleration
- RTSP Stream Processing — Low-latency video pipeline with MediaMTX
- GPU-Accelerated Encoding — NVENC hardware encoding for output streams
- Clean Architecture — Hexagonal/Ports-Adapters pattern for maintainability
- Vehicle Counting — Track and count vehicles passing through defined zones
- Traffic Density Analysis — Real-time congestion level monitoring
- Crowd Forecasting — Predict traffic patterns using historical data
- SQL Agent Chatbot — Natural language interface to query analytics data
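As a rough illustration of the zone-based counting idea listed above, the sketch below counts each tracked vehicle once when its bounding-box centre first falls inside a polygon. It assumes an upstream tracker already assigns stable integer IDs; `ZoneCounter`, `point_in_polygon`, and the coordinates are hypothetical names and values, not taken from the project code.

```python
from dataclasses import dataclass, field

Point = tuple[float, float]


def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


@dataclass
class ZoneCounter:
    """Counts each track ID once, the first time its centre is seen inside the zone."""

    polygon: list[Point]
    seen_ids: set[int] = field(default_factory=set)

    def update(self, tracks: dict[int, Point]) -> int:
        """tracks maps a tracker-assigned ID (assumed) to a box centre for one frame."""
        for track_id, centre in tracks.items():
            if point_in_polygon(centre, self.polygon):
                self.seen_ids.add(track_id)
        return len(self.seen_ids)


# Hypothetical square zone in pixel coordinates.
zone = ZoneCounter(polygon=[(0, 0), (10, 0), (10, 10), (0, 10)])
zone.update({1: (5.0, 5.0), 2: (20.0, 5.0)})         # vehicle 1 inside, vehicle 2 outside
total = zone.update({1: (6.0, 5.0), 2: (9.0, 5.0)})  # vehicle 2 enters, vehicle 1 is not recounted
print(total)
```

Keeping the set of seen IDs is what prevents a slow vehicle from being counted on every frame. A density estimate could be derived from the same test by counting only the centres inside the zone on the current frame.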
| Technology | Purpose |
|---|---|
| Python 3.12 | Runtime |
| Ultralytics YOLO | Object detection model |
| OpenCV | Video capture & processing |
| Pydantic | Data validation & settings |
| FFmpeg + NVENC | GPU-accelerated stream output |
| Technology | Purpose |
|---|---|
| Go 1.22+ | Runtime |
| PostgreSQL | Time-series detection storage |
| goroutines | Concurrent stream handling |
| Technology | Purpose |
|---|---|
| Python | Runtime |
| LangChain/LlamaIndex | Agentic framework |
| LLM | Natural language to SQL |
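To make the natural-language-to-SQL flow more concrete, here is a minimal sketch under several assumptions: the LLM call is replaced by a stub (`fake_llm_to_sql`), an in-memory SQLite database stands in for PostgreSQL, and the `detections` table with its columns is a hypothetical schema. The read-only check is deliberately crude and is only meant to show where such a guard would sit.

```python
import sqlite3


def is_read_only(sql: str) -> bool:
    """Accept a single SELECT statement and reject everything else (crude guard)."""
    stripped = sql.strip().rstrip(";").strip()
    return stripped.lower().startswith("select") and ";" not in stripped


def fake_llm_to_sql(question: str) -> str:
    # Stand-in for the LLM: a real agent would generate this text from the question.
    return "SELECT label, COUNT(*) FROM detections GROUP BY label ORDER BY label"


# Hypothetical schema and sample rows, SQLite used in place of PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detections (label TEXT, detected_at TEXT)")
conn.executemany(
    "INSERT INTO detections VALUES (?, ?)",
    [
        ("car", "2024-01-01T08:00:00"),
        ("car", "2024-01-01T08:00:05"),
        ("bus", "2024-01-01T08:00:07"),
    ],
)


def answer(question: str) -> list[tuple]:
    """Translate the question to SQL, refuse anything that is not a plain SELECT, run it."""
    sql = fake_llm_to_sql(question)
    if not is_read_only(sql):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


rows = answer("How many vehicles of each type were detected?")
print(rows)
```

In a real deployment a read-only database role would be a stronger safeguard than string inspection; the check here only illustrates the idea.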
The Python Detector follows Clean Architecture / Hexagonal pattern:
python-detector/
├── app/
│ ├── domain/ # Enterprise business rules
│ │ ├── bbox.py # Bounding box value object
│ │ ├── detection.py # Detection entity
│ │ └── frame.py # Frame entity
│ │
│ ├── usecases/ # Application business rules
│ │ ├── ports.py # Abstract interfaces (ports)
│ │ └── detect_objects.py # Detection use case
│ │
│ ├── adapters/ # Interface adapters
│ │ ├── vision/
│ │ │ └── yolo_detector.py # YOLO implementation
│ │ └── video/
│ │ ├── rtsp_stream.py # RTSP input adapter
│ │ └── rtsp_writer.py # RTSP output adapter
│ │
│ ├── utils/ # Utilities
│ └── main.py # Composition root
Key Principles:
- Domain — Pure business entities, no external dependencies
- Usecases — Application logic, depends only on domain and ports
- Adapters — Implementations of ports (YOLO, RTSP, etc.)
- Dependency Rule — Dependencies point inward, outer layers depend on inner layers
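The layering described above can be sketched in a single file. This is an illustrative outline only: the class names (`BBox`, `Detection`, `ObjectDetector`, `DetectObjects`, `FakeDetector`) and the confidence threshold are assumptions, and the fake adapter stands in for the real YOLO adapter so the example runs without a model or GPU.

```python
from dataclasses import dataclass
from typing import Protocol


# domain: plain value objects with no third-party imports
@dataclass(frozen=True)
class BBox:
    x1: float
    y1: float
    x2: float
    y2: float


@dataclass(frozen=True)
class Detection:
    label: str
    confidence: float
    bbox: BBox


# usecases/ports: the interface the use case depends on (hypothetical name)
class ObjectDetector(Protocol):
    def detect(self, frame: bytes) -> list[Detection]: ...


# usecases: application logic that knows only the domain and the port
class DetectObjects:
    def __init__(self, detector: ObjectDetector, min_confidence: float = 0.5) -> None:
        self._detector = detector
        self._min_confidence = min_confidence

    def execute(self, frame: bytes) -> list[Detection]:
        detections = self._detector.detect(frame)
        return [d for d in detections if d.confidence >= self._min_confidence]


# adapters: a stand-in for the YOLO adapter, returning fixed detections
class FakeDetector:
    def detect(self, frame: bytes) -> list[Detection]:
        return [
            Detection("car", 0.91, BBox(10, 20, 110, 90)),
            Detection("truck", 0.32, BBox(200, 40, 330, 160)),
        ]


# composition root: the only place that knows which adapter is plugged in
usecase = DetectObjects(detector=FakeDetector())
results = usecase.execute(frame=b"")
print([d.label for d in results])
```

Because `DetectObjects` depends only on the `ObjectDetector` protocol, swapping the fake for a model-backed adapter would not require changes to the use case or the domain classes.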
visionai/
├── python-detector/ # Computer vision service
│ ├── app/ # Application code (clean architecture)
│ ├── models/ # YOLO model weights
│ ├── Dockerfile
│ └── pyproject.toml
│
├── go-backend/ # Data processing service (planned)
│
├── ai-agent/ # LLM SQL agent (planned)
│
├── docker-compose.yml # Infrastructure orchestration
├── mediamtx.yml # RTSP server configuration
└── .env # Environment variables
- Docker & Docker Compose
- NVIDIA GPU with CUDA support
- NVIDIA Container Toolkit for GPU passthrough
- uv (Python package manager) — for local development
- Clone the repository

      git clone https://github.com/evanhfw/visionai.git
      cd visionai

- Configure environment variables

      cp .env.example .env
      # Edit .env with your CCTV stream URL

- Start infrastructure services

      docker compose up -d mediamtx ffmpeg-hls

- Run the detector (local development)

      cd python-detector
      uv sync
      uv run python -m app.main

Create a .env file in the project root:
# Input stream (HLS/RTSP source)
STREAM_SOURCE_HLS_URL=https://your-cctv-stream.m3u8
# Internal RTSP relay
STREAM_RELAY_RTSP_URL=rtsp://mediamtx:8554/cam1
# Detection output stream
STREAM_RTSP_OUT_URL=rtsp://localhost:8554/detected
# Model path
VEHICLE_DETECTOR_MODEL_PATH=models/yolo11m.pt

Once running, access the annotated video stream:
- RTSP: rtsp://localhost:8554/detected
- WebRTC: http://localhost:8889/detected
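
The environment variables listed in the `.env` example could be read at startup along these lines. The project's stack lists Pydantic for settings; this sketch uses only the standard library so it runs anywhere, and `StreamSettings`, `load_settings`, and the field names are hypothetical. It also assumes the variables are already present in the process environment rather than parsing the `.env` file itself.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamSettings:
    source_hls_url: str
    relay_rtsp_url: str
    rtsp_out_url: str
    model_path: str


# Variable names come from the .env listing; the field names are illustrative.
ENV_TO_FIELD = {
    "STREAM_SOURCE_HLS_URL": "source_hls_url",
    "STREAM_RELAY_RTSP_URL": "relay_rtsp_url",
    "STREAM_RTSP_OUT_URL": "rtsp_out_url",
    "VEHICLE_DETECTOR_MODEL_PATH": "model_path",
}


def load_settings(env: Mapping[str, str] = os.environ) -> StreamSettings:
    """Build settings from a mapping, failing fast if a variable is missing or empty."""
    missing = [name for name in ENV_TO_FIELD if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return StreamSettings(**{field: env[name] for name, field in ENV_TO_FIELD.items()})


# Values copied from the .env example; pass nothing to read os.environ instead.
example_env = {
    "STREAM_SOURCE_HLS_URL": "https://your-cctv-stream.m3u8",
    "STREAM_RELAY_RTSP_URL": "rtsp://mediamtx:8554/cam1",
    "STREAM_RTSP_OUT_URL": "rtsp://localhost:8554/detected",
    "VEHICLE_DETECTOR_MODEL_PATH": "models/yolo11m.pt",
}
settings = load_settings(example_env)
print(settings.rtsp_out_url)
```

Failing at startup when a variable is absent tends to be easier to debug than a stream that silently never opens.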
| Key | Action |
|---|---|
| Q | Quit the detector |
- Phase 1: Real-time detection pipeline
- Phase 2: Go backend + PostgreSQL integration
- Phase 3: Vehicle counting & traffic density
- Phase 4: AI Agent with SQL chatbot
- Phase 5: Dashboard & visualization
Evan Hanif Widiatama
- GitHub: @evanhfw
This project is licensed under the MIT License — see the LICENSE file for details.