VisionAI

Fullstack Computer Vision CCTV Analytics Platform for Traffic Monitoring

VisionAI is a real-time traffic analytics system that processes CCTV streams with deep learning and is built on Clean Architecture principles. It is designed for scalability with a polyglot microservices approach: Python for computer vision and AI, Go for high-performance data processing.

Architecture

flowchart LR
    subgraph Input
        HLS[/"HLS Stream<br/>(CCTV Source)"/]
    end
    
    subgraph Infrastructure
        FFmpeg["FFmpeg<br/>GPU Transcoding"]
        MTX["MediaMTX<br/>RTSP Server"]
    end
    
    subgraph Services
        PD["Python Detector<br/>YOLO + OpenCV"]
        GB["Go Backend<br/>High Concurrency"]
        AI["AI Agent<br/>SQL Chatbot"]
    end
    
    subgraph Storage
        PG[("PostgreSQL")]
    end
    
    HLS --> FFmpeg --> MTX --> PD
    PD -->|Detection Data| GB --> PG
    PG <-->|Natural Language Query| AI

Why This Stack?

Service	Language	Rationale
Detector	Python	Rich ML/CV ecosystem (Ultralytics, OpenCV, PyTorch)
Backend	Go	Lightweight goroutine-based concurrency for handling many CCTV streams in parallel, with a lower memory footprint
AI Agent	Python	Availability of LLM libraries and agentic frameworks

Features

Implemented

Real-time Vehicle Detection — YOLO-based object detection with GPU acceleration
RTSP Stream Processing — Low-latency video pipeline with MediaMTX
GPU-Accelerated Encoding — NVENC hardware encoding for output streams
Clean Architecture — Hexagonal/Ports-Adapters pattern for maintainability

Roadmap

Vehicle Counting — Track and count vehicles passing through defined zones
Traffic Density Analysis — Real-time congestion level monitoring
Crowd Forecasting — Predict traffic patterns using historical data
SQL Agent Chatbot — Natural language interface to query analytics data
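
Vehicle counting is still on the roadmap, so the following is only a minimal sketch of one common approach: counting a tracked object each time its centroid moves from one side of a counting line to the other. The names `Point`, `LineCounter`, and `track_id` are hypothetical, and the sketch assumes an upstream tracker already assigns a stable ID to each vehicle across frames.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


def side_of_line(a: Point, b: Point, p: Point) -> float:
    """Signed cross product: positive on one side of line a->b, negative on the other."""
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x)


class LineCounter:
    """Counts tracked objects whose centroid changes side of a counting line.

    Simplification: the line through a and b is treated as infinite, and a
    crossing in either direction increments the count.
    """

    def __init__(self, a: Point, b: Point) -> None:
        self.a = a
        self.b = b
        self.count = 0
        self._last_side: dict[int, float] = {}  # track_id -> last non-zero side

    def update(self, track_id: int, centroid: Point) -> bool:
        """Feed one observation; returns True if this update was a crossing."""
        side = side_of_line(self.a, self.b, centroid)
        if side == 0:
            return False  # exactly on the line: decide on a later frame
        previous = self._last_side.get(track_id)
        self._last_side[track_id] = side
        if previous is not None and (previous > 0) != (side > 0):
            self.count += 1
            return True
        return False


# One vehicle (track 1) moving down the frame across a horizontal line at y=100.
counter = LineCounter(Point(0, 100), Point(200, 100))
for y in (80, 95, 105, 120):
    counter.update(track_id=1, centroid=Point(50, y))
print(counter.count)
```

A zone-based variant would replace the side test with a point-in-polygon check; the per-track state handling stays the same.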

Tech Stack

Python Detector

Technology	Purpose
Python 3.12	Runtime
Ultralytics YOLO	Object detection model
OpenCV	Video capture & processing
Pydantic	Data validation & settings
FFmpeg + NVENC	GPU-accelerated stream output
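
As a rough illustration of the FFmpeg + NVENC output stage, the sketch below builds an FFmpeg command line that reads raw BGR frames from standard input and publishes them over RTSP using the `h264_nvenc` encoder. The function name and the frame size, frame rate, and URL in the example are assumptions; the detector's actual writer may use different options.

```python
def build_nvenc_rtsp_command(width: int, height: int, fps: int, rtsp_url: str) -> list[str]:
    """Return an FFmpeg argument list for piping raw frames to an RTSP server."""
    return [
        "ffmpeg",
        # Input: raw, headerless frames in OpenCV's default BGR byte order.
        "-f", "rawvideo",
        "-pix_fmt", "bgr24",
        "-s", f"{width}x{height}",
        "-r", str(fps),
        "-i", "-",  # read frames from stdin
        # Output: H.264 encoded on the GPU via NVENC.
        "-c:v", "h264_nvenc",
        "-pix_fmt", "yuv420p",
        "-f", "rtsp",
        "-rtsp_transport", "tcp",
        rtsp_url,
    ]


cmd = build_nvenc_rtsp_command(1280, 720, 25, "rtsp://localhost:8554/detected")
print(" ".join(cmd))
```

The list can be passed to `subprocess.Popen(cmd, stdin=subprocess.PIPE)`, with each annotated frame written to the process's stdin as raw bytes (for a NumPy frame, `frame.tobytes()`).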

Go Backend (Planned)

Technology	Purpose
Go 1.22+	Runtime
PostgreSQL	Time-series detection storage
goroutines	Concurrent stream handling

AI Agent (Planned)

Technology	Purpose
Python	Runtime
LangChain/LlamaIndex	Agentic framework
LLM	Natural language to SQL
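
Since the AI Agent is planned rather than implemented, the following is only a sketch of one precaution a natural-language-to-SQL agent might take: refusing anything that is not a single `SELECT` statement before it reaches the database. The `detections` table and its columns are hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the example runs without extra dependencies.

```python
import sqlite3


def is_single_select(sql: str) -> bool:
    """Deliberately conservative check: one statement, starting with SELECT.

    It also rejects some valid read-only queries (CTEs starting with WITH,
    string literals containing ';'), which is acceptable for a guard.
    """
    stripped = sql.strip().rstrip(";").strip()
    return stripped.lower().startswith("select") and ";" not in stripped


def run_readonly(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute model-generated SQL only if it passes the guard."""
    if not is_single_select(sql):
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(sql).fetchall()


# Hypothetical schema and sample rows for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detections (label TEXT, detected_at TEXT)")
conn.executemany(
    "INSERT INTO detections VALUES (?, ?)",
    [
        ("car", "2024-01-01T08:00:00"),
        ("car", "2024-01-01T08:00:05"),
        ("truck", "2024-01-01T08:00:07"),
    ],
)

rows = run_readonly(
    conn, "SELECT label, COUNT(*) FROM detections GROUP BY label ORDER BY label"
)
print(rows)
```

A string check like this is not a substitute for database permissions; connecting the agent through a read-only database role would be the stronger safeguard.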

Clean Architecture

The Python Detector follows the Clean Architecture / Hexagonal pattern:

python-detector/
├── app/
│   ├── domain/           # Enterprise business rules
│   │   ├── bbox.py       # Bounding box value object
│   │   ├── detection.py  # Detection entity
│   │   └── frame.py      # Frame entity
│   │
│   ├── usecases/         # Application business rules
│   │   ├── ports.py      # Abstract interfaces (ports)
│   │   └── detect_objects.py  # Detection use case
│   │
│   ├── adapters/         # Interface adapters
│   │   ├── vision/
│   │   │   └── yolo_detector.py  # YOLO implementation
│   │   └── video/
│   │       ├── rtsp_stream.py    # RTSP input adapter
│   │       └── rtsp_writer.py    # RTSP output adapter
│   │
│   ├── utils/            # Utilities
│   └── main.py           # Composition root

Key Principles:

Domain — Pure business entities, no external dependencies
Usecases — Application logic, depends only on domain and ports
Adapters — Implementations of ports (YOLO, RTSP, etc.)
Dependency Rule — Dependencies point inward, outer layers depend on inner layers
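
The principles above can be sketched in a few lines. This is an illustration only: the class names, fields, and the `min_confidence` parameter are assumptions and do not reflect the actual contents of `bbox.py`, `detection.py`, `ports.py`, or `detect_objects.py`. A fake adapter stands in for the YOLO implementation so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Protocol


# --- domain: plain data, no external dependencies ---
@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in pixel coordinates."""
    x1: float
    y1: float
    x2: float
    y2: float


@dataclass(frozen=True)
class Detection:
    label: str
    confidence: float
    bbox: BBox


# --- usecases: application logic plus the port it depends on ---
class ObjectDetector(Protocol):
    """Port: what the use case requires from any detector implementation."""

    def detect(self, frame: bytes) -> list[Detection]: ...


class DetectObjects:
    """Use case: knows only the domain types and the port."""

    def __init__(self, detector: ObjectDetector, min_confidence: float = 0.5) -> None:
        self._detector = detector
        self._min_confidence = min_confidence

    def execute(self, frame: bytes) -> list[Detection]:
        found = self._detector.detect(frame)
        return [d for d in found if d.confidence >= self._min_confidence]


# --- adapters: concrete implementations of the port ---
class FakeDetector:
    """Stand-in adapter returning fixed results; a real one would wrap YOLO."""

    def detect(self, frame: bytes) -> list[Detection]:
        return [
            Detection("car", 0.91, BBox(10, 20, 110, 80)),
            Detection("truck", 0.32, BBox(200, 40, 320, 140)),
        ]


# --- composition root: the only place that wires adapters into use cases ---
use_case = DetectObjects(FakeDetector(), min_confidence=0.5)
results = use_case.execute(b"")
print([d.label for d in results])
```

Because `DetectObjects` depends on the `ObjectDetector` port rather than on a concrete class, swapping the fake for a YOLO-backed adapter requires a change only in the composition root.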

Project Structure

visionai/
├── python-detector/      # Computer vision service
│   ├── app/              # Application code (clean architecture)
│   ├── models/           # YOLO model weights
│   ├── Dockerfile
│   └── pyproject.toml
│
├── go-backend/           # Data processing service (planned)
│
├── ai-agent/             # LLM SQL agent (planned)
│
├── docker-compose.yml    # Infrastructure orchestration
├── mediamtx.yml          # RTSP server configuration
└── .env                  # Environment variables

Getting Started

Prerequisites

Docker & Docker Compose
NVIDIA GPU with CUDA support
NVIDIA Container Toolkit for GPU passthrough
uv (Python package manager) — for local development

Installation

Clone the repository

git clone https://github.com/evanhfw/visionai.git
cd visionai

Configure environment variables

cp .env.example .env
# Edit .env with your CCTV stream URL

Start infrastructure services

docker compose up -d mediamtx ffmpeg-hls

Run the detector (local development)

cd python-detector
uv sync
uv run python -m app.main

Configuration

Create a .env file in the project root:

# Input stream (HLS/RTSP source)
STREAM_SOURCE_HLS_URL=https://your-cctv-stream.m3u8

# Internal RTSP relay
STREAM_RELAY_RTSP_URL=rtsp://mediamtx:8554/cam1

# Detection output stream
STREAM_RTSP_OUT_URL=rtsp://localhost:8554/detected

# Model path
VEHICLE_DETECTOR_MODEL_PATH=models/yolo11m.pt
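
For reference, here is a dependency-free sketch of how these four variables could be read and validated at startup. The project lists Pydantic for settings, so the real implementation likely differs; the `Settings` class and `load_settings` function are hypothetical. Note that this reads the process environment and does not parse the `.env` file itself.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

REQUIRED_VARS = (
    "STREAM_SOURCE_HLS_URL",
    "STREAM_RELAY_RTSP_URL",
    "STREAM_RTSP_OUT_URL",
    "VEHICLE_DETECTOR_MODEL_PATH",
)


@dataclass(frozen=True)
class Settings:
    stream_source_hls_url: str
    stream_relay_rtsp_url: str
    stream_rtsp_out_url: str
    vehicle_detector_model_path: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing ones."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    # Field names are the lowercase form of the variable names.
    return Settings(**{name.lower(): env[name] for name in REQUIRED_VARS})


# Example with an explicit mapping instead of the real environment.
example_env = {
    "STREAM_SOURCE_HLS_URL": "https://your-cctv-stream.m3u8",
    "STREAM_RELAY_RTSP_URL": "rtsp://mediamtx:8554/cam1",
    "STREAM_RTSP_OUT_URL": "rtsp://localhost:8554/detected",
    "VEHICLE_DETECTOR_MODEL_PATH": "models/yolo11m.pt",
}
settings = load_settings(example_env)
print(settings.stream_rtsp_out_url)
```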

Usage

Viewing the Detection Stream

Once running, access the annotated video stream:

RTSP: rtsp://localhost:8554/detected
WebRTC: http://localhost:8889/detected
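
If a player cannot open either URL, a quick first check is whether the ports are accepting connections at all. The sketch below uses only the standard library; `endpoint_is_reachable` is a hypothetical helper, and a successful TCP connection only shows that something is listening, not that the `detected` stream is being published.

```python
import socket
from urllib.parse import urlsplit


def endpoint_is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"URL needs an explicit host and port: {url}")
    try:
        with socket.create_connection((parts.hostname, parts.port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for url in ("rtsp://localhost:8554/detected", "http://localhost:8889/detected"):
        status = "reachable" if endpoint_is_reachable(url) else "not reachable"
        print(f"{url}: {status}")
```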

Keyboard Controls

Key	Action
`Q`	Quit the detector

Roadmap

Phase 1: Real-time detection pipeline
Phase 2: Go backend + PostgreSQL integration
Phase 3: Vehicle counting & traffic density
Phase 4: AI Agent with SQL chatbot
Phase 5: Dashboard & visualization

Author

Evan Hanif Widiatama

GitHub: @evanhfw

License

This project is licensed under the MIT License — see the LICENSE file for details.
