Unified pipeline for the Virtual Fence project—detect, track, and count people entering a protected zone using YOLO, OmniVLM, and RT-DETR. This README consolidates the specifications under specs/ and the task briefs in task/ into a single reference, including hands-on guidance for dataset sourcing, CVAT annotation, Nexa SDK setup, and model benchmarking.
- Project Specs & Task Overview
- Environment & Tooling
- Dataset Preparation
- Annotation Workflow (CVAT)
- Model Workflows
- Results Summary
- Commands Cheat Sheet
- Raspberry Pi / Edge Notes
- Troubleshooting
- References
## Project Specs & Task Overview

- 📄 Specifications: `specs/virtual_fence_task_spec.md` defines objectives, datasets, metrics, and success criteria.
- 📥 Task briefs: `task/VirtualFence.md` and `task/Person_Counting_Task.pdf` describe the zone-counting deliverable.
- ✅ Key requirements:
  - Combine CrowdHuman, MOT17, and custom annotated clips.
  - Implement YOLO, OmniVLM, and a custom detector (RT-DETR).
  - Produce annotated output video with zone overlay and entry counter.
  - Benchmark all methods on the same clips: mAP, IDF1/MOTA, counting MAE, FPS.
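Counting MAE is the least standard of these metrics; a minimal sketch of how it can be computed is shown below (clip names and counts are placeholders, not project numbers):

```python
# Counting MAE: mean absolute error between predicted and ground-truth
# zone-entry counts per evaluation clip. Values below are illustrative only.
def counting_mae(pred_counts: dict[str, int], gt_counts: dict[str, int]) -> float:
    return sum(abs(pred_counts.get(clip, 0) - gt) for clip, gt in gt_counts.items()) / len(gt_counts)

print(counting_mae({"clip_001": 4, "clip_002": 7}, {"clip_001": 5, "clip_002": 7}))  # -> 0.5
```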
## Environment & Tooling

Prerequisites:
- Python 3.13 (conda recommended).
- NVIDIA GPU + CUDA toolkit for accelerated training (optional but preferred).
- Nexa SDK for OmniVLM workflows.
- Docker (for CVAT) and FFmpeg for video processing.
Set up the base environment:

```bash
conda create -n fence python=3.13 -y
conda activate fence
pip install -r requirements.txt

# Install CUDA-enabled torch if you have an RTX GPU
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

# Windows FFmpeg install
choco install ffmpeg
ffmpeg -version
```
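A quick way to confirm that the CUDA-enabled build actually sees the GPU (run inside the `fence` environment):

```python
# Verify the PyTorch install; prints CPU-only information if no GPU is visible.
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```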
Additional tooling:

- Nexa SDK installer: https://github.com/NexaAI/nexa-sdk?tab=readme-ov-file#install-option-1-executable-installer. Add the install folder to `PATH`.
- CVAT setup:

  ```bash
  git clone https://github.com/cvat-ai/cvat.git
  cd cvat
  docker compose up -d
  ```

- Colab workflow is available in `notebooks/virtual_fence.py` (downloads datasets, prepares YOLO splits, trains, and exports models back to Drive).
## Dataset Preparation

- Custom video ingestion (optional but recommended):
  - Source 50–100 permissively licensed clips (5–10 s) from YouTube or Pexels. Suggested keywords: `crowded street`, `pedestrian crosswalk`, `festival crowd`.
  - Record URLs in `data/custom/custom_data.csv`; include `source`, `url`, `license`, and `notes` columns. For YouTube, use yt-dlp; for Pexels, download MP4 assets directly.
  - Extract frames with FFmpeg (2 FPS baseline) to stabilize annotation quality:

    ```bash
    ffmpeg -i input.mp4 -vf "fps=2,scale=w=min(1280\,iw):h=-2" frames/frame_%05d.jpg
    ```

  - Consolidate downloads via helper scripts:

    ```powershell
    python scripts/download_custom_videos.py `
      --csv data/custom/custom_data.csv `
      --output-dir data/custom/videos `
      --zip-path data/custom/custom_clips.zip
    python scripts/prep_custom_clips.py `
      --input-dir data/custom/videos `
      --frames-dir data/custom/frames `
      --zip-dir data/custom/zips `
      --fps 2 --max-dim 1280
    python scripts/rename_custom_zips.py --zip-dir data/custom/zips
    ```

- Mirror public datasets (CrowdHuman, MOT17) using the automated setup:

  ```powershell
  python scripts/data_setup.py `
    --crowdhuman-dir d:\datasets\crowdhuman `
    --mot17-dir d:\datasets\mot17 `
    --custom-zip-root d:\Projects\VirtualFence\data\annotations `
    --custom-output d:\datasets\custom_data
  ```

- Unify into YOLO layout for consistent training/validation splits across YOLOv8n, RT-DETR-L, and OmniVLM prompt conditioning:

  ```powershell
  python YOLO/prepare_yolo_dataset.py `
    --output-dir data/yolo_data `
    --custom-dir data/custom/custom_data `
    --custom-ratio 0.8 0.1 0.1 `
    --crowdhuman-root data/crowdhuman `
    --crowdhuman-ratio 0.8 0.1 0.1 `
    --mot-root data/MOT17 `
    --mot-ratio 0.9 0.1 0.0 `
    --seed 42
  ```

  Colab note: run the same commands in `/content/VirtualFence`, train on GPU (`--device 0`), then zip `YOLO/runs/detect/...` back to Drive for local evaluation.
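Before training, it is worth a quick sanity check that the unified `data/yolo_data` layout is complete. The sketch below assumes the standard Ultralytics structure (`images/<split>` plus `labels/<split>` with one `.txt` per image); adjust the paths if `prepare_yolo_dataset.py` writes something different.

```python
# Count images and label files per split and flag images without labels.
from pathlib import Path

root = Path("data/yolo_data")
for split in ("train", "val", "test"):
    images = sorted((root / "images" / split).glob("*.jpg")) + sorted((root / "images" / split).glob("*.png"))
    labels = {p.stem for p in (root / "labels" / split).glob("*.txt")}
    unlabeled = [p.name for p in images if p.stem not in labels]
    print(f"{split}: {len(images)} images, {len(labels)} label files, {len(unlabeled)} without labels")
```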
## Annotation Workflow (CVAT)

- Spin up CVAT locally (Docker Compose) or connect to an existing instance with RBAC enabled.
- Import media: upload FFmpeg-extracted frames (`*.jpg`) or the raw videos for interpolation-assisted labeling.
- Label configuration:
  - Create a single `person` class with attributes `occluded` and `truncated`.
  - Use the Rectangle tool with automatic interpolation between keyframes.
- Annotation formats:
  - Export detection labels as COCO 1.0 for YOLO/RT-DETR fine-tuning (a conversion sketch follows this list).
  - Export tracking labels as MOT 1.1 to evaluate IDF1/MOTA and to bootstrap ByteTrack/Hungarian matchers.
- Quality checks: leverage CVAT's review mode to flag low-confidence annotations; sync reviewed exports to `data/annotations/cvat_exports/`.
- Dataset sync: run `python scripts/data_setup.py --custom-zip-root data/annotations/cvat_exports` to fold reviewed annotations into the unified dataset.
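The repository's scripts presumably handle format conversion, but for reference, turning a CVAT COCO 1.0 export into YOLO-style labels boils down to normalizing each box to center/width/height. A minimal sketch (single `person` class, class index 0; paths are examples):

```python
import json
from collections import defaultdict
from pathlib import Path

def coco_to_yolo(coco_json: str, out_dir: str) -> None:
    """Write one YOLO .txt label file per image from a COCO 1.0 export."""
    data = json.loads(Path(coco_json).read_text())
    images = {im["id"]: im for im in data["images"]}
    per_image = defaultdict(list)
    for ann in data["annotations"]:
        per_image[ann["image_id"]].append(ann["bbox"])  # COCO bbox: [x, y, w, h] in pixels

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for img_id, boxes in per_image.items():
        im = images[img_id]
        w, h = im["width"], im["height"]
        lines = []
        for x, y, bw, bh in boxes:
            cx, cy = (x + bw / 2) / w, (y + bh / 2) / h  # normalized box center
            lines.append(f"0 {cx:.6f} {cy:.6f} {bw / w:.6f} {bh / h:.6f}")
        (out / f"{Path(im['file_name']).stem}.txt").write_text("\n".join(lines))

# Example invocation; point it at your actual CVAT export.
coco_to_yolo("data/annotations/cvat_exports/instances_default.json", "data/custom/labels")
```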
## Model Workflows

Training (GPU recommended):

```powershell
python YOLO/train_yolo.py `
  --model yolov8n.pt `
  --data YOLO/virtual_fence.yaml `
  --epochs 100 `
  --imgsz 640 `
  --device 0 `
  --name virtfence_yolov8n
```
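`YOLO/train_yolo.py` presumably wraps the Ultralytics API; the same run can be expressed directly in Python, which is handy inside the Colab notebook:

```python
# Programmatic equivalent of the training command above (Ultralytics API).
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.train(
    data="YOLO/virtual_fence.yaml",
    epochs=100,
    imgsz=640,
    device=0,
    name="virtfence_yolov8n",
)
metrics = model.val(split="val")           # quick check on the validation split
print(metrics.box.map50, metrics.box.map)  # mAP@0.5 and mAP@0.5:0.95
```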
Evaluation, metrics, and visualizations:

```powershell
python YOLO/evaluate_and_visualize.py `
  --model YOLO/runs/virtfence_yolov8n/weights/best.pt `
  --data YOLO/virtual_fence.yaml `
  --split val `
  --output-dir YOLO/reports/yolov8n `
  --device 0 `
  --n-samples 20
```
Zone-count inference (outputs annotated MP4 with live counter):

```powershell
python YOLO/zone_counter.py `
  --model YOLO/runs/virtfence_yolov8n/weights/best.pt `
  --source data/input.mp4 `
  --zone-config config/fence_zone.yaml `
  --output results/yolo/output_annotated.mp4 `
  --device 0
```
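The zone definition and counting rule live in `config/fence_zone.yaml` and `YOLO/zone_counter.py`; as an illustration of the underlying idea (count a track ID once, when its bottom-center point is first found inside the zone polygon), here is a minimal sketch:

```python
import cv2
import numpy as np

class ZoneCounter:
    """Count each tracked person once when they first appear inside the zone polygon."""

    def __init__(self, polygon_xy):
        # polygon_xy: list of (x, y) zone vertices in pixel coordinates
        self.polygon = np.array(polygon_xy, dtype=np.int32).reshape(-1, 1, 2)
        self.counted_ids = set()
        self.entries = 0

    def update(self, tracks):
        """tracks: iterable of (track_id, x1, y1, x2, y2) boxes from the tracker."""
        for tid, x1, y1, x2, y2 in tracks:
            foot = ((x1 + x2) / 2.0, float(y2))  # bottom-center approximates the ground-contact point
            inside = cv2.pointPolygonTest(self.polygon, foot, False) >= 0
            if inside and tid not in self.counted_ids:
                self.counted_ids.add(tid)
                self.entries += 1
        return self.entries

# Example with a rectangular zone and two fake tracks: only track 1 is inside.
counter = ZoneCounter([(100, 200), (500, 200), (500, 600), (100, 600)])
print(counter.update([(1, 120, 300, 180, 550), (2, 700, 300, 760, 550)]))  # -> 1
```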
Export ONNX for edge deployment:

```powershell
python YOLO/export_yolo_to_onnx.py `
  --weights YOLO/runs/virtfence_yolov8n/weights/best.pt `
  --output YOLO/exports/virtfence_yolov8n.onnx `
  --imgsz 640 `
  --dynamic
```
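A quick smoke test of the exported graph with ONNX Runtime (`pip install onnxruntime`); decoding the raw output into boxes is model-specific and out of scope here:

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "YOLO/exports/virtfence_yolov8n.onnx",
    providers=["CPUExecutionProvider"],
)
inp = sess.get_inputs()[0]
print("input:", inp.name, inp.shape)

# Dummy NCHW frame in [0, 1]; replace with a real preprocessed frame for end-to-end checks.
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = sess.run(None, {inp.name: dummy})
print("output shapes:", [o.shape for o in outputs])
```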
RT-DETR-L evaluation (Paddle weights):

```powershell
python scripts/rtdetr_eval.py `
  --config specs/rtdetr/rtdetr_config.yml `
  --weights d:/models/rtdetr_weights.pdparams `
  --overrides EvalReader.dataset.dataset_dir=data/yolo_data `
  --device gpu `
  --output results/rtdetr/metrics.json
```

Install the appropriate `paddlepaddle-gpu` or `paddlepaddle` wheel per https://www.paddlepaddle.org.cn/. Use `--device cpu` if only CPU wheels are available.
## Results Summary

| Method | Dataset Split | mAP@0.5 | mAP@0.5:0.95 | Precision | Recall | FPS | Notes |
|---|---|---|---|---|---|---|---|
| YOLOv8n | data/yolo_data/val | 0.7728 | 0.4792 | 0.8144 | 0.7004 | 95.0 | ByteTrack tracking; MP4: results/yolo/output_annotated.mp4 |
| RT-DETR-L | data/yolo_data/val | 0.5630 | 0.3034 | 0.6248 | 0.5692 | 30.9 | Transformer detector; MP4: results/rtdetr/output_annotated.mp4 |
| OmniVLM (WIP) | omnivlm_multiset | n/a | n/a | n/a | n/a | — | JSON predictions via Nexa SDK; integrate with tracker for zone analytics |

Reproduce the table with:

```bash
python results/compare_models.py --yolo-metrics YOLO/reports/yolov8n/metrics.json --rtdetr-metrics RT_DETR/reports/rtdetrl/metrics.json --output results/comparison
```
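For a quick look at the numbers without the comparison script, something like the following works, assuming each `metrics.json` exposes flat keys such as `map50`, `map`, `precision`, `recall`, and `fps` (check the actual schema the eval scripts emit):

```python
import json

runs = {
    "YOLOv8n": "YOLO/reports/yolov8n/metrics.json",
    "RT-DETR-L": "RT_DETR/reports/rtdetrl/metrics.json",
}
for name, path in runs.items():
    with open(path) as f:
        m = json.load(f)
    print(f"{name:10s} mAP@0.5={m.get('map50')}  mAP@0.5:0.95={m.get('map')}  "
          f"P={m.get('precision')}  R={m.get('recall')}  FPS={m.get('fps')}")
```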
## Commands Cheat Sheet

- Environment

  ```bash
  conda create -n fence python=3.13 -y
  conda activate fence
  pip install -r requirements.txt
  ```

- Torch GPU setup

  ```bash
  pip uninstall -y torch torchvision torchaudio
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
  ```

- Data consolidation

  ```bash
  python scripts/data_setup.py --crowdhuman-dir d:/datasets/crowdhuman --mot17-dir d:/datasets/mot17 --custom-zip-root data/custom/annotations --custom-output d:/datasets/custom_data
  python YOLO/prepare_yolo_dataset.py --output-dir data/yolo_data --custom-dir data/custom/custom_data --crowdhuman-root data/crowdhuman --mot-root data/MOT17
  ```

- Evaluation helpers

  ```bash
  python scripts/yolo_eval.py --model yolov8n.pt --data d:/datasets/virtual_fence/virtual_fence.yaml --split val --output results/yolo/yolov8n_metrics.json
  python scripts/rtdetr_eval.py --config specs/rtdetr/rtdetr_config.yml --weights d:/models/rtdetr_weights.pdparams --output results/rtdetr/metrics.json --overrides EvalReader.dataset.dataset_dir=data/yolo_data
  python OmniVLM/generate_manifest.py --custom-root data/custom/custom_data --crowdhuman-root data/crowdhuman --mot17-root data/MOT17 --custom-annotations data/custom/annotations --output OmniVLM/omnivlm_multiset.jsonl
  ```

- Nexa CLI basics

  ```bash
  nexa pull NexaAI/OmniVLM-968M --model-type vlm
  nexa serve --host 127.0.0.1:18181
  nexa run NexaAI/OmniVLM-968M
  ```
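Once `nexa serve` is running, OmniVLM can be queried over HTTP for per-frame JSON predictions (the third results-table row). The endpoint path and payload below follow an OpenAI-style chat API; treat both as assumptions and confirm them against the Nexa SDK documentation for your installed version.

```python
import base64
import requests

# Encode one extracted frame as a data URL.
with open("data/custom/frames/frame_00001.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "NexaAI/OmniVLM-968M",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Count the people in this frame and reply as JSON: {\"person_count\": <int>}."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
}

# Assumed OpenAI-compatible endpoint for the server started with `nexa serve --host 127.0.0.1:18181`.
resp = requests.post("http://127.0.0.1:18181/v1/chat/completions", json=payload, timeout=120)
print(resp.json())
```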
## References

- https://huggingface.co/NexaAI/OmniVLM-968M
- https://colab.research.google.com/#fileId=https://huggingface.co/NexaAI/OmniVLM-968M.ipynb
- https://www.crowdhuman.org/download.html
- https://huggingface.co/datasets/sshao0516/CrowdHuman
- https://github.com/sunsmarterjie/yolov12
- https://yolov12.com/
- https://docs.ultralytics.com/models/yolo12/
- https://github.com/cvat-ai/cvat
- https://www.cvat.ai/
- https://docs.cvat.ai/docs/administration/basics/installation/
- https://docs.ultralytics.com/models/rtdetr/
- https://github.com/lyuwenyu/RT-DETR
- https://huggingface.co/NexaAI/OmniVLM-968M/discussions/4
- https://github.com/NexaAI/nexa-sdk?tab=readme-ov-file#install-option-1-executable-installer
- https://motchallenge.net/data/MOT17.zip