Virtual Fence Benchmark Suite

Unified pipeline for the Virtual Fence project—detect, track, and count people entering a protected zone using YOLO, OmniVLM, and RT-DETR. This README consolidates the specifications under specs/ and the task briefs in task/ into a single reference, including hands-on guidance for dataset sourcing, CVAT annotation, Nexa SDK setup, and model benchmarking.

Table of Contents

  1. Project Specs & Task Overview
  2. Environment & Tooling
  3. Dataset Preparation
  4. Annotation Workflow (CVAT)
  5. Model Workflows
  6. Results Summary
  7. Commands Cheat Sheet
  8. Raspberry Pi / Edge Notes
  9. Troubleshooting
  10. References

Project Specs & Task Overview

  • 📄 Specifications: specs/virtual_fence_task_spec.md defines objectives, datasets, metrics, and success criteria.
  • 📥 Task briefs: task/VirtualFence.md and task/Person_Counting_Task.pdf describe the zone-counting deliverable.
  • Key requirements:
    • Combine CrowdHuman, MOT17, and custom annotated clips.
    • Implement YOLO, OmniVLM, and a custom detector (RT-DETR).
    • Produce annotated output video with zone overlay and entry counter.
    • Benchmark all methods on the same clips: mAP, IDF1/MOTA, counting MAE, FPS.
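
Of these metrics, counting MAE is the one specific to the zone-counting deliverable: average the absolute difference between predicted and ground-truth entry counts across the evaluation clips. A minimal sketch (clip names and counts are placeholders, not project data):

def counting_mae(predicted: dict[str, int], ground_truth: dict[str, int]) -> float:
    # Mean absolute error between per-clip predicted and annotated zone-entry counts.
    return sum(abs(predicted.get(clip, 0) - count) for clip, count in ground_truth.items()) / len(ground_truth)

# Hypothetical per-clip entry counts, for illustration only.
print(counting_mae({"clip_001": 12, "clip_002": 7}, {"clip_001": 10, "clip_002": 7}))  # -> 1.0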

Environment & Tooling

Prerequisites:

  • Python 3.13 (conda recommended).
  • NVIDIA GPU + CUDA toolkit for accelerated training (optional but preferred).
  • Nexa SDK for OmniVLM workflows.
  • Docker (for CVAT) and FFmpeg for video processing.

Set up the base environment:

conda create -n fence python=3.13 -y
conda activate fence
pip install -r requirements.txt

# Install CUDA-enabled torch if you have an RTX GPU
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

# Windows FFmpeg install
choco install ffmpeg
ffmpeg -version
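
After installing the CUDA-enabled torch wheel, a quick sanity check (plain PyTorch calls) confirms the GPU build is active before launching any training run:

import torch

# Expect a +cu124 build string and True when the CUDA wheel and NVIDIA driver are set up;
# a CPU-only wheel prints False, in which case train and evaluate with --device cpu.
print(torch.__version__)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))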

Additional tooling:

  • yt-dlp for pulling the YouTube clips listed in data/custom/custom_data.csv.
  • CVAT (run via Docker Compose) for annotation; see the Annotation Workflow section.
  • Nexa CLI (installed with the Nexa SDK) for pulling, serving, and running OmniVLM.

Dataset Preparation

  1. Custom video ingestion (optional but recommended):

    • Source 50–100 permissively licensed clips (5–10 s) from YouTube or Pexels. Suggested keywords: crowded street, pedestrian crosswalk, festival crowd.

    • Record URLs in data/custom/custom_data.csv—include source, url, license, and notes columns. For YouTube, use yt-dlp; for Pexels, download MP4 assets directly.

    • Extract frames with FFmpeg (2 FPS baseline) to stabilize annotation quality:

      ffmpeg -i input.mp4 -vf "fps=2,scale=w=min(1280\,iw):h=-2" frames/frame_%05d.jpg
    • Consolidate downloads via helper scripts:

      python scripts/download_custom_videos.py `
        --csv data/custom/custom_data.csv `
        --output-dir data/custom/videos `
        --zip-path data/custom/custom_clips.zip
      
      python scripts/prep_custom_clips.py `
        --input-dir data/custom/videos `
        --frames-dir data/custom/frames `
        --zip-dir data/custom/zips `
        --fps 2 --max-dim 1280
      
      python scripts/rename_custom_zips.py --zip-dir data/custom/zips
  2. Mirror public datasets (CrowdHuman, MOT17) using the automated setup:

    python scripts/data_setup.py `
      --crowdhuman-dir d:\datasets\crowdhuman `
      --mot17-dir d:\datasets\mot17 `
      --custom-zip-root d:\Projects\VirtualFence\data\annotations `
      --custom-output d:\datasets\custom_data
  3. Unify into YOLO layout for consistent training/validation splits across YOLOv8n, RT-DETR-L, and OmniVLM prompt conditioning (the label format this produces is sketched after the Colab note below):

    python YOLO/prepare_yolo_dataset.py `
      --output-dir data/yolo_data `
      --custom-dir data/custom/custom_data `
      --custom-ratio 0.8 0.1 0.1 `
      --crowdhuman-root data/crowdhuman `
      --crowdhuman-ratio 0.8 0.1 0.1 `
      --mot-root data/MOT17 `
      --mot-ratio 0.9 0.1 0.0 `
      --seed 42

Colab note: run the same commands in /content/VirtualFence, train on GPU (--device 0), then zip YOLO/runs/detect/... back to Drive for local evaluation.
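
The YOLO layout stores one plain-text label file per image, each row being class x_center y_center width height normalized to [0, 1]. If you need to spot-check a label or hand-convert a pixel-space box (for example from a COCO 1.0 export), a minimal sketch of the conversion (paths and class id are illustrative, not taken from prepare_yolo_dataset.py):

def pixel_box_to_yolo_row(box, img_w, img_h, class_id=0):
    # box is (x_min, y_min, width, height) in pixels, as in a COCO 1.0 export;
    # YOLO rows use the box center and size, normalized by the image dimensions.
    x, y, w, h = box
    return f"{class_id} {(x + w / 2) / img_w:.6f} {(y + h / 2) / img_h:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# Example: a 100x200 px person box at (50, 80) in a 1280x720 frame.
print(pixel_box_to_yolo_row((50, 80, 100, 200), 1280, 720))  # -> 0 0.078125 0.250000 0.078125 0.277778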

Annotation Workflow (CVAT)

  1. Spin up CVAT locally (Docker Compose) or connect to an existing instance with RBAC enabled.

  2. Import media: upload FFmpeg-extracted frames (*.jpg) or the raw videos for interpolation-assisted labeling.

  3. Label configuration:

    • Create a single person class with attributes occluded and truncated.
    • Use the Rectangle tool with automatic interpolation between keyframes.
  4. Annotation formats:

    • Export detection labels as COCO 1.0 for YOLO/RT-DETR fine-tuning.
    • Export tracking labels as MOT 1.1 to evaluate IDF1/MOTA and to bootstrap ByteTrack/Hungarian matchers (the line format is sketched after this list).
  5. Quality checks: leverage CVAT’s review mode to flag low-confidence annotations; sync reviewed exports to data/annotations/cvat_exports/.

  6. Dataset sync: run python scripts/data_setup.py --custom-zip-root data/annotations/cvat_exports to fold reviewed annotations into the unified dataset.
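
The MOT 1.1 export is a plain text file with one box per line in the usual MOTChallenge order (frame, track id, left, top, width, height, confidence, class, visibility). A minimal parsing sketch; verify the column order against your actual CVAT export before relying on it:

def parse_mot_line(line: str) -> dict:
    # frame, track_id, bb_left, bb_top, bb_width, bb_height, conf, class, visibility
    frame, track_id, x, y, w, h, conf, cls, vis = line.strip().split(",")[:9]
    return {
        "frame": int(frame),
        "track_id": int(track_id),
        "bbox": (float(x), float(y), float(w), float(h)),
        "conf": float(conf),
        "class": int(cls),
        "visibility": float(vis),
    }

print(parse_mot_line("1,3,794.27,247.59,71.24,174.00,1,1,1.0"))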


Model Workflows

YOLO Pipeline

Training (GPU recommended):

python YOLO/train_yolo.py `
  --model yolov8n.pt `
  --data YOLO/virtual_fence.yaml `
  --epochs 100 `
  --imgsz 640 `
  --device 0 `
  --name virtfence_yolov8n

Evaluation, metrics, and visualizations:

python YOLO/evaluate_and_visualize.py `
  --model YOLO/runs/virtfence_yolov8n/weights/best.pt `
  --data YOLO/virtual_fence.yaml `
  --split val `
  --output-dir YOLO/reports/yolov8n `
  --device 0 `
  --n-samples 20

Zone-count inference (outputs annotated MP4 with live counter):

python YOLO/zone_counter.py `
  --model YOLO/runs/virtfence_yolov8n/weights/best.pt `
  --source data/input.mp4 `
  --zone-config config/fence_zone.yaml `
  --output results/yolo/output_annotated.mp4 `
  --device 0
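
Conceptually, the counter tracks each person's reference point (the bottom-center of the box works well as a ground position) and increments once per outside-to-inside transition of the fence polygon. A minimal sketch of that idea using shapely, not the actual zone_counter.py implementation; the zone coordinates are placeholders for whatever config/fence_zone.yaml defines:

from shapely.geometry import Point, Polygon

# Placeholder fence polygon in pixel coordinates (in practice loaded from config/fence_zone.yaml).
ZONE = Polygon([(100, 400), (600, 400), (600, 700), (100, 700)])

inside_last_frame = {}  # track_id -> whether the person was inside the zone on the previous frame
entry_count = 0

def update(track_id, bbox):
    """Call once per track per frame with an (x1, y1, x2, y2) box from the tracker (e.g. ByteTrack)."""
    global entry_count
    x1, y1, x2, y2 = bbox
    foot = Point((x1 + x2) / 2, y2)  # bottom-center approximates where the person stands
    inside = ZONE.contains(foot)
    if inside and not inside_last_frame.get(track_id, False):
        entry_count += 1             # count only the outside -> inside transition
    inside_last_frame[track_id] = inside

update(7, (300, 200, 360, 500))
print(entry_count)  # -> 1

Stable track IDs are what make the transition test meaningful; a short debounce (require the point to stay inside for a few frames) helps avoid double counts from boxes that jitter on the zone boundary.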

Export ONNX for edge deployment:

python YOLO/export_yolo_to_onnx.py `
  --weights YOLO/runs/virtfence_yolov8n/weights/best.pt `
  --output YOLO/exports/virtfence_yolov8n.onnx `
  --imgsz 640 `
  --dynamic
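
On the target device, the exported model can be run with onnxruntime alone (plus OpenCV for I/O). A minimal load-and-run sketch; the frame path is a placeholder, preprocessing is a plain resize rather than letterboxing, and output decoding/NMS are model-specific and omitted:

import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "YOLO/exports/virtfence_yolov8n.onnx",
    providers=["CPUExecutionProvider"],  # use CUDAExecutionProvider where available
)
input_name = session.get_inputs()[0].name

frame = cv2.imread("data/custom/frames/frame_00001.jpg")        # placeholder frame path
rgb = cv2.cvtColor(cv2.resize(frame, (640, 640)), cv2.COLOR_BGR2RGB)
blob = rgb.transpose(2, 0, 1)[None].astype(np.float32) / 255.0  # HWC -> NCHW, scale to [0, 1]

outputs = session.run(None, {input_name: blob})
print(outputs[0].shape)  # raw detection tensor; decode boxes/scores and apply NMS downstream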

RT-DETR (PaddleDetection)

python scripts/rtdetr_eval.py `
  --config specs/rtdetr/rtdetr_config.yml `
  --weights d:/models/rtdetr_weights.pdparams `
  --overrides EvalReader.dataset.dataset_dir=data/yolo_data `
  --device gpu `
  --output results/rtdetr/metrics.json

Install the appropriate paddlepaddle-gpu or paddlepaddle wheel per https://www.paddlepaddle.org.cn/. Use --device cpu if only CPU wheels are available.


Results Summary

Method | Dataset Split | mAP@0.5 | mAP@0.5:0.95 | Precision | Recall | FPS | Notes
YOLOv8n | data/yolo_data/val | 0.7728 | 0.4792 | 0.8144 | 0.7004 | 95.0 | ByteTrack tracking; MP4: results/yolo/output_annotated.mp4
RT-DETR-L | data/yolo_data/val | 0.5630 | 0.3034 | 0.6248 | 0.5692 | 30.9 | Transformer detector; MP4: results/rtdetr/output_annotated.mp4
OmniVLM (WIP) | omnivlm_multiset | n/a | n/a | n/a | n/a | | JSON predictions via Nexa SDK; integrate with tracker for zone analytics

Reproduce the table by running python results/compare_models.py --yolo-metrics YOLO/reports/yolov8n/metrics.json --rtdetr-metrics RT_DETR/reports/rtdetrl/metrics.json --output results/comparison.


Commands Cheat Sheet

  • Environment

    conda create -n fence python=3.13 -y
    conda activate fence
    pip install -r requirements.txt
  • Torch GPU setup

    pip uninstall -y torch torchvision torchaudio
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
  • Data consolidation

    python scripts/data_setup.py --crowdhuman-dir d:/datasets/crowdhuman --mot17-dir d:/datasets/mot17 --custom-zip-root data/custom/annotations --custom-output d:/datasets/custom_data
    python YOLO/prepare_yolo_dataset.py --output-dir data/yolo_data --custom-dir data/custom/custom_data --crowdhuman-root data/crowdhuman --mot-root data/MOT17
  • Evaluation helpers

    python scripts/yolo_eval.py --model yolov8n.pt --data d:/datasets/virtual_fence/virtual_fence.yaml --split val --output results/yolo/yolov8n_metrics.json
    python scripts/rtdetr_eval.py --config specs/rtdetr/rtdetr_config.yml --weights d:/models/rtdetr_weights.pdparams --output results/rtdetr/metrics.json --overrides EvalReader.dataset.dataset_dir=data/yolo_data
    python OmniVLM/generate_manifest.py --custom-root data/custom/custom_data --crowdhuman-root data/crowdhuman --mot17-root data/MOT17 --custom-annotations data/custom/annotations --output OmniVLM/omnivlm_multiset.jsonl
  • Nexa CLI basics

    nexa pull NexaAI/OmniVLM-968M --model-type vlm
    nexa serve --host 127.0.0.1:18181
    nexa run NexaAI/OmniVLM-968M
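
To drive OmniVLM from Python once the server is up, the sketch below assumes nexa serve exposes an OpenAI-compatible /v1/chat/completions route at the host/port shown above; that route and the payload shape are assumptions to verify against the Nexa SDK docs for your version:

import base64
import requests

# Placeholder frame path; any extracted frame will do.
with open("data/custom/frames/frame_00001.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "NexaAI/OmniVLM-968M",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "How many people are in this frame? Answer with a number."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
}

# Assumed OpenAI-compatible endpoint; adjust the route/payload if your Nexa SDK version differs.
resp = requests.post("http://127.0.0.1:18181/v1/chat/completions", json=payload, timeout=120)
print(resp.json())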

References
