
🎯 YOLOv8 Product Detection

A production-ready object detection system that uses YOLOv8 to detect retail products in images.



📋 Table of Contents

  1. Problem Statement
  2. Solution Overview
  3. Project Structure
  4. System Architecture
  5. Installation
  6. Dataset Preparation
  7. Training
  8. Evaluation
  9. Inference
  10. Model Export
  11. Docker Deployment
  12. Results
  13. Future Improvements

🎯 Problem Statement

Business Context

Retail and e-commerce companies need automated systems to:

  • Inventory Management: Automatically count and categorize products
  • Quality Assurance: Verify product placement and arrangement
  • Smart Checkout: Enable cashier-less checkout systems
  • Shelf Analytics: Monitor product availability in real-time

Technical Challenge

Develop an object detection system that can:

  • Detect 3 product categories based on size (small, medium, large)
  • Achieve high accuracy (mAP50 > 0.8) suitable for production
  • Run inference in real-time (< 50ms per image)
  • Export to optimized formats (ONNX, TorchScript) for deployment

💡 Solution Overview

Approach

  • Model: YOLOv8n (Nano) - the fastest variant, with a strong speed/accuracy tradeoff
  • Architecture: CSPDarknet backbone + PANet neck + Decoupled head
  • Training: Transfer learning from COCO pretrained weights
  • Optimization: Adam optimizer with cosine annealing LR schedule

Key Features

✅ Modular, production-ready codebase
✅ Comprehensive dataset pipeline (download → convert → split)
✅ Configurable hyperparameters via CLI
✅ Visualization tools (confusion matrix, PR curves)
✅ Multi-format export (ONNX, TorchScript)
✅ Docker containerization for deployment


๐Ÿ“ Project Structure

product-detection-yolov8/
├── 📄 README.md                 # This documentation
├── 📄 requirements.txt          # Python dependencies
├── 🐳 Dockerfile                # Container configuration
│
├── 📂 src/                      # Source code
│   ├── __init__.py              # Package initialization
│   ├── utils.py                 # Utility functions
│   ├── download_dataset.py      # Dataset downloader
│   ├── convert_to_yolo.py       # Annotation converter
│   ├── split_dataset.py         # Train/val splitter
│   ├── train.py                 # Training pipeline
│   ├── evaluate.py              # Evaluation & visualization
│   └── inference.py             # Inference engine
│
├── 📂 data/                     # Dataset directory (generated)
│   ├── raw/                     # Raw downloaded data
│   ├── converted/               # YOLO format data
│   ├── train/                   # Training set
│   │   ├── images/
│   │   └── labels/
│   ├── val/                     # Validation set
│   │   ├── images/
│   │   └── labels/
│   └── data.yaml                # Dataset configuration
│
├── 📂 runs/                     # Training runs (generated)
│   └── train_YYYYMMDD_HHMMSS/
│       ├── weights/
│       │   ├── best.pt
│       │   └── last.pt
│       └── results.csv
│
├── 📂 output/                   # Evaluation & inference results (generated)
│   ├── confusion_matrix.png
│   ├── pr_curve.png
│   └── predictions/
│
└── 📂 logs/                     # Log files (generated)
    └── yolov8_YYYYMMDD.log
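The data.yaml referenced above is an Ultralytics-style dataset config. A plausible version for this project's three size classes (the paths and exact key values are assumptions based on the directory layout, not copied from the repo):

```yaml
path: data            # dataset root
train: train/images   # training images, relative to path
val: val/images       # validation images, relative to path
nc: 3                 # number of classes
names: [product_small, product_medium, product_large]
```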

๐Ÿ—๏ธ System Architecture

┌──────────────────────────────────────────────────────────────────┐
│                     DATA PIPELINE                                │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐        │
│  │  HuggingFace │    │   Convert    │    │    Split     │        │
│  │   Dataset    │───▶│  to YOLO     │───▶│  Train/Val   │        │
│  │   Download   │    │   Format     │    │   (80/20)    │        │
│  └──────────────┘    └──────────────┘    └──────────────┘        │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│                    TRAINING PIPELINE                             │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐        │
│  │   YOLOv8n    │    │    Train     │    │   Export     │        │
│  │  Pretrained  │───▶│   20 epochs  │───▶│  ONNX/TS     │        │
│  │   Weights    │    │   Batch 16   │    │              │        │
│  └──────────────┘    └──────────────┘    └──────────────┘        │
│                                                                  │
│  Hyperparameters:                                                │
│  • Optimizer: Adam (lr=0.001)                                    │
│  • Image Size: 640x640                                           │
│  • Early Stopping: patience=5                                    │
│  • LR Scheduler: Cosine Annealing                                │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│                   INFERENCE PIPELINE                             │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐        │
│  │    Input     │    │   YOLOv8     │    │   Output     │        │
│  │    Image     │───▶│  Inference   │───▶│  Detections  │        │
│  │              │    │  best.pt     │    │  + Viz       │        │
│  └──────────────┘    └──────────────┘    └──────────────┘        │
│                                                                  │
│  Supported Sources:                                              │
│  • Single image (jpg, png)                                       │
│  • Folder of images                                              │
│  • Webcam (real-time)                                            │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

🚀 Installation

Prerequisites

  • Python 3.10+
  • CUDA 11.8+ (for GPU training)
  • 8GB+ RAM
  • 10GB+ disk space

Setup

# Clone the repository and enter it
git clone <repository-url>
cd product-detection-yolov8

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/macOS
# or: .\venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

Verify Installation

python -c "from ultralytics import YOLO; print('YOLOv8 OK')"
python -c "import torch; print(f'PyTorch OK, CUDA: {torch.cuda.is_available()}')"

📦 Dataset Preparation

Option 1: Download from HuggingFace

python src/download_dataset.py --dataset sku110k --output data/raw

Supported datasets:

  • sku110k - Retail store shelf images
  • grocery - Grocery store products
  • cppe-5 - Personal protective equipment

Option 2: Use Custom Dataset

Place your data in YOLO format:

data/custom/
โ”œโ”€โ”€ images/
โ”‚   โ”œโ”€โ”€ img001.jpg
โ”‚   โ””โ”€โ”€ img002.jpg
โ””โ”€โ”€ labels/
    โ”œโ”€โ”€ img001.txt  # class_id x_center y_center width height (normalized)
    โ””โ”€โ”€ img002.txt
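Each labels/*.txt file holds one object per line in the normalized format noted above. A minimal reader sketch (the helper name is hypothetical, not part of this repo's API):

```python
def parse_yolo_label(line):
    """Parse one 'class_id x_center y_center width height' label line.

    All four box values are normalized to [0, 1] relative to image size.
    """
    fields = line.split()
    class_id = int(fields[0])
    x_center, y_center, width, height = (float(v) for v in fields[1:5])
    return class_id, (x_center, y_center, width, height)
```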

Convert Annotations (if needed)

# From COCO format
python src/convert_to_yolo.py --input data/raw --format coco

# From Pascal VOC format
python src/convert_to_yolo.py --input data/raw --format voc
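The COCO path has to remap box coordinates: COCO stores absolute top-left [x_min, y_min, width, height] in pixels, while YOLO wants normalized centers. The core arithmetic looks like this (a sketch of the conversion, not the converter script's actual internals):

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, width, height] in pixels
    to a YOLO box (x_center, y_center, width, height), normalized to [0, 1]."""
    x_min, y_min, w, h = bbox
    return ((x_min + w / 2) / img_w,  # center x, normalized
            (y_min + h / 2) / img_h,  # center y, normalized
            w / img_w,                # width, normalized
            h / img_h)                # height, normalized
```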

Split Dataset

python src/split_dataset.py --input data/converted --train-ratio 0.8
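The essential splitting logic is a seeded shuffle of the image/label pairs followed by an 80/20 cut; a sketch (the script's real interface may differ):

```python
import random

def split_files(stems, train_ratio=0.8, seed=42):
    """Deterministically shuffle file stems, then split them into train/val."""
    stems = sorted(stems)                 # stable starting order
    random.Random(seed).shuffle(stems)    # seeded => reproducible split
    cut = int(len(stems) * train_ratio)
    return stems[:cut], stems[cut:]
```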

๐Ÿ‹๏ธ Training

Basic Training

python src/train.py

Custom Configuration

python src/train.py \
    --epochs 50 \
    --batch 32 \
    --imgsz 640 \
    --model yolov8s.pt \
    --optimizer Adam \
    --lr 0.001 \
    --patience 10

Training Options

| Parameter | Default | Description |
|-----------|---------|-------------|
| --model | yolov8n.pt | Pretrained model (n/s/m/l/x) |
| --epochs | 20 | Training epochs |
| --batch | 16 | Batch size |
| --imgsz | 640 | Image size |
| --optimizer | Adam | Optimizer (Adam/SGD/AdamW) |
| --lr | 0.001 | Initial learning rate |
| --patience | 5 | Early stopping patience |
| --export | None | Export format (onnx/torchscript) |

Resume Training

python src/train.py --resume runs/train_latest/weights/last.pt

📊 Evaluation

Run Evaluation

python src/evaluate.py --model best.pt

Custom Evaluation

python src/evaluate.py \
    --model runs/train_20240101_120000/weights/best.pt \
    --data data/data.yaml \
    --conf 0.25 \
    --iou 0.45
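--iou sets the intersection-over-union threshold used when matching boxes; IoU itself is intersection area divided by union area for a pair of boxes. A minimal sketch for [x1, y1, x2, y2] corner boxes:

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])  # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])  # intersection bottom-right
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```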

Generated Outputs

| File | Description |
|------|-------------|
| confusion_matrix.png | Class-wise prediction accuracy |
| pr_curve.png | Precision-Recall curve |
| f1_curve.png | F1 score vs confidence |
| class_performance.png | Per-class metrics bar chart |
| metrics.json | Numeric metrics (mAP, precision, recall) |
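The precision and recall values written to metrics.json are the standard detection ratios, computed from true positives, false positives, and false negatives after matching at the chosen conf/IoU thresholds:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)
```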

๐Ÿ” Inference

Single Image

python src/inference.py --source image.jpg

Folder of Images

python src/inference.py --source ./images/ --conf 0.5

Webcam (Real-time)

python src/inference.py --source webcam --show

Inference Options

| Parameter | Default | Description |
|-----------|---------|-------------|
| --source | (required) | Image path, folder, or 'webcam' |
| --model | best.pt | Trained model path |
| --conf | 0.25 | Confidence threshold |
| --iou | 0.45 | NMS IoU threshold |
| --imgsz | 640 | Inference image size |
| --show | False | Display results in window |
| --no-save | False | Don't save output images |
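--conf discards low-scoring boxes, while --iou drives non-maximum suppression: among heavily overlapping boxes, only the highest-scoring one survives. Greedy NMS in miniature (illustrative only, not Ultralytics' implementation):

```python
def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS: keep highest-scoring boxes, drop overlaps above iou_thresh.

    boxes are [x1, y1, x2, y2]; returns indices of kept boxes.
    """
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)            # best remaining box wins
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep
```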

📤 Model Export

Export during Training

python src/train.py --export onnx,torchscript

Export Existing Model

from ultralytics import YOLO

model = YOLO('best.pt')

# ONNX (for inference servers)
model.export(format='onnx', dynamic=True, simplify=True)

# TorchScript (for C++ inference)
model.export(format='torchscript')

# TensorRT (NVIDIA GPUs)
model.export(format='engine', device=0)

# CoreML (Apple devices)
model.export(format='coreml')

Export Formats

| Format | File | Use Case |
|--------|------|----------|
| ONNX | best.onnx | Cross-platform deployment |
| TorchScript | best.torchscript | PyTorch production |
| TensorRT | best.engine | NVIDIA GPU optimization |
| CoreML | best.mlmodel | iOS/macOS apps |
| OpenVINO | best_openvino/ | Intel hardware |

๐Ÿณ Docker Deployment

Build Image

# CPU version
docker build -t yolov8-product-detection --target production .

# GPU version (requires NVIDIA Container Toolkit)
docker build -t yolov8-product-detection:gpu --target gpu .

Run Training

docker run --gpus all \
    -v $(pwd)/data:/app/data \
    -v $(pwd)/runs:/app/runs \
    yolov8-product-detection:gpu \
    python src/train.py --epochs 20

Run Inference

docker run --gpus all \
    -v $(pwd)/test_images:/app/images \
    -v $(pwd)/results:/app/output \
    yolov8-product-detection:gpu \
    python src/inference.py --source /app/images

Docker Compose (Optional)

version: '3.8'
services:
  training:
    build:
      context: .
      target: gpu
    volumes:
      - ./data:/app/data
      - ./runs:/app/runs
    command: python src/train.py
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

📈 Results

Training Results (Example)

| Metric | Value |
|--------|-------|
| mAP50 | 0.85 |
| mAP50-95 | 0.62 |
| Precision | 0.88 |
| Recall | 0.82 |
| Inference Time | 12 ms (RTX 3080) |

Class Performance

| Class | Precision | Recall | mAP50 |
|-------|-----------|--------|-------|
| product_small | 0.82 | 0.78 | 0.80 |
| product_medium | 0.90 | 0.85 | 0.88 |
| product_large | 0.92 | 0.83 | 0.87 |

Training Curves

The training process generates loss curves and metric plots in the runs/ directory:

  • results.png - Training & validation loss curves
  • confusion_matrix.png - Validation confusion matrix
  • P_curve.png - Precision curve
  • R_curve.png - Recall curve
  • PR_curve.png - Precision-Recall curve

🔮 Future Improvements

Short-term

  • Add data augmentation pipeline (Albumentations)
  • Implement hyperparameter optimization (Optuna)
  • Add model ensemble support
  • Create REST API endpoint (FastAPI)

Medium-term

  • Implement real-time video processing
  • Add tracking (ByteTrack, BoT-SORT)
  • Support for instance segmentation
  • Mobile deployment (TFLite, CoreML)

Long-term

  • Active learning pipeline
  • Multi-camera support
  • Edge deployment (Jetson, Raspberry Pi)
  • Integration with inventory management systems

๐Ÿ“ License

This project is licensed under the MIT License - see the LICENSE file for details.


๐Ÿ™ Acknowledgments


📧 Contact

For questions or support, please open an issue on GitHub.


Happy Detecting! 🎯
