Production-ready Object Detection system using YOLOv8 for detecting retail products in images.
## Table of Contents

- Problem Statement
- Solution Overview
- Project Structure
- System Architecture
- Installation
- Dataset Preparation
- Training
- Evaluation
- Inference
- Model Export
- Docker Deployment
- Results
- Future Improvements
## Problem Statement

Retail and e-commerce companies need automated systems to:
- **Inventory Management**: Automatically count and categorize products
- **Quality Assurance**: Verify product placement and arrangement
- **Smart Checkout**: Enable cashier-less checkout systems
- **Shelf Analytics**: Monitor product availability in real time

Develop an object detection system that can:

- Detect 3 product categories based on size (small, medium, large)
- Achieve high accuracy (mAP50 > 0.8) suitable for production
- Run inference in real time (< 50 ms per image)
- Export to optimized formats (ONNX, TorchScript) for deployment
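The real-time requirement is worth verifying empirically. A minimal timing harness (pure Python; the workload lambda is a stand-in — in practice you would time `model(image)` calls on your target hardware):

```python
import time

def measure_latency(fn, *args, warmup=3, runs=20):
    """Return average wall-clock time per call in milliseconds."""
    for _ in range(warmup):          # warm up caches / GPU kernels first
        fn(*args)
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - start) / runs * 1000.0

# Stand-in workload; replace with e.g. lambda: model(image) in practice.
latency_ms = measure_latency(lambda: sum(range(10_000)))
print(f"avg latency: {latency_ms:.2f} ms")
```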
## Solution Overview

- **Model**: YOLOv8n (nano), the fastest variant with a strong speed/accuracy tradeoff
- **Architecture**: CSPDarknet backbone + PANet neck + decoupled head
- **Training**: Transfer learning from COCO-pretrained weights
- **Optimization**: Adam optimizer with a cosine annealing LR schedule
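For reference, cosine annealing sweeps the learning rate from its initial value down to a floor over the run. A minimal sketch of the schedule's arithmetic (not Ultralytics' internal implementation; `lr0` and the epoch count mirror the defaults used here):

```python
import math

def cosine_lr(epoch, total_epochs, lr0=0.001, lr_final=0.0):
    """Cosine-annealed learning rate for a given epoch (0-indexed)."""
    cos = 0.5 * (1 + math.cos(math.pi * epoch / max(total_epochs - 1, 1)))
    return lr_final + (lr0 - lr_final) * cos

# First, middle, and last epoch of a 20-epoch run:
print([round(cosine_lr(e, 20), 5) for e in (0, 10, 19)])
# → [0.001, 0.00046, 0.0]
```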
### Key Features

- ✅ Modular, production-ready codebase
- ✅ Comprehensive dataset pipeline (download → convert → split)
- ✅ Configurable hyperparameters via CLI
- ✅ Visualization tools (confusion matrix, PR curves)
- ✅ Multi-format export (ONNX, TorchScript)
- ✅ Docker containerization for deployment
## Project Structure

```
product-detection-yolov8/
├── README.md                 # This documentation
├── requirements.txt          # Python dependencies
├── Dockerfile                # Container configuration
│
├── src/                      # Source code
│   ├── __init__.py           # Package initialization
│   ├── utils.py              # Utility functions
│   ├── download_dataset.py   # Dataset downloader
│   ├── convert_to_yolo.py    # Annotation converter
│   ├── split_dataset.py      # Train/val splitter
│   ├── train.py              # Training pipeline
│   ├── evaluate.py           # Evaluation & visualization
│   └── inference.py          # Inference engine
│
├── data/                     # Dataset directory (generated)
│   ├── raw/                  # Raw downloaded data
│   ├── converted/            # YOLO format data
│   ├── train/                # Training set
│   │   ├── images/
│   │   └── labels/
│   ├── val/                  # Validation set
│   │   ├── images/
│   │   └── labels/
│   └── data.yaml             # Dataset configuration
│
├── runs/                     # Training runs (generated)
│   └── train_YYYYMMDD_HHMMSS/
│       ├── weights/
│       │   ├── best.pt
│       │   └── last.pt
│       └── results.csv
│
├── output/                   # Evaluation & inference results (generated)
│   ├── confusion_matrix.png
│   ├── pr_curve.png
│   └── predictions/
│
└── logs/                     # Log files (generated)
    └── yolov8_YYYYMMDD.log
```
## System Architecture

```
┌──────────────────────────────────────────────────────────┐
│                      DATA PIPELINE                       │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐  │
│  │ HuggingFace  │   │   Convert    │   │    Split     │  │
│  │   Dataset    │──▶│   to YOLO    │──▶│  Train/Val   │  │
│  │   Download   │   │    Format    │   │   (80/20)    │  │
│  └──────────────┘   └──────────────┘   └──────────────┘  │
│                                                          │
└──────────────────────────────────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────┐
│                    TRAINING PIPELINE                     │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐  │
│  │   YOLOv8n    │   │    Train     │   │    Export    │  │
│  │  Pretrained  │──▶│  20 epochs   │──▶│   ONNX/TS    │  │
│  │   Weights    │   │   Batch 16   │   │              │  │
│  └──────────────┘   └──────────────┘   └──────────────┘  │
│                                                          │
│  Hyperparameters:                                        │
│   • Optimizer: Adam (lr=0.001)                           │
│   • Image Size: 640x640                                  │
│   • Early Stopping: patience=5                           │
│   • LR Scheduler: Cosine Annealing                       │
│                                                          │
└──────────────────────────────────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────┐
│                    INFERENCE PIPELINE                    │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐  │
│  │    Input     │   │    YOLOv8    │   │    Output    │  │
│  │    Image     │──▶│  Inference   │──▶│  Detections  │  │
│  │              │   │   best.pt    │   │    + Viz     │  │
│  └──────────────┘   └──────────────┘   └──────────────┘  │
│                                                          │
│  Supported Sources:                                      │
│   • Single image (jpg, png)                              │
│   • Folder of images                                     │
│   • Webcam (real-time)                                   │
│                                                          │
└──────────────────────────────────────────────────────────┘
```
## Installation

### Prerequisites

- Python 3.10+
- CUDA 11.8+ (for GPU training)
- 8GB+ RAM
- 10GB+ disk space
### Setup

```bash
# Clone repository
cd product-detection-yolov8

# Create virtual environment
python -m venv venv
source venv/bin/activate        # Linux/macOS
# or: .\venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt
```

### Verify Installation

```bash
python -c "from ultralytics import YOLO; print('YOLOv8 OK')"
python -c "import torch; print(f'PyTorch OK, CUDA: {torch.cuda.is_available()}')"
```

## Dataset Preparation

### Option 1: Download a Public Dataset

```bash
python src/download_dataset.py --dataset sku110k --output data/raw
```

Supported datasets:

- `sku110k` - Retail store shelf images
- `grocery` - Grocery store products
- `cppe-5` - Personal protective equipment
### Option 2: Use Your Own Data

Place your data in YOLO format:

```
data/custom/
├── images/
│   ├── img001.jpg
│   └── img002.jpg
└── labels/
    ├── img001.txt   # class_id x_center y_center width height (normalized)
    └── img002.txt
```
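Each label line stores a class id plus a box center and size, all normalized by the image dimensions. A small helper showing the conversion from pixel coordinates (values are illustrative):

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box to a normalized YOLO label line."""
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# A 100x200 px box with top-left corner (50, 100) in a 640x640 image:
print(to_yolo_line(0, 50, 100, 150, 300, 640, 640))
# → 0 0.156250 0.312500 0.156250 0.312500
```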
### Convert Annotations

```bash
# From COCO format
python src/convert_to_yolo.py --input data/raw --format coco

# From Pascal VOC format
python src/convert_to_yolo.py --input data/raw --format voc
```

### Split the Dataset

```bash
python src/split_dataset.py --input data/converted --train-ratio 0.8
```

## Training

### Quick Start

```bash
python src/train.py
```

### Custom Training

```bash
python src/train.py \
    --epochs 50 \
    --batch 32 \
    --imgsz 640 \
    --model yolov8s.pt \
    --optimizer Adam \
    --lr 0.001 \
    --patience 10
```

### Training Parameters

| Parameter | Default | Description |
|---|---|---|
| `--model` | `yolov8n.pt` | Pretrained model (n/s/m/l/x) |
| `--epochs` | 20 | Training epochs |
| `--batch` | 16 | Batch size |
| `--imgsz` | 640 | Image size |
| `--optimizer` | Adam | Optimizer (Adam/SGD/AdamW) |
| `--lr` | 0.001 | Initial learning rate |
| `--patience` | 5 | Early stopping patience |
| `--export` | None | Export format (onnx/torchscript) |
### Resume Training

```bash
python src/train.py --resume runs/train_latest/weights/last.pt
```

## Evaluation

### Quick Evaluation

```bash
python src/evaluate.py --model best.pt
```

### Full Evaluation

```bash
python src/evaluate.py \
    --model runs/train_20240101_120000/weights/best.pt \
    --data data/data.yaml \
    --conf 0.25 \
    --iou 0.45
```

### Evaluation Outputs

| File | Description |
|---|---|
| `confusion_matrix.png` | Class-wise prediction accuracy |
| `pr_curve.png` | Precision-Recall curve |
| `f1_curve.png` | F1 score vs confidence |
| `class_performance.png` | Per-class metrics bar chart |
| `metrics.json` | Numeric metrics (mAP, precision, recall) |
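For reference, the precision, recall, and F1 values behind these plots derive from true/false-positive counts at each confidence threshold. A minimal sketch of the arithmetic (counts are illustrative):

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# e.g. 88 correct detections, 12 false alarms, 18 missed objects:
p, r, f1 = prf1(88, 12, 18)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```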
## Inference

### Single Image

```bash
python src/inference.py --source image.jpg
```

### Folder of Images

```bash
python src/inference.py --source ./images/ --conf 0.5
```

### Webcam (Real-Time)

```bash
python src/inference.py --source webcam --show
```

### Inference Parameters

| Parameter | Default | Description |
|---|---|---|
| `--source` | (required) | Image path, folder, or `webcam` |
| `--model` | `best.pt` | Trained model path |
| `--conf` | 0.25 | Confidence threshold |
| `--iou` | 0.45 | NMS IoU threshold |
| `--imgsz` | 640 | Inference image size |
| `--show` | False | Display results in window |
| `--no-save` | False | Don't save output images |
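The `--iou` flag controls non-maximum suppression: detection keeps the highest-confidence box and drops any remaining box that overlaps it beyond the threshold. A minimal greedy-NMS sketch (pure Python; real inference uses the library's optimized implementation):

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thres=0.45):
    """Greedy NMS; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thres]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # overlapping box 1 is suppressed → [0, 2]
```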
## Model Export

### During Training

```bash
python src/train.py --export onnx,torchscript
```

### After Training

```python
from ultralytics import YOLO

model = YOLO('best.pt')

# ONNX (for inference servers)
model.export(format='onnx', dynamic=True, simplify=True)

# TorchScript (for C++ inference)
model.export(format='torchscript')

# TensorRT (NVIDIA GPUs)
model.export(format='engine', device=0)

# CoreML (Apple devices)
model.export(format='coreml')
```

### Supported Formats

| Format | File | Use Case |
|---|---|---|
| ONNX | `best.onnx` | Cross-platform deployment |
| TorchScript | `best.torchscript` | PyTorch production |
| TensorRT | `best.engine` | NVIDIA GPU optimization |
| CoreML | `best.mlmodel` | iOS/macOS apps |
| OpenVINO | `best_openvino/` | Intel hardware |
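Unless exported with `dynamic=True`, these models expect a fixed input size (640x640 here), so deployment code typically letterboxes the image: scale while preserving aspect ratio, then pad the remainder. A sketch of just the geometry (no image library; values are illustrative):

```python
def letterbox_params(w, h, size=640):
    """Scale factor, resized dims, and padding to fit (w, h) into size x size."""
    scale = min(size / w, size / h)          # preserve aspect ratio
    new_w, new_h = round(w * scale), round(h * scale)
    pad_x, pad_y = (size - new_w) // 2, (size - new_h) // 2
    return scale, new_w, new_h, pad_x, pad_y

# A 1280x720 frame fits as 640x360 with 140 px of padding top and bottom:
print(letterbox_params(1280, 720))  # → (0.5, 640, 360, 0, 140)
```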
## Docker Deployment

### Build Images

```bash
# CPU version
docker build -t yolov8-product-detection --target production .

# GPU version (requires NVIDIA Container Toolkit)
docker build -t yolov8-product-detection:gpu --target gpu .
```

### Run Training

```bash
docker run --gpus all \
    -v $(pwd)/data:/app/data \
    -v $(pwd)/runs:/app/runs \
    yolov8-product-detection:gpu \
    python src/train.py --epochs 20
```

### Run Inference

```bash
docker run --gpus all \
    -v $(pwd)/test_images:/app/images \
    -v $(pwd)/results:/app/output \
    yolov8-product-detection:gpu \
    python src/inference.py --source /app/images
```

### Docker Compose

```yaml
version: '3.8'
services:
  training:
    build:
      context: .
      target: gpu
    volumes:
      - ./data:/app/data
      - ./runs:/app/runs
    command: python src/train.py
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

## Results

### Overall Metrics

| Metric | Value |
|---|---|
| mAP50 | 0.85 |
| mAP50-95 | 0.62 |
| Precision | 0.88 |
| Recall | 0.82 |
| Inference Time | 12ms (RTX 3080) |
### Per-Class Performance

| Class | Precision | Recall | mAP50 |
|---|---|---|---|
| product_small | 0.82 | 0.78 | 0.80 |
| product_medium | 0.90 | 0.85 | 0.88 |
| product_large | 0.92 | 0.83 | 0.87 |
### Training Curves

The training process generates loss curves and metric plots in the `runs/` directory:

- `results.png` - Training & validation loss curves
- `confusion_matrix.png` - Validation confusion matrix
- `P_curve.png` - Precision curve
- `R_curve.png` - Recall curve
- `PR_curve.png` - Precision-Recall curve
## Future Improvements

- Add data augmentation pipeline (Albumentations)
- Implement hyperparameter optimization (Optuna)
- Add model ensemble support
- Create REST API endpoint (FastAPI)
- Implement real-time video processing
- Add tracking (ByteTrack, BoT-SORT)
- Support for instance segmentation
- Mobile deployment (TFLite, CoreML)
- Active learning pipeline
- Multi-camera support
- Edge deployment (Jetson, Raspberry Pi)
- Integration with inventory management systems
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Contact

For questions or support, please open an issue on GitHub.

Happy Detecting! 🎯