This repository contains a TensorFlow 2.x pipeline for training an ultra-light license plate detector on the CCPD2019 dataset. The project includes data preparation scripts, configurable training/evaluation modules, and dockerized tooling so you can reproduce our experiments or extend them to new deployment targets (e.g., MCUs).
- Patch-based training tuned for edge deployments, following the workflow described in docs/data_and_aug.md.
- Loss design: Binary Focal Loss (extreme FG/BG imbalance) + Masked CIoU (positives-only regression) for stable patch-based training. See docs/loss.md for details.
- Multiple neck architectures (`fpn`, `sharedneck`, `s2_neck`) and optional ULSAM attention blocks.
- End-to-end Docker support with GPU passthrough and a one-click script for building/running training containers.
- Quantization-aware (QAT) and post-training (PTQ) flows bundled into the main training script.
- MCU‑friendly ops, memory‑aware model sizes, and reference latency.
- Evaluation tools for both desktop inference (`lpd/test.py`) and embedded benchmarks.
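To make the loss design above concrete, here is a NumPy illustration of the standard binary focal loss; it is a sketch of the general technique, not the repository's implementation (see docs/loss.md for the real one), and the `alpha`/`gamma` defaults are the commonly used values, not necessarily this project's settings:

```python
import numpy as np

def binary_focal_loss(y_true, p_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: down-weights easy background cells so the
    rare plate-positive cells dominate the classification gradient."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    pos = -alpha * y_true * (1.0 - p) ** gamma * np.log(p)
    neg = -(1.0 - alpha) * (1.0 - y_true) * p ** gamma * np.log(1.0 - p)
    return pos + neg

# An easy, confident background cell (y=0, p=0.01) contributes almost
# nothing, while a hard positive (y=1, p=0.1) keeps a strong signal.
y = np.array([0.0, 1.0])
p = np.array([0.01, 0.1])
loss = binary_focal_loss(y, p)
```

The masked CIoU term follows the same pattern: the box-regression loss is computed per cell and multiplied by the positive mask before averaging, so background cells contribute no regression gradients.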
```
├── docker-compose.yml      # Docker stack for training with GPU passthrough
├── Dockerfile              # TensorFlow 2.12 (GPU) base image + project deps
├── lpd/                    # Core Python package (configs, models, trainers, utils, …)
├── scripts/
│   ├── train_in_docker.sh  # One-click helper to build & launch container training
│   ├── prep_tfrecords.sh   # Resize CCPD images and create TFRecord splits
│   └── misc/tensorboard.md # Optional TensorBoard setup instructions
├── dataset/                # Expected location for CCPD2019 + generated TFRecords
├── models/                 # Saved .h5 checkpoints (created at runtime)
├── logs/                   # TensorBoard logs (created at runtime)
├── results/                # Visualizations & evaluation artifacts (created at runtime)
└── README.md
```
- Docker 20.10+ with the NVIDIA Container Toolkit for GPU training.
- Docker Compose v2 (`docker compose`) or v1 (`docker-compose`).
- Python ≥ 3.10 (3.11 tested); install dependencies with `pip install -r requirements.txt`.
- NVIDIA drivers + CUDA-capable GPU for best performance (fallback to CPU works but is slow).
- Download the CCPD2019 dataset and unpack it under `dataset/CCPD2019/`.
- Generate resized crops and TFRecords:

  ```bash
  scripts/prep_tfrecords.sh
  ```

  This produces:

  - `dataset/CCPD_resized/` – canonicalized images for patch sampling.
  - `dataset/CCPD_2019_tfrecords/` – the TFRecords used by the training pipeline (`dataset_stats.json` is expected here).
For a deeper look at the sampling strategy and augmentations, read docs/data_and_aug.md.
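For orientation, a minimal `tf.data` reader over such TFRecords might look like the sketch below. The feature keys (`image`, `boxes`) and the file pattern are assumptions for illustration only; the real schema is defined by the prep script, so inspect the generated records before relying on any of these names:

```python
import tensorflow as tf

# Hypothetical schema -- check the prep script / generated records for the
# actual feature keys used by this repository.
FEATURES = {
    "image": tf.io.FixedLenFeature([], tf.string),  # encoded image patch
    "boxes": tf.io.VarLenFeature(tf.float32),       # flattened plate boxes
}

def parse_example(serialized):
    ex = tf.io.parse_single_example(serialized, FEATURES)
    image = tf.io.decode_jpeg(ex["image"], channels=3)
    boxes = tf.reshape(tf.sparse.to_dense(ex["boxes"]), [-1, 4])
    return image, boxes

def build_dataset(file_pattern, batch_size=32):
    files = tf.io.gfile.glob(file_pattern)
    return (tf.data.TFRecordDataset(files)
            .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
            .shuffle(1024)
            # padded_batch pads the variable number of boxes per image.
            .padded_batch(batch_size,
                          padded_shapes=([None, None, 3], [None, 4]))
            .prefetch(tf.data.AUTOTUNE))
```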
```bash
scripts/train_in_docker.sh --help
scripts/train_in_docker.sh --neck sharedneck --qat
```

What the script does:

- Validates Docker/Compose availability and the TFRecords directory.
- Optionally rebuilds the image defined in `docker-compose.yml`.
- Launches the `edge-lpd` container with host UID/GID mapping.
- Runs `python lpd/train.py` inside the container with the options you provide (`--neck`, `--ulsam`, `--qat`, `--force-train`, plus any extra arguments).
Training artifacts are written to the mounted `models/`, `logs/`, `cache/`, and `results/` folders in your workspace.
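The `--qat` flag enables the quantization-aware path inside `lpd/train.py`; the post-training (PTQ) counterpart follows the standard TFLite converter recipe. The sketch below is illustrative (the function and file names are ours, not the repository's), assuming a trained Keras model and a small calibration set of preprocessed training patches:

```python
import numpy as np
import tensorflow as tf

def export_int8_tflite(model, calibration_patches, out_path="model_int8.tflite"):
    """Post-training full-integer quantization of a trained Keras model.

    `calibration_patches` is a float32 array of preprocessed patches used
    only to calibrate activation ranges; it is never trained on.
    """
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    def representative_dataset():
        for patch in calibration_patches[:100]:
            yield [patch[np.newaxis].astype(np.float32)]

    converter.representative_dataset = representative_dataset
    # Force int8 kernels end to end -- what MCU runtimes typically require.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    tflite_model = converter.convert()
    with open(out_path, "wb") as f:
        f.write(tflite_model)
    return tflite_model
```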
```bash
python -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
python lpd/train.py --neck sharedneck --ulsam
```

Environment variables such as `CUDA_VISIBLE_DEVICES` can be set to control GPU visibility. The script automatically enables memory growth on detected GPUs.
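Both runtime behaviors (GPU visibility via `CUDA_VISIBLE_DEVICES` and memory growth) boil down to the standard TensorFlow pattern, reproduced here for reference; the `"0"` device index is only an example:

```python
import os

# Restrict which GPUs TensorFlow may see; must be set before TF touches CUDA.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing it all at startup.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```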
- Desktop evaluation of a saved Keras model:

  ```bash
  python lpd/test.py --model_path models/my_detector_best.h5 --conf_thresh 0.5 --iou_thresh 0.5
  ```

- TensorBoard (optional):

  ```bash
  tensorboard --logdir=./logs --port=6006 --bind_all
  ```

- MCU profiling / embedded evaluation: leverage the MCU-OD-Profiler for on-device benchmarks.
Generated visualizations and feature maps are saved under `results/` during training runs.
Detection Results on ESP32‑S3 (conf=0.5, IoU=0.5; regression diagnostics on TPs only)
| Model | TP | FP | FN | TN | Precision | Recall | F1-Score | Accuracy | AP@0.50 | Mean IoU (TPs) | ROC-AUC (cls) | PR-AUC (cls) | Mean CIoU | Norm Center Err | Norm Width Err | Norm Height Err |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sharedneck-standard | 32833 | 2290 | 67 | 310 | 0.9348 | 0.9980 | 0.9653 | 0.9336 | 0.9096 | 0.8270 | 0.3584 | 0.9099 | 0.8259 | 0.0134 | 0.0662 | 0.1070 |
| sharedneck-ulsam | 32897 | 2234 | 62 | 308 | 0.9364 | 0.9981 | 0.9663 | 0.9353 | 0.9002 | 0.8275 | 0.3279 | 0.9004 | 0.8264 | 0.0132 | 0.0644 | 0.1107 |
| fpn-standard | 32928 | 2162 | 99 | 312 | 0.9384 | 0.9970 | 0.9668 | 0.9363 | 0.9062 | 0.8267 | 0.3331 | 0.9064 | 0.8257 | 0.0133 | 0.0630 | 0.1074 |
| fpn-ulsam | 32868 | 2171 | 140 | 322 | 0.9380 | 0.9958 | 0.9660 | 0.9349 | 0.8995 | 0.8255 | 0.3185 | 0.8998 | 0.8244 | 0.0131 | 0.0575 | 0.1192 |
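As a sanity check, the classification columns follow directly from the confusion counts; recomputing them for the `sharedneck-standard` row:

```python
# Confusion counts for sharedneck-standard (taken from the table above).
tp, fp, fn, tn = 32833, 2290, 67, 310

precision = tp / (tp + fp)                                # ≈ 0.9348
recall    = tp / (tp + fn)                                # ≈ 0.9980
f1        = 2 * precision * recall / (precision + recall) # ≈ 0.9653
accuracy  = (tp + tn) / (tp + fp + fn + tn)               # ≈ 0.9336
```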
Model Complexity & MCU Latency
| Model | Params | FLOPs | TFLite Size | Inference Latency (ESP32-S3) |
|---|---|---|---|---|
| fpn | 137,557 | 0.0350 GFLOPs | 269.10 KB | 215 ms |
| fpn + ulsam | 140,009 | 0.0359 GFLOPs | 420.39 KB | 300 ms |
| sharedneck | 113,525 | 0.0217 GFLOPs | 240.52 KB | 190 ms |
| sharedneck + ulsam | 113,993 | 0.0224 GFLOPs | 263.57 KB | 195 ms |
Interpretation: `sharedneck` reduces params/FLOPs and improves MCU latency without sacrificing detection quality. Adding ULSAM slightly increases size/latency while keeping overall metrics competitive; use it if you need extra robustness under challenging lighting/backgrounds.
Core training hyperparameters live in `lpd/configs/default_config.py`. Notable settings include:

- `IMAGE_HEIGHT`, `IMAGE_WIDTH` – patch size (96×96) used for training.
- `AUGMENTATION_PROBS_CONFIG` – scenario-aware augmentation schedules.
- `TFRECORD_PATH`, `MODEL_PATH`, `RESULTS_PATH`, `LOG_DIR` – runtime directories (point to `/app/...` in-container).

Adjust these values or provide alternative config modules as needed for your experiments.
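If the config is a plain Python module (as the paths above suggest), an experiment-specific variant can be as small as a module that star-imports the defaults and overrides a few constants. This is a sketch of one possible convention, not an API the repository necessarily provides, and the overridden values are examples only:

```python
# lpd/configs/my_experiment.py -- hypothetical override module
from lpd.configs.default_config import *  # noqa: F401,F403

# Example: larger training patches for a higher-resolution target.
IMAGE_HEIGHT = 128
IMAGE_WIDTH = 128

# Example: keep artifacts in experiment-specific folders when running locally.
MODEL_PATH = "models/exp_128"
LOG_DIR = "logs/exp_128"
```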
PRs and issues are welcome! Please describe your environment, dataset protocol, and steps to reproduce.
This project is licensed under the Apache‑2.0 License (see LICENSE).
