This folder contains scripts for exporting models to TensorRT format for NVIDIA Triton Inference Server deployment.
The export process transforms PyTorch models into optimized TensorRT engines for high-performance GPU inference.
| Script | Purpose | Output |
|---|---|---|
| `export_models.py` | YOLO11 object detection with end2end NMS | TensorRT engine |
| `export_scrfd.py` | SCRFD-10G face detection + landmarks | TensorRT engine |
| `export_face_recognition.py` | ArcFace face embeddings | TensorRT engine |
| `export_mobileclip_image_encoder.py` | MobileCLIP image encoder | TensorRT engine |
| `export_mobileclip_text_encoder.py` | MobileCLIP text encoder | TensorRT engine |
| `export_paddleocr_det.py` | PP-OCRv5 text detection | TensorRT engine |
| `export_paddleocr_rec.py` | PP-OCRv5 text recognition | TensorRT engine |
| `download_face_models.py` | Download pre-trained face models | PyTorch weights |
| `download_paddleocr.py` | Download PP-OCRv5 models | ONNX models |
| `download_pytorch_models.py` | Download YOLO11 PyTorch models | PyTorch weights |
```
pytorch_models/
├── yolo11s.pt                          # YOLO11 PyTorch model
├── arcface_w600k_r50.onnx              # ArcFace ONNX model
├── mobileclip2_s2/                     # MobileCLIP checkpoint
├── mobileclip2_s2_image_encoder.onnx   # MobileCLIP image encoder ONNX
└── mobileclip2_s2_text_encoder.onnx    # MobileCLIP text encoder ONNX
```
```
models/
├── yolov11_small_trt/              # YOLO11 TensorRT (standard)
│   ├── 1/model.plan
│   └── config.pbtxt
├── yolov11_small_trt_end2end/      # YOLO11 TensorRT with GPU NMS
│   ├── 1/model.plan
│   └── config.pbtxt
├── scrfd_10g_bnkps/                # SCRFD-10G face detection TensorRT
│   ├── 1/model.plan
│   └── config.pbtxt
├── arcface_w600k_r50/              # ArcFace TensorRT
│   ├── 1/model.plan
│   └── config.pbtxt
├── mobileclip2_s2_image_encoder/   # MobileCLIP image encoder
│   ├── 1/model.plan
│   └── config.pbtxt
├── mobileclip2_s2_text_encoder/    # MobileCLIP text encoder
│   ├── 1/model.plan
│   └── config.pbtxt
├── ppocr_det_v5/                   # PP-OCRv5 detection
│   ├── 1/model.plan
│   └── config.pbtxt
└── ppocr_rec_v5/                   # PP-OCRv5 recognition
    ├── 1/model.plan
    └── config.pbtxt
```
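Each `config.pbtxt` follows Triton's model-configuration schema. As a rough sketch of what one entry might contain (tensor names, dims, and batch sizes here are illustrative assumptions, not copied from the repository — inspect the exported engine's bindings for the real values):

```protobuf
name: "arcface_w600k_r50"
platform: "tensorrt_plan"
max_batch_size: 128

input [
  {
    name: "input"          # assumed binding name
    data_type: TYPE_FP16
    dims: [ 3, 112, 112 ]
  }
]
output [
  {
    name: "embedding"      # assumed binding name
    data_type: TYPE_FP16
    dims: [ 512 ]
  }
]

dynamic_batching {
  max_queue_delay_microseconds: 100
}
```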
```bash
# Export TensorRT with GPU NMS (recommended)
make export-models

# Or directly:
docker compose exec yolo-api python /app/export/export_models.py \
    --models small \
    --formats trt trt_end2end \
    --normalize-boxes
```

```bash
# Export SCRFD to TensorRT
docker compose exec yolo-api python /app/export/export_scrfd.py
```

```bash
# Download pre-trained model
docker compose exec yolo-api python /app/export/download_face_models.py

# Export to TensorRT
docker compose exec yolo-api python /app/export/export_face_recognition.py
```

```bash
# Export both image and text encoders
make export-mobileclip

# Or individually:
docker compose exec yolo-api python /app/export/export_mobileclip_image_encoder.py
docker compose exec yolo-api python /app/export/export_mobileclip_text_encoder.py
```

```bash
# Download PP-OCRv5 models
docker compose exec yolo-api python /app/export/download_paddleocr.py

# Export detection and recognition
docker compose exec yolo-api python /app/export/export_paddleocr_det.py
docker compose exec yolo-api python /app/export/export_paddleocr_rec.py
```

**YOLO11:**
- Input: `[B, 3, 640, 640]` FP16, normalized [0, 1]
- Output (end2end): `num_dets`, `det_boxes`, `det_scores`, `det_classes`
- Dynamic batching: 1-64 (configurable)

**SCRFD-10G:**
- Input: `[B, 3, 640, 640]` FP32, RGB, (x - 127.5) / 128.0 normalized
- Output: 9 tensors (3 FPN strides x score/bbox/kps), CPU post-processed
- Dynamic batching: 1-32

**ArcFace:**
- Input: `[B, 3, 112, 112]` FP16, aligned face crops
- Output: `[B, 512]` L2-normalized embeddings
- Dynamic batching: 1-128

**MobileCLIP image encoder:**
- Input: `[B, 3, 256, 256]` FP32, normalized [0, 1]
- Output: `[B, 512]` L2-normalized embeddings
- Dynamic batching: 1-128

**MobileCLIP text encoder:**
- Input: `[B, 77]` INT64 token IDs
- Output: `[B, 512]` L2-normalized embeddings
- Dynamic batching: 1-64

**PP-OCRv5 detection:**
- Input: `[B, 3, H, W]` FP32, dynamic size
- Output: Text region polygons

**PP-OCRv5 recognition:**
- Input: `[B, 3, 48, W]` FP32, dynamic width
- Output: Character sequence probabilities
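The two image-normalization conventions above are easy to mix up. A minimal NumPy sketch of both (function names are illustrative, not taken from the export scripts):

```python
import numpy as np

def preprocess_yolo(img: np.ndarray) -> np.ndarray:
    """HWC uint8 RGB -> NCHW FP16 scaled to [0, 1] (YOLO11 convention)."""
    x = img.astype(np.float32) / 255.0      # scale to [0, 1]
    x = x.transpose(2, 0, 1)[None]          # HWC -> NCHW, add batch dim
    return x.astype(np.float16)

def preprocess_scrfd(img: np.ndarray) -> np.ndarray:
    """HWC uint8 RGB -> NCHW FP32 with (x - 127.5) / 128.0 (SCRFD convention)."""
    x = (img.astype(np.float32) - 127.5) / 128.0
    return x.transpose(2, 0, 1)[None]

img = np.zeros((640, 640, 3), dtype=np.uint8)
print(preprocess_yolo(img).shape)   # (1, 3, 640, 640)
print(preprocess_scrfd(img).dtype)  # float32
```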
All exports use these common settings:
- Precision: FP16 (configurable)
- Workspace: 4GB
- Optimization profiles for dynamic batching
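For reference, these settings map onto `trtexec` flags roughly as follows. This is an illustrative sketch only (the ONNX filename and the `images` tensor name are assumptions; the export scripts may build engines through the TensorRT Python API instead):

```bash
trtexec --onnx=yolo11s.onnx --saveEngine=model.plan \
    --fp16 \
    --memPoolSize=workspace:4096MiB \
    --minShapes=images:1x3x640x640 \
    --optShapes=images:8x3x640x640 \
    --maxShapes=images:64x3x640x640
```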
Download PyTorch models first:

```bash
make download-pytorch
```

Check GPU memory and reduce batch size if needed:

```bash
docker compose exec yolo-api nvidia-smi
```

Verify file structure and restart Triton:

```bash
ls -lh models/{model_name}/1/
docker compose restart triton-server
```
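The file-structure check can also be scripted. A small stdlib-only sketch (the `models` path and function name are illustrative) that flags any model directory missing its `config.pbtxt` or a versioned `model.plan`:

```python
from pathlib import Path

def check_model_repo(root: str) -> list[str]:
    """Return a list of problems found in a Triton model repository layout."""
    repo = Path(root)
    if not repo.is_dir():
        return [f"{root}: repository directory not found"]
    problems = []
    for model_dir in sorted(p for p in repo.iterdir() if p.is_dir()):
        if not (model_dir / "config.pbtxt").is_file():
            problems.append(f"{model_dir.name}: missing config.pbtxt")
        if not list(model_dir.glob("*/model.plan")):
            problems.append(f"{model_dir.name}: no <version>/model.plan found")
    return problems

for issue in check_model_repo("models"):
    print(issue)
```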