Complete guide for deploying and using PP-OCRv5 text detection and recognition on NVIDIA Triton Inference Server.
- Architecture Overview
- Model Specifications
- Prerequisites
- Setup and Deployment
- Configuration
- API Usage
- Performance Tuning
- Troubleshooting
- File Reference
The OCR pipeline consists of three Triton models working together:
+-------------------+
| ocr_pipeline |
| (Python BLS) |
+--------+----------+
|
+------------------------+------------------------+
| |
+---------v----------+ +-----------v---------+
| paddleocr_det_trt | | paddleocr_rec_trt |
| (TensorRT) | | (TensorRT) |
| DB++ Detection | | SVTR-LCNet Recog |
+--------------------+ +---------------------+
| Model | Architecture | Purpose |
|---|---|---|
| paddleocr_det_trt | DB++ (Differentiable Binarization) | Detect text regions in images |
| paddleocr_rec_trt | SVTR-LCNet | Recognize text within cropped regions |
| ocr_pipeline | Python BLS (Business Logic Scripting) | Orchestrate detection and recognition |
- Input: Raw image bytes (JPEG/PNG)
- Preprocessing: Resize, normalize (x/127.5 - 1), pad to 32-boundary
- Detection: Probability map identifying text regions
- Post-processing: Threshold, contour detection, box expansion (unclip)
- Cropping: Perspective transform to extract text crops
- Recognition: CTC decoder produces text strings
- Output: Text strings with bounding boxes and confidence scores
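The preprocessing step above can be sketched in a few lines. This is a minimal illustration with NumPy, assuming a BGR uint8 image already loaded as an H x W x 3 array; the real pipeline also limits the longer side to 960, and the function name here is hypothetical:

```python
import numpy as np

def preprocess_det(img: np.ndarray) -> np.ndarray:
    """Pad a BGR uint8 image (H x W x 3) to 32-pixel boundaries and
    normalize with x/127.5 - 1, returning an NCHW FP32 tensor."""
    h, w = img.shape[:2]
    pad_h = (h + 31) // 32 * 32   # round H up to a multiple of 32
    pad_w = (w + 31) // 32 * 32   # round W up to a multiple of 32
    canvas = np.zeros((pad_h, pad_w, 3), dtype=np.float32)
    canvas[:h, :w] = img.astype(np.float32)
    canvas = canvas / 127.5 - 1.0  # [0, 255] -> [-1, 1]
    return canvas.transpose(2, 0, 1)[np.newaxis]  # HWC -> 1x3xHxW

x = preprocess_det(np.zeros((700, 500, 3), dtype=np.uint8))
print(x.shape)  # (1, 3, 704, 512)
```

Note that the zero-padded border maps to -1 after normalization, which matches feeding black pixels through the same formula.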
| Property | Value |
|---|---|
| Model | PP-OCRv5 Mobile Detection |
| Architecture | DB++ (Differentiable Binarization) |
| Input Shape | [B, 3, H, W] where H,W are multiples of 32, max 960 |
| Input Format | FP32, BGR, normalized (x/127.5 - 1) |
| Output Shape | [B, 1, H, W] probability map |
| Output Range | [0, 1] sigmoid probabilities |
| Max Batch Size | 4 |
| Property | Value |
|---|---|
| Model | PP-OCRv5 Mobile Recognition (Multilingual) |
| Architecture | SVTR-LCNet |
| Input Shape | [B, 3, 48, W] where W is 48-2048 (dynamic) |
| Input Format | FP32, BGR, normalized (x/127.5 - 1) |
| Output Shape | [B, T, 18385] character probabilities |
| Output Format | Softmax probabilities for CTC decoding |
| Character Set | 18385 characters (18383 from dict + blank + space) |
| Max Batch Size | 64 |
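Because the recognition input height is fixed at 48 while the width is dynamic, each text crop is scaled to height 48 with its width adjusted proportionally and clamped to the engine's 48-2048 range. A hedged sketch of that width calculation (the pipeline's exact resize policy may differ; the helper name is hypothetical):

```python
def rec_target_width(crop_h: int, crop_w: int,
                     target_h: int = 48, min_w: int = 48, max_w: int = 2048) -> int:
    """Scale a crop's width to match the fixed input height, clamped
    to the recognition engine's dynamic-width range."""
    w = round(crop_w * target_h / crop_h)
    return max(min_w, min(max_w, w))

print(rec_target_width(24, 120))   # 240: aspect ratio preserved at height 48
print(rec_target_width(48, 10))    # 48: clamped up to the minimum width
```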
The recognition model uses ppocrv5_dict.txt with 18383 characters:
- Chinese, Japanese, Korean characters
- English uppercase/lowercase letters
- Digits 0-9
- Common punctuation and symbols (multilingual)
- Special tokens: blank (index 0), space (index 18384)
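CTC decoding collapses repeated indices and drops blanks (index 0). A minimal greedy-decoder sketch, using a toy 4-entry charset standing in for the real 18385-class table (the function name is hypothetical; the production decoder may differ in detail):

```python
import numpy as np

def ctc_greedy_decode(probs: np.ndarray, charset: list[str]) -> tuple[str, float]:
    """Greedy CTC decode: collapse repeats, skip blanks (index 0),
    and average the confidences of the kept characters."""
    ids = probs.argmax(axis=1)
    confs = probs.max(axis=1)
    chars, scores, prev = [], [], -1
    for i, c in zip(ids, confs):
        if i != prev and i != 0:   # skip blanks and repeated indices
            chars.append(charset[i])
            scores.append(c)
        prev = i
    score = float(np.mean(scores)) if scores else 0.0
    return "".join(chars), score

# Toy charset; the real dictionary has blank at index 0 and space at 18384.
charset = ["<blank>", "H", "i", " "]
probs = np.eye(4)[[1, 1, 0, 2]].astype(np.float32)  # H, H(repeat), blank, i
print(ctc_greedy_decode(probs, charset))  # ('Hi', 1.0)
```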
- NVIDIA GPU with at least 4GB VRAM
- Recommended: RTX 3000/4000 series or A-series datacenter GPUs
- 16GB+ system RAM for TensorRT engine building
- Docker with NVIDIA Container Toolkit
- NVIDIA GPU with TensorRT 10.x support
- Python 3.10+ with the following packages:
- onnx
- onnxruntime-gpu
# From the yolo-api container:
docker compose exec yolo-api python /app/export/download_paddleocr.py
# Or run the convenience script:
./scripts/export_paddleocr.sh download
This downloads:
- ppocr_det_v5_mobile.onnx (~5 MB) - Detection model
- ppocr_rec_v5_mobile.onnx (~16 MB) - Multilingual recognition model
- Character dictionaries
CRITICAL: TensorRT requires sufficient workspace memory. Use correct syntax for TensorRT 10+:
# CORRECT syntax:
--memPoolSize=workspace:4G
# INCORRECT (will cause "Cudnn Error: CUDNN_STATUS_NOT_SUPPORTED"):
--workspace=4096 # Old deprecated syntax
--memPoolSize=4096 # Missing workspace: prefix and unit
docker compose exec triton-server /usr/src/tensorrt/bin/trtexec \
--onnx=/models/ppocr_det_v5_mobile.onnx \
--saveEngine=/models/paddleocr_det_trt/1/model.plan \
--fp16 \
--minShapes=x:1x3x32x32 \
--optShapes=x:1x3x736x736 \
--maxShapes=x:4x3x960x960 \
--memPoolSize=workspace:4G
docker compose exec triton-server /usr/src/tensorrt/bin/trtexec \
--onnx=/models/ppocr_rec_v5_mobile.onnx \
--saveEngine=/models/paddleocr_rec_trt/1/model.plan \
--fp16 \
--minShapes=x:1x3x48x48 \
--optShapes=x:32x3x48x320 \
--maxShapes=x:64x3x48x2048 \
--memPoolSize=workspace:4G
Use the provided script for complete export:
./scripts/export_paddleocr.sh all
This handles:
- Model download (if needed)
- GPU memory management (unloads other models)
- TensorRT conversion with correct parameters
- Dictionary file placement
- Triton config generation
The dictionary file must be in the recognition model directory:
# Verify dictionary exists (should have 18383 lines, plus blank+space = 18385 total)
wc -l models/paddleocr_rec_trt/ppocrv5_dict.txt
# Expected output: 18383 lines
# Total classes: 18385 (18383 dict + blank token + space token)
# Reload all models
docker compose restart triton-server
# Or load specific models via API
curl -X POST localhost:4600/v2/repository/models/paddleocr_det_trt/load
curl -X POST localhost:4600/v2/repository/models/paddleocr_rec_trt/load
curl -X POST localhost:4600/v2/repository/models/ocr_pipeline/load
curl -s localhost:4600/v2/models | jq '.models[] | select(.name | startswith("paddle") or startswith("ocr"))'
Expected output:
{"name": "paddleocr_det_trt", "state": "READY"}
{"name": "paddleocr_rec_trt", "state": "READY"}
{"name": "ocr_pipeline", "state": "READY"}
# Test via API endpoint
python scripts/test_ocr_pipeline.py
# Or use curl
curl -X POST http://localhost:4603/ocr/predict \
-F "image=@test_images/ocr-synthetic/hello_world.jpg"
File: models/paddleocr_det_trt/config.pbtxt
name: "paddleocr_det_trt"
platform: "tensorrt_plan"
max_batch_size: 4
input [
{
name: "x"
data_type: TYPE_FP32
dims: [ 3, -1, -1 ] # Dynamic H, W (multiples of 32)
}
]
output [
{
name: "fetch_name_0"
data_type: TYPE_FP32
dims: [ 1, -1, -1 ] # Same H, W as input
}
]
instance_group [
{
count: 2
kind: KIND_GPU
gpus: [0]
}
]
dynamic_batching {
preferred_batch_size: [ 1, 2, 4 ]
max_queue_delay_microseconds: 5000
}
File: models/paddleocr_rec_trt/config.pbtxt
name: "paddleocr_rec_trt"
platform: "tensorrt_plan"
max_batch_size: 64
input [
{
name: "x"
data_type: TYPE_FP32
dims: [ 3, 48, -1 ] # Dynamic width: 48-2048
}
]
output [
{
name: "fetch_name_0"
data_type: TYPE_FP32
dims: [ -1, 18385 ] # Dynamic timesteps, 18385 characters (multilingual)
}
]
instance_group [
{
count: 2
kind: KIND_GPU
gpus: [0]
}
]
dynamic_batching {
preferred_batch_size: [ 16, 32, 64 ]
max_queue_delay_microseconds: 10000
}
File: models/ocr_pipeline/config.pbtxt
name: "ocr_pipeline"
backend: "python"
max_batch_size: 0 # BLS handles batching
input [
{
name: "ocr_images"
data_type: TYPE_FP32
dims: [ 3, -1, -1 ] # Preprocessed for detection
},
{
name: "original_image"
data_type: TYPE_FP32
dims: [ 3, -1, -1 ] # For text crop extraction
},
{
name: "orig_shape"
data_type: TYPE_INT32
dims: [ 2 ] # [H, W]
}
]
output [
{
name: "num_texts"
data_type: TYPE_INT32
dims: [ 1 ]
},
{
name: "text_boxes"
data_type: TYPE_FP32
dims: [ 128, 8 ] # Quadrilateral coordinates
},
{
name: "text_boxes_normalized"
data_type: TYPE_FP32
dims: [ 128, 4 ] # Axis-aligned [x1, y1, x2, y2]
},
{
name: "texts"
data_type: TYPE_STRING
dims: [ 128 ]
},
{
name: "text_scores"
data_type: TYPE_FP32
dims: [ 128 ] # Detection confidence
},
{
name: "rec_scores"
data_type: TYPE_FP32
dims: [ 128 ] # Recognition confidence
}
]
instance_group [
{
count: 2
kind: KIND_GPU
gpus: [0]
}
]
parameters: {
key: "FORCE_CPU_ONLY_INPUT_TENSORS"
value: { string_value: "no" }
}
curl -X POST http://localhost:4603/ocr/predict \
-F "image=@your_image.jpg" \
-F "min_det_score=0.5" \
-F "min_rec_score=0.8"
Response:
{
"status": "success",
"texts": ["Hello", "World"],
"boxes": [[x1,y1,x2,y2,x3,y3,x4,y4], ...],
"boxes_normalized": [[0.1, 0.2, 0.5, 0.3], ...],
"det_scores": [0.95, 0.88],
"rec_scores": [0.99, 0.97],
"num_texts": 2,
"image_size": [480, 640]
}
curl -X POST http://localhost:4603/ocr/batch \
-F "images=@image1.jpg" \
-F "images=@image2.jpg" \
-F "images=@image3.jpg"
curl -X POST http://localhost:4603/search/ocr \
-H "Content-Type: application/json" \
-d '{"query": "invoice", "top_k": 10}'
import requests
def extract_text(image_path: str, api_url: str = "http://localhost:4603") -> dict:
"""Extract text from an image using OCR API."""
with open(image_path, 'rb') as f:
response = requests.post(
f"{api_url}/ocr/predict",
files={"image": f},
data={"min_det_score": 0.5, "min_rec_score": 0.8}
)
return response.json()
# Usage
result = extract_text("test_images/ocr-synthetic/invoice.jpg")
print(f"Found {result['num_texts']} text regions:")
for text, score in zip(result['texts'], result['rec_scores']):
    print(f"  '{text}' (confidence: {score:.2f})")
| Parameter | Default | Tuning |
|---|---|---|
| max_batch_size | 4 | Increase for batch processing |
| instance_count | 2 | Match to GPU utilization |
| max_queue_delay | 5 ms | Lower for latency, higher for throughput |
| Parameter | Default | Tuning |
|---|---|---|
| max_batch_size | 64 | Increase for batch processing |
| instance_count | 2 | Increase for parallel crops |
| Dynamic width | 48-2048 | Narrow range if text length is known |
- Reduce max shapes if you know image sizes:
  --maxShapes=x:4x3x640x640  # Instead of 960x960
- Use FP16 (already enabled by default)
- Enable CUDA graphs for repeated inference:
  optimization { cuda { graphs: true } }
| Configuration | Throughput | Latency (p50) |
|---|---|---|
| Single image | 15-20 RPS | 50-70ms |
| Batch (4 images) | 40-50 RPS | 100-150ms |
Cause: Insufficient TensorRT workspace memory.
Solution: Use correct syntax for workspace allocation:
# TensorRT 10+ syntax:
--memPoolSize=workspace:4G
# Not:
--workspace=4096 # Deprecated
Cause: TensorRT engine was built with a different GPU or TensorRT version.
Solution: Rebuild the engine on the target GPU:
rm models/paddleocr_det_trt/1/model.plan
rm models/paddleocr_rec_trt/1/model.plan
./scripts/export_paddleocr.sh trt
Cause: Model not loaded or config error.
Solution:
- Check model files exist:
ls -la models/paddleocr_det_trt/1/model.plan
ls -la models/paddleocr_rec_trt/1/model.plan
- Check Triton logs:
docker compose logs triton-server | grep -i error
- Reload models:
curl -X POST localhost:4600/v2/repository/models/paddleocr_det_trt/load
Cause: Detection threshold too high or preprocessing mismatch.
Solution:
- Lower detection threshold:
curl -X POST ... -F "min_det_score=0.3"
- Check preprocessing:
- Input should be BGR, not RGB
- Normalization: (x / 127.5) - 1 = range [-1, 1]
- Image padded to 32-pixel boundary
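A quick sanity check of the normalization formula, since a wrong range is a common cause of empty detections. The pixel extremes 0, 127.5, and 255 should map to exactly -1, 0, and 1:

```python
import numpy as np

# Verify x/127.5 - 1 maps [0, 255] onto [-1, 1].
pixels = np.array([0, 127.5, 255], dtype=np.float32)
print(pixels / 127.5 - 1.0)  # [-1.  0.  1.]
```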
Cause: Multiple models competing for GPU memory.
Solution:
- Unload unused models:
curl -X POST localhost:4600/v2/repository/models/yolov11_small_trt/unload
- Reduce instance count in config.pbtxt
- Use TensorRT FP16 mode (already default)
models/
├── paddleocr_det_trt/
│ ├── 1/model.plan # TensorRT detection engine
│ └── config.pbtxt # Triton config
├── paddleocr_rec_trt/
│ ├── 1/model.plan # TensorRT recognition engine
│ ├── config.pbtxt # Triton config
│ └── ppocrv5_dict.txt # Character dictionary (18383 chars, multilingual)
└── ocr_pipeline/
├── 1/model.py # Python BLS orchestrator
└── config.pbtxt # Triton config
export/
├── download_paddleocr.py # Download ONNX models
├── export_paddleocr_det.py # Export detection to TRT
└── export_paddleocr_rec.py # Export recognition to TRT
scripts/
├── export_paddleocr.sh # Automated export script
└── test_ocr_pipeline.py # OCR testing script
src/
├── services/ocr_service.py # OCR service wrapper
└── clients/triton_client.py # Triton gRPC client (infer_ocr method)
Version History
| Version | Date | Changes |
|---|---|---|
| 1.0 | 2025-01-02 | Initial PP-OCRv5 implementation |
| 1.1 | 2025-01-10 | Added workspace memory fix documentation |
| 2.0 | 2026-01-26 | Consolidated documentation |
| 3.0 | 2026-01-27 | Switched to multilingual model (18385 classes), removed PaddleX dependency |