Complete guide for deploying and using PP-OCRv5 text detection and recognition on NVIDIA Triton Inference Server.
- Architecture Overview
- Model Specifications
- Prerequisites
- Setup and Deployment
- Configuration
- API Usage
- Performance Tuning
- Troubleshooting
- File Reference
The OCR pipeline consists of three Triton models working together:
+-------------------+
| ocr_pipeline |
| (Python BLS) |
+--------+----------+
|
+------------------------+------------------------+
| |
+---------v----------+ +-----------v---------+
| paddleocr_det_trt | | paddleocr_rec_trt |
| (TensorRT) | | (TensorRT) |
| DB++ Detection | | SVTR-LCNet Recog |
+--------------------+ +---------------------+
| Model | Architecture | Purpose |
|---|---|---|
| paddleocr_det_trt | DB++ (Differentiable Binarization) | Detect text regions in images |
| paddleocr_rec_trt | SVTR-LCNet | Recognize text within cropped regions |
| ocr_pipeline | Python BLS (Business Logic Scripting) | Orchestrate detection and recognition |
- Input: Raw image bytes (JPEG/PNG)
- Preprocessing: Resize, normalize (x/127.5 - 1), pad to 32-boundary
- Detection: Probability map identifying text regions
- Post-processing: Threshold, contour detection, box expansion (unclip)
- Cropping: Perspective transform to extract text crops
- Recognition: CTC decoder produces text strings
- Output: Text strings with bounding boxes and confidence scores
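The preprocessing step above can be sketched in a few lines. This is a minimal illustration with NumPy, assuming a BGR uint8 image already loaded as an H x W x 3 array; the real pipeline also limits the longer side to 960, and the function name here is hypothetical:

```python
import numpy as np

def preprocess_det(img: np.ndarray) -> np.ndarray:
    """Pad a BGR uint8 image (H x W x 3) to 32-pixel boundaries and
    normalize with x/127.5 - 1, returning an NCHW FP32 tensor."""
    h, w = img.shape[:2]
    pad_h = (h + 31) // 32 * 32   # round H up to a multiple of 32
    pad_w = (w + 31) // 32 * 32   # round W up to a multiple of 32
    canvas = np.zeros((pad_h, pad_w, 3), dtype=np.float32)
    canvas[:h, :w] = img.astype(np.float32)
    canvas = canvas / 127.5 - 1.0  # [0, 255] -> [-1, 1]
    return canvas.transpose(2, 0, 1)[np.newaxis]  # HWC -> 1x3xHxW

x = preprocess_det(np.zeros((700, 500, 3), dtype=np.uint8))
print(x.shape)  # (1, 3, 704, 512)
```

Note that the zero-padded border maps to -1 after normalization, which matches feeding black pixels through the same formula.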
| Property | Value |
|---|---|
| Model | PP-OCRv5 Mobile Detection |
| Architecture | DB++ (Differentiable Binarization) |
| Input Shape | [B, 3, H, W] where H,W are multiples of 32, max 960 |
| Input Format | FP32, BGR, normalized (x/127.5 - 1) |
| Output Shape | [B, 1, H, W] probability map |
| Output Range | [0, 1] sigmoid probabilities |
| Max Batch Size | 4 |
| Property | Value |
|---|---|
| Model | PP-OCRv5 Mobile Recognition (Multilingual) |
| Architecture | SVTR-LCNet |
| Input Shape | [B, 3, 48, W] where W is 48-2048 (dynamic) |
| Input Format | FP32, BGR, normalized (x/127.5 - 1) |
| Output Shape | [B, T, 18385] character probabilities |
| Output Format | Softmax probabilities for CTC decoding |
| Character Set | 18385 characters (18383 from dict + blank + space) |
| Max Batch Size | 64 |
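Because the recognition input height is fixed at 48 while the width is dynamic, each text crop is scaled to height 48 with its width adjusted proportionally and clamped to the engine's 48-2048 range. A hedged sketch of that width calculation (the pipeline's exact resize policy may differ; the helper name is hypothetical):

```python
def rec_target_width(crop_h: int, crop_w: int,
                     target_h: int = 48, min_w: int = 48, max_w: int = 2048) -> int:
    """Scale a crop's width to match the fixed input height, clamped
    to the recognition engine's dynamic-width range."""
    w = round(crop_w * target_h / crop_h)
    return max(min_w, min(max_w, w))

print(rec_target_width(24, 120))   # 240: aspect ratio preserved at height 48
print(rec_target_width(48, 10))    # 48: clamped up to the minimum width
```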
The recognition model uses ppocrv5_dict.txt with 18383 characters:
- Chinese, Japanese, Korean characters
- English uppercase/lowercase letters
- Digits 0-9
- Common punctuation and symbols (multilingual)
- Special tokens: blank (index 0), space (index 18384)
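CTC decoding collapses repeated indices and drops blanks (index 0). A minimal greedy-decoder sketch, using a toy 4-entry charset standing in for the real 18385-class table (the function name is hypothetical; the production decoder may differ in detail):

```python
import numpy as np

def ctc_greedy_decode(probs: np.ndarray, charset: list[str]) -> tuple[str, float]:
    """Greedy CTC decode: collapse repeats, skip blanks (index 0),
    and average the confidences of the kept characters."""
    ids = probs.argmax(axis=1)
    confs = probs.max(axis=1)
    chars, scores, prev = [], [], -1
    for i, c in zip(ids, confs):
        if i != prev and i != 0:   # skip blanks and repeated indices
            chars.append(charset[i])
            scores.append(c)
        prev = i
    score = float(np.mean(scores)) if scores else 0.0
    return "".join(chars), score

# Toy charset; the real dictionary has blank at index 0 and space at 18384.
charset = ["<blank>", "H", "i", " "]
probs = np.eye(4)[[1, 1, 0, 2]].astype(np.float32)  # H, H(repeat), blank, i
print(ctc_greedy_decode(probs, charset))  # ('Hi', 1.0)
```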
- NVIDIA GPU with at least 4GB VRAM
- Recommended: RTX 3000/4000 series or A-series datacenter GPUs
- 16GB+ system RAM for TensorRT engine building
- Docker with NVIDIA Container Toolkit
- NVIDIA GPU with TensorRT 10.x support
- Python 3.10+ with the following packages:
- onnx
- onnxruntime-gpu
# From the yolo-api container:
docker compose exec yolo-api python /app/export/download_paddleocr.py
# Or run the convenience script:
./scripts/export_paddleocr.sh download
This downloads:
- ppocr_det_v5_mobile.onnx (~5 MB) - Detection model
- ppocr_rec_v5_mobile.onnx (~16 MB) - Multilingual recognition model
- Character dictionaries
CRITICAL: TensorRT requires sufficient workspace memory. Use correct syntax for TensorRT 10+:
# CORRECT syntax:
--memPoolSize=workspace:4G
# INCORRECT (will cause "Cudnn Error: CUDNN_STATUS_NOT_SUPPORTED"):
--workspace=4096 # Old deprecated syntax
--memPoolSize=4096 # Missing workspace: prefix and unit
docker compose exec triton-server /usr/src/tensorrt/bin/trtexec \
--onnx=/models/ppocr_det_v5_mobile.onnx \
--saveEngine=/models/paddleocr_det_trt/1/model.plan \
--fp16 \
--minShapes=x:1x3x32x32 \
--optShapes=x:1x3x736x736 \
--maxShapes=x:4x3x960x960 \
--memPoolSize=workspace:4G
docker compose exec triton-server /usr/src/tensorrt/bin/trtexec \
--onnx=/models/ppocr_rec_v5_mobile.onnx \
--saveEngine=/models/paddleocr_rec_trt/1/model.plan \
--fp16 \
--minShapes=x:1x3x48x48 \
--optShapes=x:32x3x48x320 \
--maxShapes=x:64x3x48x2048 \
--memPoolSize=workspace:4G
Use the provided script for complete export:
./scripts/export_paddleocr.sh all
This handles:
- Model download (if needed)
- GPU memory management (unloads other models)
- TensorRT conversion with correct parameters
- Dictionary file placement
- Triton config generation
The dictionary file must be in the recognition model directory:
# Verify dictionary exists (should have 18383 lines, plus blank+space = 18385 total)
wc -l models/paddleocr_rec_trt/ppocrv5_dict.txt
# Expected output: 18383 lines
# Total classes: 18385 (18383 dict + blank token + space token)
# Reload all models
docker compose restart triton-server
# Or load specific models via API
curl -X POST localhost:4600/v2/repository/models/paddleocr_det_trt/load
curl -X POST localhost:4600/v2/repository/models/paddleocr_rec_trt/load
curl -X POST localhost:4600/v2/repository/models/ocr_pipeline/load
curl -s localhost:4600/v2/models | jq '.models[] | select(.name | startswith("paddle") or startswith("ocr"))'
Expected output:
{"name": "paddleocr_det_trt", "state": "READY"}
{"name": "paddleocr_rec_trt", "state": "READY"}
{"name": "ocr_pipeline", "state": "READY"}
# Test via API endpoint
python scripts/test_ocr_pipeline.py
# Or use curl
curl -X POST http://localhost:4603/ocr/predict \
-F "image=@test_images/ocr-synthetic/hello_world.jpg"
File: models/paddleocr_det_trt/config.pbtxt
name: "paddleocr_det_trt"
platform: "tensorrt_plan"
max_batch_size: 4
input [
{
name: "x"
data_type: TYPE_FP32
dims: [ 3, -1, -1 ] # Dynamic H, W (multiples of 32)
}
]
output [
{
name: "fetch_name_0"
data_type: TYPE_FP32
dims: [ 1, -1, -1 ] # Same H, W as input
}
]
instance_group [
{
count: 2
kind: KIND_GPU
gpus: [0]
}
]
dynamic_batching {
preferred_batch_size: [ 1, 2, 4 ]
max_queue_delay_microseconds: 5000
}
File: models/paddleocr_rec_trt/config.pbtxt
name: "paddleocr_rec_trt"
platform: "tensorrt_plan"
max_batch_size: 64
input [
{
name: "x"
data_type: TYPE_FP32
dims: [ 3, 48, -1 ] # Dynamic width: 48-2048
}
]
output [
{
name: "fetch_name_0"
data_type: TYPE_FP32
dims: [ -1, 18385 ] # Dynamic timesteps, 18385 characters (multilingual)
}
]
instance_group [
{
count: 2
kind: KIND_GPU
gpus: [0]
}
]
dynamic_batching {
preferred_batch_size: [ 16, 32, 64 ]
max_queue_delay_microseconds: 10000
}
File: models/ocr_pipeline/config.pbtxt
name: "ocr_pipeline"
backend: "python"
max_batch_size: 0 # BLS handles batching
input [
{
name: "ocr_images"
data_type: TYPE_FP32
dims: [ 3, -1, -1 ] # Preprocessed for detection
},
{
name: "original_image"
data_type: TYPE_FP32
dims: [ 3, -1, -1 ] # For text crop extraction
},
{
name: "orig_shape"
data_type: TYPE_INT32
dims: [ 2 ] # [H, W]
}
]
output [
{
name: "num_texts"
data_type: TYPE_INT32
dims: [ 1 ]
},
{
name: "text_boxes"
data_type: TYPE_FP32
dims: [ 128, 8 ] # Quadrilateral coordinates
},
{
name: "text_boxes_normalized"
data_type: TYPE_FP32
dims: [ 128, 4 ] # Axis-aligned [x1, y1, x2, y2]
},
{
name: "texts"
data_type: TYPE_STRING
dims: [ 128 ]
},
{
name: "text_scores"
data_type: TYPE_FP32
dims: [ 128 ] # Detection confidence
},
{
name: "rec_scores"
data_type: TYPE_FP32
dims: [ 128 ] # Recognition confidence
}
]
instance_group [
{
count: 2
kind: KIND_GPU
gpus: [0]
}
]
parameters: {
key: "FORCE_CPU_ONLY_INPUT_TENSORS"
value: { string_value: "no" }
}
curl -X POST http://localhost:4603/ocr/predict \
-F "image=@your_image.jpg" \
-F "min_det_score=0.5" \
-F "min_rec_score=0.8"
Response:
{
"status": "success",
"texts": ["Hello", "World"],
"boxes": [[x1,y1,x2,y2,x3,y3,x4,y4], ...],
"boxes_normalized": [[0.1, 0.2, 0.5, 0.3], ...],
"det_scores": [0.95, 0.88],
"rec_scores": [0.99, 0.97],
"num_texts": 2,
"image_size": [480, 640]
}
curl -X POST http://localhost:4603/ocr/batch \
-F "images=@image1.jpg" \
-F "images=@image2.jpg" \
-F "images=@image3.jpg"
curl -X POST http://localhost:4603/search/ocr \
-H "Content-Type: application/json" \
-d '{"query": "invoice", "top_k": 10}'
import requests
def extract_text(image_path: str, api_url: str = "http://localhost:4603") -> dict:
"""Extract text from an image using OCR API."""
with open(image_path, 'rb') as f:
response = requests.post(
f"{api_url}/ocr/predict",
files={"image": f},
data={"min_det_score": 0.5, "min_rec_score": 0.8}
)
return response.json()
# Usage
result = extract_text("test_images/ocr-synthetic/invoice.jpg")
print(f"Found {result['num_texts']} text regions:")
for text, score in zip(result['texts'], result['rec_scores']):
    print(f"  '{text}' (confidence: {score:.2f})")
| Parameter | Default | Tuning |
|---|---|---|
| max_batch_size | 4 | Increase for batch processing |
| instance_count | 2 | Match to GPU utilization |
| max_queue_delay | 5 ms | Lower for latency, higher for throughput |
| Parameter | Default | Tuning |
|---|---|---|
| max_batch_size | 64 | Increase for batch processing |
| instance_count | 2 | Increase for parallel crops |
| Dynamic width | 48-2048 | Narrow range if text length is known |
- Reduce max shapes if you know image sizes:
  --maxShapes=x:4x3x640x640  # Instead of 960x960
- Use FP16 (already enabled by default)
- Enable CUDA graphs for repeated inference:
  optimization { cuda { graphs: true } }
| Configuration | Throughput | Latency (p50) |
|---|---|---|
| Single image | 15-20 RPS | 50-70ms |
| Batch (4 images) | 40-50 RPS | 100-150ms |
Cause: Insufficient TensorRT workspace memory.
Solution: Use correct syntax for workspace allocation:
# TensorRT 10+ syntax:
--memPoolSize=workspace:4G
# Not:
--workspace=4096 # Deprecated
Cause: TensorRT engine was built with a different GPU or TensorRT version.
Solution: Rebuild the engine on the target GPU:
rm models/paddleocr_det_trt/1/model.plan
rm models/paddleocr_rec_trt/1/model.plan
./scripts/export_paddleocr.sh trt
Cause: Model not loaded or config error.
Solution:
- Check model files exist:
ls -la models/paddleocr_det_trt/1/model.plan
ls -la models/paddleocr_rec_trt/1/model.plan
- Check Triton logs:
docker compose logs triton-server | grep -i error
- Reload models:
curl -X POST localhost:4600/v2/repository/models/paddleocr_det_trt/load
Cause: Detection threshold too high or preprocessing mismatch.
Solution:
- Lower detection threshold:
curl -X POST ... -F "min_det_score=0.3"
- Check preprocessing:
- Input should be BGR, not RGB
- Normalization: (x / 127.5) - 1 = range [-1, 1]
- Image padded to 32-pixel boundary
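A quick sanity check of the normalization formula, since a wrong range is a common cause of empty detections. The pixel extremes 0, 127.5, and 255 should map to exactly -1, 0, and 1:

```python
import numpy as np

# Verify x/127.5 - 1 maps [0, 255] onto [-1, 1].
pixels = np.array([0, 127.5, 255], dtype=np.float32)
print(pixels / 127.5 - 1.0)  # [-1.  0.  1.]
```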
Cause: Multiple models competing for GPU memory.
Solution:
- Unload unused models:
curl -X POST localhost:4600/v2/repository/models/yolov11_small_trt/unload
- Reduce instance count in config.pbtxt
- Use TensorRT FP16 mode (already default)
models/
├── paddleocr_det_trt/
│ ├── 1/model.plan # TensorRT detection engine
│ └── config.pbtxt # Triton config
├── paddleocr_rec_trt/
│ ├── 1/model.plan # TensorRT recognition engine
│ ├── config.pbtxt # Triton config
│ └── ppocrv5_dict.txt # Character dictionary (18383 chars, multilingual)
└── ocr_pipeline/
├── 1/model.py # Python BLS orchestrator
└── config.pbtxt # Triton config
export/
├── download_paddleocr.py # Download ONNX models
├── export_paddleocr_det.py # Export detection to TRT
└── export_paddleocr_rec.py # Export recognition to TRT
scripts/
├── export_paddleocr.sh # Automated export script
└── test_ocr_pipeline.py # OCR testing script
src/
├── services/ocr_service.py # OCR service wrapper
└── clients/triton_client.py # Triton gRPC client (infer_ocr method)
Version History
| Version | Date | Changes |
|---|---|---|
| 1.0 | 2025-01-02 | Initial PP-OCRv5 implementation |
| 1.1 | 2025-01-10 | Added workspace memory fix documentation |
| 2.0 | 2026-01-26 | Consolidated documentation |
| 3.0 | 2026-01-27 | Switched to multilingual model (18385 classes), removed PaddleX dependency |