
High-Performance Zero-Host-Copy Inference Pipeline (C++/CUDA)


Author: Igor Khozhanov

Contact: khozhanov@gmail.com

Copyright: © 2026 Igor Khozhanov. All Rights Reserved.


🎬 Real-Time Output

Processing 1440p Video Stream @ ~118 FPS on RTX 3060 Ti.

Crop & Weed Detection Demo


⚠️ Current Development Status: Phase 5 (Porting)

The previous phases, Phase 3 (Integration) and Phase 4 (Functional Inference), are complete. The pipeline now supports full end-to-end detection and tracking using TensorRT and ONNX backends with mathematically verified kernels.

Note for Reviewers: This repository is currently under active development. The pipeline is being implemented in stages to ensure memory safety and zero-host-copy verification.

| Module / Stage | Status | Notes |
| --- | --- | --- |
| FFmpeg Source | ✅ Stable | Handles stream connection and packet extraction. |
| Stub Detector | ✅ Stable | Pass-through module, validated for pipeline latency profiling. |
| Output / NVJpeg | ✅ Stable | Saves frames from GPU memory to disk as separate *.jpg images. |
| Inference Pipeline | ✅ Stable | Connects all the stages together. |
| ONNX Detector | ✅ Stable | Implemented with Zero-Copy input. |
| TensorRT Detector | ✅ Stable | Engine builder & enqueueV3 implemented. |
| Object Tracker | ✅ Stable | Kernels for position prediction, IOU matching (see the sketch below), and velocity filtering. |
| Post-Processing | ✅ Stable | Custom CUDA kernels for YOLOv8 output decoding & NMS. |
| Windows Port | 🚧 WIP | Adapting CMake & CUDA build. |
| Jetson Port | 🚧 WIP | ARM64 optimization. |
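
The Object Tracker row above mentions IOU matching; the snippet below is a minimal host-side illustration of that metric only. It is not the repository's CUDA implementation, and the Box struct is a placeholder type introduced just for this sketch.

```cpp
#include <algorithm>

// Placeholder box type for this sketch; the real pipeline operates on GPU buffers.
struct Box { float x1, y1, x2, y2; };

// Intersection-over-Union between two axis-aligned boxes: the score that
// track-to-detection matching (and NMS) ranks candidate pairs by.
float iou(const Box& a, const Box& b) {
    const float ix1 = std::max(a.x1, b.x1), iy1 = std::max(a.y1, b.y1);
    const float ix2 = std::min(a.x2, b.x2), iy2 = std::min(a.y2, b.y2);
    const float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
    const float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (areaA + areaB - inter + 1e-6f);  // epsilon guards divide-by-zero
}
```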

Project Overview

This project implements a high-performance video inference pipeline designed to minimize CPU-GPU bandwidth usage. Unlike standard OpenCV implementations, this pipeline keeps data entirely in VRAM (Zero-Host-Copy) from decoding to inference.
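
To make the data flow concrete, the sketch below shows the general shape of such a zero-host-copy loop. It is illustrative only: the stage names (decodeNext, preprocess, infer, encodeJpeg) and the GpuFrame struct are hypothetical placeholders, not this repository's actual interfaces.

```cpp
#include <cuda_runtime.h>
#include <cstdint>

// Hypothetical frame handle: pixels live in VRAM, all stages share one stream.
struct GpuFrame {
    uint8_t*     data;      // decoded NV12/RGB pixels, device memory
    int          width, height;
    cudaStream_t stream;    // every stage enqueues its work here
};

// Hypothetical stage signatures; each one consumes and produces device buffers.
GpuFrame decodeNext();                             // hardware decode, output stays on the GPU
void preprocess(const GpuFrame&, float* dTensor);  // resize/normalize kernel in VRAM
void infer(const float* dTensor, float* dDet);     // TensorRT / ONNX Runtime on device pointers
void encodeJpeg(const GpuFrame&);                  // JPEG encoder reads straight from VRAM

void runOnce(float* dTensor, float* dDetections) {
    GpuFrame frame = decodeNext();        // only compressed packets crossed PCIe so far
    preprocess(frame, dTensor);
    infer(dTensor, dDetections);
    encodeJpeg(frame);
    cudaStreamSynchronize(frame.stream);  // single sync point; no cudaMemcpy back to host RAM
}
```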

📥 Model Setup (Required)

This repository contains the Inference Engine (MIT Licensed). It does not include pre-trained model weights.

To reproduce the demo results (Crop & Weed Detection), you must download the pre-trained YOLOv8 model separately.

Download the Model

The model is hosted in the research repository (AGPL-3.0):

  • Download: best.onnx
  • License: AGPL-3.0 (Derived from Ultralytics YOLOv8)

How to Build & Test (Current Version)

Compatibility

Supported Platforms

  • ✅ Linux x64 (Verified on Ubuntu 24.04 / RTX 3060 Ti)
  • 🚧 Windows 10/11 (Build scripts implemented, pending validation)
  • 🚧 Nvidia Jetson Orin (CMake configuration ready, pending hardware tests)

Note: The CMakeLists.txt contains specific logic for vcpkg (Windows) and aarch64 (Jetson), but these targets are currently experimental.

Dependencies

Build Time

  • CMake 3.19+
  • CUDA Toolkit (12.x)
  • TensorRT 10.x+
  • FFmpeg: Required.
    • Linux Users: Install via package manager or build from source with --enable-shared.

Runtime Requirements

  • NVIDIA cuDNN: Required by ONNX Runtime CUDA provider.
    • Note: Ensure libcudnn.so is in your LD_LIBRARY_PATH or installed system-wide.

Compilation & Run

Build & Run (Native)

git clone https://github.com/Igkho/ZeroHostCopyInference.git
cd ZeroHostCopyInference

mkdir build
mv ~/Downloads/best.onnx ./build/

cd build
cmake ..
make -j$(nproc)

Run pipeline

./ZeroCopyInference -i ../video/Moving.mp4 --backend trt --model best.onnx -b 16 -o Moving

Run tests

./ZeroCopyInferenceTests

Quick Start (Docker)

No C++ compilation required. Requires NVIDIA Container Toolkit.

Run pipeline

git clone https://github.com/Igkho/ZeroHostCopyInference.git
cd ZeroHostCopyInference

mkdir models
mv ~/Downloads/best.onnx ./models/

docker run --rm --gpus all \
  -v $(pwd)/video:/app/video \
  -v $(pwd)/models:/app/models \
  ghcr.io/igkho/zerohostcopyinference:main \
  -i video/Moving.mp4 \
  --backend trt \
  --model /app/models/best.onnx \
  -b 16 \
  -o video/output

Run tests

docker run --rm --gpus all \
  --entrypoint ./build/ZeroCopyInferenceTests \
  ghcr.io/igkho/zerohostcopyinference:main

🚀 Performance Benchmarks

Benchmarks performed on NVIDIA RTX 3060 Ti. Input: 1440p Video Stream. Model: YOLOv8 Medium (YOLOv8m) @ 1024x1024 Resolution.

1. Infrastructure Ceiling (Stub Mode)

To measure the raw overhead of the pipeline architecture (I/O latency), a pass-through (Stub) detector is used.

| Metric | Result | Notes |
| --- | --- | --- |
| Throughput | ~300 FPS | Maximum theoretical speed without an AI model. |
| Latency | 3.3 ms | Combined decoding + memory management overhead. |

2. Real-World Inference (TensorRT FP16 Mode)

Running YOLOv8m (FP16 optimized) with full object tracking and NVJpeg output.

| Metric | Result | Notes |
| --- | --- | --- |
| Total Throughput | 118.10 FPS | Wall time (End-to-End); 2x Real-Time. |
| Pipeline Latency | ~8.5 ms | Average per frame. |
| Bottleneck | Decoding | Inference is so fast (5.5 ms) that video decoding (7 ms) becomes the primary factor. |

Workload Distribution:

  • Decoding: ~7.07 ms/frame (48% load)
  • Inference: ~5.58 ms/frame (38% load)
  • Storage/IO: ~1.93 ms/frame (13% load)
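
As a sanity check on the bottleneck claim: a fully pipelined architecture can run no faster than its slowest stage. Taking the decode time above as that stage (and assuming the stages overlap completely, which is a simplifying assumption) gives a rough throughput ceiling:

$$
\mathrm{FPS}_{\text{ceiling}} \approx \frac{1000\ \mathrm{ms}}{7.07\ \mathrm{ms/frame}} \approx 141\ \mathrm{FPS}
$$

The measured 118.1 FPS sits just under this decode-bound ceiling, consistent with decoding, not inference, limiting end-to-end throughput.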

3. Backend Comparison

Both TensorRT (Highly Optimized) and ONNX Runtime (Generic Compatibility) are supported.

Scenario: 1024x1024 Input Resolution on RTX 3060 Ti.

| Backend | FPS | Inference Latency | Speedup Factor | Notes |
| --- | --- | --- | --- | --- |
| TensorRT (FP16) | 118.1 FPS | ~5.6 ms | 1.0x (Ref) | Utilizes Tensor Cores. Recommended. |
| ONNX Runtime | ~10.5 FPS | ~94.8 ms | 0.08x | Generic execution. Useful for testing new models. |
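
For readers unfamiliar with how a TensorRT backend can consume device-resident input without a host round-trip, here is a minimal sketch using the TensorRT 10 tensor-binding API. The function and variable names, and the tensor names "images"/"output0" (typical of a YOLOv8 ONNX export), are assumptions for illustration, not this repository's actual code.

```cpp
#include <NvInfer.h>
#include <cuda_runtime.h>

// Run one inference on buffers that already live in VRAM.
// dInput  : preprocessed frame tensor (device pointer)
// dOutput : raw detection head output (device pointer)
void inferOnDevice(nvinfer1::IExecutionContext& ctx,
                   cudaStream_t stream,
                   void* dInput,
                   void* dOutput)
{
    // Bind device pointers by tensor name (TensorRT >= 8.5 / 10.x API).
    ctx.setTensorAddress("images", dInput);
    ctx.setTensorAddress("output0", dOutput);

    // Enqueue asynchronously on the pipeline's shared stream; post-processing
    // kernels on the same stream can consume dOutput without a host copy.
    ctx.enqueueV3(stream);
}
```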

⚖️ License

The source code of this project is licensed under the MIT License. You are free to use, modify, and distribute this infrastructure code for any purpose, including commercial applications.

🛑 Asset & Model Licensing Exceptions

While the code is MIT-licensed, the assets and models used in this repository are subject to different terms. Please review them carefully before redistributing:

1. Video Assets (Non-Commercial Only)

  • Files: Content located in the video/ directory (e.g., Moving.mp4, Moving_annotated.gif).
  • Source: Generated using KlingAI (Free Tier).
  • Terms: These assets are provided for demonstration and educational purposes only. They are strictly non-commercial. You may not use these specific video files in any commercial product or service.
  • Attribution: The watermarks on these videos must remain intact as per the platform's Terms of Service.

2. Model Licensing

  • Example: If you use YOLOv8 (Ultralytics) with this pipeline, be aware that YOLOv8 is licensed under AGPL-3.0.
  • Implication: Integrating an AGPL-3.0 model may legally require your entire combined application to comply with AGPL-3.0 terms (i.e., open-sourcing your entire project).

User Responsibility: This repository provides the execution engine only. No models are bundled. You are responsible for verifying and complying with the license of any specific ONNX/TensorRT model you choose to load.
