Changhong Li, Clément Bled, Rosa Fernandez, Shreejith Shanker.
Conference Homepage: CVMP 2025
Denoising is a core operation in modern video pipelines. In codecs, in-loop filters suppress sensor noise and quantisation artefacts to improve rate-distortion performance; in cinema post-production, denoisers are used for restoration, grain management, and plate clean-up. However, state-of-the-art deep denoisers are computationally intensive and, at scale, are typically deployed on GPUs, incurring high power and cost for real-time, high-resolution streams. ReTiDe (Real-Time Denoise) is a hardware-accelerated denoising system that serves inference on data-centre Field Programmable Gate Arrays (FPGAs). A compact convolutional model is quantised to INT8 (post-training quantisation plus quantisation-aware fine-tuning) and compiled for AMD Deep Learning Processor Unit (DPU)-based FPGAs. A client-server integration offloads computation from the host CPU/GPU to a networked FPGA service while remaining callable from existing workflows, e.g., NUKE, without disrupting artist tooling. On representative benchmarks, ReTiDe delivers 37.71× higher throughput (in Giga Operations Per Second, GOPS) and 5.29× higher energy efficiency than prior FPGA denoising accelerators, with negligible degradation in Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM). These results indicate that specialised accelerators can provide practical, scalable denoising for both encoding pipelines and post-production, reducing energy per frame without sacrificing quality or workflow compatibility.
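To make the offload pattern concrete, below is a minimal, hypothetical client sketch that ships a length-prefixed image buffer to a networked denoising service over TCP. The host name, port, and wire format are illustrative assumptions, not ReTiDe's actual protocol.

```python
# Hypothetical client for a networked denoising service. It frames the
# payload with an 8-byte big-endian length prefix and reads back the
# denoised bytes using the same framing. Host, port, and wire format
# are illustrative assumptions, not ReTiDe's actual protocol.
import socket
import struct

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the socket, or raise on early close."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("server closed the connection early")
        buf.extend(chunk)
    return bytes(buf)

def denoise_remote(image_bytes: bytes,
                   host: str = "fpga-server.local",
                   port: int = 9000) -> bytes:
    """Send one encoded frame to the service and return the denoised frame."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(struct.pack(">Q", len(image_bytes)) + image_bytes)
        (length,) = struct.unpack(">Q", _recv_exact(sock, 8))
        return _recv_exact(sock, length)
```

Inside NUKE, a call like this could be wrapped in a Python gizmo or knob callback, so artists trigger FPGA denoising without leaving their existing tooling.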
The datasets used for benchmarking are listed below:
- BSD68: https://www.kaggle.com/code/mpwolke/berkeley-segmentation-dataset-68
- BSD68C: https://github.com/clausmichele/CBSD68-dataset?tab=readme-ov-file
- BSD100: https://www.kaggle.com/datasets/asilva1691/bsd100
- SET12: https://www.kaggle.com/datasets/leweihua/set12-231008
The integration workflow of Vitis AI with NUKE is illustrated in the following figure. The AMD Vitis AI toolchain provides an end-to-end workflow for deploying quantised neural networks, bridging the gap between machine-learning frameworks and FPGA-based deployment. Starting from FP32 model descriptions written in popular frameworks such as TensorFlow and PyTorch, the toolchain performs model quantisation and operator conversion. The converted models are then accelerated on the Deep Learning Processor Unit (DPU), which is specifically designed for convolutional and matrix-intensive workloads. By mapping computation-intensive kernels directly onto dedicated hardware engines, the toolchain reduces CPU overhead while maximising parallelism and memory-bandwidth utilisation. This hardware–software co-design significantly improves inference throughput while simultaneously reducing power consumption. Beyond NUKE, we also provide general client-server interfaces for integration with other software.
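For concreteness, a minimal sketch of the post-training quantisation step using the public Vitis AI PyTorch flow (`pytorch_nndct`) is shown below. The toy model, input shape, calibration data, and output paths are placeholders rather than the actual ReTiDe model or training scripts.

```python
# Minimal sketch of Vitis AI INT8 post-training quantisation (PyTorch
# flow). The toy model, shapes, and paths are placeholders; the
# pytorch_nndct calls follow the public Vitis AI documentation.
import torch
from pytorch_nndct.apis import torch_quantizer

# Stand-in for the FP32 denoising CNN (placeholder architecture).
model = torch.nn.Sequential(
    torch.nn.Conv2d(1, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 1, 3, padding=1),
).eval()
dummy = torch.randn(1, 1, 256, 256)  # assumed input shape
calib_batches = [torch.randn(1, 1, 256, 256) for _ in range(8)]

# 1) Calibration: run representative inputs through the instrumented
#    graph so the quantiser can derive INT8 scale factors.
quantizer = torch_quantizer("calib", model, (dummy,), output_dir="quant")
quant_model = quantizer.quant_model
with torch.no_grad():
    for batch in calib_batches:
        quant_model(batch)
quantizer.export_quant_config()

# 2) Export: re-run in "test" mode and emit an .xmodel, which the
#    Vitis AI compiler (vai_c_xir) then maps onto the target DPU
#    (e.g., the DPUCAHX8H overlay on the Alveo U50).
quantizer = torch_quantizer("test", model, (dummy,), output_dir="quant")
with torch.no_grad():
    quantizer.quant_model(dummy)
quantizer.export_xmodel(output_dir="quant", deploy_check=False)
```

The quantisation-aware fine-tuning stage mentioned above would add a short training loop on the quantised model before export; it is omitted here for brevity.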

ReTiDe is deployed on an AMD Alveo U50 server-level FPGA accelerator card. The following figure demonstrates the processing flow from the original noisy image (σ = 50) to the denoised image, together with the segmentation and corresponding parallelisation. Upon receiving denoising requests initiated by remote or local hosts, the incoming image or video stream is first processed by a pre-processor, which segments and batches the media into standardised input formats suitable for the model. This also facilitates parallel processing across multiple threads and DPU units.
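As an illustration of this stage, here is a minimal sketch of a tiling pre-processor that segments a frame, fans the tiles out to worker threads, and reassembles the result. The tile size, function names, and the stubbed-out DPU call are assumptions for illustration, not the actual ReTiDe implementation.

```python
# Minimal sketch of a tiling pre-processor: segment a frame into
# fixed-size tiles, denoise tiles in parallel threads, reassemble.
# Tile size and the stubbed DPU call are illustrative assumptions.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # hypothetical tile edge length matching the model's input size

def split_into_tiles(img: np.ndarray, tile: int = TILE):
    """Segment an HxWxC frame into fixed-size tiles, reflect-padding borders."""
    h, w = img.shape[:2]
    pad_h, pad_w = (-h) % tile, (-w) % tile
    padded = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode="reflect")
    tiles, coords = [], []
    for y in range(0, padded.shape[0], tile):
        for x in range(0, padded.shape[1], tile):
            tiles.append(padded[y:y + tile, x:x + tile])
            coords.append((y, x))
    return tiles, coords, padded.shape[:2]

def merge_tiles(tiles, coords, padded_hw, out_hw, tile: int = TILE):
    """Reassemble denoised tiles and crop back to the original frame size."""
    out = np.empty((*padded_hw, tiles[0].shape[2]), dtype=tiles[0].dtype)
    for t, (y, x) in zip(tiles, coords):
        out[y:y + tile, x:x + tile] = t
    return out[:out_hw[0], :out_hw[1]]

def denoise_tile(tile: np.ndarray) -> np.ndarray:
    """Stub for one INT8 inference call on a DPU core (e.g., via VART)."""
    return tile  # identity placeholder

def denoise_frame(img: np.ndarray, workers: int = 4) -> np.ndarray:
    """Tile the frame, denoise the tiles in parallel, and reassemble."""
    tiles, coords, padded_hw = split_into_tiles(img)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        denoised = list(pool.map(denoise_tile, tiles))  # order-preserving
    return merge_tiles(denoised, coords, padded_hw, img.shape[:2])
```

In the real system, each worker would hold its own DPU runner so that independent tiles keep all compute engines busy.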
We benchmarked our model on both colour and grayscale datasets; the results are reported below.

PSNR (dB) at noise levels σ = 15, 25, and 50:

| Type | Method | BSD68 σ=15 | BSD68 σ=25 | BSD68 σ=50 | URBAN100 σ=15 | URBAN100 σ=25 | URBAN100 σ=50 |
|---|---|---|---|---|---|---|---|
| FP32 | BM3D | 30.95 | 25.32 | 24.89 | 31.91 | 29.06 | 24.45 |
| FP32 | FFDNet | 31.45 | 28.96 | 25.16 | 33.76 | 31.41 | 28.09 |
| FP32 | IRCNN | 31.46 | 28.79 | 25.11 | 33.08 | 29.62 | 24.53 |
| FP32 | DnCNN-20 | 31.60 | 29.14 | 26.20 | 33.76 | 30.19 | 19.34 |
| FP32 | SwinIR | 31.76 | 29.10 | 25.40 | 33.44 | 30.43 | 25.47 |
| FP32 | ReTiDe | 31.48 | 29.09 | 26.20 | 33.25 | 30.60 | 26.55 |
| QNNs | L-DnCNN | 31.44 | 29.01 | 26.08 | - | - | - |
| QNNs | ReTiDe (P) | 29.92 | 28.35 | 26.73 | 29.92 | 28.35 | 25.52 |
| QNNs | ReTiDe (Q) | 30.94 | 29.23 | 26.73 | 30.20 | 28.46 | 25.61 |

PSNR and SSIM at noise levels σ = 5, 15, 25, 35, and 45:

| Type | Method | Blind/Non-blind | σ=5 PSNR | σ=5 SSIM | σ=15 PSNR | σ=15 SSIM | σ=25 PSNR | σ=25 SSIM | σ=35 PSNR | σ=35 SSIM | σ=45 PSNR | σ=45 SSIM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FP32 | BM3D | Non-blind | 39.85 | 0.98 | 33.17 | 0.9223 | 30.16 | 0.8598 | 28.17 | 0.8007 | 26.62 | 0.7470 |
| FP32 | DnCNN | Blind | 39.72 | 0.9728 | 33.46 | 0.9245 | 30.56 | 0.8711 | 28.62 | 0.8217 | 27.11 | 0.7766 |
| FP32 | FFDNet | Non-blind | 39.84 | 0.9788 | 33.65 | 0.9265 | 31.00 | 0.8772 | 29.37 | 0.8337 | 28.23 | 0.7958 |
| FP32 | IRCNN | Non-blind | 39.95 | 0.9789 | 33.41 | 0.9234 | 30.45 | 0.8678 | 28.43 | 0.8114 | 26.87 | 0.7578 |
| FP32 | ReTiDe | Blind | 39.46 | 0.9761 | 33.27 | 0.9205 | 30.65 | 0.8682 | 29.03 | 0.8224 | 27.89 | 0.7826 |
| QNNs | ReTiDe (P) | Blind | 32.94 | 0.8943 | 30.38 | 0.8414 | 28.95 | 0.8149 | 27.96 | 0.7941 | 27.06 | 0.7713 |
| QNNs | ReTiDe (Q) | Blind | 33.22 | 0.9425 | 31.03 | 0.9008 | 29.37 | 0.8576 | 28.14 | 0.8164 | 27.14 | 0.7811 |

Throughput, power, and energy efficiency on CPU, GPU, and FPGA platforms:

| Method | Platform | Throughput (GOPS) | Power (W) | Energy Eff. (GOPS/W) |
|---|---|---|---|---|
| L-DnCNN | i7-7700HQ CPU | 29.5 | 45 | 0.66 |
| TNet-mini | i5-12400F CPU | 164.3 | 65 | 2.53 |
| ReTiDe | U7-265K CPU | 770.2 | 42.1 | 18.30 |
| L-DnCNN | GTX 1070 GPU | 1,066.7 | 115 | 9.28 |
| TNet-mini | RTX 2080 Ti GPU | 1,785.7 | 250 | 7.14 |
| ReTiDe | A4000 GPU | 8,285.5 | 236.3 | 35.06 |
| L-DnCNN | MZU03A-EG FPGA | 41.8 | 2.4 | 17.18 |
| TNet-mini | MZU03A-EG FPGA | 99.3 | 2.6 | 38.51 |
| ReTiDe | Alveo U50 FPGA | 3,746.1 | 18.4 | 203.59 |

- If you find this work useful, please cite it as follows:
```bibtex
@article{li2025retide,
  title={ReTiDe: Real-Time Denoising for Energy-Efficient Motion Picture Processing with FPGAs},
  author={Li, Changhong and Bled, Cl{\'e}ment and Fernandez, Rosa and Shanker, Shreejith},
  journal={arXiv preprint arXiv:2510.03812},
  year={2025}
}
```

- A demo video is available at: https://www.youtube.com/watch?v=0epcrRA_f2w
This work was funded by the EU Horizon CL4 2022 project Emerald (grant agreement 101119800).



