This project uses super-resolution techniques, such as U-Net and SRGAN, to build a video super-resolution model.
It was developed as the course project for MIT 6.8300 Advances in Computer Vision.
All development and data were originally hosted on Google Drive and Google Colab.
- Train the U-Net model on pairs of low-resolution (LR) and high-resolution (HR) images.
- Process the video so that every frame has the same dimensions (256 × 256).
- Compile the model with the chosen metrics and loss function, taken either from predefined library functions or from user-defined functions, then train it on the image dataset.
- Read the resized HR video, blur it with a user-defined kernel size to obtain LR frames, and predict super-resolution frames.
- Generate the SR video from the predicted frames. This is the first version of the SR video.
- Compute optical flow from the LR video and generate interpolated frames.
- Use frame_upscale to upscale the interpolated frames to high resolution.
- Use blend_interpolation_with_HR to blend the upscaled interpolated frames with the SR frames.
- Combine the frames using either the blending or the stacking function.
- Generate the final SR video. This is the second version of the SR video, with temporal filtering applied.
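The degradation and evaluation steps above (blurring HR frames with a user-defined kernel, and scoring results with a chosen metric) can be sketched roughly as follows. The box blur and the PSNR metric are illustrative assumptions, not the exact kernel or metric used in the project:

```python
import numpy as np

def blur_frame(frame: np.ndarray, kernel_size: int = 5) -> np.ndarray:
    """Degrade an HR frame with a simple box blur (illustrative kernel choice)."""
    k, pad = kernel_size, kernel_size // 2
    padded = np.pad(frame, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros(frame.shape, dtype=np.float64)
    # Sum the k*k shifted copies of the padded frame, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + frame.shape[0], dx:dx + frame.shape[1], :]
    return (out / (k * k)).astype(frame.dtype)

def psnr(reference: np.ndarray, restored: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio, a common super-resolution quality metric."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)
```

A function like `psnr` can also be wrapped as a user-defined metric when compiling the model, which is one way to satisfy the "predefined or user-defined" choice above.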
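The optical-flow interpolation step can be sketched as warping one frame halfway along the flow field and averaging with the next frame. This is a minimal nearest-neighbour version for illustration; the project presumably computes the flow itself with a library routine (e.g. Farneback in OpenCV), which is omitted here:

```python
import numpy as np

def interpolate_midframe(frame0: np.ndarray, frame1: np.ndarray,
                         flow: np.ndarray) -> np.ndarray:
    """Hypothetical midpoint interpolation: warp frame0 halfway along the
    forward flow (nearest-neighbour sampling) and average with frame1."""
    h, w = frame0.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Sample frame0 at positions displaced by half the flow vector.
    src_x = np.clip(np.round(xs - 0.5 * flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys - 0.5 * flow[..., 1]).astype(int), 0, h - 1)
    warped = frame0[src_y, src_x]
    mid = (warped.astype(np.float64) + frame1.astype(np.float64)) / 2.0
    return mid.astype(frame0.dtype)
```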
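The blending and stacking steps can be sketched as a weighted average and a temporal mean, respectively. The actual signatures of `frame_upscale` and `blend_interpolation_with_HR` live in the project code, so the helpers below are only hypothetical stand-ins:

```python
import numpy as np

def blend_interpolated_with_sr(interp_up: np.ndarray, sr_frame: np.ndarray,
                               alpha: float = 0.5) -> np.ndarray:
    """Hypothetical stand-in for blend_interpolation_with_HR: a weighted
    average of an upscaled interpolated frame and the model's SR frame."""
    blended = alpha * sr_frame.astype(np.float64) \
        + (1.0 - alpha) * interp_up.astype(np.float64)
    return np.clip(blended, 0, 255).astype(np.uint8)

def temporal_stack(frames: list[np.ndarray]) -> np.ndarray:
    """Hypothetical stacking function: average consecutive frames to
    suppress frame-to-frame flicker (a simple form of temporal filtering)."""
    return np.mean(np.stack(frames, axis=0), axis=0).astype(np.uint8)
```

Averaging over a small temporal window is one plausible reading of the "temporal filtering" mentioned above; it trades some sharpness for temporal stability.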
A public Kaggle dataset of HR and LR image pairs is used to train both the U-Net and the SRGAN model.
Source: A. Chandrasekhar, "Image Super Resolution," Kaggle, Aug. 2020. Available: https://www.kaggle.com/datasets/adityachandrasekhar/image-super-resolution
Source: C. Ledig et al., "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
An SRGAN model nearly identical to the one in the paper above was trained on the same image dataset used for the U-Net.
For a demonstration, see the slides at https://docs.google.com/presentation/d/1YSsu3zNezw-1LPxYm1epp52qNcqd33nW/edit?usp=sharing&ouid=108352897844425359010&rtpof=true&sd=true