- @phantom0174
- @yuchenleeBB
This repository is the public release of the final project for the NTHU Parallel Programming course.
It is designed to significantly accelerate the original photo-mosaic implementation, making it feasible to generate photo mosaics for every frame of a video within a reasonable amount of time.
Additional details can be found in the project slides.
.
├── src/common.h # shared configuration
├── src/opt_turbo.cu # CUDA version (migrated from HIP version)
├── src/opt_turbo.cpp # HIP version
├── src/make_cache.cpp # preprocess tile library
├── tools/sewing.txt # ffmpeg command for stitching frames
├── tools/process_pdf.py # preprocess Epstein pdf files into images
└── bad_frames.zip # Bad Apple frame DB
- target: Bad Apple (6571 frames)
- tile library: Bad Apple frames
- resolution: 40x40
- output scale: 3.0 (1440*1080)
| version | system spec | core matching time | equivalent speedup | comment |
|---|---|---|---|---|
| photo-mosaic | TR 7960X w/ 2x 5070 Ti 16G | ~10 mins. | baseline | |
| HIP | EPYC 7742 w/ 2x MI100 | 33 secs. | x18.2 | I/O bound (CPU) |
| CUDA (opt) | Ultra 7 265K w/ 1x 5060 Ti 16G | 6.5 secs. | x184.6 | |
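
The "equivalent speedup" column is not defined explicitly, but the listed values are consistent with normalizing the core matching time by the number of GPUs in each system. The sketch below reproduces the column under that assumption and also derives frames per second from the 6571-frame target. The normalization rule, the reading of "~10 mins." as 600 seconds, and the helper name `equivalent_speedup` are assumptions for illustration, not taken from the project code.

```python
FRAMES = 6571  # Bad Apple frame count from the benchmark settings

# (version, core matching time in seconds, GPU count), read from the table.
# "~10 mins." is approximated as 600 s (assumption).
runs = [
    ("photo-mosaic", 600.0, 2),
    ("HIP", 33.0, 2),
    ("CUDA (opt)", 6.5, 1),
]


def equivalent_speedup(base_s, base_gpus, run_s, run_gpus):
    """Speedup in GPU-seconds: baseline GPU-time divided by run GPU-time.

    Hypothetical helper; this is an inferred reading of the table.
    """
    return (base_s * base_gpus) / (run_s * run_gpus)


_, base_s, base_gpus = runs[0]

results = {}
for name, secs, gpus in runs:
    speedup = round(equivalent_speedup(base_s, base_gpus, secs, gpus), 1)
    fps = round(FRAMES / secs, 1)  # wall-clock frames matched per second
    results[name] = (speedup, fps)
    print(f"{name:13s} x{speedup:<6} {fps} frames/s")
```

Under these assumptions the CUDA build matches roughly a thousand frames per second on a single GPU, compared with about eleven for the baseline on two GPUs.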
- Host OS: Windows 11
- Linux Environment: WSL2
- Kernel: 6.6.87.2-microsoft-standard-WSL2
- Architecture: x86_64
- GPU: NVIDIA RTX 5060 Ti 16GB
- NVIDIA Driver: supports CUDA 13.1
- CUDA Runtime: 13.1
- OS: Debian GNU/Linux
- Kernel: 6.1.0-39-amd64
- Architecture: x86_64
- GPU: AMD MI100
- ROCm: unavailable (currently offline)
Frame extraction for target videos (`get_frames.py`) can be found in photo-mosaic.
- Get the essential tools: run `get_ffmpeg.sh` and `get_img_turbo.sh`
- Tune matching parameters in `common.h`
- Build the cache generator: `make make-cache`
- Build the executable: `make opt-turbo` (HIP) / `make opt-cuda` (CUDA)
- Generate the tile cache: run `make-cache`
- Run the mosaic generator: run `opt-turbo` / `opt-cuda`
- Stitch frames into a video using the command in `sewing.txt`
- Done.