Add Apple Silicon MPS support, PyTorch 2.6+ compat, and Real-ESRGAN enhancement #161

Open
chuzirui wants to merge 3 commits into HypoX64:master from chuzirui:fix/mps-support-and-pytorch2-compat

@chuzirui chuzirui commented Mar 13, 2026

Summary

  • Apple Silicon MPS GPU support: Auto-detect and use MPS backend when CUDA is unavailable on macOS
  • PyTorch 2.6+ compatibility: Add weights_only=False to all torch.load() calls
  • Performance optimizations: Batched segmentation, parallel disk I/O, prefetch threading, hardware video encoding (h264_videotoolbox)
  • Real-ESRGAN enhancement (--enhance flag): Optional post-processing that applies Real-ESRGAN super-resolution to cleaned mosaic patches before compositing, intended to improve output quality
  • Process hang fixes: Replace multiprocessing.Queue with the thread-safe queue.Queue, guard all input() calls for non-interactive terminals, add error handling and a sentinel to the prefetch thread
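
The MPS auto-detection in the first bullet could look roughly like the following. This is a minimal sketch, not the code in this PR: `choose_device_name` and `get_torch_device` are hypothetical names (the PR's own helper is get_device() in models/model_util.py, and its signature is not shown here), and treating a negative gpu_id as "force CPU" is an assumption.

```python
def choose_device_name(cuda_ok: bool, mps_ok: bool, gpu_id: int = 0) -> str:
    """Pick a device string: CUDA first, then MPS, then CPU."""
    if gpu_id < 0:  # assumption: a negative id means "force CPU"
        return "cpu"
    if cuda_ok:
        return f"cuda:{gpu_id}"
    if mps_ok:
        return "mps"
    return "cpu"


def get_torch_device(gpu_id: int = 0):
    """Resolve the choice against the local PyTorch build (imported lazily)."""
    import torch

    mps_backend = getattr(torch.backends, "mps", None)  # absent on old PyTorch
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return torch.device(
        choose_device_name(torch.cuda.is_available(), mps_ok, gpu_id)
    )
```

Keeping the decision in a pure function makes the fallback order easy to test on machines without a GPU or without PyTorch installed.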

Changes

New files

  • models/enhancer.py — Real-ESRGAN wrapper that lazy-loads the model and enhances cleaned patches

Modified files

  • cores/options.py — MPS auto-detection, --enhance flag, input() guards
  • cores/clean.py — Batched segmentation, prefetch with error handling, parallel I/O, Real-ESRGAN integration in all clean paths, stdout flush
  • cores/init.py — Non-interactive terminal support
  • models/loadmodel.py — weights_only=False, torch.compile guards for MPS/CPU
  • models/model_util.py — get_device() abstraction for MPS/CUDA/CPU
  • models/runmodel.py — torch.no_grad(), autocast (CUDA only), batch segmentation
  • util/data.py — np.ascontiguousarray() for tensor stride fixes, MPS device support
  • util/ffmpeg.py — Hardware encoder detection (h264_videotoolbox)
  • deepmosaic.py — input() guards for non-interactive mode
  • train/add/train.py — weights_only=False
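
For the weights_only=False change, one possible way to stay compatible with older PyTorch releases is to pass the argument only when the loader accepts it. This is a hedged sketch: `load_checkpoint` is a hypothetical helper, and the PR itself adds weights_only=False directly to each torch.load() call.

```python
import inspect


def load_checkpoint(load_fn, path, **kwargs):
    """Call load_fn(path, ...) with weights_only=False when the loader supports it."""
    params = inspect.signature(load_fn).parameters
    if "weights_only" in params:
        # PyTorch 2.6 changed the default to True, which rejects checkpoints
        # that pickle arbitrary Python objects.
        kwargs.setdefault("weights_only", False)
    return load_fn(path, **kwargs)


# Typical call (requires PyTorch; the file name is illustrative):
#   import torch
#   state = load_checkpoint(torch.load, "model.pth", map_location="cpu")
```

Because weights_only=False unpickles arbitrary objects, it should only be used with checkpoints from a trusted source.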

Usage

# Standard clean
python deepmosaic.py --media_path video.mp4 --model_path ./pretrained_models/mosaic/clean_youknow_video.pth --gpu_id 0

# With Real-ESRGAN enhancement for better quality
python deepmosaic.py --media_path video.mp4 --model_path ./pretrained_models/mosaic/clean_youknow_video.pth --gpu_id 0 --enhance

Test plan

  • Tested on macOS 14.7 / Apple Silicon (M-series) with MPS backend
  • Verified torch.load compatibility with PyTorch 2.10
  • Verified --enhance flag loads Real-ESRGAN and processes patches
  • Verified process exits cleanly (no hang) with --no_preview

- Add MPS (Metal Performance Shaders) backend detection so Apple Silicon
  Macs use GPU acceleration instead of falling back to CPU
- Add weights_only=False to all torch.load() calls for PyTorch 2.6+
  which changed the default to weights_only=True
- Add torch.no_grad() to inference paths (run_segment, run_pix2pix) to
  avoid unnecessary gradient computation
- Fix Queue.qsize() NotImplementedError on macOS
- Add prefetch thread for frame I/O in video fusion to overlap disk
  reads with model inference

Made-with: Cursor
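
The prefetch idea from this commit, overlapping disk reads with inference, can be sketched with the standard library alone. This is an illustration under assumptions, not the PR's code: `prefetch`, `load_frame`, and `_DONE` are hypothetical names. It uses queue.Queue, whose qsize() works on macOS, whereas multiprocessing.Queue.qsize() raises NotImplementedError there.

```python
import queue
import threading

_DONE = object()  # sentinel: tells the consumer that no more frames will arrive


def prefetch(paths, load_frame, maxsize=8):
    """Yield load_frame(p) for each path, loading on a background thread."""
    q = queue.Queue(maxsize=maxsize)

    def worker():
        try:
            for p in paths:
                q.put(load_frame(p))
        except Exception as exc:  # hand the error to the consumer
            q.put(exc)
        finally:
            q.put(_DONE)  # always sent, so the consumer never waits forever

    threading.Thread(target=worker, daemon=True).start()

    while True:
        item = q.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

The bounded queue caps how many decoded frames sit in memory, and the daemon thread cannot keep the process alive once the main thread exits.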

Performance:
- Batch segmentation: process 4 frames at once through BiSeNet
- ThreadPoolExecutor for concurrent disk writes (masks + results)
- Prefetch I/O thread for frame loading in video fusion
- torch.compile() for CUDA model acceleration (skipped on MPS)
- float16 autocast for CUDA inference paths
- Contiguous tensor conversion to avoid stride mismatches
- Hardware-accelerated h264_videotoolbox encoder on macOS
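
The batching and concurrent-write items above can be illustrated as follows. This is a sketch under assumptions: `batched`, `process_frames`, `segment_batch`, and `save` are hypothetical names standing in for the BiSeNet call and the disk writes, which are not shown in this description.

```python
from concurrent.futures import ThreadPoolExecutor


def batched(items, size=4):
    """Yield consecutive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_frames(frames, segment_batch, save, batch_size=4, workers=4):
    """Run segment_batch on groups of frames and save results on worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        index = 0
        for batch in batched(frames, batch_size):
            for mask in segment_batch(batch):  # one model call per batch
                futures.append(pool.submit(save, index, mask))
                index += 1
        for f in futures:
            f.result()  # re-raise any write error
    return index
```

Submitting writes to a pool lets the next batch start while earlier results are still being flushed to disk.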

Fixes:
- Replace multiprocessing.Queue with the thread-safe queue.Queue to
  prevent process hang from undrained pipe buffers on exit
- Guard all input() prompts with sys.stdin.isatty() so the process
  exits cleanly in non-interactive terminals
- Auto-resume unfinished videos in non-interactive mode

Made-with: Cursor
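
The input() guard described in the fixes can be sketched like this. `ask` is a hypothetical helper, and which default each prompt should use in the real code is an assumption.

```python
import sys


def ask(prompt, default, stream=None):
    """Return user input on an interactive terminal, otherwise `default`."""
    stream = sys.stdin if stream is None else stream
    if stream is None or not stream.isatty():
        return default  # no terminal attached: do not block on input()
    return input(prompt)
```

For example, `ask("Resume unfinished video? [y/n] ", "y")` would resume automatically when the tool runs from a script or CI job.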

Integrate Real-ESRGAN as an optional post-processing step (--enhance flag)
that sharpens the 256x256 cleaned mosaic patches before compositing them
back into the frame. Also harden the prefetch thread with error handling
and a sentinel to prevent silent hangs, and flush stdout for step progress.

Made-with: Cursor
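
The lazy-loading behaviour attributed to models/enhancer.py might be structured along these lines. This is a sketch only: `LazyEnhancer` and `build_model` are hypothetical, and the actual Real-ESRGAN loading code is not reproduced here.

```python
class LazyEnhancer:
    """Build the expensive model on first use, then reuse it."""

    def __init__(self, build_model):
        # Hypothetical factory, e.g. a function that loads Real-ESRGAN weights.
        self._build_model = build_model
        self._model = None

    def __call__(self, patch):
        if self._model is None:  # first call pays the load cost
            self._model = self._build_model()
        return self._model(patch)
```

With this shape, runs without --enhance never construct the model, so they pay no extra startup time or memory.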
@chuzirui chuzirui changed the title from "Add Apple Silicon MPS GPU support and fix PyTorch 2.6+ compatibility" to "Add Apple Silicon MPS support, PyTorch 2.6+ compat, and Real-ESRGAN enhancement" on Mar 15, 2026