Add Apple Silicon MPS support, PyTorch 2.6+ compat, and Real-ESRGAN enhancement #161
Open
chuzirui wants to merge 3 commits into HypoX64:master
- Add MPS (Metal Performance Shaders) backend detection so Apple Silicon Macs use GPU acceleration instead of falling back to CPU
- Add weights_only=False to all torch.load() calls for PyTorch 2.6+, which changed the default to weights_only=True
- Add torch.no_grad() to inference paths (run_segment, run_pix2pix) to avoid unnecessary gradient computation
- Fix Queue.qsize() NotImplementedError on macOS
- Add prefetch thread for frame I/O in video fusion to overlap disk reads with model inference

Made-with: Cursor
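The backend detection described in this commit could look roughly like the sketch below. It is not the PR's actual get_device(); the name pick_device_name and the fallback to "cpu" when PyTorch cannot be imported are assumptions made for illustration.

```python
def pick_device_name() -> str:
    """Return "cuda", "mps", or "cpu", preferring a GPU backend when one is usable."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without PyTorch there is nothing to accelerate.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps exists from PyTorch 1.12 onward; guard for older builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device_name())
```

The returned string can be passed to torch.device(...) so that model and tensors end up on the same backend.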
Performance:
- Batch segmentation: process 4 frames at once through BiSeNet
- ThreadPoolExecutor for concurrent disk writes (masks + results)
- Prefetch I/O thread for frame loading in video fusion
- torch.compile() for CUDA model acceleration (skipped on MPS)
- float16 autocast for CUDA inference paths
- Contiguous tensor conversion to avoid stride mismatches
- Hardware-accelerated h264_videotoolbox encoder on macOS

Fixes:
- Replace multiprocessing.Queue with threading queue.Queue to prevent process hang from undrained pipe buffers on exit
- Guard all input() prompts with sys.stdin.isatty() so the process exits cleanly in non-interactive terminals
- Auto-resume unfinished videos in non-interactive mode

Made-with: Cursor
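As an illustration of the input() guard listed under Fixes, here is a minimal sketch. The helper name ask_or_default and its stdin parameter are hypothetical and do not come from the PR; the idea is only that the prompt runs when stdin is a terminal and a default is used otherwise.

```python
import io
import sys


def ask_or_default(prompt, default, stdin=None):
    """Ask the user only when stdin is an interactive terminal.

    In a non-interactive run (pipe, cron, CI) input() would block or raise
    EOFError, so the default is returned without prompting.
    """
    stream = sys.stdin if stdin is None else stdin
    if stream is not None and stream.isatty():
        return input(prompt).strip() or default
    return default


if __name__ == "__main__":
    # io.StringIO reports isatty() == False, standing in for a piped stdin.
    print(ask_or_default("Resume unfinished video? [y/n] ", "y", stdin=io.StringIO("")))
```

Choosing "resume" as the default in the non-interactive case would match the auto-resume behaviour described above.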
Integrate Real-ESRGAN as an optional post-processing step (--enhance flag) that sharpens the 256x256 cleaned mosaic patches before compositing them back into the frame. Also harden the prefetch thread with error handling and a sentinel to prevent silent hangs, and flush stdout for step progress.

Made-with: Cursor
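The hardened prefetch thread mentioned in this commit can be sketched as follows. This is an assumption-laden illustration, not the code in cores/clean.py: the names prefetch, load_fn, and _SENTINEL are hypothetical. The worker forwards any exception to the consumer and always enqueues a sentinel, so the consumer cannot wait forever on a dead thread.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the stream


def prefetch(load_fn, items, maxsize=4):
    """Yield load_fn(item) for each item, loading ahead on a background thread."""
    q = queue.Queue(maxsize=maxsize)

    def worker():
        try:
            for item in items:
                q.put(load_fn(item))
        except Exception as exc:
            # Forward the failure to the consumer instead of dying silently.
            q.put(exc)
        finally:
            # Always signal completion so the consumer cannot hang.
            q.put(_SENTINEL)

    threading.Thread(target=worker, daemon=True).start()

    while True:
        got = q.get()
        if got is _SENTINEL:
            return
        if isinstance(got, Exception):
            raise got
        yield got


if __name__ == "__main__":
    for frame in prefetch(lambda name: name.upper(), ["a.jpg", "b.jpg"]):
        print(frame)
```

The bounded queue limits how many frames are held in memory while inference is running on the current one.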
Summary
- Add weights_only=False to all torch.load() calls
- Hardware-accelerated encoder on macOS (h264_videotoolbox)
- Real-ESRGAN enhancement (--enhance flag): Optional post-processing that applies Real-ESRGAN super-resolution to cleaned mosaic patches before compositing, to sharpen the final output
- Replace multiprocessing.Queue with queue.Queue, guard all input() calls for non-interactive terminals, add error handling and a sentinel to the prefetch thread

Changes
New files
- models/enhancer.py: Real-ESRGAN wrapper that lazy-loads the model and enhances cleaned patches

Modified files
- cores/options.py: MPS auto-detection, --enhance flag, input() guards
- cores/clean.py: Batched segmentation, prefetch with error handling, parallel I/O, Real-ESRGAN integration in all clean paths, stdout flush
- cores/init.py: Non-interactive terminal support
- models/loadmodel.py: weights_only=False, torch.compile guards for MPS/CPU
- models/model_util.py: get_device() abstraction for MPS/CUDA/CPU
- models/runmodel.py: torch.no_grad(), autocast (CUDA only), batch segmentation
- util/data.py: np.ascontiguousarray() for tensor stride fixes, MPS device support
- util/ffmpeg.py: Hardware encoder detection (h264_videotoolbox)
- deepmosaic.py: input() guards for non-interactive mode
- train/add/train.py: weights_only=False

Usage
Test plan
- torch.load compatibility with PyTorch 2.10
- --enhance flag loads Real-ESRGAN and processes patches
- --no_preview
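For reviewers who want to check the hardware encoder detection listed for util/ffmpeg.py, one possible approach is to parse the output of ffmpeg -encoders. The sketch below is an assumption about how such a check could work, not the PR's implementation; has_encoder, pick_h264_encoder, and the libx264 fallback are hypothetical.

```python
import shutil
import subprocess


def has_encoder(encoders_output: str, name: str) -> bool:
    """Return True if name appears as an encoder row in `ffmpeg -encoders` output."""
    for line in encoders_output.splitlines():
        fields = line.split()
        # Encoder rows look like: " V....D h264_videotoolbox VideoToolbox H.264 Encoder"
        if len(fields) >= 2 and fields[1] == name:
            return True
    return False


def pick_h264_encoder(default: str = "libx264") -> str:
    """Prefer h264_videotoolbox when the local ffmpeg build reports it."""
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        return default
    try:
        result = subprocess.run(
            [ffmpeg, "-hide_banner", "-encoders"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return default
    if has_encoder(result.stdout, "h264_videotoolbox"):
        return "h264_videotoolbox"
    return default
```

Matching on the second whitespace-separated field avoids false positives from encoder names that only appear inside a description.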