diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..2abaf4d --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,94 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Project Overview + +DepthCrafter is a deep learning project for generating temporally consistent long depth sequences from open-world videos. It uses a diffusion-based model built on Stable Video Diffusion to estimate depth maps without requiring camera poses or optical flow. + +## Architecture + +### Core Components + +1. **Main Pipeline (`depthcrafter/depth_crafter_ppl.py`)**: Implements the DepthCrafterPipeline extending diffusers for depth estimation +2. **UNet Model (`depthcrafter/unet.py`)**: Custom spatio-temporal UNet for depth prediction +3. **Inference Scripts**: + - `run.py`: Main CLI for single video inference + - `app.py`: Gradio web interface + - `benchmark/infer/infer_batch.py`: Batch processing for benchmarks + +### Key Directories + +- `depthcrafter/`: Core model implementation +- `benchmark/`: Dataset evaluation scripts and CSV metadata +- `examples/`: Sample video files for testing +- `visualization/`: Point cloud visualization tools + +## Common Commands + +### Installation +```bash +pip install -r requirements.txt +``` + +### Single Video Inference + +High-resolution (requires ~26GB GPU memory): +```bash +python run.py --video-path examples/example_01.mp4 +``` + +Low-resolution (requires ~9GB GPU memory): +```bash +python run.py --video-path examples/example_01.mp4 --max-res 512 +``` + +### Gradio Demo +```bash +gradio app.py +``` + +### Benchmark Evaluation + +Run inference on all datasets: +```bash +bash benchmark/infer/infer.sh +``` + +Evaluate results: +```bash +bash benchmark/eval/eval.sh +``` + +### Key Parameters + +- `--process-length`: Number of frames to process (default: 195) +- `--window-size`: Sliding window size (default: 110) +- `--overlap`: Frame overlap between windows (default: 25) +- `--max-res`: Maximum resolution (default: 1024) +- `--num-denoising-steps`: Denoising steps (default: 5) +- `--guidance-scale`: Guidance scale for inference (default: 1.0) +- `--save-npz`: Save depth as NPZ file +- `--save-exr`: Save depth as EXR file + +## Model Loading + +The model uses two key components from Hugging Face: +1. DepthCrafter UNet: `tencent/DepthCrafter` +2. Base diffusion model: `stabilityai/stable-video-diffusion-img2vid-xt` + +## Dependencies + +Key dependencies: +- PyTorch 2.0.1 +- Diffusers 0.29.1 +- Transformers 4.41.2 +- XFormers 0.0.20 (for memory efficient attention) +- OpenEXR 3.2.4 (for EXR output) + +## Performance Notes + +- v1.0.1 improvements: ~4x faster inference (465ms/frame vs 1914ms/frame at 1024x576) +- Memory optimization options via `--cpu-offload` parameter: + - `"model"`: Standard CPU offloading + - `"sequential"`: Sequential offloading (slower but saves more memory) \ No newline at end of file diff --git a/README_macOS.md b/README_macOS.md new file mode 100644 index 0000000..83330c4 --- /dev/null +++ b/README_macOS.md @@ -0,0 +1,348 @@ +# DepthCrafter for macOS (Apple Silicon & Intel) + +This is a modified version of DepthCrafter optimized for macOS, with full support for Apple Silicon (M1/M2/M3) and Intel Macs. The modifications enable CPU-based processing since MPS (Metal Performance Shaders) doesn't support Conv3D operations required by the model. + +## šŸŽ Key Modifications for macOS + +### 1. 
**CPU-Only Processing** +- Removed CUDA dependencies +- Disabled MPS due to Conv3D limitations +- Uses CPU for all computations (slower but fully functional) + +### 2. **FP32 Precision** +- Changed from FP16 to FP32 for CPU compatibility +- Ensures numerical stability on CPU + +### 3. **Enhanced Video Processing** +- FFmpeg-based video handling with fallback support +- Automatic format conversion to MP4 (HEVC/H.264) +- Smart video trimming and frame extraction +- Progress indicators for all conversions + +### 4. **Interactive CLI Interface** +- User-friendly terminal UI +- Preset management system +- Visual progress tracking + +## šŸ“‹ Requirements + +### System Requirements +- **macOS**: 10.15 (Catalina) or later +- **Python**: 3.8 - 3.11 +- **RAM**: 16GB minimum, 32GB recommended +- **Storage**: 10GB free space for models and processing + +### Software Dependencies +```bash +# Install Homebrew if not already installed +/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" + +# Install FFmpeg (required) +brew install ffmpeg + +# Install Python via Homebrew (if needed) +brew install python@3.11 +``` + +## šŸš€ Installation + +### 1. Clone the Repository +```bash +git clone https://github.com/Tencent/DepthCrafter.git +cd DepthCrafter +``` + +### 2. Create Virtual Environment +```bash +# Create virtual environment +python3 -m venv venv + +# Activate virtual environment +source venv/bin/activate +``` + +### 3. Install Dependencies +```bash +# Upgrade pip +pip install --upgrade pip + +# Install PyTorch (CPU version for macOS) +pip install torch==2.0.1 torchvision==0.15.2 + +# Install other requirements +pip install diffusers==0.29.1 +pip install transformers==4.41.2 +pip install accelerate==0.30.1 +pip install numpy==1.26.4 +pip install matplotlib==3.8.4 +pip install mediapy==1.2.0 +pip install fire==0.6.0 +pip install opencv-python==4.9.0.80 +pip install gradio # For web UI (optional) + +# Optional: Install decord for better video processing +# Note: May require additional setup +# pip install decord +``` + +### 4. 
Download Model Weights +The models will be automatically downloaded from Hugging Face on first run: +- `tencent/DepthCrafter` - Main UNet model +- `stabilityai/stable-video-diffusion-img2vid-xt` - Base diffusion model + +## šŸ’» Usage + +### Option 1: Interactive CLI (Recommended) +```bash +# Launch the interactive interface +python interactive_cli.py + +# Or use the launcher +./depthcrafter_ui +``` + +The interactive CLI provides: +- Step-by-step guided workflow +- Video preview and information +- Quality presets (Fast/Balanced/High) +- Frame range selection +- Preset save/load functionality + +### Option 2: Command Line + +#### Basic Usage +```bash +# Process entire video at 512px resolution +python run.py --video-path input.mp4 --max-res 512 + +# Process with custom settings +python run.py --video-path input.mp4 \ + --max-res 768 \ + --num-inference-steps 10 \ + --guidance-scale 1.2 +``` + +#### Video Trimming +```bash +# Process first 50 frames only (faster for testing) +python run.py --video-path input.mp4 --max-frames 50 --max-res 512 + +# Process frames 100-200 +python run.py --video-path input.mp4 --start-frame 100 --max-frames 100 --max-res 512 + +# Process specific time range (e.g., seconds 10-20) +python run.py --video-path input.mp4 --start-frame 300 --max-frames 300 --max-res 512 +# (assuming 30fps: start at 10s = frame 300, 10s duration = 300 frames) +``` + +#### Output Options +```bash +# Save depth data in multiple formats +python run.py --video-path input.mp4 \ + --save-npz \ # Save as NPZ file + --save-exr \ # Save as EXR sequence + --save-folder output/ # Custom output directory +``` + +### Option 3: Web UI (Gradio) +```bash +python app.py +# Opens browser at http://localhost:7860 +``` + +## āš™ļø Parameters + +### Resolution Settings +- `--max-res`: Maximum resolution (512/768/1024) + - 512: ~9GB RAM, fastest + - 768: ~15GB RAM, balanced + - 1024: ~26GB RAM, highest quality + +### Quality Settings +- `--num-inference-steps`: Denoising steps (1-25, default: 5) +- `--guidance-scale`: Guidance strength (0.5-2.0, default: 1.0) + +### Video Settings +- `--max-frames`: Limit number of frames to process +- `--start-frame`: Starting frame index +- `--target-fps`: Output video FPS (default: 15) +- `--process-length`: Alternative to max-frames + +### Processing Settings +- `--window-size`: Sliding window size (default: 110) +- `--overlap`: Frame overlap between windows (default: 25) +- `--seed`: Random seed for reproducibility + +## šŸ“ Output Files + +The script generates the following outputs in the specified folder: + +``` +output_folder/ +ā”œā”€ā”€ videoname_input.mp4 # Preprocessed/trimmed input +ā”œā”€ā”€ videoname_vis.mp4 # Colored depth visualization +ā”œā”€ā”€ videoname_depth.mp4 # Raw depth video +ā”œā”€ā”€ videoname.npz # (Optional) Numpy depth data +└── frame_XXXX.exr # (Optional) EXR depth frames +``` + +## šŸŽÆ Performance Tips + +### Memory Management +1. **Start with low resolution** (512px) for testing +2. **Use frame limits** to process shorter segments +3. **Close other applications** to free up RAM +4. 
**Monitor Activity Monitor** for memory usage + +### Speed Optimization +```bash +# Fastest settings (lower quality) +python run.py --video-path input.mp4 \ + --max-res 512 \ + --num-inference-steps 3 \ + --max-frames 50 + +# Balanced settings +python run.py --video-path input.mp4 \ + --max-res 768 \ + --num-inference-steps 5 \ + --max-frames 100 + +# Best quality (slowest) +python run.py --video-path input.mp4 \ + --max-res 1024 \ + --num-inference-steps 10 +``` + +### Processing Times (Approximate) +On M1 MacBook Pro (16GB RAM) for 150 frames: +- 512px: ~30 minutes +- 768px: ~60 minutes +- 1024px: ~120 minutes + +*Note: Intel Macs will be slower. Apple Silicon (M1/M2/M3) provides better CPU performance.* + +## šŸŽ¬ Supported Video Formats + +### Input Formats +- MP4, MOV, AVI, MKV, WEBM, FLV, and most formats supported by FFmpeg +- Automatic conversion to MP4 for compatibility + +### Automatic Optimizations +- Converts to HEVC/H.264 codec +- Adjusts to 15 FPS for consistency +- Maintains aspect ratio +- Removes audio tracks + +## šŸ”§ Troubleshooting + +### Common Issues + +#### 1. "Torch not compiled with CUDA enabled" +This is expected on macOS. The code has been modified to use CPU instead. + +#### 2. "Conv3D is not supported on MPS" +This is why we use CPU processing. MPS doesn't support 3D convolutions yet. + +#### 3. Memory Errors +- Reduce `--max-res` to 512 +- Process fewer frames with `--max-frames` +- Close other applications +- Consider upgrading RAM + +#### 4. FFmpeg Errors +```bash +# Verify FFmpeg installation +ffmpeg -version + +# Reinstall if needed +brew reinstall ffmpeg +``` + +#### 5. Slow Processing +This is normal for CPU processing. Tips: +- Use lower resolution (512px) +- Process shorter segments +- Run overnight for long videos +- Consider cloud GPU services for faster processing + +### Check System Resources +```bash +# Monitor CPU and memory usage +top + +# Check available disk space +df -h + +# Check Python memory usage +python -c "import psutil; print(f'Available RAM: {psutil.virtual_memory().available / (1024**3):.1f} GB')" +``` + +## šŸ†• Features Added for macOS + +1. **Automatic Video Trimming** + - Extract specific frame ranges before processing + - Reduces memory usage and processing time + +2. **Smart Format Conversion** + - Automatic conversion to compatible MP4 + - Preserves quality while ensuring compatibility + +3. **Progress Indicators** + - Real-time conversion progress + - Processing status updates + +4. **Interactive CLI** + - User-friendly interface + - No need to remember commands + - Visual feedback and validation + +5. **Preset System** + - Save frequently used settings + - Share configurations with team + +## šŸ“Š Comparison with Original + +| Feature | Original | macOS Version | +|---------|----------|---------------| +| GPU Support | CUDA | CPU only | +| MPS Support | No | No (Conv3D limitation) | +| Precision | FP16 | FP32 | +| Video Handling | Decord | FFmpeg + fallbacks | +| Trimming | Manual | Automatic | +| Interface | CLI only | CLI + Interactive UI | +| Presets | No | Yes | + +## šŸ¤ Contributing + +Contributions to improve macOS compatibility are welcome! Areas of interest: +- MPS optimization when Conv3D support is added +- Memory usage optimization +- Processing speed improvements +- Additional video format support + +## šŸ“ License + +This macOS version maintains the same license as the original DepthCrafter project. 
+ +## šŸ™ Acknowledgments + +- Original DepthCrafter team at Tencent AI Lab +- PyTorch team for CPU optimizations +- FFmpeg for robust video processing + +## šŸ“® Support + +For macOS-specific issues: +1. Check this README first +2. Search existing issues +3. Create a new issue with: + - macOS version + - Hardware (Intel/M1/M2/M3) + - Python version + - Error messages + - Command used + +--- + +**Note:** This is a CPU-based implementation optimized for macOS. For faster processing, consider using the original version on a CUDA-capable GPU or cloud services. \ No newline at end of file diff --git a/app.py b/app.py index 26d9615..944843d 100644 --- a/app.py +++ b/app.py @@ -25,18 +25,20 @@ ] +# Detect device - use CPU since MPS doesn't support Conv3D +device = "cuda" if torch.cuda.is_available() else "cpu" + unet = DiffusersUNetSpatioTemporalConditionModelDepthCrafter.from_pretrained( "tencent/DepthCrafter", low_cpu_mem_usage=True, - torch_dtype=torch.float16, + torch_dtype=torch.float32, ) pipe = DepthCrafterPipeline.from_pretrained( "stabilityai/stable-video-diffusion-img2vid-xt", unet=unet, - torch_dtype=torch.float16, - variant="fp16", + torch_dtype=torch.float32, ) -pipe.to("cuda") +pipe.to(device) @spaces.GPU(duration=120) @@ -56,7 +58,12 @@ def infer_depth( save_npz: bool = False, ): set_seed(seed) - pipe.enable_xformers_memory_efficient_attention() + # Only enable xformers for CUDA devices + if torch.cuda.is_available(): + try: + pipe.enable_xformers_memory_efficient_attention() + except Exception as e: + print(f"Xformers not enabled: {e}") frames, target_fps = read_video_frames(video, process_length, target_fps, max_res) @@ -91,7 +98,8 @@ def infer_depth( # clear the cache for the next video gc.collect() - torch.cuda.empty_cache() + if torch.cuda.is_available(): + torch.cuda.empty_cache() return [ save_path + "_input.mp4", diff --git a/benchmark/demo.sh b/benchmark/demo.sh index 7cc9f1f..339a364 100644 --- a/benchmark/demo.sh +++ b/benchmark/demo.sh @@ -10,7 +10,7 @@ saved_dataset_folder=$5 overlap=$6 dataset=$7 -CUDA_VISIBLE_DEVICES=${gpu_id} PYTHONPATH=. python run.py \ +PYTHONPATH=. python run.py \ --video-path ${test_case} \ --save-folder ${saved_root}/${saved_dataset_folder} \ --process-length ${process_length} \ diff --git a/depthcrafter/depth_crafter_ppl.py b/depthcrafter/depth_crafter_ppl.py index b7d070d..f29f965 100644 --- a/depthcrafter/depth_crafter_ppl.py +++ b/depthcrafter/depth_crafter_ppl.py @@ -15,7 +15,46 @@ logger = logging.get_logger(__name__) # pylint: disable=invalid-name +def _resize_with_antialiasing_safe(input, size, interpolation="bicubic", align_corners=True): + """Wrapper for resize that uses the standard function.""" + # Since we're not using MPS anymore, we can use the original function + return _resize_with_antialiasing(input, size) + + class DepthCrafterPipeline(StableVideoDiffusionPipeline): + + @property + def _execution_device(self): + """ + Returns the device on which the pipeline should be executed. + Note: MPS is not used due to lack of Conv3D support. 
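+        Resolution order: the pipeline's device attribute, accelerate offload hooks on the UNet, the UNet's own parameters, then CUDA if available, otherwise CPU.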
+ """ + # If device attribute exists and is set + if hasattr(self, 'device') and self.device is not None: + if self.device != torch.device("meta"): + return self.device + + # Check if model has hooks (for CPU offloading) + if hasattr(self.unet, "_hf_hook"): + for module in self.unet.modules(): + if ( + hasattr(module, "_hf_hook") + and hasattr(module._hf_hook, "execution_device") + and module._hf_hook.execution_device is not None + ): + return torch.device(module._hf_hook.execution_device) + + # Try to get device from model parameters + try: + return next(self.unet.parameters()).device + except: + pass + + # Default fallback based on availability + # MPS doesn't support Conv3D, so we use CPU for Apple Silicon + if torch.cuda.is_available(): + return torch.device("cuda") + return torch.device("cpu") @torch.inference_mode() def encode_video( @@ -29,7 +68,7 @@ def encode_video( :return: image_embeddings in shape of [b, 1024] """ - video_224 = _resize_with_antialiasing(video.float(), (224, 224)) + video_224 = _resize_with_antialiasing_safe(video.float(), (224, 224)) video_224 = (video_224 + 1.0) / 2.0 # [-1, 1] -> [0, 1] embeddings = [] @@ -153,18 +192,18 @@ def __call__( video = video * 2.0 - 1.0 # [0,1] -> [-1,1], in [t, c, h, w] if track_time: - start_event = torch.cuda.Event(enable_timing=True) - encode_event = torch.cuda.Event(enable_timing=True) - denoise_event = torch.cuda.Event(enable_timing=True) - decode_event = torch.cuda.Event(enable_timing=True) - start_event.record() + import time + start_time = time.time() + encode_time = None + denoise_time = None video_embeddings = self.encode_video( video, chunk_size=decode_chunk_size ).unsqueeze( 0 ) # [1, t, 1024] - torch.cuda.empty_cache() + if torch.cuda.is_available(): + torch.cuda.empty_cache() # 4. Encode input image using VAE noise = randn_tensor( video.shape, generator=generator, device=device, dtype=video.dtype @@ -173,7 +212,7 @@ def __call__( # pdb.set_trace() needs_upcasting = ( - self.vae.dtype == torch.float16 and self.vae.config.force_upcast + self.vae.dtype == torch.float32 and self.vae.config.force_upcast ) if needs_upcasting: self.vae.to(dtype=torch.float32) @@ -186,16 +225,16 @@ def __call__( ) # [1, t, c, h, w] if track_time: - encode_event.record() - torch.cuda.synchronize() - elapsed_time_ms = start_event.elapsed_time(encode_event) - print(f"Elapsed time for encoding video: {elapsed_time_ms} ms") + encode_time = time.time() + elapsed_time_ms = (encode_time - start_time) * 1000 + print(f"Elapsed time for encoding video: {elapsed_time_ms:.2f} ms") - torch.cuda.empty_cache() + if torch.cuda.is_available(): + torch.cuda.empty_cache() - # cast back to fp16 if needed + # cast back to fp32 if needed if needs_upcasting: - self.vae.to(dtype=torch.float16) + self.vae.to(dtype=torch.float32) # 5. Get Added Time IDs added_time_ids = self._get_add_time_ids( @@ -238,7 +277,8 @@ def __call__( else: weights = None - torch.cuda.empty_cache() + if torch.cuda.is_available(): + torch.cuda.empty_cache() # inference strategy for long videos # two main strategies: 1. noise init from previous frame, 2. 
segments stitching @@ -335,22 +375,20 @@ def __call__( idx_start += stride if track_time: - denoise_event.record() - torch.cuda.synchronize() - elapsed_time_ms = encode_event.elapsed_time(denoise_event) - print(f"Elapsed time for denoising video: {elapsed_time_ms} ms") + denoise_time = time.time() + elapsed_time_ms = (denoise_time - encode_time) * 1000 + print(f"Elapsed time for denoising video: {elapsed_time_ms:.2f} ms") if not output_type == "latent": - # cast back to fp16 if needed + # cast back to fp32 if needed if needs_upcasting: - self.vae.to(dtype=torch.float16) + self.vae.to(dtype=torch.float32) frames = self.decode_latents(latents_all, num_frames, decode_chunk_size) if track_time: - decode_event.record() - torch.cuda.synchronize() - elapsed_time_ms = denoise_event.elapsed_time(decode_event) - print(f"Elapsed time for decoding video: {elapsed_time_ms} ms") + decode_time = time.time() + elapsed_time_ms = (decode_time - denoise_time) * 1000 + print(f"Elapsed time for decoding video: {elapsed_time_ms:.2f} ms") frames = self.video_processor.postprocess_video( video=frames, output_type=output_type diff --git a/depthcrafter/unet.py b/depthcrafter/unet.py index 0066a71..7472803 100644 --- a/depthcrafter/unet.py +++ b/depthcrafter/unet.py @@ -39,7 +39,7 @@ def forward( t_emb = self.time_proj(timesteps) # `Timesteps` does not contain any weights and will always return f32 tensors - # but time_embedding might actually be running in fp16. so we need to cast here. + # time_embedding should be running in fp32. Cast to ensure compatibility. # there might be better ways to encapsulate this. t_emb = t_emb.to(dtype=self.conv_in.weight.dtype) diff --git a/depthcrafter/utils.py b/depthcrafter/utils.py index 2ac50e8..cb113dd 100644 --- a/depthcrafter/utils.py +++ b/depthcrafter/utils.py @@ -5,7 +5,17 @@ import matplotlib.cm as cm import mediapy import torch -from decord import VideoReader, cpu +import subprocess +import json +import os +import sys +import re +import warnings +try: + from decord import VideoReader, cpu + DECORD_AVAILABLE = True +except ImportError: + DECORD_AVAILABLE = False dataset_res_dict = { "sintel": [448, 1024], @@ -16,12 +26,98 @@ } -def read_video_frames(video_path, process_length, target_fps, max_res, dataset="open"): +def get_video_info_ffmpeg(video_path): + """Get video metadata using ffprobe.""" + cmd = [ + 'ffprobe', + '-v', 'error', + '-select_streams', 'v:0', + '-count_frames', + '-show_entries', 'stream=width,height,r_frame_rate,nb_frames', + '-of', 'json', + video_path + ] + + try: + result = subprocess.run(cmd, capture_output=True, text=True, check=True) + info = json.loads(result.stdout) + stream = info['streams'][0] + + # Parse frame rate + fps_str = stream['r_frame_rate'] + if '/' in fps_str: + num, den = map(float, fps_str.split('/')) + fps = num / den + else: + fps = float(fps_str) + + return { + 'width': int(stream['width']), + 'height': int(stream['height']), + 'fps': fps, + 'nb_frames': int(stream.get('nb_frames', 0)) + } + except (subprocess.CalledProcessError, KeyError, ValueError) as e: + # Fallback: get basic info without frame count + cmd = [ + 'ffprobe', + '-v', 'error', + '-select_streams', 'v:0', + '-show_entries', 'stream=width,height,r_frame_rate', + '-of', 'json', + video_path + ] + result = subprocess.run(cmd, capture_output=True, text=True, check=True) + info = json.loads(result.stdout) + stream = info['streams'][0] + + fps_str = stream['r_frame_rate'] + if '/' in fps_str: + num, den = map(float, fps_str.split('/')) + fps = num / den + else: + 
fps = float(fps_str) + + # Estimate frame count + duration_cmd = [ + 'ffprobe', + '-v', 'error', + '-show_entries', 'format=duration', + '-of', 'json', + video_path + ] + duration_result = subprocess.run(duration_cmd, capture_output=True, text=True, check=True) + duration_info = json.loads(duration_result.stdout) + duration = float(duration_info['format']['duration']) + + return { + 'width': int(stream['width']), + 'height': int(stream['height']), + 'fps': fps, + 'nb_frames': int(duration * fps) + } + + +def read_video_frames_ffmpeg(video_path, process_length, target_fps, max_res, dataset="open"): + """Read video frames using ffmpeg.""" + # Convert to absolute path + video_path = os.path.abspath(video_path) + + if not os.path.exists(video_path): + raise RuntimeError(f"Video file not found: {video_path}") + + print("==> processing video directly with ffmpeg: ", video_path) + + # Get video info + video_info = get_video_info_ffmpeg(video_path) + original_width = video_info['width'] + original_height = video_info['height'] + original_fps = video_info['fps'] + total_frames = video_info['nb_frames'] + + print(f"==> original video shape: ({total_frames}, {original_height}, {original_width}, 3)") + if dataset == "open": - print("==> processing video: ", video_path) - vid = VideoReader(video_path, ctx=cpu(0)) - print("==> original video shape: ", (len(vid), *vid.get_batch([0]).shape[1:])) - original_height, original_width = vid.get_batch([0]).shape[1:3] height = round(original_height / 64) * 64 width = round(original_width / 64) * 64 if max(height, width) > max_res: @@ -31,23 +127,476 @@ def read_video_frames(video_path, process_length, target_fps, max_res, dataset=" else: height = dataset_res_dict[dataset][0] width = dataset_res_dict[dataset][1] - - vid = VideoReader(video_path, ctx=cpu(0), width=width, height=height) - - fps = vid.get_avg_fps() if target_fps == -1 else target_fps - stride = round(vid.get_avg_fps() / fps) + + fps = original_fps if target_fps == -1 else target_fps + stride = round(original_fps / fps) stride = max(stride, 1) - frames_idx = list(range(0, len(vid), stride)) - print( - f"==> downsampled shape: {len(frames_idx), *vid.get_batch([0]).shape[1:]}, with stride: {stride}" - ) + + # Calculate which frames to extract + frames_idx = list(range(0, total_frames, stride)) if process_length != -1 and process_length < len(frames_idx): frames_idx = frames_idx[:process_length] - print( - f"==> final processing shape: {len(frames_idx), *vid.get_batch([0]).shape[1:]}" - ) - frames = vid.get_batch(frames_idx).asnumpy().astype("float32") / 255.0 + + print(f"==> downsampled shape: ({len(frames_idx)}, {height}, {width}, 3), with stride: {stride}") + print(f"==> final processing shape: ({len(frames_idx)}, {height}, {width}, 3)") + + # Build ffmpeg command to extract frames + # Simplified approach: extract at target fps and scale + vf_filters = [] + + # Add fps filter to get the right frame rate + if stride > 1: + vf_filters.append(f"fps={fps}") + + # Add scaling + vf_filters.append(f"scale={width}:{height}:force_original_aspect_ratio=decrease") + vf_filters.append(f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2") + + # Limit frames if needed + if process_length != -1: + vf_filters.append(f"select='lt(n\,{process_length})'") + + vf_string = ','.join(vf_filters) + + cmd = [ + 'ffmpeg', + '-i', video_path, + '-vf', vf_string, + '-f', 'rawvideo', + '-pix_fmt', 'rgb24', + '-v', 'error', + '-' + ] + + # Run ffmpeg and capture output + process = subprocess.Popen(cmd, stdout=subprocess.PIPE, 
stderr=subprocess.PIPE) + stdout, stderr = process.communicate() + + if process.returncode != 0: + raise RuntimeError(f"ffmpeg failed: {stderr.decode()}") + + # Convert raw RGB data to numpy array + frames = np.frombuffer(stdout, dtype=np.uint8) + frames = frames.reshape((-1, height, width, 3)) + frames = frames.astype(np.float32) / 255.0 + + # Ensure we have the expected number of frames + if frames.shape[0] != len(frames_idx): + print(f"Warning: Expected {len(frames_idx)} frames, got {frames.shape[0]}") + + return frames, fps + + +def get_video_duration(input_path): + """Get video duration in seconds using ffprobe.""" + cmd = [ + 'ffprobe', + '-v', 'error', + '-show_entries', 'format=duration', + '-of', 'json', + input_path + ] + try: + result = subprocess.run(cmd, capture_output=True, text=True, check=True) + info = json.loads(result.stdout) + return float(info['format']['duration']) + except: + return None + +def show_progress(current_time, total_duration, width=50): + """Display a progress bar for video conversion.""" + if total_duration is None or total_duration == 0: + return + + progress = min(current_time / total_duration, 1.0) + filled = int(width * progress) + bar = 'ā–ˆ' * filled + 'ā–‘' * (width - filled) + percent = progress * 100 + + # Clear the line and print progress + sys.stdout.write(f'\rConverting: [{bar}] {percent:.1f}% ({current_time:.1f}s/{total_duration:.1f}s)') + sys.stdout.flush() + +def convert_to_mp4(input_path, output_path=None, target_fps=15, max_frames=None, start_frame=0): + """Convert video to MP4 format matching the example videos' settings. + + Args: + input_path: Path to input video + output_path: Path to output MP4 (if None, creates temp file) + target_fps: Target frame rate (default: 15) + max_frames: Maximum number of frames to extract (if None, extract all) + start_frame: Starting frame number (default: 0) + """ + # Convert to absolute path to avoid path issues + input_path = os.path.abspath(input_path) + + if not os.path.exists(input_path): + raise RuntimeError(f"Input video file not found: {input_path}") + + if output_path is None: + # Create a temporary MP4 file + temp_file = tempfile.NamedTemporaryFile(suffix='.mp4', delete=False) + output_path = temp_file.name + temp_file.close() + else: + output_path = os.path.abspath(output_path) + + # Check if input is already MP4 with correct codec + if input_path.lower().endswith('.mp4'): + # Check if it's actually a valid MP4 that can be read + try: + # Quick probe to see if it's readable and has correct codec + cmd = ['ffprobe', '-v', 'error', '-select_streams', 'v:0', + '-show_entries', 'stream=codec_name', '-of', 'json', input_path] + result = subprocess.run(cmd, capture_output=True, text=True, check=True) + info = json.loads(result.stdout) + codec = info['streams'][0]['codec_name'] + # If it's already HEVC or H264, and readable, return original + if codec in ['hevc', 'h264']: + return input_path + except (subprocess.CalledProcessError, KeyError, json.JSONDecodeError): + # If not readable or wrong codec, proceed with conversion + pass + + # Determine if we're trimming the video + trimming = max_frames is not None or start_frame > 0 + + if trimming: + print(f"Trimming and converting video to MP4: {os.path.basename(input_path)}") + print(f" Extracting frames {start_frame} to {start_frame + (max_frames or 'end')}") + else: + print(f"Converting video to MP4 format: {os.path.basename(input_path)}") + + print(f" Input path: {input_path}") + print(f" Output path: {output_path}") + print(f" File exists: 
{os.path.exists(input_path)}") + print(f" File size: {os.path.getsize(input_path) / (1024*1024):.1f} MB" if os.path.exists(input_path) else "") + + # Get video info for progress tracking and trimming + video_info = get_video_info_ffmpeg(input_path) + original_fps = video_info.get('fps', 30) + duration = get_video_duration(input_path) + + # Calculate time ranges if trimming + if trimming: + start_time = start_frame / original_fps if start_frame > 0 else 0 + if max_frames: + # IMPORTANT: Use original_fps to calculate duration, not target_fps + # We want to extract max_frames from the original video + duration_time = max_frames / original_fps + # Adjust duration for progress bar + duration = min(duration_time, duration - start_time if duration else duration_time) + else: + duration_time = None + + def run_ffmpeg_with_progress(cmd, codec_name): + """Run ffmpeg command with progress tracking.""" + # Add progress output to the command + # Need to insert -progress and -stats before the input file (-i) + try: + i_index = cmd.index('-i') + cmd_with_progress = cmd[:i_index] + ['-progress', 'pipe:1', '-stats'] + cmd[i_index:] + except ValueError: + # If -i not found, add at position 2 (after ffmpeg) + cmd_with_progress = cmd[:1] + ['-progress', 'pipe:1', '-stats'] + cmd[1:] + + process = subprocess.Popen( + cmd_with_progress, + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + universal_newlines=True + ) + + # Pattern to match time from ffmpeg progress output + time_pattern = re.compile(r'out_time_ms=(\d+)') + stderr_lines = [] + + # Read stderr in background + import threading + def read_stderr(): + for line in process.stderr: + stderr_lines.append(line) + + stderr_thread = threading.Thread(target=read_stderr) + stderr_thread.daemon = True + stderr_thread.start() + + for line in process.stdout: + match = time_pattern.search(line) + if match: + current_time_ms = int(match.group(1)) + current_time = current_time_ms / 1_000_000 # Convert microseconds to seconds + show_progress(current_time, duration) + + # Wait for process to complete + process.wait() + stderr_thread.join(timeout=1) + + if process.returncode == 0: + print(f"\nāœ“ Video successfully converted to MP4 ({codec_name})") + return True + else: + stderr = ''.join(stderr_lines) + print(f"\nāœ— {codec_name} conversion failed") + # Print relevant error messages + if 'Unknown encoder' in stderr or 'not found' in stderr: + print(f" Error: {codec_name} encoder not available in ffmpeg") + elif 'Invalid' in stderr or 'Error' in stderr: + # Extract error lines + error_lines = [line.strip() for line in stderr_lines if 'Error' in line or 'Invalid' in line] + if error_lines: + print(f" Error details: {error_lines[0]}") + return False + + # Build base command + def build_command(codec, codec_lib, preset='medium', crf='23', use_target_fps=True): + cmd = ['ffmpeg'] + + # Add trimming options BEFORE input (for fast seek) + if trimming: + if start_frame > 0: + # Use format HH:MM:SS.mmm for better compatibility + hours = int(start_time // 3600) + minutes = int((start_time % 3600) // 60) + seconds = start_time % 60 + time_str = f"{hours:02d}:{minutes:02d}:{seconds:06.3f}" + cmd.extend(['-ss', time_str]) + if max_frames: + # Duration also in time format + hours = int(duration_time // 3600) + minutes = int((duration_time % 3600) // 60) + seconds = duration_time % 60 + duration_str = f"{hours:02d}:{minutes:02d}:{seconds:06.3f}" + cmd.extend(['-t', duration_str]) + + cmd.extend(['-i', input_path]) + + # Video encoding options + cmd.extend([ + '-c:v', codec_lib, + 
'-preset', preset, + '-crf', crf, + '-pix_fmt', 'yuv420p', + ]) + + # Only set output frame rate if requested and different from input + # This prevents frame duplication/interpolation + if use_target_fps and target_fps != -1: + cmd.extend(['-r', str(target_fps)]) + + if codec == 'hevc': + cmd.extend(['-tag:v', 'hev1']) + + # Add metadata for trimmed videos + if trimming: + cmd.extend([ + '-metadata', f'title=Trimmed from {os.path.basename(input_path)}', + '-metadata', f'comment=Frames {start_frame}-{start_frame + (max_frames or "end")} at {target_fps}fps', + ]) + + cmd.extend([ + '-an', # No audio + '-movflags', '+faststart', + '-y', + output_path + ]) + + return cmd + + # Build ffmpeg command for conversion matching example videos + # First try with HEVC (H.265) like the examples + # Don't change fps when trimming to preserve frame count + use_target_fps = not trimming or target_fps == -1 + cmd_hevc = build_command('hevc', 'libx265', use_target_fps=use_target_fps) + + # Try HEVC first + if run_ffmpeg_with_progress(cmd_hevc, 'HEVC'): + return output_path + + print("Falling back to H.264...") + + # Fallback to H.264 if HEVC fails (better compatibility) + cmd_h264 = build_command('h264', 'libx264', use_target_fps=use_target_fps) + + if run_ffmpeg_with_progress(cmd_h264, 'H.264'): + return output_path + + # If both conversions failed, try a more basic conversion + print("\nTrying basic MP4 conversion with default settings...") + + cmd_basic = build_command('h264', 'libx264', preset='fast', crf='28', use_target_fps=use_target_fps) + + if run_ffmpeg_with_progress(cmd_basic, 'H.264 (basic)'): + return output_path + + # Last resort: try with minimal options + print("\nTrying minimal conversion...") + cmd_minimal = ['ffmpeg'] + if trimming: + if start_frame > 0: + # Use time format for compatibility + hours = int(start_time // 3600) + minutes = int((start_time % 3600) // 60) + seconds = start_time % 60 + time_str = f"{hours:02d}:{minutes:02d}:{seconds:06.3f}" + cmd_minimal.extend(['-ss', time_str]) + if max_frames: + # Duration in time format + hours = int(duration_time // 3600) + minutes = int((duration_time % 3600) // 60) + seconds = duration_time % 60 + duration_str = f"{hours:02d}:{minutes:02d}:{seconds:06.3f}" + cmd_minimal.extend(['-t', duration_str]) + cmd_minimal.extend([ + '-i', input_path, + '-c:v', 'libx264', + '-an', + '-y', + output_path + ]) + + process = subprocess.run(cmd_minimal, capture_output=True, text=True) + if process.returncode == 0: + print("āœ“ Video converted with minimal settings") + return output_path + else: + print(f"āœ— Minimal conversion also failed") + print(f"Error: {process.stderr[:500]}...") + raise RuntimeError(f"Failed to convert video to MP4. Please check if ffmpeg is properly installed and the input video is valid.") + +def read_video_frames(video_path, process_length, target_fps, max_res, dataset="open", skip_conversion=False, start_frame=0): + """Read video frames with MP4 conversion and fallback to ffmpeg if decord fails. 
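+    Non-MP4 or trimmed inputs are first converted with ffmpeg (unless skip_conversion is True); decoding then uses decord when available, with a raw-frame ffmpeg fallback.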
+ + Args: + video_path: Path to input video + process_length: Number of frames to process (-1 for all) + target_fps: Target frame rate (-1 to keep original) + max_res: Maximum resolution + dataset: Dataset type for resolution presets + skip_conversion: Skip MP4 conversion if True + start_frame: Starting frame for trimming (default: 0) + """ + + # Convert to absolute path + video_path = os.path.abspath(video_path) + + if not os.path.exists(video_path): + raise RuntimeError(f"Video file not found: {video_path}") + + # Convert to MP4 first if needed + converted_path = None + original_path = video_path + + # Convert non-MP4 files or problematic files to MP4 (unless skip_conversion is True) + # Use 15 fps by default (matching example videos) unless specified + default_fps = 15 if target_fps == -1 else target_fps + + # Determine if we need to trim or convert + needs_conversion = not video_path.lower().endswith('.mp4') + needs_trimming = process_length > 0 and process_length != -1 + + if not skip_conversion and (needs_conversion or needs_trimming): + try: + # If trimming is needed, always convert (even MP4s) to create a trimmed version + if needs_trimming: + print(f"\nCreating trimmed video: {process_length} frames starting from frame {start_frame}") + # For trimming, use -1 for target_fps to keep original fps + # This prevents frame count changes + converted_path = convert_to_mp4( + video_path, + target_fps=-1, # Keep original fps to maintain frame count + max_frames=process_length, + start_frame=start_frame + ) + video_path = converted_path + # After trimming, we don't need to limit frames again + process_length = -1 + elif needs_conversion: + converted_path = convert_to_mp4(video_path, target_fps=default_fps) + video_path = converted_path + except RuntimeError as e: + print(f"\nWarning: MP4 conversion/trimming failed: {e}") + print("Attempting to process the original video directly...") + video_path = original_path + + # Try using decord first if available + if DECORD_AVAILABLE: + try: + if dataset == "open": + print("==> processing video: ", video_path) + vid = VideoReader(video_path, ctx=cpu(0)) + print("==> original video shape: ", (len(vid), *vid.get_batch([0]).shape[1:])) + original_height, original_width = vid.get_batch([0]).shape[1:3] + height = round(original_height / 64) * 64 + width = round(original_width / 64) * 64 + if max(height, width) > max_res: + scale = max_res / max(original_height, original_width) + height = round(original_height * scale / 64) * 64 + width = round(original_width * scale / 64) * 64 + else: + height = dataset_res_dict[dataset][0] + width = dataset_res_dict[dataset][1] + + vid = VideoReader(video_path, ctx=cpu(0), width=width, height=height) + fps = vid.get_avg_fps() if target_fps == -1 else target_fps + stride = round(vid.get_avg_fps() / fps) + stride = max(stride, 1) + frames_idx = list(range(0, len(vid), stride)) + print( + f"==> downsampled shape: {len(frames_idx), *vid.get_batch([0]).shape[1:]}, with stride: {stride}" + ) + if process_length != -1 and process_length < len(frames_idx): + frames_idx = frames_idx[:process_length] + print( + f"==> final processing shape: {len(frames_idx), *vid.get_batch([0]).shape[1:]}" + ) + frames = vid.get_batch(frames_idx).asnumpy().astype("float32") / 255.0 + + # Clean up temporary file if created + if converted_path and converted_path != original_path: + try: + os.remove(converted_path) + except: + pass + + return frames, fps + + except Exception as e: + print(f"Decord failed to read video: {e}") + # If decord fails on 
MP4, try converting again with different settings + if not skip_conversion and video_path == original_path: # Only convert if we haven't already + print("Attempting to convert video to MP4...") + default_fps = 15 if target_fps == -1 else target_fps + try: + # Try conversion with trimming if needed + if process_length > 0 and process_length != -1: + converted_path = convert_to_mp4( + video_path, + target_fps=default_fps, + max_frames=process_length, + start_frame=start_frame + ) + process_length = -1 # Reset since video is now trimmed + else: + converted_path = convert_to_mp4(video_path, target_fps=default_fps) + video_path = converted_path + except RuntimeError: + print("Conversion failed, using original video") + video_path = original_path + print("Falling back to ffmpeg direct processing...") + + # Fallback to ffmpeg + try: + frames, fps = read_video_frames_ffmpeg(video_path, process_length, target_fps, max_res, dataset) + finally: + # Clean up temporary file if created + if converted_path and converted_path != original_path: + try: + os.remove(converted_path) + except: + pass + return frames, fps diff --git a/depthcrafter_ui b/depthcrafter_ui new file mode 100755 index 0000000..78e2900 --- /dev/null +++ b/depthcrafter_ui @@ -0,0 +1,21 @@ +#!/usr/bin/env python3 +""" +DepthCrafter UI Launcher +Quick launcher for the interactive CLI +""" + +import sys +import os + +# Add current directory to path +sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) + +# Import and run the interactive CLI +from interactive_cli import main + +if __name__ == "__main__": + try: + main() + except KeyboardInterrupt: + print("\n\nExiting...") + sys.exit(0) \ No newline at end of file diff --git a/interactive_cli.py b/interactive_cli.py new file mode 100755 index 0000000..ec4a782 --- /dev/null +++ b/interactive_cli.py @@ -0,0 +1,469 @@ +#!/usr/bin/env python3 +""" +Interactive CLI for DepthCrafter +A user-friendly command-line interface for video depth estimation +""" + +import os +import sys +import glob +import json +import subprocess +from pathlib import Path +from typing import Optional, Dict, Any +import shutil + +# Color codes for terminal output +class Colors: + HEADER = '\033[95m' + BLUE = '\033[94m' + CYAN = '\033[96m' + GREEN = '\033[92m' + YELLOW = '\033[93m' + RED = '\033[91m' + ENDC = '\033[0m' + BOLD = '\033[1m' + UNDERLINE = '\033[4m' + +def clear_screen(): + """Clear the terminal screen""" + os.system('cls' if os.name == 'nt' else 'clear') + +def print_header(): + """Print the application header""" + clear_screen() + print(f"{Colors.CYAN}{Colors.BOLD}") + print("═" * 60) + print(" DepthCrafter Interactive CLI ") + print(" Generate Depth Maps from Videos ") + print("═" * 60) + print(f"{Colors.ENDC}") + +def print_section(title: str): + """Print a section header""" + print(f"\n{Colors.YELLOW}ā–¶ {title}{Colors.ENDC}") + print("-" * 40) + +def get_video_info(video_path: str) -> Optional[Dict[str, Any]]: + """Get video information using ffprobe""" + try: + cmd = [ + 'ffprobe', '-v', 'error', + '-select_streams', 'v:0', + '-count_frames', + '-show_entries', 'stream=width,height,r_frame_rate,nb_frames,codec_name', + '-show_entries', 'format=duration,size', + '-of', 'json', + video_path + ] + result = subprocess.run(cmd, capture_output=True, text=True) + if result.returncode == 0: + info = json.loads(result.stdout) + stream = info['streams'][0] + format_info = info['format'] + + # Parse frame rate + fps_str = stream.get('r_frame_rate', '30/1') + if '/' in fps_str: + num, den = map(float, 
fps_str.split('/')) + fps = num / den + else: + fps = float(fps_str) + + return { + 'width': int(stream.get('width', 0)), + 'height': int(stream.get('height', 0)), + 'fps': fps, + 'frames': int(stream.get('nb_frames', 0)), + 'duration': float(format_info.get('duration', 0)), + 'size': int(format_info.get('size', 0)), + 'codec': stream.get('codec_name', 'unknown') + } + except: + return None + +def format_size(size_bytes: int) -> str: + """Format file size in human-readable format""" + for unit in ['B', 'KB', 'MB', 'GB']: + if size_bytes < 1024.0: + return f"{size_bytes:.1f} {unit}" + size_bytes /= 1024.0 + return f"{size_bytes:.1f} TB" + +def format_duration(seconds: float) -> str: + """Format duration in human-readable format""" + hours = int(seconds // 3600) + minutes = int((seconds % 3600) // 60) + secs = int(seconds % 60) + + if hours > 0: + return f"{hours}h {minutes}m {secs}s" + elif minutes > 0: + return f"{minutes}m {secs}s" + else: + return f"{secs}s" + +def select_video() -> Optional[str]: + """Let user select a video file""" + print_section("Select Video File") + + # Option 1: Recent files + recent_videos = [] + for pattern in ['*.mp4', '*.webm', '*.avi', '*.mov', '*.mkv']: + recent_videos.extend(glob.glob(pattern)) + recent_videos.extend(glob.glob(f"examples/{pattern}")) + + if recent_videos: + print(f"{Colors.GREEN}Found videos:{Colors.ENDC}") + for i, video in enumerate(recent_videos[:10], 1): + print(f" {i}. {video}") + print(f" {Colors.CYAN}0. Enter custom path{Colors.ENDC}") + + choice = input(f"\n{Colors.BLUE}Select video (1-{len(recent_videos[:10])} or 0): {Colors.ENDC}") + + if choice.isdigit(): + idx = int(choice) + if 1 <= idx <= len(recent_videos[:10]): + return recent_videos[idx - 1] + + # Option 2: Custom path + video_path = input(f"{Colors.BLUE}Enter video path: {Colors.ENDC}").strip() + + if video_path.startswith('"') and video_path.endswith('"'): + video_path = video_path[1:-1] + + if os.path.exists(video_path): + return video_path + else: + print(f"{Colors.RED}Error: File not found!{Colors.ENDC}") + return None + +def display_video_info(video_path: str, info: Dict[str, Any]): + """Display video information""" + print_section("Video Information") + print(f" šŸ“¹ File: {Colors.CYAN}{os.path.basename(video_path)}{Colors.ENDC}") + print(f" šŸ“ Resolution: {info['width']}x{info['height']}") + print(f" šŸŽ¬ Codec: {info['codec']}") + print(f" ā±ļø Duration: {format_duration(info['duration'])}") + print(f" šŸŽžļø Frames: {info['frames']} @ {info['fps']:.1f} fps") + print(f" šŸ’¾ Size: {format_size(info['size'])}") + +def get_processing_options() -> Dict[str, Any]: + """Get processing options from user""" + print_section("Processing Options") + + # Presets + print(f"\n{Colors.GREEN}Quality Presets:{Colors.ENDC}") + print(" 1. šŸš€ Fast (512px, 5 steps) - ~2 min for 150 frames") + print(" 2. āš–ļø Balanced (768px, 5 steps) - ~4 min for 150 frames") + print(" 3. šŸŽÆ High Quality (1024px, 10 steps) - ~8 min for 150 frames") + print(" 4. 
šŸŽØ Custom settings") + + preset = input(f"\n{Colors.BLUE}Select preset (1-4): {Colors.ENDC}").strip() + + options = {} + + if preset == '1': + options['max_res'] = 512 + options['num_inference_steps'] = 5 + options['guidance_scale'] = 1.0 + elif preset == '2': + options['max_res'] = 768 + options['num_inference_steps'] = 5 + options['guidance_scale'] = 1.0 + elif preset == '3': + options['max_res'] = 1024 + options['num_inference_steps'] = 10 + options['guidance_scale'] = 1.2 + else: + # Custom settings + print(f"\n{Colors.YELLOW}Custom Settings:{Colors.ENDC}") + + max_res = input(f" Max resolution ({Colors.CYAN}512/768/1024{Colors.ENDC}) [512]: ").strip() + options['max_res'] = int(max_res) if max_res else 512 + + steps = input(f" Inference steps ({Colors.CYAN}1-25{Colors.ENDC}) [5]: ").strip() + options['num_inference_steps'] = int(steps) if steps else 5 + + guidance = input(f" Guidance scale ({Colors.CYAN}0.5-2.0{Colors.ENDC}) [1.0]: ").strip() + options['guidance_scale'] = float(guidance) if guidance else 1.0 + + return options + +def get_frame_range(video_info: Dict[str, Any]) -> Dict[str, Any]: + """Get frame range options from user""" + print_section("Frame Range") + + total_frames = video_info['frames'] + fps = video_info['fps'] + + print(f"Total frames: {total_frames} ({format_duration(total_frames/fps)})") + print(f"\n{Colors.GREEN}Options:{Colors.ENDC}") + print(" 1. šŸŽ¬ Process entire video") + print(" 2. šŸŽžļø First N frames") + print(" 3. āœ‚ļø Custom range") + print(" 4. ā±ļø Time-based selection") + + choice = input(f"\n{Colors.BLUE}Select option (1-4): {Colors.ENDC}").strip() + + if choice == '2': + n = input(f" Number of frames to process: ").strip() + return {'max_frames': int(n), 'start_frame': 0} + elif choice == '3': + start = input(f" Start frame (0-{total_frames}): ").strip() + end = input(f" End frame (0-{total_frames}): ").strip() + start_frame = int(start) if start else 0 + end_frame = int(end) if end else total_frames + return {'max_frames': end_frame - start_frame, 'start_frame': start_frame} + elif choice == '4': + start_time = input(f" Start time (seconds): ").strip() + duration = input(f" Duration (seconds): ").strip() + start_frame = int(float(start_time) * fps) if start_time else 0 + max_frames = int(float(duration) * fps) if duration else -1 + return {'max_frames': max_frames, 'start_frame': start_frame} + else: + return {'max_frames': -1, 'start_frame': 0} + +def get_output_options() -> Dict[str, Any]: + """Get output options from user""" + print_section("Output Options") + + options = {} + + # Output folder + default_folder = "./demo_output" + folder = input(f"Output folder [{Colors.CYAN}{default_folder}{Colors.ENDC}]: ").strip() + options['save_folder'] = folder if folder else default_folder + + # Output formats + print(f"\n{Colors.GREEN}Additional outputs:{Colors.ENDC}") + save_npz = input(f" Save NPZ depth data (y/n) [n]: ").strip().lower() + options['save_npz'] = save_npz == 'y' + + save_exr = input(f" Save EXR depth files (y/n) [n]: ").strip().lower() + options['save_exr'] = save_exr == 'y' + + # FPS + target_fps = input(f"\nTarget FPS ({Colors.CYAN}-1 for auto{Colors.ENDC}) [15]: ").strip() + options['target_fps'] = int(target_fps) if target_fps else 15 + + return options + +def build_command(video_path: str, options: Dict[str, Any]) -> str: + """Build the command to run""" + cmd = ["python", "run.py", "--video-path", video_path] + + # Add all options + for key, value in options.items(): + if isinstance(value, bool): + if value: + 
cmd.append(f"--{key.replace('_', '-')}") + else: + cmd.append(f"--{key.replace('_', '-')}") + cmd.append(str(value)) + + return " ".join(cmd) + +def run_processing(command: str) -> bool: + """Run the processing command""" + print_section("Processing") + print(f"{Colors.CYAN}Command:{Colors.ENDC}") + print(f" {command}") + print() + + confirm = input(f"{Colors.YELLOW}Start processing? (y/n): {Colors.ENDC}").strip().lower() + + if confirm != 'y': + print(f"{Colors.RED}Cancelled.{Colors.ENDC}") + return False + + print(f"\n{Colors.GREEN}Processing started...{Colors.ENDC}") + print("=" * 60) + + try: + # Run the command + process = subprocess.Popen( + command, + shell=True, + stdout=subprocess.PIPE, + stderr=subprocess.STDOUT, + universal_newlines=True, + bufsize=1 + ) + + # Stream output + for line in process.stdout: + print(line, end='') + + process.wait() + + if process.returncode == 0: + print("=" * 60) + print(f"{Colors.GREEN}āœ“ Processing completed successfully!{Colors.ENDC}") + return True + else: + print(f"{Colors.RED}āœ— Processing failed with error code {process.returncode}{Colors.ENDC}") + return False + + except KeyboardInterrupt: + print(f"\n{Colors.YELLOW}Processing interrupted by user.{Colors.ENDC}") + return False + except Exception as e: + print(f"{Colors.RED}Error: {e}{Colors.ENDC}") + return False + +def save_preset(options: Dict[str, Any]): + """Save current settings as a preset""" + print_section("Save Preset") + + name = input(f"Preset name: ").strip() + if not name: + return + + preset_file = f".depthcrafter_preset_{name}.json" + + with open(preset_file, 'w') as f: + json.dump(options, f, indent=2) + + print(f"{Colors.GREEN}āœ“ Preset saved to {preset_file}{Colors.ENDC}") + +def load_preset() -> Optional[Dict[str, Any]]: + """Load a saved preset""" + presets = glob.glob(".depthcrafter_preset_*.json") + + if not presets: + print(f"{Colors.YELLOW}No saved presets found.{Colors.ENDC}") + return None + + print_section("Load Preset") + for i, preset in enumerate(presets, 1): + name = preset.replace(".depthcrafter_preset_", "").replace(".json", "") + print(f" {i}. {name}") + + choice = input(f"\n{Colors.BLUE}Select preset (1-{len(presets)}): {Colors.ENDC}").strip() + + if choice.isdigit(): + idx = int(choice) - 1 + if 0 <= idx < len(presets): + with open(presets[idx], 'r') as f: + return json.load(f) + + return None + +def main(): + """Main interactive CLI loop""" + print_header() + print(f"{Colors.GREEN}Welcome to DepthCrafter Interactive CLI!{Colors.ENDC}") + print("This tool will help you create depth maps from your videos.\n") + + while True: + print(f"\n{Colors.CYAN}Main Menu:{Colors.ENDC}") + print(" 1. šŸŽ„ Process a video") + print(" 2. šŸ“ Load preset") + print(" 3. šŸ“š View examples") + print(" 4. ā“ Help") + print(" 5. 🚪 Exit") + + choice = input(f"\n{Colors.BLUE}Select option (1-5): {Colors.ENDC}").strip() + + if choice == '1': + # Process video workflow + video_path = select_video() + if not video_path: + continue + + # Get video info + info = get_video_info(video_path) + if info: + display_video_info(video_path, info) + else: + print(f"{Colors.YELLOW}Warning: Could not read video information{Colors.ENDC}") + info = {'frames': -1, 'fps': 30} + + # Get options + options = {} + + # Check if user wants to load preset + if input(f"\n{Colors.BLUE}Load preset? 
(y/n): {Colors.ENDC}").strip().lower() == 'y': + preset = load_preset() + if preset: + options.update(preset) + + if not options: + options.update(get_processing_options()) + options.update(get_frame_range(info)) + options.update(get_output_options()) + + # Offer to save as preset + if input(f"\n{Colors.BLUE}Save these settings as preset? (y/n): {Colors.ENDC}").strip().lower() == 'y': + save_preset(options) + + # Build and run command + command = build_command(video_path, options) + success = run_processing(command) + + if success: + output_folder = options.get('save_folder', './demo_output') + print(f"\n{Colors.GREEN}Output files saved to: {output_folder}{Colors.ENDC}") + + input(f"\n{Colors.CYAN}Press Enter to continue...{Colors.ENDC}") + print_header() + + elif choice == '2': + # Load preset + preset = load_preset() + if preset: + print(f"{Colors.GREEN}Preset loaded successfully!{Colors.ENDC}") + print(json.dumps(preset, indent=2)) + input(f"\n{Colors.CYAN}Press Enter to continue...{Colors.ENDC}") + print_header() + + elif choice == '3': + # View examples + print_section("Example Commands") + print(f"{Colors.GREEN}Quick test (50 frames):{Colors.ENDC}") + print(" python run.py --video-path video.mp4 --max-frames 50 --max-res 512") + print(f"\n{Colors.GREEN}High quality:{Colors.ENDC}") + print(" python run.py --video-path video.mp4 --max-res 1024 --num-inference-steps 10") + print(f"\n{Colors.GREEN}Custom range:{Colors.ENDC}") + print(" python run.py --video-path video.mp4 --start-frame 100 --max-frames 200") + input(f"\n{Colors.CYAN}Press Enter to continue...{Colors.ENDC}") + print_header() + + elif choice == '4': + # Help + print_section("Help") + print("DepthCrafter generates depth maps from videos using AI.") + print("\nšŸ“‹ Requirements:") + print(" • Python 3.8+") + print(" • PyTorch 2.0+") + print(" • FFmpeg") + print(" • ~8GB GPU memory (512px) or ~26GB (1024px)") + print("\nšŸŽÆ Tips:") + print(" • Start with low resolution (512px) for testing") + print(" • Use frame limits to test on short segments") + print(" • Save presets for frequently used settings") + print(" • Check output folder for _vis.mp4 (visualization)") + input(f"\n{Colors.CYAN}Press Enter to continue...{Colors.ENDC}") + print_header() + + elif choice == '5': + print(f"\n{Colors.GREEN}Thank you for using DepthCrafter!{Colors.ENDC}") + break + else: + print(f"{Colors.RED}Invalid option. 
Please try again.{Colors.ENDC}") + +if __name__ == "__main__": + try: + # Check if ffmpeg is available + if shutil.which('ffmpeg') is None: + print(f"{Colors.RED}Error: FFmpeg is not installed or not in PATH{Colors.ENDC}") + print("Please install FFmpeg first: https://ffmpeg.org/download.html") + sys.exit(1) + + main() + except KeyboardInterrupt: + print(f"\n\n{Colors.YELLOW}Exiting...{Colors.ENDC}") + except Exception as e: + print(f"\n{Colors.RED}Fatal error: {e}{Colors.ENDC}") + sys.exit(1) \ No newline at end of file diff --git a/requirements.txt b/requirements.txt index 9c696da..d388f21 100644 --- a/requirements.txt +++ b/requirements.txt @@ -7,5 +7,5 @@ accelerate==0.30.1 xformers==0.0.20 mediapy==1.2.0 fire==0.6.0 -decord==0.6.0 +#decord==0.6.0 OpenEXR==3.2.4 diff --git a/run.py b/run.py index 7279cf1..11bb417 100644 --- a/run.py +++ b/run.py @@ -21,18 +21,26 @@ def __init__( unet = DiffusersUNetSpatioTemporalConditionModelDepthCrafter.from_pretrained( unet_path, low_cpu_mem_usage=True, - torch_dtype=torch.float16, + torch_dtype=torch.float32, ) # load weights of other components from the provided checkpoint self.pipe = DepthCrafterPipeline.from_pretrained( pre_train_path, unet=unet, - torch_dtype=torch.float16, - variant="fp16", + torch_dtype=torch.float32, ) - # for saving memory, we can offload the model to CPU, or even run the model sequentially to save more memory - if cpu_offload is not None: + # Determine the target device + # Note: MPS doesn't support Conv3D operations required by this model + # So we use CPU for Apple Silicon devices + if torch.cuda.is_available(): + device = "cuda" + else: + device = "cpu" + + # Handle model placement and offloading + if cpu_offload is not None and torch.cuda.is_available(): + # CPU offloading only works with CUDA if cpu_offload == "sequential": # This will slow, but save more memory self.pipe.enable_sequential_cpu_offload() @@ -41,13 +49,14 @@ def __init__( else: raise ValueError(f"Unknown cpu offload option: {cpu_offload}") else: - self.pipe.to("cuda") - # enable attention slicing and xformers memory efficient attention - try: - self.pipe.enable_xformers_memory_efficient_attention() - except Exception as e: - print(e) - print("Xformers is not enabled") + # For MPS or CPU, just move the entire model to the device + self.pipe.to(device) + # enable attention slicing and xformers memory efficient attention (only for CUDA) + if torch.cuda.is_available(): + try: + self.pipe.enable_xformers_memory_efficient_attention() + except Exception as e: + print(f"Xformers not enabled: {e}") self.pipe.enable_attention_slicing() def infer( @@ -66,15 +75,21 @@ def infer( track_time: bool = True, save_npz: bool = False, save_exr: bool = False, + max_frames: int = -1, + start_frame: int = 0, ): set_seed(seed) + # Use max_frames if specified, otherwise use process_length + frame_limit = max_frames if max_frames > 0 else process_length + frames, target_fps = read_video_frames( video, - process_length, + frame_limit, target_fps, max_res, dataset, + start_frame=start_frame, ) # inference the depth map using the DepthCrafter pipeline with torch.inference_mode(): @@ -149,7 +164,8 @@ def run( ) # clear the cache for the next video gc.collect() - torch.cuda.empty_cache() + if torch.cuda.is_available(): + torch.cuda.empty_cache() return res_path[:2] @@ -159,7 +175,7 @@ def main( unet_path: str = "tencent/DepthCrafter", pre_train_path: str = "stabilityai/stable-video-diffusion-img2vid-xt", process_length: int = -1, - cpu_offload: str = "model", + cpu_offload: str = 
None, target_fps: int = -1, seed: int = 42, num_inference_steps: int = 5, @@ -171,6 +187,8 @@ def main( save_npz: bool = False, save_exr: bool = False, track_time: bool = False, + max_frames: int = -1, + start_frame: int = 0, ): depthcrafter_demo = DepthCrafterDemo( unet_path=unet_path, @@ -195,10 +213,13 @@ def main( track_time=track_time, save_npz=save_npz, save_exr=save_exr, + max_frames=max_frames, + start_frame=start_frame, ) # clear the cache for the next video gc.collect() - torch.cuda.empty_cache() + if torch.cuda.is_available(): + torch.cuda.empty_cache() if __name__ == "__main__": @@ -206,4 +227,6 @@ def main( # the most important arguments for memory saving are `cpu_offload`, `enable_xformers`, `max_res`, and `window_size` # the most important arguments for trade-off between quality and speed are # `num_inference_steps`, `guidance_scale`, and `max_res` + # Use `max_frames` to limit the number of frames to process (e.g., --max-frames 50) + # Use `start_frame` to start from a specific frame (e.g., --start-frame 100) Fire(main)