diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..2abaf4d --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,94 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Project Overview + +DepthCrafter is a deep learning project for generating temporally consistent long depth sequences from open-world videos. It uses a diffusion-based model built on Stable Video Diffusion to estimate depth maps without requiring camera poses or optical flow. + +## Architecture + +### Core Components + +1. **Main Pipeline (`depthcrafter/depth_crafter_ppl.py`)**: Implements the DepthCrafterPipeline extending diffusers for depth estimation +2. **UNet Model (`depthcrafter/unet.py`)**: Custom spatio-temporal UNet for depth prediction +3. **Inference Scripts**: + - `run.py`: Main CLI for single video inference + - `app.py`: Gradio web interface + - `benchmark/infer/infer_batch.py`: Batch processing for benchmarks + +### Key Directories + +- `depthcrafter/`: Core model implementation +- `benchmark/`: Dataset evaluation scripts and CSV metadata +- `examples/`: Sample video files for testing +- `visualization/`: Point cloud visualization tools + +## Common Commands + +### Installation +```bash +pip install -r requirements.txt +``` + +### Single Video Inference + +High-resolution (requires ~26GB GPU memory): +```bash +python run.py --video-path examples/example_01.mp4 +``` + +Low-resolution (requires ~9GB GPU memory): +```bash +python run.py --video-path examples/example_01.mp4 --max-res 512 +``` + +### Gradio Demo +```bash +gradio app.py +``` + +### Benchmark Evaluation + +Run inference on all datasets: +```bash +bash benchmark/infer/infer.sh +``` + +Evaluate results: +```bash +bash benchmark/eval/eval.sh +``` + +### Key Parameters + +- `--process-length`: Number of frames to process (default: 195) +- `--window-size`: Sliding window size (default: 110) +- `--overlap`: Frame overlap between windows (default: 25) +- `--max-res`: Maximum resolution (default: 1024) +- `--num-denoising-steps`: Denoising steps (default: 5) +- `--guidance-scale`: Guidance scale for inference (default: 1.0) +- `--save-npz`: Save depth as NPZ file +- `--save-exr`: Save depth as EXR file + +## Model Loading + +The model uses two key components from Hugging Face: +1. DepthCrafter UNet: `tencent/DepthCrafter` +2. Base diffusion model: `stabilityai/stable-video-diffusion-img2vid-xt` + +## Dependencies + +Key dependencies: +- PyTorch 2.0.1 +- Diffusers 0.29.1 +- Transformers 4.41.2 +- XFormers 0.0.20 (for memory efficient attention) +- OpenEXR 3.2.4 (for EXR output) + +## Performance Notes + +- v1.0.1 improvements: ~4x faster inference (465ms/frame vs 1914ms/frame at 1024x576) +- Memory optimization options via `--cpu-offload` parameter: + - `"model"`: Standard CPU offloading + - `"sequential"`: Sequential offloading (slower but saves more memory) \ No newline at end of file diff --git a/README_macOS.md b/README_macOS.md new file mode 100644 index 0000000..83330c4 --- /dev/null +++ b/README_macOS.md @@ -0,0 +1,348 @@ +# DepthCrafter for macOS (Apple Silicon & Intel) + +This is a modified version of DepthCrafter optimized for macOS, with full support for Apple Silicon (M1/M2/M3) and Intel Macs. The modifications enable CPU-based processing since MPS (Metal Performance Shaders) doesn't support Conv3D operations required by the model. + +## šŸŽ Key Modifications for macOS + +### 1. 
**CPU-Only Processing** +- Removed CUDA dependencies +- Disabled MPS due to Conv3D limitations +- Uses CPU for all computations (slower but fully functional) + +### 2. **FP32 Precision** +- Changed from FP16 to FP32 for CPU compatibility +- Ensures numerical stability on CPU + +### 3. **Enhanced Video Processing** +- FFmpeg-based video handling with fallback support +- Automatic format conversion to MP4 (HEVC/H.264) +- Smart video trimming and frame extraction +- Progress indicators for all conversions + +### 4. **Interactive CLI Interface** +- User-friendly terminal UI +- Preset management system +- Visual progress tracking + +## šŸ“‹ Requirements + +### System Requirements +- **macOS**: 10.15 (Catalina) or later +- **Python**: 3.8 - 3.11 +- **RAM**: 16GB minimum, 32GB recommended +- **Storage**: 10GB free space for models and processing + +### Software Dependencies +```bash +# Install Homebrew if not already installed +/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" + +# Install FFmpeg (required) +brew install ffmpeg + +# Install Python via Homebrew (if needed) +brew install python@3.11 +``` + +## šŸš€ Installation + +### 1. Clone the Repository +```bash +git clone https://github.com/Tencent/DepthCrafter.git +cd DepthCrafter +``` + +### 2. Create Virtual Environment +```bash +# Create virtual environment +python3 -m venv venv + +# Activate virtual environment +source venv/bin/activate +``` + +### 3. Install Dependencies +```bash +# Upgrade pip +pip install --upgrade pip + +# Install PyTorch (CPU version for macOS) +pip install torch==2.0.1 torchvision==0.15.2 + +# Install other requirements +pip install diffusers==0.29.1 +pip install transformers==4.41.2 +pip install accelerate==0.30.1 +pip install numpy==1.26.4 +pip install matplotlib==3.8.4 +pip install mediapy==1.2.0 +pip install fire==0.6.0 +pip install opencv-python==4.9.0.80 +pip install gradio # For web UI (optional) + +# Optional: Install decord for better video processing +# Note: May require additional setup +# pip install decord +``` + +### 4. 
Download Model Weights +The models will be automatically downloaded from Hugging Face on first run: +- `tencent/DepthCrafter` - Main UNet model +- `stabilityai/stable-video-diffusion-img2vid-xt` - Base diffusion model + +## šŸ’» Usage + +### Option 1: Interactive CLI (Recommended) +```bash +# Launch the interactive interface +python interactive_cli.py + +# Or use the launcher +./depthcrafter_ui +``` + +The interactive CLI provides: +- Step-by-step guided workflow +- Video preview and information +- Quality presets (Fast/Balanced/High) +- Frame range selection +- Preset save/load functionality + +### Option 2: Command Line + +#### Basic Usage +```bash +# Process entire video at 512px resolution +python run.py --video-path input.mp4 --max-res 512 + +# Process with custom settings +python run.py --video-path input.mp4 \ + --max-res 768 \ + --num-inference-steps 10 \ + --guidance-scale 1.2 +``` + +#### Video Trimming +```bash +# Process first 50 frames only (faster for testing) +python run.py --video-path input.mp4 --max-frames 50 --max-res 512 + +# Process frames 100-200 +python run.py --video-path input.mp4 --start-frame 100 --max-frames 100 --max-res 512 + +# Process specific time range (e.g., seconds 10-20) +python run.py --video-path input.mp4 --start-frame 300 --max-frames 300 --max-res 512 +# (assuming 30fps: start at 10s = frame 300, 10s duration = 300 frames) +``` + +#### Output Options +```bash +# Save depth data in multiple formats +python run.py --video-path input.mp4 \ + --save-npz \ # Save as NPZ file + --save-exr \ # Save as EXR sequence + --save-folder output/ # Custom output directory +``` + +### Option 3: Web UI (Gradio) +```bash +python app.py +# Opens browser at http://localhost:7860 +``` + +## āš™ļø Parameters + +### Resolution Settings +- `--max-res`: Maximum resolution (512/768/1024) + - 512: ~9GB RAM, fastest + - 768: ~15GB RAM, balanced + - 1024: ~26GB RAM, highest quality + +### Quality Settings +- `--num-inference-steps`: Denoising steps (1-25, default: 5) +- `--guidance-scale`: Guidance strength (0.5-2.0, default: 1.0) + +### Video Settings +- `--max-frames`: Limit number of frames to process +- `--start-frame`: Starting frame index +- `--target-fps`: Output video FPS (default: 15) +- `--process-length`: Alternative to max-frames + +### Processing Settings +- `--window-size`: Sliding window size (default: 110) +- `--overlap`: Frame overlap between windows (default: 25) +- `--seed`: Random seed for reproducibility + +## šŸ“ Output Files + +The script generates the following outputs in the specified folder: + +``` +output_folder/ +ā”œā”€ā”€ videoname_input.mp4 # Preprocessed/trimmed input +ā”œā”€ā”€ videoname_vis.mp4 # Colored depth visualization +ā”œā”€ā”€ videoname_depth.mp4 # Raw depth video +ā”œā”€ā”€ videoname.npz # (Optional) Numpy depth data +└── frame_XXXX.exr # (Optional) EXR depth frames +``` + +## šŸŽÆ Performance Tips + +### Memory Management +1. **Start with low resolution** (512px) for testing +2. **Use frame limits** to process shorter segments +3. **Close other applications** to free up RAM +4. 
**Monitor Activity Monitor** for memory usage + +### Speed Optimization +```bash +# Fastest settings (lower quality) +python run.py --video-path input.mp4 \ + --max-res 512 \ + --num-inference-steps 3 \ + --max-frames 50 + +# Balanced settings +python run.py --video-path input.mp4 \ + --max-res 768 \ + --num-inference-steps 5 \ + --max-frames 100 + +# Best quality (slowest) +python run.py --video-path input.mp4 \ + --max-res 1024 \ + --num-inference-steps 10 +``` + +### Processing Times (Approximate) +On M1 MacBook Pro (16GB RAM) for 150 frames: +- 512px: ~30 minutes +- 768px: ~60 minutes +- 1024px: ~120 minutes + +*Note: Intel Macs will be slower. Apple Silicon (M1/M2/M3) provides better CPU performance.* + +## šŸŽ¬ Supported Video Formats + +### Input Formats +- MP4, MOV, AVI, MKV, WEBM, FLV, and most formats supported by FFmpeg +- Automatic conversion to MP4 for compatibility + +### Automatic Optimizations +- Converts to HEVC/H.264 codec +- Adjusts to 15 FPS for consistency +- Maintains aspect ratio +- Removes audio tracks + +## šŸ”§ Troubleshooting + +### Common Issues + +#### 1. "Torch not compiled with CUDA enabled" +This is expected on macOS. The code has been modified to use CPU instead. + +#### 2. "Conv3D is not supported on MPS" +This is why we use CPU processing. MPS doesn't support 3D convolutions yet. + +#### 3. Memory Errors +- Reduce `--max-res` to 512 +- Process fewer frames with `--max-frames` +- Close other applications +- Consider upgrading RAM + +#### 4. FFmpeg Errors +```bash +# Verify FFmpeg installation +ffmpeg -version + +# Reinstall if needed +brew reinstall ffmpeg +``` + +#### 5. Slow Processing +This is normal for CPU processing. Tips: +- Use lower resolution (512px) +- Process shorter segments +- Run overnight for long videos +- Consider cloud GPU services for faster processing + +### Check System Resources +```bash +# Monitor CPU and memory usage +top + +# Check available disk space +df -h + +# Check Python memory usage +python -c "import psutil; print(f'Available RAM: {psutil.virtual_memory().available / (1024**3):.1f} GB')" +``` + +## šŸ†• Features Added for macOS + +1. **Automatic Video Trimming** + - Extract specific frame ranges before processing + - Reduces memory usage and processing time + +2. **Smart Format Conversion** + - Automatic conversion to compatible MP4 + - Preserves quality while ensuring compatibility + +3. **Progress Indicators** + - Real-time conversion progress + - Processing status updates + +4. **Interactive CLI** + - User-friendly interface + - No need to remember commands + - Visual feedback and validation + +5. **Preset System** + - Save frequently used settings + - Share configurations with team + +## šŸ“Š Comparison with Original + +| Feature | Original | macOS Version | +|---------|----------|---------------| +| GPU Support | CUDA | CPU only | +| MPS Support | No | No (Conv3D limitation) | +| Precision | FP16 | FP32 | +| Video Handling | Decord | FFmpeg + fallbacks | +| Trimming | Manual | Automatic | +| Interface | CLI only | CLI + Interactive UI | +| Presets | No | Yes | + +## šŸ¤ Contributing + +Contributions to improve macOS compatibility are welcome! Areas of interest: +- MPS optimization when Conv3D support is added +- Memory usage optimization +- Processing speed improvements +- Additional video format support + +## šŸ“ License + +This macOS version maintains the same license as the original DepthCrafter project. 
+ +## šŸ™ Acknowledgments + +- Original DepthCrafter team at Tencent AI Lab +- PyTorch team for CPU optimizations +- FFmpeg for robust video processing + +## šŸ“® Support + +For macOS-specific issues: +1. Check this README first +2. Search existing issues +3. Create a new issue with: + - macOS version + - Hardware (Intel/M1/M2/M3) + - Python version + - Error messages + - Command used + +--- + +**Note:** This is a CPU-based implementation optimized for macOS. For faster processing, consider using the original version on a CUDA-capable GPU or cloud services. \ No newline at end of file diff --git a/app.py b/app.py index 26d9615..944843d 100644 --- a/app.py +++ b/app.py @@ -25,18 +25,20 @@ ] +# Detect device - use CPU since MPS doesn't support Conv3D +device = "cuda" if torch.cuda.is_available() else "cpu" + unet = DiffusersUNetSpatioTemporalConditionModelDepthCrafter.from_pretrained( "tencent/DepthCrafter", low_cpu_mem_usage=True, - torch_dtype=torch.float16, + torch_dtype=torch.float32, ) pipe = DepthCrafterPipeline.from_pretrained( "stabilityai/stable-video-diffusion-img2vid-xt", unet=unet, - torch_dtype=torch.float16, - variant="fp16", + torch_dtype=torch.float32, ) -pipe.to("cuda") +pipe.to(device) @spaces.GPU(duration=120) @@ -56,7 +58,12 @@ def infer_depth( save_npz: bool = False, ): set_seed(seed) - pipe.enable_xformers_memory_efficient_attention() + # Only enable xformers for CUDA devices + if torch.cuda.is_available(): + try: + pipe.enable_xformers_memory_efficient_attention() + except Exception as e: + print(f"Xformers not enabled: {e}") frames, target_fps = read_video_frames(video, process_length, target_fps, max_res) @@ -91,7 +98,8 @@ def infer_depth( # clear the cache for the next video gc.collect() - torch.cuda.empty_cache() + if torch.cuda.is_available(): + torch.cuda.empty_cache() return [ save_path + "_input.mp4", diff --git a/benchmark/demo.sh b/benchmark/demo.sh index 7cc9f1f..339a364 100644 --- a/benchmark/demo.sh +++ b/benchmark/demo.sh @@ -10,7 +10,7 @@ saved_dataset_folder=$5 overlap=$6 dataset=$7 -CUDA_VISIBLE_DEVICES=${gpu_id} PYTHONPATH=. python run.py \ +PYTHONPATH=. python run.py \ --video-path ${test_case} \ --save-folder ${saved_root}/${saved_dataset_folder} \ --process-length ${process_length} \ diff --git a/depthcrafter/depth_crafter_ppl.py b/depthcrafter/depth_crafter_ppl.py index b7d070d..f29f965 100644 --- a/depthcrafter/depth_crafter_ppl.py +++ b/depthcrafter/depth_crafter_ppl.py @@ -15,7 +15,46 @@ logger = logging.get_logger(__name__) # pylint: disable=invalid-name +def _resize_with_antialiasing_safe(input, size, interpolation="bicubic", align_corners=True): + """Wrapper for resize that uses the standard function.""" + # Since we're not using MPS anymore, we can use the original function + return _resize_with_antialiasing(input, size) + + class DepthCrafterPipeline(StableVideoDiffusionPipeline): + + @property + def _execution_device(self): + """ + Returns the device on which the pipeline should be executed. + Note: MPS is not used due to lack of Conv3D support. 
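+        Resolution order: the pipeline's device attribute, accelerate offload hooks on the UNet, the UNet's own parameters, then CUDA if available, otherwise CPU.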
+ """ + # If device attribute exists and is set + if hasattr(self, 'device') and self.device is not None: + if self.device != torch.device("meta"): + return self.device + + # Check if model has hooks (for CPU offloading) + if hasattr(self.unet, "_hf_hook"): + for module in self.unet.modules(): + if ( + hasattr(module, "_hf_hook") + and hasattr(module._hf_hook, "execution_device") + and module._hf_hook.execution_device is not None + ): + return torch.device(module._hf_hook.execution_device) + + # Try to get device from model parameters + try: + return next(self.unet.parameters()).device + except: + pass + + # Default fallback based on availability + # MPS doesn't support Conv3D, so we use CPU for Apple Silicon + if torch.cuda.is_available(): + return torch.device("cuda") + return torch.device("cpu") @torch.inference_mode() def encode_video( @@ -29,7 +68,7 @@ def encode_video( :return: image_embeddings in shape of [b, 1024] """ - video_224 = _resize_with_antialiasing(video.float(), (224, 224)) + video_224 = _resize_with_antialiasing_safe(video.float(), (224, 224)) video_224 = (video_224 + 1.0) / 2.0 # [-1, 1] -> [0, 1] embeddings = [] @@ -153,18 +192,18 @@ def __call__( video = video * 2.0 - 1.0 # [0,1] -> [-1,1], in [t, c, h, w] if track_time: - start_event = torch.cuda.Event(enable_timing=True) - encode_event = torch.cuda.Event(enable_timing=True) - denoise_event = torch.cuda.Event(enable_timing=True) - decode_event = torch.cuda.Event(enable_timing=True) - start_event.record() + import time + start_time = time.time() + encode_time = None + denoise_time = None video_embeddings = self.encode_video( video, chunk_size=decode_chunk_size ).unsqueeze( 0 ) # [1, t, 1024] - torch.cuda.empty_cache() + if torch.cuda.is_available(): + torch.cuda.empty_cache() # 4. Encode input image using VAE noise = randn_tensor( video.shape, generator=generator, device=device, dtype=video.dtype @@ -173,7 +212,7 @@ def __call__( # pdb.set_trace() needs_upcasting = ( - self.vae.dtype == torch.float16 and self.vae.config.force_upcast + self.vae.dtype == torch.float32 and self.vae.config.force_upcast ) if needs_upcasting: self.vae.to(dtype=torch.float32) @@ -186,16 +225,16 @@ def __call__( ) # [1, t, c, h, w] if track_time: - encode_event.record() - torch.cuda.synchronize() - elapsed_time_ms = start_event.elapsed_time(encode_event) - print(f"Elapsed time for encoding video: {elapsed_time_ms} ms") + encode_time = time.time() + elapsed_time_ms = (encode_time - start_time) * 1000 + print(f"Elapsed time for encoding video: {elapsed_time_ms:.2f} ms") - torch.cuda.empty_cache() + if torch.cuda.is_available(): + torch.cuda.empty_cache() - # cast back to fp16 if needed + # cast back to fp32 if needed if needs_upcasting: - self.vae.to(dtype=torch.float16) + self.vae.to(dtype=torch.float32) # 5. Get Added Time IDs added_time_ids = self._get_add_time_ids( @@ -238,7 +277,8 @@ def __call__( else: weights = None - torch.cuda.empty_cache() + if torch.cuda.is_available(): + torch.cuda.empty_cache() # inference strategy for long videos # two main strategies: 1. noise init from previous frame, 2. 
segments stitching @@ -335,22 +375,20 @@ def __call__( idx_start += stride if track_time: - denoise_event.record() - torch.cuda.synchronize() - elapsed_time_ms = encode_event.elapsed_time(denoise_event) - print(f"Elapsed time for denoising video: {elapsed_time_ms} ms") + denoise_time = time.time() + elapsed_time_ms = (denoise_time - encode_time) * 1000 + print(f"Elapsed time for denoising video: {elapsed_time_ms:.2f} ms") if not output_type == "latent": - # cast back to fp16 if needed + # cast back to fp32 if needed if needs_upcasting: - self.vae.to(dtype=torch.float16) + self.vae.to(dtype=torch.float32) frames = self.decode_latents(latents_all, num_frames, decode_chunk_size) if track_time: - decode_event.record() - torch.cuda.synchronize() - elapsed_time_ms = denoise_event.elapsed_time(decode_event) - print(f"Elapsed time for decoding video: {elapsed_time_ms} ms") + decode_time = time.time() + elapsed_time_ms = (decode_time - denoise_time) * 1000 + print(f"Elapsed time for decoding video: {elapsed_time_ms:.2f} ms") frames = self.video_processor.postprocess_video( video=frames, output_type=output_type diff --git a/depthcrafter/unet.py b/depthcrafter/unet.py index 0066a71..7472803 100644 --- a/depthcrafter/unet.py +++ b/depthcrafter/unet.py @@ -39,7 +39,7 @@ def forward( t_emb = self.time_proj(timesteps) # `Timesteps` does not contain any weights and will always return f32 tensors - # but time_embedding might actually be running in fp16. so we need to cast here. + # time_embedding should be running in fp32. Cast to ensure compatibility. # there might be better ways to encapsulate this. t_emb = t_emb.to(dtype=self.conv_in.weight.dtype) diff --git a/depthcrafter/utils.py b/depthcrafter/utils.py index 2ac50e8..cb113dd 100644 --- a/depthcrafter/utils.py +++ b/depthcrafter/utils.py @@ -5,7 +5,17 @@ import matplotlib.cm as cm import mediapy import torch -from decord import VideoReader, cpu +import subprocess +import json +import os +import sys +import re +import warnings +try: + from decord import VideoReader, cpu + DECORD_AVAILABLE = True +except ImportError: + DECORD_AVAILABLE = False dataset_res_dict = { "sintel": [448, 1024], @@ -16,12 +26,98 @@ } -def read_video_frames(video_path, process_length, target_fps, max_res, dataset="open"): +def get_video_info_ffmpeg(video_path): + """Get video metadata using ffprobe.""" + cmd = [ + 'ffprobe', + '-v', 'error', + '-select_streams', 'v:0', + '-count_frames', + '-show_entries', 'stream=width,height,r_frame_rate,nb_frames', + '-of', 'json', + video_path + ] + + try: + result = subprocess.run(cmd, capture_output=True, text=True, check=True) + info = json.loads(result.stdout) + stream = info['streams'][0] + + # Parse frame rate + fps_str = stream['r_frame_rate'] + if '/' in fps_str: + num, den = map(float, fps_str.split('/')) + fps = num / den + else: + fps = float(fps_str) + + return { + 'width': int(stream['width']), + 'height': int(stream['height']), + 'fps': fps, + 'nb_frames': int(stream.get('nb_frames', 0)) + } + except (subprocess.CalledProcessError, KeyError, ValueError) as e: + # Fallback: get basic info without frame count + cmd = [ + 'ffprobe', + '-v', 'error', + '-select_streams', 'v:0', + '-show_entries', 'stream=width,height,r_frame_rate', + '-of', 'json', + video_path + ] + result = subprocess.run(cmd, capture_output=True, text=True, check=True) + info = json.loads(result.stdout) + stream = info['streams'][0] + + fps_str = stream['r_frame_rate'] + if '/' in fps_str: + num, den = map(float, fps_str.split('/')) + fps = num / den + else: + 
fps = float(fps_str) + + # Estimate frame count + duration_cmd = [ + 'ffprobe', + '-v', 'error', + '-show_entries', 'format=duration', + '-of', 'json', + video_path + ] + duration_result = subprocess.run(duration_cmd, capture_output=True, text=True, check=True) + duration_info = json.loads(duration_result.stdout) + duration = float(duration_info['format']['duration']) + + return { + 'width': int(stream['width']), + 'height': int(stream['height']), + 'fps': fps, + 'nb_frames': int(duration * fps) + } + + +def read_video_frames_ffmpeg(video_path, process_length, target_fps, max_res, dataset="open"): + """Read video frames using ffmpeg.""" + # Convert to absolute path + video_path = os.path.abspath(video_path) + + if not os.path.exists(video_path): + raise RuntimeError(f"Video file not found: {video_path}") + + print("==> processing video directly with ffmpeg: ", video_path) + + # Get video info + video_info = get_video_info_ffmpeg(video_path) + original_width = video_info['width'] + original_height = video_info['height'] + original_fps = video_info['fps'] + total_frames = video_info['nb_frames'] + + print(f"==> original video shape: ({total_frames}, {original_height}, {original_width}, 3)") + if dataset == "open": - print("==> processing video: ", video_path) - vid = VideoReader(video_path, ctx=cpu(0)) - print("==> original video shape: ", (len(vid), *vid.get_batch([0]).shape[1:])) - original_height, original_width = vid.get_batch([0]).shape[1:3] height = round(original_height / 64) * 64 width = round(original_width / 64) * 64 if max(height, width) > max_res: @@ -31,23 +127,476 @@ def read_video_frames(video_path, process_length, target_fps, max_res, dataset=" else: height = dataset_res_dict[dataset][0] width = dataset_res_dict[dataset][1] - - vid = VideoReader(video_path, ctx=cpu(0), width=width, height=height) - - fps = vid.get_avg_fps() if target_fps == -1 else target_fps - stride = round(vid.get_avg_fps() / fps) + + fps = original_fps if target_fps == -1 else target_fps + stride = round(original_fps / fps) stride = max(stride, 1) - frames_idx = list(range(0, len(vid), stride)) - print( - f"==> downsampled shape: {len(frames_idx), *vid.get_batch([0]).shape[1:]}, with stride: {stride}" - ) + + # Calculate which frames to extract + frames_idx = list(range(0, total_frames, stride)) if process_length != -1 and process_length < len(frames_idx): frames_idx = frames_idx[:process_length] - print( - f"==> final processing shape: {len(frames_idx), *vid.get_batch([0]).shape[1:]}" - ) - frames = vid.get_batch(frames_idx).asnumpy().astype("float32") / 255.0 + + print(f"==> downsampled shape: ({len(frames_idx)}, {height}, {width}, 3), with stride: {stride}") + print(f"==> final processing shape: ({len(frames_idx)}, {height}, {width}, 3)") + + # Build ffmpeg command to extract frames + # Simplified approach: extract at target fps and scale + vf_filters = [] + + # Add fps filter to get the right frame rate + if stride > 1: + vf_filters.append(f"fps={fps}") + + # Add scaling + vf_filters.append(f"scale={width}:{height}:force_original_aspect_ratio=decrease") + vf_filters.append(f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2") + + # Limit frames if needed + if process_length != -1: + vf_filters.append(f"select='lt(n\,{process_length})'") + + vf_string = ','.join(vf_filters) + + cmd = [ + 'ffmpeg', + '-i', video_path, + '-vf', vf_string, + '-f', 'rawvideo', + '-pix_fmt', 'rgb24', + '-v', 'error', + '-' + ] + + # Run ffmpeg and capture output + process = subprocess.Popen(cmd, stdout=subprocess.PIPE, 
stderr=subprocess.PIPE) + stdout, stderr = process.communicate() + + if process.returncode != 0: + raise RuntimeError(f"ffmpeg failed: {stderr.decode()}") + + # Convert raw RGB data to numpy array + frames = np.frombuffer(stdout, dtype=np.uint8) + frames = frames.reshape((-1, height, width, 3)) + frames = frames.astype(np.float32) / 255.0 + + # Ensure we have the expected number of frames + if frames.shape[0] != len(frames_idx): + print(f"Warning: Expected {len(frames_idx)} frames, got {frames.shape[0]}") + + return frames, fps + + +def get_video_duration(input_path): + """Get video duration in seconds using ffprobe.""" + cmd = [ + 'ffprobe', + '-v', 'error', + '-show_entries', 'format=duration', + '-of', 'json', + input_path + ] + try: + result = subprocess.run(cmd, capture_output=True, text=True, check=True) + info = json.loads(result.stdout) + return float(info['format']['duration']) + except: + return None + +def show_progress(current_time, total_duration, width=50): + """Display a progress bar for video conversion.""" + if total_duration is None or total_duration == 0: + return + + progress = min(current_time / total_duration, 1.0) + filled = int(width * progress) + bar = 'ā–ˆ' * filled + 'ā–‘' * (width - filled) + percent = progress * 100 + + # Clear the line and print progress + sys.stdout.write(f'\rConverting: [{bar}] {percent:.1f}% ({current_time:.1f}s/{total_duration:.1f}s)') + sys.stdout.flush() + +def convert_to_mp4(input_path, output_path=None, target_fps=15, max_frames=None, start_frame=0): + """Convert video to MP4 format matching the example videos' settings. + + Args: + input_path: Path to input video + output_path: Path to output MP4 (if None, creates temp file) + target_fps: Target frame rate (default: 15) + max_frames: Maximum number of frames to extract (if None, extract all) + start_frame: Starting frame number (default: 0) + """ + # Convert to absolute path to avoid path issues + input_path = os.path.abspath(input_path) + + if not os.path.exists(input_path): + raise RuntimeError(f"Input video file not found: {input_path}") + + if output_path is None: + # Create a temporary MP4 file + temp_file = tempfile.NamedTemporaryFile(suffix='.mp4', delete=False) + output_path = temp_file.name + temp_file.close() + else: + output_path = os.path.abspath(output_path) + + # Check if input is already MP4 with correct codec + if input_path.lower().endswith('.mp4'): + # Check if it's actually a valid MP4 that can be read + try: + # Quick probe to see if it's readable and has correct codec + cmd = ['ffprobe', '-v', 'error', '-select_streams', 'v:0', + '-show_entries', 'stream=codec_name', '-of', 'json', input_path] + result = subprocess.run(cmd, capture_output=True, text=True, check=True) + info = json.loads(result.stdout) + codec = info['streams'][0]['codec_name'] + # If it's already HEVC or H264, and readable, return original + if codec in ['hevc', 'h264']: + return input_path + except (subprocess.CalledProcessError, KeyError, json.JSONDecodeError): + # If not readable or wrong codec, proceed with conversion + pass + + # Determine if we're trimming the video + trimming = max_frames is not None or start_frame > 0 + + if trimming: + print(f"Trimming and converting video to MP4: {os.path.basename(input_path)}") + print(f" Extracting frames {start_frame} to {start_frame + (max_frames or 'end')}") + else: + print(f"Converting video to MP4 format: {os.path.basename(input_path)}") + + print(f" Input path: {input_path}") + print(f" Output path: {output_path}") + print(f" File exists: 
{os.path.exists(input_path)}") + print(f" File size: {os.path.getsize(input_path) / (1024*1024):.1f} MB" if os.path.exists(input_path) else "") + + # Get video info for progress tracking and trimming + video_info = get_video_info_ffmpeg(input_path) + original_fps = video_info.get('fps', 30) + duration = get_video_duration(input_path) + + # Calculate time ranges if trimming + if trimming: + start_time = start_frame / original_fps if start_frame > 0 else 0 + if max_frames: + # IMPORTANT: Use original_fps to calculate duration, not target_fps + # We want to extract max_frames from the original video + duration_time = max_frames / original_fps + # Adjust duration for progress bar + duration = min(duration_time, duration - start_time if duration else duration_time) + else: + duration_time = None + + def run_ffmpeg_with_progress(cmd, codec_name): + """Run ffmpeg command with progress tracking.""" + # Add progress output to the command + # Need to insert -progress and -stats before the input file (-i) + try: + i_index = cmd.index('-i') + cmd_with_progress = cmd[:i_index] + ['-progress', 'pipe:1', '-stats'] + cmd[i_index:] + except ValueError: + # If -i not found, add at position 2 (after ffmpeg) + cmd_with_progress = cmd[:1] + ['-progress', 'pipe:1', '-stats'] + cmd[1:] + + process = subprocess.Popen( + cmd_with_progress, + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + universal_newlines=True + ) + + # Pattern to match time from ffmpeg progress output + time_pattern = re.compile(r'out_time_ms=(\d+)') + stderr_lines = [] + + # Read stderr in background + import threading + def read_stderr(): + for line in process.stderr: + stderr_lines.append(line) + + stderr_thread = threading.Thread(target=read_stderr) + stderr_thread.daemon = True + stderr_thread.start() + + for line in process.stdout: + match = time_pattern.search(line) + if match: + current_time_ms = int(match.group(1)) + current_time = current_time_ms / 1_000_000 # Convert microseconds to seconds + show_progress(current_time, duration) + + # Wait for process to complete + process.wait() + stderr_thread.join(timeout=1) + + if process.returncode == 0: + print(f"\nāœ“ Video successfully converted to MP4 ({codec_name})") + return True + else: + stderr = ''.join(stderr_lines) + print(f"\nāœ— {codec_name} conversion failed") + # Print relevant error messages + if 'Unknown encoder' in stderr or 'not found' in stderr: + print(f" Error: {codec_name} encoder not available in ffmpeg") + elif 'Invalid' in stderr or 'Error' in stderr: + # Extract error lines + error_lines = [line.strip() for line in stderr_lines if 'Error' in line or 'Invalid' in line] + if error_lines: + print(f" Error details: {error_lines[0]}") + return False + + # Build base command + def build_command(codec, codec_lib, preset='medium', crf='23', use_target_fps=True): + cmd = ['ffmpeg'] + + # Add trimming options BEFORE input (for fast seek) + if trimming: + if start_frame > 0: + # Use format HH:MM:SS.mmm for better compatibility + hours = int(start_time // 3600) + minutes = int((start_time % 3600) // 60) + seconds = start_time % 60 + time_str = f"{hours:02d}:{minutes:02d}:{seconds:06.3f}" + cmd.extend(['-ss', time_str]) + if max_frames: + # Duration also in time format + hours = int(duration_time // 3600) + minutes = int((duration_time % 3600) // 60) + seconds = duration_time % 60 + duration_str = f"{hours:02d}:{minutes:02d}:{seconds:06.3f}" + cmd.extend(['-t', duration_str]) + + cmd.extend(['-i', input_path]) + + # Video encoding options + cmd.extend([ + '-c:v', codec_lib, + 
'-preset', preset, + '-crf', crf, + '-pix_fmt', 'yuv420p', + ]) + + # Only set output frame rate if requested and different from input + # This prevents frame duplication/interpolation + if use_target_fps and target_fps != -1: + cmd.extend(['-r', str(target_fps)]) + + if codec == 'hevc': + cmd.extend(['-tag:v', 'hev1']) + + # Add metadata for trimmed videos + if trimming: + cmd.extend([ + '-metadata', f'title=Trimmed from {os.path.basename(input_path)}', + '-metadata', f'comment=Frames {start_frame}-{start_frame + (max_frames or "end")} at {target_fps}fps', + ]) + + cmd.extend([ + '-an', # No audio + '-movflags', '+faststart', + '-y', + output_path + ]) + + return cmd + + # Build ffmpeg command for conversion matching example videos + # First try with HEVC (H.265) like the examples + # Don't change fps when trimming to preserve frame count + use_target_fps = not trimming or target_fps == -1 + cmd_hevc = build_command('hevc', 'libx265', use_target_fps=use_target_fps) + + # Try HEVC first + if run_ffmpeg_with_progress(cmd_hevc, 'HEVC'): + return output_path + + print("Falling back to H.264...") + + # Fallback to H.264 if HEVC fails (better compatibility) + cmd_h264 = build_command('h264', 'libx264', use_target_fps=use_target_fps) + + if run_ffmpeg_with_progress(cmd_h264, 'H.264'): + return output_path + + # If both conversions failed, try a more basic conversion + print("\nTrying basic MP4 conversion with default settings...") + + cmd_basic = build_command('h264', 'libx264', preset='fast', crf='28', use_target_fps=use_target_fps) + + if run_ffmpeg_with_progress(cmd_basic, 'H.264 (basic)'): + return output_path + + # Last resort: try with minimal options + print("\nTrying minimal conversion...") + cmd_minimal = ['ffmpeg'] + if trimming: + if start_frame > 0: + # Use time format for compatibility + hours = int(start_time // 3600) + minutes = int((start_time % 3600) // 60) + seconds = start_time % 60 + time_str = f"{hours:02d}:{minutes:02d}:{seconds:06.3f}" + cmd_minimal.extend(['-ss', time_str]) + if max_frames: + # Duration in time format + hours = int(duration_time // 3600) + minutes = int((duration_time % 3600) // 60) + seconds = duration_time % 60 + duration_str = f"{hours:02d}:{minutes:02d}:{seconds:06.3f}" + cmd_minimal.extend(['-t', duration_str]) + cmd_minimal.extend([ + '-i', input_path, + '-c:v', 'libx264', + '-an', + '-y', + output_path + ]) + + process = subprocess.run(cmd_minimal, capture_output=True, text=True) + if process.returncode == 0: + print("āœ“ Video converted with minimal settings") + return output_path + else: + print(f"āœ— Minimal conversion also failed") + print(f"Error: {process.stderr[:500]}...") + raise RuntimeError(f"Failed to convert video to MP4. Please check if ffmpeg is properly installed and the input video is valid.") + +def read_video_frames(video_path, process_length, target_fps, max_res, dataset="open", skip_conversion=False, start_frame=0): + """Read video frames with MP4 conversion and fallback to ffmpeg if decord fails. 
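+    Non-MP4 or trimmed inputs are first converted with ffmpeg (unless skip_conversion is True); decoding then uses decord when available, with a raw-frame ffmpeg fallback.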
+ + Args: + video_path: Path to input video + process_length: Number of frames to process (-1 for all) + target_fps: Target frame rate (-1 to keep original) + max_res: Maximum resolution + dataset: Dataset type for resolution presets + skip_conversion: Skip MP4 conversion if True + start_frame: Starting frame for trimming (default: 0) + """ + + # Convert to absolute path + video_path = os.path.abspath(video_path) + + if not os.path.exists(video_path): + raise RuntimeError(f"Video file not found: {video_path}") + + # Convert to MP4 first if needed + converted_path = None + original_path = video_path + + # Convert non-MP4 files or problematic files to MP4 (unless skip_conversion is True) + # Use 15 fps by default (matching example videos) unless specified + default_fps = 15 if target_fps == -1 else target_fps + + # Determine if we need to trim or convert + needs_conversion = not video_path.lower().endswith('.mp4') + needs_trimming = process_length > 0 and process_length != -1 + + if not skip_conversion and (needs_conversion or needs_trimming): + try: + # If trimming is needed, always convert (even MP4s) to create a trimmed version + if needs_trimming: + print(f"\nCreating trimmed video: {process_length} frames starting from frame {start_frame}") + # For trimming, use -1 for target_fps to keep original fps + # This prevents frame count changes + converted_path = convert_to_mp4( + video_path, + target_fps=-1, # Keep original fps to maintain frame count + max_frames=process_length, + start_frame=start_frame + ) + video_path = converted_path + # After trimming, we don't need to limit frames again + process_length = -1 + elif needs_conversion: + converted_path = convert_to_mp4(video_path, target_fps=default_fps) + video_path = converted_path + except RuntimeError as e: + print(f"\nWarning: MP4 conversion/trimming failed: {e}") + print("Attempting to process the original video directly...") + video_path = original_path + + # Try using decord first if available + if DECORD_AVAILABLE: + try: + if dataset == "open": + print("==> processing video: ", video_path) + vid = VideoReader(video_path, ctx=cpu(0)) + print("==> original video shape: ", (len(vid), *vid.get_batch([0]).shape[1:])) + original_height, original_width = vid.get_batch([0]).shape[1:3] + height = round(original_height / 64) * 64 + width = round(original_width / 64) * 64 + if max(height, width) > max_res: + scale = max_res / max(original_height, original_width) + height = round(original_height * scale / 64) * 64 + width = round(original_width * scale / 64) * 64 + else: + height = dataset_res_dict[dataset][0] + width = dataset_res_dict[dataset][1] + + vid = VideoReader(video_path, ctx=cpu(0), width=width, height=height) + fps = vid.get_avg_fps() if target_fps == -1 else target_fps + stride = round(vid.get_avg_fps() / fps) + stride = max(stride, 1) + frames_idx = list(range(0, len(vid), stride)) + print( + f"==> downsampled shape: {len(frames_idx), *vid.get_batch([0]).shape[1:]}, with stride: {stride}" + ) + if process_length != -1 and process_length < len(frames_idx): + frames_idx = frames_idx[:process_length] + print( + f"==> final processing shape: {len(frames_idx), *vid.get_batch([0]).shape[1:]}" + ) + frames = vid.get_batch(frames_idx).asnumpy().astype("float32") / 255.0 + + # Clean up temporary file if created + if converted_path and converted_path != original_path: + try: + os.remove(converted_path) + except: + pass + + return frames, fps + + except Exception as e: + print(f"Decord failed to read video: {e}") + # If decord fails on 
MP4, try converting again with different settings + if not skip_conversion and video_path == original_path: # Only convert if we haven't already + print("Attempting to convert video to MP4...") + default_fps = 15 if target_fps == -1 else target_fps + try: + # Try conversion with trimming if needed + if process_length > 0 and process_length != -1: + converted_path = convert_to_mp4( + video_path, + target_fps=default_fps, + max_frames=process_length, + start_frame=start_frame + ) + process_length = -1 # Reset since video is now trimmed + else: + converted_path = convert_to_mp4(video_path, target_fps=default_fps) + video_path = converted_path + except RuntimeError: + print("Conversion failed, using original video") + video_path = original_path + print("Falling back to ffmpeg direct processing...") + + # Fallback to ffmpeg + try: + frames, fps = read_video_frames_ffmpeg(video_path, process_length, target_fps, max_res, dataset) + finally: + # Clean up temporary file if created + if converted_path and converted_path != original_path: + try: + os.remove(converted_path) + except: + pass + return frames, fps diff --git a/depthcrafter_ui b/depthcrafter_ui new file mode 100755 index 0000000..78e2900 --- /dev/null +++ b/depthcrafter_ui @@ -0,0 +1,21 @@ +#!/usr/bin/env python3 +""" +DepthCrafter UI Launcher +Quick launcher for the interactive CLI +""" + +import sys +import os + +# Add current directory to path +sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) + +# Import and run the interactive CLI +from interactive_cli import main + +if __name__ == "__main__": + try: + main() + except KeyboardInterrupt: + print("\n\nExiting...") + sys.exit(0) \ No newline at end of file diff --git a/interactive_cli.py b/interactive_cli.py new file mode 100755 index 0000000..ec4a782 --- /dev/null +++ b/interactive_cli.py @@ -0,0 +1,469 @@ +#!/usr/bin/env python3 +""" +Interactive CLI for DepthCrafter +A user-friendly command-line interface for video depth estimation +""" + +import os +import sys +import glob +import json +import subprocess +from pathlib import Path +from typing import Optional, Dict, Any +import shutil + +# Color codes for terminal output +class Colors: + HEADER = '\033[95m' + BLUE = '\033[94m' + CYAN = '\033[96m' + GREEN = '\033[92m' + YELLOW = '\033[93m' + RED = '\033[91m' + ENDC = '\033[0m' + BOLD = '\033[1m' + UNDERLINE = '\033[4m' + +def clear_screen(): + """Clear the terminal screen""" + os.system('cls' if os.name == 'nt' else 'clear') + +def print_header(): + """Print the application header""" + clear_screen() + print(f"{Colors.CYAN}{Colors.BOLD}") + print("═" * 60) + print(" DepthCrafter Interactive CLI ") + print(" Generate Depth Maps from Videos ") + print("═" * 60) + print(f"{Colors.ENDC}") + +def print_section(title: str): + """Print a section header""" + print(f"\n{Colors.YELLOW}ā–¶ {title}{Colors.ENDC}") + print("-" * 40) + +def get_video_info(video_path: str) -> Optional[Dict[str, Any]]: + """Get video information using ffprobe""" + try: + cmd = [ + 'ffprobe', '-v', 'error', + '-select_streams', 'v:0', + '-count_frames', + '-show_entries', 'stream=width,height,r_frame_rate,nb_frames,codec_name', + '-show_entries', 'format=duration,size', + '-of', 'json', + video_path + ] + result = subprocess.run(cmd, capture_output=True, text=True) + if result.returncode == 0: + info = json.loads(result.stdout) + stream = info['streams'][0] + format_info = info['format'] + + # Parse frame rate + fps_str = stream.get('r_frame_rate', '30/1') + if '/' in fps_str: + num, den = map(float, 
fps_str.split('/')) + fps = num / den + else: + fps = float(fps_str) + + return { + 'width': int(stream.get('width', 0)), + 'height': int(stream.get('height', 0)), + 'fps': fps, + 'frames': int(stream.get('nb_frames', 0)), + 'duration': float(format_info.get('duration', 0)), + 'size': int(format_info.get('size', 0)), + 'codec': stream.get('codec_name', 'unknown') + } + except: + return None + +def format_size(size_bytes: int) -> str: + """Format file size in human-readable format""" + for unit in ['B', 'KB', 'MB', 'GB']: + if size_bytes < 1024.0: + return f"{size_bytes:.1f} {unit}" + size_bytes /= 1024.0 + return f"{size_bytes:.1f} TB" + +def format_duration(seconds: float) -> str: + """Format duration in human-readable format""" + hours = int(seconds // 3600) + minutes = int((seconds % 3600) // 60) + secs = int(seconds % 60) + + if hours > 0: + return f"{hours}h {minutes}m {secs}s" + elif minutes > 0: + return f"{minutes}m {secs}s" + else: + return f"{secs}s" + +def select_video() -> Optional[str]: + """Let user select a video file""" + print_section("Select Video File") + + # Option 1: Recent files + recent_videos = [] + for pattern in ['*.mp4', '*.webm', '*.avi', '*.mov', '*.mkv']: + recent_videos.extend(glob.glob(pattern)) + recent_videos.extend(glob.glob(f"examples/{pattern}")) + + if recent_videos: + print(f"{Colors.GREEN}Found videos:{Colors.ENDC}") + for i, video in enumerate(recent_videos[:10], 1): + print(f" {i}. {video}") + print(f" {Colors.CYAN}0. Enter custom path{Colors.ENDC}") + + choice = input(f"\n{Colors.BLUE}Select video (1-{len(recent_videos[:10])} or 0): {Colors.ENDC}") + + if choice.isdigit(): + idx = int(choice) + if 1 <= idx <= len(recent_videos[:10]): + return recent_videos[idx - 1] + + # Option 2: Custom path + video_path = input(f"{Colors.BLUE}Enter video path: {Colors.ENDC}").strip() + + if video_path.startswith('"') and video_path.endswith('"'): + video_path = video_path[1:-1] + + if os.path.exists(video_path): + return video_path + else: + print(f"{Colors.RED}Error: File not found!{Colors.ENDC}") + return None + +def display_video_info(video_path: str, info: Dict[str, Any]): + """Display video information""" + print_section("Video Information") + print(f" šŸ“¹ File: {Colors.CYAN}{os.path.basename(video_path)}{Colors.ENDC}") + print(f" šŸ“ Resolution: {info['width']}x{info['height']}") + print(f" šŸŽ¬ Codec: {info['codec']}") + print(f" ā±ļø Duration: {format_duration(info['duration'])}") + print(f" šŸŽžļø Frames: {info['frames']} @ {info['fps']:.1f} fps") + print(f" šŸ’¾ Size: {format_size(info['size'])}") + +def get_processing_options() -> Dict[str, Any]: + """Get processing options from user""" + print_section("Processing Options") + + # Presets + print(f"\n{Colors.GREEN}Quality Presets:{Colors.ENDC}") + print(" 1. šŸš€ Fast (512px, 5 steps) - ~2 min for 150 frames") + print(" 2. āš–ļø Balanced (768px, 5 steps) - ~4 min for 150 frames") + print(" 3. šŸŽÆ High Quality (1024px, 10 steps) - ~8 min for 150 frames") + print(" 4. 
šŸŽØ Custom settings") + + preset = input(f"\n{Colors.BLUE}Select preset (1-4): {Colors.ENDC}").strip() + + options = {} + + if preset == '1': + options['max_res'] = 512 + options['num_inference_steps'] = 5 + options['guidance_scale'] = 1.0 + elif preset == '2': + options['max_res'] = 768 + options['num_inference_steps'] = 5 + options['guidance_scale'] = 1.0 + elif preset == '3': + options['max_res'] = 1024 + options['num_inference_steps'] = 10 + options['guidance_scale'] = 1.2 + else: + # Custom settings + print(f"\n{Colors.YELLOW}Custom Settings:{Colors.ENDC}") + + max_res = input(f" Max resolution ({Colors.CYAN}512/768/1024{Colors.ENDC}) [512]: ").strip() + options['max_res'] = int(max_res) if max_res else 512 + + steps = input(f" Inference steps ({Colors.CYAN}1-25{Colors.ENDC}) [5]: ").strip() + options['num_inference_steps'] = int(steps) if steps else 5 + + guidance = input(f" Guidance scale ({Colors.CYAN}0.5-2.0{Colors.ENDC}) [1.0]: ").strip() + options['guidance_scale'] = float(guidance) if guidance else 1.0 + + return options + +def get_frame_range(video_info: Dict[str, Any]) -> Dict[str, Any]: + """Get frame range options from user""" + print_section("Frame Range") + + total_frames = video_info['frames'] + fps = video_info['fps'] + + print(f"Total frames: {total_frames} ({format_duration(total_frames/fps)})") + print(f"\n{Colors.GREEN}Options:{Colors.ENDC}") + print(" 1. šŸŽ¬ Process entire video") + print(" 2. šŸŽžļø First N frames") + print(" 3. āœ‚ļø Custom range") + print(" 4. ā±ļø Time-based selection") + + choice = input(f"\n{Colors.BLUE}Select option (1-4): {Colors.ENDC}").strip() + + if choice == '2': + n = input(f" Number of frames to process: ").strip() + return {'max_frames': int(n), 'start_frame': 0} + elif choice == '3': + start = input(f" Start frame (0-{total_frames}): ").strip() + end = input(f" End frame (0-{total_frames}): ").strip() + start_frame = int(start) if start else 0 + end_frame = int(end) if end else total_frames + return {'max_frames': end_frame - start_frame, 'start_frame': start_frame} + elif choice == '4': + start_time = input(f" Start time (seconds): ").strip() + duration = input(f" Duration (seconds): ").strip() + start_frame = int(float(start_time) * fps) if start_time else 0 + max_frames = int(float(duration) * fps) if duration else -1 + return {'max_frames': max_frames, 'start_frame': start_frame} + else: + return {'max_frames': -1, 'start_frame': 0} + +def get_output_options() -> Dict[str, Any]: + """Get output options from user""" + print_section("Output Options") + + options = {} + + # Output folder + default_folder = "./demo_output" + folder = input(f"Output folder [{Colors.CYAN}{default_folder}{Colors.ENDC}]: ").strip() + options['save_folder'] = folder if folder else default_folder + + # Output formats + print(f"\n{Colors.GREEN}Additional outputs:{Colors.ENDC}") + save_npz = input(f" Save NPZ depth data (y/n) [n]: ").strip().lower() + options['save_npz'] = save_npz == 'y' + + save_exr = input(f" Save EXR depth files (y/n) [n]: ").strip().lower() + options['save_exr'] = save_exr == 'y' + + # FPS + target_fps = input(f"\nTarget FPS ({Colors.CYAN}-1 for auto{Colors.ENDC}) [15]: ").strip() + options['target_fps'] = int(target_fps) if target_fps else 15 + + return options + +def build_command(video_path: str, options: Dict[str, Any]) -> str: + """Build the command to run""" + cmd = ["python", "run.py", "--video-path", video_path] + + # Add all options + for key, value in options.items(): + if isinstance(value, bool): + if value: + 
cmd.append(f"--{key.replace('_', '-')}") + else: + cmd.append(f"--{key.replace('_', '-')}") + cmd.append(str(value)) + + return " ".join(cmd) + +def run_processing(command: str) -> bool: + """Run the processing command""" + print_section("Processing") + print(f"{Colors.CYAN}Command:{Colors.ENDC}") + print(f" {command}") + print() + + confirm = input(f"{Colors.YELLOW}Start processing? (y/n): {Colors.ENDC}").strip().lower() + + if confirm != 'y': + print(f"{Colors.RED}Cancelled.{Colors.ENDC}") + return False + + print(f"\n{Colors.GREEN}Processing started...{Colors.ENDC}") + print("=" * 60) + + try: + # Run the command + process = subprocess.Popen( + command, + shell=True, + stdout=subprocess.PIPE, + stderr=subprocess.STDOUT, + universal_newlines=True, + bufsize=1 + ) + + # Stream output + for line in process.stdout: + print(line, end='') + + process.wait() + + if process.returncode == 0: + print("=" * 60) + print(f"{Colors.GREEN}āœ“ Processing completed successfully!{Colors.ENDC}") + return True + else: + print(f"{Colors.RED}āœ— Processing failed with error code {process.returncode}{Colors.ENDC}") + return False + + except KeyboardInterrupt: + print(f"\n{Colors.YELLOW}Processing interrupted by user.{Colors.ENDC}") + return False + except Exception as e: + print(f"{Colors.RED}Error: {e}{Colors.ENDC}") + return False + +def save_preset(options: Dict[str, Any]): + """Save current settings as a preset""" + print_section("Save Preset") + + name = input(f"Preset name: ").strip() + if not name: + return + + preset_file = f".depthcrafter_preset_{name}.json" + + with open(preset_file, 'w') as f: + json.dump(options, f, indent=2) + + print(f"{Colors.GREEN}āœ“ Preset saved to {preset_file}{Colors.ENDC}") + +def load_preset() -> Optional[Dict[str, Any]]: + """Load a saved preset""" + presets = glob.glob(".depthcrafter_preset_*.json") + + if not presets: + print(f"{Colors.YELLOW}No saved presets found.{Colors.ENDC}") + return None + + print_section("Load Preset") + for i, preset in enumerate(presets, 1): + name = preset.replace(".depthcrafter_preset_", "").replace(".json", "") + print(f" {i}. {name}") + + choice = input(f"\n{Colors.BLUE}Select preset (1-{len(presets)}): {Colors.ENDC}").strip() + + if choice.isdigit(): + idx = int(choice) - 1 + if 0 <= idx < len(presets): + with open(presets[idx], 'r') as f: + return json.load(f) + + return None + +def main(): + """Main interactive CLI loop""" + print_header() + print(f"{Colors.GREEN}Welcome to DepthCrafter Interactive CLI!{Colors.ENDC}") + print("This tool will help you create depth maps from your videos.\n") + + while True: + print(f"\n{Colors.CYAN}Main Menu:{Colors.ENDC}") + print(" 1. šŸŽ„ Process a video") + print(" 2. šŸ“ Load preset") + print(" 3. šŸ“š View examples") + print(" 4. ā“ Help") + print(" 5. 🚪 Exit") + + choice = input(f"\n{Colors.BLUE}Select option (1-5): {Colors.ENDC}").strip() + + if choice == '1': + # Process video workflow + video_path = select_video() + if not video_path: + continue + + # Get video info + info = get_video_info(video_path) + if info: + display_video_info(video_path, info) + else: + print(f"{Colors.YELLOW}Warning: Could not read video information{Colors.ENDC}") + info = {'frames': -1, 'fps': 30} + + # Get options + options = {} + + # Check if user wants to load preset + if input(f"\n{Colors.BLUE}Load preset? 
(y/n): {Colors.ENDC}").strip().lower() == 'y': + preset = load_preset() + if preset: + options.update(preset) + + if not options: + options.update(get_processing_options()) + options.update(get_frame_range(info)) + options.update(get_output_options()) + + # Offer to save as preset + if input(f"\n{Colors.BLUE}Save these settings as preset? (y/n): {Colors.ENDC}").strip().lower() == 'y': + save_preset(options) + + # Build and run command + command = build_command(video_path, options) + success = run_processing(command) + + if success: + output_folder = options.get('save_folder', './demo_output') + print(f"\n{Colors.GREEN}Output files saved to: {output_folder}{Colors.ENDC}") + + input(f"\n{Colors.CYAN}Press Enter to continue...{Colors.ENDC}") + print_header() + + elif choice == '2': + # Load preset + preset = load_preset() + if preset: + print(f"{Colors.GREEN}Preset loaded successfully!{Colors.ENDC}") + print(json.dumps(preset, indent=2)) + input(f"\n{Colors.CYAN}Press Enter to continue...{Colors.ENDC}") + print_header() + + elif choice == '3': + # View examples + print_section("Example Commands") + print(f"{Colors.GREEN}Quick test (50 frames):{Colors.ENDC}") + print(" python run.py --video-path video.mp4 --max-frames 50 --max-res 512") + print(f"\n{Colors.GREEN}High quality:{Colors.ENDC}") + print(" python run.py --video-path video.mp4 --max-res 1024 --num-inference-steps 10") + print(f"\n{Colors.GREEN}Custom range:{Colors.ENDC}") + print(" python run.py --video-path video.mp4 --start-frame 100 --max-frames 200") + input(f"\n{Colors.CYAN}Press Enter to continue...{Colors.ENDC}") + print_header() + + elif choice == '4': + # Help + print_section("Help") + print("DepthCrafter generates depth maps from videos using AI.") + print("\nšŸ“‹ Requirements:") + print(" • Python 3.8+") + print(" • PyTorch 2.0+") + print(" • FFmpeg") + print(" • ~8GB GPU memory (512px) or ~26GB (1024px)") + print("\nšŸŽÆ Tips:") + print(" • Start with low resolution (512px) for testing") + print(" • Use frame limits to test on short segments") + print(" • Save presets for frequently used settings") + print(" • Check output folder for _vis.mp4 (visualization)") + input(f"\n{Colors.CYAN}Press Enter to continue...{Colors.ENDC}") + print_header() + + elif choice == '5': + print(f"\n{Colors.GREEN}Thank you for using DepthCrafter!{Colors.ENDC}") + break + else: + print(f"{Colors.RED}Invalid option. 
Please try again.{Colors.ENDC}") + +if __name__ == "__main__": + try: + # Check if ffmpeg is available + if shutil.which('ffmpeg') is None: + print(f"{Colors.RED}Error: FFmpeg is not installed or not in PATH{Colors.ENDC}") + print("Please install FFmpeg first: https://ffmpeg.org/download.html") + sys.exit(1) + + main() + except KeyboardInterrupt: + print(f"\n\n{Colors.YELLOW}Exiting...{Colors.ENDC}") + except Exception as e: + print(f"\n{Colors.RED}Fatal error: {e}{Colors.ENDC}") + sys.exit(1) \ No newline at end of file diff --git a/requirements.txt b/requirements.txt index 9c696da..d388f21 100644 --- a/requirements.txt +++ b/requirements.txt @@ -7,5 +7,5 @@ accelerate==0.30.1 xformers==0.0.20 mediapy==1.2.0 fire==0.6.0 -decord==0.6.0 +#decord==0.6.0 OpenEXR==3.2.4 diff --git a/run.py b/run.py index 7279cf1..11bb417 100644 --- a/run.py +++ b/run.py @@ -21,18 +21,26 @@ def __init__( unet = DiffusersUNetSpatioTemporalConditionModelDepthCrafter.from_pretrained( unet_path, low_cpu_mem_usage=True, - torch_dtype=torch.float16, + torch_dtype=torch.float32, ) # load weights of other components from the provided checkpoint self.pipe = DepthCrafterPipeline.from_pretrained( pre_train_path, unet=unet, - torch_dtype=torch.float16, - variant="fp16", + torch_dtype=torch.float32, ) - # for saving memory, we can offload the model to CPU, or even run the model sequentially to save more memory - if cpu_offload is not None: + # Determine the target device + # Note: MPS doesn't support Conv3D operations required by this model + # So we use CPU for Apple Silicon devices + if torch.cuda.is_available(): + device = "cuda" + else: + device = "cpu" + + # Handle model placement and offloading + if cpu_offload is not None and torch.cuda.is_available(): + # CPU offloading only works with CUDA if cpu_offload == "sequential": # This will slow, but save more memory self.pipe.enable_sequential_cpu_offload() @@ -41,13 +49,14 @@ def __init__( else: raise ValueError(f"Unknown cpu offload option: {cpu_offload}") else: - self.pipe.to("cuda") - # enable attention slicing and xformers memory efficient attention - try: - self.pipe.enable_xformers_memory_efficient_attention() - except Exception as e: - print(e) - print("Xformers is not enabled") + # For MPS or CPU, just move the entire model to the device + self.pipe.to(device) + # enable attention slicing and xformers memory efficient attention (only for CUDA) + if torch.cuda.is_available(): + try: + self.pipe.enable_xformers_memory_efficient_attention() + except Exception as e: + print(f"Xformers not enabled: {e}") self.pipe.enable_attention_slicing() def infer( @@ -66,15 +75,21 @@ def infer( track_time: bool = True, save_npz: bool = False, save_exr: bool = False, + max_frames: int = -1, + start_frame: int = 0, ): set_seed(seed) + # Use max_frames if specified, otherwise use process_length + frame_limit = max_frames if max_frames > 0 else process_length + frames, target_fps = read_video_frames( video, - process_length, + frame_limit, target_fps, max_res, dataset, + start_frame=start_frame, ) # inference the depth map using the DepthCrafter pipeline with torch.inference_mode(): @@ -149,7 +164,8 @@ def run( ) # clear the cache for the next video gc.collect() - torch.cuda.empty_cache() + if torch.cuda.is_available(): + torch.cuda.empty_cache() return res_path[:2] @@ -159,7 +175,7 @@ def main( unet_path: str = "tencent/DepthCrafter", pre_train_path: str = "stabilityai/stable-video-diffusion-img2vid-xt", process_length: int = -1, - cpu_offload: str = "model", + cpu_offload: str = 
None, target_fps: int = -1, seed: int = 42, num_inference_steps: int = 5, @@ -171,6 +187,8 @@ def main( save_npz: bool = False, save_exr: bool = False, track_time: bool = False, + max_frames: int = -1, + start_frame: int = 0, ): depthcrafter_demo = DepthCrafterDemo( unet_path=unet_path, @@ -195,10 +213,13 @@ def main( track_time=track_time, save_npz=save_npz, save_exr=save_exr, + max_frames=max_frames, + start_frame=start_frame, ) # clear the cache for the next video gc.collect() - torch.cuda.empty_cache() + if torch.cuda.is_available(): + torch.cuda.empty_cache() if __name__ == "__main__": @@ -206,4 +227,6 @@ def main( # the most important arguments for memory saving are `cpu_offload`, `enable_xformers`, `max_res`, and `window_size` # the most important arguments for trade-off between quality and speed are # `num_inference_steps`, `guidance_scale`, and `max_res` + # Use `max_frames` to limit the number of frames to process (e.g., --max-frames 50) + # Use `start_frame` to start from a specific frame (e.g., --start-frame 100) Fire(main)