# TurboDiffusion_Studio.md

## 🚀 Scripts & Inference

This repository contains optimized inference engines for the Wan2.1 and Wan2.2 models, specifically tuned for high-resolution output and robust memory management on consumer hardware.

### 🎥 Inference Engines

| Script | Function | Key Features |
| --- | --- | --- |
| **`wan2.2_i2v_infer.py`** | **Image-to-Video** | **Tiered Failover System**: automatic recovery from OOM errors.<br>**Intelligent Model Switching**: transitions between the High-Noise and Low-Noise models based on step boundaries.<br>**Tiled Processing**: uses 4-chunk tiled encoding/decoding for 720p+ stability. |
| **`wan2.1_t2v_infer.py`** | **Text-to-Video** | **Hardware Auto-Detection**: automatically selects TF32, BF16, or FP16 based on GPU capabilities.<br>**Quantization Safety**: force-disables `torch.compile` for quantized models to prevent graph-break OOMs.<br>**3-Tier Recovery**: escalates from GPU ➔ Checkpointing ➔ Manual CPU Offloading if memory is exceeded. |
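
The tiered recovery that both engines advertise follows a catch-and-escalate pattern around CUDA OOM errors. The sketch below illustrates the idea only: `pipeline`, `run`, `checkpointing`, and `offload_to_cpu` are hypothetical names, not the actual API of these scripts.

```python
import torch

def generate_with_failover(pipeline, **kwargs):
    """Illustrative 3-tier OOM recovery; `pipeline` is a hypothetical wrapper."""
    try:
        # Tier 1: full-speed GPU inference
        return pipeline.run(**kwargs)
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()
        try:
            # Tier 2: activation checkpointing trades compute for memory
            return pipeline.run(checkpointing=True, **kwargs)
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()
            # Tier 3: move idle weights to system RAM and retry
            pipeline.offload_to_cpu()
            return pipeline.run(checkpointing=True, **kwargs)
```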

### 🛠️ Utilities

* **`cache_t5.py`**
* **Purpose**: Pre-computes and saves T5 text embeddings to disk.
* **VRAM Benefit**: Eliminates the need to load the **11GB T5 encoder** during the main inference run, allowing 14B models to fit on GPUs with lower VRAM.
* **Usage**: Run this first to generate a `.pt` file, then pass it to the inference scripts using the `--cached_embedding` flag.
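
For example, the two-step workflow looks like this (the prompt and file names are placeholders; `--cached_embedding` and `--skip_t5` are the flags the utility itself documents):

```bash
# 1) Cache the prompt's embedding once: T5 loads, writes a small .pt file, and exits
python turbodiffusion/inference/cache_t5.py \
  --prompt "slow head turn, cinematic" \
  --output cached_t5_embeddings.pt

# 2) Run inference against the cache; the 11GB encoder is never loaded
python turbodiffusion/inference/wan2.2_i2v_infer.py \
  --cached_embedding cached_t5_embeddings.pt \
  --skip_t5  # plus your usual model/resolution arguments
```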


---

## 🚀 Getting Started with TurboDiffusion

To run the large 14B models on consumer GPUs, it is recommended to use the **T5 Caching** workflow. This offloads the 11GB text encoder from VRAM, leaving more space for the DiT model and high-resolution video decoding.

### **Step 1: Environment Setup**

Ensure your project structure is organized as follows:

* **Root**: `/your/path/to/TurboDiffusion`
* **Checkpoints**: Place your `.pth` models in the `checkpoints/` directory.
* **Output**: Generated videos and metadata will be saved to `output/`.
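
Concretely, a minimal layout looks like this:

```
/your/path/to/TurboDiffusion
├── checkpoints/                # .pth model weights
├── output/                     # generated videos and *_metadata.txt files
└── turbodiffusion/
    └── inference/
        ├── cache_t5.py
        ├── wan2.1_t2v_infer.py
        └── wan2.2_i2v_infer.py
```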

### **Step 2: The Two Ways to Cache T5**

#### **Option A: Manual Pre-Caching (Recommended for Batching)**

If you have a list of prompts you want to use frequently, use the standalone utility:

```bash
python turbodiffusion/inference/cache_t5.py --prompt "Your descriptive prompt here" --output cached_t5_embeddings.pt
```

This saves the processed text into a small `.pt` file, allowing the inference scripts to "skip" the heavy T5 model entirely.
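
You can sanity-check a cache file before using it; the keys below are exactly what `cache_t5.py` writes:

```python
import torch

cache = torch.load("cached_t5_embeddings.pt")
print(cache["prompts"])                 # list of cached prompt strings
entry = cache["embeddings"][0]          # one dict per prompt
print(entry["prompt"], entry["shape"])  # original text and embedding shape
```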

#### **Option B: Automatic Caching via Web UI**

For a more streamlined experience, use the **TurboDiffusion Studio**:

1. Launch the UI: `python turbo_diffusion_t5_cache_optimize_v6.py`.
2. Open the **Precision & Advanced Settings** accordion.
3. Check **Use Cached T5 Embeddings (Auto-Run)**.
4. When you click generate, the UI will automatically run the caching script first, clear the T5 model from memory, and then start the video generation.
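
Under the hood, the auto-run behaves like chaining the two CLI calls yourself. A simplified sketch (the flag names come from the docs above; everything else is illustrative):

```python
import subprocess

prompt = "Your descriptive prompt here"

# Step 1: cache the embedding; T5 is freed when this child process exits
subprocess.run(
    ["python", "turbodiffusion/inference/cache_t5.py",
     "--prompt", prompt, "--output", "cached_t5_embeddings.pt"],
    check=True,
)

# Step 2: generate using the cache (remaining flags omitted for brevity)
subprocess.run(
    ["python", "turbodiffusion/inference/wan2.1_t2v_infer.py",
     "--cached_embedding", "cached_t5_embeddings.pt", "--skip_t5"],
    check=True,
)
```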

### **Step 3: Running Inference**

Once your UI is launched and caching is configured:

1. **Select Mode**: Choose between **Text-to-Video** (Wan2.1) or **Image-to-Video** (Wan2.2).
2. **Apply Quantization**: On 24GB-class GPUs (like the RTX 3090/4090), ensure **Enable --quant_linear (8-bit)** is checked to avoid OOM errors.
3. **Monitor Hardware**: Watch the **Live GPU Monitor** at the top of the UI to track real-time VRAM usage during sampling; a sketch of such a reading follows this list.
4. **Retrieve Results**: Your video and its reproduction metadata (containing the exact CLI command used) will appear in the `output/` gallery.
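
The Live GPU Monitor mentioned in step 3 is described below as native-PyTorch VRAM monitoring; a single reading of that kind can be taken like this (a sketch, not the UI's actual code):

```python
import torch

def vram_status() -> str:
    """Return a one-line summary of current GPU memory usage."""
    free, total = torch.cuda.mem_get_info()  # both values in bytes
    used_gib = (total - free) / 1024**3
    return f"VRAM: {used_gib:.1f} / {total / 1024**3:.1f} GiB in use"

print(vram_status())
```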


---

## 🖥️ TurboDiffusion Studio (Web UI)

The `turbo_diffusion_t5_cache_optimize_v6.py` script provides a high-performance, unified **Gradio-based Web interface** for both Text-to-Video and Image-to-Video generation. It serves as a centralized "Studio" dashboard that automates complex environment setups and memory optimizations.

### **Key Features**

| Feature | Description |
| --- | --- |
| **Unified Interface** | Toggle between **Text-to-Video (Wan2.1)** and **Image-to-Video (Wan2.2)** workflows within a single dashboard. |
| **Real-time GPU Monitor** | Native PyTorch-based VRAM monitoring that displays current memory usage and hardware status directly in the UI. |
| **Auto-Cache T5 Integration** | Automatically runs the `cache_t5.py` utility before inference to offload the 11GB text encoder, significantly reducing peak VRAM usage. |
| **Frame Sanitization** | Automatically enforces the **4n + 1 rule** required by the Wan VAE to prevent kernel crashes during decoding. |
| **Reproduction Metadata** | Every generated video automatically saves a matching `_metadata.txt` file containing the exact CLI command and environment variables needed to reproduce the result. |
| **Live Console Output** | Pipes real-time CLI logs and progress bars directly into a "Live Console" window in the web browser. |
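
The **4n + 1 rule** means valid frame counts are 1, 5, 9, …, 81, 85, and so on. Snapping a request to the nearest valid value is enough to satisfy the VAE; a minimal sketch (the UI's own sanitizer may differ):

```python
def sanitize_frame_count(requested: int) -> int:
    """Snap a frame count to the nearest 4n + 1 value required by the Wan VAE."""
    n = max(0, round((requested - 1) / 4))
    return 4 * n + 1

assert sanitize_frame_count(80) == 81   # 80 is invalid; 81 = 4*20 + 1
assert sanitize_frame_count(81) == 81   # already-valid values pass through
```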

### **Advanced Controls**

The UI exposes granular controls for technical users:

* **Precision & Quantization:** Toggle 8-bit `--quant_linear` mode for low-VRAM operation.
* **Attention Tuning:** Switch between `sagesla`, `sla`, and `original` attention mechanisms.
* **Adaptive I2V:** Enable adaptive resolution and ODE solvers for Image-to-Video workflows.
* **Integrated Gallery:** Browse and view your output history directly within the `output/` directory.
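
These controls map onto flags of the underlying inference scripts. `--quant_linear` is documented above; the attention-selection flag name in the sketch below is an assumption, so verify it against the scripts' `--help` output:

```bash
# --attention_mode is a guessed flag name; sagesla/sla/original are the documented choices
python turbodiffusion/inference/wan2.1_t2v_infer.py \
  --quant_linear \
  --attention_mode sagesla
```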

---

## 🛠️ Usage

To launch the studio:

```bash
python turbo_diffusion_t5_cache_optimize_v6.py
```

> **Note:** The script defaults to `/your/path/to/TurboDiffusion` as the project root. Ensure your local paths are configured accordingly in the **System Setup** section of the code.


---

## 💳 Credits & Acknowledgments

If you utilize, share, or build upon these optimized scripts, please include the following acknowledgments:

* **Optimization & Development**: Co-developed by **Waverly Edwards** and **Google Gemini**.
* **T5 Caching Logic**: Original concept and utility implementation by **John D. Pope**.
* **Base Framework**: Built upon NVIDIA's Imaginaire framework and the Wan-Video research.
---

# turbodiffusion/inference/cache_t5.py

```python
#!/usr/bin/env python
# -----------------------------------------------------------------------------------------
# T5 EMBEDDING CACHE UTILITY
#
# Acknowledgments:
# - Work and creativity of: John D. Pope
#
# Description:
# Pre-computes text embeddings to allow running inference on GPUs with limited VRAM
# by removing the need to keep the 11GB T5 encoder loaded in memory.
# -----------------------------------------------------------------------------------------
"""
Pre-cache T5 text embeddings to avoid loading the 11GB model during inference.

Usage:
# Cache a single prompt
python turbodiffusion/inference/cache_t5.py --prompt "slow head turn, cinematic" --output cached_embeddings.pt

# Cache multiple prompts from file
python turbodiffusion/inference/cache_t5.py --prompts_file prompts.txt --output cached_embeddings.pt

Then use with inference:
python turbodiffusion/inference/wan2.2_i2v_infer.py \
--cached_embedding cached_embeddings.pt \
--skip_t5 \
...
"""
import os
import sys
import argparse
import torch

# Add the repo root to sys.path for imports. This file lives in
# turbodiffusion/inference/, so the repo root is two directories up.
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
REPO_ROOT = os.path.dirname(os.path.dirname(SCRIPT_DIR))
sys.path.insert(0, REPO_ROOT)

def main():
parser = argparse.ArgumentParser(description="Pre-cache T5 text embeddings")
parser.add_argument("--prompt", type=str, default=None, help="Single prompt to cache")
parser.add_argument("--prompts_file", type=str, default=None, help="File with prompts (one per line)")
parser.add_argument("--text_encoder_path", type=str,
default="/media/2TB/ComfyUI/models/text_encoders/models_t5_umt5-xxl-enc-bf16.pth",
help="Path to the umT5 text encoder")
parser.add_argument("--output", type=str, default="cached_t5_embeddings.pt",
help="Output path for cached embeddings")
parser.add_argument("--device", type=str, default="cuda",
help="Device to use for encoding (cuda is faster, memory freed after)")
args = parser.parse_args()

# Collect prompts
prompts = []
if args.prompt:
prompts.append(args.prompt)
if args.prompts_file and os.path.exists(args.prompts_file):
with open(args.prompts_file, 'r') as f:
prompts.extend([line.strip() for line in f if line.strip()])

if not prompts:
print("Error: Provide --prompt or --prompts_file")
sys.exit(1)

print(f"Caching embeddings for {len(prompts)} prompt(s)")
print(f"Text encoder: {args.text_encoder_path}")
print(f"Device: {args.device}")
print()

# Import after path setup
from rcm.utils.umt5 import get_umt5_embedding, clear_umt5_memory

cache_data = {
'prompts': prompts,
'embeddings': [],
'text_encoder_path': args.text_encoder_path,
}

with torch.no_grad():
for i, prompt in enumerate(prompts):
print(f"[{i+1}/{len(prompts)}] Encoding: '{prompt[:60]}...' " if len(prompt) > 60 else f"[{i+1}/{len(prompts)}] Encoding: '{prompt}'")

# Get embedding (loads T5 if not already loaded)
embedding = get_umt5_embedding(
checkpoint_path=args.text_encoder_path,
prompts=prompt
)

# Move to CPU for storage
cache_data['embeddings'].append({
'prompt': prompt,
'embedding': embedding.cpu(),
'shape': list(embedding.shape),
})

print(f" Shape: {embedding.shape}, dtype: {embedding.dtype}")

# Clear T5 from memory
print("\nClearing T5 from memory...")
clear_umt5_memory()
torch.cuda.empty_cache()

# Save cache
print(f"\nSaving to: {args.output}")
torch.save(cache_data, args.output)

# Summary
file_size = os.path.getsize(args.output) / (1024 * 1024)
print(f"Done! Cache file size: {file_size:.2f} MB")
print()
print("Usage:")
print(f" python turbodiffusion/inference/wan2.2_i2v_infer.py \\")
print(f" --cached_embedding {args.output} \\")
print(f" --skip_t5 \\")
print(f" ... (other args)")


if __name__ == "__main__":
main()
```