
Conversation

@johndpope


- Initializes CUTLASS git submodule
- Validates CUDA toolkit and PyTorch environment
- Builds CUDA extensions for sm_80, sm_89, sm_90, sm_120a (Blackwell)
- Includes --clean option for fresh builds
- Verifies installation on completion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
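The arch targets above map to the `X.Y` strings that `TORCH_CUDA_ARCH_LIST` expects. A minimal sketch of that mapping, assuming a hypothetical helper (the name `arch_list` and the handling of the `a` suffix are illustrative, not taken from the actual install script):

```python
# Hypothetical sketch: turn sm_* names (sm_80, sm_89, sm_90, sm_120a)
# into the "X.Y" form used by TORCH_CUDA_ARCH_LIST.

def arch_list(targets):
    """Convert sm_* names into a semicolon-separated arch string."""
    parts = []
    for t in targets:
        digits = t.removeprefix("sm_").rstrip("a")  # "120a" -> "120"
        major, minor = digits[:-1], digits[-1]      # "120" -> ("12", "0")
        suffix = "a" if t.endswith("a") else ""     # keep arch-specific variant
        parts.append(f"{major}.{minor}{suffix}")
    return ";".join(parts)

print(arch_list(["sm_80", "sm_89", "sm_90", "sm_120a"]))
# -> 8.0;8.9;9.0;12.0a
```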
johndpope mentioned this pull request Dec 26, 2025
johndpope and others added 2 commits December 27, 2025 09:05
- Check for Miniconda, offer to install if missing
- Create conda environment with Python 3.12
- Install PyTorch nightly with CUDA 13.0 (for RTX 5090/Blackwell)
- Install psutil dependency
- Initialize CUTLASS git submodule
- Build TurboDiffusion CUDA extensions
- Install SpargeAttn for sparse attention optimization
- Add GPU info verification at end

Target: media-msi.covershot.app (RTX 5090)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
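The setup flow above can be sketched as an ordered command list. The environment name (`turbo`) and the nightly index URL are assumptions for illustration, not copied from the script:

```python
# Rough sketch of the setup steps described above, expressed as the shell
# commands such a script might run. Env name "turbo" and the cu130 index
# URL are assumptions.

SETUP_STEPS = [
    "conda create -n turbo python=3.12 -y",
    "pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu130",
    "pip install psutil",
    "git submodule update --init --recursive",  # pulls in CUTLASS
    "python setup.py install",                  # TurboDiffusion CUDA extensions
]

for step in SETUP_STEPS:
    print(step)
```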
johndpope and others added 5 commits December 27, 2025 09:27
scripts/comfyui-turbo.sh:
- Configurable paths for conda, ComfyUI, CUDA
- Start/stop/status commands
- Logging to /tmp/comfyui_turbo.log
- Ready for cron @reboot setup

Usage:
  ./comfyui-turbo.sh           # Start
  ./comfyui-turbo.sh --stop    # Stop
  ./comfyui-turbo.sh --status  # Check status

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
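The start/stop/status commands above typically hinge on a pidfile. A minimal sketch of that logic, assuming a hypothetical pidfile path (not taken from comfyui-turbo.sh):

```python
# Sketch of pidfile-based status/stop logic, as a launcher like
# comfyui-turbo.sh typically implements it. PIDFILE path is an assumption.
import os
import signal

PIDFILE = "/tmp/comfyui_turbo.pid"

def read_pid():
    """Return the PID from the pidfile, or None if absent or corrupt."""
    try:
        with open(PIDFILE) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None

def is_running(pid):
    """Signal 0 checks process existence without sending anything."""
    if pid is None:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True

def status():
    return "running" if is_running(read_pid()) else "stopped"

def stop():
    pid = read_pid()
    if is_running(pid):
        os.kill(pid, signal.SIGTERM)
```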
- Add scripts/cache_t5.py to pre-cache T5 embeddings (saves ~11GB VRAM)
- Add --cached_embedding and --skip_t5 args to wan2.2_i2v_infer.py
- Update install.sh with module symlinks for rcm/imaginaire/ops/SLA
- Fix spas_sage_attn import name in install verification

This enables 2-pass inference: cache embeddings first, then run inference
without loading the 11GB T5 model.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
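The two-pass idea reduces to "compute once, persist, reload later." A framework-free sketch of that cache pattern (cache_t5.py presumably uses `torch.save`/`torch.load`; this illustrative version uses pickle so it runs without torch):

```python
# Sketch of the cache-or-compute pattern behind the two-pass inference:
# the expensive T5 forward pass runs only when no cached file exists.
import os
import pickle

def cache_embedding(path, compute_fn):
    """Load the cached embedding if present, else compute and persist it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    emb = compute_fn()          # the expensive T5 forward pass
    with open(path, "wb") as f:
        pickle.dump(emb, f)
    return emb
```

On the first call the compute function runs and the result is written; every later call loads the small file instead, which is what lets inference skip the 11GB T5 model.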
- Document memory optimization with pre-cached T5 embeddings
- Add memory comparison table (30GB+ vs ~18GB peak VRAM)
- Include step-by-step instructions for cache_t5.py usage
- Note: cached embedding is ~4MB vs 11GB T5 model

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Auto-detect GPU compute capability
- Patch SpargeAttn setup.py to add sm_120 support for RTX 5090
- Build with correct TORCH_CUDA_ARCH_LIST

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
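The setup.py patch amounts to appending the detected capability to the build's arch list. In the real script the capability would come from `torch.cuda.get_device_capability()`; the sketch below patches a plain string so the logic is visible without a GPU (the helper name is hypothetical):

```python
# Sketch: make sure the detected compute capability (e.g. 12.0 for RTX
# 5090) is present in a semicolon-separated TORCH_CUDA_ARCH_LIST string.

def ensure_arch(arch_list, major, minor):
    """Append 'major.minor' to the arch list if it is not already there."""
    wanted = f"{major}.{minor}"
    archs = [a for a in arch_list.split(";") if a]
    if wanted not in archs:
        archs.append(wanted)
    return ";".join(archs)

print(ensure_arch("8.0;8.9;9.0", 12, 0))  # -> 8.0;8.9;9.0;12.0
```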
johndpope (Author) commented Dec 26, 2025

(Two screenshots attached, taken 2025-12-27 at 7:47 AM and 8:38 AM.)

Offloads DiT models before VAE decode to free VRAM.
Enables 720p 81-frame generation on 32GB GPUs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
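In torch terms the offload is `model.to("cpu")` followed by `torch.cuda.empty_cache()` before calling the VAE. A stub-based sketch of that call sequence (all names here are illustrative, not the PR's actual function):

```python
# Sketch of offloading DiT modules before VAE decode to free VRAM.
# empty_cache is passed in so the sequence can be shown without torch;
# in the real patch it would be torch.cuda.empty_cache.

def offload_before_decode(dit_models, vae, latents, empty_cache):
    for m in dit_models:
        m.to("cpu")      # move DiT weights out of VRAM
    empty_cache()        # release freed blocks back to the allocator
    return vae.decode(latents)
```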
zhengkw18 self-requested a review December 27, 2025 08:46
whx1003 (Collaborator) commented Dec 28, 2025

I think the UMT5 model is already released (freed from memory) after the text embeddings are computed, and we did observe a decrease in memory usage during our tests.

```python
log.info(f"Computing embedding for prompt: {args.prompt}")
with torch.no_grad():
    text_emb = get_umt5_embedding(
        checkpoint_path=args.text_encoder_path, prompts=args.prompt
    ).to(**tensor_kwargs)
clear_umt5_memory()
```

swingler mentioned this pull request Dec 30, 2025