
Conversation

@johndpope


- Initializes CUTLASS git submodule
- Validates CUDA toolkit and PyTorch environment
- Builds CUDA extensions for sm_80, sm_89, sm_90, sm_120a (Blackwell)
- Includes --clean option for fresh builds
- Verifies installation on completion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
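The arch targets above map to the `X.Y` strings that `TORCH_CUDA_ARCH_LIST` expects. A minimal sketch of that mapping, assuming a hypothetical helper (the name `arch_list` and the handling of the `a` suffix are illustrative, not taken from the actual install script):

```python
# Hypothetical sketch: turn sm_* names (sm_80, sm_89, sm_90, sm_120a)
# into the "X.Y" form used by TORCH_CUDA_ARCH_LIST.

def arch_list(targets):
    """Convert sm_* names into a semicolon-separated arch string."""
    parts = []
    for t in targets:
        digits = t.removeprefix("sm_").rstrip("a")  # "120a" -> "120"
        major, minor = digits[:-1], digits[-1]      # "120" -> ("12", "0")
        suffix = "a" if t.endswith("a") else ""     # keep arch-specific variant
        parts.append(f"{major}.{minor}{suffix}")
    return ";".join(parts)

print(arch_list(["sm_80", "sm_89", "sm_90", "sm_120a"]))
# -> 8.0;8.9;9.0;12.0a
```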
johndpope mentioned this pull request Dec 26, 2025
johndpope and others added 2 commits December 27, 2025 09:05
- Check for Miniconda, offer to install if missing
- Create conda environment with Python 3.12
- Install PyTorch nightly with CUDA 13.0 (for RTX 5090/Blackwell)
- Install psutil dependency
- Initialize CUTLASS git submodule
- Build TurboDiffusion CUDA extensions
- Install SpargeAttn for sparse attention optimization
- Add GPU info verification at end

Target: media-msi.covershot.app (RTX 5090)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
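The setup flow above can be sketched as an ordered command list. The environment name (`turbo`) and the nightly index URL are assumptions for illustration, not copied from the script:

```python
# Rough sketch of the setup steps described above, expressed as the shell
# commands such a script might run. Env name "turbo" and the cu130 index
# URL are assumptions.

SETUP_STEPS = [
    "conda create -n turbo python=3.12 -y",
    "pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu130",
    "pip install psutil",
    "git submodule update --init --recursive",  # pulls in CUTLASS
    "python setup.py install",                  # TurboDiffusion CUDA extensions
]

for step in SETUP_STEPS:
    print(step)
```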
johndpope and others added 5 commits December 27, 2025 09:27
scripts/comfyui-turbo.sh:
- Configurable paths for conda, ComfyUI, CUDA
- Start/stop/status commands
- Logging to /tmp/comfyui_turbo.log
- Ready for cron @reboot setup

Usage:
  ./comfyui-turbo.sh           # Start
  ./comfyui-turbo.sh --stop    # Stop
  ./comfyui-turbo.sh --status  # Check status

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
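The start/stop/status commands above typically hinge on a pidfile. A minimal sketch of that logic, assuming a hypothetical pidfile path (not taken from comfyui-turbo.sh):

```python
# Sketch of pidfile-based status/stop logic, as a launcher like
# comfyui-turbo.sh typically implements it. PIDFILE path is an assumption.
import os
import signal

PIDFILE = "/tmp/comfyui_turbo.pid"

def read_pid():
    """Return the PID from the pidfile, or None if absent or corrupt."""
    try:
        with open(PIDFILE) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None

def is_running(pid):
    """Signal 0 checks process existence without sending anything."""
    if pid is None:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True

def status():
    return "running" if is_running(read_pid()) else "stopped"

def stop():
    pid = read_pid()
    if is_running(pid):
        os.kill(pid, signal.SIGTERM)
```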
- Add scripts/cache_t5.py to pre-cache T5 embeddings (saves ~11GB VRAM)
- Add --cached_embedding and --skip_t5 args to wan2.2_i2v_infer.py
- Update install.sh with module symlinks for rcm/imaginaire/ops/SLA
- Fix spas_sage_attn import name in install verification

This enables 2-pass inference: cache embeddings first, then run inference
without loading the 11GB T5 model.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
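The two-pass idea reduces to "compute once, persist, reload later." A framework-free sketch of that cache pattern (cache_t5.py presumably uses `torch.save`/`torch.load`; this illustrative version uses pickle so it runs without torch):

```python
# Sketch of the cache-or-compute pattern behind the two-pass inference:
# the expensive T5 forward pass runs only when no cached file exists.
import os
import pickle

def cache_embedding(path, compute_fn):
    """Load the cached embedding if present, else compute and persist it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    emb = compute_fn()          # the expensive T5 forward pass
    with open(path, "wb") as f:
        pickle.dump(emb, f)
    return emb
```

On the first call the compute function runs and the result is written; every later call loads the small file instead, which is what lets inference skip the 11GB T5 model.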
- Document memory optimization with pre-cached T5 embeddings
- Add memory comparison table (30GB+ vs ~18GB peak VRAM)
- Include step-by-step instructions for cache_t5.py usage
- Note: cached embedding is ~4MB vs 11GB T5 model

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Auto-detect GPU compute capability
- Patch SpargeAttn setup.py to add sm_120 support for RTX 5090
- Build with correct TORCH_CUDA_ARCH_LIST

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
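The setup.py patch amounts to appending the detected capability to the build's arch list. In the real script the capability would come from `torch.cuda.get_device_capability()`; the sketch below patches a plain string so the logic is visible without a GPU (the helper name is hypothetical):

```python
# Sketch: make sure the detected compute capability (e.g. 12.0 for RTX
# 5090) is present in a semicolon-separated TORCH_CUDA_ARCH_LIST string.

def ensure_arch(arch_list, major, minor):
    """Append 'major.minor' to the arch list if it is not already there."""
    wanted = f"{major}.{minor}"
    archs = [a for a in arch_list.split(";") if a]
    if wanted not in archs:
        archs.append(wanted)
    return ";".join(archs)

print(ensure_arch("8.0;8.9;9.0", 12, 0))  # -> 8.0;8.9;9.0;12.0
```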
johndpope (Author) commented Dec 26, 2025

(Two screenshots attached, taken 2025-12-27 at 7:47 AM and 8:38 AM.)

Offloads DiT models before VAE decode to free VRAM.
Enables 720p 81-frame generation on 32GB GPUs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
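In torch terms the offload is `model.to("cpu")` followed by `torch.cuda.empty_cache()` before calling the VAE. A stub-based sketch of that call sequence (all names here are illustrative, not the PR's actual function):

```python
# Sketch of offloading DiT modules before VAE decode to free VRAM.
# empty_cache is passed in so the sequence can be shown without torch;
# in the real patch it would be torch.cuda.empty_cache.

def offload_before_decode(dit_models, vae, latents, empty_cache):
    for m in dit_models:
        m.to("cpu")      # move DiT weights out of VRAM
    empty_cache()        # release freed blocks back to the allocator
    return vae.decode(latents)
```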
zhengkw18 self-requested a review December 27, 2025 08:46
whx1003 (Collaborator) commented Dec 28, 2025

I think the UMT5 model is already released (freed from memory) after the text embeddings are computed, and we did observe a decrease in memory usage during our tests.

```python
log.info(f"Computing embedding for prompt: {args.prompt}")
with torch.no_grad():
    text_emb = get_umt5_embedding(
        checkpoint_path=args.text_encoder_path, prompts=args.prompt
    ).to(**tensor_kwargs)
clear_umt5_memory()
```

swingler mentioned this pull request Dec 30, 2025