Run 67GB video models on 32GB consumer GPUs
NVIDIA-style optimizations without a ComfyUI dependency
| Model | Original VRAM | Your GPU |
|---|---|---|
| LTX-2 19B | 67 GB | ❌ Won't fit |
| Wan 2.2 5B | 25 GB | ❌ Won't fit |
Large video generation models require datacenter GPUs. Consumer cards like RTX 4090/5090 can't run them... until now.
75% VRAM reduction via INT4 quantization with minimal quality loss.
| Feature | Description |
|---|---|
| **75% Less VRAM** | INT4 quantization shrinks models to fit consumer GPUs |
| **No ComfyUI** | Standalone Python - use in any project |
| **RTX 5090 Ready** | Tested on latest Blackwell architecture |
| **Simple API** | 3 lines of code to generate video |
Tested on NVIDIA RTX 5090 (32GB) with CUDA 12.8
| Model | Original VRAM | Optimized VRAM | Resolution | Generation time |
|---|---|---|---|---|
| Wan 2.2 TI2V-5B | 25 GB | 16 GB | 1280×704 | ~50s |
| LTX-2 19B | 67 GB | 22 GB | 640×448 | ~60s |
Detailed VRAM Breakdown

**Wan 2.2 TI2V-5B**

| Component | Original | INT4 |
|---|---|---|
| T5 Text Encoder | 11 GB | 11 GB |
| VAE | 3 GB | 3 GB |
| DiT Transformer | 11 GB | 3 GB |
| Peak | 25 GB | 16 GB |

**LTX-2 19B**

| Component | Original | INT4 |
|---|---|---|
| Gemma-3 Text Encoder | 27 GB | 8 GB |
| Transformer | 40 GB | 10 GB |
| VAE + Audio | 5 GB | 5 GB |
| Peak | 67 GB | 22 GB |
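
One way to verify these peaks on your own card is PyTorch's built-in memory counters. A minimal sketch, reusing the `Wan22Pipeline` API from the quickstart below (paths are placeholders):

```python
import torch
from models.wan22 import Wan22Pipeline

torch.cuda.reset_peak_memory_stats()  # start counting from zero

pipeline = Wan22Pipeline(checkpoint_dir="./Wan2.2-TI2V-5B",  # placeholder paths
                         wan_repo_path="./Wan2.2")
pipeline.load(quantization="int4")
video = pipeline.generate("A cat playing in a garden")

# Peak VRAM actually allocated during load + generation, in GB
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM: {peak_gb:.1f} GB")
```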
- GPU: RTX 4090, RTX 5090, A6000, or similar (24-32GB VRAM)
- CUDA: 12.0+
- Python: 3.10+
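
Before installing, a quick sanity check that PyTorch sees a CUDA device with enough VRAM (illustrative only, not part of the repository):

```python
import torch

# Confirm a CUDA-enabled PyTorch build and report the available VRAM
assert torch.cuda.is_available(), "PyTorch cannot see a CUDA device"
props = torch.cuda.get_device_properties(0)
print(f"GPU: {props.name} ({props.total_memory / 1024**3:.0f} GB VRAM)")
print(f"PyTorch CUDA build: {torch.version.cuda}")
```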
```bash
# Clone repository
git clone https://github.com/lumi-node/consumer-gpu-video-gen
cd consumer-gpu-video-gen

# Install dependencies
pip install -r requirements.txt

# Download Wan 2.2 (recommended for most users)
git clone https://github.com/Wan-Video/Wan2.2
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./Wan2.2-TI2V-5B
```
```bash
# Generate with Wan 2.2
python generate.py --model wan22 \
    --prompt "A fluffy cat walking through a sunny garden" \
    --checkpoint ./Wan2.2-TI2V-5B \
    --wan-repo ./Wan2.2
```

```python
from models.wan22 import Wan22Pipeline
# Load with INT4 optimization
pipeline = Wan22Pipeline(checkpoint_dir="./Wan2.2-TI2V-5B", wan_repo_path="./Wan2.2")
pipeline.load(quantization="int4")
# Generate video
video = pipeline.generate("A cat playing in a garden")
pipeline.save_video(video, "output.mp4")
```

All CLI Options
```
--model, -m        Model: wan22 or ltx2 (required)
--checkpoint, -c   Path to model checkpoint (required)
--wan-repo         Path to Wan2.2 repo (required for wan22)
--prompt, -p       Text prompt (required)
--output, -o       Output path (default: auto-generated)
--frames           Number of frames (default: 33)
--steps            Diffusion steps (default: 30)
--guidance         Guidance scale (default: 5.0)
--seed             Random seed (default: random)
--size             landscape or portrait (default: landscape)
--fps              Output FPS (default: 24)
--quantization     int4, int8, or none (default: int4)
```
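
If you prefer to drive the CLI from a script, the sketch below simply shells out to `generate.py` with explicit values for the documented flags; the checkpoint and repo paths are placeholders:

```python
import subprocess

# Illustrative: invoke the CLI with explicit values for the documented flags
subprocess.run(
    [
        "python", "generate.py",
        "--model", "wan22",
        "--checkpoint", "./Wan2.2-TI2V-5B",  # placeholder checkpoint path
        "--wan-repo", "./Wan2.2",            # placeholder repo path
        "--prompt", "A fluffy cat walking through a sunny garden",
        "--frames", "33",
        "--steps", "30",
        "--guidance", "5.0",
        "--seed", "42",
        "--size", "landscape",
        "--fps", "24",
        "--quantization", "int4",
        "--output", "cat.mp4",
    ],
    check=True,
)
```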
```
Standard Loading (LTX-2)
┌──────────┐   ┌──────────┐   ┌──────────────┐
│ Text Enc │ + │   VAE    │ + │  Transformer │ = 67 GB
│  27 GB   │   │   5 GB   │   │    40 GB     │
└──────────┘   └──────────┘   └──────────────┘

INT4 Quantized Loading (LTX-2)
┌──────────┐   ┌──────────┐   ┌──────────────┐
│ Text Enc │ + │   VAE    │ + │  Transformer │ = 22 GB
│   8 GB   │   │   5 GB   │   │ 10 GB (INT4) │
└──────────┘   └──────────┘   └──────────────┘
```
**Quantization**

- 16-bit → 4-bit weights = 75% smaller
- No retraining required - post-training quantization
- Minimal quality loss - optimized dequantization at inference (see the sketch below)
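
The acknowledgements credit Hugging Face's quanto, which implements exactly this kind of post-training weight quantization. A minimal sketch of the idea using the `optimum-quanto` package, applied to a generic `transformer` module (the repository's actual wiring may differ):

```python
import torch
from optimum.quanto import quantize, freeze, qint4  # pip install optimum-quanto

def quantize_transformer_int4(transformer: torch.nn.Module) -> torch.nn.Module:
    """Post-training INT4 weight quantization: no retraining, weights only."""
    quantize(transformer, weights=qint4)  # mark linear weights for INT4 quantization
    freeze(transformer)                   # replace FP16/BF16 weights with packed INT4
    return transformer
```

In this sketch, freezing the INT4 weights while the module is still on the CPU means only the packed 4-bit tensors ever reach VRAM, which is what the "quantize before moving to GPU" step below relies on.
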
**Memory management**

- Load models sequentially
- Quantize before moving to GPU
- Offload unused models during VAE decode
- Strategic garbage collection (sketched below)
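
A sketch of that offloading pattern in plain PyTorch; the module names and call order are illustrative, not the repository's internals:

```python
import gc
import torch

def offload(module: torch.nn.Module) -> None:
    """Move a finished component back to CPU and release its VRAM."""
    module.to("cpu")
    gc.collect()              # drop lingering Python references
    torch.cuda.empty_cache()  # hand freed blocks back to the CUDA allocator

# Illustrative order of operations for a text-to-video pass:
#   1. text_encoder -> embeddings, then offload(text_encoder)
#   2. transformer  -> denoised latents, then offload(transformer)
#   3. vae.decode(latents) with the VAE as the only model left on the GPU
```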
| GPU | VRAM | Wan 2.2 | LTX-2 |
|---|---|---|---|
| RTX 5090 | 32 GB | ✅ Full | ✅ Reduced res |
| RTX 4090 | 24 GB | ✅ Full | |
| RTX 4080 | 16 GB | ⚠️ | |
| RTX 3090 | 24 GB | ✅ Full | |
| A6000 | 48 GB | ✅ Full | ✅ Full |
Contributions welcome! Areas of interest:
- Additional model support (CogVideoX, etc.)
- FP8 quantization for Blackwell GPUs
- Web UI interface
- Audio generation for LTX-2
- Alibaba Wan Team - Wan 2.2 model
- Lightricks - LTX-2 model
- Hugging Face - quanto quantization
- NVIDIA - Optimization inspiration from ComfyUI implementations
MIT License - see LICENSE file.
Built with ❤️ for the open-source AI community