🎬 Consumer GPU Video Generation

Run 67GB video models on 32GB consumer GPUs

NVIDIA-style optimizations without ComfyUI dependency

Python 3.10+ · PyTorch 2.0+ · CUDA 12.0+ · MIT License
RTX 5090 Tested · RTX 4090 Compatible


🎯 The Problem

| Model | Original VRAM | Your GPU |
|---|---|---|
| LTX-2 19B | 67 GB | ❌ Won't fit |
| Wan 2.2 5B | 25 GB | ⚠️ Barely fits |

Large video generation models require datacenter GPUs. Consumer cards like RTX 4090/5090 can't run them... until now.


✨ The Solution

Before (Original)

โŒ LTX-2:    67 GB VRAM
โŒ Wan 2.2:  25 GB VRAM

After (INT4 Optimized)

✅ LTX-2:    22 GB VRAM  (-67%)
✅ Wan 2.2:  16 GB VRAM  (-36%)

Up to 67% lower peak VRAM via INT4 quantization, with minimal quality loss.


🚀 Key Features

💾

75% Smaller Weights
INT4 quantization shrinks model weights to fit consumer GPUs

🔓

No ComfyUI
Standalone Python - use in any project

⚡

RTX 5090 Ready
Tested on latest Blackwell architecture

🎨

Simple API
A few lines of code to generate video

📊 Benchmarks

Tested on NVIDIA RTX 5090 (32GB) with CUDA 12.8

| Model | Original | Optimized | Resolution | Speed |
|---|---|---|---|---|
| Wan 2.2 TI2V-5B | 25 GB | 16 GB | 1280×704 | ~50s |
| LTX-2 19B | 67 GB | 22 GB | 640×448 | ~60s |

📈 Detailed VRAM Breakdown

Wan 2.2 TI2V-5B

| Component | Original | INT4 |
|---|---|---|
| T5 Text Encoder | 11 GB | 11 GB |
| VAE | 3 GB | 3 GB |
| DiT Transformer | 11 GB | 3 GB |
| Peak | 25 GB | 16 GB |

LTX-2 19B

| Component | Original | INT4 |
|---|---|---|
| Gemma-3 Text Encoder | 27 GB | 8 GB |
| Transformer | 40 GB | 10 GB |
| VAE + Audio | 5 GB | 5 GB |
| Peak | 67 GB | 22 GB |
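
As a rough sanity check on the transformer rows: 4-bit weights take a quarter of the space of 16-bit weights, so a transformer in the ~20B-parameter range drops from roughly 40 GB to roughly 10 GB. The parameter count below is an illustrative approximation, not a figure from this repo:

```python
# Back-of-the-envelope weight sizes at 16-bit vs 4-bit precision.
params = 20e9                   # ~20B parameters (illustrative approximation)
fp16_gb = params * 2.0 / 1e9    # 2 bytes per parameter
int4_gb = params * 0.5 / 1e9    # 0.5 bytes per parameter
print(f"FP16: ~{fp16_gb:.0f} GB   INT4: ~{int4_gb:.0f} GB")  # ~40 GB vs ~10 GB
```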

🛠️ Installation

Requirements

  • GPU: RTX 4090, RTX 5090, A6000, or similar (24 GB+ VRAM)
  • CUDA: 12.0+
  • Python: 3.10+

Quick Start

# Clone repository
git clone https://github.com/lumi-node/consumer-gpu-video-gen
cd consumer-gpu-video-gen

# Install dependencies
pip install -r requirements.txt

# Download Wan 2.2 (recommended for most users)
git clone https://github.com/Wan-Video/Wan2.2
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./Wan2.2-TI2V-5B
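
Before downloading the checkpoints, it can help to confirm that PyTorch actually sees your GPU and how much VRAM it reports. A quick check (not part of this repo):

```python
import torch

# Verify CUDA is available and print the GPU name and total VRAM.
assert torch.cuda.is_available(), "CUDA not available - check your driver and PyTorch build"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.0f} GB VRAM")
```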

💻 Usage

Command Line

# Generate with Wan 2.2
python generate.py --model wan22 \
    --prompt "A fluffy cat walking through a sunny garden" \
    --checkpoint ./Wan2.2-TI2V-5B \
    --wan-repo ./Wan2.2

Python API

from models.wan22 import Wan22Pipeline

# Load with INT4 optimization
pipeline = Wan22Pipeline(checkpoint_dir="./Wan2.2-TI2V-5B", wan_repo_path="./Wan2.2")
pipeline.load(quantization="int4")

# Generate video
video = pipeline.generate("A cat playing in a garden")
pipeline.save_video(video, "output.mp4")
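
The CLI's --quantization flag also accepts int8 and none, so the same trade-off is presumably available here; a sketch assuming load() takes the same values (not verified against the code):

```python
from models.wan22 import Wan22Pipeline

# INT8 keeps more weight precision than INT4 at the cost of extra VRAM.
# Assumes load() accepts the same values as the CLI's --quantization flag.
pipeline = Wan22Pipeline(checkpoint_dir="./Wan2.2-TI2V-5B", wan_repo_path="./Wan2.2")
pipeline.load(quantization="int8")

video = pipeline.generate("A cat playing in a garden")
pipeline.save_video(video, "output_int8.mp4")
```
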
📋 All CLI Options
--model, -m      Model: wan22 or ltx2 (required)
--checkpoint, -c Path to model checkpoint (required)
--wan-repo       Path to Wan2.2 repo (required for wan22)
--prompt, -p     Text prompt (required)
--output, -o     Output path (default: auto-generated)

--frames         Number of frames (default: 33)
--steps          Diffusion steps (default: 30)
--guidance       Guidance scale (default: 5.0)
--seed           Random seed (default: random)
--size           landscape or portrait (default: landscape)
--fps            Output FPS (default: 24)

--quantization   int4, int8, or none (default: int4)
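
Putting several of these together, a fuller invocation might look like this (the tunables below are just the documented defaults, plus an explicit seed and output path):

```bash
python generate.py --model wan22 \
    --checkpoint ./Wan2.2-TI2V-5B \
    --wan-repo ./Wan2.2 \
    --prompt "A fluffy cat walking through a sunny garden" \
    --frames 33 --steps 30 --guidance 5.0 --fps 24 \
    --seed 42 --size landscape \
    --quantization int4 \
    --output cat_garden.mp4
```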

🔬 How It Works

Standard Loading (LTX-2)

  ┌─────────┐   ┌─────────┐   ┌─────────────┐
  │ Gemma-3 │ + │   VAE   │ + │ Transformer │  =  67 GB  ❌
  │  27 GB  │   │   5 GB  │   │    40 GB    │
  └─────────┘   └─────────┘   └─────────────┘

INT4 Quantized Loading (LTX-2)

  ┌─────────┐   ┌─────────┐   ┌─────────────┐
  │ Gemma-3 │ + │   VAE   │ + │ Transformer │  =  22 GB  ✅  (INT4)
  │   8 GB  │   │   5 GB  │   │    10 GB    │
  └─────────┘   └─────────┘   └─────────────┘

The Magic: quanto INT4 Quantization

  • 16-bit → 4-bit weights = 75% smaller
  • No retraining required - post-training quantization
  • Minimal quality loss - optimized dequantization at inference
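
For readers who want to see the pattern, here is a minimal sketch of post-training INT4 weight quantization with Hugging Face's optimum-quanto (the library "quanto" refers to). It illustrates the technique, not this repo's exact integration, which may exclude certain layers or handle devices differently:

```python
import torch
from optimum.quanto import quantize, freeze, qint4

def quantize_int4(model: torch.nn.Module) -> torch.nn.Module:
    # Post-training quantization: replace 16-bit Linear weights with packed
    # 4-bit weights, then freeze to materialize them (no retraining involved).
    quantize(model, weights=qint4)
    freeze(model)
    return model

# Typical flow: quantize while the model is still on CPU, then move the much
# smaller result onto the GPU. load_transformer() is a hypothetical loader.
# transformer = quantize_int4(load_transformer()).to("cuda")
```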

Smart Memory Management

  1. Load models sequentially
  2. Quantize before moving to GPU
  3. Offload unused models during VAE decode
  4. Strategic garbage collection
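
A minimal sketch of the offload pattern behind steps 3-4 (the helper name is illustrative, not the repo's actual code):

```python
import gc
import torch

def offload(module: torch.nn.Module) -> None:
    # Move a component that is no longer needed off the GPU and reclaim its
    # VRAM before the next stage (e.g. free the text encoder before VAE decode).
    module.to("cpu")
    gc.collect()
    torch.cuda.empty_cache()
```

Applied between stages (text encoding, denoising, VAE decode), only one large component has to sit in VRAM at a time.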

🎮 GPU Compatibility

| GPU | VRAM | Wan 2.2 | LTX-2 |
|---|---|---|---|
| RTX 5090 | 32 GB | ✅ Full | ✅ Reduced res |
| RTX 4090 | 24 GB | ✅ Full | ⚠️ Tight |
| RTX 4080 | 16 GB | ⚠️ Limited | ❌ |
| RTX 3090 | 24 GB | ✅ Full | ⚠️ Tight |
| A6000 | 48 GB | ✅ Full | ✅ Full |

🤝 Contributing

Contributions welcome! Areas of interest:

  • Additional model support (CogVideoX, etc.)
  • FP8 quantization for Blackwell GPUs
  • Web UI interface
  • Audio generation for LTX-2

📚 Acknowledgments


📄 License

MIT License - see LICENSE file.


Built with ❤️ for the open-source AI community
