Hello! Thank you so much for this repository. It makes it very simple and clear how to work with these quantizations in diffusers and use the models via Python instead of ComfyUI.

I have a question: I am trying to load the distilled LoRA for LTX-2 with this code:
```python
distilled_lora_path = hf_hub_download(
    repo_id="Lightricks/LTX-2",
    filename="ltx-2-19b-distilled-lora-384.safetensors",
)
```
My assumption is that, since the matrix multiplications are done in bfloat16, this could be feasible.
I tried to adapt models/ltx2/scripts/t2v_sdnq_4bit_both_cpu_offload.py, but it goes OOM even though I have 20 GB of VRAM. I wanted to ask for confirmation: do you think this setup could work and be feasible?
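To make that assumption concrete, here is a toy sketch (made-up shapes, not the actual SDNQ internals) of why the LoRA branch itself should be cheap: the base matmul output and the two low-rank LoRA factors all live in bfloat16, so the only extra weights are the rank-r matrices, not a second full-precision copy of the model.

```python
import torch

# Toy illustration: a linear layer's output plus a bf16 LoRA update.
# Shapes are invented for the sketch; the real distilled LoRA is rank 384.
d_in, d_out, rank = 64, 64, 8

x = torch.randn(2, d_in, dtype=torch.bfloat16)
base_out = torch.randn(2, d_out, dtype=torch.bfloat16)  # stand-in for the (de)quantized base matmul
A = torch.randn(rank, d_in, dtype=torch.bfloat16) * 0.01  # lora_down factor
B = torch.zeros(d_out, rank, dtype=torch.bfloat16)        # lora_up factor (zero-initialized)

# y = base(x) + B @ (A @ x): two small bf16 matmuls, so the LoRA adds only
# the rank-r factors to memory, never a full extra weight matrix.
y = base_out + (x @ A.T) @ B.T
```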
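In case it helps pin down where the OOM happens, here is a small hypothetical helper I use for profiling (`report_peak_vram` is my own function; `reset_peak_memory_stats` and `max_memory_allocated` are standard `torch.cuda` calls):

```python
import torch

def report_peak_vram(tag: str):
    """Print and return peak allocated VRAM in GiB (None when CUDA is absent)."""
    if not torch.cuda.is_available():
        print(f"[{tag}] CUDA not available, skipping VRAM report")
        return None
    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    print(f"[{tag}] peak VRAM: {peak_gib:.2f} GiB")
    return peak_gib

# Reset the counter right before the step you want to profile
# (e.g. the pipe(...) call), then report right after it.
if torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()
peak = report_peak_vram("startup")
```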
Code I'm using:

```python
import os

import torch
from diffusers import LTX2Pipeline, LTX2VideoTransformer3DModel
from diffusers.pipelines.ltx2.export_utils import encode_video
from huggingface_hub import hf_hub_download
from sdnq import SDNQConfig  # noqa: F401
from sdnq.common import use_torch_compile as triton_is_available
from sdnq.loader import apply_sdnq_options_to_model
from transformers import Gemma3ForConditionalGeneration

torch_dtype = torch.bfloat16

# Download the distilled LoRA
print("Downloading distilled LoRA...")
distilled_lora_path = hf_hub_download(
    repo_id="Lightricks/LTX-2",
    filename="ltx-2-19b-distilled-lora-384.safetensors",
)
print(f"LoRA downloaded to: {distilled_lora_path}")

text_encoder = Gemma3ForConditionalGeneration.from_pretrained(
    "Disty0/LTX-2-SDNQ-4bit-dynamic",
    subfolder="text_encoder",
    dtype=torch_dtype,
    device_map="cpu",
)
transformer = LTX2VideoTransformer3DModel.from_pretrained(
    "Disty0/LTX-2-SDNQ-4bit-dynamic",
    subfolder="transformer",
    torch_dtype=torch_dtype,
    device_map="cpu",
)
pipe = LTX2Pipeline.from_pretrained(
    "Lightricks/LTX-2",
    transformer=transformer,
    text_encoder=text_encoder,
    torch_dtype=torch_dtype,
)

# Load the distilled LoRA into the transformer
print("Loading distilled LoRA into transformer...")
pipe.load_lora_weights(distilled_lora_path, adapter_name="distilled")
pipe.set_adapters(["distilled"], adapter_weights=[1.0])  # you can adjust the weight (default 1.0)

if triton_is_available and (torch.cuda.is_available() or torch.xpu.is_available()):
    pipe.transformer = apply_sdnq_options_to_model(pipe.transformer, use_quantized_matmul=True)
    pipe.text_encoder = apply_sdnq_options_to_model(pipe.text_encoder, use_quantized_matmul=True)

pipe.vae.enable_tiling()
pipe.enable_model_cpu_offload()

prompt = """Close-up shot of a woman's face in soft natural lighting. She looks directly into the camera with a warm, confident expression. She takes a breath, opens her mouth, and clearly says "Clara da oggi parla!" in Italian. The camera remains steady on her face throughout. Her eyes are bright and engaged, and she delivers the line with gentle emphasis and a slight smile. 50mm lens, shallow depth of field, natural skin tones."""
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
frame_rate = 24.0

video, audio = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=768,
    height=512,
    num_frames=72,
    frame_rate=frame_rate,
    num_inference_steps=40,  # with the distilled LoRA you could reduce this number
    guidance_scale=4.0,
    generator=torch.Generator("cuda").manual_seed(42),
    output_type="np",
    return_dict=False,
)

video = (video * 255).round().astype("uint8")
video = torch.from_numpy(video)

os.makedirs("./outputs/ltx2", exist_ok=True)
encode_video(
    video[0],
    fps=frame_rate,
    audio=audio[0].float().cpu(),
    audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
    output_path="./outputs/ltx2/t2v_sdnq-4bit-distilled.mp4",
)
```