
Distilled LTX: could it work? #1

@vgabbo

Description


Hello! Thank you so much for this repository; it makes it very simple and clear how to work with these quantizations in diffusers and to use the models from Python instead of ComfyUI.

I have a question.

I am trying to load the distilled LoRA for LTX-2 with this code:
distilled_lora_path = hf_hub_download(
    repo_id="Lightricks/LTX-2",
    filename="ltx-2-19b-distilled-lora-384.safetensors",
)

My assumption is that, since the matrix multiplication is done in bfloat16, this could be feasible.
I tried to adapt models/ltx2/scripts/t2v_sdnq_4bit_both_cpu_offload.py.
It actually runs out of memory (I have 20 GB of VRAM), but I wanted to ask for confirmation: do you think this could work and be feasible?
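To narrow down where the 20 GB gets exhausted, here is a minimal diagnostic sketch (my own addition, assuming a CUDA device; it only uses the standard torch.cuda memory counters) that could be wrapped around each stage of the script:

import torch

def report_vram(stage: str) -> None:
    # Peak memory allocated by tensors since the last reset, in GiB.
    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    print(f"[{stage}] peak VRAM: {peak_gib:.2f} GiB")
    torch.cuda.reset_peak_memory_stats()

# Example usage:
# report_vram("after model load")
# video, audio = pipe(...)
# report_vram("after generation")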

Code I'm using:

import os
import torch
from diffusers import LTX2Pipeline, LTX2VideoTransformer3DModel
from diffusers.pipelines.ltx2.export_utils import encode_video
from sdnq import SDNQConfig  # noqa: F401
from sdnq.common import use_torch_compile as triton_is_available
from sdnq.loader import apply_sdnq_options_to_model
from transformers import Gemma3ForConditionalGeneration
from huggingface_hub import hf_hub_download

torch_dtype = torch.bfloat16

# Download the distilled LoRA
print("Downloading distilled LoRA...")
distilled_lora_path = hf_hub_download(
    repo_id="Lightricks/LTX-2",
    filename="ltx-2-19b-distilled-lora-384.safetensors"
)
print(f"LoRA downloaded to: {distilled_lora_path}")

text_encoder = Gemma3ForConditionalGeneration.from_pretrained(
    "Disty0/LTX-2-SDNQ-4bit-dynamic",
    subfolder="text_encoder",
    dtype=torch_dtype,
    device_map="cpu",
)

transformer = LTX2VideoTransformer3DModel.from_pretrained(
    "Disty0/LTX-2-SDNQ-4bit-dynamic",
    subfolder="transformer",
    torch_dtype=torch_dtype,
    device_map="cpu",
)

pipe = LTX2Pipeline.from_pretrained(
    "Lightricks/LTX-2", 
    transformer=transformer, 
    text_encoder=text_encoder, 
    torch_dtype=torch_dtype
)

# Load the distilled LoRA into the transformer
print("Loading distilled LoRA into transformer...")
pipe.load_lora_weights(distilled_lora_path, adapter_name="distilled")
pipe.set_adapters(["distilled"], adapter_weights=[1.0])  # the adapter weight can be adjusted (default 1.0)

if triton_is_available and (torch.cuda.is_available() or torch.xpu.is_available()):
    pipe.transformer = apply_sdnq_options_to_model(pipe.transformer, use_quantized_matmul=True)
    pipe.text_encoder = apply_sdnq_options_to_model(pipe.text_encoder, use_quantized_matmul=True)

pipe.vae.enable_tiling()
pipe.enable_model_cpu_offload()

prompt = """Close-up shot of a woman's face in soft natural lighting. She looks directly into the camera with a warm, confident expression. She takes a breath, opens her mouth, and clearly says "Clara da oggi parla!" in Italian. The camera remains steady on her face throughout. Her eyes are bright and engaged, and she delivers the line with gentle emphasis and a slight smile. 50mm lens, shallow depth of field, natural skin tones."""

negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
frame_rate = 24.0

video, audio = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=768,
    height=512,
    num_frames=72,
    frame_rate=frame_rate,
    num_inference_steps=40,  # with the distilled LoRA this number could likely be reduced
    guidance_scale=4.0,
    generator=torch.Generator("cuda").manual_seed(42),
    output_type="np",
    return_dict=False,
)

video = (video * 255).round().astype("uint8")
video = torch.from_numpy(video)

os.makedirs("./outputs/ltx2", exist_ok=True)

encode_video(
    video[0],
    fps=frame_rate,
    audio=audio[0].float().cpu(),
    audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
    output_path="./outputs/ltx2/t2v_sdnq-4bit-distilled.mp4",
)
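If the OOM persists, there are two standard diffusers options I could also try; whether either one interacts cleanly with SDNQ 4-bit quantized layers is an assumption on my part. With the distilled LoRA it should also be possible to lower num_inference_steps well below 40, though that mainly reduces runtime rather than peak VRAM.

# Hypothetical memory-saving variants (standard diffusers calls; untested
# with SDNQ 4-bit layers, so treat this as an assumption).

# Option 1: fuse the LoRA into the base weights, then drop the adapter
# tensors so they no longer sit in memory alongside the base model.
pipe.fuse_lora(lora_scale=1.0)
pipe.unload_lora_weights()

# Option 2: sequential offload moves one submodule at a time onto the GPU.
# It is slower than enable_model_cpu_offload() but much lighter on VRAM.
# (Use it instead of, not in addition to, enable_model_cpu_offload().)
pipe.enable_sequential_cpu_offload()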
