MLX-GenAI

Accelerated LTX-2.3 (22B) text-to-video+audio generation on Apple Silicon using MLX with quantized inference.

Generate 5-10 second videos with synchronized audio from text prompts or input images, running entirely on-device.

Setup

git clone https://github.com/appautomaton/MLX-GenAI.git
cd MLX-GenAI

Install uv if you haven't already, then download the model weights (see Model Weights below).

Dependencies are installed automatically on the first uv run.

Quick Start

# Text-to-video
uv run python generate.py "A serene mountain lake at sunrise, golden light reflecting off calm water as thin mist drifts across the surface. Tripod-locked camera, live action, 4K."

# Image-to-video
uv run python generate.py -i input/photo.jpg "The person slowly turns and smiles at the camera"

# With options
uv run python generate.py -f 121 -b 8 --upscale "your prompt here"

Output is saved to output/<timestamp>/ with video.mp4, audio.wav, and individual frames.

Features

  • Text-to-video (T2V) and image-to-video (I2V) generation
  • Joint audio+video through 48-layer DiT transformer (22B params)
  • 8-bit / 4-bit quantized inference via MLX quantized_matmul
  • 8-step distilled Euler diffusion (LoRA-fused)
  • 48kHz stereo audio (BigVGAN v2 vocoder + bandwidth extension)
  • Optional 2x spatial upscaler
  • Aspect-ratio-aware resolution snapping for I2V
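The quantized-inference idea behind the 8-bit mode can be illustrated with a small NumPy sketch. This shows grouped affine quantization (per-group scale and offset), the general technique MLX's quantized kernels build on; it is an illustration only, not the actual mlx.core.quantized_matmul API or this repo's code:

```python
import numpy as np

def quantize(w, group_size=64, bits=8):
    """Grouped affine quantization: each group of `group_size` weights gets
    its own scale and offset, so an integer code reconstructs the float."""
    qmax = 2**bits - 1
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / qmax
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    """Reconstruct approximate float weights from codes + group params."""
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)
q, scale, lo = quantize(w.flatten())
w_hat = dequantize(q, scale, lo).reshape(w.shape)
print(np.abs(w - w_hat).max())  # small reconstruction error
```

At 8 bits with 64-element groups, the reconstruction error per weight is bounded by half a quantization step, which is why quality holds up while memory drops roughly 4x versus float32.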

Requirements

  • Apple Silicon Mac (M-series, M1 or later)
  • macOS with Metal support
  • Python 3.12+, uv
  • ffmpeg
  • ~14 GB unified memory (8-bit) or ~10 GB (4-bit)

Model Weights

Download from HuggingFace and place under models/:

Model                        Source                    Path
LTX-2.3 FP8 (29 GB)          Lightricks/LTX-2.3-fp8    models/LTX-2.3-fp8/ltx-2.3-22b-dev-fp8.safetensors
Distilled LoRA (7.6 GB)      Lightricks/LTX-2.3        models/LTX-2.3/ltx-2.3-22b-distilled-lora-384.safetensors
Spatial Upscaler 2x (1 GB)   Lightricks/LTX-2.3        models/LTX-2.3/ltx-2.3-spatial-upscaler-x2-1.0.safetensors
Gemma 3 12B (~24 GB)         google/gemma-3-12b-pt     models/gemma-3-12b/

# Download with huggingface-cli
huggingface-cli download Lightricks/LTX-2.3-fp8 --local-dir models/LTX-2.3-fp8
huggingface-cli download Lightricks/LTX-2.3 --local-dir models/LTX-2.3
huggingface-cli download google/gemma-3-12b-pt --local-dir models/gemma-3-12b

CLI Reference

uv run python generate.py [prompt] [options]

  -p, --prompt-flag   Text prompt (flag form)
  -i, --image         Input image for I2V (jpg/jpeg/png)
  --strength          I2V conditioning strength 0.0-1.0 (default: 0.95)
  -f, --frames        Frame count, must be 8k+1 (default: 121)
  -H, --height        Height, divisible by 32 (default: 512)
  -W, --width         Width, divisible by 32 (default: 768)
  -b, --bits          Quantization: 4 or 8 (default: 8)
  -s, --seed          Random seed
  --fps               Frames per second (default: 24)
  --no-audio          Skip audio generation
  --upscale           2x spatial upscale
  -o, --output        Output directory (default: output/)
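The frame-count (8k+1) and resolution (multiple of 32) constraints above can be sketched as small snapping helpers. These are hypothetical names written for illustration; the actual logic in generate.py may differ:

```python
def snap_frames(n: int) -> int:
    """Snap a requested frame count to the nearest valid 8k+1 value
    (the model generates frames in temporal blocks of 8, plus one)."""
    k = max(1, round((n - 1) / 8))
    return 8 * k + 1

def snap_resolution(width: int, height: int, multiple: int = 32) -> tuple[int, int]:
    """Round each spatial dimension to the nearest multiple of 32,
    keeping the requested aspect ratio approximately intact."""
    snap = lambda v: max(multiple, round(v / multiple) * multiple)
    return snap(width), snap(height)

print(snap_frames(120))           # 121
print(snap_resolution(770, 510))  # (768, 512)
```

For example, a request for 120 frames would snap to 121 (k = 15), and a 770x510 request would snap to the default 768x512.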

Docs

License

Research use only. Model weights are subject to the Lightricks LTX-Video license.
