Accelerated LTX-2.3 (22B) text-to-video+audio generation on Apple Silicon using MLX with quantized inference.
Generate 5-10 second videos with synchronized audio from text prompts or input images, running entirely on-device.
```
git clone https://github.com/appautomaton/MLX-GenAI.git
cd MLX-GenAI
```

Install uv if you haven't, then download the model weights (see Model Weights below). Dependencies are installed automatically on the first `uv run`.
```
# Text-to-video
uv run python generate.py "A serene mountain lake at sunrise, golden light reflecting off calm water as thin mist drifts across the surface. Tripod-locked camera, live action, 4K."

# Image-to-video
uv run python generate.py -i input/photo.jpg "The person slowly turns and smiles at the camera"

# With options
uv run python generate.py -f 121 -b 8 --upscale "your prompt here"
```

Output is saved to `output/<timestamp>/` with `video.mp4`, `audio.wav`, and individual frames.
- Text-to-video (T2V) and image-to-video (I2V) generation
- Joint audio+video through 48-layer DiT transformer (22B params)
- 8-bit / 4-bit quantized inference via MLX `quantized_matmul`
- 8-step distilled Euler diffusion (LoRA-fused)
- 48kHz stereo audio (BigVGAN v2 vocoder + bandwidth extension)
- Optional 2x spatial upscaler
- Aspect-ratio-aware resolution snapping for I2V
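To illustrate the aspect-ratio-aware resolution snapping for I2V, here is a minimal sketch of the idea: pick a resolution near a target pixel budget that preserves the input image's aspect ratio, with both sides rounded to a multiple of 32. The function name `snap_resolution` and the default pixel budget are illustrative, not the actual logic in `generate.py`.

```python
def snap_resolution(src_w: int, src_h: int,
                    target_pixels: int = 768 * 512,
                    multiple: int = 32) -> tuple[int, int]:
    """Pick a generation resolution near `target_pixels` that keeps the
    input image's aspect ratio, with both sides snapped to a multiple of 32."""
    aspect = src_w / src_h
    # Solve w * h ~= target_pixels subject to w / h ~= aspect.
    height = (target_pixels / aspect) ** 0.5
    width = height * aspect
    snap = lambda v: max(multiple, round(v / multiple) * multiple)
    return snap(width), snap(height)

# A 16:9 photo snaps to 832x480 under this pixel budget.
print(snap_resolution(1920, 1080))
```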
- Apple Silicon Mac (M-series, M1 or later)
- macOS with Metal support
- Python 3.12+, uv
- ffmpeg
- ~14 GB unified memory (8-bit) or ~10 GB (4-bit)
Download from HuggingFace and place under models/:
| Model | Source | Path |
|---|---|---|
| LTX-2.3 FP8 (29 GB) | Lightricks/LTX-2.3-fp8 | models/LTX-2.3-fp8/ltx-2.3-22b-dev-fp8.safetensors |
| Distilled LoRA (7.6 GB) | Lightricks/LTX-2.3 | models/LTX-2.3/ltx-2.3-22b-distilled-lora-384.safetensors |
| Spatial Upscaler 2x (1 GB) | Lightricks/LTX-2.3 | models/LTX-2.3/ltx-2.3-spatial-upscaler-x2-1.0.safetensors |
| Gemma 3 12B (~24 GB) | google/gemma-3-12b-pt | models/gemma-3-12b/ |
```
# Download with huggingface-cli
huggingface-cli download Lightricks/LTX-2.3-fp8 --local-dir models/LTX-2.3-fp8
huggingface-cli download Lightricks/LTX-2.3 --local-dir models/LTX-2.3
huggingface-cli download google/gemma-3-12b-pt --local-dir models/gemma-3-12b
```

```
uv run python generate.py [prompt] [options]
```
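After downloading, a quick preflight check can confirm the weights landed where the table above expects them. This is a hypothetical helper, not part of the repo; the paths are taken from the table.

```python
from pathlib import Path

# Weight locations from the Model Weights table (upscaler is optional).
REQUIRED = [
    "models/LTX-2.3-fp8/ltx-2.3-22b-dev-fp8.safetensors",
    "models/LTX-2.3/ltx-2.3-22b-distilled-lora-384.safetensors",
    "models/gemma-3-12b",
]

missing = [p for p in REQUIRED if not Path(p).exists()]
if missing:
    print("Missing weights:")
    for p in missing:
        print("  -", p)
else:
    print("All required weights found.")
```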
```
-p, --prompt-flag  Text prompt (flag form)
-i, --image        Input image for I2V (jpg/jpeg/png)
    --strength     I2V conditioning strength 0.0-1.0 (default: 0.95)
-f, --frames       Frame count, must be 8k+1 (default: 121)
-H, --height       Height, divisible by 32 (default: 512)
-W, --width        Width, divisible by 32 (default: 768)
-b, --bits         Quantization: 4 or 8 (default: 8)
-s, --seed         Random seed
    --fps          Frames per second (default: 24)
    --no-audio     Skip audio generation
    --upscale      2x spatial upscale
-o, --output       Output directory (default: output/)
```
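The frame and dimension constraints above (frames must be 8k+1; height and width divisible by 32) can be checked up front. This is a sketch under the README's stated rules; `check_video_args` is a hypothetical helper, and `generate.py`'s actual validation may differ.

```python
def check_video_args(frames: int, height: int, width: int) -> None:
    """Validate CLI constraints: frames = 8k+1, dims divisible by 32."""
    if frames % 8 != 1:
        nearest = round((frames - 1) / 8) * 8 + 1
        raise ValueError(
            f"frames must be 8k+1, got {frames} (nearest valid: {nearest})")
    for name, dim in (("height", height), ("width", width)):
        if dim % 32 != 0:
            raise ValueError(f"{name} must be divisible by 32, got {dim}")

# The defaults (121 frames, 512x768) satisfy both constraints.
check_video_args(121, 512, 768)
```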
For research use. Model weights are subject to the Lightricks LTX-Video license.