Task Catalog

18 tasks available with 193 models.

echo: 1 model
image-similarity: 2 models
text-similarity: 1 model
speech-enhancement: 1 model
image-interpolation: 2 models
background-removal: 1 model
super-resolution: 3 models
speech-synthesis: 5 models
audio-transcription: 9 models
depth-detection: 1 model
line-detection: 4 models
edge-detection: 3 models
pose-detection: 2 models
image-generation: 63 models
video-generation: 23 models
text-generation: 44 models
visual-question-answering: 14 models
image-captioning: 14 models
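
The totals in the summary line can be checked against the per-task counts. The following is a minimal Python sketch, assuming the list above has been copied into a string; `parse_counts` is a hypothetical helper written for this illustration and is not part of Taproot.

```python
import re

# The per-task summary, copied verbatim from the list above.
SUMMARY = """\
echo: 1 model
image-similarity: 2 models
text-similarity: 1 model
speech-enhancement: 1 model
image-interpolation: 2 models
background-removal: 1 model
super-resolution: 3 models
speech-synthesis: 5 models
audio-transcription: 9 models
depth-detection: 1 model
line-detection: 4 models
edge-detection: 3 models
pose-detection: 2 models
image-generation: 63 models
video-generation: 23 models
text-generation: 44 models
visual-question-answering: 14 models
image-captioning: 14 models
"""


def parse_counts(text: str) -> dict[str, int]:
    """Map each task name to its model count from 'task: N model(s)' lines."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        match = re.fullmatch(r"([a-z-]+): (\d+) models?", line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_counts(SUMMARY)
# Number of tasks, then total number of models across all tasks.
print(len(counts), sum(counts.values()))
```

Run against the list above, the two printed numbers should match the "18 tasks available with 193 models" line.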

echo

Name	Echo
Author	Benjamin Paine Taproot https://github.com/painebenjamin/taproot
License	Apache License 2.0
Files	N/A
Minimum VRAM	N/A

image-similarity

(default)

Name	Traditional Image Similarity
Author	Benjamin Paine Taproot https://github.com/painebenjamin/taproot
License	Apache License 2.0
Files	N/A
Minimum VRAM	N/A

inception-v3

Name	Inception Image Similarity (FID)
Author	Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens and Zbigniew Wojna Google Research and University College London Published in CoRR, vol. 1512.00567, “Rethinking the Inception Architecture for Computer Vision”, 2015 https://arxiv.org/abs/1512.00567
License	Apache License 2.0
Files	image-similarity-inception.fp16.safetensors
Minimum VRAM	50.28 MB

text-similarity

Name	Traditional Text Similarity
Author	Benjamin Paine Taproot https://github.com/painebenjamin/taproot
License	Apache License 2.0
Files	N/A
Minimum VRAM	N/A

speech-enhancement

deep-filter-net-v3 (default)

Name	DeepFilterNet V3 Speech Enhancement
Author	Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B and Andreas Maier Published in INTERSPEECH, “DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement”, 2023 https://arxiv.org/abs/2305.08227
License	Apache License 2.0
Files	speech-enhancement-deep-filter-net-3.safetensors
Minimum VRAM	8.76 MB

image-interpolation

film (default)

Name	Frame Interpolation for Large Motion (FiLM) Image Interpolation
Author	Fitsum Reda, Janne Kontkanen, Eric Tabellion, Deqing Sun, Caroline Pantofaru and Brian Curless Google Research and University of Washington Published in ECCV, “FiLM: Frame Interpolation for Large Motion”, 2022 https://arxiv.org/abs/2202.04901
License	Apache License 2.0
Files	image-interpolation-film-net.fp16.pt
Minimum VRAM	70.00 MB

rife

Name	Real-Time Intermediate Flow Estimation (RIFE) Image Interpolation
Author	Zhewei Huang, Tianyuan Zhang, Wen Heng, Boxin Shi and Shuchang Zhou Megvii Research, NERCVT, School of Computer Science, Peking University, Institute for Artificial Intelligence, Peking University and Beijing Academy of Artificial Intelligence Published in ECCV, “Real-Time Intermediate Flow Estimation for Video Frame Interpolation”, 2022 https://arxiv.org/abs/2011.06294
License	MIT License
Files	image-interpolation-rife-flownet.safetensors
Minimum VRAM	22.68 MB

background-removal

backgroundremover (default)

Name	BackgroundRemover
Author	Johnathan Nader, Lucas Nestler, Dr. Tim Scarfe and Daniel Gatis https://github.com/nadermx/backgroundremover
License	Apache License 2.0
Files	background-removal-u2net.safetensors
Minimum VRAM	217.62 MB

super-resolution

(default)

Name	Traditional Super Resolution
Author	Benjamin Paine Taproot https://github.com/painebenjamin/taproot Implementation by Pillow
License	Apache License 2.0
Files	N/A
Minimum VRAM	N/A

aura

Name	Aura Super Resolution
Author	fal.ai Published in fal.ai blog, “Introducing AuraSR - An open reproduction of the GigaGAN Upscaler”, 2024 https://blog.fal.ai/introducing-aurasr-an-open-reproduction-of-the-gigagan-upscaler-2/
License	CC BY-SA 4.0
Files	super-resolution-aura.fp16.safetensors
Minimum VRAM	1.24 GB

aura-v2

Name	Aura Super Resolution V2
Author	fal.ai Published in fal.ai blog, “AuraSR V2”, 2024 https://blog.fal.ai/aurasr-v2/
License	CC BY-SA 4.0
Files	super-resolution-aura-v2.fp16.safetensors
Minimum VRAM	1.24 GB

speech-synthesis

xtts-v2 (default)

Name	XTTS2 Speech Synthesis
Author	Coqui AI Published in Coqui AI Blog, “XTTS: Open Model Release Announcement”, 2023 https://coqui.ai/blog/tts/open_xtts
License	Mozilla Public License 2.0
Files	speech-synthesis-xtts-v2.safetensors (1.87 GB) speech-synthesis-xtts-v2-speakers.pth (7.75 MB) speech-synthesis-xtts-v2-vocab.json (361.22 KB) Total Size: 1.88 GB
Minimum VRAM	1.91 GB
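
The "Total Size" figure in a Files row is the sum of the listed file sizes. As a rough sketch of that arithmetic for the xtts-v2 entry, assuming the catalog uses binary units (1 KB = 1024 B), which the source does not state; `to_bytes` and `format_size` are hypothetical helpers, not Taproot functions.

```python
# Assumption: sizes use binary multiples (1 KB = 1024 B).
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def to_bytes(size: str) -> float:
    """Convert a string such as '7.75 MB' to a byte count."""
    value, unit = size.split()
    return float(value) * UNITS[unit]


def format_size(num_bytes: float) -> str:
    """Format a byte count with the largest unit that keeps the value >= 1."""
    for unit in ("GB", "MB", "KB"):
        if num_bytes >= UNITS[unit]:
            return f"{num_bytes / UNITS[unit]:.2f} {unit}"
    return f"{num_bytes:.2f} B"


# File sizes as listed for xtts-v2: weights, speakers, vocabulary.
xtts_files = ["1.87 GB", "7.75 MB", "361.22 KB"]
total = sum(to_bytes(size) for size in xtts_files)
print(format_size(total))  # 1.88 GB
```

Because the listed sizes are already rounded to two decimals, a total recomputed this way can differ from the catalog's figure in the last digit for some entries.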

f5tts

Name	F5TTS Speech Synthesis
Author	Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu and Xie Chen Published in arXiv, vol. 2410.06885, “F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching”, 2024 https://arxiv.org/abs/2410.06885
License	CC BY-NC 4.0
Files	speech-synthesis-f5tts.safetensors (1.35 GB) speech-synthesis-f5tts-vocab.txt (11.26 KB) audio-vocoder-vocos-mel-24khz.safetensors (54.35 MB) audio-vocoder-vocos-mel-24khz-config.yaml (461.00 B) Total Size: 1.40 GB
Minimum VRAM	707.16 MB

kokoro

Name	Kokoro Speech Synthesis
Author	@rzvzn, Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler and Nima Mesgarani https://huggingface.co/hexgrad/Kokoro-82M
License	Apache License 2.0
Files	speech-synthesis-kokoro-v0-19.safetensors (327.12 MB) speech-synthesis-kokoro-v0-19-voices.safetensors (5.23 MB) Total Size: 332.35 MB
Minimum VRAM	332.54 MB

zonos-hybrid

Name	Zonos Hybrid Speech Synthesis
Author	Zyphra Team https://www.zyphra.com/post/beta-release-of-zonos-v0-1
License	Apache License 2.0
Files	speech-synthesis-zonos-hybrid-v0-1.bf16.safetensors (3.30 GB) audio-vocoder-descript-44khz.safetensors (306.51 MB) audio-diarisation-zonos-speaker-embedding.safetensors (396.35 MB) Total Size: 4.01 GB
Minimum VRAM	4.04 GB

zonos-transformer

Name	Zonos Transformer Speech Synthesis
Author	Zyphra Team https://www.zyphra.com/post/beta-release-of-zonos-v0-1
License	Apache License 2.0
Files	speech-synthesis-zonos-transformer-v0-1.bf16.safetensors (3.25 GB) audio-vocoder-descript-44khz.safetensors (306.51 MB) audio-diarisation-zonos-speaker-embedding.safetensors (396.35 MB) Total Size: 3.95 GB
Minimum VRAM	4.04 GB

audio-transcription

whisper-tiny

Name	Whisper Tiny Audio Transcription
Author	Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356
License	Apache License 2.0
Files	audio-transcription-whisper-tiny.safetensors (151.06 MB) audio-transcription-whisper-tokenizer-vocab.json (835.55 KB) audio-transcription-whisper-tokenizer-merges.txt (493.87 KB) audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB) audio-transcription-whisper-tokenizer.json (2.48 MB) Total Size: 154.92 MB
Minimum VRAM	147.85 MB

whisper-base

Name	Whisper Base Audio Transcription
Author	Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356
License	Apache License 2.0
Files	audio-transcription-whisper-base.safetensors (290.40 MB) audio-transcription-whisper-tokenizer-vocab.json (835.55 KB) audio-transcription-whisper-tokenizer-merges.txt (493.87 KB) audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB) audio-transcription-whisper-tokenizer.json (2.48 MB) Total Size: 294.27 MB
Minimum VRAM	285.74 MB

whisper-small

Name	Whisper Small Audio Transcription
Author	Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356
License	Apache License 2.0
Files	audio-transcription-whisper-small.safetensors (967.00 MB) audio-transcription-whisper-tokenizer-vocab.json (835.55 KB) audio-transcription-whisper-tokenizer-merges.txt (493.87 KB) audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB) audio-transcription-whisper-tokenizer.json (2.48 MB) Total Size: 970.86 MB
Minimum VRAM	945.03 MB

whisper-medium

Name	Whisper Medium Audio Transcription
Author	Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356
License	Apache License 2.0
Files	audio-transcription-whisper-medium.safetensors (3.06 GB) audio-transcription-whisper-tokenizer-vocab.json (835.55 KB) audio-transcription-whisper-tokenizer-merges.txt (493.87 KB) audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB) audio-transcription-whisper-tokenizer.json (2.48 MB) Total Size: 3.06 GB
Minimum VRAM	3.06 GB

whisper-large-v3

Name	Whisper Large V3 Audio Transcription
Author	Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356
License	Apache License 2.0
Files	audio-transcription-whisper-large-v3.fp16.safetensors (3.09 GB) audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB) audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB) audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB) audio-transcription-whisper-tokenizer-v3.json (2.48 MB) Total Size: 3.09 GB
Minimum VRAM	3.09 GB

distilled-whisper-small-english

Name	Distilled Whisper Small (English) Audio Transcription
Author	Sanchit Gandhi, Patrick von Platen and Alexander M. Rush Hugging Face Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023 https://arxiv.org/abs/2311.00430
License	Apache License 2.0
Files	audio-transcription-distilled-whisper-small-english.safetensors (332.30 MB) audio-transcription-distilled-whisper-english-tokenizer-vocab.json (999.19 KB) audio-transcription-distilled-whisper-english-tokenizer-merges.txt (456.32 KB) audio-transcription-distilled-whisper-english-tokenizer-normalizer.json (52.67 KB) audio-transcription-distillled-whisper-english-tokenizer.json (2.41 MB) Total Size: 336.21 MB
Minimum VRAM	649.01 MB

distilled-whisper-medium-english

Name	Distilled Whisper Medium (English) Audio Transcription
Author	Sanchit Gandhi, Patrick von Platen and Alexander M. Rush Hugging Face Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023 https://arxiv.org/abs/2311.00430
License	Apache License 2.0
Files	audio-transcription-distilled-whisper-medium-english.safetensors (788.80 MB) audio-transcription-distilled-whisper-english-tokenizer-vocab.json (999.19 KB) audio-transcription-distilled-whisper-english-tokenizer-merges.txt (456.32 KB) audio-transcription-distilled-whisper-english-tokenizer-normalizer.json (52.67 KB) audio-transcription-distillled-whisper-english-tokenizer.json (2.41 MB) Total Size: 792.71 MB
Minimum VRAM	1.58 GB

distilled-whisper-large-v3 (default)

Name	Distilled Whisper Large V3 Audio Transcription
Author	Sanchit Gandhi, Patrick von Platen and Alexander M. Rush Hugging Face Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023 https://arxiv.org/abs/2311.00430
License	Apache License 2.0
Files	audio-transcription-distilled-whisper-large-v3.fp16.safetensors (1.51 GB) audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB) audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB) audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB) audio-transcription-whisper-tokenizer-v3.json (2.48 MB) Total Size: 1.52 GB
Minimum VRAM	1.51 GB

turbo-whisper-large-v3

Name	Turbo Whisper Large V3 Audio Transcription
Author	Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356
License	Apache License 2.0
Files	audio-transcription-whisper-large-v3-turbo.fp16.safetensors (1.62 GB) audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB) audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB) audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB) audio-transcription-whisper-tokenizer-v3.json (2.48 MB) Total Size: 1.62 GB
Minimum VRAM	1.62 GB
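
One practical use of the "Minimum VRAM" rows is to shortlist models for a given GPU. The sketch below is an illustration only: it copies the audio-transcription figures listed above into a dictionary, converts GB to MB with an assumed factor of 1024, and uses a hypothetical `models_within` helper. The listed minimum is treated as a lower bound, so real usage may be higher.

```python
# Minimum VRAM in MB for each audio-transcription model, from the entries above.
# Assumption: 1 GB = 1024 MB.
MIN_VRAM_MB = {
    "whisper-tiny": 147.85,
    "whisper-base": 285.74,
    "whisper-small": 945.03,
    "whisper-medium": 3.06 * 1024,
    "whisper-large-v3": 3.09 * 1024,
    "distilled-whisper-small-english": 649.01,
    "distilled-whisper-medium-english": 1.58 * 1024,
    "distilled-whisper-large-v3": 1.51 * 1024,
    "turbo-whisper-large-v3": 1.62 * 1024,
}


def models_within(budget_mb: float) -> list[str]:
    """Return models whose minimum VRAM fits the budget, smallest first."""
    fitting = [name for name, need in MIN_VRAM_MB.items() if need <= budget_mb]
    return sorted(fitting, key=MIN_VRAM_MB.get)


# Shortlist for a 2 GB budget.
for name in models_within(2 * 1024):
    print(name)
```

With a 2 GB budget this leaves out whisper-medium and whisper-large-v3 and keeps the other seven models.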

depth-detection

midas (default)

Name	MiDaS Depth Detection
Author	René Ranftl, Alexey Bochkovskiy and Vladlen Koltun Published in arXiv, vol. 2103.13413, “Vision Transformers for Dense Prediction”, 2021 https://arxiv.org/abs/2103.13413
License	MIT License
Files	depth-detection-midas.fp16.safetensors
Minimum VRAM	255.65 MB

line-detection

informative-drawings (default)

Name	Informative Drawings Line Art Detection
Author	Caroline Chan, Fredo Durand and Phillip Isola Massachusetts Institute of Technology Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022 https://arxiv.org/abs/2203.12691
License	MIT License
Files	line-detection-informative-drawings.fp16.safetensors
Minimum VRAM	8.58 MB

informative-drawings-coarse

Name	Informative Drawings Coarse Line Art Detection
Author	Caroline Chan, Fredo Durand and Phillip Isola Massachusetts Institute of Technology Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022 https://arxiv.org/abs/2203.12691
License	MIT License
Files	line-detection-informative-drawings-coarse.fp16.safetensors
Minimum VRAM	8.58 MB

informative-drawings-anime

Name	Informative Drawings Anime Line Art Detection
Author	Caroline Chan, Fredo Durand and Phillip Isola Massachusetts Institute of Technology Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022 https://arxiv.org/abs/2203.12691
License	MIT License
Files	line-detection-informative-drawings-anime.fp16.safetensors
Minimum VRAM	108.81 MB

mlsd

Name	Mobile Line Segment Detection
Author	Geonmo Gu, Byungsoo Ko, SeongHyun Go, Sung-Hyun Lee, Jingeun Lee and Minchul Shin NAVER/LINE Vision Published in arXiv, vol. 2106.00186, “Towards Light-weight and Real-time Line Segment Detection”, 2022 https://arxiv.org/abs/2106.00186
License	Apache License 2.0
Files	line-detection-mlsd.fp16.safetensors
Minimum VRAM	3.22 MB

edge-detection

canny (default)

Name	Canny Edge Detection
Author	John Canny Published in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679-698, “A Computational Approach to Edge Detection”, 1986 https://ieeexplore.ieee.org/document/4767851 Implementation by OpenCV
License	Apache License 2.0
Files	N/A
Minimum VRAM	N/A

hed

Name	Holistically-Nested Edge Detection
Author	Saining Xie and Zhuowen Tu University of California, San Diego Published in arXiv, vol. 1504.06375, “Holistically-Nested Edge Detection”, 2015 https://arxiv.org/abs/1504.06375
License	Apache License 2.0
Files	edge-detection-hed.fp16.safetensors
Minimum VRAM	29.44 MB

pidi

Name	Soft Edge (PIDI) Detection
Author	Zhuo Su, Wenzhe Liu, Zitong Yu, Dewen Hu, Qing Liao, Qi Tian, Matti Pietikäinen and Li Liu Published in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5117-5127, “Pixel Difference Networks for Efficient Edge Detection”, 2021
License	MIT License with Non-Commercial Clause
Files	edge-detection-pidi.fp16.safetensors
Minimum VRAM	1.40 MB

pose-detection

openpose

Name	OpenPose Pose Detection
Author	Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei and Yaser Sheikh Published in arXiv, vol. 1812.08008, “OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields”, 2018 https://arxiv.org/abs/1812.08008
License	OpenPose Academic or Non-Profit Non-Commercial Research License
Files	pose-detection-openpose.fp16.safetensors
Minimum VRAM	259.96 MB

dwpose (default)

Name	DWPose Pose Detection
Author	Zhendong Yang, Ailing Zeng, Chun Yuan and Yu Li Tsinghua Shenzhen International Graduate School and International Digital Economy Academy (IDEA) Published in arXiv, vol. 2307.15880, “Effective Whole-body Pose Estimation with Two-stages Distillation”, 2023 https://arxiv.org/abs/2307.15880
License	Apache License 2.0
Files	pose-detection-dwpose-estimation.safetensors (134.65 MB) pose-detection-dwpose-detection.safetensors (217.20 MB) Total Size: 351.85 MB
Minimum VRAM	354.64 MB

image-generation

stable-diffusion-v1-5

Name	Stable Diffusion v1.5 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752
License	OpenRAIL-M License
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-abyssorange-mix-v3

Name	AbyssOrange Mix V3 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by liudinglin
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-abyssorange-mix-v3-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-abyssorange-mix-v3-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-chillout-mix-ni

Name	Chillout Mix Ni Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Dreamlike Art
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-chillout-mix-ni-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-chillout-mix-ni-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-clarity-v3

Name	Clarity V3 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by ndimensional
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-clarity-v3-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-clarity-v3-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-dark-sushi-mix-v2-25d

Name	Dark Sushi Mix V2 2.5D Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Aitasai
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-dark-sushi-mix-v2-25d-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-dark-sushi-mix-v2-25d-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-divine-elegance-mix-v10

Name	Divine Elegance Mix V10 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by TroubleDarkness
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-divine-elegance-mix-v10-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-divine-elegance-mix-v10-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-dreamshaper-v8

Name	DreamShaper V8 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Lykon
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-dreamshaper-v8-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-dreamshaper-v8-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-epicphotogasm-ultimate-fidelity

Name	epiCPhotoGasm Ultimate Fidelity Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by epinikion
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-epic-photogasm-ultimate-fidelity-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-epic-photogasm-ultimate-fidelity-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-epicrealism-v5

Name	epiCRealism V5 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by epinikion
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-epicrealism-v5-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-epicrealism-v5-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-filmgirl-ultra

Name	FilmGirl Ultra Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by LEOSAM
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-filmgirl-ultra-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-filmgirl-ultra-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-ghostmix-v2

Name	GhostMix V2 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by _GhostInShell_
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-ghostmix-v2-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-ghostmix-v2-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-lyriel-v1-6

Name	Lyriel V1.6 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Lyriel
License	OpenRAIL-M License
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-lyriel-v1-6-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-lyriel-v1-6-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-majicmix-realistic-v7

Name	MajicMix Realistic V7 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Merjic
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-majicmix-realistic-v7-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-majicmix-realistic-v7-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-meinamix-v12

Name	MeinaMix V12 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Meina
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-meinamix-v12-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-meinamix-v12-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-mistoon-anime-v3

Name	Mistoon Anime V3 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Inzaniak
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-mistoon-anime-v3-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-mistoon-anime-v3-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-perfect-world-v6

Name	Perfect World V6 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Bloodsuga
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-perfect-world-v6-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-perfect-world-v6-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-photon-v1

Name	Photon V1 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Photographer
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-photon-v1-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-photon-v1-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-realcartoon3d-v17

Name	RealCartoon3D V17 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by 7whitefire7
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-realcartoon3d-v17-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-realcartoon3d-v17-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-realistic-vision-v5-1

Name	Realistic Vision V5.1 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by SG_161222
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-realistic-vision-v5-1-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-realistic-vision-v5-1-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-realistic-vision-v6-0

Name	Realistic Vision V6.0 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by SG_161222
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-realistic-vision-v6-0-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-realistic-vision-v6-0-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-rev-animated-v2

Name	ReV Animated V2 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Zovya
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-rev-animated-v2-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-rev-animated-v2-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-serenity-v2-1

Name	Serenity V2.1 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by malcolmrey
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-serenity-v2-1-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-serenity-v2-1-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-toonyou-beta-v6

Name	ToonYou Beta V6 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Bradcatt
License	OpenRAIL-M License with Addendum
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-toonyou-beta-v6-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-toonyou-beta-v6-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-xl

Name	Stable Diffusion XL Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-base-unet.fp16.safetensors (5.14 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB
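
The Total Size figure in a Files row can be roughly reproduced by summing the per-file sizes listed in that row. The following is a minimal sketch, assuming decimal units (1 GB = 1000 MB); the catalog does not state whether its units are decimal or binary, so the result should be read as approximate. The helper names `parse_files` and `total_gb` are hypothetical and are not part of Taproot.

```python
import re

# Assumption: decimal units (1 GB = 1000 MB). The catalog does not state
# whether its sizes are decimal or binary, so treat the result as approximate.
UNIT_BYTES = {"B": 1.0, "KB": 1e3, "MB": 1e6, "GB": 1e9}

# Matches "<filename> (<number> <unit>)" pairs as they appear in a Files row.
# The trailing "Total Size: ..." text has no parentheses, so it is skipped.
FILE_PATTERN = re.compile(r"(\S+) \((\d+(?:\.\d+)?) (B|KB|MB|GB)\)")


def parse_files(files_row):
    """Return (filename, size_in_bytes) pairs parsed from a Files row."""
    return [
        (name, float(value) * UNIT_BYTES[unit])
        for name, value, unit in FILE_PATTERN.findall(files_row)
    ]


def total_gb(files_row):
    """Sum the per-file sizes and express the result in GB."""
    return sum(size for _, size in parse_files(files_row)) / UNIT_BYTES["GB"]


# Files row of the stable-diffusion-xl entry, copied from the catalog.
SDXL_FILES = (
    "image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) "
    "image-generation-stable-diffusion-xl-base-unet.fp16.safetensors (5.14 GB) "
    "text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) "
    "text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) "
    "text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) "
    "text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) "
    "text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) "
    "text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) "
    "text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) "
    "text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) "
    "Total Size: 7.11 GB"
)

if __name__ == "__main__":
    count = len(parse_files(SDXL_FILES))
    print(f"{count} files, about {total_gb(SDXL_FILES):.2f} GB")
```

Because the per-file sizes are already rounded to two decimals, a sum computed this way may differ from the listed Total Size in the last digit.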

stable-diffusion-xl-albedobase-v3-1

Name	AlbedoBase XL V3.1 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Addendum
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-albedo-base-v3-1-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-albedo-base-v3-1-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-albedo-base-v3-1-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-anything

Name	Anything XL Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-anything-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-anything-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-anything-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-animagine-v3-1

Name	Animagine XL V3.1 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Addendum
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-animagine-v3-1-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-animagine-v3-1-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-animagine-v3-1-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-copax-timeless-v13

Name	Copax TimeLess V13 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Addendum
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-copax-timeless-v13-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-copax-timeless-v13-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-copax-timeless-v13-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-counterfeit-v2-5

Name	CounterfeitXL V2.5 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Addendum
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-counterfeit-v2-5-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-counterfeit-v2-5-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-counterfeit-v2-5-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-dreamshaper-alpha-v2

Name	DreamShaper XL Alpha V2 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Addendum
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-helloworld-v7

Name	LEOSAM's HelloWorld XL Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Addendum
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-hello-world-v7-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-hello-world-v7-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-hello-world-v7-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-juggernaut-v11 (default)

Name	Juggernaut XL V11 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Addendum
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-juggernaut-v11-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-juggernaut-v11-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-juggernaut-v11-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-lightning-8-step

Name	Stable Diffusion XL Lightning (8-Step)
Author	Shanchuan Lin, Anran Wang and Xiao Yang ByteDance Inc. Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024 https://arxiv.org/abs/2402.13929
License	OpenRAIL++-M License
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-lightning-unet-8-step.fp16.safetensors (5.14 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-lightning-4-step

Name	Stable Diffusion XL Lightning (4-Step)
Author	Shanchuan Lin, Anran Wang and Xiao Yang ByteDance Inc. Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024 https://arxiv.org/abs/2402.13929
License	OpenRAIL++-M License
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-lightning-unet-4-step.fp16.safetensors (5.14 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-lightning-2-step

Name	Stable Diffusion XL Lightning (2-Step)
Author	Shanchuan Lin, Anran Wang and Xiao Yang ByteDance Inc. Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024 https://arxiv.org/abs/2402.13929
License	OpenRAIL++-M License
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-lightning-unet-2-step.fp16.safetensors (5.14 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-nightvision-v9

Name	NightVision XL V9 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Addendum
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-nightvision-v9-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-nightvision-v9-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-nightvision-v9-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-realvis-v5

Name	RealVisXL V5 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Addendum
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-realvis-v5-0-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-realvis-v5-0-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-realvis-v5-0-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-stoiqo-newreality-pro

Name	Stoiqo New Reality XL Pro Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Addendum
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-stoiqo-newreality-pro-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-stoiqo-newreality-pro-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-stoiqo-newreality-pro-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-turbo

Name	Stable Diffusion XL Turbo Image Generation
Author	Axel Sauer, Dominik Lorenz, Andreas Blattmann and Robin Rombach Stability AI Published in Stability AI Blog, “Adversarial Diffusion Distillation”, 2024 https://stability.ai/research/adversarial-diffusion-distillation
License	Stability AI Community License
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-turbo-unet.fp16.safetensors (5.14 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-unstable-diffusers-nihilmania

Name	SDXL Unstable Diffusers NihilMania Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Addendum
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-zavychroma-v10

Name	ZavyChromaXL V10 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Addendum
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-zavychroma-v10-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-zavychroma-v10-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-zavychroma-v10-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-v3-medium

Name	Stable Diffusion V3 (Medium) Image Generation
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-transformer.fp16.safetensors (4.17 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 15.50 GB
Minimum VRAM	17.86 GB

stable-diffusion-v3-5-medium

Name	Stable Diffusion V3.5 (Medium) Image Generation
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-5-medium-transformer.bf16.safetensors (4.94 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 16.27 GB
Minimum VRAM	18.36 GB

stable-diffusion-v3-5-medium-int8

Name	Stable Diffusion V3.5 (Medium) Image Generation (Int8)
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-5-medium-transformer.int8.bf16.safetensors (2.70 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 10.41 GB
Minimum VRAM	14.85 GB

stable-diffusion-v3-5-large

Name	Stable Diffusion V3.5 (Large) Image Generation
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-5-large-transformer.part-1.bf16.safetensors (9.99 GB) image-generation-stable-diffusion-v3-5-large-transformer.part-2.bf16.safetensors (6.31 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 27.62 GB
Minimum VRAM	31.36 GB
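
One way to use the Minimum VRAM rows is to filter entries against the memory available on a given GPU. The following is a minimal sketch using a handful of values copied from entries in this catalog; the dictionary `MINIMUM_VRAM_GB` and the function `models_that_fit` are hypothetical names for illustration and are not part of Taproot's API.

```python
# Minimum VRAM values (in GB) copied from a few entries in this catalog.
MINIMUM_VRAM_GB = {
    "stable-diffusion-v1-5-photon-v1": 2.58,
    "stable-diffusion-xl": 7.06,
    "stable-diffusion-v3-5-medium-int8": 14.85,
    "stable-diffusion-v3-medium": 17.86,
    "stable-diffusion-v3-5-medium": 18.36,
    "stable-diffusion-v3-5-large": 31.36,
}


def models_that_fit(available_vram_gb, catalog=MINIMUM_VRAM_GB):
    """Return model names whose Minimum VRAM is within budget, smallest first."""
    fitting = [
        (vram, name)
        for name, vram in catalog.items()
        if vram <= available_vram_gb
    ]
    # Sorting the (vram, name) pairs orders by VRAM first, then by name.
    return [name for _, name in sorted(fitting)]


if __name__ == "__main__":
    # Example: a hypothetical GPU with 16 GB of memory.
    for name in models_that_fit(16.0):
        print(name)
```

With a 16 GB budget, the sketch keeps the Stable Diffusion 1.5 and XL entries plus the Int8 build of V3.5 (Medium), and excludes the bf16 V3 and V3.5 builds.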

stable-diffusion-v3-5-large-absynth-v1-9

Name	Stable Diffusion V3.5 (Large) Image Generation
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-5-large-absynth-v1-9-transformer.fp16.safetensors (16.29 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 27.62 GB
Minimum VRAM	31.36 GB

stable-diffusion-v3-5-large-absynth-v2-0

Name	Stable Diffusion V3.5 (Large) Image Generation
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-5-large-absynth-v2-0-transformer.fp16.safetensors (16.29 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 27.62 GB
Minimum VRAM	31.36 GB

stable-diffusion-v3-5-large-int8

Name	Stable Diffusion V3.5 (Large) Image Generation (Int8)
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-5-large-transformer.int8.bf16.safetensors (8.25 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 15.96 GB
Minimum VRAM	16.85 GB

stable-diffusion-v3-5-large-absynth-v1-9-int8

Name	Stable Diffusion V3.5 (Large) Image Generation (Int8)
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-5-large-absynth-v1-9-transformer.int8.fp16.safetensors (8.25 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 15.96 GB
Minimum VRAM	16.85 GB

stable-diffusion-v3-5-large-absynth-v2-0-int8

Name	Stable Diffusion V3.5 (Large) Image Generation (Int8)
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-5-large-absynth-v2-0-transformer.int8.fp16.safetensors (8.25 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 15.96 GB
Minimum VRAM	16.85 GB

stable-diffusion-v3-5-large-nf4

Name	Stable Diffusion V3.5 (Large) Image Generation (NF4)
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-5-large-transformer.nf4.bf16.safetensors (4.72 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 12.85 GB
Minimum VRAM	12.99 GB

stable-diffusion-v3-5-large-absynth-v1-9-nf4

Name	Stable Diffusion V3.5 (Large) Image Generation (NF4)
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-5-large-absynth-v1-9-transformer.nf4.fp16.safetensors (4.72 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 12.85 GB
Minimum VRAM	12.99 GB

stable-diffusion-v3-5-large-absynth-v2-0-nf4

Name	Stable Diffusion V3.5 (Large) Image Generation (NF4)
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-5-large-absynth-v2-0-transformer.nf4.fp16.safetensors (4.72 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 12.85 GB
Minimum VRAM	12.99 GB

stable-diffusion-v3-5-medium-absynth-v2-0

Name	Stable Diffusion V3.5 (Medium) Image Generation
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-5-medium-absynth-v2-0-transformer.fp16.safetensors (4.94 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 16.27 GB
Minimum VRAM	18.36 GB

flux-v1-dev

Name	FluxDev
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	FLUX.1 Non-Commercial License
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-dev-transformer.bf16.safetensors (23.80 GB) Total Size: 33.74 GB
Minimum VRAM	29.50 GB

flux-v1-dev-int8

Name	FluxDevInt8
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	FLUX.1 Non-Commercial License
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-dev-transformer.int8.bf16.safetensors (11.92 GB) Total Size: 18.24 GB
Minimum VRAM	21.22 GB

flux-v1-dev-stoiqo-newreality-alpha-v2-int8

Name	Stoiqo NewReality F1.D Alpha V2 (Int8) Image Generation
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	FLUX.1 Non-Commercial License
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-dev-stoiqo-newreality-alpha-v2-transformer.int8.fp16.safetensors (11.92 GB) Total Size: 18.24 GB
Minimum VRAM	21.22 GB

flux-v1-dev-nf4

Name	FluxDevNF4
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	FLUX.1 Non-Commercial License
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-dev-transformer.nf4.bf16.safetensors (6.70 GB) Total Size: 13.44 GB
Minimum VRAM	14.36 GB

flux-v1-dev-stoiqo-newreality-alpha-v2-nf4

Name	Stoiqo NewReality F1.D Alpha V2 (NF4) Image Generation
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	FLUX.1 Non-Commercial License
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-dev-stoiqo-newreality-alpha-v2-transformer.nf4.fp16.safetensors (6.70 GB) Total Size: 13.44 GB
Minimum VRAM	14.36 GB

flux-v1-schnell

Name	FluxSchnell
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	FLUX.1 Non-Commercial License
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-schnell-transformer.bf16.safetensors (23.78 GB) Total Size: 33.72 GB
Minimum VRAM	29.50 GB

flux-v1-schnell-int8

Name	FluxSchnellInt8
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	FLUX.1 Non-Commercial License
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-schnell-transformer.int8.bf16.safetensors (11.91 GB) Total Size: 18.23 GB
Minimum VRAM	21.22 GB

flux-v1-schnell-sigma-vision-alpha-int8

Name	Sigma Vision F1.S Alpha (Int8) Image Generation
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	FLUX.1 Non-Commercial License
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-dev-sigma-vision-alpha-transformer.int8.fp16.safetensors (11.91 GB) Total Size: 18.23 GB
Minimum VRAM	21.22 GB

flux-v1-schnell-nf4

Name	FluxSchnellNF4
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	FLUX.1 Non-Commercial License
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-schnell-transformer.nf4.bf16.safetensors (6.69 GB) Total Size: 13.44 GB
Minimum VRAM	14.36 GB
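
Each Files row in this catalog lists per-file sizes followed by a Total Size. As a rough consistency check, the sketch below parses one row (copied from the flux-v1-schnell-nf4 entry) and compares the summed sizes with the stated total. The helper names and the decimal unit multipliers (1 GB = 1000 MB) are assumptions made for illustration and are not part of Taproot. Because the listed sizes are rounded to two decimals, the comparison uses a tolerance rather than exact equality.

```python
import re

# Assumption: decimal units. The catalog does not state its convention.
UNIT_BYTES = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9}

# Per-file sizes appear in parentheses, e.g. "(167.67 MB)".
SIZE_RE = re.compile(r"\((\d+(?:\.\d+)?) (B|KB|MB|GB)\)")
# The stated total appears once, e.g. "Total Size: 13.44 GB".
TOTAL_RE = re.compile(r"Total Size: (\d+(?:\.\d+)?) (B|KB|MB|GB)")


def summed_bytes(files_row: str) -> float:
    """Sum every parenthesized file size in a Files row, in bytes."""
    return sum(float(value) * UNIT_BYTES[unit]
               for value, unit in SIZE_RE.findall(files_row))


def stated_total_bytes(files_row: str) -> float:
    """Return the row's stated Total Size, in bytes."""
    match = TOTAL_RE.search(files_row)
    if match is None:
        raise ValueError("row has no 'Total Size' field")
    return float(match.group(1)) * UNIT_BYTES[match.group(2)]


# Files row of the flux-v1-schnell-nf4 entry, copied from the catalog.
row = (
    "image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) "
    "text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) "
    "text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) "
    "text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) "
    "text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) "
    "text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) "
    "text-encoding-t5-xxl-vocab.model (791.66 KB) "
    "text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) "
    "image-generation-flux-v1-schnell-transformer.nf4.bf16.safetensors (6.69 GB) "
    "Total Size: 13.44 GB"
)

difference_gb = abs(summed_bytes(row) - stated_total_bytes(row)) / 10**9
print(f"summed and stated totals differ by {difference_gb:.3f} GB")
```

A difference of a few hundredths of a GB is expected from rounding alone. A larger gap would suggest a missing or mislabeled file in the row.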

video-generation

cogvideox-2b

Name	CogVideoX 2B Video Generation
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) video-generation-cog-transformer-2b.fp16.safetensors (3.39 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 13.34 GB
Minimum VRAM	13.48 GB

cogvideox-2b-int8

Name	CogVideoX 2B Video Generation (Int8)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) video-generation-cog-transformer-2b.int8.fp16.safetensors (1.70 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 8.04 GB
Minimum VRAM	11.48 GB

cogvideox-5b

Name	CogVideoX 5B Video Generation
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) video-generation-cog-transformer-5b.fp16.safetensors (11.14 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 21.10 GB
Minimum VRAM	21.48 GB

cogvideox-5b-int8

Name	CogVideoX 5B Video Generation (Int8)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) video-generation-cog-transformer-5b.int8.fp16.safetensors (5.58 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 11.92 GB
Minimum VRAM	17.48 GB

cogvideox-5b-nf4

Name	CogVideoX 5B Video Generation (NF4)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) video-generation-cog-transformer-5b.nf4.fp16.safetensors (3.14 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 9.90 GB
Minimum VRAM	12.48 GB

cogvideox-i2v-5b

Name	CogVideoX 5B Image-to-Video Generation
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) video-generation-cog-i2v-transformer-5b.fp16.safetensors (11.25 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 21.21 GB
Minimum VRAM	21.48 GB

cogvideox-i2v-5b-int8

Name	CogVideoX 5B Image-to-Video Generation (Int8)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) video-generation-cog-i2v-transformer-5b.fp16.safetensors (11.25 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 17.59 GB
Minimum VRAM	17.48 GB

cogvideox-i2v-5b-nf4

Name	CogVideoX 5B Image-to-Video Generation (NF4)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) video-generation-cog-i2v-transformer-5b.nf4.fp16.safetensors (3.25 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 10.01 GB
Minimum VRAM	12.48 GB

cogvideox-v1-5-5b

Name	CogVideoX V1.5 5B Video Generation
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) video-generation-cog-v1-5-transformer-5b.fp16.safetensors (11.14 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 21.10 GB
Minimum VRAM	21.48 GB

cogvideox-v1-5-5b-int8

Name	CogVideoX V1.5 5B Video Generation (Int8)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) video-generation-cog-v1-5-transformer-5b.int8.fp16.safetensors (5.59 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 11.92 GB
Minimum VRAM	17.48 GB

cogvideox-v1-5-5b-nf4

Name	CogVideoX V1.5 5B Video Generation (NF4)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) video-generation-cog-v1-5-transformer-5b.nf4.fp16.safetensors (3.14 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 9.90 GB
Minimum VRAM	12.48 GB

cogvideox-v1-5-i2v-5b

Name	CogVideoX V1.5 5B Image-to-Video Generation
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) video-generation-cog-v1-5-i2v-transformer-5b.fp16.safetensors (11.14 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 21.10 GB
Minimum VRAM	21.48 GB

cogvideox-v1-5-i2v-5b-int8

Name	CogVideoX V1.5 5B Image-to-Video Generation (Int8)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) video-generation-cog-v1-5-i2v-transformer-5b.int8.fp16.safetensors (5.59 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 11.92 GB
Minimum VRAM	17.48 GB

cogvideox-v1-5-i2v-5b-nf4

Name	CogVideoX V1.5 5B Image-to-Video Generation (NF4)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) video-generation-cog-v1-5-i2v-transformer-5b.nf4.fp16.safetensors (3.14 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 9.90 GB
Minimum VRAM	12.48 GB
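
The CogVideoX entries above come in full-precision, Int8, and NF4 variants that differ mainly in their Minimum VRAM. The sketch below shows one way to pick a variant for a given memory budget, using the three cogvideox-5b figures from this catalog. The function name and the selection rule (prefer the fitting variant with the highest listed requirement, on the assumption that it is the least quantized) are illustrative assumptions, not Taproot behavior.

```python
from typing import Dict, Optional

# Minimum VRAM in GB, copied from the cogvideox-5b entries in this catalog.
MIN_VRAM_GB: Dict[str, float] = {
    "cogvideox-5b": 21.48,
    "cogvideox-5b-int8": 17.48,
    "cogvideox-5b-nf4": 12.48,
}


def pick_variant(budget_gb: float,
                 table: Dict[str, float] = MIN_VRAM_GB) -> Optional[str]:
    """Return the fitting variant with the largest listed requirement.

    Assumption: a higher Minimum VRAM means less aggressive quantization,
    so it is preferred whenever it fits. Returns None if nothing fits.
    """
    fitting = {name: vram for name, vram in table.items() if vram <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


for budget in (24.0, 16.0, 8.0):
    print(budget, pick_variant(budget))
```

The listed figures are minimums, so leaving some headroom below the card's nominal capacity is a sensible precaution.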

hunyuan

Name	Hunyuan Video Generation
Author	Hunyuan Foundation Model Team Tencent Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024 https://arxiv.org/abs/2412.03603
License	Tencent Hunyuan Community License
Files	video-generation-hunyuan-vae.safetensors (985.94 MB) video-generation-hunyuan-transformer.bf16.safetensors (25.64 GB) text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB) text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-llava-llama-text-encoder.fp16.safetensors (15.01 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) Total Size: 41.90 GB
Minimum VRAM	38.30 GB

hunyuan-int8

Name	Hunyuan Video Generation
Author	Hunyuan Foundation Model Team Tencent Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024 https://arxiv.org/abs/2412.03603
License	Tencent Hunyuan Community License
Files	video-generation-hunyuan-vae.safetensors (985.94 MB) video-generation-hunyuan-transformer.int8.bf16.safetensors (12.84 GB) text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB) text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-llava-llama-text-encoder.int8.fp16.safetensors (8.04 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) Total Size: 22.13 GB
Minimum VRAM	23.30 GB

hunyuan-nf4

Name	Hunyuan Video Generation
Author	Hunyuan Foundation Model Team Tencent Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024 https://arxiv.org/abs/2412.03603
License	Tencent Hunyuan Community License
Files	video-generation-hunyuan-vae.safetensors (985.94 MB) video-generation-hunyuan-transformer.nf4.bf16.safetensors (7.22 GB) text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB) text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-llava-llama-text-encoder.nf4.fp16.safetensors (4.98 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) Total Size: 13.45 GB
Minimum VRAM	14.78 GB

ltx (default)

Name	LTX Video Generation
Author	Lightricks https://github.com/Lightricks/LTX-Video
License	OpenRAIL-M License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) video-generation-ltx-transformer.bf16.safetensors (3.85 GB) video-generation-ltx-vae.safetensors (1.87 GB) Total Size: 15.24 GB
Minimum VRAM	15.28 GB

ltx-int8

Name	LTX Video Generation
Author	Lightricks https://github.com/Lightricks/LTX-Video
License	OpenRAIL-M License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) video-generation-ltx-transformer.int8.bf16.safetensors (1.93 GB) video-generation-ltx-vae.safetensors (1.87 GB) Total Size: 9.70 GB
Minimum VRAM	9.72 GB

ltx-nf4

Name	LTX Video Generation
Author	Lightricks https://github.com/Lightricks/LTX-Video
License	OpenRAIL-M License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) video-generation-ltx-transformer.nf4.bf16.safetensors (1.08 GB) video-generation-ltx-vae.safetensors (1.87 GB) Total Size: 9.28 GB
Minimum VRAM	7.29 GB

mochi-v1

Name	Mochi Video Generation
Author	Genmo AI Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024 https://www.genmo.ai/blog
License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) video-generation-mochi-v1-preview-transformer.bf16.safetensors (20.06 GB) video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB) Total Size: 30.50 GB
Minimum VRAM	22.95 GB

mochi-v1-int8

Name	Mochi Video Generation
Author	Genmo AI Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024 https://www.genmo.ai/blog
License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) video-generation-mochi-v1-preview-transformer.int8.bf16.safetensors (10.04 GB) video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB) Total Size: 16.87 GB
Minimum VRAM	15.95 GB

mochi-v1-nf4

Name	Mochi Video Generation
Author	Genmo AI Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024 https://www.genmo.ai/blog
License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) video-generation-mochi-v1-preview-transformer.nf4.bf16.safetensors (5.64 GB) video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB) Total Size: 12.89 GB
Minimum VRAM	12.41 GB
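
The Total Size figure in each Files field can be checked by summing the parenthesized file sizes. The sketch below does this for the ltx entry. The helper name total_size_gb and the binary unit multipliers (1 GB = 1024 MB) are assumptions made for illustration; the catalog does not state which unit convention it uses.

```python
import re

# Assumption: binary multipliers. The catalog does not say which convention it uses.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}

# Matches a parenthesized size such as "(9.52 GB)" or "(577.00 B)".
SIZE_PATTERN = re.compile(r"\((\d+(?:\.\d+)?) (B|KB|MB|GB)\)")


def total_size_gb(files_field: str) -> float:
    """Sum every parenthesized file size in a Files field; return gigabytes."""
    total_bytes = sum(
        float(value) * UNITS[unit]
        for value, unit in SIZE_PATTERN.findall(files_field)
    )
    return total_bytes / UNITS["GB"]


# Files field of the ltx entry, copied from the catalog without its Total Size.
ltx_files = (
    "text-encoding-t5-xxl-vocab.model (791.66 KB) "
    "text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) "
    "text-encoding-t5-xxl.bf16.safetensors (9.52 GB) "
    "video-generation-ltx-transformer.bf16.safetensors (3.85 GB) "
    "video-generation-ltx-vae.safetensors (1.87 GB)"
)

print(f"{total_size_gb(ltx_files):.2f} GB")  # prints "15.24 GB"
```

The result agrees with the listed Total Size of 15.24 GB. Because each file size is already rounded to two decimals, sums for other entries may differ from the listed total by a few hundredths of a gigabyte.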

text-generation

deepseek-r1-llama-8b

Name	DeepSeek R1 Llama 3 8B Text Generation
Author	DeepSeek AI Published in arXiv, vol. 2501.12948, “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”, 2025 https://arxiv.org/abs/2501.12948
License	MIT, Meta Llama 3 Community License
Files	text-generation-deepseek-r1-llama-8b-fp16.gguf
Minimum VRAM	16.20 GB

deepseek-r1-llama-8b-q8-0

Name	DeepSeek R1 Llama 3 8B Text Generation (Q8-0)
Author	DeepSeek AI Published in arXiv, vol. 2501.12948, “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”, 2025 https://arxiv.org/abs/2501.12948
License	MIT, Meta Llama 3 Community License
Files	text-generation-deepseek-r1-llama-8b-q8-0.gguf
Minimum VRAM	9.45 GB

deepseek-r1-llama-8b-q6-k

Name	DeepSeek R1 Llama 3 8B Text Generation (Q6-K)
Author	DeepSeek AI Published in arXiv, vol. 2501.12948, “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”, 2025 https://arxiv.org/abs/2501.12948
License	MIT, Meta Llama 3 Community License
Files	text-generation-deepseek-r1-llama-8b-q6-k.gguf
Minimum VRAM	7.73 GB

deepseek-r1-llama-8b-q5-k-m

Name	DeepSeek R1 Llama 3 8B Text Generation (Q5-K-M)
Author	DeepSeek AI Published in arXiv, vol. 2501.12948, “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”, 2025 https://arxiv.org/abs/2501.12948
License	MIT, Meta Llama 3 Community License
Files	text-generation-deepseek-r1-llama-8b-q5-k-m.gguf
Minimum VRAM	6.96 GB

deepseek-r1-llama-8b-q4-k-m

Name	DeepSeek R1 Llama 3 8B Text Generation (Q4-K-M)
Author	DeepSeek AI Published in arXiv, vol. 2501.12948, “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”, 2025 https://arxiv.org/abs/2501.12948
License	MIT, Meta Llama 3 Community License
Files	text-generation-deepseek-r1-llama-8b-q4-k-m.gguf
Minimum VRAM	6.24 GB

deepseek-r1-llama-8b-q3-k-m

Name	DeepSeek R1 Llama 3 8B Text Generation (Q3-K-M)
Author	DeepSeek AI Published in arXiv, vol. 2501.12948, “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”, 2025 https://arxiv.org/abs/2501.12948
License	MIT, Meta Llama 3 Community License
Files	text-generation-deepseek-r1-llama-8b-q3-k-m.gguf
Minimum VRAM	5.44 GB

deepseek-r1-llama-8b-q2-k

Name	DeepSeek R1 Llama 3 8B Text Generation (Q2-K)
Author	DeepSeek AI Published in arXiv, vol. 2501.12948, “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”, 2025 https://arxiv.org/abs/2501.12948
License	MIT, Meta Llama 3 Community License
Files	text-generation-deepseek-r1-llama-8b-q2-k.gguf
Minimum VRAM	4.71 GB

llama-v3-8b

Name	Llama V3.0 8B Text Generation
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-8b-q8-0.gguf
Minimum VRAM	9.64 GB

llama-v3-8b-q6-k

Name	Llama V3.0 8B Text Generation (Q6-K)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-8b-q6-k.gguf
Minimum VRAM	8.10 GB

llama-v3-8b-q5-k-m

Name	Llama V3.0 8B Text Generation (Q5-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-8b-q5-k-m.gguf
Minimum VRAM	7.30 GB

llama-v3-8b-q4-k-m

Name	Llama V3.0 8B Text Generation (Q4-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-8b-q4-k-m.gguf
Minimum VRAM	6.56 GB

llama-v3-8b-q3-k-m

Name	Llama V3.0 8B Text Generation (Q3-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-8b-q3-k-m.gguf
Minimum VRAM	5.72 GB

llama-v3-8b-instruct

Name	Llama V3.0 8B Instruct Text Generation
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-8b-instruct-q8-0.gguf
Minimum VRAM	9.64 GB

llama-v3-8b-instruct-q6-k

Name	Llama V3.0 8B Instruct Text Generation (Q6-K)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-8b-instruct-q6-k.gguf
Minimum VRAM	8.10 GB

llama-v3-8b-instruct-q5-k-m

Name	Llama V3.0 8B Instruct Text Generation (Q5-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-8b-instruct-q5-k-m.gguf
Minimum VRAM	7.30 GB

llama-v3-8b-instruct-q4-k-m

Name	Llama V3.0 8B Instruct Text Generation (Q4-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-8b-instruct-q4-k-m.gguf
Minimum VRAM	6.56 GB

llama-v3-8b-instruct-q3-k-m

Name	Llama V3.0 8B Instruct Text Generation (Q3-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-8b-instruct-q3-k-m.gguf
Minimum VRAM	5.72 GB

llama-v3-1-8b-instruct

Name	Llama V3.1 8B Instruct Text Generation
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-1-8b-instruct-q8-0.gguf
Minimum VRAM	9.64 GB

llama-v3-1-8b-instruct-q6-k (default)

Name	Llama V3.1 8B Instruct Text Generation (Q6-K)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-1-8b-instruct-q6-k.gguf
Minimum VRAM	8.10 GB

llama-v3-1-8b-instruct-q5-k-m

Name	Llama V3.1 8B Instruct Text Generation (Q5-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-1-8b-instruct-q5-k-m.gguf
Minimum VRAM	7.30 GB

llama-v3-1-8b-instruct-q4-k-m

Name	Llama V3.1 8B Instruct Text Generation (Q4-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-1-8b-instruct-q4-k-m.gguf
Minimum VRAM	6.56 GB

llama-v3-1-8b-instruct-q3-k-m

Name	Llama V3.1 8B Instruct Text Generation (Q3-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-1-8b-instruct-q3-k-m.gguf
Minimum VRAM	5.72 GB

llama-v3-2-3b-instruct

Name	Llama V3.2 3B Instruct Text Generation
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-2-3b-instruct-f16.gguf
Minimum VRAM	8.04 GB

llama-v3-2-3b-instruct-q8-0

Name	Llama V3.2 3B Instruct Text Generation (Q8-0)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-2-3b-instruct-q8-0.gguf
Minimum VRAM	5.02 GB

llama-v3-2-3b-instruct-q6-k

Name	Llama V3.2 3B Instruct Text Generation (Q6-K)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-2-3b-instruct-q6-k.gguf
Minimum VRAM	4.20 GB

llama-v3-2-3b-instruct-q5-k-m

Name	Llama V3.2 3B Instruct Text Generation (Q5-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-2-3b-instruct-q5-k-m.gguf
Minimum VRAM	3.90 GB

llama-v3-2-3b-instruct-q4-k-m

Name	Llama V3.2 3B Instruct Text Generation (Q4-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-2-3b-instruct-q4-k-m.gguf
Minimum VRAM	3.50 GB

llama-v3-2-3b-instruct-q3-k-l

Name	Llama V3.2 3B Instruct Text Generation (Q3-K-L)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-2-3b-instruct-q3-k-l.gguf
Minimum VRAM	3.10 GB

llama-v3-2-1b-instruct

Name	Llama V3.2 1B Instruct Text Generation
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-2-1b-instruct-f16.gguf
Minimum VRAM	3.60 GB

llama-v3-2-1b-instruct-q8-0

Name	Llama V3.2 1B Instruct Text Generation (Q8-0)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-2-1b-instruct-q8-0.gguf
Minimum VRAM	2.43 GB

llama-v3-2-1b-instruct-q6-k

Name	Llama V3.2 1B Instruct Text Generation (Q6-K)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-2-1b-instruct-q6-k.gguf
Minimum VRAM	2.15 GB

llama-v3-2-1b-instruct-q5-k-m

Name	Llama V3.2 1B Instruct Text Generation (Q5-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-2-1b-instruct-q5-k-m.gguf
Minimum VRAM	2.02 GB

llama-v3-2-1b-instruct-q4-k-m

Name	Llama V3.2 1B Instruct Text Generation (Q4-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-2-1b-instruct-q4-k-m.gguf
Minimum VRAM	1.64 GB

llama-v3-2-1b-instruct-q3-k-l

Name	Llama V3.2 1B Instruct Text Generation (Q3-K-L)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License
Files	text-generation-llama-v3-2-1b-instruct-q3-k-l.gguf
Minimum VRAM	1.58 GB

zephyr-7b-alpha

Name	Zephyr 7B α Text Generation (Q8)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License
Files	text-generation-zephyr-alpha-7b-q8-0.gguf
Minimum VRAM	9.40 GB

zephyr-7b-alpha-q6-k

Name	Zephyr 7B α Text Generation (Q6-K)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License
Files	text-generation-zephyr-alpha-7b-q6-k.gguf
Minimum VRAM	8.20 GB

zephyr-7b-alpha-q5-k-m

Name	Zephyr 7B α Text Generation (Q5-K-M)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License
Files	text-generation-zephyr-alpha-7b-q5-k-m.gguf
Minimum VRAM	7.25 GB

zephyr-7b-alpha-q4-k-m

Name	Zephyr 7B α Text Generation (Q4-K-M)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License
Files	text-generation-zephyr-alpha-7b-q4-k-m.gguf
Minimum VRAM	6.30 GB

zephyr-7b-alpha-q3-k-m

Name	Zephyr 7B α Text Generation (Q3-K-M)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License
Files	text-generation-zephyr-alpha-7b-q3-k-m.gguf
Minimum VRAM	5.35 GB

zephyr-7b-beta

Name	Zephyr 7B β Text Generation
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License
Files	text-generation-zephyr-beta-7b-q8-0.gguf
Minimum VRAM	9.40 GB

zephyr-7b-beta-q6-k

Name	Zephyr 7B β Text Generation (Q6-K)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License
Files	text-generation-zephyr-beta-7b-q6-k.gguf
Minimum VRAM	8.20 GB

zephyr-7b-beta-q5-k-m

Name	Zephyr 7B β Text Generation (Q5-K-M)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License
Files	text-generation-zephyr-beta-7b-q5-k-m.gguf
Minimum VRAM	7.25 GB

zephyr-7b-beta-q4-k-m

Name	Zephyr 7B β Text Generation (Q4-K-M)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License
Files	text-generation-zephyr-beta-7b-q4-k-m.gguf
Minimum VRAM	6.30 GB

zephyr-7b-beta-q3-k-m

Name	Zephyr 7B β Text Generation (Q3-K-M)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License
Files	text-generation-zephyr-beta-7b-q3-k-m.gguf
Minimum VRAM	5.35 GB

visual-question-answering

llava-v1-5-7b

Name	LLaVA V1.5 7B Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-7b.fp16.gguf (13.48 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 14.10 GB
Minimum VRAM	15.80 GB

llava-v1-5-7b-q8

Name	LLaVA V1.5 7B (Q8-0) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-7b-q8-0.gguf (7.16 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 7.79 GB
Minimum VRAM	9.90 GB

llava-v1-5-7b-q6-k

Name	LLaVA V1.5 7B (Q6-K) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-7b-q6-k.gguf (5.53 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 6.15 GB
Minimum VRAM	8.40 GB

llava-v1-5-7b-q5-k-m

Name	LLaVA V1.5 7B (Q5-K-M) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-7b-q5-k-m.gguf (4.78 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 5.41 GB
Minimum VRAM	7.71 GB

llava-v1-5-7b-q4-k-m

Name	LLaVA V1.5 7B (Q4-K-M) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-7b-q4-k-m.gguf (4.08 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 4.71 GB
Minimum VRAM	7.04 GB

llava-v1-5-7b-q3-k-m

Name	LLaVA V1.5 7B (Q3-K-M) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-7b-q3-k-m.gguf (3.30 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 3.92 GB
Minimum VRAM	6.33 GB

llava-v1-5-13b

Name	LLaVA V1.5 13B (Q8-0) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-13b-q8-0.gguf (13.83 GB) image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB) Total Size: 14.48 GB
Minimum VRAM	17.51 GB

llava-v1-5-13b-q6-k

Name	LLaVA V1.5 13B (Q6-K) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-13b-q6-k.gguf (10.68 GB) image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB) Total Size: 11.32 GB
Minimum VRAM	14.54 GB

llava-v1-5-13b-q5-k-m

Name	LLaVA V1.5 13B (Q5-K-M) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-13b-q5-k-m.gguf (9.23 GB) image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB) Total Size: 9.88 GB
Minimum VRAM	13.17 GB

llava-v1-5-13b-q4-0

Name	LLaVA V1.5 13B (Q4-0) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-13b-q4-0.gguf (7.37 GB) image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB) Total Size: 8.01 GB
Minimum VRAM	11.48 GB

llava-v1-6-34b-q5-k-m

Name	LLaVA V1.6 34B (Q5-K-M) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-6-34b-q5-k-m.gguf (24.32 GB) image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB) Total Size: 25.02 GB
Minimum VRAM	24.96 GB

llava-v1-6-34b-q4-k-m

Name	LLaVA V1.6 34B (Q4-K-M) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-6-34b-q4-k-m.gguf (20.66 GB) image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB) Total Size: 21.36 GB
Minimum VRAM	21.88 GB

llava-v1-6-34b-q3-k-m

Name	LLaVA V1.6 34B (Q3-K-M) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-6-34b-q3-k-m.gguf (16.65 GB) image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB) Total Size: 17.35 GB
Minimum VRAM	18.06 GB

moondream-v2 (default)

Name	Moondream V2 Visual Question Answering
Author	Vikhyat Korrapati Published in Hugging Face, vol. 10.57967/hf/3219, “Moondream2”, 2024 https://huggingface.co/vikhyatk/moondream2
License	Apache License 2.0
Files	visual-question-answering-moondream-v2.fp16.gguf (2.84 GB) image-encoding-clip-moondream-v2-mmproj.fp16.gguf (909.78 MB) Total Size: 3.75 GB
Minimum VRAM	4.44 GB

image-captioning

llava-v1-5-7b

Name	LLaVA V1.5 7B Image Captioning
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-7b.fp16.gguf (13.48 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 14.10 GB
Minimum VRAM	15.80 GB

llava-v1-5-7b-q8

Name	LLaVA V1.5 7B (Q8-0) Image Captioning
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-7b-q8-0.gguf (7.16 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 7.79 GB
Minimum VRAM	9.90 GB

llava-v1-5-7b-q6-k

Name	LLaVA V1.5 7B (Q6-K) Image Captioning
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-7b-q6-k.gguf (5.53 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 6.15 GB
Minimum VRAM	8.40 GB

llava-v1-5-7b-q5-k-m

Name	LLaVA V1.5 7B (Q5-K-M) Image Captioning
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-7b-q5-k-m.gguf (4.78 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 5.41 GB
Minimum VRAM	7.71 GB

llava-v1-5-7b-q4-k-m

Name	LLaVA V1.5 7B (Q4-K-M) Image Captioning
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-7b-q4-k-m.gguf (4.08 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 4.71 GB
Minimum VRAM	7.04 GB

llava-v1-5-7b-q3-k-m

Name	LLaVA V1.5 7B (Q3-K-M) Image Captioning
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-7b-q3-k-m.gguf (3.30 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 3.92 GB
Minimum VRAM	6.33 GB

llava-v1-5-13b

Name	LLaVA V1.5 13B (Q8-0) Image Captioning
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-13b-q8-0.gguf (13.83 GB) image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB) Total Size: 14.48 GB
Minimum VRAM	17.51 GB

llava-v1-5-13b-q6-k

Name	LLaVA V1.5 13B (Q6-K) Image Captioning
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-13b-q6-k.gguf (10.68 GB) image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB) Total Size: 11.32 GB
Minimum VRAM	14.54 GB

llava-v1-5-13b-q5-k-m

Name	LLaVA V1.5 13B (Q5-K-M) Image Captioning
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-13b-q5-k-m.gguf (9.23 GB) image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB) Total Size: 9.88 GB
Minimum VRAM	13.17 GB

llava-v1-5-13b-q4-0

Name	LLaVA V1.5 13B (Q4-0) Image Captioning
Author	Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-5-13b-q4-0.gguf (7.37 GB) image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB) Total Size: 8.01 GB
Minimum VRAM	11.48 GB

llava-v1-6-34b-q5-k-m

Name	LLaVA V1.6 34B (Q5-K-M) Image Captioning
Author	Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-6-34b-q5-k-m.gguf (24.32 GB) image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB) Total Size: 25.02 GB
Minimum VRAM	24.96 GB

llava-v1-6-34b-q4-k-m

Name	LLaVA V1.6 34B (Q4-K-M) Image Captioning
Author	Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-6-34b-q4-k-m.gguf (20.66 GB) image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB) Total Size: 21.36 GB
Minimum VRAM	21.88 GB

llava-v1-6-34b-q3-k-m

Name	LLaVA V1.6 34B (Q3-K-M) Image Captioning
Author	Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License
Files	visual-question-answering-llava-v1-6-34b-q3-k-m.gguf (16.65 GB) image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB) Total Size: 17.35 GB
Minimum VRAM	18.06 GB

moondream-v2 (default)

Name	Moondream V2 Image Captioning
Author	Vikhyat Korrapati Published in Hugging Face, vol. 10.57967/hf/3219, “Moondream2”, 2024 https://huggingface.co/vikhyatk/moondream2
License	Apache License 2.0
Files	visual-question-answering-moondream-v2.fp16.gguf (2.84 GB) image-encoding-clip-moondream-v2-mmproj.fp16.gguf (909.78 MB) Total Size: 3.75 GB
Minimum VRAM	4.44 GB
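
The Minimum VRAM figures above can be used to shortlist image-captioning models for a given GPU. The following is a minimal sketch of that lookup, not part of Taproot itself: the dictionary `MIN_VRAM_GB` and the function `models_that_fit` are hypothetical names introduced for illustration, and the numbers are copied by hand from the entries whose headings appear in this section. Ordering by VRAM requirement is only a rough stand-in for model capability.

```python
# Minimum VRAM (GB) copied by hand from the image-captioning entries above.
# This table is an illustrative assumption, not an API provided by the catalog.
MIN_VRAM_GB = {
    "llava-v1-5-7b-q3-k-m": 6.33,
    "llava-v1-5-13b": 17.51,
    "llava-v1-5-13b-q6-k": 14.54,
    "llava-v1-5-13b-q5-k-m": 13.17,
    "llava-v1-5-13b-q4-0": 11.48,
    "llava-v1-6-34b-q5-k-m": 24.96,
    "llava-v1-6-34b-q4-k-m": 21.88,
    "llava-v1-6-34b-q3-k-m": 18.06,
    "moondream-v2": 4.44,
}


def models_that_fit(available_gb):
    """Return model names whose listed minimum VRAM is within the budget.

    Results are ordered from the highest VRAM requirement to the lowest,
    on the rough assumption that heavier variants are more capable.
    """
    fitting = [
        (vram, name)
        for name, vram in MIN_VRAM_GB.items()
        if vram <= available_gb
    ]
    return [name for vram, name in sorted(fitting, reverse=True)]


if __name__ == "__main__":
    # Example: shortlist candidates for a GPU with 12 GB of VRAM.
    for name in models_that_fit(12.0):
        print(name, MIN_VRAM_GB[name], "GB")
```

With a 12 GB budget, the shortlist contains the 13B Q4-0 variant, the 7B Q3-K-M variant, and moondream-v2. Budgets below 4.44 GB return an empty list.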