Task Catalog

Benjamin Paine edited this page Feb 14, 2025 · 4 revisions

18 tasks available with 193 models.

echo

Name Echo
Author Benjamin Paine
Taproot
https://github.com/painebenjamin/taproot
License Apache License 2.0
Files N/A
Minimum VRAM N/A

image-similarity

(default)

Name Traditional Image Similarity
Author Benjamin Paine
Taproot
https://github.com/painebenjamin/taproot
License Apache License 2.0
Files N/A
Minimum VRAM N/A

inception-v3

Name Inception Image Similarity (FID)
Author Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens and Zbigniew Wojna
Google Research and University College London
Published in CoRR, vol. 1512.00567, “Rethinking the Inception Architecture for Computer Vision”, 2015
https://arxiv.org/abs/1512.00567
License Apache License 2.0
Files image-similarity-inception.fp16.safetensors
Minimum VRAM 50.28 MB

text-similarity

Name Traditional Text Similarity
Author Benjamin Paine
Taproot
https://github.com/painebenjamin/taproot
License Apache License 2.0
Files N/A
Minimum VRAM N/A

speech-enhancement

deep-filter-net-v3 (default)

Name DeepFilterNet V3 Speech Enhancement
Author Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B. and Andreas Maier
Published in INTERSPEECH, “DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement”, 2023
https://arxiv.org/abs/2305.08227
License Apache License 2.0
Files speech-enhancement-deep-filter-net-3.safetensors
Minimum VRAM 8.76 MB

image-interpolation

film (default)

Name Frame Interpolation for Large Motion (FiLM) Image Interpolation
Author Fitsum Reda, Janne Kontkanen, Eric Tabellion, Deqing Sun, Caroline Pantofaru and Brian Curless
Google Research and University of Washington
Published in ECCV, “FiLM: Frame Interpolation for Large Motion”, 2022
https://arxiv.org/abs/2202.04901
License Apache License 2.0
Files image-interpolation-film-net.fp16.pt
Minimum VRAM 70.00 MB

rife

Name Real-Time Intermediate Flow Estimation (RIFE) Image Interpolation
Author Zhewei Huang, Tianyuan Zhang, Wen Heng, Boxin Shi and Shuchang Zhou
Megvii Research, NERCVT, School of Computer Science, Peking University, Institute for Artificial Intelligence, Peking University and Beijing Academy of Artificial Intelligence
Published in ECCV, “Real-Time Intermediate Flow Estimation for Video Frame Interpolation”, 2022
https://arxiv.org/abs/2011.06294
License MIT License
Files image-interpolation-rife-flownet.safetensors
Minimum VRAM 22.68 MB

background-removal

backgroundremover (default)

Name BackgroundRemover
Author Johnathan Nader, Lucas Nestler, Dr. Tim Scarfe and Daniel Gatis
https://github.com/nadermx/backgroundremover
License Apache License 2.0
Files background-removal-u2net.safetensors
Minimum VRAM 217.62 MB

super-resolution

(default)

Name Traditional Super Resolution
Author Benjamin Paine
Taproot
https://github.com/painebenjamin/taproot
Implementation by Pillow
License Apache License 2.0
Files N/A
Minimum VRAM N/A

aura

Name Aura Super Resolution
Author fal.ai
Published in fal.ai blog, “Introducing AuraSR - An open reproduction of the GigaGAN Upscaler”, 2024
https://blog.fal.ai/introducing-aurasr-an-open-reproduction-of-the-gigagan-upscaler-2/
License CC BY-SA 4.0
Files super-resolution-aura.fp16.safetensors
Minimum VRAM 1.24 GB

aura-v2

Name Aura Super Resolution V2
Author fal.ai
Published in fal.ai blog, “AuraSR V2”, 2024
https://blog.fal.ai/aurasr-v2/
License CC BY-SA 4.0
Files super-resolution-aura-v2.fp16.safetensors
Minimum VRAM 1.24 GB

speech-synthesis

xtts-v2 (default)

Name XTTS2 Speech Synthesis
Author Coqui AI
Published in Coqui AI Blog, “XTTS: Open Model Release Announcement”, 2023
https://coqui.ai/blog/tts/open_xtts
License Mozilla Public License 2.0
Files
  1. speech-synthesis-xtts-v2.safetensors (1.87 GB)
  2. speech-synthesis-xtts-v2-speakers.pth (7.75 MB)
  3. speech-synthesis-xtts-v2-vocab.json (361.22 KB)

Total Size: 1.88 GB

Minimum VRAM 1.91 GB
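
The "Total Size" figure can be reproduced from the per-file sizes listed above. The following is a minimal sketch, assuming the catalog uses binary units (1 KB = 1024 B); the helper names `parse_size` and `format_size` are hypothetical and not part of Taproot. Because the listed sizes are already rounded, the result is only expected to agree to two decimal places.

```python
# Hypothetical helpers for checking a catalog entry's "Total Size".
# Assumption: sizes use binary units (1 KB = 1024 B).
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(text: str) -> float:
    """Convert a catalog size string such as '7.75 MB' to bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def format_size(num_bytes: float) -> str:
    """Format a byte count using the largest unit that fits."""
    for unit in ("GB", "MB", "KB"):
        if num_bytes >= UNITS[unit]:
            return f"{num_bytes / UNITS[unit]:.2f} {unit}"
    return f"{num_bytes:.2f} B"


# File sizes copied from the xtts-v2 entry above.
xtts_v2_files = ["1.87 GB", "7.75 MB", "361.22 KB"]
total_bytes = sum(parse_size(size) for size in xtts_v2_files)
print(format_size(total_bytes))  # 1.88 GB
```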

f5tts

Name F5TTS Speech Synthesis
Author Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu and Xie Chen
Published in arXiv, vol. 2410.06885, “F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching”, 2024
https://arxiv.org/abs/2410.06885
License CC BY-NC 4.0
Files
  1. speech-synthesis-f5tts.safetensors (1.35 GB)
  2. speech-synthesis-f5tts-vocab.txt (11.26 KB)
  3. audio-vocoder-vocos-mel-24khz.safetensors (54.35 MB)
  4. audio-vocoder-vocos-mel-24khz-config.yaml (461.00 B)

Total Size: 1.40 GB

Minimum VRAM 707.16 MB

kokoro

Name Kokoro Speech Synthesis
Author @rzvzn, Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler and Nima Mesgarani
https://huggingface.co/hexgrad/Kokoro-82M
License Apache License 2.0
Files
  1. speech-synthesis-kokoro-v0-19.safetensors (327.12 MB)
  2. speech-synthesis-kokoro-v0-19-voices.safetensors (5.23 MB)

Total Size: 332.35 MB

Minimum VRAM 332.54 MB

zonos-hybrid

Name Zonos Hybrid Speech Synthesis
Author Zyphra Team
https://www.zyphra.com/post/beta-release-of-zonos-v0-1
License Apache License 2.0
Files
  1. speech-synthesis-zonos-hybrid-v0-1.bf16.safetensors (3.30 GB)
  2. audio-vocoder-descript-44khz.safetensors (306.51 MB)
  3. audio-diarisation-zonos-speaker-embedding.safetensors (396.35 MB)

Total Size: 4.01 GB

Minimum VRAM 4.04 GB

zonos-transformer

Name Zonos Transformer Speech Synthesis
Author Zyphra Team
https://www.zyphra.com/post/beta-release-of-zonos-v0-1
License Apache License 2.0
Files
  1. speech-synthesis-zonos-transformer-v0-1.bf16.safetensors (3.25 GB)
  2. audio-vocoder-descript-44khz.safetensors (306.51 MB)
  3. audio-diarisation-zonos-speaker-embedding.safetensors (396.35 MB)

Total Size: 3.95 GB

Minimum VRAM 4.04 GB

audio-transcription

whisper-tiny

Name Whisper Tiny Audio Transcription
Author Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License Apache License 2.0
Files
  1. audio-transcription-whisper-tiny.safetensors (151.06 MB)
  2. audio-transcription-whisper-tokenizer-vocab.json (835.55 KB)
  3. audio-transcription-whisper-tokenizer-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer.json (2.48 MB)

Total Size: 154.92 MB

Minimum VRAM 147.85 MB

whisper-base

Name Whisper Base Audio Transcription
Author Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License Apache License 2.0
Files
  1. audio-transcription-whisper-base.safetensors (290.40 MB)
  2. audio-transcription-whisper-tokenizer-vocab.json (835.55 KB)
  3. audio-transcription-whisper-tokenizer-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer.json (2.48 MB)

Total Size: 294.27 MB

Minimum VRAM 285.74 MB

whisper-small

Name Whisper Small Audio Transcription
Author Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License Apache License 2.0
Files
  1. audio-transcription-whisper-small.safetensors (967.00 MB)
  2. audio-transcription-whisper-tokenizer-vocab.json (835.55 KB)
  3. audio-transcription-whisper-tokenizer-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer.json (2.48 MB)

Total Size: 970.86 MB

Minimum VRAM 945.03 MB

whisper-medium

Name Whisper Medium Audio Transcription
Author Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License Apache License 2.0
Files
  1. audio-transcription-whisper-medium.safetensors (3.06 GB)
  2. audio-transcription-whisper-tokenizer-vocab.json (835.55 KB)
  3. audio-transcription-whisper-tokenizer-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer.json (2.48 MB)

Total Size: 3.06 GB

Minimum VRAM 3.06 GB

whisper-large-v3

Name Whisper Large V3 Audio Transcription
Author Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License Apache License 2.0
Files
  1. audio-transcription-whisper-large-v3.fp16.safetensors (3.09 GB)
  2. audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB)
  3. audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer-v3.json (2.48 MB)

Total Size: 3.09 GB

Minimum VRAM 3.09 GB

distilled-whisper-small-english

Name Distilled Whisper Small (English) Audio Transcription
Author Sanchit Gandhi, Patrick von Platen and Alexander M. Rush
Hugging Face
Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023
https://arxiv.org/abs/2311.00430
License Apache License 2.0
Files
  1. audio-transcription-distilled-whisper-small-english.safetensors (332.30 MB)
  2. audio-transcription-distilled-whisper-english-tokenizer-vocab.json (999.19 KB)
  3. audio-transcription-distilled-whisper-english-tokenizer-merges.txt (456.32 KB)
  4. audio-transcription-distilled-whisper-english-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-distillled-whisper-english-tokenizer.json (2.41 MB)

Total Size: 336.21 MB

Minimum VRAM 649.01 MB

distilled-whisper-medium-english

Name Distilled Whisper Medium (English) Audio Transcription
Author Sanchit Gandhi, Patrick von Platen and Alexander M. Rush
Hugging Face
Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023
https://arxiv.org/abs/2311.00430
License Apache License 2.0
Files
  1. audio-transcription-distilled-whisper-medium-english.safetensors (788.80 MB)
  2. audio-transcription-distilled-whisper-english-tokenizer-vocab.json (999.19 KB)
  3. audio-transcription-distilled-whisper-english-tokenizer-merges.txt (456.32 KB)
  4. audio-transcription-distilled-whisper-english-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-distillled-whisper-english-tokenizer.json (2.41 MB)

Total Size: 792.71 MB

Minimum VRAM 1.58 GB

distilled-whisper-large-v3 (default)

Name Distilled Whisper Large V3 Audio Transcription
Author Sanchit Gandhi, Patrick von Platen and Alexander M. Rush
Hugging Face
Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023
https://arxiv.org/abs/2311.00430
License Apache License 2.0
Files
  1. audio-transcription-distilled-whisper-large-v3.fp16.safetensors (1.51 GB)
  2. audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB)
  3. audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer-v3.json (2.48 MB)

Total Size: 1.52 GB

Minimum VRAM 1.51 GB

turbo-whisper-large-v3

Name Turbo Whisper Large V3 Audio Transcription
Author Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License Apache License 2.0
Files
  1. audio-transcription-whisper-large-v3-turbo.fp16.safetensors (1.62 GB)
  2. audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB)
  3. audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer-v3.json (2.48 MB)

Total Size: 1.62 GB

Minimum VRAM 1.62 GB
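
The "Minimum VRAM" values above can be used to shortlist audio-transcription models for a given GPU. The following is a minimal sketch that uses only the figures listed in this section; the function name `models_within_budget` is hypothetical, and the conversion assumes 1 GB = 1024 MB. Actual memory use at runtime may exceed these minimums.

```python
# Minimum VRAM per audio-transcription model, in MB, copied from the
# entries above. Assumption: 1 GB = 1024 MB.
MIN_VRAM_MB = {
    "whisper-tiny": 147.85,
    "whisper-base": 285.74,
    "whisper-small": 945.03,
    "whisper-medium": 3.06 * 1024,
    "whisper-large-v3": 3.09 * 1024,
    "distilled-whisper-small-english": 649.01,
    "distilled-whisper-medium-english": 1.58 * 1024,
    "distilled-whisper-large-v3": 1.51 * 1024,
    "turbo-whisper-large-v3": 1.62 * 1024,
}


def models_within_budget(budget_mb: float) -> list[str]:
    """Return model names whose minimum VRAM fits the budget, smallest first."""
    fitting = [(vram, name) for name, vram in MIN_VRAM_MB.items() if vram <= budget_mb]
    return [name for _, name in sorted(fitting)]


# Models that list a minimum VRAM of at most 1 GB.
print(models_within_budget(1024))
```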

depth-detection

midas (default)

Name MiDaS Depth Detection
Author René Ranftl, Alexey Bochkovskiy and Vladlen Koltun
Published in arXiv, vol. 2103.13413, “Vision Transformers for Dense Prediction”, 2021
https://arxiv.org/abs/2103.13413
License MIT License
Files depth-detection-midas.fp16.safetensors
Minimum VRAM 255.65 MB

line-detection

informative-drawings (default)

Name Informative Drawings Line Art Detection
Author Caroline Chan, Fredo Durand and Phillip Isola
Massachusetts Institute of Technology
Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022
https://arxiv.org/abs/2203.12691
License MIT License
Files line-detection-informative-drawings.fp16.safetensors
Minimum VRAM 8.58 MB

informative-drawings-coarse

Name Informative Drawings Coarse Line Art Detection
Author Caroline Chan, Fredo Durand and Phillip Isola
Massachusetts Institute of Technology
Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022
https://arxiv.org/abs/2203.12691
License MIT License
Files line-detection-informative-drawings-coarse.fp16.safetensors
Minimum VRAM 8.58 MB

informative-drawings-anime

Name Informative Drawings Anime Line Art Detection
Author Caroline Chan, Fredo Durand and Phillip Isola
Massachusetts Institute of Technology
Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022
https://arxiv.org/abs/2203.12691
License MIT License
Files line-detection-informative-drawings-anime.fp16.safetensors
Minimum VRAM 108.81 MB

mlsd

Name Mobile Line Segment Detection
Author Geonmo Gu, Byungsoo Ko, SeongHyun Go, Sung-Hyun Lee, Jingeun Lee and Minchul Shin
NAVER/LINE Vision
Published in arXiv, vol. 2106.00186, “Towards Light-weight and Real-time Line Segment Detection”, 2022
https://arxiv.org/abs/2106.00186
License Apache License 2.0
Files line-detection-mlsd.fp16.safetensors
Minimum VRAM 3.22 MB

edge-detection

canny (default)

Name Canny Edge Detection
Author John Canny
Published in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 679-698, “A Computational Approach to Edge Detection”, 1986
https://ieeexplore.ieee.org/document/4767851
Implementation by OpenCV
License Apache License 2.0
Files N/A
Minimum VRAM N/A

hed

Name Holistically-Nested Edge Detection
Author Saining Xie and Zhuowen Tu
University of California, San Diego
Published in arXiv, vol. 1504.06375, “Holistically-Nested Edge Detection”, 2015
https://arxiv.org/abs/1504.06375
License Apache License 2.0
Files edge-detection-hed.fp16.safetensors
Minimum VRAM 29.44 MB

pidi

Name Soft Edge (PIDI) Detection
Author Zhuo Su, Wenzhe Liu, Zitong Yu, Dewen Hu, Qing Liao, Qi Tian, Matti Pietikäinen and Li Liu
Published in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5117-5127, “Pixel Difference Networks for Efficient Edge Detection”, 2021
License MIT License with Non-Commercial Clause
Files edge-detection-pidi.fp16.safetensors
Minimum VRAM 1.40 MB

pose-detection

openpose

Name OpenPose Pose Detection
Author Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei and Yaser Sheikh
Published in arXiv, vol. 1812.08008, “OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields”, 2018
https://arxiv.org/abs/1812.08008
License OpenPose Academic or Non-Profit Non-Commercial Research License
Files pose-detection-openpose.fp16.safetensors
Minimum VRAM 259.96 MB

dwpose (default)

Name DWPose Pose Detection
Author Zhendong Yang, Ailing Zeng, Chun Yuan and Yu Li
Tsinghua Shenzhen International Graduate School and International Digital Economy Academy (IDEA)
Published in arXiv, vol. 2307.15880, “Effective Whole-body Pose Estimation with Two-stages Distillation”, 2023
https://arxiv.org/abs/2307.15880
License Apache License 2.0
Files
  1. pose-detection-dwpose-estimation.safetensors (134.65 MB)
  2. pose-detection-dwpose-detection.safetensors (217.20 MB)

Total Size: 351.85 MB

Minimum VRAM 354.64 MB

image-generation

stable-diffusion-v1-5

Name Stable Diffusion v1.5 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
License OpenRAIL-M License
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-abyssorange-mix-v3

Name AbyssOrange Mix V3 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by liudinglin
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-abyssorange-mix-v3-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-abyssorange-mix-v3-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB
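
Several files in the two entries above have identical names (the VAE and the CLIP tokenizer files). The following is a minimal sketch of how much disk space both models would need if identically named files are stored only once; that deduplication is an assumption made for illustration, not a documented Taproot behavior. Sizes are copied from the file lists and assume binary units.

```python
# Assumption: files with identical names are stored once on disk.
KB, MB, GB = 1024, 1024 ** 2, 1024 ** 3

# Files listed under both stable-diffusion-v1-5 and the AbyssOrange Mix V3 variant.
shared_files = {
    "image-generation-stable-diffusion-v1-5-vae.fp16.safetensors": 167.34 * MB,
    "text-encoding-clip-vit-l-tokenizer-vocab.json": 1.06 * MB,
    "text-encoding-clip-vit-l-tokenizer-special-tokens-map.json": 588.0,
    "text-encoding-clip-vit-l-tokenizer-merges.txt": 524.62 * KB,
}

base_model = {
    **shared_files,
    "image-generation-stable-diffusion-v1-5-unet.fp16.safetensors": 1.72 * GB,
    "text-encoding-clip-vit-l.bf16.safetensors": 246.14 * MB,
}

abyssorange = {
    **shared_files,
    "image-generation-stable-diffusion-v1-5-abyssorange-mix-v3-unet.fp16.safetensors": 1.72 * GB,
    "image-generation-stable-diffusion-v1-5-abyssorange-mix-v3-text-encoder.fp16.safetensors": 246.14 * MB,
}

# File names that appear in both entries.
common_names = base_model.keys() & abyssorange.keys()

# Merging the dicts keeps one copy of each file name.
combined = {**base_model, **abyssorange}

per_model_gb = sum(base_model.values()) / GB
combined_gb = sum(combined.values()) / GB
print(f"one model: {per_model_gb:.2f} GB, both (deduplicated): {combined_gb:.2f} GB")
```

Under this assumption, each additional v1.5 finetune adds roughly the size of its own UNet and text encoder rather than the full listed total.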

stable-diffusion-v1-5-chillout-mix-ni

Name Chillout Mix Ni Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Dreamlike Art
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-chillout-mix-ni-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-chillout-mix-ni-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-clarity-v3

Name Clarity V3 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by ndimensional
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-clarity-v3-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-clarity-v3-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-dark-sushi-mix-v2-25d

Name Dark Sushi Mix V2 2.5D Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Aitasai
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-dark-sushi-mix-v2-25d-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-dark-sushi-mix-v2-25d-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-divine-elegance-mix-v10

Name Divine Elegance Mix V10 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by TroubleDarkness
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-divine-elegance-mix-v10-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-divine-elegance-mix-v10-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-dreamshaper-v8

Name DreamShaper V8 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Lykon
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-dreamshaper-v8-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-dreamshaper-v8-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-epicphotogasm-ultimate-fidelity

Name epiCPhotoGasm Ultimate Fidelity Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by epinikion
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-epic-photogasm-ultimate-fidelity-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-epic-photogasm-ultimate-fidelity-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-epicrealism-v5

Name epiCRealism V5 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by epinikion
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-epicrealism-v5-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-epicrealism-v5-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-filmgirl-ultra

Name FilmGirl Ultra Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by LEOSAM
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-filmgirl-ultra-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-filmgirl-ultra-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-ghostmix-v2

Name GhostMix V2 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by _GhostInShell_
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-ghostmix-v2-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-ghostmix-v2-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-lyriel-v1-6

Name Lyriel V1.6 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Lyriel
License OpenRAIL-M License
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-lyriel-v1-6-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-lyriel-v1-6-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-majicmix-realistic-v7

Name MajicMix Realistic V7 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Merjic
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-majicmix-realistic-v7-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-majicmix-realistic-v7-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-meinamix-v12

Name MeinaMix V12 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Meina
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-meinamix-v12-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-meinamix-v12-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-mistoon-anime-v3

Name Mistoon Anime V3 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Inzaniak
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-mistoon-anime-v3-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-mistoon-anime-v3-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-perfect-world-v6

Name Perfect World V6 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Bloodsuga
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-perfect-world-v6-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-perfect-world-v6-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-photon-v1

Name Photon V1 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Photographer
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-photon-v1-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-photon-v1-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-realcartoon3d-v17

Name RealCartoon3D V17 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by 7whitefire7
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-realcartoon3d-v17-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-realcartoon3d-v17-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-realistic-vision-v5-1

Name Realistic Vision V5.1 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by SG_161222
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-realistic-vision-v5-1-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-realistic-vision-v5-1-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-realistic-vision-v6-0

Name Realistic Vision V6.0 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by SG_161222
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-realistic-vision-v6-0-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-realistic-vision-v6-0-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-rev-animated-v2

Name ReV Animated V2 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Zovya
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-rev-animated-v2-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-rev-animated-v2-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-serenity-v2-1

Name Serenity V2.1 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by malcolmrey
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-serenity-v2-1-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-serenity-v2-1-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-v1-5-toonyou-beta-v6

Name ToonYou Beta V6 Image Generation
Author Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Bradcatt
License OpenRAIL-M License with Addendum
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-toonyou-beta-v6-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-toonyou-beta-v6-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM 2.58 GB

stable-diffusion-xl

Name Stable Diffusion XL Image Generation
Author Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License OpenRAIL++-M License
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-base-unet.fp16.safetensors (5.14 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-albedobase-v3-1

Name AlbedoBase XL V3.1 Image Generation
Author Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License OpenRAIL++-M License with Addendum
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-albedo-base-v3-1-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-albedo-base-v3-1-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-albedo-base-v3-1-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-anything

Name Anything XL Image Generation
Author Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License OpenRAIL++-M License
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-anything-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-anything-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-anything-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-animagine-v3-1

Name Animagine XL V3.1 Image Generation
Author Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License OpenRAIL++-M License with Addendum
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-animagine-v3-1-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-animagine-v3-1-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-animagine-v3-1-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-copax-timeless-v13

Name Copax TimeLess V13 Image Generation
Author Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License OpenRAIL++-M License with Addendum
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-copax-timeless-v13-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-copax-timeless-v13-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-copax-timeless-v13-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-counterfeit-v2-5

Name CounterfeitXL V2.5 Image Generation
Author Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License OpenRAIL++-M License with Addendum
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-counterfeit-v2-5-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-counterfeit-v2-5-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-counterfeit-v2-5-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-dreamshaper-alpha-v2

Name DreamShaper XL Alpha V2 Image Generation
Author Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License OpenRAIL++-M License with Addendum
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-helloworld-v7

Name LEOSAM's HelloWorld XL Image Generation
Author Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License OpenRAIL++-M License with Addendum
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-hello-world-v7-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-hello-world-v7-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-hello-world-v7-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-juggernaut-v11 (default)

Name Juggernaut XL V11 Image Generation
Author Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License OpenRAIL++-M License with Addendum
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-juggernaut-v11-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-juggernaut-v11-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-juggernaut-v11-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-lightning-8-step

Name Stable Diffusion XL Lightning (8-Step)
Author Shanchuan Lin, Anran Wang and Xiao Yang
ByteDance Inc.
Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024
https://arxiv.org/abs/2402.13929
License OpenRAIL++-M License
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-lightning-unet-8-step.fp16.safetensors (5.14 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-lightning-4-step

Name Stable Diffusion XL Lightning (4-Step)
Author Shanchuan Lin, Anran Wang and Xiao Yang
ByteDance Inc.
Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024
https://arxiv.org/abs/2402.13929
License OpenRAIL++-M License
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-lightning-unet-4-step.fp16.safetensors (5.14 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-lightning-2-step

Name Stable Diffusion XL Lightning (2-Step)
Author Shanchuan Lin, Anran Wang and Xiao Yang
ByteDance Inc.
Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024
https://arxiv.org/abs/2402.13929
License OpenRAIL++-M License
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-lightning-unet-2-step.fp16.safetensors (5.14 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-nightvision-v9

Name NightVision XL V9 Image Generation
Author Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License OpenRAIL++-M License with Addendum
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-nightvision-v9-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-nightvision-v9-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-nightvision-v9-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-realvis-v5

Name RealVisXL V5 Image Generation
Author Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License OpenRAIL++-M License with Addendum
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-realvis-v5-0-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-realvis-v5-0-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-realvis-v5-0-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-stoiqo-newreality-pro

Name Stoiqo New Reality XL Pro Image Generation
Author Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License OpenRAIL++-M License with Addendum
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-stoiqo-newreality-pro-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-stoiqo-newreality-pro-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-stoiqo-newreality-pro-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-turbo

Name Stable Diffusion XL Turbo Image Generation
Author Axel Sauer, Dominik Lorenz, Andreas Blattmann and Robin Rombach
Stability AI
Published in Stability AI Blog, “Adversarial Diffusion Distillation”, 2024
https://stability.ai/research/adversarial-diffusion-distillation
License Stability AI Community License
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-turbo-unet.fp16.safetensors (5.14 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-unstable-diffusers-nihilmania

Name SDXL Unstable Diffusers NihilMania Image Generation
Author Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License OpenRAIL++-M License with Addendum
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-xl-zavychroma-v10

Name ZavyChromaXL V10 Image Generation
Author Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License OpenRAIL++-M License with Addendum
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-zavychroma-v10-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-zavychroma-v10-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-zavychroma-v10-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM 7.06 GB

stable-diffusion-v3-medium

Name Stable Diffusion V3 (Medium) Image Generation
Author Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License Stability AI Community License Agreement
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-transformer.fp16.safetensors (4.17 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 15.50 GB

Minimum VRAM 17.86 GB

stable-diffusion-v3-5-medium

Name Stable Diffusion V3.5 (Medium) Image Generation
Author Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License Stability AI Community License Agreement
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-medium-transformer.bf16.safetensors (4.94 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 16.27 GB

Minimum VRAM 18.36 GB

stable-diffusion-v3-5-medium-int8

Name Stable Diffusion V3.5 (Medium) Image Generation (Int8)
Author Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License Stability AI Community License Agreement
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-medium-transformer.int8.bf16.safetensors (2.70 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 10.41 GB

Minimum VRAM 14.85 GB

stable-diffusion-v3-5-large

Name Stable Diffusion V3.5 (Large) Image Generation
Author Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License Stability AI Community License Agreement
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-large-transformer.part-1.bf16.safetensors (9.99 GB)
  3. image-generation-stable-diffusion-v3-5-large-transformer.part-2.bf16.safetensors (6.31 GB)
  4. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  5. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  6. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  7. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  8. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  9. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  10. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  11. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  12. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  13. text-encoding-t5-xxl-vocab.model (791.66 KB)
  14. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 27.62 GB

Minimum VRAM 31.36 GB

stable-diffusion-v3-5-large-absynth-v1-9

Name Stable Diffusion V3.5 (Large) Image Generation
Author Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License Stability AI Community License Agreement
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-large-absynth-v1-9-transformer.fp16.safetensors (16.29 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 27.62 GB

Minimum VRAM 31.36 GB

stable-diffusion-v3-5-large-absynth-v2-0

Name Stable Diffusion V3.5 (Large) Image Generation
Author Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License Stability AI Community License Agreement
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-large-absynth-v2-0-transformer.fp16.safetensors (16.29 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 27.62 GB

Minimum VRAM 31.36 GB

stable-diffusion-v3-5-large-int8

Name Stable Diffusion V3.5 (Large) Image Generation (Int8)
Author Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License Stability AI Community License Agreement
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-large-transformer.int8.bf16.safetensors (8.25 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 15.96 GB

Minimum VRAM 16.85 GB

stable-diffusion-v3-5-large-absynth-v1-9-int8

Name Stable Diffusion V3.5 (Large) Image Generation (Int8)
Author Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License Stability AI Community License Agreement
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-large-absynth-v1-9-transformer.int8.fp16.safetensors (8.25 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 15.96 GB

Minimum VRAM 16.85 GB

stable-diffusion-v3-5-large-absynth-v2-0-int8

Name Stable Diffusion V3.5 (Large) Image Generation (Int8)
Author Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License Stability AI Community License Agreement
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-large-absynth-v2-0-transformer.int8.fp16.safetensors (8.25 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 15.96 GB

Minimum VRAM 16.85 GB

stable-diffusion-v3-5-large-nf4

Name Stable Diffusion V3.5 (Large) Image Generation (NF4)
Author Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License Stability AI Community License Agreement
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-large-transformer.nf4.bf16.safetensors (4.72 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 12.85 GB

Minimum VRAM 12.99 GB

stable-diffusion-v3-5-large-absynth-v1-9-nf4

Name Stable Diffusion V3.5 (Large) Image Generation (NF4)
Author Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License Stability AI Community License Agreement
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-large-absynth-v1-9-transformer.nf4.fp16.safetensors (4.72 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 12.85 GB

Minimum VRAM 12.99 GB

stable-diffusion-v3-5-large-absynth-v2-0-nf4

Name Stable Diffusion V3.5 (Large) Image Generation (NF4)
Author Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License Stability AI Community License Agreement
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-large-absynth-v2-0-transformer.nf4.fp16.safetensors (4.72 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 12.85 GB

Minimum VRAM 12.99 GB

stable-diffusion-v3-5-medium-absynth-v2-0

Name Stable Diffusion V3.5 (Medium) Image Generation
Author Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License Stability AI Community License Agreement
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-medium-absynth-v2-0-transformer.fp16.safetensors (4.94 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 16.27 GB

Minimum VRAM 18.36 GB

flux-v1-dev

Name FluxDev
Author Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License FLUX.1 Non-Commercial License
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-dev-transformer.bf16.safetensors (23.80 GB)

Total Size: 33.74 GB

Minimum VRAM 29.50 GB

flux-v1-dev-int8

Name FluxDevInt8
Author Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License FLUX.1 Non-Commercial License
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-dev-transformer.int8.bf16.safetensors (11.92 GB)

Total Size: 18.24 GB

Minimum VRAM 21.22 GB

flux-v1-dev-stoiqo-newreality-alpha-v2-int8

Name Stoiqo NewReality F1.D Alpha V2 (Int8) Image Generation
Author Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License FLUX.1 Non-Commercial License
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-dev-stoiqo-newreality-alpha-v2-transformer.int8.fp16.safetensors (11.92 GB)

Total Size: 18.24 GB

Minimum VRAM 21.22 GB

flux-v1-dev-nf4

Name FluxDevNF4
Author Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License FLUX.1 Non-Commercial License
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-dev-transformer.nf4.bf16.safetensors (6.70 GB)

Total Size: 13.44 GB

Minimum VRAM 14.36 GB

flux-v1-dev-stoiqo-newreality-alpha-v2-nf4

Name Stoiqo NewReality F1.D Alpha V2 (NF4) Image Generation
Author Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License FLUX.1 Non-Commercial License
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-dev-stoiqo-newreality-alpha-v2-transformer.nf4.fp16.safetensors (6.70 GB)

Total Size: 13.44 GB

Minimum VRAM 14.36 GB

flux-v1-schnell

Name FluxSchnell
Author Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License FLUX.1 Non-Commercial License
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-schnell-transformer.bf16.safetensors (23.78 GB)

Total Size: 33.72 GB

Minimum VRAM 29.50 GB

flux-v1-schnell-int8

Name FluxSchnellInt8
Author Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License FLUX.1 Non-Commercial License
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-schnell-transformer.int8.bf16.safetensors (11.91 GB)

Total Size: 18.23 GB

Minimum VRAM 21.22 GB

flux-v1-schnell-sigma-vision-alpha-int8

Name Sigma Vision F1.S Alpha (Int8) Image Generation
Author Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License FLUX.1 Non-Commercial License
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-dev-sigma-vision-alpha-transformer.int8.fp16.safetensors (11.91 GB)

Total Size: 18.23 GB

Minimum VRAM 21.22 GB

flux-v1-schnell-nf4

Name FluxSchnellNF4
Author Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License FLUX.1 Non-Commercial License
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-schnell-transformer.nf4.bf16.safetensors (6.69 GB)

Total Size: 13.44 GB

Minimum VRAM 14.36 GB
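
The Minimum VRAM figures can guide the choice between full-precision and quantized variants. The sketch below is illustrative only: `pick_variant` is a hypothetical helper, not a Taproot function, and it assumes that the variant with the highest VRAM requirement that still fits is the preferred one (that is, the least aggressively quantized).

```python
from typing import Optional

# Minimum VRAM in GB, copied from the FLUX dev entries above.
FLUX_DEV_MIN_VRAM_GB = {
    "flux-v1-dev": 29.50,
    "flux-v1-dev-int8": 21.22,
    "flux-v1-dev-nf4": 14.36,
}


def pick_variant(available_gb: float, min_vram: dict[str, float]) -> Optional[str]:
    """Return the variant with the largest requirement that still fits, or None."""
    fitting = {name: need for name, need in min_vram.items() if need <= available_gb}
    if not fitting:
        return None
    # Assumption: a higher requirement means less aggressive quantization.
    return max(fitting, key=fitting.get)


for budget in (32.0, 24.0, 16.0, 8.0):
    print(budget, "GB ->", pick_variant(budget, FLUX_DEV_MIN_VRAM_GB))
```

In practice the available budget should leave headroom for other processes sharing the GPU.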

video-generation

cogvideox-2b

Name CogVideoX 2B Video Generation
Author Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License CogVideoX License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-cog-transformer-2b.fp16.safetensors (3.39 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 13.34 GB

Minimum VRAM 13.48 GB

cogvideox-2b-int8

Name CogVideoX 2B Video Generation (Int8)
Author Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License CogVideoX License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-cog-transformer-2b.int8.fp16.safetensors (1.70 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 8.04 GB

Minimum VRAM 11.48 GB

cogvideox-5b

Name CogVideoX 5B Video Generation
Author Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License CogVideoX License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-cog-transformer-5b.fp16.safetensors (11.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 21.10 GB

Minimum VRAM 21.48 GB

cogvideox-5b-int8

Name CogVideoX 5B Video Generation (Int8)
Author Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License CogVideoX License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-cog-transformer-5b.int8.fp16.safetensors (5.58 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 11.92 GB

Minimum VRAM 17.48 GB

cogvideox-5b-nf4

Name CogVideoX 5B Video Generation (NF4)
Author Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License CogVideoX License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-cog-transformer-5b.nf4.fp16.safetensors (3.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 9.90 GB

Minimum VRAM 12.48 GB
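
The effect of quantization on the CogVideoX 5B transformer can be read off the file sizes in the three entries above. The snippet below is a small arithmetic sketch using only those listed figures; `size_ratio` is a hypothetical helper name, and the ratios describe file size on disk rather than runtime memory.

```python
# Transformer file sizes in GB, copied from the cogvideox-5b,
# cogvideox-5b-int8 and cogvideox-5b-nf4 entries above.
TRANSFORMER_GB = {"fp16": 11.14, "int8": 5.58, "nf4": 3.14}


def size_ratio(variant: str, baseline: str = "fp16") -> float:
    """Return the variant's file size as a fraction of the baseline's."""
    return TRANSFORMER_GB[variant] / TRANSFORMER_GB[baseline]


for name in TRANSFORMER_GB:
    print(f"{name}: {size_ratio(name):.2f} of the fp16 file size")
```

The NF4 file is larger than a quarter of the fp16 file, which suggests that not every tensor is stored in 4-bit form; the catalog itself does not say which tensors are left unquantized.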

cogvideox-i2v-5b

Name CogVideoX 5B Image-to-Video Generation
Author Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License CogVideoX License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-cog-i2v-transformer-5b.fp16.safetensors (11.25 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 21.21 GB

Minimum VRAM 21.48 GB

cogvideox-i2v-5b-int8

Name CogVideoX 5B Image-to-Video Generation (Int8)
Author Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License CogVideoX License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-cog-i2v-transformer-5b.fp16.safetensors (11.25 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 17.59 GB

Minimum VRAM 17.48 GB

cogvideox-i2v-5b-nf4

Name CogVideoX 5B Image-to-Video Generation (NF4)
Author Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License CogVideoX License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-cog-i2v-transformer-5b.nf4.fp16.safetensors (3.25 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 10.01 GB

Minimum VRAM 12.48 GB

cogvideox-v1-5-5b

Name CogVideoX V1.5 5B Video Generation
Author Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License CogVideoX License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-cog-v1-5-transformer-5b.fp16.safetensors (11.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 21.10 GB

Minimum VRAM 21.48 GB

cogvideox-v1-5-5b-int8

Name CogVideoX V1.5 5B Video Generation (Int8)
Author Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License CogVideoX License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-cog-v1-5-transformer-5b.int8.fp16.safetensors (5.59 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 11.92 GB

Minimum VRAM 17.48 GB

cogvideox-v1-5-5b-nf4

Name CogVideoX V1.5 5B Video Generation (NF4)
Author Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License CogVideoX License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-cog-v1-5-transformer-5b.nf4.fp16.safetensors (3.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 9.90 GB

Minimum VRAM 12.48 GB

cogvideox-v1-5-i2v-5b

Name CogVideoX V1.5 5B Image-to-Video Generation
Author Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License CogVideoX License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-cog-v1-5-i2v-transformer-5b.fp16.safetensors (11.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 21.10 GB

Minimum VRAM 21.48 GB

cogvideox-v1-5-i2v-5b-int8

Name CogVideoX V1.5 5B Image-to-Video Generation (Int8)
Author Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License CogVideoX License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-cog-v1-5-i2v-transformer-5b.int8.fp16.safetensors (5.59 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 11.92 GB

Minimum VRAM 17.48 GB

cogvideox-v1-5-i2v-5b-nf4

Name CogVideoX V1.5 5B Image-to-Video Generation (NF4)
Author Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License CogVideoX License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-cog-v1-5-i2v-transformer-5b.nf4.fp16.safetensors (3.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 9.90 GB

Minimum VRAM 12.48 GB

hunyuan

Name Hunyuan Video Generation
Author Hunyuan Foundation Model Team
Tencent
Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024
https://arxiv.org/abs/2412.03603
License Tencent Hunyuan Community License
Files
  1. video-generation-hunyuan-vae.safetensors (985.94 MB)
  2. video-generation-hunyuan-transformer.bf16.safetensors (25.64 GB)
  3. text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB)
  4. text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-llava-llama-text-encoder.fp16.safetensors (15.01 GB)
  9. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)

Total Size: 41.90 GB

Minimum VRAM 38.30 GB

hunyuan-int8

Name Hunyuan Video Generation
Author Hunyuan Foundation Model Team
Tencent
Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024
https://arxiv.org/abs/2412.03603
License Tencent Hunyuan Community License
Files
  1. video-generation-hunyuan-vae.safetensors (985.94 MB)
  2. video-generation-hunyuan-transformer.int8.bf16.safetensors (12.84 GB)
  3. text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB)
  4. text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-llava-llama-text-encoder.int8.fp16.safetensors (8.04 GB)
  9. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)

Total Size: 22.13 GB

Minimum VRAM 23.30 GB

hunyuan-nf4

Name Hunyuan Video Generation
Author Hunyuan Foundation Model Team
Tencent
Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024
https://arxiv.org/abs/2412.03603
License Tencent Hunyuan Community License
Files
  1. video-generation-hunyuan-vae.safetensors (985.94 MB)
  2. video-generation-hunyuan-transformer.nf4.bf16.safetensors (7.22 GB)
  3. text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB)
  4. text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-llava-llama-text-encoder.nf4.fp16.safetensors (4.98 GB)
  9. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)

Total Size: 13.45 GB

Minimum VRAM 14.78 GB

ltx (default)

Name LTX Video Generation
Author Lightricks
https://github.com/Lightricks/LTX-Video
License OpenRAIL-M License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-ltx-transformer.bf16.safetensors (3.85 GB)
  5. video-generation-ltx-vae.safetensors (1.87 GB)

Total Size: 15.24 GB

Minimum VRAM 15.28 GB

ltx-int8

Name LTX Video Generation
Author Lightricks
https://github.com/Lightricks/LTX-Video
License OpenRAIL-M License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-ltx-transformer.int8.bf16.safetensors (1.93 GB)
  5. video-generation-ltx-vae.safetensors (1.87 GB)

Total Size: 9.70 GB

Minimum VRAM 9.72 GB

ltx-nf4

Name LTX Video Generation
Author Lightricks
https://github.com/Lightricks/LTX-Video
License OpenRAIL-M License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-ltx-transformer.nf4.bf16.safetensors (1.08 GB)
  5. video-generation-ltx-vae.safetensors (1.87 GB)

Total Size: 9.28 GB

Minimum VRAM 7.29 GB

mochi-v1

Name Mochi Video Generation
Author Genmo AI
Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024
https://www.genmo.ai/blog
License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-mochi-v1-preview-transformer.bf16.safetensors (20.06 GB)
  5. video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB)

Total Size: 30.50 GB

Minimum VRAM 22.95 GB

mochi-v1-int8

Name Mochi Video Generation
Author Genmo AI
Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024
https://www.genmo.ai/blog
License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-mochi-v1-preview-transformer.int8.bf16.safetensors (10.04 GB)
  5. video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB)

Total Size: 16.87 GB

Minimum VRAM 15.95 GB

mochi-v1-nf4

Name Mochi Video Generation
Author Genmo AI
Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024
https://www.genmo.ai/blog
License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-mochi-v1-preview-transformer.nf4.bf16.safetensors (5.64 GB)
  5. video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB)

Total Size: 12.89 GB

Minimum VRAM 12.41 GB

text-generation

deepseek-r1-llama-8b

Name DeepSeek R1 Llama 3 8B Text Generation
Author DeepSeek AI
Published in arXiv, vol. 2501.12948, “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”, 2025
https://arxiv.org/abs/2501.12948
License MIT, Meta Llama 3 Community License
Files text-generation-deepseek-r1-llama-8b-fp16.gguf
Minimum VRAM 16.20 GB

deepseek-r1-llama-8b-q8-0

Name DeepSeek R1 Llama 3 8B Text Generation (Q8-0)
Author DeepSeek AI
Published in arXiv, vol. 2501.12948, “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”, 2025
https://arxiv.org/abs/2501.12948
License MIT, Meta Llama 3 Community License
Files text-generation-deepseek-r1-llama-8b-q8-0.gguf
Minimum VRAM 9.45 GB

deepseek-r1-llama-8b-q6-k

Name DeepSeek R1 Llama 3 8B Text Generation (Q6-K)
Author DeepSeek AI
Published in arXiv, vol. 2501.12948, “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”, 2025
https://arxiv.org/abs/2501.12948
License MIT, Meta Llama 3 Community License
Files text-generation-deepseek-r1-llama-8b-q6-k.gguf
Minimum VRAM 7.73 GB

deepseek-r1-llama-8b-q5-k-m

Name DeepSeek R1 Llama 3 8B Text Generation (Q5-K-M)
Author DeepSeek AI
Published in arXiv, vol. 2501.12948, “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”, 2025
https://arxiv.org/abs/2501.12948
License MIT, Meta Llama 3 Community License
Files text-generation-deepseek-r1-llama-8b-q5-k-m.gguf
Minimum VRAM 6.96 GB

deepseek-r1-llama-8b-q4-k-m

Name DeepSeek R1 Llama 3 8B Text Generation (Q4-K-M)
Author DeepSeek AI
Published in arXiv, vol. 2501.12948, “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”, 2025
https://arxiv.org/abs/2501.12948
License MIT, Meta Llama 3 Community License
Files text-generation-deepseek-r1-llama-8b-q4-k-m.gguf
Minimum VRAM 6.24 GB

deepseek-r1-llama-8b-q3-k-m

Name DeepSeek R1 Llama 3 8B Text Generation (Q3-K-M)
Author DeepSeek AI
Published in arXiv, vol. 2501.12948, “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”, 2025
https://arxiv.org/abs/2501.12948
License MIT, Meta Llama 3 Community License
Files text-generation-deepseek-r1-llama-8b-q3-k-m.gguf
Minimum VRAM 5.44 GB

deepseek-r1-llama-8b-q2-k

Name DeepSeek R1 Llama 3 8B Text Generation (Q2-K)
Author DeepSeek AI
Published in arXiv, vol. 2501.12948, “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”, 2025
https://arxiv.org/abs/2501.12948
License MIT, Meta Llama 3 Community License
Files text-generation-deepseek-r1-llama-8b-q2-k.gguf
Minimum VRAM 4.71 GB

llama-v3-8b

Name Llama V3.0 8B Text Generation
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-8b-q8-0.gguf
Minimum VRAM 9.64 GB

llama-v3-8b-q6-k

Name Llama V3.0 8B Text Generation (Q6-K)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-8b-q6-k.gguf
Minimum VRAM 8.10 GB

llama-v3-8b-q5-k-m

Name Llama V3.0 8B Text Generation (Q5-K-M)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-8b-q5-k-m.gguf
Minimum VRAM 7.30 GB

llama-v3-8b-q4-k-m

Name Llama V3.0 8B Text Generation (Q4-K-M)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-8b-q4-k-m.gguf
Minimum VRAM 6.56 GB

llama-v3-8b-q3-k-m

Name Llama V3.0 8B Text Generation (Q3-K-M)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-8b-q3-k-m.gguf
Minimum VRAM 5.72 GB

llama-v3-8b-instruct

Name Llama V3.0 8B Instruct Text Generation
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-8b-instruct-q8-0.gguf
Minimum VRAM 9.64 GB

llama-v3-8b-instruct-q6-k

Name Llama V3.0 8B Instruct Text Generation (Q6-K)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-8b-instruct-q6-k.gguf
Minimum VRAM 8.10 GB

llama-v3-8b-instruct-q5-k-m

Name Llama V3.0 8B Instruct Text Generation (Q5-K-M)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-8b-instruct-q5-k-m.gguf
Minimum VRAM 7.30 GB

llama-v3-8b-instruct-q4-k-m

Name Llama V3.0 8B Instruct Text Generation (Q4-K-M)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-8b-instruct-q4-k-m.gguf
Minimum VRAM 6.56 GB

llama-v3-8b-instruct-q3-k-m

Name Llama V3.0 8B Instruct Text Generation (Q3-K-M)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-8b-instruct-q3-k-m.gguf
Minimum VRAM 5.72 GB

llama-v3-1-8b-instruct

Name Llama V3.1 8B Instruct Text Generation
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-1-8b-instruct-q8-0.gguf
Minimum VRAM 9.64 GB

llama-v3-1-8b-instruct-q6-k (default)

Name Llama V3.1 8B Instruct Text Generation (Q6-K)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-1-8b-instruct-q6-k.gguf
Minimum VRAM 8.10 GB

llama-v3-1-8b-instruct-q5-k-m

Name Llama V3.1 8B Instruct Text Generation (Q5-K-M)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-1-8b-instruct-q5-k-m.gguf
Minimum VRAM 7.30 GB

llama-v3-1-8b-instruct-q4-k-m

Name Llama V3.1 8B Instruct Text Generation (Q4-K-M)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-1-8b-instruct-q4-k-m.gguf
Minimum VRAM 6.56 GB

llama-v3-1-8b-instruct-q3-k-m

Name Llama V3.1 8B Instruct Text Generation (Q3-K-M)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-1-8b-instruct-q3-k-m.gguf
Minimum VRAM 5.72 GB

llama-v3-2-3b-instruct

Name Llama V3.2 3B Instruct Text Generation
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-2-3b-instruct-f16.gguf
Minimum VRAM 8.04 GB

llama-v3-2-3b-instruct-q8-0

Name Llama V3.2 3B Instruct Text Generation (Q8-0)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-2-3b-instruct-q8-0.gguf
Minimum VRAM 5.02 GB

llama-v3-2-3b-instruct-q6-k

Name Llama V3.2 3B Instruct Text Generation (Q6-K)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-2-3b-instruct-q6-k.gguf
Minimum VRAM 4.20 GB

llama-v3-2-3b-instruct-q5-k-m

Name Llama V3.2 3B Instruct Text Generation (Q5-K-M)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-2-3b-instruct-q5-k-m.gguf
Minimum VRAM 3.90 GB

llama-v3-2-3b-instruct-q4-k-m

Name Llama V3.2 3B Instruct Text Generation (Q4-K-M)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-2-3b-instruct-q4-k-m.gguf
Minimum VRAM 3.50 GB

llama-v3-2-3b-instruct-q3-k-l

Name Llama V3.2 3B Instruct Text Generation (Q3-K-L)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-2-3b-instruct-q3-k-l.gguf
Minimum VRAM 3.10 GB

llama-v3-2-1b-instruct

Name Llama V3.2 1B Instruct Text Generation
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-2-1b-instruct-f16.gguf
Minimum VRAM 3.60 GB

llama-v3-2-1b-instruct-q8-0

Name Llama V3.2 1B Instruct Text Generation (Q8-0)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-2-1b-instruct-q8-0.gguf
Minimum VRAM 2.43 GB

llama-v3-2-1b-instruct-q6-k

Name Llama V3.2 1B Instruct Text Generation (Q6-K)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-2-1b-instruct-q6-k.gguf
Minimum VRAM 2.15 GB

llama-v3-2-1b-instruct-q5-k-m

Name Llama V3.2 1B Instruct Text Generation (Q5-K-M)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-2-1b-instruct-q5-k-m.gguf
Minimum VRAM 2.02 GB

llama-v3-2-1b-instruct-q4-k-m

Name Llama V3.2 1B Instruct Text Generation (Q4-K-M)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-2-1b-instruct-q4-k-m.gguf
Minimum VRAM 1.64 GB

llama-v3-2-1b-instruct-q3-k-l

Name Llama V3.2 1B Instruct Text Generation (Q3-K-L)
Author Meta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
License Meta Llama 3 Community License
Files text-generation-llama-v3-2-1b-instruct-q3-k-l.gguf
Minimum VRAM 1.58 GB

zephyr-7b-alpha

Name Zephyr 7B α Text Generation (Q8)
Author Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
License MIT License
Files text-generation-zephyr-alpha-7b-q8-0.gguf
Minimum VRAM 9.40 GB

zephyr-7b-alpha-q6-k

Name Zephyr 7B α Text Generation (Q6-K)
Author Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
License MIT License
Files text-generation-zephyr-alpha-7b-q6-k.gguf
Minimum VRAM 8.20 GB

zephyr-7b-alpha-q5-k-m

Name Zephyr 7B α Text Generation (Q5-K-M)
Author Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
License MIT License
Files text-generation-zephyr-alpha-7b-q5-k-m.gguf
Minimum VRAM 7.25 GB

zephyr-7b-alpha-q4-k-m

Name Zephyr 7B α Text Generation (Q4-K-M)
Author Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
License MIT License
Files text-generation-zephyr-alpha-7b-q4-k-m.gguf
Minimum VRAM 6.30 GB

zephyr-7b-alpha-q3-k-m

Name Zephyr 7B α Text Generation (Q3-K-M)
Author Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
License MIT License
Files text-generation-zephyr-alpha-7b-q3-k-m.gguf
Minimum VRAM 5.35 GB

zephyr-7b-beta

Name Zephyr 7B β Text Generation
Author Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
License MIT License
Files text-generation-zephyr-beta-7b-q8-0.gguf
Minimum VRAM 9.40 GB

zephyr-7b-beta-q6-k

Name Zephyr 7B β Text Generation (Q6-K)
Author Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
License MIT License
Files text-generation-zephyr-beta-7b-q6-k.gguf
Minimum VRAM 8.20 GB

zephyr-7b-beta-q5-k-m

Name Zephyr 7B β Text Generation (Q5-K-M)
Author Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
License MIT License
Files text-generation-zephyr-beta-7b-q5-k-m.gguf
Minimum VRAM 7.25 GB

zephyr-7b-beta-q4-k-m

Name Zephyr 7B β Text Generation (Q4-K-M)
Author Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
License MIT License
Files text-generation-zephyr-beta-7b-q4-k-m.gguf
Minimum VRAM 6.30 GB

zephyr-7b-beta-q3-k-m

Name Zephyr 7B β Text Generation (Q3-K-M)
Author Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
License MIT License
Files text-generation-zephyr-beta-7b-q3-k-m.gguf
Minimum VRAM 5.35 GB

visual-question-answering

llava-v1-5-7b

Name LLaVA V1.5 7B Visual Question Answering
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-7b.fp16.gguf (13.48 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 14.10 GB

Minimum VRAM 15.80 GB

llava-v1-5-7b-q8

Name LLaVA V1.5 7B (Q8-0) Visual Question Answering
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-7b-q8-0.gguf (7.16 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 7.79 GB

Minimum VRAM 9.90 GB

llava-v1-5-7b-q6-k

Name LLaVA V1.5 7B (Q6-K) Visual Question Answering
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-7b-q6-k.gguf (5.53 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 6.15 GB

Minimum VRAM 8.40 GB

llava-v1-5-7b-q5-k-m

Name LLaVA V1.5 7B (Q5-K-M) Visual Question Answering
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-7b-q5-k-m.gguf (4.78 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 5.41 GB

Minimum VRAM 7.71 GB

llava-v1-5-7b-q4-k-m

Name LLaVA V1.5 7B (Q4-K-M) Visual Question Answering
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-7b-q4-k-m.gguf (4.08 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 4.71 GB

Minimum VRAM 7.04 GB

llava-v1-5-7b-q3-k-m

Name LLaVA V1.5 7B (Q3-K-M) Visual Question Answering
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-7b-q3-k-m.gguf (3.30 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 3.92 GB

Minimum VRAM 6.33 GB

llava-v1-5-13b

Name LLaVA V1.5 13B (Q8-0) Visual Question Answering
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-13b-q8-0.gguf (13.83 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 14.48 GB

Minimum VRAM 17.51 GB

llava-v1-5-13b-q6-k

Name LLaVA V1.5 13B (Q6-K) Visual Question Answering
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-13b-q6-k.gguf (10.68 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 11.32 GB

Minimum VRAM 14.54 GB

llava-v1-5-13b-q5-k-m

Name LLaVA V1.5 13B (Q5-K-M) Visual Question Answering
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-13b-q5-k-m.gguf (9.23 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 9.88 GB

Minimum VRAM 13.17 GB

llava-v1-5-13b-q4-0

Name LLaVA V1.5 13B (Q4-0) Visual Question Answering
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-13b-q4-0.gguf (7.37 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 8.01 GB

Minimum VRAM 11.48 GB

llava-v1-6-34b-q5-k-m

Name LLaVA V1.6 34B (Q5-K-M) Visual Question Answering
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-6-34b-q5-k-m.gguf (24.32 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 25.02 GB

Minimum VRAM 24.96 GB

llava-v1-6-34b-q4-k-m

Name LLaVA V1.6 34B (Q4-K-M) Visual Question Answering
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-6-34b-q4-k-m.gguf (20.66 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 21.36 GB

Minimum VRAM 21.88 GB

llava-v1-6-34b-q3-k-m

Name LLaVA V1.6 34B (Q3-K-M) Visual Question Answering
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-6-34b-q3-k-m.gguf (16.65 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 17.35 GB

Minimum VRAM 18.06 GB

moondream-v2 (default)

Name Moondream V2 Visual Question Answering
Author Vikhyat Korrapati
Published in Hugging Face, vol. 10.57967/hf/3219, “Moondream2”, 2024
https://huggingface.co/vikhyatk/moondream2
License Apache License 2.0
Files
  1. visual-question-answering-moondream-v2.fp16.gguf (2.84 GB)
  2. image-encoding-clip-moondream-v2-mmproj.fp16.gguf (909.78 MB)

Total Size: 3.75 GB

Minimum VRAM 4.44 GB

image-captioning

llava-v1-5-7b

Name LLaVA V1.5 7B Image Captioning
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-7b.fp16.gguf (13.48 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 14.10 GB

Minimum VRAM 15.80 GB

llava-v1-5-7b-q8

Name LLaVA V1.5 7B (Q8-0) Image Captioning
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-7b-q8-0.gguf (7.16 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 7.79 GB

Minimum VRAM 9.90 GB

llava-v1-5-7b-q6-k

Name LLaVA V1.5 7B (Q6-K) Image Captioning
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-7b-q6-k.gguf (5.53 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 6.15 GB

Minimum VRAM 8.40 GB

llava-v1-5-7b-q5-k-m

Name LLaVA V1.5 7B (Q5-K-M) Image Captioning
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-7b-q5-k-m.gguf (4.78 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 5.41 GB

Minimum VRAM 7.71 GB

llava-v1-5-7b-q4-k-m

Name LLaVA V1.5 7B (Q4-K-M) Image Captioning
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-7b-q4-k-m.gguf (4.08 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 4.71 GB

Minimum VRAM 7.04 GB

llava-v1-5-7b-q3-k-m

Name LLaVA V1.5 7B (Q3-K-M) Image Captioning
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-7b-q3-k-m.gguf (3.30 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 3.92 GB

Minimum VRAM 6.33 GB

llava-v1-5-13b

Name LLaVA V1.5 13B (Q8-0) Image Captioning
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-13b-q8-0.gguf (13.83 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 14.48 GB

Minimum VRAM 17.51 GB

llava-v1-5-13b-q6-k

Name LLaVA V1.5 13B (Q6-K) Image Captioning
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-13b-q6-k.gguf (10.68 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 11.32 GB

Minimum VRAM 14.54 GB

llava-v1-5-13b-q5-k-m

Name LLaVA V1.5 13B (Q5-K-M) Image Captioning
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-13b-q5-k-m.gguf (9.23 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 9.88 GB

Minimum VRAM 13.17 GB

llava-v1-5-13b-q4-0

Name LLaVA V1.5 13B (Q4-0) Image Captioning
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-5-13b-q4-0.gguf (7.37 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 8.01 GB

Minimum VRAM 11.48 GB

llava-v1-6-34b-q5-k-m

Name LLaVA V1.6 34B (Q5-K-M) Image Captioning
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-6-34b-q5-k-m.gguf (24.32 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 25.02 GB

Minimum VRAM 24.96 GB

llava-v1-6-34b-q4-k-m

Name LLaVA V1.6 34B (Q4-K-M) Image Captioning
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-6-34b-q4-k-m.gguf (20.66 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 21.36 GB

Minimum VRAM 21.88 GB

llava-v1-6-34b-q3-k-m

Name LLaVA V1.6 34B (Q3-K-M) Image Captioning
Author Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License Meta Llama 2 Community License
Files
  1. visual-question-answering-llava-v1-6-34b-q3-k-m.gguf (16.65 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 17.35 GB

Minimum VRAM 18.06 GB

moondream-v2 (default)

Name Moondream V2 Image Captioning
Author Vikhyat Korrapati
Published in Hugging Face, vol. 10.57967/hf/3219, “Moondream2”, 2024
https://huggingface.co/vikhyatk/moondream2
License Apache License 2.0
Files
  1. visual-question-answering-moondream-v2.fp16.gguf (2.84 GB)
  2. image-encoding-clip-moondream-v2-mmproj.fp16.gguf (909.78 MB)

Total Size: 3.75 GB

Minimum VRAM 4.44 GB