Maximum of 26 GB VRAM
Without quantization, we can run models of up to 7B parameters: in fp16 a 7B model needs roughly 14 GB for weights alone, leaving headroom within 26 GB for activations and the KV cache. A loading sketch follows the list below.
- togethercomputer/LLaMA-2-7B-32K (we can use the 13B version quantized)
- WizardLM/WizardLM-7B-V1.0 (we can use the 30B version quantized)
- ehartford/dolphin-llama2-7b
- jondurbin/airoboros-l2-7b-gpt4-2.0 (we can use the 13B version quantized)
- lmsys/vicuna-7b-v1.5-16k (we can use the 13B version quantized)
- conceptofmind/Hermes-LLongMA-2-7b-8k
- stabilityai/StableBeluga-7B
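
A minimal sketch of loading one of the 7B models above in fp16 with Hugging Face transformers (the model id is just one example from the list; `device_map="auto"` assumes accelerate is installed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-7b-v1.5-16k"  # example: any 7B model from the list above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~14 GB of weights for 7B params, fits in 26 GB
    device_map="auto",
)

inputs = tokenizer("Hello,", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```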
Quantized models
- iambestfeed/open_llama_3b_4bit_128g
- TheBloke/airoboros-13B-GPTQ
- reeducator/bluemoonrp-13b
- TehVenom/Metharme-13b-4bit-GPTQ
- TheBloke/gpt4-x-vicuna-13B-GPTQ
- TheBloke/GPT4All-13B-snoozy-GGML / TheBloke/GPT4All-13B-snoozy-GPTQ
- TheBloke/guanaco-33B-GPTQ
- TheBloke/h2ogpt-oasst1-512-30B-GPTQ
- TheBloke/koala-13B-GPTQ-4bit-128g
- TheBloke/Llama-2-13B-GPTQ
- TheBloke/Manticore-13B-GPTQ
- TheBloke/Nous-Hermes-13B-GPTQ
- TheBloke/tulu-30B-GPTQ
- TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g
- TheBloke/WizardLM-30B-Uncensored-GPTQ
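
To load one of the GPTQ checkpoints above, a minimal sketch with the auto-gptq package (the repo id and device string are illustrative; `use_safetensors` assumes the repo ships safetensors weights):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/Llama-2-13B-GPTQ"  # example: any GPTQ repo from the list
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# Load the pre-quantized 4-bit weights directly onto the GPU.
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",
    use_safetensors=True,  # assumption: the repo provides .safetensors weights
)

inputs = tokenizer("Hello,", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```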
To quantize a model ourselves, we can try AutoGPTQ: https://github.com/PanQiWei/AutoGPTQ (a minimal sketch follows).
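
A minimal quantization sketch based on AutoGPTQ's documented API (the model id, calibration text, and output directory are placeholders; real calibration normally uses a few hundred examples, not one):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_dir = "togethercomputer/LLaMA-2-7B-32K"  # placeholder: any fp16 model
quantized_dir = "llama-2-7b-32k-4bit"               # placeholder output path

tokenizer = AutoTokenizer.from_pretrained(pretrained_dir, use_fast=True)

# Calibration data; a single example here only to keep the sketch short.
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # quantization group size, matching the -128g repos above
)

model = AutoGPTQForCausalLM.from_pretrained(pretrained_dir, quantize_config)
model.quantize(examples)             # run GPTQ calibration
model.save_quantized(quantized_dir)  # write the quantized checkpoint
```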