Maximum of 26 GB VRAM
Without quantization, we can run models of up to 7B parameters: in fp16 a 7B model needs roughly 14 GB for weights alone, leaving headroom within 26 GB for activations and the KV cache. A loading sketch follows the list below.
- togethercomputer/LLaMA-2-7B-32K (we can use the 13B version quantized)
- WizardLM/WizardLM-7B-V1.0 (we can use the 30B version quantized)
- ehartford/dolphin-llama2-7b
- jondurbin/airoboros-l2-7b-gpt4-2.0 (we can use the 13B version quantized)
- lmsys/vicuna-7b-v1.5-16k (we can use the 13B version quantized)
- conceptofmind/Hermes-LLongMA-2-7b-8k
- stabilityai/StableBeluga-7B
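
A minimal sketch of loading one of the 7B models above in fp16 with Hugging Face transformers (the model id is just one example from the list; `device_map="auto"` assumes accelerate is installed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-7b-v1.5-16k"  # example: any 7B model from the list above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~14 GB of weights for 7B params, fits in 26 GB
    device_map="auto",
)

inputs = tokenizer("Hello,", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```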
Quantized models
- iambestfeed/open_llama_3b_4bit_128g
- TheBloke/airoboros-13B-GPTQ
- reeducator/bluemoonrp-13b
- TehVenom/Metharme-13b-4bit-GPTQ
- TheBloke/gpt4-x-vicuna-13B-GPTQ
- TheBloke/GPT4All-13B-snoozy-GGML / TheBloke/GPT4All-13B-snoozy-GPTQ
- TheBloke/guanaco-33B-GPTQ
- TheBloke/h2ogpt-oasst1-512-30B-GPTQ
- TheBloke/koala-13B-GPTQ-4bit-128g
- TheBloke/Llama-2-13B-GPTQ
- TheBloke/Manticore-13B-GPTQ
- TheBloke/Nous-Hermes-13B-GPTQ
- TheBloke/tulu-30B-GPTQ
- TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g
- TheBloke/WizardLM-30B-Uncensored-GPTQ
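
To load one of the GPTQ checkpoints above, a minimal sketch with the auto-gptq package (the repo id and device string are illustrative; `use_safetensors` assumes the repo ships safetensors weights):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/Llama-2-13B-GPTQ"  # example: any GPTQ repo from the list
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# Load the pre-quantized 4-bit weights directly onto the GPU.
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",
    use_safetensors=True,  # assumption: the repo provides .safetensors weights
)

inputs = tokenizer("Hello,", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```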
To quantize a model ourselves, we can try AutoGPTQ: https://github.com/PanQiWei/AutoGPTQ (a minimal sketch follows).
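
A minimal quantization sketch based on AutoGPTQ's documented API (the model id, calibration text, and output directory are placeholders; real calibration normally uses a few hundred examples, not one):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_dir = "togethercomputer/LLaMA-2-7B-32K"  # placeholder: any fp16 model
quantized_dir = "llama-2-7b-32k-4bit"               # placeholder output path

tokenizer = AutoTokenizer.from_pretrained(pretrained_dir, use_fast=True)

# Calibration data; a single example here only to keep the sketch short.
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # quantization group size, matching the -128g repos above
)

model = AutoGPTQForCausalLM.from_pretrained(pretrained_dir, quantize_config)
model.quantize(examples)             # run GPTQ calibration
model.save_quantized(quantized_dir)  # write the quantized checkpoint
```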