
Draft of models to benchmark #2

@betogaona7


Constraint: maximum of 26 GB of VRAM.

Without quantization, we can run models of up to ~7B parameters: fp16 weights cost about 2 bytes per parameter, so a 7B model needs roughly 14 GB and leaves headroom for activations and the KV cache, while a 13B model at fp16 already needs ~26 GB. Candidates (see the loading sketch after this list):

  • togethercomputer/LLaMA-2-7B-32K (we can use the 13B version quantized)
  • WizardLM/WizardLM-7B-V1.0 (we can use the 30B version quantized; a 70B version, WizardLM/WizardLM-70B-V1.0, also exists)
  • ehartford/dolphin-llama2-7b
  • jondurbin/airoboros-l2-7b-gpt4-2.0 (we can use the 13B version quantized)
  • lmsys/vicuna-7b-v1.5-16k (we can use the 13B version quantized)
  • conceptofmind/Hermes-LLongMA-2-7b-8k
  • stabilityai/StableBeluga-7B
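
As a sanity check, here is a minimal sketch of loading one of the 7B candidates in fp16 with transformers and measuring what it actually allocates (the model pick and the memory check are illustrative, not part of the benchmark plan):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-7b-v1.5-16k"  # any 7B candidate from the list above

tokenizer = AutoTokenizer.from_pretrained(model_id)
# fp16 weights cost ~2 bytes/param, so ~14 GB for 7B params;
# device_map="auto" (requires accelerate) places the layers on the GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Rough check against the 26 GB budget
print(f"Weights allocated: {torch.cuda.memory_allocated() / 1024**3:.1f} GiB")
```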

Quantized models (loading sketch after the list)

  • iambestfeed/open_llama_3b_4bit_128g
  • TheBloke/airoboros-13B-GPTQ
  • reeducator/bluemoonrp-13b
  • TehVenom/Metharme-13b-4bit-GPTQ
  • TheBloke/gpt4-x-vicuna-13B-GPTQ
  • TheBloke/GPT4All-13B-snoozy-GGML / TheBloke/GPT4All-13B-snoozy-GPTQ
  • TheBloke/guanaco-33B-GPTQ
  • TheBloke/h2ogpt-oasst1-512-30B-GPTQ
  • TheBloke/koala-13B-GPTQ-4bit-128g
  • TheBloke/Llama-2-13B-GPTQ
  • TheBloke/Manticore-13B-GPTQ
  • TheBloke/Nous-Hermes-13B-GPTQ
  • TheBloke/tulu-30B-GPTQ
  • TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g
  • TheBloke/WizardLM-30B-Uncensored-GPTQ
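
A sketch of loading one of these prequantized checkpoints with AutoGPTQ (the model pick is arbitrary, and per-repo settings such as use_safetensors are assumptions that should be checked against each model card):

```python
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "TheBloke/Nous-Hermes-13B-GPTQ"  # any GPTQ repo from the list above

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
# Loads the already-quantized weights directly; no calibration needed.
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",
    use_safetensors=True,
)

prompt = "Explain GPTQ quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```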

To quantize a model ourselves, we can try https://github.com/PanQiWei/AutoGPTQ.
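
A minimal quantization sketch adapted from the AutoGPTQ README; the target model, output directory, and the single calibration example are placeholders (a real run should use a few hundred in-domain calibration samples):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_dir = "lmsys/vicuna-13b-v1.5-16k"  # hypothetical 13B target
quantized_dir = "vicuna-13b-v1.5-16k-4bit-128g"

tokenizer = AutoTokenizer.from_pretrained(pretrained_dir, use_fast=True)

# 4-bit with group size 128 matches most of the TheBloke checkpoints above.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(pretrained_dir, quantize_config)

# Calibration data drives the GPTQ weight reconstruction.
examples = [tokenizer("auto-gptq is an easy-to-use model quantization library.")]
model.quantize(examples)

model.save_quantized(quantized_dir, use_safetensors=True)
```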
