| Backend | Model | Features | Acceleration | Chronicler |
|---|---|---|---|---|
| pytorch | llama_orig | - adapter support<br>- visual input (multimodal adapter) | CUDA, MPS | instruct |
| pytorch | llama_hf | - LoRA support | CUDA | instruct |
| pytorch | gpt-2, gpt-j, auto-model | | CUDA | instruct |
| llama.cpp, remote-lcpp | any llama-based | - quantized GGML model support<br>- LoRA support<br>- built-in GPU acceleration<br>- memory manager support<br>- visual input (currently only in the llama.cpp server) | CPU, CUDA, Metal | instruct |
| mlc-pb | only prebuilt in mlc-chat | - quantized MLC model support | CUDA, Vulkan, Metal | raw |
| remote_ob | any supported by Oobabooga webui and kobold.cpp | - all features of Oobabooga webui available via its API, including GPTQ support<br>- all features of Kobold.cpp available via its API<br>- memory manager support | + | instruct |