This is a simple inference server that loads ggml versions of Replit models (e.g. https://huggingface.co/teknium/Replit-v2-CodeInstruct-3B and its ggml conversion, https://huggingface.co/abacaj/Replit-v2-CodeInstruct-3B-ggml) and exposes them as an OpenAI-compatible REST API, so that they can be used as a backend model for the continue.dev VS Code extension.
References:

- https://github.com/abacaj/replit-3B-inference/blob/main/inference.py (using `ctransformers` for inference of Replit models)
- marella/ctransformers#26 (comment) (using `ctransformers` to wrap models in an API)
- https://huggingface.co/spaces/matthoffner/wizardcoder-ggml/blob/main/main.py (using `ctransformers` to wrap models in an API)
- https://github.com/continuedev/continue/blob/main/continuedev/src/continuedev/libs/llm/ggml.py (the endpoints used by the `GGML` class in the continue.dev extension)
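
Putting those references together, the core of such a server is small: load the ggml weights with `ctransformers` and expose an OpenAI-style completions route. Below is a minimal sketch, not the actual `app.py`: the model path and the response fields are assumptions, and only a simplified subset of the OpenAI completions format is shown.

```python
# Minimal sketch of an OpenAI-style completions server backed by ctransformers.
# The model path is an assumption -- point it at wherever the ggml weights
# were downloaded. Only a simplified subset of the OpenAI response shape
# is returned.
from ctransformers import AutoModelForCausalLM
from fastapi import FastAPI
from pydantic import BaseModel

MODEL_PATH = "models/replit-v2-codeinstruct-3b.q4_1.bin"  # assumed local path

llm = AutoModelForCausalLM.from_pretrained(MODEL_PATH, model_type="replit")
app = FastAPI()

class CompletionRequest(BaseModel):
    prompt: str
    max_tokens: int = 256
    temperature: float = 0.2

@app.post("/v1/completions")
def completions(req: CompletionRequest):
    # ctransformers models are callable and return the generated text
    text = llm(req.prompt, max_new_tokens=req.max_tokens, temperature=req.temperature)
    return {
        "object": "text_completion",
        "choices": [{"index": 0, "text": text, "finish_reason": "stop"}],
    }
```

A sketch like this can be served with `uvicorn sketch:app --port 8000`; the real `app.py` is run directly and takes flags (see the setup steps below).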
- Install `miniconda3` (to create a virtual environment); `virtualenv` or `poetry` work as well
- Create an environment and install the dependencies: `pip install -r requirements.txt`
- Run `python app.py`, or use `python app.py --help` to see the available flags (a quick smoke test is shown after this list)
- Install `continue` from the VS Code extension marketplace
- Use the instructions at https://continue.dev/docs/customization#local-models-with-ggml to configure `continue` to use ggml models. However, instead of their "5 minute quickstart" server, point it at this server (see the config sketch after this list)
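
Once the server is running, a quick way to sanity-check the completions endpoint from Python (assuming the default `http://localhost:8000` host/port and the request shape from the sketch above):

```python
import requests

# Assumes the server is listening on localhost:8000 with the /v1/completions
# route from the sketch above.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={"prompt": "def fibonacci(n):", "max_tokens": 128},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```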
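
For the last step, pointing `continue` at a local ggml server went through `~/.continue/config.py` at the time of writing. The sketch below is hedged: the import paths and the `server_url` keyword are assumptions based on the docs linked above and the `ggml.py` source, and they may differ between `continue` versions, so check the current docs for the exact shape.

```python
# ~/.continue/config.py -- hedged sketch; the import paths and the GGML
# kwargs are assumptions that may have changed across continue versions.
from continuedev.src.continuedev.core.config import ContinueConfig
from continuedev.src.continuedev.core.models import Models
from continuedev.src.continuedev.libs.llm.ggml import GGML

config = ContinueConfig(
    models=Models(
        # Point continue at this server instead of the "5 minute quickstart" one.
        default=GGML(server_url="http://localhost:8000"),
    ),
)
```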