This is a simple inference server that loads ggml versions of Replit models (e.g. https://huggingface.co/teknium/Replit-v2-CodeInstruct-3B and its ggml conversion, https://huggingface.co/abacaj/Replit-v2-CodeInstruct-3B-ggml) and exposes them as an OpenAI-compatible REST API, so that they can be used as a backend model for the continue.dev VS Code extension.
References:

- https://github.com/abacaj/replit-3B-inference/blob/main/inference.py (using `ctransformers` for inference of Replit models)
- marella/ctransformers#26 (comment) (using `ctransformers` to wrap models in an API)
- https://huggingface.co/spaces/matthoffner/wizardcoder-ggml/blob/main/main.py (using `ctransformers` to wrap models in an API)
- https://github.com/continuedev/continue/blob/main/continuedev/src/continuedev/libs/llm/ggml.py (the endpoints used by the `GGML` class in the continue.dev extension)
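
Putting those references together, the core of such a server is small: load the ggml weights with `ctransformers` and expose an OpenAI-style completions route. Below is a minimal sketch, not the actual `app.py`: the model path and the response fields are assumptions, and only a simplified subset of the OpenAI completions format is shown.

```python
# Minimal sketch of an OpenAI-style completions server backed by ctransformers.
# The model path is an assumption -- point it at wherever the ggml weights
# were downloaded. Only a simplified subset of the OpenAI response shape
# is returned.
from ctransformers import AutoModelForCausalLM
from fastapi import FastAPI
from pydantic import BaseModel

MODEL_PATH = "models/replit-v2-codeinstruct-3b.q4_1.bin"  # assumed local path

llm = AutoModelForCausalLM.from_pretrained(MODEL_PATH, model_type="replit")
app = FastAPI()

class CompletionRequest(BaseModel):
    prompt: str
    max_tokens: int = 256
    temperature: float = 0.2

@app.post("/v1/completions")
def completions(req: CompletionRequest):
    # ctransformers models are callable and return the generated text
    text = llm(req.prompt, max_new_tokens=req.max_tokens, temperature=req.temperature)
    return {
        "object": "text_completion",
        "choices": [{"index": 0, "text": text, "finish_reason": "stop"}],
    }
```

A sketch like this can be served with `uvicorn sketch:app --port 8000`; the real `app.py` is run directly and takes flags (see the setup steps below).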
- Install `miniconda3` (to create a virtual environment); `virtualenv` or `poetry` work as well
- Create an environment and install the dependencies: `pip install -r requirements.txt`
- Run `python app.py`, or use `python app.py --help` to see the available flags (a quick smoke test is shown after this list)
- Install `continue` from the VS Code extension marketplace
- Use the instructions at https://continue.dev/docs/customization#local-models-with-ggml to configure `continue` to use ggml models. However, instead of their "5 minute quickstart" server, point it at this server (see the config sketch after this list)
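
Once the server is running, a quick way to sanity-check the completions endpoint from Python (assuming the default `http://localhost:8000` host/port and the request shape from the sketch above):

```python
import requests

# Assumes the server is listening on localhost:8000 with the /v1/completions
# route from the sketch above.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={"prompt": "def fibonacci(n):", "max_tokens": 128},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```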
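
For the last step, pointing `continue` at a local ggml server went through `~/.continue/config.py` at the time of writing. The sketch below is hedged: the import paths and the `server_url` keyword are assumptions based on the docs linked above and the `ggml.py` source, and they may differ between `continue` versions, so check the current docs for the exact shape.

```python
# ~/.continue/config.py -- hedged sketch; the import paths and the GGML
# kwargs are assumptions that may have changed across continue versions.
from continuedev.src.continuedev.core.config import ContinueConfig
from continuedev.src.continuedev.core.models import Models
from continuedev.src.continuedev.libs.llm.ggml import GGML

config = ContinueConfig(
    models=Models(
        # Point continue at this server instead of the "5 minute quickstart" one.
        default=GGML(server_url="http://localhost:8000"),
    ),
)
```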