Replit Inference Server

This is a simple inference server that loads GGML versions of Replit models (e.g. https://huggingface.co/teknium/Replit-v2-CodeInstruct-3B and its GGML conversion: https://huggingface.co/abacaj/Replit-v2-CodeInstruct-3B-ggml) and exposes them through an OpenAI-compatible REST API, so they can be used as a backend model for the continue.dev VS Code extension.
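In practice, "OpenAI compatible" means existing OpenAI client code can simply be pointed at this server. The sketch below uses the legacy openai Python client (0.x) as an illustration; the port (8000) and the model name are assumptions, not values confirmed by this repository — check python app.py --help for the actual host and port.

    import openai

    # Point the client at the local server instead of api.openai.com.
    # Port 8000 is an assumption; check `python app.py --help`.
    openai.api_base = "http://localhost:8000/v1"
    openai.api_key = "not-needed"  # local server; any placeholder works

    # "replit-v2-codeinstruct-3b" is a hypothetical model name.
    response = openai.Completion.create(
        model="replit-v2-codeinstruct-3b",
        prompt="### Instruction:\nReverse a string in Python.\n\n### Response:\n",
        max_tokens=128,
        temperature=0.2,
    )
    print(response["choices"][0]["text"])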

Setting up

  • Install miniconda3 (to create a virtual environment); virtualenv or poetry work as well
  • Create and activate an environment
  • pip install -r requirements.txt
  • Run python app.py (use python app.py --help to see the available flags); a smoke-test sketch follows this list
  • Install continue.dev from the VS Code extensions marketplace
  • Follow the instructions at https://continue.dev/docs/customization#local-models-with-ggml to configure Continue to use GGML models, but point it at this server instead of their "5 minute quickstart" server
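Before configuring the extension, it is worth confirming that the server responds at all. Below is a minimal smoke test using the requests library; the port and the /v1/completions path follow the standard OpenAI completions route and are assumptions here — adjust them to match the flags you started app.py with.

    import requests

    # Standard OpenAI-style completions endpoint; host/port are assumptions.
    url = "http://localhost:8000/v1/completions"
    payload = {"prompt": "def fibonacci(n):", "max_tokens": 64}

    resp = requests.post(url, json=payload, timeout=60)
    resp.raise_for_status()  # fail loudly if the server returned an error
    print(resp.json()["choices"][0]["text"])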

About

This creates a minimal, OpenAI-API-compatible inference server to run models supported by ctransformers (e.g. Replit, WizardCoder).
