- This is a CPU-only, queue-based inference system for running local LLMs under constrained resources.
- I built this out of curiosity, to learn how a queue and LLM inference on CPU fit together (see the sketch below).
- I made this README.md as readable as possible; I hope it helps.
About
LLM inference pipeline with async workload orchestration.
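As a rough illustration of the queue-plus-CPU-inference idea (a minimal sketch, not the actual code in this repo), here is one way to orchestrate the workload with asyncio: a bounded queue feeds a single worker, and the blocking model call is pushed to a thread so the event loop stays responsive. The `run_inference` stub is a hypothetical stand-in for a real local-model call (for example, a llama.cpp binding); the queue size is illustrative.

```python
import asyncio

async def run_inference(prompt: str) -> str:
    # HYPOTHETICAL stand-in for the real model call (e.g. a llama.cpp
    # binding). The blocking, CPU-bound generation is offloaded to a
    # thread so the event loop stays free while tokens are produced.
    return await asyncio.to_thread(lambda: f"echo: {prompt}")

async def worker(queue: asyncio.Queue) -> None:
    # Single worker: a constrained CPU never runs two generations at once.
    while True:
        prompt, future = await queue.get()
        try:
            future.set_result(await run_inference(prompt))
        except Exception as exc:
            future.set_exception(exc)
        finally:
            queue.task_done()

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue(maxsize=8)  # bounded for back-pressure
    worker_task = asyncio.create_task(worker(queue))

    futures = []
    for prompt in ["hello", "world"]:
        fut = asyncio.get_running_loop().create_future()
        await queue.put((prompt, fut))  # suspends here when the queue is full
        futures.append(fut)

    for result in await asyncio.gather(*futures):
        print(result)
    worker_task.cancel()

if __name__ == "__main__":
    asyncio.run(main())
```

The bounded queue gives natural back-pressure: producers wait at `put()` instead of piling up requests the CPU can never serve in time.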