LLM inference pipeline with async workload orchestration

thenameisvicky/llm-wrapper

Description

  • This is a CPU-only, queue-based inference system for running local LLMs under constrained resources.
  • I built this out of curiosity, to learn how request queues and CPU-based LLM inference work together.
  • I made this README.md as readable as possible; I hope it helps.
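The core idea above — requests enter a queue and a single CPU worker drains them one at a time — can be sketched as follows. This is a minimal illustration, not the repo's actual code: `run_inference` is a hypothetical placeholder standing in for a real local-LLM call.

```python
import asyncio

async def run_inference(prompt: str) -> str:
    # Hypothetical stand-in for a real CPU-bound LLM call
    # (e.g. via llama.cpp bindings -- an assumption, not this repo's API).
    await asyncio.sleep(0)  # yield control, simulating work
    return f"echo: {prompt}"

async def worker(queue: asyncio.Queue, results: list) -> None:
    # Pull requests one at a time so only one inference runs at once,
    # keeping memory and CPU load bounded on constrained hardware.
    while True:
        prompt = await queue.get()
        results.append(await run_inference(prompt))
        queue.task_done()

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    task = asyncio.create_task(worker(queue, results))
    for p in ["hello", "world"]:
        await queue.put(p)       # enqueue incoming requests
    await queue.join()           # block until every request is processed
    task.cancel()                # shut the worker down cleanly
    return results

if __name__ == "__main__":
    print(asyncio.run(main()))
```

Serializing inference through a queue like this is what lets a single low-memory CPU box accept bursts of requests without running models concurrently.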

Architecture
