Open
Description
Current status
We currently support only a limited set of engines. Many potentially relevant engines are missing: some were previously integrated but dropped, while others are newer alternatives with promising performance.
- Tier 1 (High Priority)
  - vLLM
  - torch+transformers (true baseline)
  - TRT-LLM (NVIDIA's, main competitor)
- Tier 2 (Medium Priority)
  - SGLang
  - any other inference backend (w/ server-mode) that could compete w/ vLLM
- Tier 3
  - optimum-habana (Gaudi2/3)
  - TGI/TGIS
  - llama.cpp
  - LMDeploy
  - any other backend that might implement a few optimization techniques that we should have a look at
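For tracking purposes, the tiers above could be captured as data so that scripts or CI can iterate over the candidates in priority order. The following is only a minimal sketch: `Tier`, `EngineCandidate`, `CANDIDATES`, and `by_tier` are hypothetical names that do not exist in the framework, and the open-ended "any other backend" entries are left out because they do not name a specific engine.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Priority tiers as listed in this issue (hypothetical type)."""
    TIER_1 = 1  # High Priority
    TIER_2 = 2  # Medium Priority
    TIER_3 = 3  # no priority label given in the issue


@dataclass(frozen=True)
class EngineCandidate:
    """One named engine from the list above (hypothetical type)."""
    name: str
    tier: Tier
    note: str = ""


# Named engines only; notes are copied from the issue text.
CANDIDATES = [
    EngineCandidate("vLLM", Tier.TIER_1),
    EngineCandidate("torch+transformers", Tier.TIER_1, "true baseline"),
    EngineCandidate("TRT-LLM", Tier.TIER_1, "NVIDIA's, main competitor"),
    EngineCandidate("SGLang", Tier.TIER_2),
    EngineCandidate("optimum-habana", Tier.TIER_3, "Gaudi2/3"),
    EngineCandidate("TGI/TGIS", Tier.TIER_3),
    EngineCandidate("llama.cpp", Tier.TIER_3),
    EngineCandidate("LMDeploy", Tier.TIER_3),
]


def by_tier(candidates):
    """Group engine names by tier, preserving the listed order."""
    grouped = {tier: [] for tier in Tier}
    for candidate in candidates:
        grouped[candidate.tier].append(candidate.name)
    return grouped


if __name__ == "__main__":
    for tier, names in by_tier(CANDIDATES).items():
        print(f"Tier {tier.value}: {', '.join(names)}")
```

Keeping the list as plain data means a newly proposed engine can be added, or moved between tiers, with a one-line change.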
Plan to be initiated
- This issue serves as a tracking placeholder. A detailed implementation plan will be developed after an initial investigation of each engine's integration requirements and compatibility with our framework.