I'm excited to try this project (it actually might run on my GPU!)
I found this project in a roundabout way: a news article led me to vllm-project/vllm, and when it turned out I didn't have the hardware for that, I searched for alternatives and found this one.
Do you think this could run faster using ideas from vllm-project/vllm?