Let's help `--help` help devs.
`--help` should take under 200ms. Yet several popular LLM CLI tools take 10+ seconds, because they import torch/transformers just to print usage text.
| library | cold (ms) | warm (ms, 10 runs) | version | measured on |
|---|---|---|---|---|
| vllm | 15610 | 7473 | 0.14.1+cpu | 2026-01-28T21:50Z |
| sglang | 15662 | 6825 | v0.5.9 | 2026-02-25T11:16Z |
| VLMEvalKit | 13936 | 5076 | v0.2 | 2026-02-25T11:21Z |
| transformers | 0 | 0 | 5.2.0 | 2026-02-25T11:04Z |
| tensorrt-llm | 7467 | 2600 | 1.0.0 | 2026-02-25T11:14Z |
| datasets | 3074 | 907 | 4.5.0 | 2026-02-25T11:02Z |
| llm | 1205 | 569 | 0.28 | 2026-02-25T11:03Z |
| openai | 1053 | 504 | 2.24.0 | 2026-02-25T11:03Z |
| langchain-cli | 845 | 259 | 0.0.37 | 2026-02-25T11:03Z |
| hf | 990 | 331 | 1.4.1 | 2026-02-25T11:02Z |
| lm-eval | 800 | 228 | 0.4.11 | 2026-02-25T11:04Z |
| llama.cpp | 19 | 13 | b8149 | 2026-02-25T11:04Z |
| ollama | 15 | 15 | 0.17.0 | 2026-02-25T11:02Z |
Last updated: 2026-03-11 00:19 UTC
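
For reference, here is a minimal sketch of how numbers like these can be reproduced locally. It is not the script that produced the table: the `time_help` helper is hypothetical, "cold" here is simply the first invocation in this process (a true cold start would also flush OS file caches), and the warm figure is assumed to be the best of 10 repeats.

```python
#!/usr/bin/env python3
"""Rough sketch for timing `<cli> --help` latency (assumptions noted above)."""
import subprocess
import sys
import time


def time_help(cli: str, runs: int = 10) -> tuple[float, float]:
    """Return (cold_ms, warm_ms) for `cli --help`, assuming `cli` is on PATH."""

    def one_run() -> float:
        start = time.perf_counter()
        subprocess.run(
            [cli, "--help"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        return (time.perf_counter() - start) * 1000

    cold = one_run()                             # first call pays the import cost
    warm = min(one_run() for _ in range(runs))   # assumption: warm = best of N repeats
    return cold, warm


if __name__ == "__main__":
    cli = sys.argv[1] if len(sys.argv) > 1 else "ollama"
    cold_ms, warm_ms = time_help(cli)
    print(f"{cli}: cold {cold_ms:.0f}ms, warm {warm_ms:.0f}ms")
```

Run it as `python time_help.py vllm` (or any other CLI from the table) to get a rough local comparison.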