    Repositories list

    • A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
      Python
      1.2k16k3933 Updated Jan 6, 2026
    • Mooncake

      Public
      Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
      C++
      4964.5k22874 Updated Jan 6, 2026
    • sglang

      Public
      SGLang is a fast serving framework for large language models and vision language models.
      Python
      4k501 Updated Jan 6, 2026
    • DeepEP: an efficient expert-parallel communication library that supports fault tolerance
      Cuda
      1.1k300 Updated Jan 5, 2026
    • sglang_awq

      Public
      SGLang is a fast serving framework for large language models and vision language models.
      Python
      4k100 Updated Dec 17, 2025
    • gpustack

      Public
      GPU cluster manager for optimized AI model deployment
      Python
      441000 Updated Dec 7, 2025
    • TrEnv-X

      Public
      Go
      27200 Updated Sep 15, 2025
    • SGLang is a fast serving framework for large language models and vision language models.
      Python
      4k000 Updated Aug 12, 2025
    • FlashInfer: Kernel Library for LLM Serving
      Cuda
      624500 Updated Jul 24, 2025
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      12k1400 Updated Mar 27, 2025