ziang663

0 followers · 2 following

Popular repositories

ScaleLLM Public

Forked from vectorch-ai/ScaleLLM

A high-performance inference system for large language models, designed for production environments.

C++

flash-attention Public

Forked from Dao-AILab/flash-attention

Fast and memory-efficient exact attention

Python

custom_op Public

This is my CUDA kernel room.

Cuda

Megatron-LM Public

Forked from NVIDIA/Megatron-LM

Ongoing research training transformer models at scale

Python

vllm-091 Public

Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python