ztzhu1/CMU-11868


My implementations of the assignments for CMU 11868: Large Language Model Systems. This repository focuses on the CUDA-related parts: CUDA map/reduce kernels, tiled matrix multiplication, and custom forward/backward kernels for Softmax and LayerNorm. Automatic differentiation is also implemented. The code is a way to learn CUDA programming fundamentals such as the memory hierarchy, warp reduction, parallelization strategies, and efficient kernel implementation.
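As a rough illustration of the tiling idea behind the matrix multiplication kernels, here is a minimal CPU sketch in plain Python. It is not code from this repository: the function name `tiled_matmul` and the `tile` parameter are hypothetical, and the loops only mimic how a CUDA thread block would own one output tile and step along the shared dimension one tile at a time.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (m x k) by b (k x n), one tile at a time.

    Hypothetical helper for illustration; assumes len(a[0]) == len(b).
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    # Each (i0, j0) pair plays the role of one CUDA thread block,
    # which owns a tile x tile patch of the output matrix.
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            # Step along the shared dimension one tile at a time.
            for k0 in range(0, k, tile):
                # In a CUDA kernel these two slices would be staged in
                # shared memory so each loaded value is reused by many threads.
                a_tile = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                # Accumulate this tile pair's partial products into the output.
                for i, a_row in enumerate(a_tile):
                    for j in range(len(b_tile[0])):
                        out[i0 + i][j0 + j] += sum(
                            a_row[p] * b_tile[p][j] for p in range(len(b_tile))
                        )
    return out
```

Slicing handles matrices whose sizes are not multiples of `tile`, which stands in for the bounds checks a real kernel would need at the edges. The result should not depend on the tile size; only the order in which partial products are accumulated changes.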

About

Implementation of CMU 11868 assignments, focusing on CUDA kernels, custom operators, and automatic differentiation for LLM systems.
