This repository archives my notes and materials from my computer science self-learning journey. My current focus is LLM/VLM inference engines and GPU/NPU computing, so I have gathered many technical blogs for AI infra beginners and MLSys papers for researchers.
In addition, I have published some technical blogs online; you can read them at the links below.

- 📖 My technical blogs: Zhihu, Personal Website.

😊 Welcome to star this repository!

🔍 Contents:
- Programming Languages
- Data Structure & Algorithm
- Network
- Operating System
- Design Pattern
- Mathematics
- Deep Learning
- LLM
- AI Infra
- Roadmap
- Backend Development
- Big Data Development
- Open Source Best Practices
- Research
- Employment
| Title | Category | Author | Note | Rec | Read |
|---|---|---|---|---|---|
| PyTorch 显存管理介绍与源码解析(一) | Memory | @kaiyuan | | ⭐️⭐️⭐️⭐️ | ✅ |
| PyTorch 显存可视化与 Snapshot 数据分析 | Memory | @kaiyuan | | ⭐️⭐️⭐️⭐️ | ✅ |
| Title | Category | Author | Note | Rec | Read |
|---|---|---|---|---|---|
| CUDA 内核优化策略 | Performance | @Zhang | | ⭐️⭐️⭐️ | ✅ |
| 从啥也不会到 CUDA GEMM 优化 | Performance | @猛猿 | | ⭐️⭐️⭐️⭐️ | ✅ |
| Title | Category | Author | Note | Rec | Read |
|---|---|---|---|---|---|
| NCCL: Collective Operations | Collective Communication | @NVIDIA Developer | Common collective communication operations | ⭐️⭐️⭐️⭐️⭐️ | ✅ |
| 一文读懂\|RDMA 原理 | Network | @Linux内核库 | | ⭐️⭐️⭐️ | ✅ |
| Title | Category | Author | Note | Rec | Read |
|---|---|---|---|---|---|
| 多模态技术梳理:ViT 系列 | ViT | @姜富春 | A survey of ViT research | ⭐️⭐️⭐️ | ✅ |
| ViT 论文速读 | ViT | @Zhang | | ⭐️⭐️ | ✅ |
| LLaVA 系列模型结构详解 | ViT | @Zhang | | ⭐️⭐️⭐️ | ✅ |
| Title | Category | Author | Note | Rec | Read |
|---|---|---|---|---|---|
| 多模态技术梳理:Qwen-VL 系列 | VL | @姜富春 | | ⭐️⭐️⭐️⭐️ | ✅ |
| Qwen2-VL 源码解读:从准备一条样本到模型生成全流程图解 | VL | @姜富春 | | ⭐️⭐️⭐️⭐️⭐️ | ✅ |
| 万字长文图解 Qwen2.5-VL 实现细节 | VL | @猛猿 | | ⭐️⭐️⭐️⭐️⭐️ | ✅ |
| Title | Category | Author | Note | Rec | Read |
|---|---|---|---|---|---|
| DeepSeek 技术解读(1)- 彻底理解 MLA(Multi-Head Latent Attention) | Attention | @姜富春 | | ⭐️⭐️⭐️⭐️ | ✅ |
| DeepSeek 技术解读(2)- MTP(Multi-Token Prediction)的前世今生 | Parallel Decoding | @姜富春 | | ⭐️⭐️⭐️⭐️ | ✅ |
| DeepSeek 技术解读(3)- MoE 的演进之路 | MoE | @姜富春 | | ⭐️⭐️⭐️⭐️ | ✅ |
| Title | Category | Author | Note | Rec | Read |
|---|---|---|---|---|---|
| LLM Inference 高效 Debug 方法汇总 | Debug | @CarryPls | vLLM debugging tips | ⭐️⭐️ | ✅ |
| 推理性能优化:GPU/NPU Profiling 阅读引导 | Profiling | @kaiyuan | | ⭐️⭐️⭐️⭐️ | ✅ |
Refer to How to Read a Paper to master a practical and efficient three-pass method for reading research papers.

Legend for the symbols in the tables below:

- ✅: First pass — gives you a general idea of the paper.
- ✅ ✅: Second pass — lets you grasp the paper's content, but not its details.
- ✅ ✅ ✅: Third pass — helps you understand the paper in depth.
| Title | Date | arXiv | GitHub | Note | Read |
|---|---|---|---|---|---|
| Mamba: Linear-Time Sequence Modeling with Selective State Spaces | 2023/12 | Mamba | link | | ✅ |
| Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality | 2024/05 | | | | |
| Title | Date | arXiv | GitHub | Note | Read |
|---|---|---|---|---|---|
| Title | Date | arXiv | GitHub | Note | Read |
|---|---|---|---|---|---|
| Efficient Memory Management for Large Language Model Serving with PagedAttention | 2023/09 | | vLLM | | ✅ ✅ ✅ |
| SGLang: Efficient Execution of Structured Language Model Programs | 2023/12 | | SGLang | | ✅ |
| A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency | 2025/05 | | | | |
| Title | Date | arXiv | GitHub | Note | Read |
|---|---|---|---|---|---|
| Title | Date | arXiv | GitHub | Note | Read |
|---|---|---|---|---|---|
| Robust Text-to-SQL Generation with Execution-Guided Decoding | 2018/07 | | | | ✅ |
| Efficient Guided Generation for Large Language Models | 2023/07 | | Outlines | | ✅ ✅ |
| XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | 2024/11 | | XGrammar | | |
| Pre3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation | 2025/06 | | | | |
| Title | Date | arXiv | GitHub | Note | Read |
|---|---|---|---|---|---|
| Title | Date | arXiv | GitHub | Note | Read |
|---|---|---|---|---|---|
| MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache | 2024/01 | | | | |
| ProMoE: Fast MoE-based LLM Serving using Proactive Caching | 2024/10 | | | | ✅ |
| Title | Date | arXiv | GitHub | Note | Read |
|---|---|---|---|---|---|
| AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications | 2025/03 | | | | |
| Serving Large Language Models on Huawei CloudMatrix384 | 2025/06 | | | | |
| Title | Date | arXiv | GitHub | Note | Read |
|---|---|---|---|---|---|
| Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection | 2024/11 | | | | |
| Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models | 2024/11 | | | | |
| Title | Date | arXiv | GitHub | Note | Read |
|---|---|---|---|---|---|
| CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion | 2024/05 | | LMCache | | |
| Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | 2024/07 | | Mooncake | | |
| Project | Category | Author/Organization | About |
|---|---|---|---|
| llm-action | LLM | @liguodongiot | Shares the technical principles of large models along with hands-on experience (LLM engineering, LLM application deployment). |
| awesomeMLSys | MLSys | @GPU MODE | An ML Systems Onboarding list. |
| InfraTech | MLSys | @CalvinXKY | Shares AI infra knowledge & coding exercises: getting started with PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, large-model fundamentals 🧠, AI hardware & software 🔧, and more. |
| AI-Infra-from-Zero-to-Hero | MLSys | @HuaizhengZhang | 🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑💻 Video Tutorials. |
| resource-stream | CUDA | @GPU MODE | GPU programming related news and material links. |
| BasicCUDA | CUDA | @CalvinXKY | A tutorial for CUDA & PyTorch. |
```bibtex
@misc{cs-self-learning2023,
  title  = {cs-self-learning},
  url    = {https://github.com/shen-shanshan/cs-self-learning},
  note   = {Open-source software available at https://github.com/shen-shanshan/cs-self-learning},
  author = {shen-shanshan},
  year   = {2023}
}
```

MIT License, find more details here.
