Computer Science Self-Learning Notes

📌 Overview

This repository archives the notes and materials from my computer science self-learning journey. I currently focus on LLM/VLM inference engines and GPU/NPU computing, so I have gathered technical blogs for AI-infra beginners and MLSys papers for researchers.

🔍 Contents:

  • 📚 Learning Notes
  • 📚 Technical Blogs
  • 📚 Papers
  • 📚 Learning Projects

In addition, I have published some technical blogs of my own; you can read them via the links below.

😊 Welcome to star this repository!

📚 Learning Notes

  • 🧱 Basic Knowledge
  • 🤖 AI
  • 🚀 Backend & Big Data
  • 🛠️ Tools
  • 🔗 Others

📚 Technical Blogs

📖 Basic Knowledge

| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| The Illustrated Transformer | Transformer | @Jay Alammar | Detailed explanation of how Transformers work | ⭐️⭐️⭐️⭐️⭐️ | |
| The Illustrated GPT-2 (Visualizing Transformer Language Models) | Transformer | @Jay Alammar | Transformer inference process | ⭐️⭐️⭐️⭐️⭐️ | |
| 图文详解 LLM inference:KV Cache | KV Cache | @季叶 | | ⭐️⭐️⭐️ | |
| Mixture of Experts Explained | MoE | @HuggingFace Blog | MoE overview | ⭐️⭐️⭐️⭐️ | |
| MoE 并行负载均衡:EPLB 的深度解析与可视化 | MoE | @kaiyuan | | ⭐️⭐️⭐️ | |
| LLM 推理并行优化的必备知识 | Parallel Strategy | @kaiyuan | | | |
| 分布式推理优化思路 | Parallel Strategy | @kaiyuan | | | |
| The Ultra-Scale Playbook: Training LLMs on GPU Clusters | Parallel Strategy | @HuggingFace Blog | | | |
| 图解大模型计算加速系列:分离式推理架构 1,从 DistServe 谈起 | PD Disaggregation | @猛猿 | Detailed explanation of prefill/decode (PD) disaggregation | ⭐️⭐️⭐️⭐️ | |
| 图解大模型计算加速系列:分离式推理架构 2,模糊分离与合并边界的 chunked-prefills | Schedule | @猛猿 | | ⭐️⭐️⭐️⭐️ | |
| LLM 推理提速:Attention 与 FFN 分离方案解析 | AF Disaggregation | @kaiyuan | Detailed explanation of attention/FFN (AF) disaggregation | ⭐️⭐️⭐️ | |
| Step-3 AF 分离推理系统 vs Deepseek EP 推理系统,谁更好? | AF Disaggregation | @不归牛顿管的熊猫 | Trade-offs of AF disaggregation vs. large-scale EP | ⭐️⭐️ | |
| Step-3 推理系统:从 PD 分离到 AF 分离(AFD) | AF Disaggregation | @Yibo Zhu | Commentary from a Step-3 author | ⭐️⭐️ | |
| GPU 内存(显存)的理解与基本使用 | Hardware | @kaiyuan | | ⭐️⭐️⭐️⭐️ | |

📖 Dive into vLLM

| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| Inside vLLM: Anatomy of a High-Throughput LLM Inference System | Overview | @vLLM Blog | Comprehensive deep dive into vLLM | ⭐️⭐️⭐️⭐️⭐️ | |
| vLLM V1 整体流程|从请求到算子执行 | Architecture | @SSS不知-道 | vLLM inference flow | ⭐️⭐️⭐️⭐️⭐️ | |
| 图解 vLLM V1 系列 1:整体流程 | Architecture | @猛猿 | | ⭐️⭐️⭐️ | |
| 图解 vLLM V1 系列 2:Executor-Workers 架构 | Architecture | @猛猿 | | ⭐️⭐️⭐️ | |
| 图解 vLLM V1 系列 3:KV Cache 初始化 | KV Cache | @猛猿 | | ⭐️⭐️⭐️ | |
| 图解 vLLM V1 系列 4:加载模型权重 | Model | @猛猿 | | ⭐️⭐️ | |
| vLLM 模型权重加载:使用 setattr | Model | @风之魔术师 | | ⭐️⭐️ | |
| ColumnParallelLinear 和 RowParallelLinear | Model | @风之魔术师 | | ⭐️⭐️ | |
| 图解 vLLM V1 系列 5:调度器策略 | Scheduler | @猛猿 | | ⭐️⭐️⭐️⭐️ | |
| Introducing vLLM Hardware Plugin, Best Practice from Ascend NPU | Platform | @The Ascend Team on vLLM | vLLM hardware plugin mechanism | ⭐️⭐️⭐️ | |
| vLLM 算力多样性|Platform 插件与 CustomOp | Platform | @SSS不知-道 | | ⭐️⭐️⭐️⭐️ | |
| vLLM 算子开发流程:“保姆级”详细记录 | Kernel | @DefTruth | | ⭐️⭐️⭐️⭐️⭐️ | |
| Introduction to torch.compile and How It Works with vLLM | Graph | @vLLM Blog | | ⭐️⭐️ | |
| vLLM torch.compile Integration | Graph | @Jiangyun Zhu | Writing custom compilation passes | ⭐️⭐️⭐️ | |
| vLLM 为什么没在 Prefill 阶段支持 Cuda Graph? | Graph | @kaiyuan | | ⭐️⭐️⭐️ | |
| vLLM 显存管理详解 | Memory | @kaiyuan | | ⭐️⭐️⭐️⭐️ | |
| vLLM DP 特性与演进方案分析 | Parallel Strategy | @kaiyuan | | ⭐️⭐️⭐️⭐️ | |
| LLM 推理数据并行负载均衡(DPLB)浅析 | Parallel Strategy | @kaiyuan | | ⭐️⭐️⭐️ | |
| vLLM PD 分离方案浅析 | PD Disaggregation | @kaiyuan | | ⭐️⭐️⭐️ | |
| vLLM PD 分离 KV Cache 传递机制详解与演进分析 | PD Disaggregation | @kaiyuan | | ⭐️⭐️⭐️ | |
| vLLM 结构化输出|Guided Decoding (V0) | Guided Decoding | @SSS不知-道 | | ⭐️⭐️⭐️ | |
| vLLM 结构化输出|Guided Decoding (V1) | Guided Decoding | @SSS不知-道 | | ⭐️⭐️⭐️ | |
| vLLM 多模态推理|卷积计算加速 | Multi-Modal | @SSS不知-道 | | ⭐️⭐️ | |

📖 Dive into PyTorch

| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| PyTorch 显存管理介绍与源码解析(一) | Memory | @kaiyuan | | ⭐️⭐️⭐️⭐️ | |
| PyTorch 显存可视化与 Snapshot 数据分析 | Memory | @kaiyuan | | ⭐️⭐️⭐️⭐️ | |

📖 CUDA Programming

| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| CUDA 内核优化策略 | Performance | @Zhang | | ⭐️⭐️⭐️ | |
| 从啥也不会到 CUDA GEMM 优化 | Performance | @猛猿 | | ⭐️⭐️⭐️⭐️ | |

📖 Communication

| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| NCCL: Collective Operations | Collective Communication | @NVIDIA Developer | Common collective-communication operations | ⭐️⭐️⭐️⭐️⭐️ | |
| 一文读懂|RDMA 原理 | Network | @Linux内核库 | | ⭐️⭐️⭐️ | |

📖 Multi-Modality

| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| 多模态技术梳理:ViT 系列 | ViT | @姜富春 | Survey of ViT research | ⭐️⭐️⭐️ | |
| ViT 论文速读 | ViT | @Zhang | | ⭐️⭐️ | |
| LLaVA 系列模型结构详解 | ViT | @Zhang | | ⭐️⭐️⭐️ | |

📖 Dive into Qwen

| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| 多模态技术梳理:Qwen-VL 系列 | VL | @姜富春 | | ⭐️⭐️⭐️⭐️ | |
| Qwen2-VL 源码解读:从准备一条样本到模型生成全流程图解 | VL | @姜富春 | | ⭐️⭐️⭐️⭐️⭐️ | |
| 万字长文图解 Qwen2.5-VL 实现细节 | VL | @猛猿 | | ⭐️⭐️⭐️⭐️⭐️ | |

📖 Dive into DeepSeek

| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| DeepSeek 技术解读(1)- 彻底理解 MLA(Multi-Head Latent Attention) | Attention | @姜富春 | | ⭐️⭐️⭐️⭐️ | |
| DeepSeek 技术解读(2)- MTP(Multi-Token Prediction)的前世今生 | Parallel Decoding | @姜富春 | | ⭐️⭐️⭐️⭐️ | |
| DeepSeek 技术解读(3)- MoE 的演进之路 | MoE | @姜富春 | | ⭐️⭐️⭐️⭐️ | |

📖 Development

| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| LLM Inference 高效 Debug 方法汇总 | Debug | @CarryPls | vLLM debugging tips | ⭐️⭐️ | |
| 推理性能优化:GPU/NPU Profiling 阅读引导 | Profiling | @kaiyuan | | ⭐️⭐️⭐️⭐️ | |

📚 Papers

Refer to How to Read a Paper to master a practical and efficient three-pass method for reading research papers.

Clarification for symbols in the following tables:

  • ✅: The first pass that gives you a general idea about the paper.
  • ✅ ✅: The second pass that lets you grasp the paper's content, but not its details.
  • ✅ ✅ ✅: The third pass that helps you understand the paper in depth.

📖 LLM Backbone

| Title | Date | arXiv | GitHub | Note | Read |
| --- | --- | --- | --- | --- | --- |
| Mamba: Linear-Time Sequence Modeling with Selective State Spaces | 2023/12 | arXiv | Mamba | link | |
| Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality | 2024/05 | arXiv | | | |

📖 LLM Inference Survey

| Title | Date | arXiv | GitHub | Note | Read |
| --- | --- | --- | --- | --- | --- |

📖 Framework

| Title | Date | arXiv | GitHub | Note | Read |
| --- | --- | --- | --- | --- | --- |
| Efficient Memory Management for Large Language Model Serving with PagedAttention | 2023/09 | arXiv | vLLM | | ✅ ✅ ✅ |
| SGLang: Efficient Execution of Structured Language Model Programs | 2023/12 | arXiv | SGLang | | |
| A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency | 2025/05 | arXiv | | | |

📖 Schedule

| Title | Date | arXiv | GitHub | Note | Read |
| --- | --- | --- | --- | --- | --- |

📖 Speculative Decoding

| Title | Date | arXiv | GitHub | Note | Read |
| --- | --- | --- | --- | --- | --- |
| Blockwise Parallel Decoding for Deep Autoregressive Models | 2018/11 | arXiv | | | |
| Fast Inference from Transformers via Speculative Decoding | 2022/11 | arXiv | | link | |
| Accelerating Large Language Model Decoding with Speculative Sampling | 2023/02 | arXiv | | | |
| SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification | 2023/05 | arXiv | | link | |
| Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding | 2024/01 | arXiv | | | |
| Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads | 2024/01 | arXiv | | link | |
| EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty | 2024/01 | arXiv | | link | |
| Break the Sequential Dependency of LLM Inference Using Lookahead Decoding | 2024/02 | arXiv | | link | |
| Accelerating Production LLMs with Combined Token/Embedding Speculators | 2024/04 | arXiv | | | |
| Better & Faster Large Language Models via Multi-token Prediction | 2024/04 | arXiv | | | |
| Optimizing Speculative Decoding for Serving Large Language Models Using Goodput | 2024/06 | arXiv | | | |
| Scaling Speculative Decoding with Lookahead Reasoning | 2025/06 | arXiv | | | |

📖 Guided Decoding

| Title | Date | arXiv | GitHub | Note | Read |
| --- | --- | --- | --- | --- | --- |
| Robust Text-to-SQL Generation with Execution-Guided Decoding | 2018/07 | arXiv | | | |
| Efficient Guided Generation for Large Language Models | 2023/07 | arXiv | Outlines | | ✅ ✅ |
| XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | 2024/11 | arXiv | XGrammar | | |
| Pre3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation | 2025/06 | arXiv | | | |

📖 Long Sequence Processing

| Title | Date | arXiv | GitHub | Note | Read |
| --- | --- | --- | --- | --- | --- |

📖 Memory Offloading

| Title | Date | arXiv | GitHub | Note | Read |
| --- | --- | --- | --- | --- | --- |
| MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache | 2024/01 | arXiv | | | |
| ProMoE: Fast MoE-based LLM Serving using Proactive Caching | 2024/10 | arXiv | | | |

📖 Large Scale Serving

| Title | Date | arXiv | GitHub | Note | Read |
| --- | --- | --- | --- | --- | --- |
| AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications | 2025/03 | arXiv | | | |
| Serving Large Language Models on Huawei CloudMatrix384 | 2025/06 | arXiv | | | |

📖 Load Balancing

| Title | Date | arXiv | GitHub | Note | Read |
| --- | --- | --- | --- | --- | --- |
| Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection | 2024/11 | arXiv | | | |
| Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models | 2024/11 | arXiv | | | |

📖 KVCache Store

| Title | Date | arXiv | GitHub | Note | Read |
| --- | --- | --- | --- | --- | --- |
| CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion | 2024/05 | arXiv | LMCache | | |
| Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | 2024/07 | arXiv | Mooncake | | |

📖 Disaggregated Architecture

| Title | Date | arXiv | GitHub | Note | Read |
| --- | --- | --- | --- | --- | --- |
| Splitwise: Efficient generative LLM inference using phase splitting | 2023/11 | arXiv | splitwise-sim | link | |
| DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving | 2024/01 | arXiv | DistServe | | |
| MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism | 2025/04 | arXiv | | link | ✅ ✅ |
| Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding | 2025/07 | arXiv | Step3, StepMesh | link | ✅ ✅ ✅ |
| xDeepServe: Model-as-a-Service on Huawei CloudMatrix384 | 2025/08 | arXiv | | | |

📖 Elasticity and Fault Tolerance

| Title | Date | arXiv | GitHub | Note | Read |
| --- | --- | --- | --- | --- | --- |
| ServerlessLLM: Low-Latency Serverless Inference for Large Language Models | 2024/01 | arXiv | ServerlessLLM | link | ✅ ✅ |
| Expert-as-a-Service: Towards Efficient, Scalable, and Robust Large-scale MoE Serving | 2025/09 | arXiv | | link | ✅ ✅ |
| ElasWave: An Elastic-Native System for Scalable Hybrid-Parallel Training | 2025/10 | arXiv | | link | ✅ ✅ |
| MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs | 2025/10 | arXiv | | link | ✅ ✅ |
| From Models to Operators: Rethinking Autoscaling Granularity for Large Generative Models | 2025/11 | arXiv | | link | |

📚 Learning Projects

| Project | Category | Author/Organization | About |
| --- | --- | --- | --- |
| llm-action | LLM | @liguodongiot | Shares LLM technical principles and hands-on experience (LLM engineering and application deployment). |
| awesomeMLSys | MLSys | @GPU MODE | An ML Systems onboarding list. |
| InfraTech | MLSys | @CalvinXKY | AI-infra knowledge and code exercises: intros to the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, LLM fundamentals 🧠, AI hardware and software 🔧, and more. |
| AI-Infra-from-Zero-to-Hero | MLSys | @HuaizhengZhang | 🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials. |
| resource-stream | CUDA | @GPU MODE | GPU programming related news and material links. |
| BasicCUDA | CUDA | @CalvinXKY | A tutorial for CUDA & PyTorch. |

©️ Citation

@misc{cs-self-learning-2023,
  title  = {cs-self-learning},
  url    = {https://github.com/shen-shanshan/cs-self-learning},
  note   = {Open-source software available at https://github.com/shen-shanshan/cs-self-learning},
  author = {shen-shanshan},
  year   = {2023}
}

📜 License

MIT License; find more details here.

⭐ Star History

Star History Chart
