- The Ultra-Scale Playbook: Training LLMs on GPU Clusters (https://huggingface.co/spaces/nanotron/ultrascale-playbook)
- Inside vLLM (https://www.aleksagordic.com/blog/vllm)
- ELI5: FlashAttention (https://gordicaleksa.medium.com/eli5-flash-attention-5c44017022ad)
- Inside NVIDIA GPUs: Anatomy of high performance matmul kernels (https://www.aleksagordic.com/blog/matmul)
- The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs (https://arxiv.org/abs/2408.13296v1)
- FineWeb: decanting the web for the finest text data at scale (https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1)
- A Meticulous Guide to Advances in Deep Learning Efficiency over the Years (https://alexzhang13.github.io/blog/2024/efficient-dl/)
- Mastering QLoRA (https://manalelaidouni.github.io/4Bit-Quantization-Models-QLoRa.html)
- Llama from scratch (or how to implement a paper without crying) (https://blog.briankitano.com/llama-from-scratch/)
- Fast LLM Inference From Scratch (https://andrewkchan.dev/posts/yalm.html)
- A Survey of Reinforcement Learning for Large Reasoning Models (https://arxiv.org/pdf/2509.08827)
- Exercises in Machine Learning (https://arxiv.org/pdf/2206.13446)
- Stanford CS336 Language Modeling from Scratch (https://www.youtube.com/playlist?list=PLoROMvodv4rOY23Y0BoGoBGgQ1zmU_MT_)
- Stanford CME 295 Transformers & Large Language Models (https://cme295.stanford.edu/syllabus/)