- The Ultra-Scale Playbook: Training LLMs on GPU Clusters (https://huggingface.co/spaces/nanotron/ultrascale-playbook)
- Inside vLLM (https://www.aleksagordic.com/blog/vllm)
- ELI5: FlashAttention (https://gordicaleksa.medium.com/eli5-flash-attention-5c44017022ad)
- Inside NVIDIA GPUs: Anatomy of high performance matmul kernels (https://www.aleksagordic.com/blog/matmul)
- The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs (https://arxiv.org/abs/2408.13296v1)
- FineWeb: decanting the web for the finest text data at scale (https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1)
- A Meticulous Guide to Advances in Deep Learning Efficiency over the Years (https://alexzhang13.github.io/blog/2024/efficient-dl/)
- Mastering QLoRA (https://manalelaidouni.github.io/4Bit-Quantization-Models-QLoRa.html)
- Llama from scratch (or how to implement a paper without crying) (https://blog.briankitano.com/llama-from-scratch/)
- Fast LLM Inference From Scratch (https://andrewkchan.dev/posts/yalm.html)
- A Survey of Reinforcement Learning for Large Reasoning Models (https://arxiv.org/pdf/2509.08827)
- Exercises in Machine Learning (https://arxiv.org/pdf/2206.13446)
- Stanford CS336 Language Modeling from Scratch (https://www.youtube.com/playlist?list=PLoROMvodv4rOY23Y0BoGoBGgQ1zmU_MT_)
- Stanford CME 295 Transformers & Large Language Models (https://cme295.stanford.edu/syllabus/)