MIT 6.5940 - Notes and Labs⚡️

Notes and practical notebooks from MIT 6.5940 (Fall 2023): TinyML and Efficient Deep Learning Computing.

🌟 Course Overview

This course introduces efficient deep learning computing techniques that enable powerful deep learning applications on resource-constrained devices. The main focus is on achieving maximal performance with minimal resource consumption.

🎯 Key Learning Objectives

Upon completion of this course, you will be able to:

  • Shrink and Accelerate Models: Master techniques like Pruning, Quantization (INT8/INT4), and Knowledge Distillation to dramatically reduce model size and inference latency (a minimal pruning sketch follows this list).
  • Design Efficient Architectures: Utilize Neural Architecture Search (NAS), specifically Once-for-All (OFA), to automatically design hardware-aware networks.
  • Master LLM Efficiency: Apply Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA and QLoRA for efficient adaptation of multi-billion parameter models.
  • Optimize Distributed Systems: Implement Data, Pipeline, and Tensor Parallelism for efficient training of models that exceed single-GPU memory.
  • Deploy to the Edge (TinyML): Design models and system software (MCUNet, TinyEngine) capable of running complex AI on microcontrollers with kilobytes of RAM.
  • Explore Future Computing: Understand the fundamentals of Quantum Machine Learning (QML) and implement Noise Mitigation techniques for current NISQ hardware.
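
To make the first objective concrete, below is a minimal, hedged PyTorch sketch of unstructured magnitude pruning; the layer size and sparsity level are arbitrary illustrative choices, not values prescribed by the course labs.

```python
import torch
import torch.nn as nn

def magnitude_prune(layer: nn.Linear, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude weights of a layer (unstructured pruning)."""
    w = layer.weight.data
    k = int(w.numel() * sparsity)                      # number of weights to remove
    threshold = w.abs().flatten().kthvalue(k).values   # k-th smallest |weight|
    mask = (w.abs() > threshold).float()               # keep only weights above it
    layer.weight.data.mul_(mask)                       # apply the binary mask in place
    return mask

layer = nn.Linear(512, 512)
mask = magnitude_prune(layer, sparsity=0.7)
print(f"kept {mask.mean().item():.0%} of the weights")
```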

💻 Tech Stack & Prerequisites

  • Programming: Strong proficiency in Python 3.
  • Frameworks: Experience with PyTorch (primary framework) or TensorFlow.
  • Math: Comfort with Linear Algebra, Calculus, and Probability.
  • Prerequisites: Familiarity with standard deep learning concepts (CNNs, RNNs, basic optimizers).

📚 Course Schedule

Chapter 0: Introduction

| Lecture | Topic (notes) | Slide | Notebook | Reference |
| --- | --- | --- | --- | --- |
| L1 | Introduction | Slides | | Video |
| L2 | Basics of Deep Learning | Slides | L02_NN_Basics.ipynb | Video |

Chapter I: Efficient Inference

| Lecture | Topic (notes) | Slide | Notebook | Reference |
| --- | --- | --- | --- | --- |
| L3 | Pruning and Sparsity (Part I) | Slides | | Video |
| L4 | Pruning and Sparsity (Part II) | Slides | L03_L04_Pruning.ipynb | Video |
| L5 | Quantization (Part I) | Slides | | Video |
| L6 | Quantization (Part II) | Slides | L08_Quantization_PTQ.ipynb | Video |
| L7 | Neural Architecture Search (Part I) | Slides | | Video |
| L8 | Neural Architecture Search (Part II) | Slides | | Video |
| L9 | Knowledge Distillation | Slides | L09_Quantization_QAT.ipynb | Video |
| L10 | MCUNet: TinyML on Microcontrollers | Slides | L10_L11_NAS.ipynb | Video |
| L11 | TinyEngine and Parallel Processing | Slides | L21_TinyML_Deployment.ipynb | Video |

Chapter II: Domain-Specific Optimization

| Lecture | Topic (notes) | Slide | Notebook | Reference |
| --- | --- | --- | --- | --- |
| L12 | Transformer and LLM | Slides | | Video |
| L13 | Efficient LLM Deployment | Slides | | Video |
| L14 | LLM Post Training | Slides | | Video |
| L15 | Long Context LLM | Slides | | Video |
| L16 | Vision Transformer | Slides | L16_LLM_QLoRA_Finetuning.ipynb | Video |
| L17 | GAN, Video, and Point Cloud | Slides | | Video |
| L18 | Diffusion Model | Slides | | Video |

Chapter III: Efficient Training

| Lecture | Topic (notes) | Slide | Notebook | Reference |
| --- | --- | --- | --- | --- |
| L19 | Distributed Training (Part I) | Slides | | Video |
| L20 | Distributed Training (Part II) | Slides | | Video |
| L21 | On-Device Training and Transfer Learning | Slides | | Video |

Chapter IV: Advanced Topics

| Lecture | Topic (notes) | Slide | Notebook | Reference |
| --- | --- | --- | --- | --- |
| L22 | Course Summary + Quantum ML I | Slides | | Video |
| L23 | Quantum Machine Learning II | Slides | L23_QML_Noise_Mitigation.ipynb | Video |
| L24 | Final Project Presentation | Slides | | Video |
| L25 | Final Project Presentation | Slides | | Video |
| L26 | Final Project Presentation | Slides | | Video |

💻 Hands-on Labs & Advanced Project Ideas (MIT 6.5940 Final Projects)

All lab exercises are designed to provide hands-on experience with real-world frameworks:

  • LLM Deployment: Hands-on experience deploying and running QLoRA-tuned LLMs (e.g., Llama-2) directly on a local GPU or CPU (see the QLoRA setup sketch after this list).
  • TinyML: Utilizing the TinyEngine and TensorFlow Lite Micro frameworks for model deployment on simulated microcontroller environments.
  • QML: Using Qiskit and Pennylane to build, train, and mitigate noise in variational quantum circuits.
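
For orientation, the LLM deployment lab's QLoRA recipe looks roughly like the sketch below, assuming the Hugging Face transformers, peft, and bitsandbytes packages are installed; the model name, target modules, and LoRA hyperparameters are illustrative placeholders rather than the lab's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works

# Load the base model with 4-bit NF4 weights (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)

# Attach low-rank adapters (the "LoRA" part); only these small matrices are trained
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```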

For further exploration, the course also provides a set of state-of-the-art research challenges in efficient ML; three are summarized below.

1. Project: TSM for Efficient Video Understanding (Temporal Shift Module)

  • Goal: Address the challenge of efficient video analysis by leveraging the Temporal Shift Module (TSM), which captures temporal relationships without adding computational cost.
  • Description: TSM shifts part of the channels along the temporal dimension, enabling information exchange among neighboring frames (see the sketch below). Projects could involve changing the backbone (e.g., from MobileNetV2) or applying TSM to a new video task such as fall detection.
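
For reference, the temporal shift itself is only a few lines; the sketch below is a hedged PyTorch version of the channel shift, where the shift ratio is an illustrative default rather than a value any specific project must use.

```python
import torch

def temporal_shift(x: torch.Tensor, shift_ratio: float = 0.125) -> torch.Tensor:
    """Shift a fraction of channels along the time axis; adds no FLOPs or parameters.

    x: activation tensor of shape [N, T, C, H, W].
    """
    n, t, c, h, w = x.shape
    fold = int(c * shift_ratio)
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift backward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # remaining channels unchanged
    return out
```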

2. Project: SIGE - Sparse Engine for Generative AI

  • Goal: Accelerate image editing in deep generative models by avoiding re-synthesis of unedited regions.
  • Description: SIGE (Sparse Inference GEnerator) is a sparse engine that caches and reuses feature maps from the original image and regenerates only the edited regions (the caching idea is sketched below). The project focuses on integrating SIGE with Stable Diffusion XL (SDXL) to assess whether it yields more significant speedups.
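
To illustrate the caching idea only, the hedged sketch below recomputes a single convolution just where the edited input differs from the original and reuses the cached output elsewhere; SIGE's actual engine gathers the changed tiles and runs sparse kernels instead of the dense recompute shown here.

```python
import torch
import torch.nn.functional as F

def sparse_conv_update(conv, x_orig, x_edit, cached_out, thresh=1e-3):
    """Reuse cached conv outputs outside the edited region (illustrative only).

    conv            : an nn.Conv2d layer (odd kernel size assumed)
    x_orig, x_edit  : original and edited inputs, shape [1, C, H, W]
    cached_out      : conv(x_orig), saved from the original forward pass
    """
    # 1. Which spatial locations actually changed?
    diff = (x_edit - x_orig).abs().amax(dim=1, keepdim=True) > thresh   # [1, 1, H, W]
    # 2. Dilate the mask by the kernel's footprint so region borders stay correct
    k = conv.kernel_size[0]
    mask = F.max_pool2d(diff.float(), kernel_size=k, stride=1, padding=k // 2) > 0
    # 3. Dense recompute for clarity; a real sparse engine only computes masked tiles
    new_out = conv(x_edit)
    out_mask = F.interpolate(mask.float(), size=cached_out.shape[-2:]) > 0
    return torch.where(out_mask, new_out, cached_out)
```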

3. Project: QServe for Online Quantized LLM Serving

  • Goal: Achieve high-throughput, real-time serving of low-precision quantized LLMs (e.g., INT4) in cloud-based settings.
  • Description: The project centers on implementing an online, real-time serving system with the QServe library, which uses the QoQ (W4A8KV4) quantization algorithm (a simplified weight-quantization sketch follows). The final objective is to build an online Gradio demo that serves these highly efficient quantized LLMs.
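
As a rough illustration of the "W4" part of W4A8KV4, the sketch below performs group-wise symmetric 4-bit weight quantization in plain PyTorch; QServe's QoQ algorithm additionally uses 8-bit activations, a 4-bit KV cache, and progressive group quantization, none of which appears here.

```python
import torch

def quantize_w4_groupwise(w: torch.Tensor, group_size: int = 128):
    """Group-wise symmetric INT4 quantization of a weight matrix (illustrative).

    w: [out_features, in_features]; in_features must be divisible by group_size.
    """
    out_f, in_f = w.shape
    wg = w.reshape(out_f, in_f // group_size, group_size)
    scale = wg.abs().amax(dim=-1, keepdim=True) / 7.0    # symmetric INT4 range [-8, 7]
    q = torch.clamp(torch.round(wg / scale), -8, 7)      # integer codes per group
    return q.to(torch.int8), scale                       # int8 used only as a container

def dequantize_w4(q: torch.Tensor, scale: torch.Tensor, shape) -> torch.Tensor:
    return (q.float() * scale).reshape(shape)

w = torch.randn(256, 1024)
q, s = quantize_w4_groupwise(w)
w_hat = dequantize_w4(q, s, w.shape)
print(f"mean abs error: {(w - w_hat).abs().mean():.4f}")
```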

Full documentation and project details can be found here.

References

🙏 Acknowledgements

Special thanks to:

  • Professor Song Han (MIT/HAN Lab) for his tremendous effort and passion in developing the EfficientML.ai framework and for making this cutting-edge research accessible to everyone.
  • Yifan Lu for making the course homework and lab materials publicly available to the community (All Homeworks Labs Accessible).
