Notes and practical notebooks from the MIT 6.5940 (Fall 2023) lectures: TinyML and Efficient Deep Learning Computing.
This course introduces efficient deep learning computing techniques that enable powerful deep learning applications on resource-constrained devices. The main focus is on achieving maximal performance with minimal resource consumption.
Upon completion of this course, you will be able to:
- Shrink and Accelerate Models: Master techniques like Pruning, Quantization (INT8/INT4), and Knowledge Distillation to dramatically reduce model size and inference latency (a pruning-and-quantization sketch follows this list).
- Design Efficient Architectures: Utilize Neural Architecture Search (NAS), specifically Once-for-All (OFA), to automatically design hardware-aware networks.
- Master LLM Efficiency: Apply Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA and QLoRA for efficient adaptation of multi-billion parameter models.
- Optimize Distributed Systems: Implement Data, Pipeline, and Tensor Parallelism for efficient training of models that exceed single-GPU memory.
- Deploy to the Edge (TinyML): Design models and system software (MCUNet, TinyEngine) capable of running complex AI on microcontrollers with Kilobytes of RAM.
- Explore Future Computing: Understand the fundamentals of Quantum Machine Learning (QML) and implement Noise Mitigation techniques for current NISQ hardware.
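As a small taste of the first bullet, here is a minimal sketch of magnitude pruning followed by post-training INT8 dynamic quantization using PyTorch's built-in utilities; the toy model and the 50% sparsity ratio are illustrative assumptions, not the course's TinyEngine pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model standing in for a real network (hypothetical, for illustration only).
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Magnitude pruning: zero out the 50% smallest-magnitude weights per linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the sparsity into the weight tensor

# Post-training dynamic quantization: weights stored in INT8,
# activations quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 784)
print(quantized(x).shape)  # torch.Size([1, 10])
```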
Prerequisites:
- Programming: Strong proficiency in Python 3.
- Frameworks: Experience with PyTorch (the primary framework) or TensorFlow.
- Math: Comfort with Linear Algebra, Calculus, and Probability.
- Deep Learning: Familiarity with standard deep learning concepts (CNNs, RNNs, basic optimizers).
| Lecture | Topic (notes) | Slide | Notebook | Reference |
|---|---|---|---|---|
| L1 | Introduction | Slides | — | Video |
| L2 | Basics of Deep Learning | Slides | L02_NN_Basics.ipynb | Video |
| Lecture | Topic (notes) | Slide | Notebook | Reference |
|---|---|---|---|---|
| L12 | Transformer and LLM | Slides | — | Video |
| L13 | Efficient LLM Deployment | Slides | — | Video |
| L14 | LLM Post Training | Slides | — | Video |
| L15 | Long Context LLM | Slides | — | Video |
| L16 | Vision Transformer | Slides | L16_LLM_QLoRA_Finetuning.ipynb | Video |
| L17 | GAN, Video, and Point Cloud | Slides | — | Video |
| L18 | Diffusion Model | Slides | — | Video |
| Lecture | Topic (notes) | Slide | Notebook | Reference |
|---|---|---|---|---|
| L19 | Distributed Training (Part I) | Slides | — | Video |
| L20 | Distributed Training (Part II) | Slides | — | Video |
| L21 | On-Device Training and Transfer Learning | Slides | — | Video |
| Lecture | Topic (notes) | Slide | Notebook | Reference |
|---|---|---|---|---|
| L22 | Course Summary + Quantum ML I | Slides | — | Video |
| L23 | Quantum Machine Learning II | Slides | L23_QML_Noise_Mitigation.ipynb | Video |
| L24 | Final Project Presentation | Slides | — | Video |
| L25 | Final Project Presentation | Slides | — | Video |
| L26 | Final Project Presentation | Slides | — | Video |
All lab exercises are designed to provide hands-on experience with real-world frameworks:
- LLM Deployment: Hands-on experience deploying and running QLoRA-tuned LLMs (e.g., Llama-2) directly on a local GPU or CPU (a QLoRA fine-tuning sketch follows this list).
- TinyML: Using the TinyEngine and TensorFlow Lite Micro frameworks for model deployment in simulated microcontroller environments.
- QML: Using Qiskit and PennyLane to build, train, and mitigate noise in variational quantum circuits (a small PennyLane sketch also follows).
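For a sense of what the LLM lab involves, below is a minimal QLoRA-style sketch using the Hugging Face transformers, peft, and bitsandbytes libraries; the model name, rank, and target modules are placeholder assumptions, not the lab's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works

# Load the frozen base model with 4-bit NF4 quantization (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config
)

# Attach small trainable LoRA adapters; only these are updated during fine-tuning.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, as in the LoRA paper
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```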
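And for the QML lab, a minimal PennyLane sketch of building and training a variational circuit on a noiseless simulator; the two-qubit ansatz is an illustrative assumption, and noise mitigation itself is covered in the L23 notebook.

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def circuit(params):
    # Tiny variational ansatz: parameterized rotations plus one entangling gate.
    qml.RY(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

params = np.array([0.1, 0.2], requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.3)
for _ in range(50):
    params = opt.step(circuit, params)  # gradient descent on the expectation value
print(circuit(params))  # approaches -1 as <Z_0> is minimized
```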
For more advanced work, the course provides a set of state-of-the-art research challenges in efficient ML to explore.
- Goal: Address the challenge of efficient video analysis by leveraging the Temporal Shift Module (TSM), which captures temporal relationships without adding computational cost.
- Description: TSM shifts part of the channels along the temporal dimension, enabling information exchange among neighboring frames. Projects could involve changing the backbone (e.g., from MobileNetV2) or applying TSM to a new video task such as fall detection; a sketch of the shift operation follows.
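To make the idea concrete, here is a minimal sketch of the core shift operation, assuming a (batch, frames, channels, H, W) activation layout and the 1/8 shift fraction used in the TSM paper.

```python
import torch

def temporal_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """Shift a fraction of channels along the temporal axis.

    x has shape (N, T, C, H, W). The first C//fold_div channels are shifted
    one frame backward, the next C//fold_div one frame forward, and the rest
    stay in place. The shift adds zero FLOPs (it only moves memory), yet
    lets every frame exchange information with its neighbors.
    """
    n, t, c, h, w = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # shift left: frame t sees frame t+1
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # shift right: frame t sees frame t-1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # untouched channels
    return out

video = torch.randn(2, 8, 64, 56, 56)  # (batch, frames, channels, H, W)
print(temporal_shift(video).shape)     # torch.Size([2, 8, 64, 56, 56])
```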
- Goal: Accelerate image editing in deep generative models by avoiding re-synthesis of unedited regions.
- Description: SIGE (Sparse Inference GEnerator) is a sparse engine that caches and reuses feature maps from the original image so that only the edited regions are regenerated. The project focuses on integrating SIGE with Stable Diffusion XL (SDXL) to assess, and potentially achieve, more significant speedups; a sketch of the dirty-tile bookkeeping follows.
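SIGE's engine itself is more involved, but the core bookkeeping can be illustrated: find which tiles the edit actually touched, so only those need re-synthesis while cached features cover the rest. A minimal sketch, with the tile size and threshold as assumed parameters:

```python
import torch

def edited_tile_mask(original: torch.Tensor, edited: torch.Tensor,
                     tile: int = 16, threshold: float = 1e-3) -> torch.Tensor:
    """Return a boolean (H//tile, W//tile) mask of tiles changed by an edit.

    Only True tiles need re-synthesis; False tiles can reuse feature maps
    cached from the original image, which is the idea behind SIGE.
    """
    diff = (original - edited).abs().amax(dim=0)  # per-pixel change, (H, W)
    h, w = diff.shape
    tiles = diff.reshape(h // tile, tile, w // tile, tile).amax(dim=(1, 3))
    return tiles > threshold

orig = torch.rand(3, 64, 64)
edit = orig.clone()
edit[:, 20:30, 20:30] += 0.5  # a small, local user edit
mask = edited_tile_mask(orig, edit)
print(f"{mask.sum().item()} of {mask.numel()} tiles need recomputation")
```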
- Goal: Achieve high-throughput, real-time serving of low-precision quantized LLMs (such as INT4) in cloud-based settings.
- Description: The project centers on implementing an online, real-time serving system with the QServe library, which uses the QoQ (W4A8KV4) quantization algorithm. The final objective is an online Gradio demo that serves these highly efficient quantized LLMs; a minimal demo sketch follows.
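As a starting point for the demo part, a minimal Gradio sketch; the generate function is a placeholder standing in for a call into the QServe runtime:

```python
import gradio as gr

def generate(message: str, history) -> str:
    # Placeholder: the real project would forward the prompt to a QServe
    # server hosting a W4A8KV4-quantized LLM and stream back its tokens.
    return f"(quantized-LLM reply to: {message})"

demo = gr.ChatInterface(fn=generate, title="INT4 LLM serving demo")
demo.launch()
```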
Full documentation and project details can be found here.
- Course YouTube Series: EfficientML.ai Course | 2023 Fall | MIT 6.5940
- Course Slides
- Final project list (2023-2024): EfficientML.ai Project Ideas
Special thanks to:
- Professor Song Han (MIT/HAN Lab) for his tremendous effort and passion in developing the EfficientML.ai framework and for making this cutting-edge research accessible to everyone.
- Yifan Lu for his dedication to making the course homework and lab materials publicly available to the community (All Homeworks Labs Accessible).