Edge AI Study Guide

Introduction

This folder provides a comprehensive guide to Edge AI — the practice of deploying machine learning models directly on edge devices such as smartphones, microcontrollers, embedded systems, and IoT sensors. Edge AI eliminates the need to send data to the cloud for inference, enabling real-time decisions with lower latency, improved privacy, and reduced bandwidth costs.

The curriculum covers the full pipeline from model compression and optimization to hardware-specific deployment, spanning both theoretical foundations and hands-on implementation with PyTorch, ONNX, TensorFlow Lite, and vendor-specific toolchains.

Target Audience

Learners who have completed the Deep_Learning folder (or who have equivalent knowledge of CNNs, Transformers, and training workflows)
Engineers interested in deploying models on resource-constrained devices
Researchers exploring efficient model design and hardware-aware optimization
Anyone building real-time AI applications (robotics, autonomous vehicles, smart cameras, wearables)

Prerequisites

Deep_Learning: Solid understanding of CNNs, training loops, loss functions, and PyTorch
Computer_Architecture: Familiarity with CPU/GPU pipelines, memory hierarchy, and instruction-level parallelism
IoT_Embedded: Basic knowledge of embedded systems, microcontrollers, and sensor interfaces

Learning Roadmap

┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│   Foundations    │────▶│   Compression    │────▶│ Efficient Arch.  │
│     L01-L02      │     │     L03-L05      │     │     L06-L07      │
└──────────────────┘     └──────────────────┘     └──────────────────┘
                                                           │
                                                           ▼
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│    Practical     │◀────│    Deployment    │◀────│     Export &     │
│     L14-L16      │     │     L10-L13      │     │    Runtimes      │
└──────────────────┘     └──────────────────┘     │     L08-L09      │
                                                  └──────────────────┘

Recommended Path:

Start with Foundations (L01-L02) to understand edge AI constraints and compression taxonomy
Master Compression techniques (L03-L05) — quantization, pruning, and knowledge distillation
Study Efficient Architecture design (L06-L07) — MobileNet, EfficientNet, and NAS
Learn Export & Runtimes (L08-L09) — ONNX, TensorFlow Lite, and inference engines
Explore Hardware Deployment (L10-L13) — TensorRT, mobile/MCU targets, and benchmarking
Apply knowledge with Practical projects (L14-L16) — end-to-end edge AI applications

File List

Lesson	Filename	Difficulty	Description
Block 1: Foundations
L01	`01_Edge_AI_Fundamentals.md`	⭐	Edge vs cloud inference, latency/privacy tradeoffs, edge computing spectrum
L02	`02_Model_Compression_Overview.md`	⭐⭐	Compression taxonomy: pruning, quantization, distillation, NAS
Block 2: Compression Techniques
L03	`03_Quantization.md`	⭐⭐⭐	PTQ vs QAT, INT8/INT4, symmetric vs asymmetric, mixed-precision
L04	`04_Pruning.md`	⭐⭐⭐	Structured vs unstructured, magnitude-based, lottery ticket hypothesis
L05	`05_Knowledge_Distillation.md`	⭐⭐⭐	Teacher-student framework, soft targets, attention transfer
Block 3: Efficient Architecture Design
L06	`06_Efficient_Architectures.md`	⭐⭐⭐	MobileNet, EfficientNet, ShuffleNet, SqueezeNet, design principles
L07	`07_Neural_Architecture_Search.md`	⭐⭐⭐⭐	NAS fundamentals, search strategies, hardware-aware NAS
Block 4: Export and Runtimes
L08	`08_ONNX_and_Model_Export.md`	⭐⭐⭐	ONNX format, graph optimization, cross-framework conversion
L09	`09_TensorFlow_Lite.md`	⭐⭐⭐	TFLite converter, optimization options, delegate system, TFLM
Block 5: Hardware Deployment
L10	`10_PyTorch_Mobile_and_ExecuTorch.md`	⭐⭐⭐	PyTorch Mobile, ExecuTorch, on-device inference pipelines
L11	`11_Edge_Hardware.md`	⭐⭐⭐	NPUs, Edge TPU (Coral), Jetson, Hailo, hardware comparison
L12	`12_On_Device_Training.md`	⭐⭐⭐⭐	On-device training, federated learning, transfer learning on edge
L13	`13_Real_Time_Inference.md`	⭐⭐⭐	Real-time inference optimization, latency/throughput profiling
Block 6: Practical Applications
L14	`14_Edge_AI_for_Computer_Vision.md`	⭐⭐⭐	Edge vision: object detection, segmentation on edge devices
L15	`15_Edge_AI_for_NLP.md`	⭐⭐⭐⭐	Edge NLP: on-device text classification and keyword spotting
L16	`16_Deployment_and_Monitoring.md`	⭐⭐⭐⭐	End-to-end deployment, monitoring, and production pipelines

Total: 16 lessons (13 concept lessons + 3 practical/implementation lessons)

Environment Setup

Core Installation

```bash
# PyTorch (for model training and export)
pip install torch torchvision

# ONNX ecosystem
pip install onnx onnxruntime onnxoptimizer

# TensorFlow Lite (for mobile/MCU deployment)
pip install tensorflow tflite-runtime

# Profiling and benchmarking
pip install thop fvcore
```

Optional Tools

```bash
# NVIDIA TensorRT (requires NVIDIA GPU + CUDA)
pip install tensorrt

# Edge TPU compiler (for Google Coral)
# See: https://coral.ai/docs/edgetpu/compiler/

# ARM NN SDK (for ARM-based deployment)
# See: https://developer.arm.com/Tools%20and%20Software/Arm%20NN
```

Verify Installation

```python
import torch
import onnx
import onnxruntime as ort

print(f"PyTorch: {torch.__version__}")
print(f"ONNX: {onnx.__version__}")
print(f"ONNX Runtime: {ort.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
```
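The check above stops at the first import that fails. As a complementary sketch, the script below uses only the Python standard library to report which of the packages from Core Installation are present, without importing any of them. The function name `installed_versions` and the `CORE_PACKAGES` list are illustrative choices for this example, not part of the course material.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to `pip install` in Core Installation.
CORE_PACKAGES = [
    "torch", "torchvision",
    "onnx", "onnxruntime", "onnxoptimizer",
    "tensorflow", "tflite-runtime",
    "thop", "fvcore",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(CORE_PACKAGES).items():
        print(f"{name:<16} {ver if ver is not None else 'NOT INSTALLED'}")
```

A `NOT INSTALLED` entry is not always a problem: `tflite-runtime` wheels, for example, are not published for every platform and Python version, and the full `tensorflow` package already includes the TFLite converter and interpreter.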

Study Tips

Start with Profiling: Before compressing a model, measure its baseline latency and memory on your target device
Compression is Iterative: Apply one technique at a time (for example, prune, then fine-tune, then quantize) and measure the impact after each step
Hardware Matters: A technique that shrinks a model on paper (e.g., unstructured pruning) may yield no speedup on hardware that lacks sparse-compute support, which includes many mobile NPUs
Test on Real Devices: Emulators and simulators rarely capture true latency, so always validate on the target hardware
Read Vendor Docs: Each deployment target (TensorRT, Core ML, Edge TPU) has its own operator support and constraints
Track Accuracy vs Efficiency: Build Pareto curves (accuracy vs latency/size) to find the sweet spot
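The first and last tips can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a replacement for vendor profilers: `benchmark` times any callable after a warmup phase, and `pareto_front` keeps the model variants that no other variant beats on both accuracy and latency. Both function names, the stand-in workload, and the accuracy/latency numbers are hypothetical placeholders, not measured results.

```python
import statistics
import time


def benchmark(fn, warmup=10, runs=100):
    """Time `fn()` and return latency statistics in milliseconds."""
    for _ in range(warmup):  # let caches and lazy initialisation settle
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def pareto_front(candidates):
    """Return candidates not dominated in (higher accuracy, lower latency).

    `candidates` is a list of (name, accuracy, latency_ms) tuples.
    """
    front = []
    for name, acc, lat in candidates:
        dominated = any(
            a >= acc and t <= lat and (a > acc or t < lat)
            for _, a, t in candidates
        )
        if not dominated:
            front.append((name, acc, lat))
    return sorted(front, key=lambda c: c[2])  # fastest first


if __name__ == "__main__":
    # Stand-in workload; replace with a call to your model's inference function.
    print(benchmark(lambda: sum(i * i for i in range(10_000))))

    # Placeholder numbers for illustration only, not measured results.
    variants = [
        ("fp32", 0.760, 40.0),
        ("fp16", 0.759, 25.0),
        ("int8", 0.750, 12.0),
        ("int8+pruned", 0.700, 13.0),
    ]
    print(pareto_front(variants))
```

Reporting the median and a tail percentile rather than the mean keeps one slow outlier from hiding the typical case. On GPU targets the timed callable must also wait for the device to finish (for example with `torch.cuda.synchronize()`), because kernel launches are asynchronous.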

Learning Outcomes

After completing this folder, you will be able to:

Explain the tradeoffs between cloud inference and edge deployment
Apply quantization (PTQ, QAT) to reduce model size by 2-4x relative to FP32 weights with minimal accuracy loss
Prune neural networks using structured and unstructured methods
Train compact student models via knowledge distillation
Design efficient architectures using MobileNet, EfficientNet, and NAS principles
Export models to ONNX and TensorFlow Lite formats
Deploy optimized models on GPUs (TensorRT), mobile devices, and microcontrollers
Benchmark and profile edge inference pipelines for latency, throughput, and power
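The 2-4x figure in the quantization outcome follows from bit widths: FP32 weights take 32 bits each, so 16-bit storage halves the weight size and 8-bit storage quarters it, ignoring the small overhead of scales and zero points. The sketch below shows that arithmetic together with a pure-Python asymmetric (affine) 8-bit quantize/dequantize round trip. The function names and the parameter count are hypothetical; real toolchains apply the same idea per tensor or per channel with calibrated ranges.

```python
def model_size_mb(num_params, bits_per_weight):
    """Approximate weight storage in megabytes (10**6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6


def quantize_affine(values, num_bits=8):
    """Asymmetric quantization of floats to unsigned integers.

    Returns (q, scale, zero_point) with value ~= scale * (q - zero_point).
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is represented exactly.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_affine(q, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return [scale * (qi - zero_point) for qi in q]


if __name__ == "__main__":
    n = 1_000_000  # hypothetical parameter count
    for bits in (32, 16, 8):
        print(f"{bits:>2}-bit weights: {model_size_mb(n, bits):.1f} MB")

    weights = [-1.0, -0.25, 0.0, 0.4, 1.5]
    q, scale, zp = quantize_affine(weights)
    restored = dequantize_affine(q, scale, zp)
    print("quantized:", q, "scale:", scale, "zero_point:", zp)
    print("max abs error:", max(abs(w - r) for w, r in zip(weights, restored)))
```

For inputs inside the calibrated range, the round-trip error per value is bounded by about half the scale, which is why a narrow weight range quantizes with little loss and a range stretched by outliers does not.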

Next Steps

For LLM Compression: Proceed to Foundation_Models for quantization and LoRA on large language models
For Vision Applications: Explore Computer_Vision for detection, tracking, and SLAM on edge
For Production Pipelines: Check MLOps for model serving, monitoring, and CI/CD
For Hardware Design: Study Computer_Architecture for custom accelerator design

License: CC BY-NC 4.0

Start with 01_Edge_AI_Fundamentals.md to begin your edge AI journey.
