- Overview
- Tools
- Implementation Details
- Scenarios
- Simulation Process
- Additional Notes
- Related Links
- Authors
This project benchmarks the performance of the STM32F103 microcontroller in running machine learning models for keyword spotting, image classification, anomaly detection, and emotion recognition. We convert models into optimized formats and analyze inference speed, memory usage, and power consumption to assess their feasibility on embedded systems. We use Keil uVision5 for firmware development and provide a Python-based simulation environment for validating models before deployment. This work helps gauge the practicality of machine learning on low-power embedded devices.
- STM32 Development Board: an STM32F103C8 board
- ST-Link programmer for flashing the firmware
- Development Tools:
- Keil uVision5 (ARM uVision, version 5.x)
- Libraries:
- TensorFlow Lite for Microcontrollers: A lightweight version of TensorFlow designed for microcontroller environments.
- Other Tools:
- Python 3.x: For scripting and model preparation.
- Git: For version control.
- Convert the trained model to TensorFlow Lite format using `ModelConverter.py`.
- Validate the TensorFlow Lite model with `ValidateTFModel.py`.
- Convert the `.tflite` model into a C header file using the command below; this embeds the model data directly into the firmware.

```
xxd -i speech_commands_model_float32.tflite > model_data.h
```
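For reference, the conversion performed by `ModelConverter.py` can be approximated with the standard TensorFlow Lite converter API. The script's exact contents are not reproduced here, so the model filename and output name below are assumptions:

```python
# Minimal sketch of a Keras -> TFLite conversion (paths are placeholders).
import tensorflow as tf

# Load the trained Keras model (assumed filename).
model = tf.keras.models.load_model("speech_commands_model.h5")

# Convert to a float32 TFLite flatbuffer.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Write the .tflite file that xxd later turns into model_data.h.
with open("speech_commands_model_float32.tflite", "wb") as f:
    f.write(tflite_model)
```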
- Open the Keil project file (e.g., `RUN.uvprojx`) located in the `Code/STM32` folder.
- Configure the target device as an STM32F103 microcontroller, ensuring clock and memory settings match the board.
- In the main source file (e.g., `Runner.c`), implement code to:
  - Capture audio data.
  - Run inference using the embedded model from `model_data.h`.
  - Measure and print inference time and predicted labels to a serial terminal.
- Build the Project: In Keil uVision5, navigate to Project > Open Project... to load your project, then click the Build button to compile the firmware.
- Flash the Firmware: Connect the STM32F103 board via ST-Link, and use the Download option to flash the firmware. After flashing, reset or power cycle the board to start the benchmark application.
- Open a serial terminal (e.g., PuTTY, Tera Term) to monitor the output from the STM32F103 board.
- The device prints inference times and predicted labels, allowing you to analyze performance metrics such as inference latency, memory usage, and power consumption.
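If you prefer to capture the output programmatically rather than with a GUI terminal, a short Python script can log the serial stream. This is a minimal sketch assuming the `pyserial` package, a 115200 baud UART, and a placeholder port name; adjust both to match your setup.

```python
# Minimal serial logger sketch (port name and baud rate are assumptions).
import serial  # provided by the pyserial package

PORT = "COM3"        # e.g. "/dev/ttyUSB0" on Linux
BAUD_RATE = 115200   # must match the firmware's UART configuration

with serial.Serial(PORT, BAUD_RATE, timeout=1) as ser:
    while True:
        line = ser.readline().decode(errors="ignore").strip()
        if line:
            # Each line is expected to carry an inference time or predicted label.
            print(line)
```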
- Install Python and Dependencies:
  - Ensure that Python 3.x is installed on your system.
  - Open a command prompt in the repository's root directory.
  - Install the required Python packages using:

    ```
    pip install -r Code/requirements.txt
    ```
- Run Simulation Scripts:
  - To validate the model conversion or simulate inference on your computer, navigate to the appropriate folder (e.g., `Code`).
  - Execute simulation scripts such as:

    ```
    python ValidateTFModel.py
    ```

  - These scripts run the TensorFlow Lite model in a simulated environment and output performance metrics and inference results.
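For reference, the core of such a validation script can be approximated with the TFLite interpreter API; the model path and the random input below are assumptions for illustration, not the exact contents of `ValidateTFModel.py`.

```python
# Minimal sketch of desktop TFLite validation (model path is a placeholder).
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="speech_commands_model_float32.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Feed random data with the model's expected shape and dtype.
dummy_input = np.random.rand(*input_details["shape"]).astype(input_details["dtype"])
interpreter.set_tensor(input_details["index"], dummy_input)
interpreter.invoke()

scores = interpreter.get_tensor(output_details["index"])
print("Predicted class:", int(np.argmax(scores)))
```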
As summarized in the table below, this work assesses four scenarios covering different machine learning tasks: speech recognition, anomaly detection, image classification, and emotion detection in the natural language processing domain.
The table shows that most of the converted TFLite models fit comfortably within the 64 KB memory target; only the LSTM model exceeds this limit, because of its tokenizer.
For detecting which emotion a given sentence expresses, we developed two different models: a BERT-based classifier and an LSTM.
For the BERT model, we used the ParsBERT embedding space and defined a Dense-CNN classifier on top of it, trained on our novel dataset. The training process and the resulting accuracy and loss values are shown below:
With the TFLite converter, we reduced the model to one-fourth of its initial size, i.e., 162 MB. This is still far beyond the limits of our embedded system, so we developed an LSTM architecture in its place.
For this task, we developed two distinct LSTM models: a baseline version using a simple tokenizer with a maximum vocabulary of 2000 words, and a second version using a dynamic learning rate and the sparse_categorical_crossentropy loss. These models reached 82% and 89% accuracy, respectively, which is a promising result for models of 640 KB and 161 KB.
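A minimal sketch of the second variant is given below; it assumes a `TextVectorization` tokenizer capped at 2000 words, an exponential-decay schedule as the "dynamic" learning rate, and illustrative layer sizes and sequence length, since those details are not spelled out here.

```python
# Hedged sketch of the LSTM emotion classifier (layer sizes, sequence length,
# and the learning-rate schedule are assumptions).
import tensorflow as tf

MAX_VOCAB = 2000   # matches the 2000-word vocabulary described above
MAX_LEN = 32       # assumed padded sequence length
NUM_CLASSES = 12   # 12 emotion classes

# Simple tokenizer: a 2000-word vocabulary learned from the training sentences.
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=MAX_VOCAB, output_sequence_length=MAX_LEN)
# vectorizer.adapt(train_sentences)   # fit the vocabulary on the Farsi sentences
# x_train = vectorizer(train_sentences)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(MAX_LEN,), dtype="int32"),
    tf.keras.layers.Embedding(MAX_VOCAB, 64),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

# "Dynamic" learning rate modeled here as an exponential decay schedule.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)
model.compile(optimizer=tf.keras.optimizers.Adam(lr_schedule),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```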
A more detailed per-class classification report is included below:
| Class | Precision | Recall | F1-score | Support |
|------:|----------:|-------:|---------:|--------:|
| 0 | 0.88 | 0.76 | 0.82 | 110 |
| 1 | 0.81 | 0.84 | 0.83 | 102 |
| 2 | 0.95 | 0.86 | 0.90 | 97 |
| 3 | 0.90 | 0.97 | 0.94 | 104 |
| 4 | 0.97 | 0.96 | 0.96 | 95 |
| 5 | 0.88 | 0.98 | 0.93 | 100 |
| 6 | 0.99 | 0.94 | 0.96 | 113 |
| 7 | 0.98 | 0.92 | 0.95 | 104 |
| 8 | 0.80 | 0.73 | 0.76 | 101 |
| 9 | 0.99 | 1.00 | 1.00 | 103 |
| 10 | 0.91 | 0.92 | 0.92 | 100 |
| 11 | 0.71 | 0.85 | 0.77 | 105 |
| accuracy | | | 0.89 | 1234 |
| macro avg | 0.90 | 0.89 | 0.89 | 1234 |
| weighted avg | 0.90 | 0.89 | 0.89 | 1234 |

One of this work's novel contributions is the creation of an emotion detection dataset. We used GPT to generate sentences for each emotion, resulting in more than 6000 labeled Farsi sentences across 12 classes, and applied prompt engineering techniques to ensure that the generated sentences were valid and unique.
To detect anomalous behavior, we developed and trained two distinct architectures: a random forest classifier and a fully connected autoencoder (FC-AutoEncoder). The random forest classifier achieved a promising 99% accuracy on this task; however, its model size exceeds 16 MB, which is beyond what the STM32 chipset can run. The anomaly detection results are shown below:
Using the FC-AutoEncoder, we achieved acceptable results, which are shown below:
Nevertheless, as the training process shows, the model's limited size prevents it from fully capturing the relationship between the labels and the dense feature space. Its 41 KB size, however, makes it easy to work with on embedded systems.
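For illustration, a fully connected autoencoder of roughly this scale could be structured as in the sketch below; the input dimension and layer widths are assumptions, since the exact feature size is not given here.

```python
# Hedged sketch of an FC-AutoEncoder for anomaly detection
# (input size and layer widths are assumptions).
import tensorflow as tf

INPUT_DIM = 128  # placeholder feature-vector length

encoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(INPUT_DIM,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(INPUT_DIM, activation="linear"),
])
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")

# At inference time, samples whose reconstruction error exceeds a threshold
# (chosen on normal validation data) are flagged as anomalies, e.g.:
# errors = tf.reduce_mean(tf.square(x - autoencoder(x)), axis=1)
```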
Image Classification involves training a model to categorize images into predefined classes using deep learning models like Convolutional Neural Networks (CNNs).
- Model Training
- Prepare the Dataset: Download and preprocess the dataset (e.g., CIFAR-10, ImageNet). Split into training, validation, and test sets.
- Define the Model: Use CNNs or pre-trained models like ResNet or VGG.
- Train the Model: Compile and fit the model to the dataset.
- Evaluate the Model: Test the model using accuracy, precision, and recall metrics.
- Model Evaluation
Evaluate the trained model's performance on a test dataset using metrics like accuracy.
- Model Inference
Use the trained model to classify new images.
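The training and evaluation steps above can be sketched with a small Keras CNN on CIFAR-10; the architecture and epoch count are illustrative assumptions rather than the exact model used in this project.

```python
# Hedged CIFAR-10 CNN sketch (architecture and epochs are assumptions).
import tensorflow as tf

# Prepare the dataset: load CIFAR-10 and scale pixels to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Define the model: a compact CNN suitable for later TFLite conversion.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Train the model, then evaluate accuracy on the held-out test set.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_split=0.1)
print(model.evaluate(x_test, y_test))
```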
Keyword Spotting detects specific words or phrases in an audio stream, typically using features like MFCCs and models like CNNs.
- Dataset Preparation
- Download the Dataset: Use the Google Speech Commands dataset.
- Preprocess the Data: Convert audio to features (MFCC, LFBE, or raw samples).
- Split the Data: Divide into training, validation, and test sets.
- Model Training
- Define the Model: Use CNN or RNN architectures.
- Train the Model: Compile and train the model using the prepared dataset.
- Save the Model: Save the trained model for later use.
- Model Evaluation
Evaluate the model on the test set using accuracy, precision, recall, and AUC.
- Quantization and Inference
- Quantize the Model: Convert the model to TensorFlow Lite format for efficient deployment.
- Evaluate the Quantized Model: Ensure the quantized model performs well.
- Run Inference: Use the model for real-time keyword detection.
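The "Quantize the Model" step above can be sketched with TensorFlow Lite post-training quantization; the model filename, input shape, and representative-dataset generator are placeholders, not the project's actual values.

```python
# Hedged sketch of post-training quantization for the KWS model
# (model path and representative data are placeholders).
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("kws_model.h5")  # assumed filename

def representative_dataset():
    # Yield a few example feature tensors (e.g. MFCC frames) for calibration.
    for _ in range(100):
        yield [np.random.rand(1, 49, 10, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset

tflite_quant = converter.convert()
with open("kws_model_int8.tflite", "wb") as f:
    f.write(tflite_quant)
```

The quantized model should then be re-evaluated on the test set to confirm its accuracy before it is embedded in the firmware.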
For the simulation process, we ran the TFLite models directly to obtain results. We monitored memory usage, CPU time, execution time, and model outputs, producing results iteratively for each measurement area. The results are as follows.
- Firmware Development:
  - In this project, we used a converted TFLite model (via `xxd`) embedded as a C header (`model_data.h`). No external AI libraries are imported in the firmware code.
  - Then, in the project's source files (`Runner.c`), we referenced `model_data.h`.
- Simulation:
  - The Python simulation helps validate the TFLite model's performance before deploying it on hardware.
  - To analyze memory usage, CPU time, and inference latency, we used a Python script for each project.
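Those per-project measurement scripts are not reproduced here; the sketch below shows one way such measurements could be taken on the desktop, using only the standard library around a single TFLite inference call (the model path and random input are placeholders).

```python
# Hedged sketch of measuring memory, CPU time, and latency for one inference.
import time
import tracemalloc
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="speech_commands_model_float32.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

tracemalloc.start()
cpu_start, wall_start = time.process_time(), time.perf_counter()

interpreter.set_tensor(inp["index"],
                       np.random.rand(*inp["shape"]).astype(inp["dtype"]))
interpreter.invoke()

cpu_time = time.process_time() - cpu_start
latency = time.perf_counter() - wall_start
_, peak_mem = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"CPU time: {cpu_time * 1e3:.2f} ms, latency: {latency * 1e3:.2f} ms, "
      f"peak Python memory: {peak_mem / 1024:.1f} KB")
```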
Special Thanks to Ali Salesi.


















