SentinelNN is an open-source framework for analyzing and enhancing the fault tolerance of CNN models. The key features of SentinelNN are:
- Resilience analysis: SentinelNN provides a set of methods for analyzing the importance of channels in convolutional layers: `l1-norm`, `vul-gain`, `salience`, and `deepvigor`
- Structured channel pruning: SentinelNN can prune the less important channels of a CNN, selected by a resilience analysis method, to reduce the model size
- Selective channel duplication and correction: SentinelNN duplicates the more important (or vulnerable) channels and applies a correction mechanism to mitigate the effect of faults
- Range restriction: SentinelNN applies a range restriction method (i.e., Ranger) to the ReLU activation function to reduce error propagation to the CNN outputs
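As an illustration of the simplest of these analyses, here is a minimal sketch (not SentinelNN's actual implementation) of how `l1-norm` channel importance can be computed for a convolutional layer in PyTorch:

```python
import torch
import torch.nn as nn

def l1_channel_importance(conv: nn.Conv2d) -> torch.Tensor:
    # One score per output channel: the L1 norm of that channel's filter
    # weights. Channels with the smallest scores are pruning candidates;
    # channels with the largest are candidates for duplication/hardening.
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
scores = l1_channel_importance(conv)  # shape: (8,)
ranking = torch.argsort(scores)       # channels, least to most important
```

The other methods (`vul-gain`, `salience`, `deepvigor`) replace this weight-magnitude score with fault-vulnerability-based scores, but produce the same kind of per-channel ranking.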
The framework takes a pretrained CNN model (i.e., a `.pth` file) and outputs a pruned or hardened model in the same format. For experiments, SentinelNN logs the results in a directory named after the model and dataset. The user can also conduct fault injection experiments to observe the effectiveness of model hardening.
Requirements:
This framework is developed and tested with Python 3.10, PyTorch 1.10.2+cu102, and torchvision 0.11.2+cu102.
How to use:
The following commands cover the framework's main use cases:
- To test a pretrained CNN model:

```
python main.py --model=model_name --dataset=dataset_name --batch-size=batch-size
```

This framework supports loading pretrained models for CIFAR-10 and CIFAR-100 directly from pytorch-cifar-models, and loads models for ImageNet from PyTorch Hub.
- To prune a CNN model:

```
python main.py --model=model_name --dataset=dataset_name --batch-size=batch-size \
    --is-pruning --pruning-ratio=0.x --importance=method_name
```

The `pruning-ratio` is a value between 0 and 1. The `importance` method name can be: `l1-norm`, `vul-gain`, `salience`, `deepvigor`, or `channel-FI`.

In the case of `importance=deepvigor`, the vulnerability factors derived by DeepVigor are saved separately for each layer in the corresponding workspace and can be reused.
- The pruned model is saved and can be loaded as:

```
python main.py --model=model_name --dataset=dataset_name --batch-size=batch-size \
    --is-pruned --pruning-ratio=0.x --pruned-checkpoint=./path/to/saved/model.pth
```
- To harden a pretrained model:

```
python main.py --model=model_name --dataset=dataset_name --batch-size=batch-size \
    --is-hardening --hardening-ratio=0.x --importance=method_name --clipping=ranger
```
- To harden a pruned model:

```
python main.py --model=model_name --dataset=dataset_name --batch-size=batch-size \
    --is-pruned --pruning-ratio=0.x --pruned-checkpoint=path/to/saved/model.pth \
    --is-hardening --hardening-ratio=0.x --importance=method_name --clipping=ranger
```
- To load a pruned+hardened model and conduct fault injection into weights:

```
python3.10 main.py --model=model_name --dataset=dataset_name --batch-size=256 \
    --is-pruned --pruning-ratio=0.1 --pruned-checkpoint=path/to/saved/model.pth \
    --is-hardened --hardening-ratio=0.1 --clipping=ranger --hardened-checkpoint=path/to/saved/model.pth \
    --is-FI --BER=0.000005 --repeat=1000
```
This command performs bitflips in convolutional and linear layers with a Bit Error Rate (BER) of `0.000005`, repeats the FI campaign `1000` times, and logs the average results.
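For intuition, a bit-error-rate fault model of this kind can be sketched in plain Python (illustrative only, not SentinelNN's injector, which operates on PyTorch tensors): each of the 32 bits of every float32 weight flips independently with probability BER.

```python
import random
import struct

def flip_bit(value: float, bit: int) -> float:
    # Flip one bit (0 = LSB, 31 = sign bit) of the float32 encoding of `value`.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return flipped

def inject_faults(weights, ber, rng):
    # Each of the 32 bits of every weight flips independently with
    # probability `ber` (the Bit Error Rate).
    faulty = []
    for w in weights:
        for bit in range(32):
            if rng.random() < ber:
                w = flip_bit(w, bit)
        faulty.append(w)
    return faulty
```

A campaign then repeats the injection over many random seeds and reports the average accuracy drop, which is why the `--repeat` count matters.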
While loading a pruned or hardened model, `pruning-ratio` and `hardening-ratio` should be equal to the ratios initially applied, and the corresponding checkpoint files should be assigned. The framework first loads the initial model, then modifies its structure with respect to the pruning and hardening ratios, and finally loads the parameters from the saved model.
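The load order described above can be sketched as follows (the function names here are illustrative, not SentinelNN's actual API):

```python
import torch

def load_modified_model(build_model, restructure, ratio, checkpoint_path):
    # 1) Build the original architecture.
    model = build_model()
    # 2) Restructure it with the same ratio used when the checkpoint was
    #    saved, so parameter shapes match the saved state dict.
    model = restructure(model, ratio)
    # 3) Load the saved parameters into the restructured model.
    model.load_state_dict(torch.load(checkpoint_path, map_location="cpu"))
    return model
```

If the ratio passed here differs from the one used at save time, step 3 fails with a shape mismatch, which is why the ratios must match.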
At the moment, SentinelNN supports Ranger in the hardening process, in which ReLU is protected. We have implemented multiple state-of-the-art activation restriction methods in another framework, called RReLU.
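The idea behind Ranger can be sketched as a clipped ReLU whose upper bound is profiled from fault-free activations (the bound used below is purely illustrative):

```python
import torch
import torch.nn as nn

class RangerReLU(nn.Module):
    # ReLU clamped to an upper bound profiled from fault-free activations,
    # so out-of-range values caused by bit flips cannot propagate onward.
    def __init__(self, upper: float):
        super().__init__()
        self.upper = upper

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.clamp(x, min=0.0, max=self.upper)

act = RangerReLU(upper=6.0)              # 6.0 is an illustrative bound
y = act(torch.tensor([-1.0, 3.0, 1e9]))  # the faulty 1e9 is clipped to 6.0
```

In-range activations pass through unchanged, so the model's fault-free accuracy is largely preserved while large faulty values are suppressed.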
Check out the published paper here or here. If you use SentinelNN, please cite:
```
@inproceedings{ahmadilivani2024cost-effective,
  author={Ahmadilivani, Mohammad Hasan and Mousavi, Seyedhamidreza and Raik, Jaan and Daneshtalab, Masoud and Jenihhin, Maksim},
  booktitle={2024 IEEE 30th International Symposium on On-Line Testing and Robust System Design (IOLTS)},
  title={Cost-Effective Fault Tolerance for CNNs Using Parameter Vulnerability Based Hardening and Pruning},
  year={2024},
  pages={1-7},
  keywords={Accuracy; Error analysis; Computational modeling; Fault-tolerant systems; Redundancy; Neural networks},
  doi={10.1109/IOLTS60994.2024.10616072}}
```
Related papers and frameworks:
- DeepVigor+ source code, and paper1 and paper2
- RReLU toolbox source code and its paper
