Repository for the paper
AIO-P: Expanding Neural Performance Predictors Beyond Image Classification
Keith G. Mills, Di Niu, Mohammad Salameh, Weichen Qiu, Fred X. Han, Puyuan Liu, Jialin Zhang, Wei Lu and Shangling Jui
AAAI-23 Oral Presentation
Specifically, we provide the following:
- Computation Graph (CG) data caches for all datasets used in the paper.
- Code for generating individually labeled CG samples, as well as training a shared head to generate pseudo-labels.
- Predictor code including AIO-P with k-Adapters and label scaling, as well as the baseline GNN.
- Code API for generating, loading and visualizing CGs.
To run our code, you will need:
- A machine with an NVIDIA GPU and CUDA (>=10.2) support
- Python 3.7
- System: Ubuntu 20.04.4 LTS
- Conda installed
First, create and activate a conda environment:
$ conda create -n aiop python=3.7
$ conda activate aiop
Install conda packages:
$ conda install -c anaconda tensorflow-gpu=1.15.0 cudatoolkit
Install pip packages (you can use conda instead, but this worked for us):
$ pip install --trusted-host pytorch.org --trusted-host download.pytorch.org torch==1.8.1+cu102 torchvision==0.9.1+cu102 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
$ pip install --trusted-host pytorch-geometric.com torch-scatter==2.0.8 torch-sparse==0.6.11 torch-cluster==1.5.9 torch-spline-conv==1.2.1 -f https://pytorch-geometric.com/whl/torch-1.8.1.html
$ pip install torch_geometric==1.7.2 opencv-python thop
$ pip install git+https://github.com/facebookresearch/detectron2.git git+https://github.com/cocodataset/panopticapi.git
These commands worked for us, but your mileage may vary. Note that for Detectron2 to work properly, torch and torchvision should be built against the same CUDA version, which is reflected in their respective `__version__` fields.
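As a quick sanity check after installation, the short snippet below (our own addition, not a repository script) prints the installed versions and the CUDA build that torch was compiled against:

```python
# Sanity check: confirm torch/torchvision versions and their CUDA build match
# what you installed above (e.g. 1.8.1+cu102 / 0.9.1+cu102).
import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA build used by torch:", torch.version.cuda)
```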
- Download Computation Graph (CG) caches:
  - Download `cache.zip` from the shared Google Drive and place all `.pkl` files in `/cache/`.
  - Caches with `ind` in their name are individually trained architectures. `shared` refers to architectures fine-tuned by a shared head. `deeplab` or `slim` caches refer to model zoos.
  - Caches without either are classification CGs.
  - Download `sample_pbs.zip` from the shared Google Drive for examples of ResNet-18 and EfficientNet-B0.
- Download the following datasets:
  - LSP
    - Unzip contents to `data/HPE/lsp`
  - LSP Extended
    - Unzip contents to `data/HPE/lsp_extended`
  - MPII
    - Unzip and place the `annot` and `images` folders under `data/HPE/mpii`
  - COCO
    - Set up the detectron2 COCO dataset: https://github.com/facebookresearch/detectron2/tree/main/datasets
- Download and unpack `CG_data.zip`
  - Place the `.pkl` files in the `cache` folder
- Set the environment variable `DETECTRON2_DATASETS` to the directory containing the COCO datasets: `export DETECTRON2_DATASETS=/home/...`
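As a rough sketch of how to peek inside one of the downloaded caches, assuming they are plain Python pickles (the filename below is only an example; the repository may need to be on your `PYTHONPATH` if the cache stores custom classes):

```python
# Hedged sketch: inspect a downloaded CG cache. We assume the .pkl files are
# plain pickles; the exact structure of the stored entries may vary per cache.
import pickle

cache_path = "cache/gpi_ofa_mbv3_obj_det_coco_ind_comp_graph_cache.pkl"  # example name, adjust to your file
with open(cache_path, "rb") as f:
    cache = pickle.load(f)

print("type:", type(cache))
if hasattr(cache, "__len__"):
    print("number of entries:", len(cache))
```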
The script `run_train_cgs_on_task.py` trains individual architectures to be used as test architectures that represent the ground truth.
Inside the `cache` folder, this script will output a subfolder containing a `.txt` file with the logs and a `.pkl` file with the trained architectures.
For COCO tasks (detectron2):
python run_train_cgs_on_task.py -family ofa_mbv3 -task detectron2 -tag individual -start_idx 0 -num_archs 10 -skip --num-gpus 2 --config-file tasks/detectron2/COCO_PanSeg_FPN_Adapted_Head.yml
- `-family` is the OFA family to train; select one of `ofa_pn`, `ofa_mbv3`, and `ofa_resnet`.
- `-task detectron2` will execute the detectron2 code.
- `-tag` can be any string that labels the output folder.
- `-start_idx` is the start index of the architectures to train.
- `-num_archs` is the number of architectures to train.
- `-skip` uses skip connections.
- `--num-gpus 2` uses 2 GPUs. We use Tesla V100 GPUs with 32GB of VRAM each, so depending on your compute resources, you may need to increase the number of GPUs to avoid CUDA out-of-memory errors.
- `--config-file` is the path to the detectron2 config file.
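To label more than 10 architectures, one option (a sketch of our own, not a repository utility) is to launch the script over consecutive index ranges:

```python
# Sketch: run run_train_cgs_on_task.py over several index ranges, labeling
# 10 architectures per launch (the chunk size and range are arbitrary).
import subprocess

for start_idx in range(0, 30, 10):
    subprocess.run(
        [
            "python", "run_train_cgs_on_task.py",
            "-family", "ofa_mbv3",
            "-task", "detectron2",
            "-tag", "individual",
            "-start_idx", str(start_idx),
            "-num_archs", "10",
            "-skip",
            "--num-gpus", "2",
            "--config-file", "tasks/detectron2/COCO_PanSeg_FPN_Adapted_Head.yml",
        ],
        check=True,
    )
```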
For 2D human pose estimation (HPE):
python run_train_cgs_on_task.py -family ofa_mbv3 -task hpe2d -tag individual -start_idx 0 -num_archs 10 --num_epochs 140 --data_dir data/HPE
- See the `tasks/pose_hg_3d/lib/opts.py` file for the full list of flags on HPE.
- `-family` is the OFA family to train; select one of `ofa_pn`, `ofa_mbv3`, and `ofa_resnet`.
- `-task hpe2d` will execute the hpe2d code.
- `-tag` can be any string that labels the output folder.
- `-start_idx` is the start index of the architectures to train.
- `-num_archs` is the number of architectures to train.
- `--num_epochs` is the number of epochs to train for.
- `--data_dir` is the path to the data directory containing the HPE data; it should contain the `lsp`, `lsp_extended` and `mpii` subfolders.
These caches are needed for the HPE shared-head training experiments.
The caches are passed in via the `--cache_file` flag of `run_train_head_on_task.py`.
python tasks/pose_hg_3d/lsp_dataloader.py --family mbv3 --data_dir data/HPE/
You might need to run `export PYTHONPATH=$PYTHONPATH:/path/to/this/directory/`.
- `--family` is the OFA family to train, select from one of `pn`, `mbv3`, and `resnet`.
- `--data_dir` is the folder path of the data directory that contains the HPE data; it should contain both the `lsp` and `lsp_extended` subfolders.
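If you prefer not to export `PYTHONPATH` in your shell, the following sketch launches the dataloader from Python with the repository on the path (the repository path below is a placeholder):

```python
# Sketch: run the LSP cache dataloader with this repository on PYTHONPATH,
# without relying on a shell export.
import os
import subprocess

repo_root = "/path/to/this/directory"  # placeholder: path to your checkout
env = dict(os.environ)
env["PYTHONPATH"] = repo_root + os.pathsep + env.get("PYTHONPATH", "")

subprocess.run(
    ["python", "tasks/pose_hg_3d/lsp_dataloader.py",
     "--family", "mbv3", "--data_dir", "data/HPE/"],
    check=True,
    env=env,
)
```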
`run_train_head_on_task.py` will train the shared head.
The script will produce a `.pkl` file that contains the shared head weights.
For COCO we do not generate caches, as the dataset is too big; instead, latent representations are sampled on the fly:
python run_train_head_on_task.py -family ofa_mbv3 -task detectron2 -tag sampled -skip --num-gpus 1 --config-file tasks/detectron2/COCO_PanSeg_FPN_Adapted_Head.yml -sample_n 3 SOLVER.MAX_ITER 250000 SOLVER.STEPS 166000,222000 SOLVER.IMS_PER_BATCH 8
- `-family` is the OFA family to train; select one of `ofa_pn`, `ofa_mbv3`, and `ofa_resnet`.
- `-task detectron2` will execute the detectron2 code.
- `-tag` can be any string that labels the output folder.
- `-skip` uses skip connections.
- `-sample_n` is the number of architectures per bin.
- `--num-gpus 1` uses 1 GPU.
- `--config-file` is the path to the detectron2 config file.
- `SOLVER.MAX_ITER` is the maximum number of iterations.
- `SOLVER.STEPS` are the steps at which the learning rate will be decreased.
- `SOLVER.IMS_PER_BATCH` is the number of images per batch.
Hyperparameters
| OFA family | SOLVER.MAX_ITER | SOLVER.STEPS | SOLVER.IMS_PER_BATCH |
|---|---|---|---|
| PN | 250000 | 166000,222000 | 8 |
| MBv3 | 250000 | 166000,222000 | 8 |
| ResNet | 250000 | 166000,222000 | 5 |
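A sketch of how the table translates into the command line, using a hypothetical per-family settings dictionary (the launch mirrors the command above):

```python
# Sketch: pick the solver overrides for a given OFA family from the table
# above and splice them into the shared-head training command.
import subprocess

SOLVER = {
    "ofa_pn":     {"max_iter": "250000", "steps": "166000,222000", "ims_per_batch": "8"},
    "ofa_mbv3":   {"max_iter": "250000", "steps": "166000,222000", "ims_per_batch": "8"},
    "ofa_resnet": {"max_iter": "250000", "steps": "166000,222000", "ims_per_batch": "5"},
}

family = "ofa_resnet"
cfg = SOLVER[family]
subprocess.run(
    [
        "python", "run_train_head_on_task.py",
        "-family", family, "-task", "detectron2", "-tag", "sampled", "-skip",
        "--num-gpus", "1",
        "--config-file", "tasks/detectron2/COCO_PanSeg_FPN_Adapted_Head.yml",
        "-sample_n", "3",
        "SOLVER.MAX_ITER", cfg["max_iter"],
        "SOLVER.STEPS", cfg["steps"],
        "SOLVER.IMS_PER_BATCH", cfg["ims_per_batch"],
    ],
    check=True,
)
```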
For HPE:
python run_train_head_on_task.py -family ofa_mbv3 --family mbv3 -task hpe2d -tag sampled --dataset lsp_cache --num_epochs 5000 --batch_size 256 -swap 10 --lr_cosine --cache_file cache/ofa_mbv3_cache_dict_n5 --data_dir data/HPE
- `-family` is the OFA family to train; select one of `ofa_pn`, `ofa_mbv3`, and `ofa_resnet`.
- `--family` is the OFA family to train; select one of `pn`, `mbv3`, and `resnet`. It should match `-family`, minus the `ofa_` prefix.
- `-task hpe2d` will execute the hpe2d code.
- `-tag` can be any string that labels the output folder.
- `--dataset lsp_cache` indicates that it should use the LSP dataset with a cache file.
- `--num_epochs` is the number of epochs to train for.
- `--batch_size` is the training batch size.
- `--lr_cosine` uses a cosine learning rate schedule.
- `--cache_file` is the prefix of the directory containing the latent representation caches.
- `--data_dir` is the path to the data directory containing the HPE data; it should contain both the `lsp` and `mpii` subfolders.
To fine-tune test architectures using the trained shared head weights (detectron2):
python run_train_cgs_on_task.py -family ofa_mbv3 -task detectron2 -tag shared -start_idx 10 -num_archs 15 -skip --num-gpus 2 --config-file tasks/detectron2/COCO_PanSeg_FPN_Adapted_Head.yml -chkpt cache/ofa_mbv3_detectron2_sampled/head_weights.pkl SOLVER.MAX_ITER 750 SOLVER.STEPS 465,635
- `-family` is the OFA family to train; select one of `ofa_pn`, `ofa_mbv3`, and `ofa_resnet`.
- `-task detectron2` will execute the detectron2 code.
- `-tag` can be any string that labels the output folder.
- `-start_idx` is the start index of the architectures to train.
- `-num_archs` is the number of architectures to train.
- `-skip` uses skip connections.
- `-chkpt` is the file path of the shared head weights.
- `--num-gpus 2` uses 2 GPUs. Depending on your compute resources, you may need to increase the number of GPUs to avoid CUDA out-of-memory errors.
- `--config-file` is the path to the detectron2 config file.
- `SOLVER.MAX_ITER` is the maximum number of iterations.
- `SOLVER.STEPS` are the steps at which the learning rate will be decreased.
Hyperparameters
| OFA family | SOLVER.MAX_ITER | SOLVER.STEPS | SOLVER.IMS_PER_BATCH |
|---|---|---|---|
| PN | 750 | 465,635 | - |
| MBv3 | 750 | 465,635 | - |
| ResNet | 1000 | 620,850 | 12 |
For HPE:
python run_train_cgs_on_task.py -family ofa_mbv3 -task hpe2d -tag shared --lr_cosine --num_epochs 10 -start_idx 10 -num_archs 15 --dataset lsp_extended -chkpt saved_models/ofa_mbv3_hpe2d_sampled_head_head.pt --data_dir data/HPE
- `-family` is the OFA family to train; select one of `ofa_pn`, `ofa_mbv3`, and `ofa_resnet`.
- `-task hpe2d` will execute the hpe2d code.
- `-tag` can be any string that labels the output folder.
- `-start_idx` is the start index of the architectures to train.
- `-num_archs` is the number of architectures to train.
- `--dataset lsp_extended` indicates that it should use the LSP Extended dataset.
- `--num_epochs` is the number of epochs to train for.
- `--batch_size` is the training batch size.
- `--lr_cosine` uses a cosine learning rate schedule.
- `-chkpt` is the file path of the shared head weights.
- `--data_dir` is the path to the data directory containing the HPE data; it should contain both the `lsp` and `mpii` subfolders.
Run the profiling script to get the FLOPs and parameter counts of all the architectures in a `.pkl` cache file.
`run_profiler.py` takes a cache file containing architectures, profiles those architectures, and then overwrites the cache file(s) with new file(s) containing FLOPs and params.
python run_profiler.py -task detectron -profiler flops params -reprofile --config-file tasks/detectron2/COCO_PanSeg_FPN_Adapted_Head.yml -cache_file cache/FOLDER
- `-task detectron` will execute the detectron2 code.
- `-reprofile` will profile the architectures even if they have already been profiled.
- `-profiler` selects the metrics to profile for.
- `-cache_file` is the path to the folder of `.pkl` files to profile.
- `--config-file` is the path to the detectron2 config file.
For HPE:
python run_profiler.py -task hpe2d -profiler flops params -reprofile --data_dir data/HPE -cache_file cache/FOLDER
- `-task hpe2d` will execute the hpe2d code.
- `-reprofile` will profile the architectures even if they have already been profiled.
- `-profiler` selects the metrics to profile for.
- `--data_dir` is the folder path of the data directory that contains the HPE data; it should contain both the `lsp` and `mpii` subfolders.
- `-cache_file` is the path to the folder of `.pkl` files to profile.
`make_cg_task_cache.py` takes an output folder from `run_train_cgs_on_task.py`, which contains multiple `.pkl` files, and combines them into a single `.pkl` file.
The script outputs a single file named `gpi_ofa_{family}_{test_metric}_{suffix}_comp_graph_cache.pkl`.
For COCO tasks (detectron2):
python make_cg_task_cache.py -cache_dir cache/FOLDER -family ofa_mbv3 -suffix SUFFIX -test_metric "obj_det"
python make_cg_task_cache.py -cache_dir cache/FOLDER -family ofa_mbv3 -suffix SUFFIX -test_metric "inst_seg"
python make_cg_task_cache.py -cache_dir cache/FOLDER -family ofa_mbv3 -suffix SUFFIX -test_metric "sem_seg"
python make_cg_task_cache.py -cache_dir cache/FOLDER -family ofa_mbv3 -suffix SUFFIX -test_metric "pan_seg"
- `-cache_dir` is the path to the folder that contains all the `.pkl` files to combine.
- `-family` is the OFA family to combine; select one of `ofa_pn`, `ofa_mbv3`, and `ofa_resnet`.
- `-suffix` is any string to uniquely identify the output cache file.
- `-test_metric` is the metric/task you wish to make a cache for. Select from: `obj_det`, `inst_seg`, `sem_seg`, and `pan_seg`.
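The four commands above differ only in the metric; a small sketch that runs them all in one pass (`FOLDER` and `SUFFIX` are placeholders, as above):

```python
# Sketch: build one CG task cache per COCO metric, equivalent to running the
# four commands above one after another.
import subprocess

for metric in ["obj_det", "inst_seg", "sem_seg", "pan_seg"]:
    subprocess.run(
        [
            "python", "make_cg_task_cache.py",
            "-cache_dir", "cache/FOLDER",
            "-family", "ofa_mbv3",
            "-suffix", "SUFFIX",
            "-test_metric", metric,
        ],
        check=True,
    )
```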
For HPE:
python make_cg_task_cache.py -cache_dir cache/FOLDER -family ofa_mbv3 -suffix SUFFIX -test_metric "val_PCK"
- `-cache_dir` is the path to the folder that contains all the `.pkl` files to combine.
- `-family` is the OFA family to combine; select one of `ofa_pn`, `ofa_mbv3`, and `ofa_resnet`.
- `-suffix` is any string to uniquely identify the output cache file, usually `lsp` or `mpii`.
- `-test_metric` is the metric on which accuracy is evaluated. For HPE, the test metric is just `val_PCK`.
`run_gpi_acc_predictor.py` will save the trained predictor as a `.pt` file in the `saved_models` folder.
python run_gpi_acc_predictor.py -model_name MODEL_NAME -family_train nb101 -family_test ofa_mbv3_val_PCK_lsp_ind#20+ofa_mbv3_val_PCK_mpii_ind#20+ofa_mbv3_obj_det_coco_ind#20+ofa_mbv3_inst_seg_coco_ind#20+ofa_mbv3_sem_seg_coco_ind#20+ofa_mbv3_pan_seg_coco_ind#20 -fine_tune_epochs 100 -epochs 40 -num_seeds 5 -k_adapt 1 -k_epochs 100 -family_k ofa_mbv3_val_PCK_lsp_shared -tar_norm stand_flops
- `-model_name` is any string to uniquely identify the model.
- `-family_train` is the family to train on.
- `-family_test` is the list of families to test on.
  - Each family is separated by a `+`.
  - The names of the tests refer to the middle text in the filename: `gpi_*_comp_graph_cache.pkl`.
  - `#20` means set aside 20 architectures for calculating the standardization mean/standard deviation and for fine-tuning.
- `-fine_tune_epochs` is the number of epochs to fine-tune for.
- `-epochs` is the number of epochs to train for.
- `-num_seeds` is how many times the same code will be executed with different seed values.
- `-e_chk` is the path to the checkpoint file.
- `-k_adapt` is the k-adapter flag.
- `-k_epochs` is the number of epochs to train the k-adapter.
- `-family_k` is the family to train the k-adapter on.
- `-tar_norm` will apply a transform; should be `stand` or `stand_flops`.
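The `-family_test` argument is just a `+`-separated list of cache names with an optional `#N` suffix; below is a small sketch for assembling it (the names follow the `gpi_*_comp_graph_cache.pkl` convention described above):

```python
# Sketch: build the long -family_test string programmatically. "#20" reserves
# 20 architectures for standardization statistics and fine-tuning.
test_sets = [
    "ofa_mbv3_val_PCK_lsp_ind",
    "ofa_mbv3_val_PCK_mpii_ind",
    "ofa_mbv3_obj_det_coco_ind",
    "ofa_mbv3_inst_seg_coco_ind",
    "ofa_mbv3_sem_seg_coco_ind",
    "ofa_mbv3_pan_seg_coco_ind",
]
family_test = "+".join(f"{name}#20" for name in test_sets)
print(family_test)
```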
We also include some files for making new Computation Graphs from `.pb` files and visualizing them.
See `make_cg.py`.
We provide sample `.pb` files for EfficientNet-B0 and ResNet-18.
See `visualize_cgs.py`.
Requires the graphviz library.
Saves CGs as images which you can then view.
E.g., render pictures for the models we provide `.pb` files for, then compare them to the actual model using Netron.
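A quick way to confirm the graphviz Python bindings and the system `dot` binary are available before running `visualize_cgs.py` (the toy graph below is our own example, unrelated to the CG format):

```python
# Smoke test for the graphviz dependency: builds and renders a trivial graph.
import graphviz

g = graphviz.Digraph(comment="smoke test")
g.node("a", "input")
g.node("b", "conv")
g.edge("a", "b")
g.render("graphviz_smoke_test", format="png", cleanup=True)
print("graphviz OK")
```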
If you find our data or CG API useful, we kindly ask that you cite our paper:
@inproceedings{mills2023aiop,
title = {AIO-P: Expanding Neural Performance Predictors Beyond Image Classification},
author = {Mills, Keith G. and Niu, Di and Salameh, Mohammad and Qiu, Weichen and Han, Fred X. and Liu, Puyuan and Zhang, Jialin and Lu, Wei and Jui, Shangling},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2023}
}