This repository hosts the notebook and all the information needed to replicate the Neural Networks exam project, which is based on the seminal paper "Characterizing and Overcoming the Greedy Nature of Learning in Multi-modal Deep Neural Networks" by N. Wu et al.
The purpose of the study is to verify the somewhat counterintuitive hypothesis that, in multimodal learning, a model greedily relies on one modality and under-exploits the other, which cripples the benefits that learning from several modalities should bring. The analysis starts from an implementation of the well-known "MMTM: Multimodal Transfer Module for CNN Fusion" by H.R.V. Joze et al., to which a corrective method is applied to counter its greedy behaviour. The original MMTM and its corrected version are trained and compared using custom metrics designed to show how learning is distributed across the modalities.
In more detail, the model uses two modalities, and the metric proposed to evaluate how learning is distributed between them is the Conditional Utilisation Rate (CUR). The CUR is the relative change in accuracy between the two models within each pair: the model that uses both modalities and the model that uses only one of them. For example, u(m0|m1) measures the marginal contribution of modality m0 to the accuracy of a model that already uses modality m1, and u(m1|m0) does the same with the roles swapped.
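The CUR values can be computed directly from test accuracies. The sketch below assumes the definition from the paper as we read it, u(m1|m0) = (A(m0, m1) - A(m0)) / A(m0, m1), where A(·) is the accuracy of the model that uses the listed modalities. The function name, the d_util label for the difference between the two CURs and the accuracy values are illustrative only; they are not taken from the notebook or from our results.

```python
def conditional_utilisation_rate(acc_both: float, acc_other_only: float) -> float:
    """Relative accuracy gain obtained by adding a modality.

    acc_both:       accuracy of the model that uses both m0 and m1
    acc_other_only: accuracy of the model that uses only the *other* modality
    Assumed definition: u(m_i | m_j) = (A(m0, m1) - A(m_j)) / A(m0, m1)
    """
    if acc_both <= 0:
        raise ValueError("acc_both must be positive")
    return (acc_both - acc_other_only) / acc_both


# Illustrative accuracies only (not experimental results).
acc_m0_m1 = 0.90  # model using both modalities
acc_m0 = 0.85     # model using only m0
acc_m1 = 0.60     # model using only m1

u_m1_given_m0 = conditional_utilisation_rate(acc_m0_m1, acc_m0)  # contribution of m1 given m0
u_m0_given_m1 = conditional_utilisation_rate(acc_m0_m1, acc_m1)  # contribution of m0 given m1
d_util = u_m1_given_m0 - u_m0_given_m1  # far from 0 means one modality dominates
print(f"u(m1|m0)={u_m1_given_m0:.3f}  u(m0|m1)={u_m0_given_m1:.3f}  d_util={d_util:.3f}")
```

With these made-up numbers, removing m1 barely hurts while removing m0 hurts a lot, so d_util is strongly negative: the model is leaning on m0.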
The goal is to keep the difference between the two CURs as small as possible. Since CURs can only be measured once training is complete, a second metric that can be tracked during training is defined: the Conditional Learning Speed (CLS). The conditional learning speed s(m1|m0;t) is the log-ratio, at training step t, between the learning speed of the fusion-module parameters and the learning speed of the parameters of the uni-modal branch of modality 0. s(m0|m1;t) is defined in the same way for modality 1.
The goal therefore becomes minimising the difference between the two CLSs, which the code calls d_CLS.
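A minimal sketch of the CLS and d_CLS computation for a single training step follows. It assumes that the learning speed of a parameter group is its squared gradient norm divided by its squared parameter norm, and that gradients and parameters are available as NumPy arrays (tensors from a deep learning framework would be flattened and converted first). The helper names and the example arrays are hypothetical, so check the notebook for the exact definitions it uses.

```python
import numpy as np


def learning_speed(grad: np.ndarray, param: np.ndarray, eps: float = 1e-12) -> float:
    """Assumed learning-speed measure: ||grad||^2 / ||param||^2."""
    return float(np.sum(grad ** 2) / (np.sum(param ** 2) + eps))


def conditional_learning_speed(fusion_grad: np.ndarray, fusion_param: np.ndarray,
                               branch_grad: np.ndarray, branch_param: np.ndarray,
                               eps: float = 1e-12) -> float:
    """Log-ratio between the learning speed of the fusion-module parameters
    and that of one modality's uni-modal branch, at a single training step."""
    mu_fusion = learning_speed(fusion_grad, fusion_param)
    mu_branch = learning_speed(branch_grad, branch_param)
    # eps guards against log(0) or division by zero when a gradient vanishes
    return float(np.log((mu_fusion + eps) / (mu_branch + eps)))


# Illustrative arrays for one step t (hypothetical shapes and values).
w = np.ones((4, 4))
# s(m1|m0; t): fusion parameters vs. the uni-modal branch of modality 0
s_m1_given_m0 = conditional_learning_speed(0.2 * w, w, 0.1 * w, w)
# s(m0|m1; t): fusion parameters vs. the uni-modal branch of modality 1
s_m0_given_m1 = conditional_learning_speed(0.1 * w, w, 0.1 * w, w)
d_cls = s_m1_given_m0 - s_m0_given_m1  # balanced learning keeps |d_cls| small
print(f"s(m1|m0)={s_m1_given_m0:.3f}  s(m0|m1)={s_m0_given_m1:.3f}  d_CLS={d_cls:.3f}")
```

Working with log-ratios makes the metric symmetric around zero: equal learning speeds give 0, and doubling or halving the ratio moves d_CLS by the same amount in opposite directions.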
Among the datasets used in the paper, ModelNet40 was chosen, specifically its multi-view version, which provides 12 rendered views of each 3D object.
All the code needed to run training, evaluation and testing of the experiment is contained in a single notebook. In order to run the code correctly, follow these steps:
- Create a main directory and place the notebook there
- Download the ModelNet40 multi-view dataset from http://supermoe.cs.umass.edu/shape_recog/shaded_images.tar.gz (or use the commands below), extract it and place it in the main folder.
- Create a logging folder and a checkpoints folder.
You can carry out the steps above with these commands:
mkdir greedy_multimodal_learning_main
cd greedy_multimodal_learning_main
curl -o ModelNet.tar.gz http://supermoe.cs.umass.edu/shape_recog/shaded_images.tar.gz
tar -xzf ModelNet.tar.gz
mkdir logging checkpoints

Following the practice of the original paper, the first time you run the full experiment set the parameter make_npy_files to True so that the notebook creates the .npy files. Do this ONLY FOR THE FIRST RUN: once the .npy files have been created, set make_npy_files back to False and run the cells again from the beginning. From then on, the notebook runs correctly.
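The make_npy_files switch follows a common caching pattern: an expensive preprocessing step runs once and saves its output as .npy files, and later runs simply load them. The sketch below illustrates that pattern only; build_view_array, load_views, the file name and the array shape are hypothetical stand-ins, not the notebook's actual code.

```python
import os
import tempfile

import numpy as np


def build_view_array() -> np.ndarray:
    """Hypothetical stand-in for the expensive step that reads the rendered
    views from disk and stacks them into a single array (dummy data here)."""
    return np.zeros((2, 12, 8, 8), dtype=np.uint8)  # (objects, views, height, width)


def load_views(make_npy_files: bool, path: str) -> np.ndarray:
    """Build and cache the array on the first run, load the cache afterwards."""
    if make_npy_files:
        arr = build_view_array()
        np.save(path, arr)  # written once, reused by later runs
        return arr
    if not os.path.exists(path):
        raise FileNotFoundError(f"{path} not found: run once with make_npy_files=True")
    return np.load(path)


# Usage in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "views.npy")
    first_run = load_views(make_npy_files=True, path=npy_path)   # creates views.npy
    later_run = load_views(make_npy_files=False, path=npy_path)  # reads views.npy
    print(np.array_equal(first_run, later_run))
```

Checking whether the file already exists, instead of toggling a flag by hand, would automate the switch, at the cost of silently reusing stale files if the preprocessing changes.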