This project contains a ConvNet architecture implemented in PyTorch to classify the following four types of butterflies found in Costa Rica:
- Siproeta stelenes
- Morpho helenor
- Euptoieta hegesia meridania
- Biblis hyperia aganisa
To use the code, certain Python packages are required. You can install them by running the following command:

```
pip install -r requirements.txt
```

The project contains the following directories when you clone it:
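The authoritative package list lives in `requirements.txt` in the repository. As a rough sketch only (the exact packages and versions below are assumptions inferred from the code, not the actual file's contents), it will look something like:

```
torch
torchvision
split-folders
Pillow
```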
```
butterfly_classificator/
├── checkpoints/
│   └── checkpoint.pth
├── results/
│   ├── epoch_metrics.csv
│   ├── final_results.csv
│   ├── testing_matrix.csv
│   └── training_matrix.csv
├── src/
│   ├── convnet.py
│   ├── csv_writer.py
│   └── trainer.py
└── utils/
    ├── data_augmenter.py
    ├── preprocessor.py
    └── splitter.py
```
- In `results` you will find `.csv` files with the results from the last training execution. These can be used to plot the data.
- In `src` you will find the relevant files to train the ConvNet model.
- In `utils` you will find all files related to preprocessing the images.
To run any of the code you will need a version of the dataset. The dataset was constructed from photos taken from iNaturalist. Three datasets created for this project are available for download:
- Initial dataset, which contains the photos without being split.
- Split dataset, which contains the photos divided into training, testing and validation sets with an 80-10-10 proportion.
- Balanced dataset, which contains a balanced version of the dataset produced through data augmentation.
If you want to create your own split, you can use the auxiliary tools in the utils directory to:
- Resize the images.
- Split the data.
- Balance the classes.
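Balancing means augmenting the under-represented classes until every class has as many images as the largest one. As a minimal, stdlib-only sketch of that idea (the helper name below is hypothetical; the project's actual logic lives in `data_augmenter.py`):

```python
def augmentation_plan(counts: dict[str, int]) -> dict[str, int]:
    """Given per-class image counts, return how many augmented
    copies each class needs to match the largest class."""
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}

# Example: one class has 100 images, another only 60.
plan = augmentation_plan({"morpho_helenor": 100, "biblis_hyperia": 60})
```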
The images are resized to 200x200 pixels to make them easier for the model to process. To do this, you must have a `data` directory containing the images, then execute the following command:

```
python preprocessor.py
```

This will create a new directory titled `preprocessed_data`.
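For reference, a minimal sketch of what this preprocessing step amounts to, assuming Pillow is used for the resizing (the function names and resampling filter below are assumptions, not the project's actual code):

```python
from pathlib import Path

from PIL import Image

TARGET_SIZE = (200, 200)  # the 200x200 size described above

def resize_image(img: Image.Image) -> Image.Image:
    """Resize a single image to the fixed 200x200 model input size."""
    return img.convert("RGB").resize(TARGET_SIZE, Image.LANCZOS)

def preprocess(data_dir: str = "data", out_dir: str = "preprocessed_data") -> None:
    """Walk data_dir, resize every .jpg and save it under out_dir,
    preserving the per-class subdirectory layout."""
    for path in Path(data_dir).rglob("*.jpg"):
        out_path = Path(out_dir) / path.relative_to(data_dir)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        resize_image(Image.open(path)).save(out_path)
```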
To split the data you must have the `preprocessed_data` directory from the previous step, then execute the following command:

```
python splitter.py
```

This will create a new directory titled `split_data` with the following splits:
- Training: 80% of the images.
- Testing: 10% of the images.
- Validation: 10% of the images.
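Conceptually, this kind of split is a seeded shuffle followed by slicing. A stdlib-only sketch of the idea (this helper is hypothetical; the project itself delegates the work to the `split-folders` package):

```python
import random

def split_files(filenames, seed=18, ratios=(0.80, 0.10, 0.10)):
    """Deterministically split filenames into train/test/val
    according to the given ratios."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    files = sorted(filenames)          # fixed order before shuffling
    random.Random(seed).shuffle(files) # seeded for reproducibility
    n_train = int(len(files) * ratios[0])
    n_test = int(len(files) * ratios[1])
    return {
        "train": files[:n_train],
        "test": files[n_train:n_train + n_test],
        "val": files[n_train + n_test:],
    }
```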
If you want to modify these splits you can change the following line in the code:
```python
splitfolders.ratio(input_path, output=output_path, seed=18, ratio=(0.80, 0.10, 0.10))
```

Once you have either downloaded a dataset or created your own version, you can train the model by running the following command inside the `src` directory:
```
python trainer.py
```

This will update the `.csv` files inside the `results` directory and will output the training progress and final results, reporting the following information for training and validation:
- Loss
- Macro Accuracy
- Macro F1
- Macro Precision
- Macro Recall
- Confusion matrix
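Macro averaging computes each metric per class and then takes the unweighted mean over the classes, so every butterfly species counts equally regardless of how many images it has. A stdlib-only sketch of how these values follow from a confusion matrix (hypothetical helper, not the project's code):

```python
def macro_metrics(conf):
    """conf[i][j] = number of samples with true class i predicted as j.
    Returns macro-averaged precision, recall and F1."""
    k = len(conf)
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(k)) - tp  # column sum minus tp
        fn = sum(conf[c]) - tp                       # row sum minus tp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p); recalls.append(r); f1s.append(f1)
    return {
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1s) / k,
    }
```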
To observe the results you can use the following Colab notebook, in which you can graph the metric results and each confusion matrix from the `.csv` files in the `results` directory.
