For the Break Through Tech AI Spring project, we partnered with The New York Botanical Garden (NYBG). The NYBG herbarium houses over 7.8 million plant and fungal specimens, offering invaluable insights into plant diversity and ecological changes over time. However, approximately 10% of the digitized images in the database are classified as “non-standard,” including images of animals and color illustrations, which hinders researchers' ability to conduct meaningful machine learning studies.
This project aims to develop a machine learning model that automatically classifies and filters out non-standard images, facilitating more efficient dataset curation for biodiversity research.

- Objective: To create a robust image classification model capable of distinguishing between standard and non-standard herbarium images.
- Target Audience: Researchers and scientists utilizing NYBG’s herbarium data for ecological studies, biodiversity analysis, and conservation efforts.
- Impact: Streamline the data curation process, enabling researchers to focus on significant biodiversity research while enhancing the usability of the dataset.
For this project, we classified 122,880 specimen images into 10 classes:
- Occluded Specimens
- Microscope Slides
- Illustrations (Color)
- Animal Specimens
- Live Plants
- Biocultural Specimens: human-made objects such as brooms, carpets, etc.
- Illustrations (Gray)
- Mixed Pressed Specimens
- Ordinary Pressed Specimens
- Micrographs (Transmission Light)
Our most accurate model is in the 'epoch.ipynb' file. Using a TensorFlow Xception model, we achieved an accuracy score of over 90%. The competition was hosted on Kaggle, and you can find our team's submission here: https://www.kaggle.com/competitions/bttai-nybg-2024/overview
Exploratory Data Analysis
Note
Our data was previously separated into 3 datasets: training data with 81,946 rows and 5 columns, validation data with 10,244 rows and 5 columns, and test data with 30,690 rows and 2 columns.
Our training dataset is comprised of 5 columns: 'uniqueID', 'classLabel', 'classID', 'source', and 'imageFile'. The two columns 'classLabel' and 'classID' correspond to the label we will be predicting. The 'source' column indicates which organization the image came from. This may hold some correlation worth exploring, such as whether microscope specimens come from certain laboratories or foundations more than from other sources.
Correlation is a statistical measure that expresses the extent to which two variables are related. In our example, if a source provides a significantly larger number of samples for a specific class than the other sources do, we can say there is a correlation: a strong association between that source and that class.
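One quick way to quantify this association is to cross-tabulate sources against class labels. Here is a minimal pandas sketch, assuming the training CSV has been loaded into a DataFrame; the file name is a guess and may need adjusting:

```python
import pandas as pd

# Hypothetical file name; adjust to wherever the competition CSV lives.
train_df = pd.read_csv('BTTAIxNYBG-train.csv')

# Rows: sources, columns: class labels, values: sample counts.
source_class_counts = pd.crosstab(train_df['source'], train_df['classLabel'])

# Sources that dominate a single class suggest a source-class association.
print(source_class_counts['microscope-slides'].sort_values(ascending=False).head())
```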
Here is the breakdown of the classes: 
We can create a countplot to visualize the contributions per source:
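A sketch of that countplot with seaborn, reusing the train_df from above:

```python
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(16, 6))
sns.countplot(data=train_df, x='source', hue='classLabel')
plt.xticks(rotation=90)  # 37 sources, so rotate the tick labels
plt.title('Training samples per source, split by class')
plt.tight_layout()
plt.show()
```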

This isn't as easy to read, so we can look at individual class labels instead:
After looking at one of the labels, 'microscope-slides', we can see that of the 37 different sources, only 3 contribute microscope slides, with the majority provided by 'L'.

The same goes for 'illustrations-color', where the majority is sourced by 'BHL'.
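These per-class views come from filtering the DataFrame down to one class before plotting; for example, for 'microscope-slides' (continuing from the sketches above):

```python
# Keep only the rows for a single class, then count per source.
slides = train_df[train_df['classLabel'] == 'microscope-slides']

plt.figure(figsize=(8, 4))
sns.countplot(data=slides, x='source')
plt.title("Sources contributing 'microscope-slides'")
plt.show()
```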

Or we can look at the first half of the sources, those with the most variation and/or samples in the training data.

Still not the easiest to read, but it's a way to check whether there is a correlation between the source column and a specific class, and for some classes there does seem to be.
Model Creation
For our model, we are using a pretrained Xception model via Keras for the image classification. Xception, developed at Google, stands for "Extreme Inception." We previously tried other approaches, including VGG16, k-Nearest Neighbors, and YOLO, and found that each was either too computationally expensive or not accurate enough.
A standard convolution (as used in a typical CNN) learns filters in 3D space: each kernel spans the width, height, and channel dimensions at once.
A depthwise separable convolution (the building block of Xception) divides this into two distinct steps, a depthwise convolution followed by a pointwise convolution:
- Depthwise Convolution: a single filter is applied to each input channel separately. For example, if an image has three color channels (red, green, and blue), a separate filter is applied to each channel.
- Pointwise Convolution: after the depthwise convolution, a 1×1 convolution combines the per-channel outputs across channels; each 1×1 filter produces one combined feature map.
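To make the distinction concrete, here is a minimal Keras sketch contrasting the two; the channel counts and kernel sizes are illustrative, not Xception's actual configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 64, 64, 3))  # dummy batch of one RGB image

# Standard convolution: each of the 32 kernels spans all 3 input channels at once.
standard = layers.Conv2D(32, kernel_size=3, padding='same')(x)

# Depthwise separable convolution, written out as its two steps:
depthwise = layers.DepthwiseConv2D(kernel_size=3, padding='same')(x)  # one filter per channel
pointwise = layers.Conv2D(32, kernel_size=1)(depthwise)               # 1x1 conv mixes channels

# Keras also bundles both steps into a single layer:
separable = layers.SeparableConv2D(32, kernel_size=3, padding='same')(x)

print(standard.shape, pointwise.shape, separable.shape)  # all (1, 64, 64, 32)
```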
https://keras.io/api/applications/xception/
Hyperparameter Tuning
The Xception model has the same number of parameters as the Inception V3 model.
- include_top: whether to include the 3 fully-connected layers at the top of the network.
- weights: one of None (random initialization), "imagenet" (pre-training on ImageNet), or the path to the weights file to be loaded.
- input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input for the model.
- input_shape: optional shape tuple, only to be specified if include_top is False (otherwise the input shape has to be (299, 299, 3)). It should have exactly 3 input channels, and width and height should be no smaller than 71. E.g. (150, 150, 3) would be one valid value.
- pooling: Optional pooling mode for feature extraction when include_top is False. None means that the output of the model will be the 4D tensor output of the last convolutional block. avg means that global average pooling will be applied to the output of the last convolutional block, and thus the output of the model will be a 2D tensor. max means that global max pooling will be applied.
- classes: optional number of classes to classify images into, only to be specified if include_top is True, and if no weights argument is specified.
- classifier_activation: A str or callable. The activation function to use on the "top" layer. Ignored unless include_top=True. Set classifier_activation=None to return the logits of the "top" layer. When loading pretrained weights, classifier_activation can only be None or "softmax".
- name: The name of the model (string).
```python
from tensorflow.keras.applications import Xception

base_model = Xception(
    weights='imagenet',  # load ImageNet pretrained weights
    include_top=False,   # drop the fully-connected classification head
)
# classifier_activation is ignored when include_top=False, so it is omitted here.
```
We set weights='imagenet' to load the pretrained model. Setting include_top=False makes the model output features from the last convolutional block instead of class probabilities. We then define our own inputs and outputs and build a Keras model on top of this base.
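A sketch of what that might look like for our 10 classes; the pooling and dense head here are assumptions rather than our exact notebook code:

```python
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(299, 299, 3))                # Xception's default input size
features = base_model(inputs)                             # 4D feature maps from the last conv block
pooled = layers.GlobalAveragePooling2D()(features)        # collapse to a feature vector
outputs = layers.Dense(10, activation='softmax')(pooled)  # one probability per image class

model = models.Model(inputs, outputs)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```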
Performance
After hyperparameter tuning and many epochs, here is our performance after 10 epochs: loss: 0.0312 - accuracy: 0.9444 - val_loss: 0.0289 - val_accuracy: 0.9463. This was after we froze some layers of the model and updated others, which led to only a 0.04 increase in accuracy but kept the model resistant to overfitting.
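The freeze-and-fine-tune step looked roughly like the sketch below; which layers to freeze and the learning rate are illustrative choices, and train_ds / val_ds stand in for our actual data pipelines:

```python
import tensorflow as tf

# Phase 1: freeze the entire pretrained backbone and train only the new head.
base_model.trainable = False

# Phase 2: unfreeze the backbone, but keep all except the last few layers frozen.
base_model.trainable = True
for layer in base_model.layers[:-20]:  # illustrative cutoff
    layer.trainable = False

# Recompile with a small learning rate so fine-tuning resists overfitting.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=10)
```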