This project implements an emotion recognition model using a pre-trained ResNet18 architecture to classify facial expressions into seven distinct emotions: anger, disgust, fear, happiness, sadness, surprise, and neutral. The project leverages deep learning techniques with PyTorch and includes data preprocessing, model training, evaluation, and visualization of results.
- Project Structure
- Dataset
- Requirements
- Installation
- Usage
- Model Architecture
- Training and Evaluation
- Results and Visualizations
- Contributing
- License
The project is organized as a Jupyter Notebook (emotion.ipynb) with the following key components:
- Data Loading: Downloads and processes the dataset from Google Drive.
- Data Preprocessing: Splits the dataset into training, evaluation, and test sets, and applies image transformations.
- Model Definition: Uses a modified ResNet18 model for emotion classification.
- Training: Trains the model using the Adam optimizer and CrossEntropyLoss.
- Evaluation: Evaluates model performance with accuracy metrics and confusion matrix.
- Visualization: Displays sample images with predicted and true labels, along with emotion distribution plots.
The dataset consists of grayscale facial images (48x48 pixels) labeled with one of seven emotions. The data is split into:
- Training Set: Used to train the model.
- Evaluation Set: Used to validate model performance during training.
- Test Set: Used for final predictions.
The dataset is stored in parquet files (`df_train.parquet.gzip` and `df_test.parquet.gzip`) and is downloaded from Google Drive links provided in the notebook.
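A minimal sketch of the train/evaluation split step, assuming the data has already been loaded from the parquet files named above; the 10% evaluation fraction, the random seed, and the helper name `split_train_eval` are illustrative assumptions, not the notebook's exact code:

```python
import pandas as pd

def split_train_eval(df: pd.DataFrame, eval_frac: float = 0.1, seed: int = 42):
    """Split a frame into disjoint train/eval subsets.

    The evaluation fraction and seed are illustrative; the notebook
    defines its own split.
    """
    df_eval = df.sample(frac=eval_frac, random_state=seed)
    df_train = df.drop(df_eval.index)
    return df_train, df_eval

# The parquet file names come from the notebook:
# df = pd.read_parquet("df_train.parquet.gzip")
# train_df, eval_df = split_train_eval(df)
```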
- Open the `emotion.ipynb` notebook in Jupyter.
- Run the cells sequentially to:
- Download the dataset.
- Preprocess the data.
- Train the model.
- Evaluate the model on the validation set.
- Generate predictions for the test set.
- Visualizations, such as the confusion matrix and sample image predictions, will be displayed during execution.
The model is based on a pre-trained ResNet18 architecture with the following modifications:
- Backbone: The fully connected layer of ResNet18 is removed, retaining the convolutional layers.
- Custom Layers:
- A linear layer (512 → 100) with ReLU activation.
- A final linear layer (100 → 7) with Softmax activation for 7-class classification.
- Input Processing: Images are resized to 224x224, normalized, and converted to 3-channel tensors to match ResNet18's input requirements.
- Training: The model is trained for 30 epochs using the Adam optimizer (learning rate = 1e-4) and CrossEntropyLoss.
- Evaluation: Validation accuracy and loss are computed after each epoch. A confusion matrix is generated to analyze model performance across emotion classes.
- Data Augmentation: Random rotation is applied to training images to improve generalization.
- Emotion Distribution: A bar plot shows the distribution of emotions in the training set.
- Sample Predictions: Randomly selected images from the evaluation set are displayed with their predicted and true labels.
- Confusion Matrix: Visualizes the model's classification performance across all emotions.
- Test Predictions: The model generates predictions for the test set, which can be used for submission or further analysis.
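The accuracy and confusion-matrix steps above can be sketched with scikit-learn; the helper name `evaluation_report` and the label-to-index mapping (taken from the emotion order listed at the top of this README) are assumptions for illustration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Class order assumed to match the README's emotion list
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]

def evaluation_report(y_true, y_pred):
    """Overall accuracy plus a 7x7 confusion matrix (rows: true, cols: predicted)."""
    cm = confusion_matrix(y_true, y_pred, labels=range(len(EMOTIONS)))
    accuracy = np.trace(cm) / cm.sum()
    return accuracy, cm

# The matrix can then be rendered as a heatmap (e.g. matplotlib's imshow
# or seaborn's heatmap) with both axes labelled by EMOTIONS.
```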