This project implements and compares three deep learning models for segmenting grain boundaries in stainless steel microstructure images: U-Net, ResNet, and Vision Transformer (ViT). It uses the ExONE Stainless Steel 316L Grains 500X dataset from Kaggle, which contains grayscale micrographs (Grains) and their corresponding segmentation masks (Segmented).
The goal is to train neural networks that accurately segment grain boundaries and estimate grain sizes in stainless steel images. The models are implemented in PyTorch, trained on paired image-mask data, and evaluated with three metrics: test loss, Dice score, and grain size difference.
Models:
- U-Net: A convolutional neural network with an encoder-decoder structure and skip connections, originally designed for biomedical image segmentation.
- ResNet: A residual network adapted for segmentation with skip connections, dropout regularization, and a custom decoder.
- Vision Transformer (ViT): A transformer-based model using patch embeddings and self-attention, adapted for pixel-level segmentation.
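To make the encoder-decoder idea concrete, here is a minimal two-level U-Net sketch in PyTorch. It is not the project's model: the class name `TinyUNet`, the helper `double_conv`, the depth, the channel widths, and the use of BatchNorm are illustrative assumptions. It shows how decoder features are concatenated with the matching encoder features (the skip connections) and how the network returns one raw logit per pixel, the form `BCEWithLogitsLoss` expects since it applies the sigmoid internally.

```python
import torch
import torch.nn as nn


def double_conv(in_ch, out_ch):
    """Two 3x3 conv + BatchNorm + ReLU layers; padding keeps H and W unchanged."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class TinyUNet(nn.Module):
    """Hypothetical 2-level U-Net; input H and W must be divisible by 4."""

    def __init__(self, in_ch=1, out_ch=1, base=16):
        super().__init__()
        self.pool = nn.MaxPool2d(2)
        self.enc1 = double_conv(in_ch, base)
        self.enc2 = double_conv(base, base * 2)
        self.bottleneck = double_conv(base * 2, base * 4)
        # Transposed convs double the spatial size on the way back up.
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, kernel_size=2, stride=2)
        self.dec2 = double_conv(base * 4, base * 2)  # 2*base upsampled + 2*base skip
        self.up1 = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = double_conv(base * 2, base)      # base upsampled + base skip
        self.head = nn.Conv2d(base, out_ch, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)                     # (B, base, H, W)
        e2 = self.enc2(self.pool(e1))         # (B, 2*base, H/2, W/2)
        b = self.bottleneck(self.pool(e2))    # (B, 4*base, H/4, W/4)
        # Skip connections: concatenate encoder features along the channel axis.
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)                  # raw logits, (B, out_ch, H, W)


model = TinyUNet()
logits = model(torch.randn(2, 1, 128, 128))  # two grayscale 128x128 images
print(logits.shape)  # torch.Size([2, 1, 128, 128])
```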
Requirements:
- Python 3.8+
- PyTorch 1.8+ with CUDA support (for GPU acceleration)
- Kaggle account and API token (to download the dataset)
Installation:
- Clone the repository:
git clone https://github.com/adiManethia/DeepGrain
cd DeepGrain
- Install dependencies:
pip install -r requirements.txt
- Download the dataset: add it to a Kaggle notebook, or download it manually and place it in /kaggle/input/exone-stainless-steel-316l-grains-500x.
Usage:
- Open the script in a Python environment (e.g., Jupyter Notebook or a Kaggle Notebook).
- If using a GPU, ensure CUDA is available.
- Run the script. It loads and preprocesses the dataset, trains all three models for 25 epochs, and then evaluates the models and visualizes the results.
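A common PyTorch idiom for the CUDA check selects the GPU when one is visible and falls back to the CPU otherwise. This is a sketch; the variable names are illustrative, and the project's script may choose its device differently.

```python
import torch

# Use the first visible GPU if CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

if device.type == "cuda":
    # Human-readable name of GPU index 0, useful to confirm the right card is used.
    print(torch.cuda.get_device_name(0))

# The model and every batch must then be moved to the same device, e.g.:
# model = model.to(device); images, masks = images.to(device), masks.to(device)
```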
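Evaluation Metrics:
- Test Loss (BCEWithLogitsLoss):
Binary cross-entropy with logits measures the per-pixel discrepancy between the predicted logits and the true binary masks, averaged over all pixels.
A lower value indicates better pixel-level accuracy.
Loss = -1/N * Σ_i [y_i * log(σ(x_i)) + (1 - y_i) * log(1 - σ(x_i))]
where x_i is the predicted logit for pixel i, y_i ∈ {0, 1} is its true label, σ is the sigmoid function, and N is the number of pixels.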
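- Dice Score:
Measures the overlap between the predicted and true segmentation masks, from 0 (no overlap) to 1 (perfect overlap). Unlike the per-pixel loss, it ignores correctly predicted background pixels, so it reflects how well the segmented regions themselves match.
A higher value indicates better overlap.
Dice = (2 * |P ∩ T| + ε) / (|P| + |T| + ε)
where P and T are the sets of foreground pixels in the predicted and true masks, and ε is a small smoothing constant that prevents division by zero.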
- Grain Size Difference:
The average grain size estimated from the predicted mask minus the average grain size estimated from the true mask, both computed with the line intercept method.
Interpreting the sign:
- Positive: Predicted grains are larger (fewer boundaries detected), e.g., U-Net: 5.74.
- Negative: Predicted grains are smaller (more boundaries detected), e.g., ResNet: -4.98.
- Values closer to 0 indicate better grain size estimation.
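The test loss formula can be checked numerically. The NumPy sketch below (function names `bce_with_logits` and `bce_naive` are hypothetical) evaluates it in the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)), which is algebraically identical to the formula above but does not overflow for large |x|. `torch.nn.BCEWithLogitsLoss` with its default `reduction="mean"` likewise averages the per-pixel loss over all elements.

```python
import numpy as np


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all pixels, computed from raw logits."""
    x = np.asarray(logits, dtype=np.float64)
    y = np.asarray(targets, dtype=np.float64)
    # Stable rewrite of -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))].
    per_pixel = np.maximum(x, 0.0) - x * y + np.log1p(np.exp(-np.abs(x)))
    return float(per_pixel.mean())


def bce_naive(logits, targets):
    """Direct transcription of the formula; breaks down for large |x|."""
    x = np.asarray(logits, dtype=np.float64)
    y = np.asarray(targets, dtype=np.float64)
    s = 1.0 / (1.0 + np.exp(-x))  # sigmoid
    return float(-(y * np.log(s) + (1 - y) * np.log(1 - s)).mean())


logits = np.array([[2.0, -1.0], [0.5, -3.0]])
mask = np.array([[1.0, 0.0], [1.0, 0.0]])
print(bce_with_logits(logits, mask), bce_naive(logits, mask))
```

For moderate logits both functions agree; for a confidently wrong logit such as 1000 with label 0, the stable form returns 1000 while the naive one produces an infinite loss.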
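A matching NumPy sketch of the Dice score. It assumes predictions are binarized at probability 0.5 (equivalently, logit > 0) and that ε = 1e-6; the threshold, the ε value, and the function name `dice_score` are assumptions, not taken from the project's code.

```python
import numpy as np


def dice_score(logits, targets, eps=1e-6):
    """Dice = (2|P ∩ T| + eps) / (|P| + |T| + eps) for binary masks."""
    # sigmoid(x) > 0.5 is equivalent to x > 0, so no sigmoid call is needed.
    pred = np.asarray(logits) > 0
    true = np.asarray(targets).astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + true.sum() + eps))


true_mask = np.array([[1, 1, 0, 0]])
logits = np.array([[3.0, -2.0, -1.0, -4.0]])  # predicts only the first pixel
print(round(dice_score(logits, true_mask), 4))  # 0.6667
```

The ε term also defines the edge case where both masks are empty: the score is then exactly 1 rather than undefined.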
Final Model Comparison:
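Model    Dice Score   Grain Size Diff   Test Loss
-------------------------------------------------
U-Net    0.8916        5.74             0.3069
ResNet   0.8758       -4.98             0.7256
ViT      0.8695       -0.48             0.3524

- U-Net: Best pixel-level performer (lowest test loss, highest Dice score), but it overestimates grain size by 5.74, the largest absolute grain size difference of the three models.
- ResNet: Skip connections improved its results, but its much higher test loss (0.7256) despite a competitive Dice score (0.8758) suggests noisy predictions.
- ViT: Balanced performance; its Dice score is slightly lower, but its grain size estimate is by far the closest to the ground truth, with only a minor underestimation (-0.48).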
Future Work:
- Tune ResNet’s dropout rate or learning rate to lower its test loss.
- Experiment with a smaller ViT patch size (e.g., 8) for finer spatial granularity.
- Apply post-processing (e.g., morphological operations) to smooth predictions.
- Use a separate validation set for early stopping.
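On the patch-size suggestion: a ViT splits each image into non-overlapping p x p patches and treats each patch as one token, so halving p quadruples the number of tokens, and self-attention cost grows quadratically with the token count. The NumPy sketch below (hypothetical helpers `patchify` and `unpatchify`, with a 256x256 input assumed purely for illustration) shows the split and its exact inverse, the kind of reshaping a segmentation ViT can use to map per-patch outputs back to pixels.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W) image into (num_patches, patch_size**2) flattened patches."""
    h, w = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p) -> (gh, gw, p, p): group pixels by patch, patches in row-major order.
    patches = image.reshape(gh, patch_size, gw, patch_size).swapaxes(1, 2)
    return patches.reshape(gh * gw, patch_size * patch_size)


def unpatchify(patches, h, w, patch_size):
    """Inverse of patchify: reassemble flattened patches into an (H, W) image."""
    gh, gw = h // patch_size, w // patch_size
    grid = patches.reshape(gh, gw, patch_size, patch_size).swapaxes(1, 2)
    return grid.reshape(h, w)


image = np.random.default_rng(0).random((256, 256))  # assumed input size
for p in (16, 8):
    tokens = patchify(image, p)
    print(p, tokens.shape)  # 16 -> (256, 256), 8 -> (1024, 64)
```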
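For the post-processing suggestion, morphological closing (dilation then erosion) fills small holes and opening (erosion then dilation) removes isolated false-positive specks. The sketch below implements both with a 3x3 square structuring element in plain NumPy to stay dependency-free; `scipy.ndimage.binary_opening` and `binary_closing` offer the same operations, though their default border handling differs from this sketch. The helper names and the 3x3 element are assumptions. Opening also erases any foreground structure thinner than the element, so if boundaries are the foreground class and only 1 to 2 pixels wide, it is safer to apply it to the grain class instead.

```python
import numpy as np


def _neighborhoods(mask, pad_value):
    """Stack the 9 shifted copies of mask covered by a 3x3 window."""
    m = np.asarray(mask, dtype=bool)
    h, w = m.shape
    padded = np.pad(m, 1, mode="constant", constant_values=pad_value)
    return np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])


def dilate(mask):
    # On if any pixel in the 3x3 neighbourhood is on; outside the image counts as off.
    return _neighborhoods(mask, False).any(axis=0)


def erode(mask):
    # On only if the whole 3x3 neighbourhood is on; outside counts as on,
    # so regions touching the image border are not eroded from that side.
    return _neighborhoods(mask, True).all(axis=0)


def opening(mask):
    """Remove specks smaller than the 3x3 element."""
    return dilate(erode(mask))


def closing(mask):
    """Fill holes and gaps smaller than the 3x3 element."""
    return erode(dilate(mask))


mask = np.zeros((20, 20), dtype=bool)
mask[5:11, 5:11] = True  # a 6x6 predicted region
mask[7, 7] = False       # a one-pixel hole inside it
mask[15, 15] = True      # an isolated false-positive pixel
# Close first so the hole does not weaken the region before opening.
cleaned = opening(closing(mask))
print(cleaned.sum())  # 36
```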
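For the validation-set suggestion, a minimal sketch: hold out a fixed, seeded fraction of the image indices, then stop training once the validation loss has not improved for `patience` epochs. The names `train_val_split` and `EarlyStopping`, the 15% split, and `patience=3` are illustrative assumptions; in PyTorch, `torch.utils.data.random_split` can perform the split directly on a `Dataset`.

```python
import math
import random


def train_val_split(indices, val_fraction=0.15, seed=42):
    """Shuffle indices reproducibly and hold out a validation fraction."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


class EarlyStopping:
    """Signal a stop after `patience` epochs without validation-loss improvement."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = math.inf
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, val_loss, epoch):
        """Record this epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch  # in practice, checkpoint the model weights here
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


train_idx, val_idx = train_val_split(range(100), val_fraction=0.15)
print(len(train_idx), len(val_idx))  # 85 15

# Synthetic validation losses, for illustration only.
val_losses = [0.62, 0.48, 0.41, 0.43, 0.42, 0.44, 0.40]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss, epoch):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_epoch)  # 5 2
```

Training stops at epoch 5 and the epoch-2 weights would be restored; the later 0.40 is never seen, which is the usual trade-off when choosing `patience`.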
