A deep learning project for object detection and instance segmentation on the ARMBench dataset using Mask R-CNN with PyTorch.
This project implements object detection and instance segmentation for robotic perception tasks using the ARMBench dataset. The implementation uses Mask R-CNN with a ResNet50 backbone and Feature Pyramid Network (FPN), with custom modifications to the mask predictor to enhance segmentation performance.
- Instance Segmentation: Detects and segments totes and objects in robotic manipulation scenarios
- Multiple Test Scenarios: Evaluates model performance on:
  - Mix-object-tote dataset
  - Same-object-transfer set
  - Zoomed-out-tote-transfer set
- Two Training Configurations:
  - Small dataset (100 training images, 30 test images)
  - Large dataset (1000 training images, 300 test images)
- Model Improvements: Enhanced Mask R-CNN predictor with additional convolutional layers and ReLU activations
Object-Segmentation-on-ARMBENCH/
├── README.md                              # This file
├── requirements.txt                       # Python dependencies
├── Object Segmentation on ARMBench.pptx   # Project presentation
│
├── notebooks/                             # Data preprocessing notebooks
│   ├── ARMBENCH_json_file_conversions_100.ipynb    # Prepare 100-image dataset
│   └── ARMBENCH_json_file_conversions_1000.ipynb   # Prepare 1000-image dataset
│
├── scripts/                               # Training & evaluation scripts
│   ├── object_detection_and_segmentation_on_armbench_100.py                # Baseline (100 images)
│   ├── object_detection_and_segmentation_on_armbench_100_improvement.py    # Improved (100 images)
│   ├── object_detection_and_segmentation_on_armbench_1000.py               # Baseline (1000 images)
│   └── object_detection_and_segmentation_on_armbench_1000_improvement.py   # Improved (1000 images)
│
└── visualization/                         # Visualization scripts
    └── armbench_object_detection_and_segmentation_visulaization.py
The project uses the ARMBench Segmentation Dataset v0.1, which contains images of robotic manipulation scenarios with COCO-format annotations.
Download Dataset:
wget https://armbench-dataset.s3.amazonaws.com/segmentation/armbench-segmentation-0.1.tar.gz
tar -xzf armbench-segmentation-0.1.tar.gz
Dataset Structure:
- mix-object-tote/: Main training and testing images
- same-object-transfer-set/: Transfer learning test set
- zoomed-out-tote-transfer-set/: Zoomed-out test scenarios
Dataset Splits:
- 100-image configuration: 100 train, 30 test (per test set)
- 1000-image configuration: 1000 train, 300 test (per test set)
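After extraction, the COCO-format annotation files can be inspected with pycocotools as a quick sanity check. The snippet below is a minimal sketch; the annotation file path is an assumption about the extracted layout and may differ in your copy of the dataset.

# Minimal sketch: inspect COCO-format annotations with pycocotools.
# The annotation path below is an assumed example, not a guaranteed filename.
from pycocotools.coco import COCO

coco = COCO("armbench-segmentation-0.1/mix-object-tote/annotations.json")
print("images:", len(coco.getImgIds()))
print("categories:", [c["name"] for c in coco.loadCats(coco.getCatIds())])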
The Jupyter notebooks in the notebooks/ directory handle:
- Extracting subsets of images from the full dataset
- Creating corresponding COCO annotation JSON files
- Generating Excel files with image lists
- Copying selected images to organized folders
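As a rough illustration of the subsetting step these notebooks perform, the sketch below builds a 100-image COCO subset from a source annotation file. The file names are illustrative; the actual conversion logic lives in the notebooks.

# Illustrative sketch of building a 100-image COCO subset.
# File names are placeholders; see the notebooks for the actual logic.
import json

with open("mix-object-tote/train.json") as f:   # assumed source annotation file
    coco = json.load(f)

keep_imgs = coco["images"][:100]                 # take the first 100 images
keep_ids = {img["id"] for img in keep_imgs}
subset = {
    "images": keep_imgs,
    "annotations": [a for a in coco["annotations"] if a["image_id"] in keep_ids],
    "categories": coco["categories"],
}

with open("train_100.json", "w") as f:
    json.dump(subset, f)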
- Python 3.7+
- CUDA-capable GPU (recommended)
- CUDA Toolkit and cuDNN (for GPU acceleration)
- Clone this repository:
git clone <repository-url>
cd Object-Segmentation-on-ARMBENCH
- Install dependencies:
pip install -r requirements.txt
- Download the ARMBench dataset (see Dataset section above)
- Base Architecture: Mask R-CNN with ResNet50-FPN backbone
- Pretrained Weights: COCO pretrained
- Classes: 3 (Background, Tote, Object)
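A minimal sketch of constructing such a baseline with TorchVision follows. It mirrors the standard TorchVision fine-tuning recipe; the exact configuration in the training scripts may differ (for example, the hidden size of 256 is an assumption).

# Minimal sketch: COCO-pretrained Mask R-CNN (ResNet50-FPN) adapted to 3 classes.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 3  # background, tote, object
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# Replace the box predictor for the 3-class setup
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Replace the mask predictor for the 3-class setup (256 is an assumed hidden size)
in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, 256, num_classes)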
Enhanced Mask R-CNN with modified mask predictor:
- Additional convolutional layer for better feature representation
- ReLU activations between conv layers
- Improved mask prediction capability
import torch.nn as nn
import torch.nn.functional as F

class ModifiedMaskRCNNPredictor(nn.Module):
    def __init__(self, in_channels, hidden_layer, out_channels):
        super(ModifiedMaskRCNNPredictor, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, hidden_layer, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(hidden_layer, hidden_layer, kernel_size=3, padding=1)  # Intermediate layer
        self.conv3 = nn.Conv2d(hidden_layer, out_channels, kernel_size=3, padding=1)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.conv3(x)
        return x
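The modified predictor replaces the model's default mask head. The snippet below is a hedged sketch of how it can be wired in, assuming `model` and `num_classes` from the TorchVision example above; hidden_layer=256 is an assumed value.

# Hedged sketch: swap the default mask predictor for the modified one.
# Assumes `model` and `num_classes` from the baseline example; 256 is an assumed hidden size.
in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = ModifiedMaskRCNNPredictor(in_channels, 256, num_classes)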
For the 100-image improvement model:
python scripts/object_detection_and_segmentation_on_armbench_100_improvement.py
For the 1000-image improvement model:
python scripts/object_detection_and_segmentation_on_armbench_1000_improvement.py
Note: The scripts were originally designed for Google Colab. For local execution, modify paths accordingly.
Models are evaluated using:
- mAP (Mean Average Precision): Primary metric using COCO evaluation
- IoU Thresholds: Standard COCO metrics (@0.5, @0.75, @0.5:0.95)
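In practice this corresponds to the standard pycocotools COCOeval flow. The sketch below is illustrative; the ground-truth and prediction file names are assumptions.

# Minimal sketch of COCO-style mAP evaluation with pycocotools.
# "test.json" and "predictions.json" are illustrative file names.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("mix-object-tote/test.json")              # ground-truth annotations
coco_dt = coco_gt.loadRes("predictions.json")            # model detections in COCO results format
coco_eval = COCOeval(coco_gt, coco_dt, iouType="segm")   # use iouType="bbox" for detection mAP
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()                                    # prints AP@[0.5:0.95], AP@0.5, AP@0.75, ...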
Test sets:
- Mix-object-tote test set
- Same-object-transfer set
- Zoomed-out-tote-transfer set
The visualization script in the visualization/ directory provides:
- Annotated images with bounding boxes
- Colored instance masks
- Class labels on detected objects
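Conceptually, these overlays can be produced with OpenCV along the following lines. This is a simplified sketch rather than the script's exact code; the helper name and defaults are illustrative.

# Simplified sketch of drawing a colored mask, bounding box, and label with OpenCV.
# overlay_instance is an illustrative helper, not part of the project's scripts.
import cv2
import numpy as np

def overlay_instance(img, mask, box, label, color=(0, 255, 0), rect_th=5, text_th=4):
    """img: HxWx3 uint8 image, mask: HxW boolean mask, box: (x1, y1, x2, y2)."""
    colored = np.zeros_like(img)
    colored[mask.astype(bool)] = color
    img = cv2.addWeighted(img, 1.0, colored, 0.5, 0)        # blend the colored mask onto the image
    x1, y1, x2, y2 = map(int, box)
    cv2.rectangle(img, (x1, y1), (x2, y2), color, rect_th)  # bounding box
    cv2.putText(img, label, (x1, max(y1 - 10, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, color, text_th)  # class label
    return img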
Run the visualization script:
python visualization/armbench_object_detection_and_segmentation_visulaization.py

Example usage:
import torch
import matplotlib.pyplot as plt
from PIL import Image

# Load trained model
model = torch.load("model_100.pt")
model.eval()

# Perform segmentation (instance_segmentation is provided by the visualization script)
img_path = "path/to/your/image.jpg"
img, pred_classes, masks = instance_segmentation(img_path, model, rect_th=5, text_th=4)

# Display results
plt.imshow(img)
plt.show()

The project evaluates model performance on three test scenarios:
- Mix-object-tote test: Standard test set
- Same-object-transfer: Transfer learning on same objects
- Zoomed-out-tote-transfer: Generalization to zoomed-out views of the tote
Results are measured using mAP (mean Average Precision) at various IoU thresholds.
See requirements.txt for complete list of dependencies.
Key libraries:
- PyTorch & TorchVision
- pycocotools
- OpenCV
- NumPy
- Matplotlib
- Pillow
- ARMBench Dataset Creators: For providing the comprehensive segmentation dataset
- PyTorch Team: For the excellent deep learning framework
- TorchVision Team: For pre-trained models and utilities
- COCO Team: For the standardized evaluation metrics and tools
This project is licensed under the MIT License - see the LICENSE file for details.
- Surendhar Bandari