Deep Learning Course Project – Universidad Politécnica de Madrid, Master in Digital Innovation – EIT Digital
Authors:
- Ádám Földvári
- Joseph Tartivel
- Máté Lukács
Instructor: Roberto Valle
This repository contains the final project for the Deep Learning course at UPM. The objective was to classify objects in high-resolution satellite imagery using progressively advanced deep learning techniques:
- Feedforward Neural Networks (FFNN)
- Regularized FFNNs
- Convolutional Neural Networks (CNNs)
- Transfer Learning with ResNet50
Dataset:
- Source: xView Dataset
- Type: High-resolution satellite images (0.3 m GSD, WorldView-3)
- Split: 761 training images, 85 test images
- Processed: 21,377 training objects and 2,635 test objects (cropped and resized to 224×224)
- Classes: 12 categories (e.g., building, small car, cargo plane, helicopter)
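The crop-and-resize step for each annotated object can be sketched as follows. This is a minimal illustration using Pillow; the `(left, upper, right, lower)` bounding-box format and the function name are assumptions, not taken from the project notebooks.

```python
from PIL import Image


def crop_and_resize(image: Image.Image, box, size=(224, 224)) -> Image.Image:
    """Crop one annotated object from a satellite image and resize it.

    `box` is a (left, upper, right, lower) pixel bounding box; the exact
    annotation format used in the project is an assumption here.
    """
    return image.crop(box).resize(size, Image.BILINEAR)


# Example: extract a 100x100 region from a blank 1000x1000 image.
img = Image.new("RGB", (1000, 1000))
patch = crop_and_resize(img, (50, 50, 150, 150))
```

Every object thus becomes a fixed-size 224×224 input, regardless of its original footprint in the scene.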
Final testing and evaluation were conducted via a private competition on Codabench. Submissions were provided in the required JSON format and benchmarked against a hidden test set.
.
├── ffnn.ipynb # Feedforward Neural Network experiments
├── reg.ipynb # Regularization strategies for FFNNs
├── cnn.ipynb # Custom Convolutional Neural Networks
├── tl.ipynb # Transfer Learning with ResNet50
├── DeepLearning_JT_AF_ML_finalReport.pdf # Full technical report
└── README.md # Project overview and methodology
Each notebook corresponds to a development phase, with models iteratively refined at each stage.
FFNN (`ffnn.ipynb`):
- Compared shallow vs. deep architectures using flattened image inputs.
- Observed that deeper models overfit, since flattening discards spatial structure.
- Best test accuracy: 45.2%
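A minimal sketch of such a flattened-input FFNN, assuming a Keras/TensorFlow setup; the layer widths are illustrative, not the exact architectures from the report.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Flattening a 224x224x3 crop yields ~150k scalar inputs; the dense layers
# see no spatial structure, which is why plain FFNNs plateau early here.
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Flatten(),
    layers.Dense(512, activation="relu"),    # illustrative width
    layers.Dense(256, activation="relu"),
    layers.Dense(12, activation="softmax"),  # 12 object classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```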
Regularized FFNN (`reg.ipynb`):
- Applied batch normalization, dropout, and extended training.
- Generalization improved significantly.
- Best test accuracy: 55.18%
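The same flattened-input model with the regularization described above can be sketched like this (again assuming Keras; widths and dropout rates are illustrative choices, not the reported ones):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Batch normalization before each activation and dropout after it; the
# specific rates (0.5, 0.3) are placeholder values for illustration.
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Flatten(),
    layers.Dense(512),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dropout(0.5),
    layers.Dense(256),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dropout(0.3),
    layers.Dense(12, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```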
CNN (`cnn.ipynb`):
- Developed and refined five custom CNN architectures.
- Integrated data augmentation, L2 regularization, batch normalization, dropout, and custom learning-rate schedules.
- Best test accuracy: 76.36%
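One way these ingredients fit together in a Keras model is sketched below. The filter counts, regularization strength, and decay schedule are illustrative assumptions standing in for the five architectures and custom schedules described in the report.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

l2 = regularizers.L2(1e-4)  # illustrative L2 strength


def conv_block(filters):
    # Conv -> batch norm -> ReLU -> downsample, with L2 on the kernel.
    return [
        layers.Conv2D(filters, 3, padding="same", kernel_regularizer=l2),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D(),
    ]


model = models.Sequential(
    [layers.Input(shape=(224, 224, 3)),
     layers.RandomFlip("horizontal"),   # augmentation: active only in training
     layers.RandomRotation(0.1)]
    + conv_block(32) + conv_block(64) + conv_block(128)
    + [layers.GlobalAveragePooling2D(),
       layers.Dropout(0.4),
       layers.Dense(12, activation="softmax")]
)

# A plain exponential decay stands in for the project's custom schedules.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)
model.compile(optimizer=tf.keras.optimizers.Adam(schedule),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```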
Transfer learning (`tl.ipynb`):
- Employed a two-stage strategy:
  - Feature extraction with a frozen ResNet50 backbone.
  - Selective fine-tuning of the top layers at a lower learning rate.
- Achieved the highest overall performance with reduced training time.
- Best test accuracy: 77.87%
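The two-stage strategy can be sketched with Keras as follows. The head layers, the number of unfrozen layers, and the learning rates are illustrative assumptions, not the values from the report.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_resnet_classifier(weights="imagenet"):
    """ResNet50 backbone with a small classification head.

    Pass weights=None to skip downloading the pretrained weights.
    """
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=(224, 224, 3))
    base.trainable = False  # stage 1: feature extraction, backbone frozen
    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(12, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy")
    return model, base


def unfreeze_top(model, base, n_layers=30, lr=1e-5):
    # Stage 2: unfreeze only the top n_layers of the backbone and
    # recompile with a lower learning rate for fine-tuning.
    base.trainable = True
    for layer in base.layers[:-n_layers]:
        layer.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="sparse_categorical_crossentropy")
```

Recompiling after changing `trainable` is required for the freeze/unfreeze change to take effect in training.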
| Model | Test Accuracy | Precision | Recall |
|---|---|---|---|
| FFNN (Simple) | 45.2% | 30.31% | 33.66% |
| FFNN + Regularization | 55.18% | 44.95% | 55.50% |
| Custom CNN | 76.36% | 74.00% | 74.53% |
| Transfer Learning (ResNet50) | 77.87% | 67.15% | 77.47% |
- Spatially-aware architectures (CNNs, ResNet50) are critical for image classification tasks.
- Regularization substantially improves generalization for non-convolutional models.
- Class imbalance remains a challenge, especially for minority categories (e.g., helicopters).
- Transfer learning offered the best trade-off between accuracy, recall, and development time.
- Platform: Kaggle with P100 GPU acceleration
- Evaluation: Codabench private competition with hidden test set
- Submission format: JSON files for leaderboard evaluation
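Writing the predictions out for the leaderboard is a one-liner with the standard library. The mapping shown here (image id to predicted class label) is a hypothetical structure; the actual schema is defined by the Codabench competition.

```python
import json

# Hypothetical submission structure: image id -> predicted class label.
predictions = {"img_0001": "building", "img_0002": "small car"}

with open("submission.json", "w") as f:
    json.dump(predictions, f, indent=2)
```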
For the complete methodology, experiments, and analysis, see the final report (`DeepLearning_JT_AF_ML_finalReport.pdf`).



