An automated, modular ML pipeline for mapping built-up areas from Sentinel-2 satellite imagery — using spectral indices, Random Forest classification, and Microsoft Planetary Computer.
Mapping built-up (urban) areas at scale is essential for urban planning, climate risk assessment, and land-use monitoring. Traditional methods require manual digitization or expensive commercial data.
This pipeline automates the full workflow — from satellite data acquisition to a classified built-up map — using free Sentinel-2 imagery and open-source ML tools. It is designed for city-scale and regional-scale automation, supporting both research and production use.
1. Download Sentinel-2 tiles (Planetary Computer STAC API)
↓
2. Compute per-tile spectral index composites
↓
3. Build ML training dataset from training polygons / points
↓
4. Train Random Forest classifier
↓
5. Predict built-up probability & binary maps per tile
↓
6. Evaluate accuracy (F1, Precision, Recall, Confusion Matrix)
📥 Automated Sentinel-2 Downloader
- STAC API via Microsoft Planetary Computer
- Cloud filtering using scene metadata
- Downloads selected bands at 10 m and 20 m resolution
- Tile-aware search for large AOIs
- Resume-safe (skips already downloaded files)
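
The search and resume behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `download_s2_pc_by_tile.py`: the function names are hypothetical, the `sentinel-2-l2a` collection and the `eo:cloud_cover` filter are assumptions about how the scene-metadata cloud filter is expressed, and the non-empty-file test is only one plausible way to decide that a download can be skipped.

```python
from pathlib import Path

# Public STAC endpoint of Microsoft Planetary Computer.
PC_STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"


def build_search_kwargs(bbox, year, max_cloud):
    """Search parameters for Sentinel-2 L2A scenes under a cloud-cover limit."""
    return {
        "collections": ["sentinel-2-l2a"],  # assumed collection
        "bbox": list(bbox),  # (min_lon, min_lat, max_lon, max_lat)
        "datetime": f"{year}-01-01/{year}-12-31",
        # Filters on scene-level metadata, not on per-pixel cloud masks.
        "query": {"eo:cloud_cover": {"lt": max_cloud}},
    }


def search_scenes(bbox, year, max_cloud):
    """Query the catalog. Needs pystac-client, planetary-computer and network access."""
    import planetary_computer
    import pystac_client

    catalog = pystac_client.Client.open(
        PC_STAC_URL, modifier=planetary_computer.sign_inplace
    )
    search = catalog.search(**build_search_kwargs(bbox, year, max_cloud))
    return list(search.items())


def needs_download(path):
    """Resume-safe check: skip files that already exist and are non-empty."""
    path = Path(path)
    return not (path.is_file() and path.stat().st_size > 0)
```

Keeping the parameter construction separate from the network call makes the filter easy to inspect or unit-test without touching the API.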
🛰️ Spectral Index Processing
- Per-tile mean & median composites
- SCL cloud masking applied before aggregation
- Indices computed: NDVI, NDBI, BSI, NDWI, MNDWI
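
As a rough sketch of the masking-then-compositing order described above, the snippet below sets cloudy observations to NaN and then takes the per-pixel mean and median over time. The function name and the list of SCL classes treated as invalid are assumptions; `mean_indices.py` may mask a different set.

```python
import warnings

import numpy as np

# Assumption: SCL classes treated as unusable (no data, saturated/defective,
# cloud shadow, medium/high-probability cloud, thin cirrus).
INVALID_SCL = (0, 1, 3, 8, 9, 10)


def composite(index_stack, scl_stack, invalid_scl=INVALID_SCL):
    """Return (mean, median) over time for a (time, rows, cols) index stack."""
    data = np.asarray(index_stack, dtype="float32").copy()
    # Mask before aggregating so cloudy dates do not bias the composite.
    data[np.isin(scl_stack, invalid_scl)] = np.nan
    with warnings.catch_warnings():
        # Pixels with no valid observation stay NaN; silence the warning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(data, axis=0), np.nanmedian(data, axis=0)
```

The median is usually less sensitive than the mean to residual cloud or haze that the SCL layer misses, which is one reason to keep both composites.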
🤖 Random Forest Classification
- Supports both point and polygon training data
- Balanced sampling from polygon regions
- Outputs built-up probability raster + binary mask
- 3-fold cross-validation, F1, precision, recall reporting
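
The training and prediction steps listed above might look roughly like this with scikit-learn. It is a sketch under stated assumptions: the function name, the hyperparameters, the use of `class_weight="balanced"`, and the 0.5 threshold are all illustrative choices, not values taken from `train_and_predict_builtup.py`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def train_and_predict(X_train, y_train, X_tile, threshold=0.5, seed=0):
    """Fit a Random Forest, score it with 3-fold F1, and classify tile pixels.

    X_train: (n_samples, n_features) index values at training locations.
    y_train: (n_samples,) labels, 1 = built-up, 0 = other.
    X_tile:  (n_pixels, n_features) index values for every pixel of a tile.
    """
    model = RandomForestClassifier(
        n_estimators=100,         # hypothetical setting
        class_weight="balanced",  # hypothetical; offsets class imbalance
        random_state=seed,
    )
    # cross_val_score fits clones of the model, so the final fit comes after.
    cv_f1 = cross_val_score(model, X_train, y_train, cv=3, scoring="f1")
    model.fit(X_train, y_train)
    # Select the predict_proba column that belongs to class 1 (built-up).
    builtup_col = list(model.classes_).index(1)
    prob = model.predict_proba(X_tile)[:, builtup_col]
    mask = (prob >= threshold).astype("uint8")
    return model, cv_f1, prob, mask
```

Reshaping `prob` and `mask` back to the tile's `(rows, cols)` gives the probability raster and binary mask, and `joblib.dump(model, path)` would store the fitted model as a `.joblib` file.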
🗺️ Optional OSM Building Integration
- Augment training with OpenStreetMap building footprints
- Strengthens urban class separation
| Index | Measures | Formula |
|---|---|---|
| NDVI | Vegetation density | (NIR - Red) / (NIR + Red) |
| NDBI | Built-up surfaces | (SWIR - NIR) / (SWIR + NIR) |
| BSI | Bare soil | ((SWIR + Red) - (NIR + Blue)) / ((SWIR + Red) + (NIR + Blue)) |
| NDWI | Water bodies | (Green - NIR) / (Green + NIR) |
| MNDWI | Modified water | (Green - SWIR) / (Green + SWIR) |
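
The formulas in the table translate into a few lines of NumPy. The sketch below is illustrative only: the function names are hypothetical, BSI is written in its commonly used form, ((SWIR + Red) - (NIR + Blue)) / ((SWIR + Red) + (NIR + Blue)), and the band assignment (Blue = B02, Green = B03, Red = B04, NIR = B08, SWIR = B11) is an assumption about which Sentinel-2 bands the pipeline uses.

```python
import numpy as np


def normalized_difference(a, b):
    """(a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype="float32")
    b = np.asarray(b, dtype="float32")
    denom = a + b
    out = np.full(denom.shape, np.nan, dtype="float32")
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def spectral_indices(blue, green, red, nir, swir):
    """Compute the five indices from reflectance arrays of equal shape."""
    blue, red, nir, swir = (
        np.asarray(x, dtype="float32") for x in (blue, red, nir, swir)
    )
    return {
        "NDVI": normalized_difference(nir, red),
        "NDBI": normalized_difference(swir, nir),
        # BSI is a normalized difference of two band sums.
        "BSI": normalized_difference(swir + red, nir + blue),
        "NDWI": normalized_difference(green, nir),
        "MNDWI": normalized_difference(green, swir),
    }
```

Because the SWIR band is delivered at 20 m and the visible and NIR bands at 10 m, the arrays need to be resampled to a common grid before they are passed in.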
sentinel2_builtup_pipeline/
│
├── scripts/
│ ├── download_s2_pc_by_tile.py # Sentinel-2 downloader via STAC
│ ├── mean_indices.py # Spectral index compositing
│ └── train_and_predict_builtup.py # ML training & prediction
│
├── data/
│ ├── aoi/ # Area of interest shapefile
│ ├── training/ # Training polygons or points
│ ├── osm/ # Optional OSM building footprints
│ └── sentinel/ # Downloaded satellite tiles (auto-filled)
│
├── output/
│ ├── models/ # Saved Random Forest model (.joblib)
│ ├── prediction_tiles/ # Per-tile output rasters
│ └── logs/
│
├── run.ipynb # Interactive notebook workflow
├── environment.yml
└── requirements.txt
# Create environment
conda create --name s2builtup python=3.10
conda activate s2builtup
conda install jupyter nbconvert
conda install --file requirements.txt -c conda-forge

Step 1: Download Sentinel-2 tiles
python scripts/download_s2_pc_by_tile.py \
--outdir data/sentinel \
--aoi data/aoi/your_aoi.shp \
--year 2024 \
--cloud 5 \
--max-workers 6

Step 2: Compute spectral index composites
python scripts/mean_indices.py

Step 3: Train classifier and generate predictions
python scripts/train_and_predict_builtup.py

| File | Description |
|---|---|
| `*_MEAN.tif` | Per-pixel mean spectral index composite |
| `*_MEDIAN.tif` | Per-pixel median spectral index composite |
| `*_BUILTUP_PROB.tif` | Built-up probability (0–1) |
| `*_BUILTUP_MASK.tif` | Binary classification (built-up / non built-up) |
| `builtup_rf.joblib` | Saved Random Forest model |
| `training_summary.csv` | Accuracy metrics per run |
The pipeline reports per-run:
- F1 Score
- Precision & Recall
- Confusion Matrix
- 3-fold Cross-Validation scores
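
To make the relationship between these metrics concrete, here is a small from-scratch sketch that derives precision, recall, and F1 from the confusion-matrix counts. It is for illustration only; the function name is hypothetical, and the pipeline itself presumably relies on a library such as scikit-learn for these numbers.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived scores for labels 1 (built-up) and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Precision: share of predicted built-up pixels that are truly built-up.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of true built-up pixels that were detected.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return {
        # Rows are true labels, columns are predictions, ordered (0, 1).
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Built-up pixels are often a minority class, so F1 tends to be more informative here than overall accuracy, which a map predicting "non built-up" everywhere could still score well on.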
Planned extensions:
- UNet / DeepLab deep learning segmentation module
- Time-series built-up change detection
- Zonal statistics aggregation to admin boundaries
- Web map visualization with Leaflet or Kepler.gl
Athithiyan M R — Geospatial Data Scientist | Remote Sensing | Climate Analytics
- ESA Sentinel-2 Mission
- Microsoft Planetary Computer & STAC API
- OpenStreetMap contributors
MIT License © 2026 Athithiyan M R