⚠️ DEVELOPMENT STATUS: This repository is under active development. Scripts, file paths, and workflows are subject to change. Please report issues or suggestions via GitHub Issues.
This repository implements a modular pipeline for tree crown segmentation using high-resolution aerial imagery and LiDAR data. It supports model training and evaluation with the DeepForest deep learning model and is designed to accommodate data from multiple spatial resolutions and sources, including NEON, NAIP, and MAXAR.
The workflow unfolds in five major phases:
- Imagery Acquisition & Processing - Curate high-resolution remote sensing data from NEON (10 cm), NAIP (30 cm & 60 cm), and MAXAR (in development).
- DeepForest Model Implementation - Run the out-of-the-box DeepForest model on various imagery types, then fine-tune it on custom training data
- Training Data Generation - Combine curated imagery with LiDAR data to extract individual tree crowns and generate bounding boxes for supervised learning
- DeepForest Model Fine-Tuning and Application - Apply both pretrained and fine-tuned models to different regions and imagery types
- Model Comparison & Evaluation (in development) - Quantitatively evaluate and compare the performance of off-the-shelf and fine-tuned models across spatial and resolution contexts
This pipeline requires multiple types of geospatial data. Below are details on each data source, including acquisition methods, formats, and storage considerations.
- Resolution: 10 cm (native), resampled to 30 cm and 60 cm for multi-resolution analysis
- Format: GeoTIFF (.tif)
- Acquisition: Downloaded via NEON Data Portal API
- Data Product: DP3.30010.001 (High-resolution orthorectified camera imagery mosaic)
- Scripts: `Imagery/NEON_image_download.R`, submitted via `Imagery/NEON_image_download.slurm`
- Resampling: Use `NEON_resample.sh` (submitted via `NEON_resample.slurm`) to resample from 10 cm to 30 cm and 60 cm using `gdalwarp`
- Coordinate System: UTM (zone varies by site; e.g., UTM Zone 19N for BART and HARV)
- Tile Structure: 1 km × 1 km tiles following NEON's naming convention
- Storage Requirements: ~500 GB per site for full resolution imagery
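As a quick sanity check on storage and processing expectations, the pixel dimensions of a single 1 km × 1 km tile at each working resolution follow directly from the tile size. (The exact `gdalwarp` output can differ by a pixel depending on the target extent and alignment flags.)

```python
# Pixel dimensions of a 1 km x 1 km NEON tile at each working resolution.
def tile_pixels(tile_m: float, res_m: float) -> int:
    """Number of pixels along one edge of a square tile."""
    return round(tile_m / res_m)

for res in (0.10, 0.30, 0.60):
    n = tile_pixels(1000, res)
    print(f"{res * 100:.0f} cm: {n} x {n} px")
```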
- Resolution: 30 cm and 60 cm
- Format: GeoTIFF (.tif)
- Acquisition: Downloaded from Google Earth Engine
- Processing:
- NAIP imagery does not align with NEON's 1 km × 1 km tile grid
- Use `Imagery/NAIP_retile.R` (submitted via `NAIP_retile.slurm`) to mosaic and crop NAIP tiles to match NEON tile boundaries
- Bands: RGB (3-band) or RGBN (4-band, includes Near-Infrared)
- Coordinate System: Must be reprojected to match NEON imagery (UTM)
- Storage Requirements: ~200 GB per site
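Matching NAIP to the NEON grid hinges on knowing each NEON tile's extent. A minimal sketch, assuming the standard NEON convention where the tile filename encodes the southwest-corner easting and northing in meters (the filename below is illustrative):

```python
import re

# Derive the 1 km tile bounding box from a NEON tile filename, assuming the
# NEON convention where the two numeric fields are the tile's southwest-corner
# easting (6 digits) and northing (7 digits) in meters (UTM).
def neon_tile_bounds(filename: str, tile_size: int = 1000):
    m = re.search(r"_(\d{6})_(\d{7})_", filename)
    if m is None:
        raise ValueError(f"no easting/northing found in {filename!r}")
    easting, northing = int(m.group(1)), int(m.group(2))
    return (easting, northing, easting + tile_size, northing + tile_size)

print(neon_tile_bounds("2019_BART_5_317000_4881000_image.tif"))
# -> (317000, 4881000, 318000, 4882000)
```

The returned bounds can then be handed to a cropping tool (e.g., `gdalwarp -te`) when clipping NAIP mosaics to a tile.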
LiDAR data is essential for training data generation, crown height extraction, and filtering detections below canopy height thresholds.
- Classified Point Cloud (LAS/LAZ)
- Data Product: DP1.30003.001
- Format: LAZ (compressed LAS)
- Use: Tree segmentation, canopy structure analysis
- Scripts: `LiDAR/LiDAR_download.R`, submitted via `LiDAR/LiDAR_download.slurm`
- Canopy Height Model (CHM)
- Data Product: DP3.30015.001
- Format: GeoTIFF (.tif)
- Resolution: 1 m
- Use: Filtering crown detections (removing detections where CHM < 3 m), extracting tree heights
- Digital Terrain Model (DTM)
- Data Product: DP3.30024.001
- Format: GeoTIFF (.tif)
- Use: Normalizing point cloud heights
- Coordinate System: Same UTM zone as imagery
- Storage Requirements: ~100 GB per site for all LiDAR products
- Resolution: ~30 cm (pan-sharpened RGB)
- Format: GeoTIFF (.tif) with XML metadata
- Acquisition: Obtained via institutional access or commercial license
- Processing:
- `Imagery/MAXAR_unzip_sort_files.sh` - Unzips and organizes data into Mono and RGB folders
- `Imagery/MAXAR_extract_bbox_fromXML.sh` and `MAXAR_make_bbox.R` - Extract bounding boxes from XML metadata to assess coverage
- Bands: RGB, RGB+NIR, RGB+NIR+SWIR (depending on product)
- Status: Coverage assessment in progress; full integration pending
- Format: Shapefiles (.shp) with bounding box polygons
- Creation: Manual annotation in QGIS using imagery and LiDAR products as reference
- Annotation Support:
  - `developTrainingData.R` assists with visualization and data preparation
  - Uses NEON field data, LiDAR-derived tree crowns (via `lidR` watershed segmentation), and existing crown annotations from Weinstein et al. (2019)
- Storage Location:
  - Training: `./Imagery/NAIP/Training/bbox/`
  - Testing: `./Imagery/NAIP/Testing/bbox/`
- Expected Format: Each shapefile contains polygons representing individual tree crowns with an optional `label` column (defaults to "Tree")
Expected directory structure for organizing data:
```
Crown_Segmentation/
├── Imagery/
│   ├── NEON/
│   │   └── DP3.30010.001/neon-aop-products/YYYY/FullSite/DXX/YYYY_SITE_X/L3/Camera/Mosaic/
│   │       └── YYYY_SITE_X_XXXXXX_image.tif
│   ├── NAIP/
│   │   ├── SITE/
│   │   │   ├── 30cm/match_NEON/
│   │   │   │   └── NAIP_30cm_SITE_X_XXXXXX.tif
│   │   │   └── 60cm/match_NEON/
│   │   │       └── NAIP_60cm_SITE_X_XXXXXX.tif
│   │   ├── Training/
│   │   │   ├── bbox/          # Manual annotations for training
│   │   │   └── Crop_Images/   # Cropped training images and annotations.csv
│   │   └── Testing/
│   │       └── bbox/          # Manual annotations for testing
│   └── MAXAR/
│       ├── RGB/
│       └── Mono/
├── LiDAR/
│   └── NEON/
│       └── SITE/
│           ├── DP1.30003.001/.../ClassifiedPointCloud/
│           ├── DP3.30015.001/.../CanopyHeightModelGtif/
│           └── DP3.30024.001/.../DTMGtif/
├── Outputs/
│   └── PRODUCT/
│       └── SITE/
│           └── [model output shapefiles]
└── Shapefiles/
    ├── LiDAR_Tiles.shp   # Reference tiles for processing
    └── SITE_AOP.shp      # AOP coverage extent
```
- NEON Data: Free and publicly available via NEON Data Portal
- NAIP Data: Free via Google Earth Engine or USDA NAIP
- MAXAR Data: Requires commercial license or institutional access
```bash
# Download imagery
Rscript Imagery/NEON_image_download.R
# Or submit as SLURM job
sbatch Imagery/NEON_image_download.slurm

# Resample to 30 cm and 60 cm
sbatch Imagery/NEON_resample.slurm
```
- Download via Google Earth Engine: https://code.earthengine.google.com/1b8ec0419479e1c448f0dbd275e1a8af
- Retile to match NEON grid:
```bash
Rscript Imagery/NAIP_retile.R
# Or submit as SLURM job
sbatch Imagery/NAIP_retile.slurm
```
```bash
# Unzip and organize files
bash Imagery/MAXAR_unzip_sort_files.sh
# Extract bounding boxes for coverage assessment
bash Imagery/MAXAR_bbox_run.sh
```
Run the pretrained DeepForest model on different imagery types to establish baseline performance.
Script naming convention: `<Site>_<spatialResolution>_<dataSource>_prebuilt.py`
Examples:
- `BART_10cm_prebuilt.py` - Validate model on NEON 10 cm imagery (model training resolution)
- `BART_30cm_prebuilt.py` - Test model on resampled NEON 30 cm imagery
- `BART_30cm_NAIP_prebuilt.py` - Test model on NAIP 30 cm imagery
- `BART_60cm_NAIP_prebuilt.py` - Test model on NAIP 60 cm imagery
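One reason baseline performance shifts across these scripts is simply how many pixels a crown occupies at each resolution. A back-of-the-envelope illustration, assuming a representative 5 m crown diameter (a placeholder value, not from the repository):

```python
# Approximate on-image size of a tree crown at each working resolution,
# assuming an illustrative 5 m crown diameter.
crown_m = 5.0
for res_m in (0.10, 0.30, 0.60):
    px = crown_m / res_m
    print(f"{res_m * 100:.0f} cm imagery: ~{px:.0f} px across")
```

A crown spanning roughly 50 px at 10 cm shrinks to under 10 px at 60 cm, which is why the pretrained model (trained near 10 cm) is validated at its native resolution before being tested on coarser imagery.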
Submission:
```bash
sbatch DeepForest.sh
```
Manual annotation of tree crowns in QGIS is supported by automated data preparation and visualization.
Script: developTrainingData.R
Workflow:
- Load NEON and NAIP imagery at multiple resolutions
- Load LiDAR products (LAS point cloud, CHM, DTM)
- Perform automated tree crown segmentation using `lidR::watershed()` for visual reference
- Load existing tree crown annotations (e.g., Weinstein et al. 2019)
- Visualize field data (growth form, canopy position) overlaid on imagery
- Manually annotate tree crowns in QGIS as bounding box polygons
- Convert annotations to DeepForest training format (CSV with image-relative coordinates)
Key Functions:
- `shapefile_to_annotations()` - Converts QGIS shapefiles to DeepForest annotation CSV
- Output: `./Imagery/NAIP/Training/Crop_Images/annotations.csv`
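The core coordinate conversion behind `shapefile_to_annotations()` can be sketched in plain Python: DeepForest annotation CSVs use pixel coordinates with the origin at the image's top-left, so the y axis flips relative to map (UTM) coordinates. This is a simplified sketch assuming a north-up raster, not the function's actual implementation:

```python
# Convert a UTM bounding box to image-relative pixel coordinates
# (xmin, ymin, xmax, ymax with origin at the image's top-left corner).
def geo_to_pixel_box(box, origin_x, origin_y, res):
    """box = (xmin, ymin, xmax, ymax) in map units; origin = raster top-left."""
    xmin, ymin, xmax, ymax = box
    return (
        round((xmin - origin_x) / res),   # pixel xmin
        round((origin_y - ymax) / res),   # pixel ymin (y axis flips)
        round((xmax - origin_x) / res),   # pixel xmax
        round((origin_y - ymin) / res),   # pixel ymax
    )

# A crown spanning easting 317010-317014, northing 4881990-4881995 on a
# 30 cm tile whose top-left corner is (317000, 4882000):
print(geo_to_pixel_box((317010, 4881990, 317014, 4881995), 317000, 4882000, 0.30))
# -> (33, 17, 47, 33)
```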
LiDAR Download:
```bash
Rscript LiDAR/LiDAR_download.R
# Or submit as SLURM job
sbatch LiDAR/LiDAR_download.slurm
```
Script: `BART_30cm_NAIP_TrainModel.py`
Trains a DeepForest model on custom annotations from BART (New Hampshire) for geographic generalization testing on HARV (Massachusetts).
Scripts:
- `BART_30cm_NAIP_Trained.py` - Apply trained model to BART site
- `HARV_30cm_NAIP_Trained.py` - Apply trained model to HARV site
Key Features:
- Runs inference with configurable patch size and overlap
- Filters detections using CHM (removes detections where canopy height < 3 m)
- Outputs shapefiles with tree crown bounding boxes
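The CHM filtering step can be sketched in plain Python. This assumes each detection already carries a canopy height sampled from the 1 m CHM at the box location; the field name `chm_height` is illustrative:

```python
# Drop detections whose canopy height falls below the 3 m threshold.
MIN_CANOPY_HEIGHT_M = 3.0

def filter_by_chm(detections, min_height=MIN_CANOPY_HEIGHT_M):
    """Keep only detections at or above the canopy height threshold."""
    return [d for d in detections if d["chm_height"] >= min_height]

detections = [
    {"score": 0.91, "chm_height": 12.4},   # canopy tree, kept
    {"score": 0.45, "chm_height": 1.8},    # shrub/noise, removed
    {"score": 0.77, "chm_height": 3.0},    # exactly at threshold, kept
]
print(len(filter_by_chm(detections)))  # -> 2
```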
Output Format: `{PRODUCT}{resolution}cm_trained_model_p{PATCH}_o{OVERLAP}_t005_f{field_of_view}_{SITE}_{TILE}.shp`
Example: `NAIP30cm_trained_model_p400_o050_t005_f120_BART_123456.shp`
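A hedged sketch of decoding the parameter fields back out of such a filename. The mapping of numeric fields to parameters follows the pattern above, but the exact interpretation (e.g., of `o050` as overlap) should be confirmed against the scripts:

```python
import re

# Decode the parameter fields embedded in an output shapefile name.
PATTERN = re.compile(
    r"(?P<product>[A-Za-z]+)(?P<res>\d+)cm_trained_model"
    r"_p(?P<patch>\d+)_o(?P<overlap>\d+)_t(?P<thresh>\d+)_f(?P<fov>\d+)"
    r"_(?P<site>[A-Z]+)_(?P<tile>\d+)\.shp"
)

m = PATTERN.match("NAIP30cm_trained_model_p400_o050_t005_f120_BART_123456.shp")
print(m.group("product"), m.group("res"), m.group("patch"), m.group("site"), m.group("tile"))
```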
Model evaluation is performed using DeepForest's built-in evaluation framework, comparing predictions against manually annotated ground truth data from geographically isolated test sites.
Script: HARV_30cm_NAIP_Trained_Evaluate.py
This script evaluates models trained on BART (New Hampshire) data against test annotations from HARV (Massachusetts) to assess geographic generalization.
Key Features:
- Loads trained model weights from `./TrainedModel`
- Runs inference on test images from `./Imagery/NAIP/Testing/Crop_Images/`
- Computes evaluation metrics using DeepForest's `evaluate.evaluate_boxes()` function
- Generates visualization plots comparing predictions vs ground truth
- Exports detailed results to CSV
Evaluation Metrics:
- Intersection over Union (IoU) - Spatial overlap between predicted and ground truth bounding boxes
- Mean Average Precision (mAP) - Detection accuracy across confidence thresholds (in development)
- Precision - Proportion of correct detections among all predictions (True Positives / All Predictions)
- Recall - Proportion of ground truth trees successfully detected (True Positives / All Ground Truth)
- True Positive Count - Number of correctly detected trees
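The precision/recall accounting above reduces to box IoU. A self-contained sketch for axis-aligned boxes, using a 0.4 IoU threshold and greedy matching as illustrative simplifications rather than DeepForest's exact procedure:

```python
# Intersection over Union for axis-aligned boxes (xmin, ymin, xmax, ymax).
def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))   # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))   # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A prediction counts as a true positive when it overlaps some ground-truth
# box at IoU >= 0.4 (greedy matching; no one-to-one assignment enforced).
preds = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
truth = [(1, 1, 11, 11), (21, 19, 31, 29)]
tp = sum(any(iou(p, t) >= 0.4 for t in truth) for p in preds)
precision = tp / len(preds)   # TP / all predictions
recall = tp / len(truth)      # TP / all ground truth
print(tp, precision, recall)
```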
Outputs:
- `HARV_Eval_Detections.csv` - Detailed detection results per tree
- `HARV_Eval_Figures/` - Visualization plots showing predictions (pink/salmon) overlaid on ground truth (white) for each test image
DeepForest model outputs are shapefiles containing tree crown bounding boxes:
Attributes:
- `geometry` - Bounding box polygon (UTM coordinates)
- `score` - Model confidence score (0-1)
- `label` - Tree label (default: "Tree")
Location: `./Outputs/{PRODUCT}/{SITE}/`
Crown segmentation outputs from this repository feed into the ScalingAcrossResolutions repository for size-abundance analysis. The downstream workflow:
- Crown Metrics Calculation (`GenerateDatasetsIndv.R`)
  - Assigns crowns to 1-hectare grid cells
  - Calculates crown area, perimeter, and diameter
  - Extracts tree height from CHM
  - Estimates diameter at breast height (DBH) using allometric equations
- Exported Data Format: CSV files with columns:
  - `crown_id` - Unique crown identifier
  - `grid_id` - 1-hectare grid cell assignment
  - `Area` - Crown area (m²)
  - `Perimeter` - Crown perimeter (m)
  - `Diameter` - Crown diameter (m)
  - `Max_Height` - Maximum tree height from CHM (m)
  - `DBH` - Estimated diameter at breast height (cm)
- Output Location: `../ScalingAcrossResolutions/CrownDatasets/`
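The derived columns can be illustrated with a circular-equivalent crown diameter and a generic power-law height allometry. The allometric coefficients below are placeholders, not the values used in `GenerateDatasetsIndv.R`:

```python
import math

# Circular-equivalent crown diameter from crown area: the diameter of a
# circle with the same area, one common way to derive a Diameter column.
def crown_diameter(area_m2: float) -> float:
    return 2.0 * math.sqrt(area_m2 / math.pi)

# Generic power-law allometry DBH = a * H^b; a and b are illustrative
# placeholders and would be fitted or taken from the literature per species.
def dbh_from_height(height_m: float, a: float = 1.3, b: float = 1.1) -> float:
    return a * height_m ** b   # cm, under the placeholder coefficients

print(round(crown_diameter(25.0), 2))  # -> 5.64
```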
These tree-level metrics enable Bayesian size-abundance modeling to recover complete forest size distributions from remotely sensed data, addressing canopy occlusion biases. See the ScalingAcrossResolutions repository for details on size-abundance parameter recovery.
- Script: `TreeAnnotation.R`
- Submission: `TreeAnnotation.sh`
- Purpose: Process tree crown annotations from Weinstein et al. (2019) for use as reference data
- Python 3.8+
- `deepforest` - Deep learning tree crown detection
- `rasterio` - Geospatial raster I/O
- `geopandas` - Vector geospatial operations
- `torch` - PyTorch deep learning framework
- `numpy`, `pandas` - Data manipulation
- `scikit-learn` - Machine learning utilities
- `Pillow` - Image processing
- `sf` - Simple features for vector data
- `raster`, `terra` - Raster data processing
- `rgdal` - Geospatial data abstraction
- `lidR` - LiDAR data processing
- `itcSegment` - Individual tree crown segmentation
- `neonUtilities` - NEON data access
- `geoNEON` - NEON geolocation utilities
- SLURM scheduler - For HPC job submission
- GDAL - Geospatial data abstraction library
- PROJ - Cartographic projections library
- DeepForest: Weinstein, B.G., et al. (2019). Individual tree-crown detection in RGB imagery using semi-supervised deep learning neural networks. Remote Sensing, 11(11), 1309.
- NEON: National Ecological Observatory Network. https://www.neonscience.org/