Agribound Examples

This directory contains example scripts and Jupyter notebooks demonstrating agribound's capabilities across different continents, satellite sources, and delineation engines.

Prerequisites

Install agribound with the required extras for the example you want to run:

conda create -n agribound python=3.12 rasterio geopandas fiona shapely pyproj -c conda-forge
conda activate agribound
pip install -e ".[all]"

For GEE-based examples (all except 10_local_tif_quickstart.py and 16_usa_usgs_naip_plus.py), authenticate with Google Earth Engine:

gcloud config set project YOUR_GEE_PROJECT
earthengine authenticate
agribound auth --project YOUR_GEE_PROJECT

See the GEE Setup guide for details.

Running an Example

Python Scripts

All GEE-based examples require a --gee-project argument:

python examples/01_new_mexico_landsat_timeseries.py --gee-project YOUR_GEE_PROJECT

The local TIF and USGS NAIP Plus examples do not require GEE:

python examples/10_local_tif_quickstart.py
python examples/16_usa_usgs_naip_plus.py

Jupyter Notebooks

Interactive notebook versions are available in the notebooks/ directory. Set the GEE_PROJECT variable in the first code cell of each notebook before running:

cd examples/notebooks
jupyter lab

Outputs (GeoPackage files and HTML maps) are saved to outputs/<example_name>/.

Scripts

#	Script	Region	Satellite	Engine	Est. Runtime	Description
01	`01_new_mexico_landsat_timeseries.py`	New Mexico, USA	Landsat 5--9	delineate-anything	~8--12 h	40-year annual field boundaries (1985--2025). Fine-tunes on NMOSE reference boundaries and evaluates per-year accuracy. Best run on HPC/cloud with GPU.
02	`02_india_ganges_sentinel2.py`	Nadia District (West Bengal), India	Sentinel-2 + Google + TESSERA + SPOT Pan	ftw + embedding + DA	~15--30 min	Compares FTW (supervised, S2), Google AlphaEarth (64-D) and TESSERA (128-D) embeddings (unsupervised), and SPOT panchromatic (1.5 m, restricted) for smallholder rice field delineation (2024).
03	`03_australia_murray_darling_hls.py`	Murray-Darling Basin, Australia	HLS	prithvi	~45--90 min	Compares Prithvi ViT embeddings (full encoder) vs PCA baseline on large-scale irrigated agriculture. Runs 2022--2024.
04	`04_france_beauce_sentinel2.py`	Beauce, France	Sentinel-2	ftw	~15--30 min	European large-field agriculture using FTW's pre-trained models (covers France). Single year (2023).
05	`05_pampas_embeddings.py`	Argentine Pampas (Pergamino)	Google + TESSERA	embedding	~5--10 min	CPU-only unsupervised clustering from pre-computed satellite embeddings (64-D Google, 128-D TESSERA). ~50 km bbox over the Pampas agricultural heartland (2020).
06	`06_kenya_smallholder_ftw.py`	Central Kenya	Sentinel-2	ftw	~10--20 min	Demonstrates `min_field_area` tuning for smallholder agriculture. Compares results at 100, 500, 1000, and 2500 m² thresholds.
07	`07_usa_naip_high_res.py`	Central Valley, California, USA	NAIP	delineate-anything	~20--40 min	1 m resolution field extraction from NAIP imagery. Large commercial fields.
08	`08_china_north_plain_spot.py`	North China Plain	SPOT 6/7	delineate-anything	~15--30 min	6 m resolution SPOT imagery. Restricted access -- see note below.
09	`09_ensemble_comparison.py`	Andalusia, Spain	Sentinel-2	ensemble	~30--60 min	Runs delineate-anything and FTW on the same AOI, then vote-merges for ensemble consensus. Visualizes per-engine and consensus results.
10	`10_local_tif_quickstart.py`	User-provided	Local GeoTIFF	delineate-anything	~2--5 min	Minimal 5-line quickstart using a local file. No GEE required. Edit `LOCAL_TIF` and `STUDY_AREA` paths before running.
11	`11_mississippi_alluvial_plain_spot.py`	Mississippi Alluvial Plain, USA	SPOT 6/7	delineate-anything	~15--30 min	SPOT-based delineation of row-crop agriculture (2021--2023). Includes cross-year stability analysis using IoU/F1. Restricted access -- see note below.
12	`12_new_mexico_ensemble_timeseries.py`	Eastern Lea County, NM, USA	All (Sentinel-2, Landsat, HLS, NAIP, SPOT, Google & TESSERA embeddings)	All (per-source ensemble)	~3--6 h	Multi-model per-source ensemble (2024) over ~20 km center pivot area. Runs all engines per sensor and vote-merges within each source (not across sensors). SAM2 refines each per-source ensemble. Best run on HPC/cloud with GPU.
13	`13_sam2_refine_dinov3.py`	Lea County, NM, USA	Sentinel-2	SAM2 refinement	~5--15 min	Standalone SAM2 boundary refinement on pre-computed DINOv3 field boundaries (555 fields). Crops each field from the raster and refines with SAM2 box prompts. Compares before/after metrics against NMOSE reference.
14	`14_dinov3_sam2_ensemble.py`	Eastern Lea County, NM, USA	Sentinel-2, Landsat, HLS, NAIP, SPOT	DINOv3 + SAM2	~1--2 h	Runs DINOv3 (SAT-493M) across 5 satellite sources with per-source SAM2 refinement. Compares per-source results against NMOSE reference boundaries. Uses a ~20 km bbox over the center pivot area to keep NAIP/SPOT runtimes practical.
15	`15_pampas_semi_supervised.py`	Pampas (Pergamino), Argentina	Google + TESSERA embeddings + Dynamic World + Sentinel-2	Embedding + SAM2 (no training)	~15--30 min	Fully automated pipeline requiring no reference boundaries or training. Clusters Google (64-D) and TESSERA (128-D) embeddings, LULC-filters to crops, then refines with SAM2 using both S2 and TESSERA native bands. Includes improved SAM2 with geometry fixes, polygon exploding, and large-field separation. TESSERA produces more accurate boundaries than Google (see Embedding Comparison). GPU recommended.
16	`16_usa_usgs_naip_plus.py`	Central Valley, California, USA	USGS NAIP Plus ImageServer	delineate-anything	~30--60 min	First community contribution! High-resolution field extraction using the non-GEE `usgs-naip-plus` source -- the same NAIP imagery available on GEE but acquired directly from the USGS USGSNAIPPlus ImageServer. No GEE authentication required. Contributed by Jeremy Rapp (Michigan State University).

Notebooks

Interactive Jupyter notebook versions of each example are in the notebooks/ directory. These are designed for step-by-step exploration with inline map visualization. Set GEE_PROJECT in the first code cell before running.

#	Notebook	Description	Key Difference from Script
01	`01_new_mexico_landsat_timeseries.ipynb`	New Mexico Landsat time series with fine-tuning	Runs 2023--2025 (3 years) instead of the full 40-year range, suitable for interactive use
02	`02_india_ganges_sentinel2.ipynb`	India Nadia District (West Bengal): FTW vs Google vs TESSERA vs SPOT Pan	Same scope as script
03	`03_australia_murray_darling_hls.ipynb`	Australia Murray-Darling Basin: Prithvi ViT vs PCA (HLS)	Same scope as script
04	`04_france_beauce_sentinel2.ipynb`	France Beauce region (FTW)	Same scope as script
05	`05_pampas_embeddings.ipynb`	Pampas embeddings (CPU-only, Google + TESSERA)	Same scope as script
06	`06_kenya_smallholder_ftw.ipynb`	Kenya smallholder `min_field_area` tuning (FTW)	Same scope as script
07	`07_usa_naip_high_res.ipynb`	USA Central Valley NAIP 1 m (Delineate-Anything)	Same scope as script
08	`08_china_north_plain_spot.ipynb`	China North Plain SPOT 6/7 (restricted)	Same scope as script
09	`09_ensemble_comparison.ipynb`	Ensemble multi-engine comparison (Andalusia)	Same scope as script
10	`10_local_tif_quickstart.ipynb`	Local GeoTIFF quickstart (no GEE)	Same scope as script
11	`11_mississippi_alluvial_plain_spot.ipynb`	Mississippi Alluvial Plain SPOT 6/7 (restricted)	Same scope as script
12	`12_new_mexico_ensemble_timeseries.ipynb`	Lea County multi-source grand ensemble (2020--2022)	Same scope as script
13	`13_sam2_refine_dinov3.ipynb`	SAM2 boundary refinement on DINOv3 output	Same scope as script
14	`14_dinov3_sam2_ensemble.ipynb`	DINOv3 + SAM2 multi-source comparison (Eastern Lea County)	Runs single year (2022) instead of 2020--2022
15	`15_pampas_semi_supervised.ipynb`	Embedding + SAM2 (Pampas, no training required)	Same scope as script
16	`16_usa_usgs_naip_plus.ipynb`	USA Central Valley USGS NAIP Plus -- same NAIP data as GEE, from USGS ImageServer (no GEE, contributed by Jeremy Rapp)	Same scope as script

Runtime Notes

Estimated runtimes assume a single NVIDIA GPU (e.g., A100/V100) and moderate internet speed for GEE downloads.
GEE composite generation adds ~2--5 minutes per year per source.
CPU-only runs (example 05, embedding engine) are 2--5x slower for inference but have no GPU requirement.
Fine-tuning (examples 01, 12) takes ~30 minutes per model on an Apple M2 Max (MPS). In example 12, DA (2 variants) and GeoAI/Prithvi are fine-tuned on NMOSE reference boundaries (~1.5 hours total). FTW uses pre-trained weights directly (fine-tuning not yet supported — FTW requires paired temporal windows). Fine-tuned checkpoints are cached and reused across years.
SAM2 boundary refinement (example 12) runs once on the final grand ensemble output per year. Example 14 runs SAM2 per source using each sensor's native raster for accurate per-field segmentation. With the large model and per-field cropping, refinement takes ~2--5 minutes per source per year depending on field count.
NAIP and SPOT over large areas: Pixel count grows with the square of the resolution ratio. For the same study area, NAIP (1 m) produces about 100x as many pixels as Sentinel-2 (10 m) and about 900x as many as Landsat (30 m); SPOT (6 m) produces roughly 3x and 25x as many, respectively. Inference on these high-resolution sources over county-scale or larger areas can take hours even on GPU. Consider subsetting the study area or using tile_size to process in chunks. Fine-tuning on NAIP/SPOT is also significantly slower due to the larger training chips.
Apple Silicon (MPS): The GeoAI engine (Mask R-CNN) crashes on MPS due to Metal command buffer errors. Agribound automatically falls back to CPU for GeoAI training and inference. All other engines (FTW, Delineate-Anything, Prithvi) work correctly on MPS.
GeoAI requires fine-tuning: Without fine-tuning on region-specific reference boundaries, GeoAI's Mask R-CNN typically does not delineate any fields. For out-of-the-box delineation without reference data, use FTW (pre-trained models for 25 countries) or Delineate-Anything (resolution-agnostic).
The 40-year New Mexico script (01) is best run as an overnight batch job or on HPC. The notebook version runs only 2023--2025.
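
The pixel-count comparison in the NAIP and SPOT note above follows from simple arithmetic: the ratio scales with the square of the resolution ratio. The sketch below is only a back-of-the-envelope check using the nominal resolutions quoted in this README; the dictionary and function names are illustrative and not part of agribound.

```python
# Nominal ground sample distances in metres, as quoted in this README.
RESOLUTION_M = {"NAIP": 1.0, "SPOT": 6.0, "Sentinel-2": 10.0, "Landsat": 30.0}


def pixel_count_ratio(fine: str, coarse: str) -> float:
    """How many times more pixels `fine` needs than `coarse` for the same area."""
    return (RESOLUTION_M[coarse] / RESOLUTION_M[fine]) ** 2


for fine in ("NAIP", "SPOT"):
    for coarse in ("Sentinel-2", "Landsat"):
        print(f"{fine} vs {coarse}: {pixel_count_ratio(fine, coarse):.1f}x")
```

Actual raster sizes also depend on band count, data type, and tiling, so treat these ratios as a rough guide when planning runtime and disk space.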

LULC Crop Filtering

Agribound automatically filters detected field boundaries to agricultural areas using land-use/land-cover (LULC) data. This is enabled by default (lulc_filter=True) and removes non-agricultural polygons (roads, water, forest, urban areas, etc.) from the output.

The appropriate LULC dataset is selected automatically based on the study area location and target year:

Region	Dataset	Years	Resolution	Crop Classes
CONUS	USGS Annual NLCD	1985–2024 (nearest year)	30 m	81 (Pasture/Hay), 82 (Cultivated Crops)
Global, ≥2015	Google Dynamic World	2015–present (nearest year)	10 m	`crops` probability band
Global, <2015	Copernicus C3S Land Cover	1992–2022 (nearest year)	300 m	10, 20, 30 (Cropland classes)
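
The selection rule in the table can be read as a small decision function. The sketch below is one possible reading, not agribound's actual code: `select_lulc_dataset` and `latest_dw_year` are hypothetical names, "nearest year" is assumed to mean clamping the target year into the dataset's available range, and the CONUS test is reduced to a boolean flag.

```python
def select_lulc_dataset(in_conus: bool, year: int, latest_dw_year: int = 2025):
    """Return (dataset name, year actually used), following the table above.

    `latest_dw_year` stands in for "present" and is an assumed placeholder.
    """
    if in_conus:
        name, first, last = "USGS Annual NLCD", 1985, 2024
    elif year >= 2015:
        name, first, last = "Google Dynamic World", 2015, latest_dw_year
    else:
        name, first, last = "Copernicus C3S Land Cover", 1992, 2022
    # Assumed meaning of "nearest year": clamp into the available range.
    nearest = min(max(year, first), last)
    return name, nearest


print(select_lulc_dataset(in_conus=True, year=2025))
print(select_lulc_dataset(in_conus=False, year=1990))
```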

Configuration:

lulc_filter=True (default) — enable crop filtering
lulc_filter=False — disable (used for local files without GEE, or unsupervised embedding clusters)
lulc_crop_threshold=0.3 (default) — minimum fraction of crop pixels to keep a polygon
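
To make the threshold concrete, here is a minimal sketch of a crop-fraction test for a single polygon, assuming the LULC pixels inside the polygon have already been extracted as a flat list of class codes. The helper names are hypothetical, the NLCD crop classes are taken from the table above, and whether agribound compares with `>=` or `>` is an assumption.

```python
from collections import Counter

# Pasture/Hay and Cultivated Crops, from the NLCD row of the table above.
NLCD_CROP_CLASSES = {81, 82}


def crop_fraction(pixel_classes, crop_classes=NLCD_CROP_CLASSES):
    """Fraction of LULC pixels inside a polygon that belong to a crop class."""
    counts = Counter(pixel_classes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[c] for c in crop_classes) / total


def keep_polygon(pixel_classes, threshold=0.3):
    """Keep the polygon when its crop fraction reaches the threshold (assumed >=)."""
    return crop_fraction(pixel_classes) >= threshold


print(crop_fraction([82, 82, 41, 11]))      # half of the pixels are cropland
print(keep_polygon([82, 41, 41, 41, 41]))   # one crop pixel out of five
```

Lowering the threshold keeps more mixed-cover polygons; raising it keeps only polygons dominated by cropland.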

Disabled by default for:

Example 05 (unsupervised embedding clusters — no semantic meaning)
Example 10 (local GeoTIFF — no GEE access)
Example 16 (USGS NAIP Plus — purely non-GEE workflow)

SPOT Access

Examples 08 and 11 use SPOT 6/7 imagery, which is restricted to select GEE users under a data-sharing agreement. This source is primarily for internal DRI use. If you receive an access error, contact the agribound author (sayantan.majumdar@dri.edu) to request field boundary processing for your study area.

When to Use Ensembles

Ensembles work best when multiple models are run on the same sensor data. Each model architecture (DA, FTW, GeoAI, DINOv3, Prithvi) has different biases — vote-merging across models cancels out individual errors because every model sees the same pixels but interprets them differently.

Ensembles across different sensors (e.g., Sentinel-2 + Landsat + NAIP) do not work well because:

Resolution mismatch — a 1 m NAIP polygon and a 30 m Landsat polygon for the same field have different shapes, producing poor vote overlap
Temporal mismatch — each sensor captures different dates, so field states (bare vs cropped) may differ
Spatial alignment — sub-pixel registration errors between sensors create artificial disagreements at boundaries

For multi-sensor analysis, compare per-source results independently (example 14) rather than merging them. The multi-model ensemble (example 12) runs all engines on the same eastern Lea County area for this reason.
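
The idea behind vote-merging can be illustrated on rasterized masks. This is a toy sketch under stated assumptions, not agribound's implementation: the masks are hand-written 3x3 grids standing in for three engines run on the same sensor, and `majority_vote` is a hypothetical helper.

```python
def majority_vote(masks, min_votes=None):
    """Per-pixel vote merge of equally sized binary masks (lists of rows)."""
    if min_votes is None:
        min_votes = len(masks) // 2 + 1  # strict majority
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [int(sum(m[r][c] for m in masks) >= min_votes) for c in range(cols)]
        for r in range(rows)
    ]


# Three engines looking at the same pixels, each with a different error.
da = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
ftw = [[1, 1, 1], [1, 1, 0], [0, 0, 0]]    # one false positive, top right
geoai = [[0, 1, 0], [1, 1, 0], [0, 1, 0]]  # one miss, one false positive

consensus = majority_vote([da, ftw, geoai])
for row in consensus:
    print(row)
```

Because all three masks share one pixel grid, each isolated error is outvoted. With masks from different sensors, the grids and boundary positions differ, so the votes stop lining up at field edges.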

Recommended Approaches

With reference boundaries: DINOv3 + SAM2 per source (example 14). DINOv3's SAT-493M backbone fine-tunes well on each sensor with just 10--30 epochs.
Without reference boundaries: Embedding clustering + LULC filter + SAM2 (example 15). TESSERA embeddings produce more accurate boundaries than Google (see below). No training required.
Multi-model ensemble: Example 12 runs all engines on the same sensor and merges via majority vote. Best accuracy but slowest.

Embedding Comparison: Google vs TESSERA (Example 15)

Testing over the Argentine Pampas shows that TESSERA embeddings produce more accurate field boundaries than Google AlphaEarth Embeddings when used with the automated pipeline (embedding clustering + LULC filter + SAM2). The two embedding products differ fundamentally in architecture and input data:

TESSERA (Feng et al., 2025) is a pixel-wise foundation model trained on multi-modal Sentinel-1/2 time series using Barlow Twins self-supervision. It processes "d-pixels" — full temporal sequences of all spectral bands (S2) and SAR backscatter (S1) at each pixel — learning 128-D embeddings that are invariant to cloud-induced temporal gaps. Because it encodes the complete phenological trajectory (planting, growth, senescence) rather than a single composite, adjacent fields with different crop types, planting dates, or irrigation schedules produce distinct embeddings even when they appear spectrally similar in any single image.
Google AlphaEarth (Brown et al., 2025) uses a video summarization architecture with a "Space Time Precision" encoder that assimilates multiple EO sources into 64-D annual embeddings on the unit sphere S^63. While it also incorporates temporal information through its support/valid period design, the released annual embedding fields are static temporal summaries that compress a full year into a single vector. The architecture prioritizes generality across diverse mapping tasks (land cover, biomass, evapotranspiration) rather than fine-grained agricultural phenology.
Why TESSERA produces better field boundaries: TESSERA's explicit modeling of temporal sampling invariance — training on random 40-observation subsets from the annual S1/S2 time series — makes it particularly sensitive to within-season crop dynamics. Two soybean fields planted two weeks apart produce different temporal profiles that TESSERA preserves in its embeddings. Google's annual summary tends to average out these intra-seasonal differences, causing adjacent fields with similar average reflectance to merge into single clusters.
Trade-offs: Google AlphaEarth has global coverage for 2017--2024 and is available as a GEE ImageCollection. TESSERA coverage varies by region/year (2017--2025) and requires the geotessera library for tile download and mosaicking.

For new study areas without reference boundaries, we recommend the example 15 pipeline with TESSERA embeddings where available, falling back to Google embeddings for global coverage.
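
The point about planting dates can be illustrated with a deliberately simple toy. Neither TESSERA nor AlphaEarth works this way; the Gaussian "greenness" curves and all names below are assumptions chosen only to show how an annual average can hide a two-week timing offset that the full time series retains.

```python
import math


def greenness_profile(peak_day, sigma=30.0, days=range(365)):
    """Toy seasonal curve: a Gaussian bump centred on `peak_day`."""
    return [math.exp(-0.5 * ((d - peak_day) / sigma) ** 2) for d in days]


field_a = greenness_profile(peak_day=180)
field_b = greenness_profile(peak_day=194)  # same crop, two weeks later

# Annual means are almost identical...
mean_gap = abs(sum(field_a) / 365 - sum(field_b) / 365)
# ...while the day-by-day profiles are clearly different.
profile_gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(field_a, field_b)))

print(f"difference in annual mean: {mean_gap:.2e}")
print(f"distance between profiles: {profile_gap:.2f}")
```

In this toy, a clustering step that only saw the annual means would have no basis for separating the two fields, while one that saw the full profiles would.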

NMOSE Reference Data

Examples 01, 12, 13, and 14 use NMOSE (New Mexico Office of the State Engineer) WUCB agricultural polygon boundaries for fine-tuning and/or evaluation. Examples 12 and 14 filter to eastern Lea County (County 25). Example 13 uses pre-computed DINOv3 boundaries from Lea County for standalone SAM2 refinement. The NMOSE shapefile is not included in the public repository — contact the agribound author (sayantan.majumdar@dri.edu) for access.
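
This README mentions IoU and F1 for comparing boundaries but does not define how agribound computes them. As a reference point, the sketch below shows the standard definitions on rasterized fields, assuming each field is represented as a set of pixel coordinates and that the number of matched fields has been determined separately. The function names are illustrative.

```python
def iou(pred: set, ref: set) -> float:
    """Intersection over union of two fields given as sets of (row, col) pixels."""
    union = pred | ref
    return len(pred & ref) / len(union) if union else 0.0


def f1_score(n_matched: int, n_pred: int, n_ref: int) -> float:
    """F1 from counts of matched, predicted, and reference fields."""
    precision = n_matched / n_pred if n_pred else 0.0
    recall = n_matched / n_ref if n_ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# A 4x4 predicted field shifted one row against a 4x4 reference field.
predicted = {(r, c) for r in range(0, 4) for c in range(4)}
reference = {(r, c) for r in range(1, 5) for c in range(4)}

print(iou(predicted, reference))
print(f1_score(n_matched=8, n_pred=10, n_ref=12))
```

How a predicted field is counted as "matched" (for example, an IoU cutoff) is a separate choice that this sketch leaves open.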

Output Structure

Each example creates an output directory under outputs/:

outputs/
├── new_mexico_timeseries/
│   ├── fields_landsat_1985.gpkg
│   ├── fields_landsat_1986.gpkg
│   ├── ...
│   ├── map_predicted_vs_reference.html
│   ├── map_timeseries_comparison.html
│   └── map_latest.html
├── india_nadia/
│   ├── fields_ftw_s2_2024.gpkg
│   ├── fields_google_2024.gpkg
│   ├── fields_tessera_2024.gpkg
│   ├── fields_spot_pan_2023.gpkg
│   └── map_ftw_vs_tessera.html
└── ...

.gpkg files contain field boundary polygons with area, perimeter, and provenance metadata.
.html files are standalone interactive maps (open in any browser) showing field boundaries overlaid on satellite basemaps.
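
For batch post-processing, it can help to index an output directory by source and year. The sketch below assumes the `fields_<source>_<year>.gpkg` naming pattern seen in the tree above holds for every example; `index_outputs` is a hypothetical helper, not an agribound function, and the demo runs against a throwaway directory rather than real outputs.

```python
import re
import tempfile
from pathlib import Path

# Assumed naming pattern, inferred from the tree above.
FIELD_FILE = re.compile(r"fields_(?P<source>.+)_(?P<year>\d{4})\.gpkg$")


def index_outputs(output_dir):
    """Map (source, year) -> path for every fields_<source>_<year>.gpkg file."""
    index = {}
    for path in sorted(Path(output_dir).glob("fields_*.gpkg")):
        match = FIELD_FILE.match(path.name)
        if match:
            index[(match["source"], int(match["year"]))] = path
    return index


# Demo against a temporary directory that mimics the tree above.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("fields_landsat_1985.gpkg", "fields_ftw_s2_2024.gpkg", "map_latest.html"):
        (Path(tmp) / name).touch()
    demo_index = index_outputs(tmp)

print(sorted(demo_index))
```

Each indexed path can then be opened with `geopandas.read_file(path)` to inspect the polygons and their attributes.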
