This directory contains example scripts and Jupyter notebooks demonstrating agribound's capabilities across different continents, satellite sources, and delineation engines.
- Install agribound with the required extras for the example you want to run:

      conda create -n agribound python=3.12 rasterio geopandas fiona shapely pyproj -c conda-forge
      conda activate agribound
      pip install -e ".[all]"

- For GEE-based examples (all except `10_local_tif_quickstart.py` and `16_usa_usgs_naip_plus.py`), authenticate with Google Earth Engine:

      gcloud config set project YOUR_GEE_PROJECT
      earthengine authenticate
      agribound auth --project YOUR_GEE_PROJECT

  See the GEE Setup guide for details.
All GEE-based examples require a `--gee-project` argument:

    python examples/01_new_mexico_landsat_timeseries.py --gee-project YOUR_GEE_PROJECT

The local TIF and USGS NAIP Plus examples do not require GEE:

    python examples/10_local_tif_quickstart.py
    python examples/16_usa_usgs_naip_plus.py

Interactive notebook versions are available in the `notebooks/` directory. Set the `GEE_PROJECT` variable in the first code cell of each notebook before running:

    cd examples/notebooks
    jupyter lab

Outputs (GeoPackage files and HTML maps) are saved to `outputs/<example_name>/`.
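For readers adapting these scripts, a `--gee-project` flag like the one in the commands above can be handled with the standard library's `argparse`. This is a minimal sketch, not the parser the example scripts actually use; `build_parser` and the help text are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a required --gee-project flag (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Run a GEE-based example.")
    parser.add_argument(
        "--gee-project",
        required=True,
        help="Google Earth Engine project ID used for authentication.",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["--gee-project", "YOUR_GEE_PROJECT"])
print(args.gee_project)
```

Making the flag required means a GEE-based script fails immediately with a usage message instead of failing later during authentication.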
| # | Script | Region | Satellite | Engine | Est. Runtime | Description |
|---|---|---|---|---|---|---|
| 01 | 01_new_mexico_landsat_timeseries.py | New Mexico, USA | Landsat 5--9 | delineate-anything | ~8--12 h | 40-year annual field boundaries (1985--2025). Fine-tunes on NMOSE reference boundaries and evaluates per-year accuracy. Best run on HPC/cloud with GPU. |
| 02 | 02_india_ganges_sentinel2.py | Nadia District (West Bengal), India | Sentinel-2 + Google + TESSERA + SPOT Pan | ftw + embedding + DA | ~15--30 min | Compares FTW (supervised, S2), Google AlphaEarth (64-D) and TESSERA (128-D) embeddings (unsupervised), and SPOT panchromatic (1.5 m, restricted) for smallholder rice field delineation (2024). |
| 03 | 03_australia_murray_darling_hls.py | Murray-Darling Basin, Australia | HLS | prithvi | ~45--90 min | Compares Prithvi ViT embeddings (full encoder) vs PCA baseline on large-scale irrigated agriculture. Runs 2022--2024. |
| 04 | 04_france_beauce_sentinel2.py | Beauce, France | Sentinel-2 | ftw | ~15--30 min | European large-field agriculture using FTW's pre-trained models (covers France). Single year (2023). |
| 05 | 05_pampas_embeddings.py | Argentine Pampas (Pergamino) | Google + TESSERA | embedding | ~5--10 min | CPU-only unsupervised clustering from pre-computed satellite embeddings (64-D Google, 128-D TESSERA). ~50 km bbox over the Pampas agricultural heartland (2020). |
| 06 | 06_kenya_smallholder_ftw.py | Central Kenya | Sentinel-2 | ftw | ~10--20 min | Demonstrates min_field_area tuning for smallholder agriculture. Compares results at 100, 500, 1000, and 2500 m² thresholds. |
| 07 | 07_usa_naip_high_res.py | Central Valley, California, USA | NAIP | delineate-anything | ~20--40 min | 1 m resolution field extraction from NAIP imagery. Large commercial fields. |
| 08 | 08_china_north_plain_spot.py | North China Plain | SPOT 6/7 | delineate-anything | ~15--30 min | 6 m resolution SPOT imagery. Restricted access -- see note below. |
| 09 | 09_ensemble_comparison.py | Andalusia, Spain | Sentinel-2 | ensemble | ~30--60 min | Runs delineate-anything and FTW on the same AOI, then vote-merges for ensemble consensus. Visualizes per-engine and consensus results. |
| 10 | 10_local_tif_quickstart.py | User-provided | Local GeoTIFF | delineate-anything | ~2--5 min | Minimal 5-line quickstart using a local file. No GEE required. Edit LOCAL_TIF and STUDY_AREA paths before running. |
| 11 | 11_mississippi_alluvial_plain_spot.py | Mississippi Alluvial Plain, USA | SPOT 6/7 | delineate-anything | ~15--30 min | SPOT-based delineation of row-crop agriculture (2021--2023). Includes cross-year stability analysis using IoU/F1. Restricted access -- see note below. |
| 12 | 12_new_mexico_ensemble_timeseries.py | Eastern Lea County, NM, USA | All (Sentinel-2, Landsat, HLS, NAIP, SPOT, Google & TESSERA embeddings) | All (per-source ensemble) | ~3--6 h | Multi-model per-source ensemble (2024) over ~20 km center pivot area. Runs all engines per sensor and vote-merges within each source (not across sensors). SAM2 refines each per-source ensemble. Best run on HPC/cloud with GPU. |
| 13 | 13_sam2_refine_dinov3.py | Lea County, NM, USA | Sentinel-2 | SAM2 refinement | ~5--15 min | Standalone SAM2 boundary refinement on pre-computed DINOv3 field boundaries (555 fields). Crops each field from the raster and refines with SAM2 box prompts. Compares before/after metrics against NMOSE reference. |
| 14 | 14_dinov3_sam2_ensemble.py | Eastern Lea County, NM, USA | Sentinel-2, Landsat, HLS, NAIP, SPOT | DINOv3 + SAM2 | ~1--2 h | Runs DINOv3 (SAT-493M) across 5 satellite sources with per-source SAM2 refinement. Compares per-source results against NMOSE reference boundaries. Uses a ~20 km bbox over the center pivot area to keep NAIP/SPOT runtimes practical. |
| 15 | 15_pampas_semi_supervised.py | Pampas (Pergamino), Argentina | Google + TESSERA embeddings + Dynamic World + Sentinel-2 | Embedding + SAM2 (no training) | ~15--30 min | Fully automated pipeline requiring no reference boundaries or training. Clusters Google (64-D) and TESSERA (128-D) embeddings, LULC-filters to crops, then refines with SAM2 using both S2 and TESSERA native bands. Includes improved SAM2 with geometry fixes, polygon exploding, and large-field separation. TESSERA produces more accurate boundaries than Google (see Embedding Comparison). GPU recommended. |
| 16 | 16_usa_usgs_naip_plus.py | Central Valley, California, USA | USGS NAIP Plus ImageServer | delineate-anything | ~30--60 min | First community contribution! High-resolution field extraction using the non-GEE usgs-naip-plus source -- the same NAIP imagery available on GEE but acquired directly from the USGS USGSNAIPPlus ImageServer. No GEE authentication required. Contributed by Jeremy Rapp (Michigan State University). |
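Example 06 in the table above compares `min_field_area` thresholds of 100, 500, 1000, and 2500 m². The effect of such a threshold can be sketched in plain Python. This is an illustration under assumptions, not agribound's implementation: `polygon_area` and `filter_by_min_area` are hypothetical helpers, the three rectangles are made-up fields, and coordinates are assumed to be in a projected CRS with metre units.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(ring: Sequence[Point]) -> float:
    """Area of a simple polygon via the shoelace formula (CRS units squared)."""
    total = 0.0
    # Pair each vertex with the next one, wrapping back to the first vertex.
    for (x1, y1), (x2, y2) in zip(ring, list(ring[1:]) + [ring[0]]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def filter_by_min_area(rings: List[Sequence[Point]], min_field_area: float):
    """Keep polygons whose area is at least min_field_area (assumed inclusive)."""
    return [ring for ring in rings if polygon_area(ring) >= min_field_area]


# Made-up rectangular fields with areas of 100, 600, and 3000 square metres.
fields = [
    [(0, 0), (10, 0), (10, 10), (0, 10)],
    [(0, 0), (30, 0), (30, 20), (0, 20)],
    [(0, 0), (60, 0), (60, 50), (0, 50)],
]

for threshold in (100, 500, 1000, 2500):
    kept = filter_by_min_area(fields, threshold)
    print(f"min_field_area={threshold}: {len(kept)} of {len(fields)} fields kept")
```

Lower thresholds retain small smallholder plots at the cost of more noise polygons, which is the trade-off the example explores.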
Interactive Jupyter notebook versions of each example are in the notebooks/ directory. These are designed for step-by-step exploration with inline map visualization. Set GEE_PROJECT in the first code cell before running.
| # | Notebook | Description | Key Difference from Script |
|---|---|---|---|
| 01 | 01_new_mexico_landsat_timeseries.ipynb | New Mexico Landsat time series with fine-tuning | Runs 2023--2025 (3 years) instead of the full 40-year range, suitable for interactive use |
| 02 | 02_india_ganges_sentinel2.ipynb | India Nadia District (West Bengal): FTW vs Google vs TESSERA vs SPOT Pan | Same scope as script |
| 03 | 03_australia_murray_darling_hls.ipynb | Australia Murray-Darling Basin: Prithvi ViT vs PCA (HLS) | Same scope as script |
| 04 | 04_france_beauce_sentinel2.ipynb | France Beauce region (FTW) | Same scope as script |
| 05 | 05_pampas_embeddings.ipynb | Pampas embeddings (CPU-only, Google + TESSERA) | Same scope as script |
| 06 | 06_kenya_smallholder_ftw.ipynb | Kenya smallholder min_area tuning (FTW) | Same scope as script |
| 07 | 07_usa_naip_high_res.ipynb | USA Central Valley NAIP 1 m (Delineate-Anything) | Same scope as script |
| 08 | 08_china_north_plain_spot.ipynb | China North Plain SPOT 6/7 (restricted) | Same scope as script |
| 09 | 09_ensemble_comparison.ipynb | Ensemble multi-engine comparison (Andalusia) | Same scope as script |
| 10 | 10_local_tif_quickstart.ipynb | Local GeoTIFF quickstart (no GEE) | Same scope as script |
| 11 | 11_mississippi_alluvial_plain_spot.ipynb | Mississippi Alluvial Plain SPOT 6/7 (restricted) | Same scope as script |
| 12 | 12_new_mexico_ensemble_timeseries.ipynb | Lea County multi-source grand ensemble (2020--2022) | Same scope as script |
| 13 | 13_sam2_refine_dinov3.ipynb | SAM2 boundary refinement on DINOv3 output | Same scope as script |
| 14 | 14_dinov3_sam2_ensemble.ipynb | DINOv3 + SAM2 multi-source comparison (Eastern Lea County) | Runs single year (2022) instead of 2020--2022 |
| 15 | 15_pampas_semi_supervised.ipynb | Embedding + SAM2 (Pampas, no training required) | Same scope as script |
| 16 | 16_usa_usgs_naip_plus.ipynb | USA Central Valley USGS NAIP Plus -- same NAIP data as GEE, from USGS ImageServer (no GEE, contributed by Jeremy Rapp) | Same scope as script |
- Estimated runtimes assume a single NVIDIA GPU (e.g., A100/V100) and moderate internet speed for GEE downloads.
- GEE composite generation adds ~2--5 minutes per year per source.
- CPU-only runs (example 05, embedding engine) are 2--5x slower for inference but have no GPU requirement.
- Fine-tuning (examples 01, 12) takes ~30 minutes per model on an Apple M2 Max (MPS). In example 12, DA (2 variants) and GeoAI/Prithvi are fine-tuned on NMOSE reference boundaries (~1.5 hours total). FTW uses pre-trained weights directly (fine-tuning not yet supported — FTW requires paired temporal windows). Fine-tuned checkpoints are cached and reused across years.
- SAM2 boundary refinement (example 12) runs once on the final grand ensemble output per year. Example 14 runs SAM2 per source using each sensor's native raster for accurate per-field segmentation. With the `large` model and per-field cropping, refinement takes ~2--5 minutes per source per year depending on field count.
- NAIP and SPOT over large areas: For the same study area, NAIP (1 m) produces rasters with ~100x the pixel count of Sentinel-2 (10 m) and ~900x that of Landsat (30 m); SPOT (6 m) produces ~3x and ~25x, respectively. Inference on these high-resolution sources over county-scale or larger areas can take hours even on GPU. Consider subsetting the study area or using `tile_size` to process in chunks. Fine-tuning on NAIP/SPOT is also significantly slower due to the larger training chips.
- Apple Silicon (MPS): The GeoAI engine (Mask R-CNN) crashes on MPS due to Metal command buffer errors. Agribound automatically falls back to CPU for GeoAI training and inference. All other engines (FTW, Delineate-Anything, Prithvi) work correctly on MPS.
- GeoAI requires fine-tuning: Without fine-tuning on region-specific reference boundaries, GeoAI's Mask R-CNN typically does not delineate any fields. For out-of-the-box delineation without reference data, use FTW (pre-trained models for 25 countries) or Delineate-Anything (resolution-agnostic).
- The 40-year New Mexico script (01) is best run as an overnight batch job or on HPC. The notebook version runs only 2023--2025.
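The resolution note above comes down to simple arithmetic: for a fixed study area, pixel count scales with the inverse square of the pixel size. A small sketch, where the helper name `pixel_count_ratio` is hypothetical and the resolutions are the nominal values quoted in this README:

```python
def pixel_count_ratio(coarse_res_m: float, fine_res_m: float) -> float:
    """How many fine pixels cover the footprint of one coarse pixel."""
    return (coarse_res_m / fine_res_m) ** 2


# Nominal pixel sizes in metres, as quoted in this README.
NAIP, SPOT, SENTINEL2, LANDSAT = 1.0, 6.0, 10.0, 30.0

print(pixel_count_ratio(SENTINEL2, NAIP))            # 100.0
print(pixel_count_ratio(LANDSAT, NAIP))              # 900.0
print(round(pixel_count_ratio(SENTINEL2, SPOT), 2))  # 2.78
print(pixel_count_ratio(LANDSAT, SPOT))              # 25.0
```

The same ratio gives a rough lower bound on how much more memory and inference time a high-resolution source needs for an identical bounding box.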
Agribound automatically filters detected field boundaries to agricultural areas using land-use/land-cover (LULC) data. This is enabled by default (lulc_filter=True) and removes non-agricultural polygons (roads, water, forest, urban areas, etc.) from the output.
The appropriate LULC dataset is selected automatically based on the study area location and target year:
| Region | Dataset | Years | Resolution | Crop Classes |
|---|---|---|---|---|
| CONUS | USGS Annual NLCD | 1985–2024 (nearest year) | 30 m | 81 (Pasture/Hay), 82 (Cultivated Crops) |
| Global, ≥2015 | Google Dynamic World | 2015–present (nearest year) | 10 m | crops probability band |
| Global, <2015 | Copernicus C3S Land Cover | 1992–2022 (nearest year) | 300 m | 10, 20, 30 (Cropland classes) |
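The selection rules in the table can be written out as a small function. This is a sketch of the logic as described here, not agribound's actual code: `select_lulc_dataset` is a hypothetical name, the CONUS test is assumed to be done elsewhere and passed in as a flag, and "nearest year" is interpreted as clamping to the dataset's available range.

```python
from typing import Optional, Tuple


def clamp_year(year: int, first: int, last: Optional[int]) -> int:
    """Clamp year into [first, last]; last=None means no upper bound."""
    if year < first:
        return first
    if last is not None and year > last:
        return last
    return year


def select_lulc_dataset(in_conus: bool, year: int) -> Tuple[str, int]:
    """Return (dataset name, year actually used) following the table above."""
    if in_conus:
        return "USGS Annual NLCD", clamp_year(year, 1985, 2024)
    if year >= 2015:
        return "Google Dynamic World", clamp_year(year, 2015, None)
    return "Copernicus C3S Land Cover", clamp_year(year, 1992, 2022)


print(select_lulc_dataset(in_conus=True, year=2025))
print(select_lulc_dataset(in_conus=False, year=2020))
print(select_lulc_dataset(in_conus=False, year=1990))
```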
Configuration:
- `lulc_filter=True` (default): enable crop filtering
- `lulc_filter=False`: disable (used for local files without GEE, or unsupervised embedding clusters)
- `lulc_crop_threshold=0.3` (default): minimum fraction of crop pixels to keep a polygon
Disabled by default for:
- Example 05 (unsupervised embedding clusters — no semantic meaning)
- Example 10 (local GeoTIFF — no GEE access)
- Example 16 (USGS NAIP Plus — purely non-GEE workflow)
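The `lulc_crop_threshold` rule described above can be illustrated with a toy example. This is a sketch under assumptions rather than agribound's implementation: `crop_fraction` and `keep_polygon` are hypothetical helpers, the comparison is assumed to be inclusive (`>=`), and the pixel lists are invented NLCD class values sampled inside two polygons.

```python
from typing import Iterable, Set

NLCD_CROP_CLASSES: Set[int] = {81, 82}  # Pasture/Hay, Cultivated Crops


def crop_fraction(pixel_classes: Iterable[int], crop_classes: Set[int]) -> float:
    """Fraction of a polygon's LULC pixels that fall in a crop class."""
    pixels = list(pixel_classes)
    if not pixels:
        return 0.0  # no LULC coverage: treat as non-crop in this sketch
    return sum(1 for value in pixels if value in crop_classes) / len(pixels)


def keep_polygon(
    pixel_classes: Iterable[int],
    crop_classes: Set[int] = NLCD_CROP_CLASSES,
    lulc_crop_threshold: float = 0.3,
) -> bool:
    """Keep the polygon when its crop fraction reaches the threshold."""
    return crop_fraction(pixel_classes, crop_classes) >= lulc_crop_threshold


mostly_forest = [82, 82, 41, 41, 41, 41, 41, 41, 41, 41]  # crop fraction 0.2
mostly_crops = [81, 82, 82, 11]                            # crop fraction 0.75

print(keep_polygon(mostly_forest))  # False
print(keep_polygon(mostly_crops))   # True
```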
Examples 08 and 11 use SPOT 6/7 imagery, which is restricted to select GEE users under a data-sharing agreement. This source is primarily for internal DRI use. If you receive an access error, contact the agribound author (sayantan.majumdar@dri.edu) to request field boundary processing for your study area.
Ensembles work best when multiple models are run on the same sensor data. Each model architecture (DA, FTW, GeoAI, DINOv3, Prithvi) has different biases — vote-merging across models cancels out individual errors because every model sees the same pixels but interprets them differently.
Ensembles across different sensors (e.g., Sentinel-2 + Landsat + NAIP) do not work well because:
- Resolution mismatch — a 1 m NAIP polygon and a 30 m Landsat polygon for the same field have different shapes, producing poor vote overlap
- Temporal mismatch — each sensor captures different dates, so field states (bare vs cropped) may differ
- Spatial alignment — sub-pixel registration errors between sensors create artificial disagreements at boundaries
For multi-sensor analysis, compare per-source results independently (example 14) rather than merging them. The multi-model ensemble (example 12) runs all engines on the same eastern Lea County area for this reason.
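The vote-merging idea for models run on the same sensor grid can be sketched on binary masks. This is a toy raster illustration with assumed names (`majority_vote`, `min_votes`) and invented 2x3 masks; agribound's actual merge step is not shown in this README and may operate on polygons instead.

```python
from typing import List, Optional

Mask = List[List[int]]  # rows of 0/1 values on a shared pixel grid


def majority_vote(masks: List[Mask], min_votes: Optional[int] = None) -> Mask:
    """Mark a pixel as field when at least min_votes models agree.

    By default min_votes is a strict majority of the supplied masks.
    """
    if not masks:
        raise ValueError("at least one mask is required")
    n_rows, n_cols = len(masks[0]), len(masks[0][0])
    for mask in masks:
        if len(mask) != n_rows or any(len(row) != n_cols for row in mask):
            raise ValueError("all masks must share the same grid shape")
    if min_votes is None:
        min_votes = len(masks) // 2 + 1
    return [
        [int(sum(mask[r][c] for mask in masks) >= min_votes) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Invented outputs from three models on the same 2x3 pixel grid.
model_a = [[1, 1, 0], [0, 1, 0]]
model_b = [[1, 0, 0], [0, 1, 1]]
model_c = [[1, 1, 0], [0, 0, 1]]

consensus = majority_vote([model_a, model_b, model_c])
print(consensus)  # [[1, 1, 0], [0, 1, 1]]
```

The shape check reflects the point made above: voting only makes sense when every model sees the same pixel grid, which is not the case across sensors with different resolutions.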
- With reference boundaries: DINOv3 + SAM2 per source (example 14). DINOv3's SAT-493M backbone fine-tunes well on each sensor with just 10--30 epochs.
- Without reference boundaries: Embedding clustering + LULC filter + SAM2 (example 15). TESSERA embeddings produce more accurate boundaries than Google (see below). No training required.
- Multi-model ensemble: Example 12 runs all engines on the same sensor and merges via majority vote. Best accuracy but slowest.
Testing over the Argentine Pampas shows that TESSERA embeddings produce more accurate field boundaries than Google AlphaEarth Embeddings when used with the automated pipeline (embedding clustering + LULC filter + SAM2). The two embedding products differ fundamentally in architecture and input data:
- TESSERA (Feng et al., 2025) is a pixel-wise foundation model trained on multi-modal Sentinel-1/2 time series using Barlow Twins self-supervision. It processes "d-pixels" (full temporal sequences of all spectral bands (S2) and SAR backscatter (S1) at each pixel), learning 128-D embeddings that are invariant to cloud-induced temporal gaps. Because it encodes the complete phenological trajectory (planting, growth, senescence) rather than a single composite, adjacent fields with different crop types, planting dates, or irrigation schedules produce distinct embeddings even when they appear spectrally similar in any single image.
- Google AlphaEarth (Brown et al., 2025) uses a video summarization architecture with a "Space Time Precision" encoder that assimilates multiple EO sources into 64-D annual embeddings on the unit sphere S^63. While it also incorporates temporal information through its support/valid period design, the released annual embedding fields are static temporal summaries that compress a full year into a single vector. The architecture prioritizes generality across diverse mapping tasks (land cover, biomass, evapotranspiration) rather than fine-grained agricultural phenology.
- Why TESSERA produces better field boundaries: TESSERA's explicit modeling of temporal sampling invariance (training on random 40-observation subsets from the annual S1/S2 time series) makes it particularly sensitive to within-season crop dynamics. Two soybean fields planted two weeks apart produce different temporal profiles that TESSERA preserves in its embeddings. Google's annual summary tends to average out these intra-seasonal differences, causing adjacent fields with similar average reflectance to merge into single clusters.
- Trade-offs: Google AlphaEarth has global coverage for 2017--2024 and is available as a GEE ImageCollection. TESSERA coverage varies by region/year (2017--2025) and requires the geotessera library for tile download and mosaicking.
For new study areas without reference boundaries, we recommend the example 15 pipeline with TESSERA embeddings where available, falling back to Google embeddings for global coverage.
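The "averaging out" argument above can be illustrated with a toy example: two fields following the same crop cycle, offset in time, have identical annual means but clearly different time series. This only shows why a full temporal profile can separate fields that an annual average cannot; it is not how TESSERA or AlphaEarth compute embeddings, and the greenness values and 12-step sampling are made up.

```python
import math
from typing import List


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Synthetic greenness profile over 12 time steps (invented values).
profile = [0.2, 0.2, 0.3, 0.5, 0.7, 0.8, 0.8, 0.6, 0.4, 0.3, 0.2, 0.2]

field_a = profile
field_b = profile[-1:] + profile[:-1]  # same cycle, planted one step later

mean_a = sum(field_a) / len(field_a)
mean_b = sum(field_b) / len(field_b)

# The annual means coincide, so an average-only summary cannot separate them.
print(abs(mean_a - mean_b) < 1e-9)
# The full profiles differ, so a sequence-aware representation can.
print(euclidean(field_a, field_b) > 0.1)
```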
Examples 01, 12, 13, and 14 use NMOSE (New Mexico Office of the State Engineer) WUCB agricultural polygon boundaries for fine-tuning and/or evaluation. Examples 12 and 14 filter to eastern Lea County (County 25). Example 13 uses pre-computed DINOv3 boundaries from Lea County for standalone SAM2 refinement. The NMOSE shapefile is not included in the public repository — contact the agribound author (sayantan.majumdar@dri.edu) for access.
Each example creates an output directory under `outputs/`:

    outputs/
    ├── new_mexico_timeseries/
    │   ├── fields_landsat_1985.gpkg
    │   ├── fields_landsat_1986.gpkg
    │   ├── ...
    │   ├── map_predicted_vs_reference.html
    │   ├── map_timeseries_comparison.html
    │   └── map_latest.html
    ├── india_nadia/
    │   ├── fields_ftw_s2_2024.gpkg
    │   ├── fields_google_2024.gpkg
    │   ├── fields_tessera_2024.gpkg
    │   ├── fields_spot_pan_2023.gpkg
    │   └── map_ftw_vs_tessera.html
    └── ...

- `.gpkg` files contain field boundary polygons with area, perimeter, and provenance metadata.
- `.html` files are standalone interactive maps (open in any browser) showing field boundaries overlaid on satellite basemaps.
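To get a quick inventory of what a run produced, the layout above can be walked with the standard library. This is a convenience sketch, not part of agribound: `list_outputs` is a hypothetical helper, and the demo builds a temporary directory that mimics the layout instead of reading real results.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def list_outputs(root: str = "outputs") -> Dict[str, Dict[str, List[str]]]:
    """Map each example directory to its GeoPackage and HTML file names."""
    summary: Dict[str, Dict[str, List[str]]] = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return summary  # nothing has been run yet
    for example_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        summary[example_dir.name] = {
            "gpkg": sorted(f.name for f in example_dir.glob("*.gpkg")),
            "html": sorted(f.name for f in example_dir.glob("*.html")),
        }
    return summary


# Demo on a throwaway directory that mimics the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "india_nadia"
    demo_dir.mkdir()
    (demo_dir / "fields_ftw_s2_2024.gpkg").touch()
    (demo_dir / "fields_tessera_2024.gpkg").touch()
    (demo_dir / "map_ftw_vs_tessera.html").touch()
    summary = list_outputs(tmp)

for name, files in summary.items():
    print(name, len(files["gpkg"]), "GeoPackage file(s),", len(files["html"]), "map(s)")
```

The listed `.gpkg` paths can then be opened with any GeoPackage-aware tool, such as the geopandas package installed in the setup step.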