GeoPlant is a large-scale, multimodal dataset for spatial plant species prediction across Europe.
It integrates expert-verified species observations with rich environmental predictors and enables research, benchmarking, and applications in biodiversity, earth observation, and deep learning.
Figure 1. GeoPlant combines 5M Presence-Only and 90k Presence-Absence records with Sentinel-2 imagery, Landsat time series, CHELSA climate, and environmental rasters for 10k+ European plant species.
- Dataset Overview: Learn about the provided presence-absence and presence-only species data.
- Environmental Predictors: Explore different variables, e.g., satellite imagery, time series, climate, soil, land cover, and human footprint.
- Baselines & Benchmarking: See benchmark tasks, metrics, and baseline models.
- Resources & Download: Links to Kaggle, Seafile, Hugging Face, and the NeurIPS 2024 paper.
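
The difference between the two record types named in the overview can be illustrated with a toy example. The sketch below is only an illustration under stated assumptions: the `(survey_id, species_id)` tuples, their values, and the helper `to_multilabel` are hypothetical and are not the actual GeoPlant file schema or API. It shows how presence-absence surveys, where the full species list of each plot is recorded, can be turned into a multi-label matrix with explicit zeros. Presence-only records cannot provide those zeros, because an unreported species is not known to be absent.

```python
from collections import defaultdict

# Toy presence-absence (PA) records: one (survey_id, species_id) pair per
# species found in a surveyed plot. Names and values are made up for
# illustration and do not reflect the real GeoPlant files.
pa_records = [(1, 10), (1, 11), (2, 10), (3, 11), (3, 12), (3, 13)]


def to_multilabel(records):
    """Turn (survey, species) pairs into a survey -> 0/1 vector mapping.

    Returns the sorted species list (column order) and a dict mapping each
    survey id to a list with 1 where the species was recorded and 0 where
    the plot was surveyed but the species was not found.
    """
    species = sorted({sp for _, sp in records})
    found_in = defaultdict(set)
    for survey, sp in records:
        found_in[survey].add(sp)
    matrix = {
        survey: [int(sp in found) for sp in species]
        for survey, found in sorted(found_in.items())
    }
    return species, matrix


species, matrix = to_multilabel(pa_records)
print(species)
for survey, row in matrix.items():
    print(survey, row)
```

A multi-label target of this kind is what a species-prediction model would typically be trained against, with one output per species.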
| Resource | Description | Link |
|---|---|---|
| Dataset Paper | NeurIPS 2024 proceedings paper (Datasets & Benchmarks track) | Proceedings |
| Extended Version | arXiv preprint with supplementary details | arXiv:2408.13928 |
| Starter Notebooks | Baseline models, pipelines, and scripts | GeoPlant Code on Kaggle |
| Full Dataset | Full data including PO and environmental rasters | GeoPlant Seafile |
| Pretrained Models | Hugging Face collection of baselines | Hugging Face |

| Branch | What's inside |
|---|---|
| `picekl/xgboost-baselines` | XGBoost tabular baselines + ablations (location, climate, soil, land cover, human footprint). |
| `picekl/MME-baselines` | Multimodal ensemble (Sentinel-2, Landsat, climate) notebooks + training scripts. |
| `picekl/initial-experiemnts` | Initial experiments, dataset preprocessing, etc. |
| `docs` | Sources for the website documentation. |

```bash
# Example: switch to a baseline branch
git fetch origin
git checkout picekl/xgboost-baselines
```

If you use GeoPlant, please cite the NeurIPS proceedings paper (and optionally the extended arXiv version):
NeurIPS 2024 (Datasets & Benchmarks Track)

```bibtex
@inproceedings{picek2024geoplant_neurips,
  title = {GeoPlant: Spatial Plant Species Prediction Dataset},
  author = {Picek, Lukas and Botella, Christophe and Servajean, Maximilien and Leblanc, C{\'e}sar and Palard, R{\'e}mi and Larcher, Th{\'e}o and Deneu, Benjamin and Marcos, Diego and Bonnet, Pierre and Joly, Alexis},
  booktitle = {NeurIPS 2024 Datasets and Benchmarks Track},
  year = {2024}
}
```

- Issues & feature requests: GitHub Issues
- Kaggle discussion: GeoPlant on Kaggle

