This is the official repository for the paper "WildSAT: Learning Satellite Image Representations from Wildlife Observations".

Species distributions encode valuable ecological and environmental information, yet their potential for guiding representation learning in remote sensing remains underexplored. We introduce WildSAT, which pairs satellite images with millions of geo-tagged wildlife observations readily-available on citizen science platforms. WildSAT employs a contrastive learning approach that jointly leverages satellite images, species occurrence maps, and textual habitat descriptions to train or fine-tune models. This approach significantly improves performance on diverse satellite image recognition tasks, outperforming both ImageNet-pretrained models and satellite-specific baselines. Additionally, by aligning visual and textual information, WildSAT enables zero-shot retrieval, allowing users to search geographic locations based on textual descriptions. WildSAT surpasses recent cross-modal learning methods, including approaches that align satellite images with ground imagery or wildlife photos, demonstrating the advantages of our approach.
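At a high level, the contrastive objective pulls the embedding of a satellite image toward the embeddings of its paired location/habitat signals and pushes it away from the other pairs in the batch. Below is a minimal NumPy sketch of such a symmetric InfoNCE loss; the batch size, embedding dimension, and temperature are illustrative placeholders, not the actual values or implementation used in the paper.

```python
import numpy as np

def symmetric_info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of img_emb is paired with row i of txt_emb."""
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (B, B); matched pairs on the diagonal
    labels = np.arange(logits.shape[0])

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

rng = np.random.default_rng(0)
txt = rng.normal(size=(8, 32))
aligned = symmetric_info_nce(txt.copy(), txt)        # perfectly matched pairs
shuffled = symmetric_info_nce(txt[::-1].copy(), txt) # mismatched pairs
print(aligned, shuffled)  # the aligned batch yields a lower loss
```

The same loss shape applies regardless of which modality pair (image/species map or image/text) is being aligned.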
## Installation

- Create a conda environment:
  ```
  conda create -n wildsat python=3.9
  ```
- Activate the environment:
  ```
  conda activate wildsat
  ```
- Install the required packages:
  ```
  pip install -r requirements.txt
  ```
## Quickstart

This section shows how to extract features from satellite images and use them to retrieve relevant images.

- Activate your environment and install the required package for GritLM:
  ```
  pip install gritlm
  ```
- Download our sample model here. This is an ImageNet pre-trained ViT-B/16 model that is further fine-tuned with WildSAT.
- Download a small set of data here.
- Run the notebook `quickstart.ipynb`. Make sure to specify the location of the sample data downloaded in the previous step, and the location of the checkpoint in step 2.
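The retrieval step in the notebook amounts to a nearest-neighbor search in the shared embedding space. Here is a minimal sketch, assuming the per-image features and the query embedding have already been extracted with the checkpoint above (random vectors stand in for real features, and `retrieve_top_k` is an illustrative helper, not a function from this repository):

```python
import numpy as np

def retrieve_top_k(query_emb, image_feats, k=5):
    """Return indices of the k images most similar to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    feats = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    sims = feats @ q              # cosine similarity to every image
    return np.argsort(-sims)[:k]  # highest similarity first

rng = np.random.default_rng(0)
image_feats = rng.normal(size=(100, 64))              # stand-in for extracted features
query = image_feats[42] + 0.01 * rng.normal(size=64)  # near-duplicate of image 42
top = retrieve_top_k(query, image_feats, k=5)
print(top[0])  # index of the best-matching image
```

For text-based search, the query embedding would come from the text encoder instead, since WildSAT aligns the two modalities in the same space.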
## Dataset

- Download the Sentinel satellite images from SatlasPretrain here.
- Download the Wikipedia data from LE-SINR here. Place it in `data/wiki_data_v4.pt`.
- Download the bioclimatic variables here. Place them in `data/bioclim*.npy`. These are used by SINR to extract location features.
- Download the mapping between satellite images, locations, and text here. Place it in `data/dataloader_data.npy`.

Sample code for visualizing the dataset is provided in `data_explore.ipynb`.
## Training

- Make sure all components of the dataset have been downloaded (see Dataset).
- Download all pre-trained model checkpoints. Extract them and place them in `wildsat/checkpoints/*`. These are needed as starting points for the different pre-trained models; they are not needed if you want to start from a randomly initialized model or an ImageNet pre-trained model.
- Run training for a randomly initialized RN50 model:
  ```
  python train.py --satellite_encoder "resnet50" --satellite_notpretrained
  ```
  For other model options, see the table below.
| Architecture | Pre-training | Training command | Checkpoint when fully trained with WildSAT |
|---|---|---|---|
| ViT-B/16 | ImageNet1k | `python train.py --satellite_encoder "vitb16" --use_bnft --is_tunefc` | link |
| ViT-B/16 | CLIP | `python train.py --satellite_encoder "vitb16" --satellite_encoder_ckpt "clip" --lora_layer_types 'attn.k_proj' 'attn.v_proj' 'attn.q_proj' 'attn.out_proj' 'visual_projection' --use_lora --use_dora` | link |
| ViT-B/16 | Prithvi | `python train.py --satellite_encoder "vitb16" --satellite_encoder_ckpt "prithvi"` | link |
| ViT-B/16 | SatCLIP | `python train.py --satellite_encoder "vitb16" --satellite_encoder_ckpt "checkpoints/satclip/satclip-vit16-l10.ckpt"` | link |
| ViT-B/16 | None (Random) | `python train.py --satellite_encoder "vitb16" --satellite_notpretrained` | link |
| Swin-T | ImageNet1k | `python train.py --satellite_encoder "swint" --use_bnft --is_tunefc` | link |
| Swin-T | Satlas | `python train.py --satellite_encoder "swint" --satellite_encoder_ckpt "satlas-backbone"` | link |
| Swin-T | None (Random) | `python train.py --satellite_encoder "swint" --satellite_notpretrained` | link |
| RN50 | ImageNet1k | `python train.py --satellite_encoder "resnet50" --use_bnft --is_tunefc` | link |
| RN50 | MoCov3 | `python train.py --satellite_encoder "resnet50" --satellite_encoder_ckpt "checkpoints/moco_v3/r-50-100ep.pth.tar" --use_bnft --is_tunefc` | link |
| RN50 | SatCLIP | `python train.py --satellite_encoder "resnet50" --satellite_encoder_ckpt "checkpoints/satclip/satclip-resnet50-l10.ckpt"` | link |
| RN50 | Satlas | `python train.py --satellite_encoder "resnet50" --satellite_encoder_ckpt "satlas-backbone"` | link |
| RN50 | SeCo | `python train.py --satellite_encoder "resnet50" --satellite_encoder_ckpt "checkpoints/seco/seco_resnet50_100k.ckpt"` | link |
| RN50 | None (Random) | `python train.py --satellite_encoder "resnet50" --satellite_notpretrained` | link |
## Citation

If you found this helpful, please cite our paper:

```
@inproceedings{daroya2025wildsat,
  title={WildSAT: Learning Satellite Image Representations from Wildlife Observations},
  author={Daroya, Rangel and Cole, Elijah and Mac Aodha, Oisin and Van Horn, Grant and Maji, Subhransu},
  booktitle={IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
```