Guide to cymplEO

Steps to modeling:

Step 1: Prepare RS Data
Step 2: Prepare ML models

(a) Select Models To Use
(b) Convert EO data into descriptive, model-compatible features
(c) Perform feature importance and hyperparameter tuning

Step 3: Run ML models
Step 4: Analyze model output

(a) Analyze CSV output metrics
(b) Create graphics

Step 1: Prepare Remote Sensing Data

Part (a) - Extraction of input features via Google Earth Engine

This model was created with the intention of relying exclusively on inputs derived from remote sensing (RS) products, to assess whether they can produce useful results without extensive field campaigns. So far these data have been sourced exclusively with Google Earth Engine (GEE) scripts; however, methods are also available for extracting data that are not on GEE. The variables chosen in this example cover both meteorological influences on crop growth and metrics of ecological conditions related to crop growth. These include air temperature, precipitation, soil moisture, evaporative stress index (ESI), and normalized difference vegetation index (NDVI).

Copy the repository from this link: https://code.earthengine.google.com/?accept_repo=users/acgins/inputdata_keadm1 (not shared 5/27)

Needed files:

Shapefile of regional boundaries (Administrative 1 County-level for Kenya) asset
Shapefile of crop mask (maize for this model) asset
CCI global LC Layers 2015-2020 assets

Part (b) - Python package requirements and installation

Package requirements:

pandas
numpy
scikit-learn
merf (pip install merf)
xgboost (pip install xgboost)
glob2 (pip install glob2)
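As a quick check before running any notebook, the list above can be verified from Python. This is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for illustration and is not part of this repository. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names for the packages listed above. scikit-learn is imported
# as "sklearn"; the other import names match the package names.
REQUIRED_MODULES = ["pandas", "numpy", "sklearn", "merf", "xgboost", "glob2"]


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

Any name that is reported as missing can then be installed with pip or conda in the active environment.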

Most code for this model can be executed as scripts with just a code editor, without an IDE. However, the scripts have been organized into Jupyter Notebooks for this repository, and can be run on various local platforms such as JupyterLab or Jupyter Notebook via Anaconda Navigator, or Visual Studio Code. Find more information about Jupyter notebooks here. If using one of these platforms, all Python packages needed can be organized within a conda environment. See x file for a list of some required package installation commands.

These notebooks can also be adapted to run in Google Colab.

Part (c) - Scripts to combine GEE Data and Yield

modvars_share.py
1. Extracting each variable and concatenating multiple variables
2. Transposing the data frame and adding yield and crop calendar information
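The two steps above can be sketched with pandas as follows. This is an illustration, not the code in modvars_share.py: the table layout (one row per date, one column per county), the county names, the column names, and all numbers are made-up assumptions.

```python
import pandas as pd

dates = pd.to_datetime(["2019-04-01", "2019-05-01"])

# Hypothetical GEE exports: one row per date, one column per county.
ndvi = pd.DataFrame({"Bungoma": [0.41, 0.55], "Nakuru": [0.38, 0.47]}, index=dates)
precip = pd.DataFrame({"Bungoma": [110.0, 95.0], "Nakuru": [80.0, 60.0]}, index=dates)

# 1. Concatenate the variables side by side; the columns become a
#    (variable, county) MultiIndex.
combined = pd.concat({"ndvi": ndvi, "precip": precip}, axis=1)

# 2. Transpose, then move the variable level into the columns so that
#    each county is one row and each (date, variable) pair is one column.
by_county = combined.T.unstack(level=0)
by_county.columns = [f"{var}_{date:%Y-%m}" for date, var in by_county.columns]
by_county.index.name = "county"

# Hypothetical yield and crop calendar table (values are placeholders).
yield_info = pd.DataFrame({
    "county": ["Bungoma", "Nakuru"],
    "year": [2019, 2019],
    "yield_t_ha": [2.1, 1.8],
    "plant_month": [3, 4],
    "harvest_month": [8, 9],
})

# Add yield and crop calendar columns to the feature table.
merged = by_county.reset_index().merge(yield_info, on="county", how="inner")
print(merged)
```

An inner merge is used here so that only counties with both features and yield records are kept; a different join may suit other data.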

Step 2: Prepare scripts to run machine learning algorithms

Scripts or notebooks that run the models randomforestregressor, xgboost, or merf use functions from the machinelearn class defined in machinelearns6.py. Once the class is instantiated, the function for each of these models can be run. Hyperparameters and the training and testing data can be modified; the data are passed in as dictionaries of numpy arrays.

Part (a) - Select machine learning models to use

Choose regression models, preferably ones that have Python implementations. Refer to the literature for candidates.

Part (b) - Convert EO data into model-compatible features

The model generates one yield prediction for each combination of year and administrative region (we use county level, Kenya Admin 1) for which all data are available. Because the finest temporal resolution of any feature is daily, each daily series must be summarized with one or more aggregation metrics. Code in machinelearns6.py converts the output of modvars_share.py into either (1) monthly averages of each variable or (2) deciles over the growing season. Saving these features as new CSV files helps cut runtimes. Files to generate these two feature styles are included in featureengineering.
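The two feature styles described above can be sketched as follows for a single daily series. This is a minimal illustration, not the code in this repository: the function names, the feature naming scheme, the season dates, and the random data are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series for one variable, one county, and one season.
days = pd.date_range("2019-03-01", "2019-08-31", freq="D")
rng = np.random.default_rng(0)
daily = pd.Series(rng.uniform(0.2, 0.8, size=len(days)), index=days, name="ndvi")


def monthly_means(series):
    """One feature per calendar month: the mean of the daily values."""
    means = series.groupby(series.index.month).mean()
    return {f"{series.name}_m{int(month):02d}": value
            for month, value in means.items()}


def season_deciles(series):
    """Nine features: the 10th to 90th percentiles over the whole series."""
    levels = list(range(10, 100, 10))
    values = np.percentile(series.to_numpy(), levels)
    return {f"{series.name}_p{level}": value
            for level, value in zip(levels, values)}


# Style (1): monthly averages. Style (2): growing-season deciles.
monthly_row = pd.DataFrame([monthly_means(daily)])
decile_row = pd.DataFrame([season_deciles(daily)])
print(monthly_row.round(3))
print(decile_row.round(3))
```

Each resulting row would correspond to one county-year; stacking the rows and writing them with `DataFrame.to_csv` gives the cached feature files mentioned above.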

Part (c) - Perform feature importance and hyperparameter tuning
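One common way to approach this step, shown here as a sketch and not as the procedure used in this repository, is a cross-validated grid search over a hyperparameter followed by an inspection of the fitted model's feature importances. The example uses scikit-learn's `RandomForestRegressor` and `GridSearchCV` on synthetic data; the feature names and the parameter grid are made-up assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data: the target depends strongly on the first feature,
# weakly on the second, and not at all on the other two.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=120)
feature_names = ["ndvi_m05", "precip_m04", "esi_m06", "temp_m07"]  # hypothetical

# Hyperparameter tuning: 3-fold cross-validation over n_estimators.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [25, 50, 100]},
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
best_model = search.best_estimator_

# Feature importance: impurity-based importances of the best model,
# sorted from most to least important.
ranking = sorted(
    zip(feature_names, best_model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print("Best n_estimators:", search.best_params_["n_estimators"])
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances are quick to compute but can favor high-cardinality features; permutation importance on held-out data is a possible alternative.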

Step 3: Run ML Models

agroml_tv_run2xgb.py

This script can be used to run the models defined in machinelearns6.py.

Inputs by model:

RFR: (test, xy, t)

test: testing dataframe containing the X and y testing sets. It can be produced by set_maker, but any dataframe is acceptable as long as its indices match the arrays in xy

xy: dictionary produced by set_maker, or dictionary that contains numpy arrays of an X training set, y training set, X testing set, and y testing set

*model inputs must have no missing values

t: n_estimators (hyperparameter, integer)

MERF: (train, test, xy, t)

train: training dataframe containing the X and y training sets. It can be produced by set_maker, but any dataframe is acceptable as long as its indices match the arrays in xy

XGBoost: (train, test, xy, t)

t: num_parallel_tree (hyperparameter, integer)
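The input requirements listed above (no missing values, and dataframes whose rows line up with the arrays in xy) can be checked before a model is run. The sketch below is an assumption-laden illustration: the helper `check_inputs` and the dictionary keys `X_train`, `y_train`, `X_test`, and `y_test` are hypothetical and may differ from what set_maker actually produces, and "indices match" is interpreted here simply as matching row counts.

```python
import numpy as np
import pandas as pd


def check_inputs(test, xy, train=None):
    """Raise ValueError if the inputs break the rules described above."""
    # Rule 1: model inputs must have no missing values.
    for name, array in xy.items():
        if np.isnan(np.asarray(array, dtype=float)).any():
            raise ValueError(f"{name} contains missing values")
    # Rule 2: dataframes must line up row for row with the arrays in xy.
    if not (len(test) == len(xy["X_test"]) == len(xy["y_test"])):
        raise ValueError("test dataframe does not match the test arrays in xy")
    if len(xy["X_train"]) != len(xy["y_train"]):
        raise ValueError("X_train and y_train differ in length")
    if train is not None and len(train) != len(xy["X_train"]):
        raise ValueError("train dataframe does not match the train arrays in xy")


# Small made-up example with one feature column.
train_df = pd.DataFrame({"ndvi": [0.4, 0.5, 0.6], "yield": [1.2, 1.5, 1.9]})
test_df = pd.DataFrame({"ndvi": [0.45, 0.55], "yield": [1.3, 1.7]})
xy = {
    "X_train": train_df[["ndvi"]].to_numpy(),
    "y_train": train_df["yield"].to_numpy(),
    "X_test": test_df[["ndvi"]].to_numpy(),
    "y_test": test_df["yield"].to_numpy(),
}

check_inputs(test_df, xy, train=train_df)  # passes silently
t = 100  # hyperparameter value, e.g. n_estimators for RFR
```

Running a check like this first makes it easier to tell data problems apart from model errors.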

Naming:

Crop Yield Model (with) Python Machine Learning (&) Earth Observations
