- (a) Select models to use
- (b) Convert EO data into descriptive, model-compatible features
- (c) Perform feature importance and hyperparameter tuning
- Analyze CSV output metrics
- Create graphics
This model was created to rely exclusively on inputs derived from remote sensing (RS) products, in order to assess whether they can produce useful results without extensive field campaigns. To date, this data has been sourced exclusively with Google Earth Engine (GEE) scripts; however, methods are also available for extracting data that is not hosted on GEE. The variables chosen in this example cover both meteorological influences on crop growth and metrics of ecological conditions related to crop growth. These include air temperature, precipitation, soil moisture, the evaporative stress index (ESI), and the normalized difference vegetation index (NDVI).
Copy the repository from this link: https://code.earthengine.google.com/?accept_repo=users/acgins/inputdata_keadm1 (not shared 5/27)
Needed files:
- Shapefile of regional boundaries (Administrative 1 County-level for Kenya) asset
- Shapefile of crop mask (maize for this model) asset
- CCI global LC Layers 2015-2020 assets
Package requirements:
- pandas
- numpy
- scikit-learn
- merf (pip install merf)
- xgboost (pip install xgboost)
- glob2 (pip install glob2)
Most code for this model can be executed as scripts with just a code editor; no IDE is required. However, the scripts have been organized into Jupyter notebooks for this repository and can be run on various local platforms, such as JupyterLab or Jupyter Notebook via Anaconda Navigator, or Visual Studio Code. Find more information about Jupyter notebooks here. If using one of these platforms, all Python packages needed can be organized within a conda environment. See x file for a list of some required package installation commands.
These notebooks can also be adapted to run on Google Colab.
- modvars_share.py
- Extracting each variable and concatenating multiple variables
- Transposing the data frame and adding yield and crop calendar info
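The two steps above might look roughly like the following in pandas. This is only a minimal sketch, not the code in modvars_share.py: the column names (`county`, `date`, `value`, `yield_t_ha`), the county names, and all values are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical per-variable extracts: one row per county and date.
ndvi = pd.DataFrame({
    "county": ["Nakuru", "Nakuru", "Kitui", "Kitui"],
    "date": pd.to_datetime(["2019-03-01", "2019-03-02"] * 2),
    "value": [0.41, 0.43, 0.28, 0.30],
})
precip = pd.DataFrame({
    "county": ["Nakuru", "Nakuru", "Kitui", "Kitui"],
    "date": pd.to_datetime(["2019-03-01", "2019-03-02"] * 2),
    "value": [5.0, 0.0, 1.2, 0.0],
})

# Concatenate the variables into one long table, tagging each with its name.
long = pd.concat(
    [ndvi.assign(variable="ndvi"), precip.assign(variable="precip")],
    ignore_index=True,
)
long["year"] = long["date"].dt.year

# Reshape so each (county, year) is one row and each variable/date pair is a column.
long["col"] = long["variable"] + "_" + long["date"].dt.strftime("%m%d")
wide = long.pivot_table(
    index=["county", "year"], columns="col", values="value"
).reset_index()
wide.columns.name = None

# Hypothetical yield table keyed by county and year.
yields = pd.DataFrame({
    "county": ["Nakuru", "Kitui"],
    "year": [2019, 2019],
    "yield_t_ha": [2.1, 0.9],
})
table = wide.merge(yields, on=["county", "year"], how="left")
print(table)
```

Crop calendar information (for example, planting and harvest months per county) could be joined in the same way, with another `merge` on the county key.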
Scripts or notebooks that run the models randomforestregressor, xgboost, or merf use functions from the machinelearn class defined in machinelearns6.py. Once the class is instantiated, the functions for each of these models can be run, with room to modify the hyperparameters and the training and testing data, which are supplied as dictionaries of NumPy arrays.
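The interface of the machinelearn class is not reproduced here, so the sketch below calls scikit-learn directly to illustrate the general idea of passing training and testing data as dictionaries of NumPy arrays alongside a dictionary of hyperparameters. The dictionary keys (`"X"`, `"y"`), the synthetic data, the held-out year, and the hyperparameter values are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: 60 county-year rows, 5 features (assumed shapes).
rng = np.random.default_rng(42)
n_rows, n_features = 60, 5
X = rng.normal(size=(n_rows, n_features))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=n_rows)
years = np.repeat(np.arange(2015, 2021), 10)

# Hold out one year for testing; store each split as a dict of arrays.
test_mask = years == 2020
train = {"X": X[~test_mask], "y": y[~test_mask]}
test = {"X": X[test_mask], "y": y[test_mask]}

# Hyperparameters kept in a dict so they are easy to modify between runs.
params = {"n_estimators": 200, "max_depth": 5, "random_state": 0}
model = RandomForestRegressor(**params).fit(train["X"], train["y"])

pred = model.predict(test["X"])
rmse = float(np.sqrt(np.mean((pred - test["y"]) ** 2)))
print(f"held-out RMSE: {rmse:.3f}")
print("feature importances:", model.feature_importances_.round(3))
```

Holding out a whole year, rather than random rows, is one possible design choice for checking how a model generalizes to an unseen season.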
Find candidate regression models, preferably ones that are implemented in Python. Refer to the literature.
The model generates one yield prediction for each year and unique administrative region (we use county level, Kenya Admin 1) for which all data is available. As the highest temporal resolution of any feature is daily, we need to find the best way to describe each distribution with one or more aggregation metrics. Code exists within machinelearn6 to convert the output from modvars_share.py into either (1) monthly averages of each variable or (2) decile percentiles for the growing season. Saving these features as new CSV files will help cut runtimes. Files to generate these two feature styles are included in featureengineering.
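As a rough illustration of the two feature styles, the sketch below aggregates a daily series into monthly means and growing-season deciles with pandas. It is not the code in featureengineering: the column names, the random NDVI values, and the March to August growing season are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series for one variable, two counties, one year.
dates = pd.date_range("2019-01-01", "2019-12-31", freq="D")
rng = np.random.default_rng(0)
daily = pd.DataFrame({
    "county": np.repeat(["Nakuru", "Kitui"], len(dates)),
    "date": np.tile(dates, 2),
    "ndvi": rng.uniform(0.1, 0.8, size=2 * len(dates)),
})
daily["year"] = daily["date"].dt.year
daily["month"] = daily["date"].dt.month

# (1) Monthly averages: one column per month for each county-year.
monthly = daily.pivot_table(
    index=["county", "year"], columns="month", values="ndvi", aggfunc="mean"
)
monthly.columns = [f"ndvi_m{int(m):02d}" for m in monthly.columns]

# (2) Deciles (10th to 90th percentile) over the assumed growing season.
season = daily[daily["month"].between(3, 8)]
levels = np.arange(1, 10) / 10
deciles = season.groupby(["county", "year"])["ndvi"].quantile(levels).unstack()
deciles.columns = [f"ndvi_p{int(round(q * 100)):02d}" for q in deciles.columns]

# One row per county-year, ready to be joined with yield data.
features = monthly.join(deciles).reset_index()
print(features.shape)
```

Calling `features.to_csv(...)` with a file name of your choice at this point would cache the result, so the aggregation does not need to be repeated on every model run.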
agroml_tv_run2xgb.py: This file is a script that can be used to run the models defined in machinelearns6.py.
Inputs by model:
RFR: (test, xy, t)
MERF: (train, test, xy, t)
XGBoost: (train, test, xy, t)
Crop Yield Model (with) Python Machine Learning (&) Earth Observations