Carob creates reproducible workflows that standardize primary agricultural research data from experiments and surveys. Standardization includes the use of a common file format, variable names, units and accepted values according to the terminag standard. Standardized data sets are aggregated into larger collections that can be used in further research. We do this by writing an R script for each individual dataset. See the website for more information.
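To make the idea of standardization concrete, here is a minimal base-R sketch of the kinds of steps such a script performs: renaming variables, converting units, checking values against an accepted list, and writing a common file format. It is not a Carob script and does not use the carobiner API. The input data, the variable names `crop` and `yield`, the kg/ha unit and the list of accepted crops are all hypothetical stand-ins, not the actual terminag definitions.

```r
# Hypothetical raw dataset as it might arrive from an experiment:
# non-standard column names, yield in kg per plot, inconsistent crop names.
raw <- data.frame(
  Crop = c("Maize", "maize ", "Sorghum"),
  Yield_kg_plot = c(4.5, 5.1, 3.2),
  plot_m2 = c(20, 20, 25),
  stringsAsFactors = FALSE
)

# Hypothetical standardization step (illustrative names and units only)
standardize <- function(raw) {
  d <- data.frame(
    # standard variable name, lower-case values without stray whitespace
    crop = tolower(trimws(raw$Crop)),
    # convert kg per plot to kg per hectare (1 ha = 10000 m2)
    yield = raw$Yield_kg_plot / raw$plot_m2 * 10000,
    stringsAsFactors = FALSE
  )
  # check values against a hypothetical list of accepted crop names
  accepted_crops <- c("maize", "sorghum", "wheat")
  bad <- setdiff(unique(d$crop), accepted_crops)
  if (length(bad) > 0) {
    stop("unaccepted crop values: ", paste(bad, collapse = ", "))
  }
  d
}

d <- standardize(raw)

# write the result in a common file format (CSV) to a temporary file
out <- tempfile(fileext = ".csv")
write.csv(d, out, row.names = FALSE)
```

Failing loudly on unaccepted values, rather than silently passing them through, keeps errors in the source data visible before the data set is aggregated with others.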
Carob is an open-access Extract, Transform, and Load (ETL) framework, supported by CGIAR, for predictive analytics (machine learning, artificial intelligence) and other types of data analysis.
Contributions are welcome from anyone and can be made via pull requests. Feel free to improve these scripts or provide new ones; see the instructions on how to write a Carob script described here. You can also raise an issue. Good places to discover new data sets are the Gardian website and our to-do list.
Standardized data can be downloaded from carob-data.org (only data with a CC license) or with the R package caramba.
You can also compile your own version by cloning this repo and running the following in R, where path is the folder of the cloned repo (e.g. "d:/github/carob"):
remotes::install_github("carob-data/carobiner")
path <- "d:/github/carob"
ff <- carobiner::make_carob(path)
