This is the repository of both raw and cleaned data for the Caterpillars Count! project.
As of 2022, the intended workflow for integrating newly submitted data is as follows:
-
Update the raw data files
-- Source `update_catcount_data.R` and then run `updateCatCountData()`.
-- This will grab the most recent table versions from https://caterpillarscount.unc.edu/backups and replace the older versions.
-
Plant taxonomy and status
-- In the `plantSpecies` folder we keep files including our `officialPlantList`, which translates all user-inputted plant species names into standardized (using ITIS) taxonomic concepts (see `cleaning_plant_names.r`).
-- We also have a workflow for inferring the plant species when the Site Manager never specified it, using user-inputted names and/or arthropod photos that reveal the plant species (see `IDforPlantsThatAreNotIdentified.r`, which generates an `inferredPlantNames` file).
-- Finally, we have a workflow for assigning native/alien status to plant species based on the USDA PLANTS Database (see `plant_origin_status.r`).
-- Anyone using Caterpillars Count! data to evaluate differences among tree species should make sure they have dealt with these complexities.
-
Prepare any newly submitted data since the last update for cleaning
-- Run `dataCleaning/reading_and_cleaning_new_data.r`.
-- This will create a file called `flagged_dataset_YYYY-MM-DD.csv`.
-
Manually clean records
-- Manually check any records in this file for which `status` is not "ok". The `flags` field indicates which information was identified as requiring checking. For example, a value of "ants numLeaves rareArthDiv" indicates that 1) either the number or length of ants, 2) the number of leaves, and 3) the diversity of rare arthropod groups were all unusual.
-- If the error and its appropriate fix can be inferred, then 1) modify the necessary value(s), 2) describe what was done in the `actionTaken` column, and 3) change the `status` to "ok". For example, if a 'daddylonglegs' was given a length of 30 mm because the user was clearly including leg length in the estimate, change the length to 5 (the median length of a daddylonglegs in the dataset).
-- If there is a clear error in either the number or length of leaves, but no obvious solution, then change the `status` for every record pertaining to this Survey ID to "bad leaves".
-- If there is a clear error in the arthropod quantity for a single arthropod group but no obvious solution, then change the `status` to "bad quantity".
-- If there is a clear error in the arthropod length for a single arthropod group but no obvious solution, then change the `status` to "bad length".
-- If there is a clear error that pertains to the entire survey, e.g. the total abundance or diversity of arthropods, then change the `status` for every record pertaining to this Survey ID to "remove".
-- Finally, if upon examination, it is decided that these flagged values are still plausible and can be included in an analysis, change the `status` to "ok" and in the `actionTaken` column put "none".
-- This cleaning step is done when there are no longer any records with a `status` of "check".
-
Add new records to cleaned data file
-- Append the file of newly cleaned records to the bottom of the most recent file named `cleaned_dataset_YYYY-MM-DD.csv`.
-- This is the file that should be used for any analyses.
-- Depending on the analysis, the user should `filter` out any undesirable records based on the `status` column.
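
As a rough illustration of the last two steps, the sketch below appends newly cleaned records to an existing cleaned dataset and then subsets on `status`. It is not code from this repository: the helper names `appendCleaned` and `usableRecords`, the toy data frames, and the `ID` and `Group` columns are assumptions made for the example. Only the `status` column and its values come from the workflow described above.

```r
# Hypothetical helpers for illustration; not part of this repository's scripts.

# Append newly cleaned records to the bottom of the existing cleaned dataset.
# Stops if any new record is still awaiting manual review.
appendCleaned <- function(cleaned, newlyCleaned) {
  if ("check" %in% newlyCleaned$status) {
    stop("Some records still have status 'check'; finish manual cleaning first.")
  }
  rbind(cleaned, newlyCleaned)
}

# Keep only the records whose status is acceptable for a given analysis.
usableRecords <- function(dataset, okStatuses = "ok") {
  dataset[dataset$status %in% okStatuses, , drop = FALSE]
}

# Toy data; the ID and Group columns are illustrative assumptions.
cleaned <- data.frame(
  ID = 1:2,
  Group = c("caterpillar", "ant"),
  status = c("ok", "ok"),
  stringsAsFactors = FALSE
)
newlyCleaned <- data.frame(
  ID = 3:5,
  Group = c("daddylonglegs", "beetle", "spider"),
  status = c("ok", "bad length", "remove"),
  stringsAsFactors = FALSE
)

combined <- appendCleaned(cleaned, newlyCleaned)

# An analysis of counts might tolerate records flagged only for length,
# whereas an analysis of body size would keep "ok" records only.
forCounts <- usableRecords(combined, c("ok", "bad length"))
forSize <- usableRecords(combined)
```

With the real files, the two data frames would presumably come from `read.csv()` on the cleaned and flagged CSV files, and the combined result would be written back out with `write.csv()`. Which `status` values to keep is a judgment call that depends on the analysis.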
...STILL IN PROGRESS...