hurlbertlab/caterpillars-count-data

caterpillars-count-data

This repository holds both the raw and the cleaned data for the Caterpillars Count! project.

As of 2022, the intended workflow for integrating newly submitted data is as follows:

  • Update the raw data files
    -- Source update_catcount_data.R and then run updateCatCountData().
    -- This will grab the most recent table versions from https://caterpillarscount.unc.edu/backups and replace the older versions.

  • Plant taxonomy and status
    -- The plantSpecies folder holds files including our officialPlantList, which translates all user-entered plant species names into standardized taxonomic concepts using ITIS (see cleaning_plant_names.r).
    -- We also have a workflow for inferring the plant species when the Site Manager never specified it, based on user-entered names and/or arthropod photos that reveal the plant species (see IDforPlantsThatAreNotIdentified.r, which generates an inferredPlantNames file).
    -- Finally, we have a workflow for assigning native/alien status to plant species based on the USDA PLANTS Database (see plant_origin_status.r).
    -- Any use of Caterpillars Count! data that evaluates differences among tree species should account for these complexities.

  • Prepare any newly submitted data since the last update for cleaning
    -- Run dataCleaning/reading_and_cleaning_new_data.r
    -- This will create a file called flagged_dataset_YYYY-MM-DD.csv.
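    Because the flagged and cleaned files carry their date in the file name, the most recent one can be picked out by parsing that date. The following is a minimal Python sketch of that idea, not part of the repository (whose scripts are written in R). The function name latest_dated_file and the example file names are hypothetical.

```python
import re
from datetime import date


def latest_dated_file(names, prefix):
    """Return the name matching <prefix>_YYYY-MM-DD.csv with the latest date.

    Returns None when no name matches the pattern.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d{{4}}-\d{{2}}-\d{{2}})\.csv$")
    dated = []
    for name in names:
        match = pattern.match(name)
        if not match:
            continue
        try:
            file_date = date.fromisoformat(match.group(1))
        except ValueError:
            # Skip names whose date part is not a real calendar date.
            continue
        dated.append((file_date, name))
    return max(dated)[1] if dated else None


# Hypothetical file names, for illustration only.
example_names = [
    "flagged_dataset_2021-12-01.csv",
    "flagged_dataset_2022-03-15.csv",
    "notes.txt",
]
newest = latest_dated_file(example_names, "flagged_dataset")
```

    The same helper works for cleaned_dataset_YYYY-MM-DD.csv files by passing "cleaned_dataset" as the prefix.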

  • Manually clean records
    -- Manually check any records in this file for which status is not "ok". The flags field indicates which information was identified as requiring checking. For example, a value of "ants numLeaves rareArthDiv" indicates that 1) either the number or length of ants, 2) the number of leaves, and 3) the diversity of rare arthropod groups were all unusual.
    -- If the error and its appropriate fix can be inferred, then 1) modify the necessary value(s), 2) describe what was done in the actionTaken column, and 3) change the status to "ok". For example, if a 'daddylonglegs' was given a length of 30 mm because the user was clearly including leg length in the estimate, change the length to 5 (the median length of a daddylonglegs in the dataset).
    -- If there is a clear error in either the number or length of leaves, but no obvious solution, then change the status for every record pertaining to this Survey ID to "bad leaves".
    -- If there is a clear error in the arthropod quantity for a single arthropod group but no obvious solution, then change the status to "bad quantity".
    -- If there is a clear error in the arthropod length for a single arthropod group but no obvious solution, then change the status to "bad length".
    -- If there is a clear error that pertains to the entire survey, e.g. the total abundance or diversity of arthropods, then change the status for every record pertaining to this Survey ID to "remove".
    -- Finally, if on examination the flagged values are judged to be plausible and suitable for inclusion in an analysis, change the status to "ok" and enter "none" in the actionTaken column.
    -- This cleaning step is done when there are no longer any records with a status of "check".
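    The bookkeeping around these manual checks can be sketched in a few lines. The example below is an illustrative Python sketch under stated assumptions, not the repository's own code (its scripts are in R). The status, flags and actionTaken columns come from the description above. The SurveyID column name, the helper names and the sample rows are assumptions made up for illustration.

```python
import csv
import io

# Made-up sample rows. "SurveyID" is an assumed column name.
SAMPLE = """SurveyID,flags,status,actionTaken
101,ants numLeaves rareArthDiv,check,
101,,ok,
102,numLeaves,check,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))


def parse_flags(flags):
    """Split a space-separated flags value into individual flag names."""
    return flags.split()


def records_needing_review(rows):
    """Return the records whose status is not "ok"."""
    return [row for row in rows if row["status"] != "ok"]


def set_survey_status(rows, survey_id, status, id_column="SurveyID"):
    """Apply one status to every record that belongs to a survey."""
    for row in rows:
        if row[id_column] == survey_id:
            row["status"] = status


def cleaning_is_done(rows):
    """Cleaning is finished once no record still has the status "check"."""
    return all(row["status"] != "check" for row in rows)


pending = records_needing_review(rows)
first_flags = parse_flags(pending[0]["flags"])
```

    Survey-wide statuses such as "bad leaves" or "remove" would be applied with set_survey_status, and cleaning_is_done gives a quick check that nothing is left to review.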

  • Add new records to cleaned data file
    -- Append the file of newly cleaned records to the bottom of the most recent file named cleaned_dataset_YYYY-MM-DD.csv.
    -- This is the file that should be used for any analyses.
    -- Depending on the analysis, the user should filter out any undesirable records based on the status column.
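    Filtering on the status column might look like the sketch below. It is written in Python for illustration only and is not part of the repository. The status values are the ones listed in the cleaning step. The SurveyID column name, the function name drop_statuses and the sample records are assumptions, and which statuses to exclude is a decision for each analysis.

```python
def drop_statuses(rows, unwanted):
    """Return the records whose status is not in the unwanted set."""
    unwanted = set(unwanted)
    return [row for row in rows if row["status"] not in unwanted]


# Made-up records. "SurveyID" is an assumed column name.
cleaned = [
    {"SurveyID": "201", "status": "ok"},
    {"SurveyID": "202", "status": "bad length"},
    {"SurveyID": "203", "status": "remove"},
    {"SurveyID": "204", "status": "bad leaves"},
]

# Strict: keep only records with no known problem.
strict = drop_statuses(
    cleaned, {"remove", "bad leaves", "bad quantity", "bad length"}
)

# Lenient example: an analysis that does not use arthropod length or leaf
# data might choose to keep "bad length" and "bad leaves" records.
lenient = drop_statuses(cleaned, {"remove", "bad quantity"})
```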

...STILL IN PROGRESS...
