bergstand/aarc-metadata

The Python 3 script validate-AaRC-metadata.py validates entries in the AaRC metadata spreadsheet. Download the Google spreadsheet as an Excel (.xlsx) file and pass that file to the script. Empty cells are currently not reported as invalid.
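To illustrate what "empty cells are not reported" means in practice, here is a minimal sketch. It is not taken from the script: the helper names is_empty and find_invalid, and the digits-only rule, are hypothetical and only show the idea of skipping blank cells before applying a check.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def is_empty(cell: Optional[object]) -> bool:
    """Treat None and whitespace-only strings as empty cells (an assumption)."""
    return cell is None or (isinstance(cell, str) and cell.strip() == "")


def find_invalid(
    values: Iterable[Optional[object]],
    is_valid: Callable[[object], bool],
) -> List[Tuple[int, object]]:
    """Return (row_index, value) pairs that fail is_valid.

    Empty cells are skipped, so they never show up as errors.
    """
    errors = []
    for row, value in enumerate(values):
        if is_empty(value):
            continue  # blank cell: neither valid nor invalid
        if not is_valid(value):
            errors.append((row, value))
    return errors


# Hypothetical rule: the column must contain digits only.
column = ["9615", "", None, "wolf", "  "]
print(find_invalid(column, lambda v: str(v).isdigit()))
```

With this approach, a column that is entirely blank passes silently, so missing mandatory values would have to be checked separately.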

By default, the script simply prints the errors it identifies, but with optional flags these can be written to text files or to a multi-sheet Excel file.

usage: validate-AaRC-metadata.py [-h] [--sheets SHEETS] [--skip-urls] [--fields FIELDS] [--txt-reports TXT_REPORTS]
                                 [--xlsx-reports XLSX_REPORTS]
                                 excel_file

Validate metadata in an Excel file against 'field_definitions' sheet.

positional arguments:
  excel_file            Path to the Excel file to validate (e.g., metadata.xlsx).

optional arguments:
  -h, --help            show this help message and exit
  --sheets SHEETS       Optional: Comma-separated list of sheet names to validate (e.g., --sheets canids,capra).
  --skip-urls           Skip external URL and NCBI TaxID validation checks.
  --fields FIELDS       Optional: Comma-separated list of column names to validate, e.g., --fields samp_taxon_ID,sample_age.
  --txt-reports TXT_REPORTS
                        Optional: Prefix for writing tab-delimited reports to files (e.g., 'errors'). Output files will be named <PREFIX>.<SHEET_NAME>.txt
  --xlsx-reports XLSX_REPORTS
                        Optional: Prefix for writing a single consolidated Excel report (e.g., 'xlsx_errors'). The output file will be named <PREFIX>.xlsx
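As a rough sketch of how the options above combine, the following helper assembles a command line and lists the report file names implied by the help text. The functions build_command and expected_reports are hypothetical conveniences, not part of the repository. The sketch uses only the flags documented above, and it assumes that one text report is written per validated sheet, which the help text does not confirm.

```python
import sys
from typing import List, Optional, Sequence

SCRIPT = "validate-AaRC-metadata.py"  # script name as given in the usage text


def build_command(
    excel_file: str,
    sheets: Optional[Sequence[str]] = None,
    fields: Optional[Sequence[str]] = None,
    skip_urls: bool = False,
    txt_reports: Optional[str] = None,
    xlsx_reports: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list using only the flags shown in the usage text."""
    cmd = [sys.executable, SCRIPT]
    if sheets:
        cmd += ["--sheets", ",".join(sheets)]  # comma-separated sheet names
    if skip_urls:
        cmd.append("--skip-urls")
    if fields:
        cmd += ["--fields", ",".join(fields)]  # comma-separated column names
    if txt_reports:
        cmd += ["--txt-reports", txt_reports]
    if xlsx_reports:
        cmd += ["--xlsx-reports", xlsx_reports]
    cmd.append(excel_file)  # positional argument goes last
    return cmd


def expected_reports(
    sheets: Sequence[str],
    txt_reports: Optional[str] = None,
    xlsx_reports: Optional[str] = None,
) -> List[str]:
    """File names following <PREFIX>.<SHEET_NAME>.txt and <PREFIX>.xlsx."""
    names = []
    if txt_reports:
        # Assumption: one text report per sheet that was validated.
        names += [f"{txt_reports}.{sheet}.txt" for sheet in sheets]
    if xlsx_reports:
        names.append(f"{xlsx_reports}.xlsx")
    return names


cmd = build_command(
    "metadata.xlsx",
    sheets=["canids", "capra"],
    skip_urls=True,
    txt_reports="errors",
)
print(" ".join(cmd[1:]))
print(expected_reports(["canids", "capra"], txt_reports="errors"))
```

To actually run the validation, the list returned by build_command can be passed to subprocess.run(cmd, check=True) from the directory that contains the script.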

About

Scripts to support the AaRC metadata curation project
