Skip to content

2. basic ops #13

@ashlinrichardson

Description

@ashlinrichardson

Basic operations to support minimal data quality assessment, make life more live-able, and increase the ease and effectiveness for data-science swat-team deployments, all in the large-tabular-data context

  • path normalization for interop between environments (classify path format by OS and translate to native format)
  • data type detect: nominal, numeric, date, geo
  • date detect and format validation
  • data dictionary vs file matching
  • data dict normalization plus recovery from multiline cells
  • metadata: fields search, description search, w support for fuzzy matching
  • semantic matching
  • autodetect and application of human-readable lookups present in other tables
  • flatfile parsing -- all sets
  • dataset identification and integration
  • redundant records detection -- large data
  • lossless data compression
  • windowing for multitemporal analysis
  • low memory (large data) sorting, incl. but not limited to: by date!
  • not require specific install location
  • allow people to select versions for data
  • parse and filter largest files bypassing RAM memory limitation restrictions

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions