Basic operations to support minimal data-quality assessment, make day-to-day work more livable, and improve the ease and effectiveness of data-science SWAT-team deployments, all in the context of large tabular data:
- path normalization for interop between environments (classify path format by OS and translate to native format)
- data type detection: nominal, numeric, date, geo
- date detection and format validation
- matching data dictionaries against files
- data dictionary normalization, plus recovery of multiline cells
- metadata: field-name and description search, with support for fuzzy matching
- semantic matching
- autodetection and application of human-readable lookups present in other tables
- flat-file parsing (all sets)
- dataset identification and integration
- redundant-record detection (large data)
- lossless data compression
- windowing for multitemporal analysis
- low-memory (large-data) sorting, including but not limited to sorting by date
- no requirement for a specific install location
- let users select data versions
- parse and filter the largest files without being limited by available RAM
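For the path-normalization item, a minimal sketch using only the standard library's pure path classes. `classify_path`, `translate_path`, and `to_native` are hypothetical names, and mapping drive letters to `/mnt/<letter>` (WSL style) is an assumption; other environments (Git Bash, Samba mounts) would need a different mapping.

```python
import os
import re
from pathlib import PurePosixPath, PureWindowsPath

# A drive-letter prefix such as "C:\" or "C:/".
_DRIVE = re.compile(r"^[A-Za-z]:[\\/]")


def classify_path(path: str) -> str:
    """Guess whether a path string was written in Windows or POSIX form."""
    if _DRIVE.match(path) or path.startswith("\\\\"):
        return "windows"
    if "\\" in path and "/" not in path:
        return "windows"
    return "posix"


def translate_path(path: str, target: str, drive_mount: str = "/mnt") -> str:
    """Translate `path` to the `target` convention ("windows" or "posix").

    Assumption: drive letters map to <drive_mount>/<letter> on POSIX (WSL style).
    """
    source = classify_path(path)
    if source == target:
        return path
    if source == "windows":
        if path.startswith("\\\\"):
            raise ValueError("UNC paths have no portable POSIX equivalent")
        win = PureWindowsPath(path)
        # Drop the anchor (drive and/or root) and keep the remaining components.
        rest = win.parts[1:] if win.anchor else win.parts
        if win.drive:
            return str(PurePosixPath(drive_mount, win.drive[0].lower(), *rest))
        if win.root:
            return str(PurePosixPath("/", *rest))
        return str(PurePosixPath(*rest))
    posix = PurePosixPath(path)
    try:
        rest = posix.relative_to(drive_mount).parts
    except ValueError:
        rest = ()
    # "/mnt/c/..." becomes "C:\..."; any other POSIX path keeps its components.
    if rest and len(rest[0]) == 1 and rest[0].isalpha():
        return str(PureWindowsPath(f"{rest[0].upper()}:\\", *rest[1:]))
    return str(PureWindowsPath(*posix.parts))


def to_native(path: str) -> str:
    """Translate a path into the convention of the OS running this code."""
    return translate_path(path, "windows" if os.name == "nt" else "posix")
```

On Linux, `to_native("C:\\data\\sales.csv")` would give `/mnt/c/data/sales.csv`; on Windows the reverse translation applies.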
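Data type detection and date-format validation could start from simple per-column heuristics: a column is assigned the first type that at least a threshold share of its non-empty values satisfies. This is only a sketch. The candidate date formats, the `"lat,lon"` encoding for geo values, the 0.95 threshold, and all function names are assumptions rather than anything the issue specifies.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional, Sequence, Tuple

# Assumed candidate formats; extend for the data at hand.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%Y-%m-%d %H:%M:%S")
# Assumed geo encoding: a "lat,lon" pair such as "40.71,-74.01".
_LATLON = re.compile(r"^(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)$")


def is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def is_geo(value: str) -> bool:
    m = _LATLON.match(value)
    return bool(m) and abs(float(m.group(1))) <= 90 and abs(float(m.group(2))) <= 180


def parses_as(value: str, fmt: str) -> bool:
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def detect_date_format(
    values: Sequence[str], formats: Sequence[str] = DATE_FORMATS
) -> Tuple[Optional[str], int]:
    """Return the format that parses the most values (first wins on ties) and its hit count."""
    best, best_hits = None, 0
    for fmt in formats:
        hits = sum(parses_as(v, fmt) for v in values)
        if hits > best_hits:
            best, best_hits = fmt, hits
    return best, best_hits


def invalid_dates(values: Iterable[str], fmt: str) -> List[str]:
    """Values that do not match `fmt`, including impossible dates such as 2024-02-30."""
    return [v for v in values if not parses_as(v.strip(), fmt)]


def infer_type(values: Iterable[str], threshold: float = 0.95) -> str:
    """Classify a column as 'geo', 'date', 'numeric' or 'nominal'."""
    vals = [v.strip() for v in values if v and v.strip()]
    if not vals:
        return "nominal"
    n = len(vals)
    if sum(map(is_geo, vals)) / n >= threshold:
        return "geo"
    fmt, hits = detect_date_format(vals)
    if fmt is not None and hits / n >= threshold:
        return "date"
    if sum(map(is_number, vals)) / n >= threshold:
        return "numeric"
    return "nominal"
```

Ambiguous values such as `01/02/2024` match both day-first and month-first formats; the tie-break here simply prefers the earlier format in the list, so a real implementation may want to report ambiguity instead.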
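For matching a data dictionary against a file, one possible first pass compares the file's header row with the dictionary's field list and separates exact matches from names that differ only in case, whitespace, or underscores. `compare_header` and the normalization rule are hypothetical illustrations.

```python
from typing import Dict, List, Sequence, Tuple


def _norm(name: str) -> str:
    """Lowercase, treat underscores as spaces, and collapse whitespace."""
    return " ".join(name.strip().lower().replace("_", " ").split())


def compare_header(header: Sequence[str], dictionary_fields: Sequence[str]) -> Dict[str, list]:
    """Report dictionary fields missing from the file, extra file columns,
    and (file_column, dictionary_field) pairs that match only after normalization."""
    in_header, in_dict = set(header), set(dictionary_fields)
    missing = [f for f in dictionary_fields if f not in in_header]
    extra = [c for c in header if c not in in_dict]
    extra_by_norm = {_norm(c): c for c in extra}
    near: List[Tuple[str, str]] = [
        (extra_by_norm[_norm(f)], f) for f in missing if _norm(f) in extra_by_norm
    ]
    near_cols = {c for c, _ in near}
    near_fields = {f for _, f in near}
    return {
        "missing": [f for f in missing if f not in near_fields],
        "extra": [c for c in extra if c not in near_cols],
        "near": near,
    }
```

Near matches are reported separately rather than silently accepted, so a person can confirm them before the dictionary is applied.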
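Fuzzy search over field names and descriptions could be sketched with the standard library's `difflib.SequenceMatcher`. Here a field's score is the best similarity between the query and either its name or any single word of its description. The scoring rule, the 0.6 cut-off, and `search_metadata` are assumptions chosen for illustration; dedicated fuzzy-matching libraries or the semantic matching item would be natural upgrades.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def search_metadata(
    query: str, data_dict: Dict[str, str], min_score: float = 0.6, limit: int = 10
) -> List[str]:
    """Return field names ranked by fuzzy similarity of the query to the name or description words."""
    q = query.lower()
    scored = []
    for field, description in data_dict.items():
        name_score = SequenceMatcher(None, q, field.lower()).ratio()
        word_score = max(
            (SequenceMatcher(None, q, w).ratio() for w in description.lower().split()),
            default=0.0,
        )
        score = max(name_score, word_score)
        if score >= min_score:
            scored.append((score, field))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [field for _, field in scored[:limit]]
```

For example, with a dictionary `{"birth_date": "Date of birth of the patient", ...}`, a query for `birthdate` ranks `birth_date` first even though the spelling differs.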
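Redundant-record detection on large data can stream rows once and keep only a fixed-size digest per distinct record, instead of the records themselves. This sketch covers exact duplicates after trimming and lowercasing; the normalization rule, the 16-byte BLAKE2b digest, and `find_duplicates` are assumptions, and near-duplicate detection would need a different technique.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Optional, Sequence, Tuple


def _normalize(fields: Sequence[str]) -> bytes:
    """Trim and lowercase each field; join with a unit separator so ("ab", "c") != ("a", "bc")."""
    return "\x1f".join(f.strip().lower() for f in fields).encode("utf-8")


def find_duplicates(
    rows: Iterable[Sequence[str]],
    key_columns: Optional[Sequence[int]] = None,
) -> Iterator[Tuple[int, int]]:
    """Yield (duplicate_row_index, first_row_index) for rows whose normalized key repeats.

    Memory grows with the number of distinct keys (one 16-byte digest each),
    not with row width.
    """
    seen: Dict[bytes, int] = {}
    for i, row in enumerate(rows):
        fields = row if key_columns is None else [row[c] for c in key_columns]
        digest = hashlib.blake2b(_normalize(fields), digest_size=16).digest()
        if digest in seen:
            yield i, seen[digest]
        else:
            seen[digest] = i
```

Passing a `csv.reader` over an open file keeps the scan streaming. If even the digest table does not fit in memory, rows could first be partitioned into files by digest prefix and each partition checked separately.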
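Windowing for multitemporal analysis, in its simplest form, assigns time-sorted records to fixed, non-overlapping (tumbling) windows and holds only one window in memory at a time. `tumbling_windows` and its parameters are hypothetical; sliding or overlapping windows would be a variation on the same idea.

```python
from datetime import datetime, timedelta
from typing import Any, Iterable, Iterator, List, Optional, Tuple


def tumbling_windows(
    records: Iterable[Tuple[datetime, Any]],
    width: timedelta,
    origin: datetime,
) -> Iterator[Tuple[datetime, List[Any]]]:
    """Group time-sorted (timestamp, value) records into fixed windows of `width`.

    Yields (window_start, values); windows with no records are skipped.
    """
    current: Optional[int] = None
    bucket: List[Any] = []
    previous: Optional[datetime] = None
    for ts, value in records:
        if previous is not None and ts < previous:
            raise ValueError("records must be sorted by timestamp")
        previous = ts
        index = (ts - origin) // width  # integer window number
        if index != current and bucket:
            yield origin + current * width, bucket
            bucket = []
        current = index
        bucket.append(value)
    if bucket:
        yield origin + current * width, bucket
```

Because the input must be sorted by time, this pairs naturally with a low-memory sort by date.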
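Low-memory sorting of files larger than RAM is classically done with an external merge sort: sort fixed-size chunks in memory, write each as a temporary run, then stream a k-way merge of the runs with `heapq.merge`. The sketch below assumes a CSV with a header row and no blank lines. `external_sort_csv`, `date_key`, and the default chunk size are hypothetical choices.

```python
import csv
import heapq
import itertools
import os
import tempfile
from datetime import datetime
from typing import Any, Callable, List


def date_key(column: int, fmt: str = "%Y-%m-%d") -> Callable[[List[str]], datetime]:
    """Sort key that parses one CSV column as a date in the assumed format."""
    return lambda row: datetime.strptime(row[column], fmt)


def external_sort_csv(
    src: str, dst: str, key: Callable[[List[str]], Any], chunk_rows: int = 100_000
) -> None:
    """Sort a CSV by key(row) while holding at most `chunk_rows` rows in memory."""
    run_paths: List[str] = []
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            raise ValueError("empty input file")
        # Phase 1: write sorted runs to temporary files.
        while True:
            chunk = list(itertools.islice(reader, chunk_rows))
            if not chunk:
                break
            chunk.sort(key=key)
            fd, path = tempfile.mkstemp(suffix=".csv")
            with os.fdopen(fd, "w", newline="") as run:
                csv.writer(run).writerows(chunk)
            run_paths.append(path)
    # Phase 2: stream a k-way merge; only one row per run is in memory at a time.
    files = [open(p, newline="") for p in run_paths]
    try:
        merged = heapq.merge(*(csv.reader(fh) for fh in files), key=key)
        with open(dst, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(merged)
    finally:
        for fh in files:
            fh.close()
        for p in run_paths:
            os.remove(p)
```

Usage: `external_sort_csv("big.csv", "big_sorted.csv", key=date_key(3))` sorts by the fourth column. Each run keeps one file open during the merge, so very large inputs may need a bigger `chunk_rows` or a multi-pass merge to stay under the OS open-file limit.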