
Reorder CSVs prior to S3 upload #2131

@Ben-Hodgkiss

Description

Overview
Because the switch to the Config Manager means we now generally append data to the bottom of each file, we should create a new GitHub Action that sorts this data on every commit to main. Sorted files will also make debugging in GitHub easier, as it will be quicker to find data visually.

Each collection/pipeline file will need a bespoke ordering; these are listed below (or will be). We should also ensure that the CSVs for each file type share the same header order (i.e. the column CSV for Dataset A should not have a different column order from the column CSV for Dataset B). We may need to check with @pooleycodes before reordering the headers, as this may affect how new lines are generated by the Add Data process. The default ordering for new collections is set in create_collection.py, but any existing files would have to be changed manually.
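As a rough sketch of the header-order requirement, the Python helper below rewrites a CSV so that its columns follow a canonical list. The function name `reorder_columns` is hypothetical, and the canonical header is assumed to be supplied by the caller (it could mirror the defaults in create_collection.py, but this sketch does not read that file). Columns not in the canonical list are kept, appended after the canonical ones in their original order.

```python
import csv
import io


def reorder_columns(csv_text: str, canonical_header: list[str]) -> str:
    """Rewrite csv_text so its columns follow canonical_header.

    Hypothetical helper: canonical_header is passed in by the caller.
    Canonical columns absent from the file are not added; columns present
    in the file but not in canonical_header are kept at the end.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    existing = reader.fieldnames or []
    # Canonical columns that exist in this file, in canonical order...
    header = [col for col in canonical_header if col in existing]
    # ...followed by any extra columns, in their original order.
    header += [col for col in existing if col not in header]

    out = io.StringIO()
    # Assumption: LF line endings are acceptable in the repository.
    writer = csv.DictWriter(out, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)  # DictWriter maps each value to its new position
    return out.getvalue()
```

Running this over every column CSV with the same canonical list would give Dataset A and Dataset B identical header orders without touching the row data.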

  • collection/endpoint.csv - Sort by entry-date (asc), then endpoint (asc)
  • collection/old-resource.csv - No sorting necessary
  • collection/source.csv - Sort by entry-date (asc), then endpoint (asc)
  • pipeline/column.csv - Sort by dataset, then endpoint, then resource, then field
  • pipeline/combine.csv - Sort by dataset, then endpoint, then field
  • pipeline/concat.csv - Sort by dataset, then endpoint, then resource, then field
  • pipeline/convert.csv - No sorting necessary; may be deleted following Convert.CSV research #2326
  • pipeline/default-value.csv - Sort by dataset, then field
  • pipeline/default.csv - Sort by dataset, then field, then default-field
  • pipeline/entity-organisation.csv - Sort by dataset, then organisation, then entity-minimum
  • pipeline/expect.csv - Sort by dataset, then operation, then organisations
  • pipeline/filter.csv - Sort by dataset, then endpoint, then field
  • pipeline/lookup.csv - Sort by prefix, then entity
  • pipeline/old-entity.csv - Sort by old-entity
  • pipeline/patch.csv - Sort by dataset, then endpoint, then field
  • pipeline/skip.csv - Sort by dataset, then endpoint, then pattern
  • pipeline/transform.csv - Sort by dataset, then replacement-field
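A minimal Python sketch of the sorting step such a GitHub Action could run, encoding the orderings listed above. It assumes the files live under `collection/` and `pipeline/` relative to the repository root, that `entry-date` values are ISO 8601 strings (so plain string comparison sorts them chronologically), that all sorts are ascending, and that LF line endings are acceptable. The names `SORT_KEYS`, `NUMERIC_COLUMNS`, `sort_csv` and `sort_repository` are hypothetical, and treating `entity`, `entity-minimum` and `old-entity` as integers is an assumption.

```python
import csv
import io
from pathlib import Path
from typing import Optional

# Sort columns per file, taken from the list in this issue.
# None means "no sorting necessary".
SORT_KEYS: dict[str, Optional[list[str]]] = {
    "collection/endpoint.csv": ["entry-date", "endpoint"],
    "collection/old-resource.csv": None,
    "collection/source.csv": ["entry-date", "endpoint"],
    "pipeline/column.csv": ["dataset", "endpoint", "resource", "field"],
    "pipeline/combine.csv": ["dataset", "endpoint", "field"],
    "pipeline/concat.csv": ["dataset", "endpoint", "resource", "field"],
    "pipeline/convert.csv": None,
    "pipeline/default-value.csv": ["dataset", "field"],
    "pipeline/default.csv": ["dataset", "field", "default-field"],
    "pipeline/entity-organisation.csv": ["dataset", "organisation", "entity-minimum"],
    "pipeline/expect.csv": ["dataset", "operation", "organisations"],
    "pipeline/filter.csv": ["dataset", "endpoint", "field"],
    "pipeline/lookup.csv": ["prefix", "entity"],
    "pipeline/old-entity.csv": ["old-entity"],
    "pipeline/patch.csv": ["dataset", "endpoint", "field"],
    "pipeline/skip.csv": ["dataset", "endpoint", "pattern"],
    "pipeline/transform.csv": ["dataset", "replacement-field"],
}

# Assumption: these columns hold integers, so "9" should sort before "10".
NUMERIC_COLUMNS = {"entity", "entity-minimum", "old-entity"}


def _key_part(column: str, value: str) -> tuple:
    """Build a comparable key for one cell without mixing int and str."""
    if column in NUMERIC_COLUMNS and value.isdecimal():
        return (0, int(value), "")
    return (1, 0, value)


def sort_csv(csv_text: str, sort_columns: Optional[list[str]]) -> str:
    """Return csv_text with data rows sorted by sort_columns (all ascending)."""
    if not sort_columns:
        return csv_text  # leave unsorted files byte-for-byte unchanged
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    rows = list(reader)
    # Missing cells are treated as empty strings.
    rows.sort(key=lambda row: tuple(
        _key_part(col, row.get(col) or "") for col in sort_columns))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def sort_repository(root: Path) -> None:
    """Sort every listed CSV under root in place, skipping missing files."""
    for relative_path, columns in SORT_KEYS.items():
        path = root / relative_path
        if not columns or not path.exists():
            continue
        with open(path, newline="", encoding="utf-8") as f:
            text = f.read()
        with open(path, "w", newline="", encoding="utf-8") as f:
            f.write(sort_csv(text, columns))
```

Python's `list.sort` is stable, so rows with identical sort keys keep their existing relative order, which keeps the diff produced by the Action small. A workflow step could call `sort_repository(Path("."))` on main and commit any resulting changes.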

Metadata

Assignees

Labels

No labels

Type

No type

Projects

Status

Done - Consider for Weeknotes

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests