This table is used to add extra mappings from the resource column headers to our specification field names, for example mapping a field named `UID` in a resource to our `reference` field.
+Used to map column headers in an endpoint or resource to specification field names. Unlike `transform.csv`, which handles spec-level renames globally, `column.csv` is typically used to handle inconsistent or non-standard column naming in specific endpoints. Leaving `resource` and `endpoint` blank applies the mapping globally.
+
+> _Example_
+> Mapping a field named `UID` in an endpoint to our `reference` field

 Important fields:

 - `column`\- the column header in the resource being mapped
 - `field`\- the field name in our specification the column header should be mapped to
+- `endpoint`\- (optional) limit the mapping to a specific endpoint
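The scoping rule described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's implementation: the function name `resolve_columns`, the exact-match comparison of headers, and the rule that a scoped row takes precedence over a global one are all assumptions made for the example, and `abc123` is a made-up endpoint identifier.

```python
def resolve_columns(headers, mappings, endpoint="", resource=""):
    """Return {header: field} for one resource (hypothetical helper).

    Assumption: a row with a blank endpoint/resource applies everywhere,
    and a more specifically scoped row wins over a global one.
    """
    resolved = {}
    for header in headers:
        best_score = -1
        for row in mappings:
            if row["column"] != header:
                continue
            # A non-blank scope must match the resource being processed.
            if row.get("endpoint") and row["endpoint"] != endpoint:
                continue
            if row.get("resource") and row["resource"] != resource:
                continue
            score = bool(row.get("endpoint")) + bool(row.get("resource"))
            if score > best_score:
                best_score = score
                resolved[header] = row["field"]
    return resolved


mappings = [
    # Global mapping: endpoint and resource left blank.
    {"endpoint": "", "resource": "", "column": "UID", "field": "reference"},
    # Mapping limited to a single (made-up) endpoint.
    {"endpoint": "abc123", "resource": "", "column": "SiteName", "field": "name"},
]

print(resolve_columns(["UID", "SiteName"], mappings, endpoint="abc123"))
```

For any other endpoint, only the global `UID` mapping would apply and `SiteName` would be left unmapped.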
Used to combine values across multiple rows. The grouping is based on the reference field so this only works when there are multiple rows per reference (note this happens after concat so concat can be used to create a reference from multiple fields and control the grouping to some extent)
+Used to merge field values across multiple facts for the same entity. This runs later in the pipeline than other configuration (after entity resolution), so it operates on facts rather than raw rows.
+
+For geometry fields, values are merged into a single `Multipolygon` using a spatial union rather than string joining. For all other fields, values are deduplicated, sorted and joined using the specified separator.

 > _Example_
 > In the `agricultural-land-classification` collection the `geometry` field of the Natural England data is grouped by the reference, resulting in individual polygons being grouped into a multipolygon or geometry collection.
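For non-geometry fields, the deduplicate, sort and join behaviour could look something like the sketch below. The helper name `combine_values` and the decision to drop empty values are assumptions for illustration; the geometry case (spatial union) is not covered here.

```python
def combine_values(values, separator):
    """Deduplicate, sort and join the values recorded for one entity and field.

    Hypothetical helper: empty values are ignored, which is an assumption.
    """
    unique = sorted({v.strip() for v in values if v and v.strip()})
    return separator.join(unique)


# Several facts for the same entity and field, including a duplicate and a blank.
fact_values = ["Grade 2", "Grade 1", "Grade 2", ""]
print(combine_values(fact_values, ";"))  # Grade 1;Grade 2
```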
_Not currently in active use. This file was intended to configure conversion behaviour for specific resources but the functionality is handled automatically by the pipeline. The only existing configuration is in [brownfield-land](https://github.com/digital-land/config/blob/main/pipeline/brownfield-land/convert.csv?plain=1) and contains no active parameters._
Used to populate an empty field by copying the value from another field in the same row. Only applies when the target field has no value - existing values are never overwritten. This is different to `default-value.csv` which sets a hardcoded value rather than copying from another field.
+
+> _Example_
+> If `start-date` should default to the value of `actual-date` when not provided, add a row mapping `start-date` → `actual-date`.
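As a rough sketch of the copy-when-empty behaviour described above: the function name `apply_default_fields` and the representation of the configuration as a `{target: source}` dictionary are assumptions for the example, not the pipeline's own structures.

```python
def apply_default_fields(row, defaults):
    """Copy a value from a source field into an empty target field.

    `defaults` maps target field -> source field (hypothetical structure).
    Existing target values are never overwritten.
    """
    out = dict(row)
    for target, source in defaults.items():
        if not out.get(target) and out.get(source):
            out[target] = out[source]
    return out


defaults = {"start-date": "actual-date"}

empty_target = {"reference": "A1", "start-date": "", "actual-date": "2020-05-01"}
filled_target = {"reference": "A2", "start-date": "2019-01-01", "actual-date": "2020-05-01"}

print(apply_default_fields(empty_target, defaults)["start-date"])   # 2020-05-01
print(apply_default_fields(filled_target, defaults)["start-date"])  # 2019-01-01
```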
Used to set a default value for all values in a field
+Used to set a hardcoded default value for a field when it is empty. Unlike `default.csv`, which copies a value from another field, this sets a fixed literal value.

 > _Example_
-> Set the value of `flood-risk-level` to 2 for all values from an endpoint in the `flood-risk-zone`, because the data is provided split into a different endpoint per flood risk level but each resource doesn’t record the level explicitly in a field.
+> Set the value of `flood-risk-level` to `2` for all values from an endpoint in the `flood-risk-zone`, because the data is provided split into a different endpoint per flood risk level but each resource doesn’t record the level explicitly in a field.
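The flood risk example might be sketched as below. The rule layout (`endpoint`, `field`, `value`), the function name `apply_default_values`, and the endpoint identifier `zone-2-endpoint` are hypothetical and only serve to illustrate a literal default scoped to one endpoint.

```python
def apply_default_values(row, endpoint, rules):
    """Fill empty fields with a fixed literal value (hypothetical helper).

    Assumption: a rule with a blank endpoint applies to every endpoint.
    """
    out = dict(row)
    for rule in rules:
        if rule.get("endpoint") and rule["endpoint"] != endpoint:
            continue
        if not out.get(rule["field"]):
            out[rule["field"]] = rule["value"]
    return out


rules = [{"endpoint": "zone-2-endpoint", "field": "flood-risk-level", "value": "2"}]
row = {"reference": "FZ-001", "flood-risk-level": ""}

print(apply_default_values(row, "zone-2-endpoint", rules))
print(apply_default_values(row, "zone-3-endpoint", rules))
```

Only the first call fills in `flood-risk-level`, because the rule is limited to that endpoint.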
This configuration file is used to assign the organisation responsible for managing an entity or range of entities. For any entities within the dataset and entity range given, facts from the assigned organisation will be prioritised over facts from any other organisation. In practice this means when we have multiple sources of data for a single entity, the organisation can be kept as the authoritative organisation by setting the entity-organisation in this file.
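One way to picture the prioritisation is the sketch below. The column names `entity-minimum` and `entity-maximum`, the organisation identifiers, and the fallback to the first available fact are assumptions for illustration; the actual selection logic in the pipeline may differ.

```python
def authoritative_organisation(entity, ranges):
    """Return the organisation assigned to an entity number, if any."""
    for r in ranges:
        if r["entity-minimum"] <= entity <= r["entity-maximum"]:
            return r["organisation"]
    return None


def pick_fact(entity, facts, ranges):
    """Prefer a fact from the assigned organisation, else fall back to the first fact."""
    preferred = authoritative_organisation(entity, ranges)
    for fact in facts:
        if fact["organisation"] == preferred:
            return fact
    return facts[0] if facts else None


# Hypothetical entity range and organisation identifiers.
ranges = [{"entity-minimum": 1000, "entity-maximum": 1999, "organisation": "org:A"}]
facts = [
    {"organisation": "org:B", "value": "Old name"},
    {"organisation": "org:A", "value": "Current name"},
]

print(pick_fact(1500, facts, ranges)["value"])  # Current name
print(pick_fact(2500, facts, ranges)["value"])  # Old name
```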
@@ -225,8 +235,17 @@ Sometimes, the raw data contains extraneous lines that can cause issues during p

 Important fields:

-- `pattern` - the pattern to search for in the raw endpoint file
+- `pattern`\- the pattern to search for in the raw endpoint file
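A minimal sketch of skipping extraneous lines by pattern is shown below. It assumes the patterns are regular expressions searched anywhere in each raw line, which the text above does not confirm; `skip_lines` is a hypothetical helper name.

```python
import re


def skip_lines(lines, patterns):
    """Drop raw lines that match any configured pattern.

    Assumption: each pattern is treated as a regular expression.
    """
    compiled = [re.compile(p) for p in patterns]
    return [line for line in lines if not any(c.search(line) for c in compiled)]


raw = ["Report generated 2024-01-01", "reference,name", "A1,Site one"]
print(skip_lines(raw, [r"^Report generated"]))  # ['reference,name', 'A1,Site one']
```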
Used to rename fields to match the latest specification. Maps old field names to their current replacements, applied globally across all resources. Use this when a field has been renamed in the specification and you need existing data to continue flowing through correctly.
+
+> _Example_
+> The brownfield-land specification changed from using fields like `OrganisationURI` and `SiteNameAddress` to `organisation` and `site-address`. These changes were added to the relevant `transform.csv` to accommodate this specification change.
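The rename in this example could be sketched as a simple key substitution. The helper name `rename_fields` and the dictionary form of the rename table are assumptions; this sketch also does not handle the case where a row contains both the old and the new field name.

```python
def rename_fields(row, renames):
    """Rename old field names to their current replacements, leaving others untouched."""
    return {renames.get(name, name): value for name, value in row.items()}


# Old field name -> current field name, taken from the example above.
renames = {"OrganisationURI": "organisation", "SiteNameAddress": "site-address"}

# Illustrative row with made-up values.
old_row = {"OrganisationURI": "example-organisation", "SiteNameAddress": "1 High Street", "notes": ""}
print(rename_fields(old_row, renames))
```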