Commit 1ecfa2f

authored
Update Configure-an-endpoint.md with clarifications
Clarified usage and examples for various pipeline configurations, including default values and entity organisation.
1 parent 99c822e commit 1ecfa2f

1 file changed

+31
-12
lines changed

docs/data-operations-manual/How-To-Guides/Adding/Configure-an-endpoint.md

Lines changed: 31 additions & 12 deletions
@@ -42,21 +42,26 @@ Important fields:
4242

4343
- `old-resource` \- the hash of the resource being ended
4444
- `status` \- the status code for this entry, use `410` for ended _(\!\! are there others which can be used?_)
45-
- `resource` \- _uncertain, can a resource be re-directed?_
4645
- `notes` \- to record why this configuration change was made
4746

4847
## [pipeline/column](https://github.com/digital-land/specification/blob/main/content/dataset/column.md?plain=1)
4948

50-
This table is used to add extra mappings from the resource column headers to our specification field names, for example mapping a field named `UID` in a resource to our `reference` field.
49+
Used to map column headers in an endpoint or resource to specification field names. Unlike `transform.csv`, which handles spec-level renames globally, `column.csv` is typically used to handle inconsistent or non-standard column naming in specific endpoints. Leaving `resource` and `endpoint` blank applies the mapping globally.
50+
51+
> _Example_
52+
> Mapping a field named `UID` in an endpoint to our `reference` field
5153
5254
Important fields:
5355

5456
- `column` \- the column header in the resource being mapped
5557
- `field` \- the field name in our specification the column header should be mapped to
58+
- `endpoint` \- (optional) limit the mapping to a specific endpoint
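
The global versus endpoint-specific behaviour can be sketched in Python. This is a minimal illustration, not the pipeline's implementation: the helper names `build_column_map` and `rename_headers`, the endpoint hash `abc123`, the `SiteName` row, and the rule that endpoint-specific rows take precedence over global ones are all assumptions made for the example.

```python
def build_column_map(rows, endpoint):
    """Collect column -> field mappings that apply to one endpoint."""
    mapping = {}
    # Rows with a blank endpoint apply globally.
    for row in rows:
        if not row.get("endpoint"):
            mapping[row["column"]] = row["field"]
    # Assumption: rows naming this endpoint override the global rows.
    for row in rows:
        if row.get("endpoint") == endpoint:
            mapping[row["column"]] = row["field"]
    return mapping


def rename_headers(record, mapping):
    """Rename the keys of one record, leaving unmapped headers untouched."""
    return {mapping.get(header, header): value for header, value in record.items()}


# Hypothetical column.csv rows, already parsed into dictionaries.
rows = [
    {"endpoint": "", "column": "UID", "field": "reference"},
    {"endpoint": "abc123", "column": "SiteName", "field": "name"},
]
record = {"UID": "42", "SiteName": "Mill Lane"}

renamed = rename_headers(record, build_column_map(rows, "abc123"))
renamed_elsewhere = rename_headers(record, build_column_map(rows, "other"))
```

For the endpoint `abc123` both headers are renamed; for any other endpoint only the global `UID` mapping applies.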
5659

5760
## [pipeline/combine](https://github.com/digital-land/specification/blob/main/content/dataset/combine.md?plain=1)
5861

59-
Used to combine values across multiple rows. The grouping is based on the reference field so this only works when there are multiple rows per reference (note this happens after concat so concat can be used to create a reference from multiple fields and control the grouping to some extent)
62+
Used to merge field values across multiple facts for the same entity. This runs later in the pipeline than other configuration (after entity resolution), so it operates on facts rather than raw rows.
63+
64+
For geometry fields, values are merged into a single `Multipolygon` using a spatial union rather than string joining. For all other fields, unique values are deduplicated, sorted and joined using the specified separator.
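
The handling of non-geometry fields can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline's own code: the function name `combine_values` is hypothetical, and trimming whitespace and dropping empty values before joining are assumptions. The spatial union used for geometry fields is not covered here.

```python
def combine_values(values, separator):
    """Deduplicate, sort and join the values recorded for one field."""
    # Assumption: blank values are ignored and whitespace is trimmed.
    unique = {value.strip() for value in values if value and value.strip()}
    return separator.join(sorted(unique))


# Hypothetical values from several facts belonging to the same entity.
combined = combine_values(["Grade 2", "Grade 1", "Grade 2", ""], ";")
```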
6065

6166
> _Example_
6267
> In the `agricultural-land-classification` collection the `geometry` field of the Natural England data is grouped by the reference, resulting in individual polygons being grouped into a multipolygon or geometry collection.
@@ -89,14 +94,23 @@ Important fields:
8994

9095
## [pipeline/convert](https://github.com/digital-land/specification/blob/main/content/dataset/convert.md?plain=1)
9196

92-
_Unsure\!_
97+
_Not currently in active use. This file was intended to configure conversion behaviour for specific resources, but the functionality is handled automatically by the pipeline. The only existing configuration is in [brownfield-land](https://github.com/digital-land/config/blob/main/pipeline/brownfield-land/convert.csv?plain=1) and contains no active parameters._
98+
99+
## [pipeline/default](https://github.com/digital-land/specification/blob/main/content/dataset/default.md?plain=1)
100+
101+
Used to populate an empty field by copying the value from another field in the same row. Only applies when the target field has no value - existing values are never overwritten. This is different to `default-value.csv` which sets a hardcoded value rather than copying from another field.
102+
103+
> _Example_
104+
> If `start-date` should default to the value of `actual-date` when not provided, add a row mapping `start-date` to `actual-date`.
105+
>
106+
> See: [https://github.com/digital-land/config/blob/78c2167948503f794b6023ae17796b5d086514de/pipeline/local-plan/default.csv#L5](https://github.com/digital-land/config/blob/78c2167948503f794b6023ae17796b5d086514de/pipeline/local-plan/default.csv#L5)
93107
94108
## [pipeline/default-value](https://github.com/digital-land/specification/blob/main/content/dataset/default-value.md?plain=1)
95109

96-
Used to set a default value for all values in a field
110+
Used to set a hardcoded default value for a field when it is empty. Unlike `default.csv`, which copies a value from another field, this sets a fixed literal value.
97111

98112
> _Example_
99-
> Set the value of `flood-risk-level` to 2 for all values from an endpoint in the `flood-risk-zone`, because the data is provided split into a different endpoint per flood risk level but each resource doesn’t record the level explicitly in a field.
113+
> Set the value of `flood-risk-level` to `2` for all values from an endpoint in the `flood-risk-zone`, because the data is provided split into a different endpoint per flood risk level but each resource doesn’t record the level explicitly in a field.
100114
>
101115
> See: [https://github.com/digital-land/config/blob/main/pipeline/flood-risk-zone/default-value.csv\#L3](https://github.com/digital-land/config/blob/main/pipeline/flood-risk-zone/default-value.csv#L3)
102116
@@ -105,10 +119,6 @@ Important fields:
105119
- `field` \- the field to use the default value in
106120
- `value` \- the value to enter as default in the field
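
The difference between `default.csv` and `default-value.csv` can be sketched in Python. This is a minimal sketch, not the pipeline's implementation: the function name `apply_defaults` and the order in which the two kinds of default are applied are assumptions. The field names come from the examples in this guide.

```python
def apply_defaults(row, default_fields, default_values):
    """Fill empty fields without overwriting values that are already present."""
    result = dict(row)
    # default.csv style: copy the value from another field in the same row.
    for field, source_field in default_fields.items():
        if not result.get(field):
            result[field] = result.get(source_field, "")
    # default-value.csv style: fall back to a fixed literal value.
    for field, value in default_values.items():
        if not result.get(field):
            result[field] = value
    return result


row = {"start-date": "", "actual-date": "2024-01-01", "flood-risk-level": ""}
filled = apply_defaults(
    row,
    default_fields={"start-date": "actual-date"},
    default_values={"flood-risk-level": "2"},
)

# A row that already has a start-date keeps it.
kept = apply_defaults(
    {"start-date": "2020-05-05", "actual-date": "2024-01-01"},
    default_fields={"start-date": "actual-date"},
    default_values={},
)
```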
107121

108-
## [pipeline/default](https://github.com/digital-land/specification/blob/main/content/dataset/default.md?plain=1)
109-
110-
_I think to set a default value using another field in the resource, but uncertain how this is different to column. Need more info._
111-
112122
## [pipeline/entity-organisation](https://github.com/digital-land/specification/blob/main/content/dataset/entity-organisation.md?plain=1)
113123

114124
This configuration file is used to assign the organisation responsible for managing an entity or range of entities. For any entities within the dataset and entity range given, facts from the assigned organisation will be prioritised over facts from any other organisation. In practice this means when we have multiple sources of data for a single entity, the organisation can be kept as the authoritative organisation by setting the entity-organisation in this file.
@@ -225,8 +235,17 @@ Sometimes, the raw data contains extraneous lines that can cause issues during p
225235
226236
Important fields:
227237

228-
- `pattern` - the pattern to search for in the raw endpoint file
238+
- `pattern` \- the pattern to search for in the raw endpoint file
229239

230240
## [pipeline/transform](https://github.com/digital-land/specification/blob/main/content/dataset/transform.md?plain=1)
231241

232-
_Unsure\!_
242+
Used to rename fields to match the latest specification. Maps old field names to their current replacements, applied globally across all resources. Use this when a field has been renamed in the specification and you need existing data to continue flowing through correctly.
243+
244+
> _Example_
245+
> The brownfield-land specification changed from using fields like `OrganisationURI` and `SiteNameAddress` to `organisation` and `site-address`. These changes were added to the relevant `transform.csv` to accommodate this specification change.
246+
>
247+
> See: [https://github.com/digital-land/config/blob/main/pipeline/brownfield-land/transform.csv](https://github.com/digital-land/config/blob/main/pipeline/brownfield-land/transform.csv)
248+
249+
Important fields:
250+
- `field` \- the old field name in the source data
251+
- `replacement-field` \- the new field name in the current specification
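
A rename driven by these two fields can be sketched as follows. This is illustrative only: it assumes a `transform.csv` containing just the `field` and `replacement-field` columns, and the helper names `load_replacements` and `transform_row` and the sample row are hypothetical.

```python
import csv
import io

# Hypothetical transform.csv content using the brownfield-land renames above.
TRANSFORM_CSV = """field,replacement-field
OrganisationURI,organisation
SiteNameAddress,site-address
"""


def load_replacements(text):
    """Read old-name -> new-name pairs from transform.csv text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["field"]: row["replacement-field"] for row in reader}


def transform_row(row, replacements):
    """Rename old field names, leaving other fields unchanged."""
    return {replacements.get(name, name): value for name, value in row.items()}


replacements = load_replacements(TRANSFORM_CSV)
transformed = transform_row(
    {"OrganisationURI": "https://example.org/org/1", "SiteNameAddress": "1 Mill Lane", "notes": ""},
    replacements,
)
```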
