
2396 check that all entity numbers in lookupcsv are contained within the ranges in entity organisationcsv#2406

Open
gibahjoe wants to merge 54 commits into main from
2396-check-that-all-entity-numbers-in-lookupcsv-are-contained-within-the-ranges-in-entity-organisationcsv

Conversation

Contributor

@gibahjoe gibahjoe commented Mar 19, 2026

Lookup/old-entity entity validation

Datatype Validation coverage

  • datetime (also used for date values)
  • integer
  • decimal
  • flag
  • json
  • curie
  • curie-list (override-based for organisations)
  • url
  • hash
  • pattern
  • latitude
  • longitude
  • point
  • multipolygon
  • string
  • text
  • blob
  • wkt

Notes

  • Datatypes come from field.csv in the specification repo, with field-level overrides where needed.
  • Validation is inspired by the specification repo, especially for date formats.

Collapsed #2397 into this PR so we only have one PR to manage.
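For illustration, a minimal sketch of one kind of datatype check listed above: a datetime validator. The accepted format list here is hypothetical; the PR's actual formats come from the specification repo and may differ.

```python
from datetime import datetime

# Hypothetical set of accepted formats; the real list is driven by the
# specification repo's dateformat guidance, not hard-coded like this.
ACCEPTED_FORMATS = ["%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M:%SZ"]


def is_valid_datetime(value):
    """Return True if the value parses with any accepted format."""
    for fmt in ACCEPTED_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


print(is_valid_datetime("2026-03-19"))  # True
print(is_valid_datetime("19/03/2026"))  # False
```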

…pipeline-csvs' into 2396-check-that-all-entity-numbers-in-lookupcsv-are-contained-within-the-ranges-in-entity-organisationcsv
Swati-Dash and others added 2 commits March 19, 2026 17:15
@gibahjoe gibahjoe force-pushed the 2396-check-that-all-entity-numbers-in-lookupcsv-are-contained-within-the-ranges-in-entity-organisationcsv branch from 52224b1 to a162a9c on March 19, 2026 at 17:15
…csv-are-contained-within-the-ranges-in-entity-organisationcsv
Comment on lines +74 to +112
def _datasette_query_csv(db, sql):
    params = urllib.parse.urlencode({"sql": sql, "_size": "max"})
    query_url = f"{DATASETTE_BASE_URL}/{db}.csv?{params}"

    try:
        with urllib.request.urlopen(query_url, timeout=30) as response:
            content = response.read().decode("utf-8")
    except Exception as exc:
        pytest.skip(f"Could not load Datasette query from {query_url}: {exc}")

    return list(csv.DictReader(io.StringIO(content)))


def _ranges_for_collection_from_datasette(collection_name):
    escaped_collection = collection_name.replace("'", "''")
    sql = (
        "select dataset, collection, entity_minimum, entity_maximum "
        "from dataset "
        f"where collection = '{escaped_collection}'"
    )

    rows = _datasette_query_csv(DATASETTE_DB, sql)

    ranges = []

    for row in rows:
        if not isinstance(row, dict):
            continue

        try:
            min_val = int(row.get("entity_minimum"))
            max_val = int(row.get("entity_maximum"))
        except (TypeError, ValueError):
            continue

        dataset_name = (row.get("collection") or "").strip()
        ranges.append((dataset_name, min_val, max_val))

    return ranges
Contributor

Datasette represents the post-processing state of the data. So if we are adding a new dataset then it won't be in there yet and might fail.

Instead we should use the raw specification to check these at https://raw.githubusercontent.com/digital-land/specification/refs/heads/main/specification/dataset.csv as this will always be updated first.

Should be a fairly easy swap-around though!
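A sketch of that swap-around, reading dataset.csv from the raw specification instead of Datasette. The column names (`collection`, `entity-minimum`, `entity-maximum`) are assumptions to confirm against the spec's actual header.

```python
import csv
import io
import urllib.request

SPECIFICATION_DATASET_URL = (
    "https://raw.githubusercontent.com/digital-land/specification"
    "/refs/heads/main/specification/dataset.csv"
)


def parse_ranges(csv_text, collection_name):
    """Extract (dataset, minimum, maximum) tuples for one collection."""
    ranges = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (row.get("collection") or "").strip() != collection_name:
            continue
        try:
            # Column names are an assumption; check the spec CSV header.
            min_val = int(row["entity-minimum"])
            max_val = int(row["entity-maximum"])
        except (KeyError, TypeError, ValueError):
            continue
        ranges.append(((row.get("dataset") or "").strip(), min_val, max_val))
    return ranges


def ranges_for_collection_from_specification(collection_name):
    """Fetch dataset.csv from the raw specification and parse its ranges."""
    with urllib.request.urlopen(SPECIFICATION_DATASET_URL, timeout=30) as resp:
        return parse_ranges(resp.read().decode("utf-8"), collection_name)
```

Separating parsing from fetching keeps the range extraction testable without network access.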

Contributor

The test can be improved to:
1. Only check entities with an organisation and ignore those without an org value.
2. Ignore the orgs government-organisation:D1342 and government-organisation:PB1164 in the conservation-area collection.
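A hypothetical sketch of those two filters applied to lookup rows (field names and the collection check are assumptions based on the comment above, not the PR's actual code):

```python
# Organisations exempt from the range check in the conservation-area
# collection, per the review comment.
IGNORED_ORGANISATIONS = {
    "government-organisation:D1342",
    "government-organisation:PB1164",
}


def rows_to_check(rows, collection_name):
    """Yield only the lookup rows that should be range-checked."""
    for row in rows:
        org = (row.get("organisation") or "").strip()
        if not org:
            continue  # 1. skip entities without an organisation value
        if collection_name == "conservation-area" and org in IGNORED_ORGANISATIONS:
            continue  # 2. skip the two exempt government organisations
        yield row
```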

gibahjoe and others added 18 commits March 23, 2026 11:02
…hanged the entity ranges to read from specification
…csv-are-contained-within-the-ranges-in-entity-organisationcsv

# Conflicts:
#	collection/brownfield-land/endpoint.csv
#	collection/brownfield-land/source.csv
…tained-within-the-ranges-in-entity-organisationcsv' into 2397-check-that-any-values-in-columns-with-a-date-datatype-contain-valid-dates
Contributor

@eveleighoj eveleighoj left a comment

I think you should look at using expectations to run these checks. It'll be easier to run them on other CSVs in the future. Right now the support is specific to these files and uses inefficient methods for running this analysis.


REPO_ROOT = Path(__file__).resolve().parents[2]
SEARCH_DIRS = ["pipeline", "collection"]
SPECIFICATION_DATASET_URL = "https://raw.githubusercontent.com/digital-land/specification/refs/heads/main/specification/dataset.csv"
Contributor

Instead of just getting the one CSV from the specification, create a pytest fixture to download the full spec. Look at the Specification class in digital-land-python; it should have a method to download it to a directory, and you can use a temporary directory via the tmp_path fixture.

This means the spec will be downloaded once and only once for all the tests, especially if you set the fixture scope to the session. You can put the fixture in a conftest.py file.
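A sketch of what such a conftest.py fixture could look like. The download helper here is a stand-in: digital-land-python's Specification class is assumed to offer an equivalent method (its name and import path need checking against that repo). Note that at session scope the session-scoped tmp_path_factory fixture replaces the function-scoped tmp_path.

```python
import pytest


def download_specification(spec_dir):
    """Stand-in for the real download. digital-land-python's Specification
    class is assumed to provide an equivalent method; this stub just writes
    a placeholder file so the fixture shape is clear."""
    (spec_dir / "dataset.csv").write_text("dataset,collection\n")
    return spec_dir


@pytest.fixture(scope="session")
def specification_dir(tmp_path_factory):
    # scope="session" means the download runs once for the whole test run;
    # tmp_path_factory is the session-scoped counterpart of tmp_path.
    return download_specification(tmp_path_factory.mktemp("specification"))
```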

lookup_files,
ids=[_test_id(f) for f in lookup_files],
)
def test_lookup_entities_within_organisation_ranges(lookup_file):
Contributor

I suggest using duckdb or a similar technology to run analytical queries like this; reading and scanning through the files row by row in Python can become costly and inefficient.
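A minimal sketch of pushing the range check into SQL. It uses the standard library's sqlite3 as one such "similar technology" (duckdb could read the CSVs directly with read_csv_auto, but sqlite3 keeps this example dependency-free); the table and column names are illustrative.

```python
import sqlite3


def entities_out_of_range(entities, ranges):
    """Find entity numbers not covered by any (dataset, lo, hi) range,
    using a SQL anti-join instead of a Python loop. Sketch only."""
    con = sqlite3.connect(":memory:")
    con.execute("create table entity (n integer)")
    con.execute("create table range (dataset text, lo integer, hi integer)")
    con.executemany("insert into entity values (?)", [(e,) for e in entities])
    con.executemany("insert into range values (?, ?, ?)", ranges)
    rows = con.execute(
        """
        select e.n from entity e
        where not exists (
            select 1 from range r where e.n between r.lo and r.hi
        )
        order by e.n
        """
    ).fetchall()
    return [r[0] for r in rows]


print(entities_out_of_range([5, 150], [("demo", 1, 100)]))  # [150]
```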

Contributor

Again, you could write an expectation for this that can be reused on other CSVs.

return ranges


OLD_ENTITY_IGNORED_ORGANISATIONS = {
Contributor

The old_entity table doesn't contain any organisation data, so I'm not sure why this variable is named this way?

old_entity_files,
ids=[_test_id(f) for f in old_entity_files],
)
def test_old_entity_entities_are_within_specification_entity_ranges(old_entity_file):
Contributor

The old_entity files already have a test which can be expanded using expectations from digital-land-python. For this one I suggest creating a CSV operation in here https://github.com/digital-land/digital-land-python/blob/main/digital_land/expectations/operations/csv.py which checks that a column contains integers that belong to specific ranges.

You can then use that expectation in the OLD_ENTITY_RULES to remove a lot of the code here and keep things simple. It's more complicated to set up but the code is more portable and can be used in several different scenarios.
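A hedged sketch of what such a CSV operation might look like. The function name, signature, and return shape here are assumptions for illustration; the real operations module in digital-land-python follows its own conventions, which should be matched when contributing.

```python
import csv


def check_column_values_within_ranges(csv_path, column, ranges):
    """Sketch of a reusable CSV expectation: every non-empty value in
    `column` must be an integer that falls inside at least one
    (minimum, maximum) range. Returns (passed, issues)."""
    issues = []
    with open(csv_path, newline="") as f:
        # start=2: row 1 of the file is the header.
        for line_no, row in enumerate(csv.DictReader(f), start=2):
            raw = (row.get(column) or "").strip()
            if not raw:
                continue  # empty values are out of scope for this check
            try:
                value = int(raw)
            except ValueError:
                issues.append((line_no, raw, "not an integer"))
                continue
            if not any(lo <= value <= hi for lo, hi in ranges):
                issues.append((line_no, raw, "outside all ranges"))
    return len(issues) == 0, issues
```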

Swati-Dash and others added 9 commits March 25, 2026 21:29
Updated heritage-at-risk entity ranges in the CSV.
Removed duplicate header and fixed buffer zone range.
…csv-are-contained-within-the-ranges-in-entity-organisationcsv

# Conflicts:
#	pipeline/local-plan/entity-organisation.csv
#	pipeline/tree-preservation-order/entity-organisation.csv
…tained-within-the-ranges-in-entity-organisationcsv' into 2397-check-that-any-values-in-columns-with-a-date-datatype-contain-valid-dates

# Conflicts:
#	collection/article-4-direction/source.csv
Swati-Dash and others added 19 commits March 26, 2026 17:35
Removed HTML entities from the CSV file.
Updated the CSV file to correct the formatting of the documentation URL.
Reordered the header and data rows in the source.csv file.
Reinserted header and added new entries to the local planning authority data.
…csv-are-contained-within-the-ranges-in-entity-organisationcsv

# Conflicts:
#	collection/developer-contributions/endpoint.csv
#	collection/developer-contributions/source.csv
#	collection/historic-england/endpoint.csv
#	collection/historic-england/source.csv
#	pipeline/developer-contributions/lookup.csv
#	pipeline/historic-england/entity-organisation.csv
#	tests/acceptance/test_config_dataset.py
…csv-are-contained-within-the-ranges-in-entity-organisationcsv
…csv-are-contained-within-the-ranges-in-entity-organisationcsv

# Conflicts:
#	collection/border/source.csv
#	collection/developer-contributions/endpoint.csv
#	collection/developer-contributions/source.csv
#	collection/historic-england/endpoint.csv
#	collection/historic-england/source.csv
#	pipeline/article-4-direction/entity-organisation.csv
#	pipeline/brownfield-land/entity-organisation.csv
#	pipeline/conservation-area/entity-organisation.csv
#	pipeline/historic-england/entity-organisation.csv
#	pipeline/listed-building/entity-organisation.csv
#	pipeline/tree-preservation-order/entity-organisation.csv