Conversation
…s with check_field_is_within_range in CSV operations and tests
… add dataset organization matching
cleanup
```python
from shapely.geometry import GeometryCollection, MultiPolygon, Point, Polygon, shape

def _is_valid_datetime_value(value):
```
I believe this looks like duplication of code from elsewhere in the repository. I'm not sure we should be duplicating it.
So for this function there should be a CSV expectation which checks the date format of the input file. You can make it accept an argument giving the valid date format for the date column; DuckDB can then cast to that format.
For the purposes of what you're doing we should fix a specific format. The code below was developed to convert from any number of formats, whereas you should be checking that the value directly matches one specific format: for it to be good data, all dates in the CSV file should use the same format.
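A minimal stdlib sketch of the single-format check the reviewer is describing: parse every value under one fixed format and reject anything that doesn't match, rather than converting from many formats. The name `expect_date_format` is hypothetical, not the repo's API.

```python
from datetime import datetime

def expect_date_format(values, date_format="%Y-%m-%d"):
    """Check every value parses under ONE fixed format (no multi-format fallback)."""
    invalid = []
    for line_number, value in enumerate(values, start=1):
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            # Value does not match the agreed format, so flag it as bad data.
            invalid.append((line_number, value))
    return len(invalid) == 0, invalid
```

A value like `31/01/2024` fails under `%Y-%m-%d` even though it is a real date, which is the point: all rows must share the same format.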
Instead of commenting on each function below, we should look at converting them to DuckDB queries which check that the properties are true. We should be as specific as possible.
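The suggestion above is to express each validator as a SQL query that returns the rows violating the property. A sketch of that shape, illustrated with stdlib `sqlite3` for portability (the PR itself targets DuckDB, where e.g. `TRY_CAST` or `try_strptime` would do the type checks); the table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (name TEXT, area REAL)")
conn.executemany("INSERT INTO data VALUES (?, ?)", [("a", 1.5), ("b", -2.0)])

# Property under test: every area must be non-negative.
# The query returns the violating rows directly, so the expectation
# passes when the result set is empty.
invalid = conn.execute("SELECT rowid, area FROM data WHERE area < 0").fetchall()
```

Pushing the check into SQL keeps each expectation specific and lets the database engine do the scanning instead of per-row Python loops.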
```python
        field: the column name to validate
        allowed_values: allowed values for the field
    """
    cleaned_allowed_values = [
```
This allows for a lot of cleaning of the values themselves. Is this needed? Surely we want values to match almost exactly?
No, it's not. Will take it out.
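With the cleaning removed, the check reduces to exact membership. A simplified sketch of what the exact-match version might look like (the signature is illustrative, not the repo's actual `check_allowed_values`):

```python
def check_allowed_values(rows, field, allowed_values):
    """Exact matching only: no trimming, case-folding, or other cleaning,
    since good data should match the allowed list verbatim."""
    allowed = set(allowed_values)
    invalid = [
        (line_number, row[field])
        for line_number, row in enumerate(rows, start=1)
        if row[field] not in allowed
    ]
    return len(invalid) == 0, invalid
```

Under exact matching, a value like `" A"` with a stray leading space is reported as invalid rather than silently cleaned to `"A"`.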
```python
    "pattern": _is_valid_pattern_value,
    "multipolygon": _is_valid_multipolygon_value,
    "point": _is_valid_point_value,
}
```
I wasn't able to use DuckDB with these data types. I'm not sure whether custom functions work here like they do in Postgres.
Heya, let's have these as separate expectations for now rather than one massive function. You already need to pass a datatype dictionary in, so you're already having to define what the datatypes are somewhere; instead of defining datatypes, let's just have an expectation that says this column should be x, e.g.:
`expect_column_to_be_datetime(conn, filepath, field='entry-date', date_format='%Y-%m-%d')`
Separating it out makes it easier to read.
I have pushed up the fixes @eveleighoj. The datatype validator now uses DuckDB except a few
eveleighoj
left a comment
Let's have different expectation functions for the different data validators. If they can be done with DuckDB then great; if not, they can just use Python, and then they can fail individually, e.g. a column can pass a datetime check and fail an integer one.
If we find a way to do a better one that tests everything, then we can re-use these functions in the future.
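A minimal sketch of the per-datatype split being asked for: each expectation validates one datatype and fails on its own, so one column can pass the datetime check while another fails the integer check. Function names follow the reviewer's `expect_column_to_be_*` suggestion, but the row-dict signature here is an assumption for illustration.

```python
from datetime import datetime

def expect_column_to_be_integer(rows, field):
    """Every value in `field` must parse as an integer."""
    invalid = []
    for line_number, row in enumerate(rows, start=1):
        value = row.get(field, "")
        try:
            int(value)
        except ValueError:
            invalid.append({"line_number": line_number, "field": field, "value": value})
    return len(invalid) == 0, invalid

def expect_column_to_be_datetime(rows, field, date_format="%Y-%m-%d"):
    """Every value in `field` must match one fixed date format."""
    invalid = []
    for line_number, row in enumerate(rows, start=1):
        value = row.get(field, "")
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            invalid.append({"line_number": line_number, "field": field, "value": value})
    return len(invalid) == 0, invalid
```

Because each expectation is independent, a file with a valid `entry-date` column but a non-numeric `count` column fails only the integer expectation.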
- Replaced the previous `check_values_have_the_correct_datatype` function with specific functions for each datatype (e.g., `expect_column_to_be_integer`, `expect_column_to_be_decimal`, etc.).
- Each new function performs validation for a specific datatype and returns detailed results including invalid rows.
- Updated integration tests to reflect the new validation functions and ensure they cover various scenarios for each datatype.
- Removed unused pattern validation function from `datatype_validators.py`.
What type of PR is this? (check all applicable)
Description
Created reusable checkpoint functions to help with testing in various projects that depend on this project.
- `check_fields_are_within_range`: validates one or more numeric fields against min/max ranges from an external CSV, with optional row filtering via `lookup_rules`.
- `check_field_is_within_range_by_dataset_org`: range validation like the above, but matched on dataset + organisation keys between the source and range files.
- `check_allowed_values`: checks a field only contains values from an allowed list and reports invalid rows/values.
- `check_no_blank_rows`: fails when a row is fully blank (all columns empty/whitespace).
- `check_values_have_the_correct_datatype`: uses datatype validators to verify column values match expected datatypes and returns invalid rows with `line_number`, `field`, `datatype`, and `value`.

Related Tickets & Documents
QA Instructions, Screenshots, Recordings
Added/updated tests?
We encourage you to keep the code coverage percentage at 80% and above. Please refer to the Digital Land Testing Guidance for more information.
have not been included
[optional] Are there any post deployment tasks we need to perform?
[optional] Are there any dependencies on other PRs or Work?