Provision-Quality: Suggested Approach #377
The diagram is a little fuzzy for my old eyes, but I can see these technical tests result in more issues, which seems right for the kinds of basic tests listed. Some things may need refining when it comes to implementing the issue tests; for example, a future start-date is allowed where we have data ahead of a conservation area, boundary change, etc. coming into effect. But I think the team should crack on and implement these kinds of tests. There are two other kinds of tests we need to add which could have high value.
The challenge for the service is presenting a lot more issues and feedback in a way which doesn't overwhelm the provider. We don't want to cover their work with issues if the information is good after we've processed it.
Background
We have been exploring how to approach populating the provision-quality table. Posting on here to discuss the suggested approach in the open.
Here is the notebook which demonstrates the process we’re currently using to create quality scores, and produces a demo provision-quality dataset. This notebook contains information about the purpose of the report and methodology of scoring provisions using the data quality framework.
Criteria not included in this example method are:
Tests and where they are implemented
The diagram below shows where the tests that raise issues and expectations are implemented.
Tests included in suggested approach
The below is a write-up of the Mural board, which is here if a visual representation is preferred.
No geometry errors
This element of the quality criteria is made up of the following tests, all of which are recorded in the issue table.
OSGB out of bounds of England- this test raises an issue
Test description: Our default bounds are a bounding box around England. This issue is raised when the geometry is outside of this.
Unexpected geom type within GeometryCollection- this test raises an issue
Test description: When a geometry collection is provided, we check for a polygon; if one doesn't exist, the geometry is removed.
WGS84 out of bounds- this test raises an issue
Test description: The geospatial coordinates are out of bounds.
Invalid coordinates- this test raises an issue
Test description: It was not possible to process the field as geospatial coordinates: when processing a geometry or point, the coordinates are not WGS84, OSGB or Mercator, so the geometry or point could not be processed.
Unexpected geom type- this test raises an issue
Test description: We check the geometry type of a WKT value; if it's not within a given set, we remove it from the data.
WGS84 out of bounds of England- this test raises an issue
Test description: Our default bounds are a bounding box around England. This issue is raised when the geometry is outside of this.
Invalid geometry- not fixable- this test raises an issue
Test description: An error encountered while processing the geometry remains unresolved.
WGS84 out of bounds of custom boundary- this test raises an issue
Note: This test is not currently implemented, as the custom boundary has not been defined; an out-of-bounds-of-England issue would be raised instead.
Test description: Our default bounds are a bounding box around England. This issue is raised when the geometry is outside of this.
OSGB out of bounds of custom boundary- this test raises an issue
Note: This test is not currently implemented, as the custom boundary has not been defined; an out-of-bounds-of-England issue would be raised instead.
Test description: Our default bounds are a bounding box around England. This issue is raised when the geometry is outside of this.
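The out-of-bounds tests above can be sketched as a simple bounding-box check. This is a minimal illustration only: the bound values and function name are assumptions, and the real pipeline works with full geometries rather than single points.

```python
# A minimal sketch of the out-of-bounds checks, assuming a simple bounding
# box per coordinate system; the values below are illustrative, not the
# pipeline's actual bounds.

# Hypothetical bounding boxes as (min_x, min_y, max_x, max_y).
WGS84_ENGLAND_BOUNDS = (-7.0, 49.8, 2.1, 55.9)        # lon/lat, illustrative
OSGB_ENGLAND_BOUNDS = (0.0, 0.0, 700000.0, 700000.0)  # easting/northing, illustrative

def out_of_bounds(x: float, y: float, bounds: tuple) -> bool:
    """Return True (i.e. raise an issue) when the point falls outside the box."""
    min_x, min_y, max_x, max_y = bounds
    return not (min_x <= x <= max_x and min_y <= y <= max_y)

# A point in the North Sea falls outside the England box:
print(out_of_bounds(3.5, 52.0, WGS84_ENGLAND_BOUNDS))   # True
# A point near Birmingham does not:
print(out_of_bounds(-1.5, 52.5, WGS84_ENGLAND_BOUNDS))  # False
```

The custom-boundary variants would be the same check with a provider-specific box (or polygon) substituted for the England default.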
No unknown entities
This element of the quality criteria is made up of one test as below. Unknown entities are recorded in the issue table.
Test description: The entry needs an entity assigned before it will appear in the data.
No duplicate references
This element of the quality criteria is made up of one test as below. Duplicate references are recorded in the issue table.
Test description: There are multiple entries in this resource with this reference and entry date.
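The duplicate-reference test above amounts to counting (reference, entry-date) pairs within a resource. A minimal sketch, assuming entries are dicts with "reference" and "entry-date" keys (the field names follow the text; the data structure is illustrative):

```python
from collections import Counter

def duplicate_references(entries):
    """Return the (reference, entry-date) pairs appearing more than once."""
    counts = Counter((e["reference"], e["entry-date"]) for e in entries)
    return [key for key, n in counts.items() if n > 1]

entries = [
    {"reference": "CA01", "entry-date": "2024-01-01"},
    {"reference": "CA01", "entry-date": "2024-01-01"},  # duplicate -> issue
    {"reference": "CA02", "entry-date": "2024-01-01"},
]
print(duplicate_references(entries))  # [('CA01', '2024-01-01')]
```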
No other types of validity error
This element of the quality criteria is made up of the following 13 tests. Validity errors are recorded in the issue table.
Future entry-date- this test raises an issue
Test description: Dates must be valid. This issue is raised when the supplied entry-date is in the future.
Invalid URI- this test raises an issue
Test description: URIs must be valid. This issue is raised when the URL is not valid (validated using https://yozachar.github.io/pyvalidators/stable/api/url/).
Invalid WKT- this test raises an issue
Test description: Geometry must be in Well-Known Text (WKT) format. It was not possible to process the field as a WKT value.
Invalid category value- this test raises an issue
Test description: If a field has a typology of category, it should only contain values within the field dataset, e.g. the field brownfield-land.planning-permission-status should only contain values in the planning-permission-status.reference field.
Invalid date- this test raises an issue
Test description: If the supplied date format is not in a recognisable pattern.
Invalid decimal- this test raises an issue
Test description: It was not possible to process the field as a decimal number.
Invalid flag- this test raises an issue
Test description: If the field value is not one of: "", "yes", "no".
Invalid integer- this test raises an issue
Test description: It was not possible to process the field as an integer value. Decimals are converted to an integer.
Invalid organisation- this test raises an issue
Test description: Organisation data obtained during processing is invalid.
Missing value- this test raises an issue
Test description: This field is required in the dataset; the field is present in the resource but the value is blank for this entry.
Too large- this test raises an issue
Test description: The decimal value was greater than the maximum value allowed for the field.
Too small- this test raises an issue
Test description: Numerical fields shouldn't be negative. This issue is raised when an integer or decimal field value is less than zero.
Unknown entity- missing reference- this test raises an issue
Test description: The entry has a missing reference, so an entity cannot be assigned.
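Three of the validity tests above can be sketched as small predicates, each returning True when an issue should be raised. These are minimal illustrations with assumed names; the pipeline's actual implementations may differ (e.g. in which date patterns are recognised).

```python
from datetime import date

def invalid_flag(value: str) -> bool:
    """Flag fields may only be "", "yes" or "no"."""
    return value not in ("", "yes", "no")

def invalid_decimal(value: str) -> bool:
    """Raised when the field can't be processed as a decimal number."""
    try:
        float(value)
        return False
    except ValueError:
        return True

def future_entry_date(value: str, today: date) -> bool:
    """Raised when the supplied entry-date is after today."""
    return date.fromisoformat(value) > today

print(invalid_flag("maybe"))                              # True
print(invalid_decimal("12.5"))                            # False
print(future_entry_date("2999-01-01", date(2024, 1, 1)))  # True
```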
Conservation area entity count matches LPA
This element of the quality criteria is made up of one test as below. Count expectation issues are recorded in the expectation table.
Test description: When the number of entities on the platform which have their centroid within an LPA's boundary doesn't match the number of areas published on the council website.
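The count expectation above can be sketched as: count the centroids falling within the LPA's boundary and compare against the published figure. In this sketch the boundary is reduced to a bounding box for simplicity; the real expectation would test centroids against the actual LPA polygon, and all names are illustrative.

```python
# A minimal sketch of the conservation-area count expectation, assuming the
# LPA boundary is a (min_x, min_y, max_x, max_y) box; the real check would
# use the full boundary geometry.

def count_matches(centroids, lpa_bounds, published_count: int) -> bool:
    """True when the number of centroids inside the boundary matches the
    count published on the council website."""
    min_x, min_y, max_x, max_y = lpa_bounds
    within = sum(
        1 for x, y in centroids
        if min_x <= x <= max_x and min_y <= y <= max_y
    )
    return within == published_count

centroids = [(0.1, 51.5), (0.2, 51.6), (5.0, 40.0)]  # third lies elsewhere
print(count_matches(centroids, (0.0, 51.0, 1.0, 52.0), 2))  # True
```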
Entities within LPA boundary
This element of the quality criteria is made up of one test as below. Boundary expectation issues are recorded in the expectation table.
Test description: An ‘out of expected LPA bounds’ issue for ODP datasets is when the supplied geometry does not intersect with the boundary of the provider’s Local Planning Authority.
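The boundary expectation above is an intersection test. As a minimal sketch, geometries are again reduced to bounding boxes; the real check would intersect the full supplied geometry with the LPA boundary polygon (e.g. using a geometry library such as shapely).

```python
# Illustrative only: 'out of expected LPA bounds' reduced to a box-overlap
# test. Boxes are (min_x, min_y, max_x, max_y).

def boxes_intersect(a, b) -> bool:
    """True when the two boxes overlap at all."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

lpa_box = (0.0, 51.0, 1.0, 52.0)
supplied = (2.0, 53.0, 3.0, 54.0)  # entirely outside the LPA box
# No intersection, so the expectation issue would be raised:
print(boxes_intersect(lpa_box, supplied))  # False
```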
Future tests/iterations
If the above meets the initial needs of the provision-quality table, it can be handed over to engineering so they can begin building this process into the pipeline. We will be approaching this iteratively; tests we would like to introduce in future are below.
Note: We will be looking to record movement over time in the provision-quality table, using counts (frequency TBC).
No deleted entities
We suggest that this element of the quality criteria would be made up of one test as below. We suggest this would be handled by the expectation framework and would be recorded in the expectation table.
Test description: A test to check there are no entities missing from the entities on the active resource. It uses all active resource(s) for the provision.
Progress of this test: This is defined in expectations already, so we can add it in next. It would be recorded in the expectation table.
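The deleted-entities check described above can be sketched as a set difference between the entities on the platform and the entities derivable from the active resource(s). Names here are illustrative, not the expectation framework's actual API.

```python
# A minimal sketch, assuming we can list entity identifiers on the platform
# and from all active resources for the provision.

def missing_entities(platform_entities: set, active_resource_entities: set) -> set:
    """Entities on the platform that no active resource still provides."""
    return platform_entities - active_resource_entities

platform = {"E1", "E2", "E3"}
active = {"E1", "E3"}
print(missing_entities(platform, active))  # {'E2'} -> expectation fails
```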
Area and document dataset
We suggest that this element of the quality criteria would be made up of one test as below. We suggest this would be handled via the issues framework and recorded in the issue table.
Test description: When there are any values for the foreign key tree-preservation-order in the tree and tree-preservation-zone geography datasets which don't match a primary key reference in the tree-preservation-order document dataset.
Or when there are any values for the foreign key article-4-direction in the article-4-direction-area geography dataset which don't match to the primary key reference in the article-4-direction document dataset.
Or when there are any values for the foreign key conservation-area in the conservation-area-document dataset which don't match to the primary key reference in the conservation-area geography dataset.
Progress of this test: There is a new issue type which identifies this. It’s not fully live yet as Providers need to work on the content to explain the issue to LPAs, but once it’s live we can integrate it. We would need to inform Providers that this is a dependency item.
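Each of the three cases above is the same referential-integrity check: foreign-key values with no matching primary-key reference. A minimal sketch using the tree-preservation-order example (dataset and column names follow the text; the data structures are illustrative):

```python
def dangling_foreign_keys(fk_values, pk_references):
    """Foreign-key values with no matching primary-key reference."""
    return sorted(set(fk_values) - set(pk_references))

# tree-preservation-order values found in the tree / tree-preservation-zone
# geography datasets:
fk_values = ["TPO1", "TPO2", "TPO9"]
# reference values in the tree-preservation-order document dataset:
pk_references = ["TPO1", "TPO2", "TPO3"]
print(dangling_foreign_keys(fk_values, pk_references))  # ['TPO9'] -> issue
```

The article-4-direction and conservation-area cases would reuse the same check with their respective foreign-key and reference columns.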
Data from an authoritative source
Progress of this test: We need to define this further, as there are numerous ways we could approach it; we need to agree how we determine this and do it in a structured way so we can easily pull it into the scoring process (perhaps a new expectation test). Could we use the provision table to highlight who the authoritative sources are?