Provision-Quality: Suggested Approach #377
The diagram is a little fuzzy for my old eyes, but I can see these technical tests result in more issues, which seems right for the kinds of basic tests listed. Some things may need refining when it comes to implementing the issue tests; for example, a future start-date is allowed where we have data ahead of a conservation area, boundary change, etc. coming into effect. But I think the team should crack on and implement these kinds of tests. There are two other kinds of tests we need to add which could have high value.
The challenge for the service is presenting a lot more issues and feedback in a way which doesn't overwhelm the provider. We don't want to cover their work with issues if the information is good after we've processed it.
Background
We have been exploring how to approach populating the provision-quality table. Posting on here to discuss the suggested approach in the open.
Here is the notebook which demonstrates the process we’re currently using to create quality scores, and produces a demo provision-quality dataset. This notebook contains information about the purpose of the report and methodology of scoring provisions using the data quality framework.
Criteria not included in this example method are:
Tests and where they are implemented
The diagram below shows where the tests that raise issues and expectations are implemented.
Tests included in suggested approach
The below is a write-up of the Mural board, which is here if a visual representation is preferred.
No geometry errors
This element of the quality criteria is made up of the following tests, all of which are recorded in the issue table.
OSGB out of bounds of England- this test raises an issue
Test description: Our default bounds are a bounding box around England. This issue is raised when the geometry is outside of this.
Unexpected geom type within GeometryCollection- this test raises an issue
Test description: When a geometry collection is provided, we check for a polygon; if one doesn't exist, the geometry is removed.
WGS84 out of bounds- this test raises an issue
Test description: The geospatial coordinates are out of bounds.
Invalid coordinates- this test raises an issue
Test description: It was not possible to process the field as geospatial coordinates: when processing a geometry or point, the coordinates are not WGS84, OSGB or Mercator, so the geometry or point could not be processed.
Unexpected geom type- this test raises an issue
Test description: We check the geometry type of a WKT value; if it's not within a given set, we remove it from the data.
WGS84 out of bounds of England- this test raises an issue
Test description: Our default bounds are a bounding box around England. This issue is raised when the geometry is outside of this.
Invalid geometry- not fixable- this test raises an issue
Test description: An error encountered while processing the geometry remains unresolved.
WGS84 out of bounds of custom boundary- this test raises an issue
Note: This test is not currently implemented, as the custom boundary has not been defined; an out-of-bounds-of-England issue would be raised instead.
Test description: Our default bounds are a bounding box around England. This issue is raised when the geometry is outside of this.
OSGB out of bounds of custom boundary- this test raises an issue
Note: This test is not currently implemented, as the custom boundary has not been defined; an out-of-bounds-of-England issue would be raised instead.
Test description: Our default bounds are a bounding box around England. This issue is raised when the geometry is outside of this.
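The out-of-bounds tests above can be sketched as a simple bounding-box check. This is a minimal illustration only: the bound values and function name are assumptions, and the real pipeline works with full geometries rather than single points.

```python
# A minimal sketch of the out-of-bounds checks, assuming a simple bounding
# box per coordinate system; the values below are illustrative, not the
# pipeline's actual bounds.

# Hypothetical bounding boxes as (min_x, min_y, max_x, max_y).
WGS84_ENGLAND_BOUNDS = (-7.0, 49.8, 2.1, 55.9)        # lon/lat, illustrative
OSGB_ENGLAND_BOUNDS = (0.0, 0.0, 700000.0, 700000.0)  # easting/northing, illustrative

def out_of_bounds(x: float, y: float, bounds: tuple) -> bool:
    """Return True (i.e. raise an issue) when the point falls outside the box."""
    min_x, min_y, max_x, max_y = bounds
    return not (min_x <= x <= max_x and min_y <= y <= max_y)

# A point in the North Sea falls outside the England box:
print(out_of_bounds(3.5, 52.0, WGS84_ENGLAND_BOUNDS))   # True
# A point near Birmingham does not:
print(out_of_bounds(-1.5, 52.5, WGS84_ENGLAND_BOUNDS))  # False
```

The custom-boundary variants would be the same check with a provider-specific box (or polygon) substituted for the England default.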
No unknown entities
This element of the quality criteria is made up of one test as below. Unknown entities are recorded in the issue table.
Test description: The entry needs an entity assigned before it will appear in the data.
No duplicate references
This element of the quality criteria is made up of one test as below. Duplicate references are recorded in the issue table.
Test description: There are multiple entries in this resource with this reference and entry date.
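The duplicate-reference test above amounts to counting (reference, entry-date) pairs within a resource. A minimal sketch, assuming entries are dicts with "reference" and "entry-date" keys (the field names follow the text; the data structure is illustrative):

```python
from collections import Counter

def duplicate_references(entries):
    """Return the (reference, entry-date) pairs appearing more than once."""
    counts = Counter((e["reference"], e["entry-date"]) for e in entries)
    return [key for key, n in counts.items() if n > 1]

entries = [
    {"reference": "CA01", "entry-date": "2024-01-01"},
    {"reference": "CA01", "entry-date": "2024-01-01"},  # duplicate -> issue
    {"reference": "CA02", "entry-date": "2024-01-01"},
]
print(duplicate_references(entries))  # [('CA01', '2024-01-01')]
```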
No other types of validity error
This element of the quality criteria is made up of the following 13 tests. Validity errors are recorded in the issue table.
Future entry-date- this test raises an issue
Test description: Dates must be valid. This issue is raised when the supplied entry-date is in the future.
Invalid URI- this test raises an issue
Test description: URIs must be valid. This issue is raised when the URL is not valid (validated using https://yozachar.github.io/pyvalidators/stable/api/url/).
Invalid WKT- this test raises an issue
Test description: Geometry must be in Well-Known Text (WKT) format. It was not possible to process the field as a WKT value.
Invalid category value- this test raises an issue
Test description: If a field has a typology of category, it should only contain values within the field dataset, e.g. the field brownfield-land.planning-permission-status should only contain values in the planning-permission-status.reference field.
Invalid date- this test raises an issue
Test description: If the supplied date format is not in a recognisable pattern.
Invalid decimal- this test raises an issue
Test description: It was not possible to process the field as a decimal number.
Invalid flag- this test raises an issue
Test description: If the field value is not one of: "", "yes", "no".
Invalid integer- this test raises an issue
Test description: It was not possible to process the field as an integer value. Decimals are converted to an integer.
Invalid organisation- this test raises an issue
Test description: Organisation data obtained during processing is invalid.
Missing value- this test raises an issue
Test description: This field is required in the dataset; the field is present in the resource but the value is blank for this entry.
Too large- this test raises an issue
Test description: The decimal value was greater than the maximum value allowed for the field.
Too small- this test raises an issue
Test description: Numerical fields shouldn't be negative. This issue is raised when an integer or decimal field value is less than zero.
Unknown entity- missing reference- this test raises an issue
Test description: The entry has a missing reference, so an entity cannot be assigned.
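Three of the validity tests above can be sketched as small predicates, each returning True when an issue should be raised. These are minimal illustrations with assumed names; the pipeline's actual implementations may differ (e.g. in which date patterns are recognised).

```python
from datetime import date

def invalid_flag(value: str) -> bool:
    """Flag fields may only be "", "yes" or "no"."""
    return value not in ("", "yes", "no")

def invalid_decimal(value: str) -> bool:
    """Raised when the field can't be processed as a decimal number."""
    try:
        float(value)
        return False
    except ValueError:
        return True

def future_entry_date(value: str, today: date) -> bool:
    """Raised when the supplied entry-date is after today."""
    return date.fromisoformat(value) > today

print(invalid_flag("maybe"))                              # True
print(invalid_decimal("12.5"))                            # False
print(future_entry_date("2999-01-01", date(2024, 1, 1)))  # True
```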
Conservation area entity count matches LPA
This element of the quality criteria is made up of one test as below. Count expectation issues are recorded in the expectation table.
Test description: When the number of entities on the platform which have their centroid within an LPA's boundary doesn't match the number of areas published on the council website.
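The count expectation above can be sketched as: count the centroids falling within the LPA's boundary and compare against the published figure. In this sketch the boundary is reduced to a bounding box for simplicity; the real expectation would test centroids against the actual LPA polygon, and all names are illustrative.

```python
# A minimal sketch of the conservation-area count expectation, assuming the
# LPA boundary is a (min_x, min_y, max_x, max_y) box; the real check would
# use the full boundary geometry.

def count_matches(centroids, lpa_bounds, published_count: int) -> bool:
    """True when the number of centroids inside the boundary matches the
    count published on the council website."""
    min_x, min_y, max_x, max_y = lpa_bounds
    within = sum(
        1 for x, y in centroids
        if min_x <= x <= max_x and min_y <= y <= max_y
    )
    return within == published_count

centroids = [(0.1, 51.5), (0.2, 51.6), (5.0, 40.0)]  # third lies elsewhere
print(count_matches(centroids, (0.0, 51.0, 1.0, 52.0), 2))  # True
```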
Entities within LPA boundary
This element of the quality criteria is made up of one test as below. Boundary expectation issues are recorded in the expectation table.
Test description: An ‘out of expected LPA bounds’ issue for ODP datasets is when the supplied geometry does not intersect with the boundary of the provider’s Local Planning Authority.
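The boundary expectation above is an intersection test. As a minimal sketch, geometries are again reduced to bounding boxes; the real check would intersect the full supplied geometry with the LPA boundary polygon (e.g. using a geometry library such as shapely).

```python
# Illustrative only: 'out of expected LPA bounds' reduced to a box-overlap
# test. Boxes are (min_x, min_y, max_x, max_y).

def boxes_intersect(a, b) -> bool:
    """True when the two boxes overlap at all."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

lpa_box = (0.0, 51.0, 1.0, 52.0)
supplied = (2.0, 53.0, 3.0, 54.0)  # entirely outside the LPA box
# No intersection, so the expectation issue would be raised:
print(boxes_intersect(lpa_box, supplied))  # False
```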
Future tests/iterations
If the above meets the initial needs of the provision-quality table, it can be handed over to engineering so they can begin building this process into the pipeline. We will be approaching this iteratively; tests we would like to introduce in future are below.
Note: We will be looking to record movement over time in the provision-quality table, using counts (frequency TBC).
No deleted entities
We suggest that this element of the quality criteria would be made up of one test as below. We suggest this would be handled by the expectation framework and would be recorded in the expectation table.
Test description: A test to check there are no entities missing from the entities on the active resource. It uses all active resource(s) for the provision.
Progress of this test: This is defined in expectations already, so we can add it in next. It would be recorded in the expectation table.
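The deleted-entities check described above can be sketched as a set difference between the entities on the platform and the entities derivable from the active resource(s). Names here are illustrative, not the expectation framework's actual API.

```python
# A minimal sketch, assuming we can list entity identifiers on the platform
# and from all active resources for the provision.

def missing_entities(platform_entities: set, active_resource_entities: set) -> set:
    """Entities on the platform that no active resource still provides."""
    return platform_entities - active_resource_entities

platform = {"E1", "E2", "E3"}
active = {"E1", "E3"}
print(missing_entities(platform, active))  # {'E2'} -> expectation fails
```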
Area and document dataset
We suggest that this element of the quality criteria would be made up of one test as below. We suggest this would be handled via the issues framework and recorded in the issue table.
Test description: When there are any values for the foreign key tree-preservation-order in the tree and tree-preservation-zone geography datasets which don't match a primary key reference in the tree-preservation-order document dataset.
Or when there are any values for the foreign key article-4-direction in the article-4-direction-area geography dataset which don't match to the primary key reference in the article-4-direction document dataset.
Or when there are any values for the foreign key conservation-area in the conservation-area-document dataset which don't match to the primary key reference in the conservation-area geography dataset.
Progress of this test: There is a new issue type which identifies this. It’s not fully live yet as Providers need to work on the content to explain the issue to LPAs, but once it’s live we can integrate it. We would need to inform Providers that this is a dependency item.
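Each of the three cases above is the same referential-integrity check: foreign-key values with no matching primary-key reference. A minimal sketch using the tree-preservation-order example (dataset and column names follow the text; the data structures are illustrative):

```python
def dangling_foreign_keys(fk_values, pk_references):
    """Foreign-key values with no matching primary-key reference."""
    return sorted(set(fk_values) - set(pk_references))

# tree-preservation-order values found in the tree / tree-preservation-zone
# geography datasets:
fk_values = ["TPO1", "TPO2", "TPO9"]
# reference values in the tree-preservation-order document dataset:
pk_references = ["TPO1", "TPO2", "TPO3"]
print(dangling_foreign_keys(fk_values, pk_references))  # ['TPO9'] -> issue
```

The article-4-direction and conservation-area cases would reuse the same check with their respective foreign-key and reference columns.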
Data from an authoritative source
Progress of this test: We need to define this further, as there are numerous ways we could approach it; we need to agree how we determine this and do it in a structured way so we can easily pull it into the scoring process (perhaps a new expectation test). Could we use the provision table to highlight who the authoritative sources are?