Open Design Proposal - 003 - Querying on Complex geometries #330
Ben-Hodgkiss
started this conversation in
Open design proposal
Replies: 1 comment
-
|
As part of optimising the queries for the Hackathon, we have made the following improvements:
Currently, querying on the entity_subdivided table is enabled only when the dataset filter contains only flood-risk-zone. If other datasets are included, the query continues to use the original entity table. Due to time constraints for the Hackathon, full support for mixed datasets has not been implemented yet. Also, need to extend the functionality for all kind of queries (not just geometry related). These will be added after the Hackathon. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Author(s) - Samriti Sadhu
Introduction
Some of the API queries are exremely slow and often result in 502s, particularly when querying near large and complex datasets such as flood-risk-zone. To improve query performance, we propose using ST_Subdivide function of postgis to split large geometries into smaller, more manageable components and store them in a separate table. This approach will optimize spatial queries.
Result
The execution time was reduced from ~30 seconds to ~4 seconds for an example API query in local environment, significantly improving performance.
example query: https://www.planning.data.gov.uk/entity.json?longitude=-1.5093779539231569&latitude=53.36780846610864&entries=current&geometry_relation=intersects&limit=100&exclude_field=geometry,point
Status
Open
Detail
Overview
Significant performance degradation in spatial queries was observed, especially when dealing with large bounding boxes containing complex geometries. This was because some dastasets and entities like flood risk zones contained highly detailed geometries that extended across large areas, generating extensive bounding boxes. As a result, even when the queried points were outside the actual geometry, the large bounding box caused the geometry to be considered in spatial queries for that area. See the example below.
Things Tried
Index Optimization
Tried optimizing the existing spatial indexes to improve query performance. This involved re-indexing the entity.geometry column and ensuring that appropriate index types (e.g., GiST, SPGIST) were applied. Despite these optimizations, there wasn't significant performance improvements due to the size and complexity of the geometries involved.
Refining Spatial Queries
In an attempt to optimize the spatial queries, the ST_Subdivide function was used directly within the query. This approach aimed to split large geometries on the fly during query execution. However, it was found that applying ST_Subdivide in the query itself resulted in increased execution times, as the subdivision process was performed alongside the spatial filtering. This approach was not ideal, as it added computational overhead during query processing.
Implementation
A new table, entity_subdivided, will be introduced to store subdivided geometries derived from entity.geometry of complex datasets like flood-risk-zone. The subdivision will be done using ST_Subdivide, ensuring that large and complex geometries are broken into smaller parts while maintaining spatial accuracy.
The entity_subdivided table will store the multiple subdivided geometries corresponding to each entity hence a One-to-Many Relationship.
Key Changes:
Deployment and Testing Plan
Questions to consider
How will updates to geometries in entity_subdivided be handled?
A: This needs to be done along with entity table in digital-land-postgres.
Will there be DB updates or data migrations?
A: Yes, a new table needs to be created and updated with subdivided geometries from entity table for complex datasets like flood-risk-zone
Beta Was this translation helpful? Give feedback.
All reactions