Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
47 changes: 35 additions & 12 deletions md5_queries.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,30 +41,38 @@ potential improvement: make exception for first tile on z=0 - do not filter on
```sql
SELECT MD5(ST_AsBinary(geom)::text) as geom_hash
FROM bertt.nantes_reconstructed_buildings
WHERE MD5(ST_AsBinary(geom)::text) in ('9759cdee666f512a0c13df8245b667f9',... )
WHERE MD5(ST_AsBinary(geom)::text) = ANY($1)
AND ST_Within(
ST_Centroid(ST_Envelope(geom)),
ST_Transform(ST_MakeEnvelope(@xmin, @ymin, @xmax, @ymax, 4326), 5698)
ST_Transform(ST_MakeEnvelope($2, $3, $4, $5, 4326), 5698)
)
```

Note: Using parameterized query with array parameter instead of string concatenation.

5] Count geometries in bounding box on level 1 excluding the geometries from tile 0_0_0.glb, including only the geometries within the tile

```sql
SELECT COUNT(geom) FROM bertt.nantes_reconstructed_buildings WHERE ST_Centroid(ST_Envelope(geom)) && st_transform(ST_MakeEnvelope(-1.847105103048876, 47.14626198148698, -1.497208649149572, 47.384471872766284, 4326), 5698) AND MD5(ST_AsBinary(geom)::text) NOT IN ('9759cdee666f512a0c13df8245b667f9',..1000 items, ...)
SELECT COUNT(geom) FROM bertt.nantes_reconstructed_buildings WHERE ST_Centroid(ST_Envelope(geom)) && st_transform(ST_MakeEnvelope(-1.847105103048876, 47.14626198148698, -1.497208649149572, 47.384471872766284, 4326), 5698) AND MD5(ST_AsBinary(geom)::text) != ALL($1)
```

Note: Using parameterized query with array parameter instead of string concatenation.

Result: 235787

6] Get geometries for tile 1_0_0.glb - 1000 largest geometries in tile 1_0_0 (10 seconds!)
6] Get geometries for tile 1_0_0.glb - 1000 largest geometries in tile 1_0_0

```sql
SELECT ST_AsBinary(st_transform(geom, 4978)), id , MD5(ST_AsBinary(geom)::text) as geom_hash FROM bertt.nantes_reconstructed_buildings where ST_Centroid(ST_Envelope(geom)) && st_transform(ST_MakeEnvelope(-1.847105103048876, 47.14626198148698, -1.497208649149572, 47.384471872766284, 4326), 5698) AND MD5(ST_AsBinary(geom)::text) NOT IN ('9759cdee666f512a0c13df8245b667f9', ..1000 items, ...) ORDER BY ST_Area(ST_Envelope(geom)) DESC LIMIT 1000
SELECT ST_AsBinary(st_transform(geom, 4978)), id , MD5(ST_AsBinary(geom)::text) as geom_hash FROM bertt.nantes_reconstructed_buildings where ST_Centroid(ST_Envelope(geom)) && st_transform(ST_MakeEnvelope(-1.847105103048876, 47.14626198148698, -1.497208649149572, 47.384471872766284, 4326), 5698) AND MD5(ST_AsBinary(geom)::text) != ALL($1) ORDER BY ST_Area(ST_Envelope(geom)) DESC LIMIT 1000
```

Note: Using parameterized query with array parameter instead of string concatenation.

## Issue

List of hashes can get long (maximum (z*1000 items), giving more slow query
List of hashes can get long (maximum z*1000 items). Previously this was handled with string concatenation which could lead to performance issues and potential SQL injection vulnerabilities.

**Solution**: Now using parameterized queries with PostgreSQL's `= ANY()` and `!= ALL()` operators for better performance and security.

## Spatial indexing

Expand All @@ -85,27 +93,42 @@ List of hashes can get long (maximum (z*1000 items), giving more slow query
The queries now use three main patterns:

1] Spatial filtering with MD5 hash exclusion (GetGeometrySubset): WHERE ST_Centroid(ST_Envelope(geom_triangle)) && <envelope>
AND MD5(ST_AsBinary(geom_triangle)::text) NOT IN (<hash_list>)
AND MD5(ST_AsBinary(geom_triangle)::text) != ALL($1)

2] MD5 hash filtering with spatial validation (FilterHashesByEnvelope): WHERE MD5(ST_AsBinary(geom_triangle)::text) IN (<hash_list>)
2] MD5 hash filtering with spatial validation (FilterHashesByEnvelope): WHERE MD5(ST_AsBinary(geom_triangle)::text) = ANY($1)
AND ST_Within(ST_Centroid(ST_Envelope(geom_triangle)), <envelope>)

3] Hash-only filtering (GetGeometriesBoundingBox): WHERE MD5(ST_AsBinary(geom_triangle)::text) IN (<hash_list>)
3] Hash-only filtering (GetGeometriesBoundingBox): WHERE MD5(ST_AsBinary(geom_triangle)::text) = ANY($1)

Performance Notes:

1] The GIST spatial index handles the ST_Centroid(ST_Envelope(geom_triangle)) predicates

2] The MD5 hash BTREE index handles the MD5(ST_AsBinary(geom_triangle)::text) IN/NOT IN predicates
2] The MD5 hash BTREE index handles the MD5(ST_AsBinary(geom_triangle)::text) = ANY/!= ALL predicates

3] PostgreSQL will use both indexes (bitmap index scan) for queries with both predicates

4] Using parameterized queries with ANY/ALL operators provides better performance than string-concatenated IN/NOT IN clauses

Optional: Materialized Hash Column

## Solution

The hash filtering now uses PostgreSQL's `= ANY(@param)` operator with array parameters instead of string concatenation:

1. **Hash Inclusion (IN clause)**: Changed from `MD5(...) IN ('hash1', 'hash2', ...)` to `MD5(...) = ANY(@hashes)` with parameterized array
2. **Hash Exclusion (NOT IN clause)**: Changed from `MD5(...) NOT IN ('hash1', 'hash2', ...)` to `MD5(...) != ALL(@excludeHashes)` with parameterized array

Benefits:
- Eliminates SQL injection risk (even though MD5 hashes are predictable)
- Better performance with large hash lists
- Cleaner, more maintainable code
- Proper use of parameterized queries

## Todo

- idea: make a temporary blacklist table with the to be exluded hashes?
- ~~idea: make a temporary blacklist table with the to be exluded hashes?~~ (Solved using parameterized arrays)

- idea: force use of id column (longs)?

- Other solutions?
- ~~Other solutions?~~ (Implemented using `ANY` and `ALL` operators)
28 changes: 20 additions & 8 deletions src/b3dm.tileset/FeatureCountRepository.cs
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
using System.Collections.Generic;
using System.Linq;
using Npgsql;
using Wkx;

Expand All @@ -9,16 +10,27 @@ public static class FeatureCountRepository
public static int CountFeaturesInBox(NpgsqlConnection conn, string geometry_table, string geometry_column, Point from, Point to, string query, int source_epsg, bool keepProjection = false, HashSet<string> excludeHashes = null)
{
var select = $"COUNT({geometry_column})";
var where = GeometryRepository.GetWhere(geometry_column, from, to, query, source_epsg, keepProjection, excludeHashes);
var where = GeometryRepository.GetWhere(geometry_column, from, to, query, source_epsg, keepProjection);

// Add hash exclusion filter using parameterized query
if (excludeHashes != null && excludeHashes.Count > 0) {
where += $" AND MD5(ST_AsBinary({geometry_column})::text) != ALL(@excludeHashes)";
}

var sql = $"SELECT {select} FROM {geometry_table} WHERE {where}";
conn.Open();
var cmd = new NpgsqlCommand(sql, conn);
var reader = cmd.ExecuteReader();
reader.Read();
var count = reader.GetInt32(0);
reader.Close();
conn.Close();
return count;
try {
using var cmd = new NpgsqlCommand(sql, conn);
if (excludeHashes != null && excludeHashes.Count > 0) {
cmd.Parameters.AddWithValue("excludeHashes", excludeHashes.ToArray());
}
using var reader = cmd.ExecuteReader();
reader.Read();
var count = reader.GetInt32(0);
return count;
}
finally {
conn.Close();
}
}
}
123 changes: 66 additions & 57 deletions src/b3dm.tileset/GeometryRepository.cs
Original file line number Diff line number Diff line change
Expand Up @@ -21,39 +21,39 @@ public static HashSet<string> FilterHashesByEnvelope(NpgsqlConnection conn, stri

var filteredHashes = new HashSet<string>();

var hashList = string.Join(",", geometryHashes.Select(h => $"'{h}'"));

conn.Open();
try {
var envelope = keepProjection ?
$"ST_MakeEnvelope(@xmin, @ymin, @xmax, @ymax, {source_epsg})" :
$"ST_Transform(ST_MakeEnvelope(@xmin, @ymin, @xmax, @ymax, 4326), {source_epsg})";

var query = $@"
SELECT MD5(ST_AsBinary({geometryColumn})::text) as geom_hash
FROM {tableName}
WHERE MD5(ST_AsBinary({geometryColumn})::text) = ANY(@hashes)
AND ST_Within(
ST_Centroid(ST_Envelope({geometryColumn})),
{envelope}
)";

using var cmd = new NpgsqlCommand(query, conn);
cmd.Parameters.AddWithValue("hashes", geometryHashes.ToArray());
cmd.Parameters.AddWithValue("xmin", bbox.XMin);
cmd.Parameters.AddWithValue("ymin", bbox.YMin);
cmd.Parameters.AddWithValue("xmax", bbox.XMax);
cmd.Parameters.AddWithValue("ymax", bbox.YMax);

using var reader = cmd.ExecuteReader();
while (reader.Read()) {
var hash = reader.GetString(0);
filteredHashes.Add(hash);
}

var envelope = keepProjection ?
$"ST_MakeEnvelope(@xmin, @ymin, @xmax, @ymax, {source_epsg})" :
$"ST_Transform(ST_MakeEnvelope(@xmin, @ymin, @xmax, @ymax, 4326), {source_epsg})";

var query = $@"
SELECT MD5(ST_AsBinary({geometryColumn})::text) as geom_hash
FROM {tableName}
WHERE MD5(ST_AsBinary({geometryColumn})::text) in ({hashList})
AND ST_Within(
ST_Centroid(ST_Envelope({geometryColumn})),
{envelope}
)";

using var cmd = new NpgsqlCommand(query, conn);
cmd.Parameters.AddWithValue("hashes", geometryHashes.ToArray());
cmd.Parameters.AddWithValue("xmin", bbox.XMin);
cmd.Parameters.AddWithValue("ymin", bbox.YMin);
cmd.Parameters.AddWithValue("xmax", bbox.XMax);
cmd.Parameters.AddWithValue("ymax", bbox.YMax);

using var reader = cmd.ExecuteReader();
while (reader.Read()) {
var hash = reader.GetString(0);
filteredHashes.Add(hash);
return filteredHashes;
}
finally {
conn.Close();
}

conn.Close();

return filteredHashes;
}

/// <summary>
Expand All @@ -65,23 +65,24 @@ public static double[] GetGeometriesBoundingBox(NpgsqlConnection conn, string ge
$"select st_Asbinary(st_3dextent({geometry_column})) ":
$"select st_Asbinary(st_3dextent(st_transform({geometry_column}, 4979))) ";

var hashList = string.Join(",", tileHashes.Select(h => $"'{h}'"));

var sqlWhere = $" MD5(ST_AsBinary({geometry_column})::text) in ({hashList})";
var sqlWhere = $" MD5(ST_AsBinary({geometry_column})::text) = ANY(@hashes)";
var sql = $"{sqlSelect} from {geometry_table} where {sqlWhere}";

conn.Open();
var cmd = new NpgsqlCommand(sql, conn);
var reader = cmd.ExecuteReader();
reader.Read();
var stream = reader.GetStream(0);
var geometry = Geometry.Deserialize<WkbSerializer>(stream);
var result = BBox3D.GetBoundingBoxPoints(geometry);

reader.Close();
conn.Close();
try {
using var cmd = new NpgsqlCommand(sql, conn);
cmd.Parameters.AddWithValue("hashes", tileHashes.ToArray());
using var reader = cmd.ExecuteReader();
reader.Read();
var stream = reader.GetStream(0);
var geometry = Geometry.Deserialize<WkbSerializer>(stream);
var result = BBox3D.GetBoundingBoxPoints(geometry);

return result;
return result;
}
finally {
conn.Close();
}
}

public static List<GeometryRecord> GetGeometrySubset(NpgsqlConnection conn, string geometry_table, string geometry_column, double[] bbox, int source_epsg, int target_srs, string shaderColumn = "", string attributesColumns = "", string query = "", string radiusColumn = "", bool keepProjection = false, HashSet<string> excludeHashes = null, int? maxFeatures = null)
Expand All @@ -92,16 +93,33 @@ public static List<GeometryRecord> GetGeometrySubset(NpgsqlConnection conn, stri
// todo: fix unit test when there is no z
var points = GetPoints(bbox);

var sqlWhere = GetWhere(geometry_column, points.fromPoint, points.toPoint, query, source_epsg, keepProjection, excludeHashes);
var sqlWhere = GetWhere(geometry_column, points.fromPoint, points.toPoint, query, source_epsg, keepProjection);

// Add hash exclusion filter using parameterized query
if (excludeHashes != null && excludeHashes.Count > 0) {
sqlWhere += $" AND MD5(ST_AsBinary({geometry_column})::text) != ALL(@excludeHashes)";
}

var sqlOrderBy = GetOrderBy(geometry_column);
var sqlLimit = maxFeatures.HasValue ? $" LIMIT {maxFeatures.Value}" : "";
var sql = sqlselect + sqlFrom + " where " + sqlWhere + sqlOrderBy + sqlLimit;

var geometries = GetGeometries(conn, shaderColumn, attributesColumns, sql, radiusColumn, geometry_column);
return geometries;
conn.Open();
try {
using var cmd = new NpgsqlCommand(sql, conn);
if (excludeHashes != null && excludeHashes.Count > 0) {
cmd.Parameters.AddWithValue("excludeHashes", excludeHashes.ToArray());
}

var geometries = GetGeometries(cmd, shaderColumn, attributesColumns, radiusColumn, geometry_column);
return geometries;
}
finally {
conn.Close();
}
}

public static string GetWhere(string geometry_column, Point from, Point to, string query, int source_epsg, bool keepProjection, HashSet<string> excludeHashes = null)
public static string GetWhere(string geometry_column, Point from, Point to, string query, int source_epsg, bool keepProjection)
{
var fromX = from.X.Value.ToString(CultureInfo.InvariantCulture);
var fromY = from.Y.Value.ToString(CultureInfo.InvariantCulture);
Expand Down Expand Up @@ -130,12 +148,6 @@ public static string GetWhere(string geometry_column, Point from, Point to, stri
$"ST_3DIntersects({geom}, st_transform(ST_3DMakeBox({fromBox}, {toBox}), {source_epsg})) {query}";
}

// Add hash exclusion filter
if (excludeHashes != null && excludeHashes.Count > 0) {
var hashList = string.Join(",", excludeHashes.Select(h => $"'{h}'"));
where += $" AND MD5(ST_AsBinary({geometry_column})::text) NOT IN ({hashList})";
}

return where;
}

Expand Down Expand Up @@ -168,11 +180,9 @@ public static string GetOrderBy(string geometry_column)
return $" ORDER BY ST_Area(ST_Envelope({geometry_column})) DESC";
}

public static List<GeometryRecord> GetGeometries(NpgsqlConnection conn, string shaderColumn, string attributesColumns, string sql, string radiusColumn, string geometry_column = "")
public static List<GeometryRecord> GetGeometries(NpgsqlCommand cmd, string shaderColumn, string attributesColumns, string radiusColumn, string geometry_column = "")
{
var geometries = new List<GeometryRecord>();
conn.Open();
var cmd = new NpgsqlCommand(sql, conn);
var reader = cmd.ExecuteReader();
var attributesColumnIds = new Dictionary<string, int>();
var shadersColumnId = int.MinValue;
Expand Down Expand Up @@ -230,7 +240,6 @@ public static List<GeometryRecord> GetGeometries(NpgsqlConnection conn, string s
}

reader.Close();
conn.Close();
return geometries;
}

Expand Down
Loading