This ticket tries to capture the discussion between @steveloughran, @csringhofer, myself, and others on #156 (review).
It's been pointed out to me that the coverage matrix doesn't cover statistics/geometry bounding boxes, without which predicate pushdown doesn't work: every row group containing the column has to be scanned.
The core point, as I understand it, is that several features must be implemented in software libraries to realize the full benefits of the new Geometry and Geography types in Parquet. Specifically mentioned were:
- Logical type annotation (to know what columns hold Geometry and Geography types) <-- this is what the page currently reflects
- Statistics implementation (e.g. the bounding boxes, and potentially different algorithms to compute them)
- Query engine implementation (e.g. using the bounding box statistics to prune Parquet files at query time)
There are probably more.
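To make the dependency between the statistics and query engine items concrete, here is a minimal sketch of bounding-box pruning. It is not any library's actual API: `BBox` and `row_groups_to_scan` are hypothetical names, the statistics are plain Python values rather than something read from a real file footer, and it assumes simple planar boxes with no antimeridian wraparound handling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BBox:
    """Hypothetical stand-in for a per-row-group bounding box statistic."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Two axis-aligned boxes overlap only if they overlap on both axes.
        return (
            self.xmin <= other.xmax and other.xmin <= self.xmax
            and self.ymin <= other.ymax and other.ymin <= self.ymax
        )


def row_groups_to_scan(stats: list[Optional[BBox]], query: BBox) -> list[int]:
    """Return indices of row groups that may contain matching geometries.

    A row group whose statistic is missing (None) can never be skipped,
    which is why a writer that omits bounding boxes defeats pushdown.
    """
    return [
        i for i, bbox in enumerate(stats)
        if bbox is None or bbox.intersects(query)
    ]


# Three row groups: one overlapping the query, one disjoint, one without stats.
stats = [BBox(0, 0, 10, 10), BBox(20, 20, 30, 30), None]
query = BBox(5, 5, 8, 8)
selected = row_groups_to_scan(stats, query)
print(selected)
```

In this sketch the disjoint row group is skipped, while the one without statistics still has to be read. A reader that understands the logical type annotation but not the statistics would be in the same position for every row group.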
Suggestions
One idea is to add more specific detail to https://parquet.apache.org/docs/file-format/implementationstatus/ .
Perhaps it would be appropriate to add a specific line for the geography/geometry statistics, for example
In addition to making the current implementation status clearer, red X's on the page seem to have the effect of pressuring additional ecosystem adoption.