Add more granular Geometry / Geography support to the implementation status page #158

@alamb

This ticket tries to capture the discussion among @steveloughran, @csringhofer, me, and others on #156 (review)

It's been pointed out to me that the coverage matrix doesn't cover statistics/geometry bounding, without which predicate pushdown doesn't work: every rowgroup with the column needs scanning.

The core point, as I understand it, is that several features must be implemented in software libraries to realize the full benefits of the new Geometry and Geography types in Parquet. Specifically mentioned were:

  • Logical type annotation (to know which columns hold Geometry and Geography types); this is what the page currently reflects
  • Statistics implementation (e.g. the bounding boxes, and potentially different algorithms to compute them)
  • Query engine implementation (e.g. using the bounding box statistics to prune Parquet files at query time)

There are probably more
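
To make the statistics and query engine items concrete, here is a minimal sketch of how bounding box statistics could let a reader skip row groups. It is not taken from any Parquet library: `BBox`, `row_groups_to_scan`, and the idea of passing one optional box per row group are hypothetical names chosen for illustration, and the sketch ignores details such as boxes that wrap around the antimeridian.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BBox:
    """Hypothetical 2D bounding box, standing in for per-row-group statistics."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Two axis-aligned boxes overlap only if they overlap on both axes.
        return (
            self.xmin <= other.xmax
            and other.xmin <= self.xmax
            and self.ymin <= other.ymax
            and other.ymin <= self.ymax
        )


def row_groups_to_scan(stats: List[Optional[BBox]], query: BBox) -> List[int]:
    """Return the indices of row groups that may contain matching geometries.

    A row group with no bounding box (None) cannot be ruled out, so it must
    always be scanned. This is the "no statistics" case described above.
    """
    return [
        i
        for i, bbox in enumerate(stats)
        if bbox is None or bbox.intersects(query)
    ]


# Three row groups: the second was written without bounding box statistics.
stats = [BBox(0, 0, 10, 10), None, BBox(20, 20, 30, 30)]
query = BBox(5, 5, 6, 6)
selected = row_groups_to_scan(stats, query)
print(selected)
```

In this sketch the third row group is skipped because its box does not overlap the query, while the second is scanned only because the writer produced no statistics for it. If no row group had statistics, every one of them would be scanned.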

Suggestions

One idea is to add more specific detail to https://parquet.apache.org/docs/file-format/implementationstatus/ .

For example, it might be appropriate to add a specific line for the geography/geometry statistics.

In addition to making the current implementation status clearer, red X's on the page seem to have the effect of encouraging additional ecosystem adoption.
