Feature description
Hi! The Parquet spec now contains a GEOMETRY logical type with basic CRS and bbox support 🙂. The main advantage is that files written naively (e.g., by pyarrow) now get row group-level bounding boxes, even for users who don't know about the GeoParquet bbox covering or non-WKB encodings, or whose tools don't support them. It would be great to be able to read and/or write this from OGR at some point!
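As a rough illustration of why row group-level bboxes matter for a reader applying a spatial filter, here's a minimal sketch of row group pruning. It assumes the per-row-group bboxes have already been pulled out of the column chunk statistics; the `BBox` class and `row_groups_to_read` helper are hypothetical, not an OGR or Arrow API. It ignores Z/M ranges and treats every box as a simple planar rectangle.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BBox:
    """Hypothetical axis-aligned 2D bounding box (planar, no wraparound)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap (or touch) unless one lies entirely
        # to the left/right of or above/below the other.
        return not (
            self.xmax < other.xmin
            or other.xmax < self.xmin
            or self.ymax < other.ymin
            or other.ymax < self.ymin
        )


def row_groups_to_read(stats: List[Optional[BBox]], query: BBox) -> List[int]:
    """Return indices of row groups whose bbox may overlap the query window.

    A row group without bbox statistics (None) can't be ruled out,
    so it is always kept.
    """
    selected = []
    for i, bbox in enumerate(stats):
        if bbox is None or bbox.intersects(query):
            selected.append(i)
    return selected


if __name__ == "__main__":
    # Made-up per-row-group bboxes standing in for real column statistics.
    stats = [
        BBox(0.0, 0.0, 10.0, 10.0),
        BBox(20.0, 20.0, 30.0, 30.0),
        None,
    ]
    print(row_groups_to_read(stats, BBox(5.0, 5.0, 15.0, 15.0)))  # [0, 2]
```

Here the second row group is skipped without decoding any WKB, which is the payoff of having these statistics written by default.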
Additional context
The PR adding this to Parquet C++ was recently merged (apache/arrow#45459), and test files are available in the parquet-testing repo (https://github.com/apache/parquet-testing/tree/master/data/geospatial) and for every item in the (freshly revamped) geoarrow-data repo (https://github.com/geoarrow/geoarrow-data). This should ship in Arrow 21 (around late June), and there's an example of writing an arbitrary GeoPandas data frame and examining the resulting statistics here: https://gist.github.com/paleolimbot/1fc8e3de99d11d968e7ac01e0298c297
I imagine the DuckDB mechanism for reading GeoParquet will work with little modification once DuckDB supports this. Arrow C++ will hopefully not be too onerous either, since the concepts in theory align with GeoParquet. The Arrow reader should give you `geoarrow.wkb` output (with CRS) if you set `ArrowReaderProperties::arrow_extensions_enabled(true)` AND somebody has registered the GeoArrow extension type. Otherwise, the column will be read as binary WKB, and the logical type annotation and statistics are still available from the Parquet schema and column metadata, respectively. There's talk of enabling Arrow extensions by default and of adding `geoarrow.wkb` as a "canonical extension type" (which would mean it's always registered), although I don't know if either will be resolved before Arrow 21 is released.
cc @cholmes, who put me up to this!