Update to PyArrow v21 and native GeoParquet outputs #4601

@zaneselvans

Overview

Success Criteria

  • We have consolidated our Parquet & GeoParquet IO Managers into a single generic Parquet IO Manager that can handle both spatial and non-spatial outputs.
  • Our spatial and non-spatial Parquet outputs are easy to read with Pandas/GeoPandas, DuckDB (with its spatial extension), R, and Polars.
  • Our spatial and non-spatial Parquet outputs still work with our Kaggle notebooks and their Python environment, or we have moved our notebooks to another setup where we can specify a more up-to-date Python environment.
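
As a rough illustration of the first criterion, a single IO manager could tell spatial from non-spatial files by checking for the `geo` key that the GeoParquet specification stores in the Parquet file-level key-value metadata. The sketch below is only that: the function names `geo_metadata` and `choose_reader` are hypothetical, the sample metadata is hand-written, and it assumes the metadata arrives as the bytes-to-bytes mapping that PyArrow exposes on a schema. It is not the project's actual IO manager.

```python
import json


def geo_metadata(schema_metadata):
    """Return the parsed GeoParquet "geo" metadata, or None for plain Parquet.

    Assumption: schema_metadata is a bytes-to-bytes mapping like the one
    PyArrow exposes as Schema.metadata, or None when no metadata is present.
    """
    if not schema_metadata:
        return None
    raw = schema_metadata.get(b"geo")
    if raw is None:
        return None
    # The GeoParquet spec stores this value as UTF-8 encoded JSON.
    return json.loads(raw)


def choose_reader(schema_metadata):
    """Hypothetical dispatch: name the reader a generic IO manager might use."""
    if geo_metadata(schema_metadata) is None:
        return "pandas.read_parquet"
    return "geopandas.read_parquet"


# Hand-written example metadata for illustration, not taken from a real output.
spatial = {
    b"geo": json.dumps(
        {
            "version": "1.1.0",
            "primary_column": "geometry",
            "columns": {"geometry": {"encoding": "WKB", "geometry_types": []}},
        }
    ).encode("utf-8")
}
plain = {b"pandas": b"{}"}

print(choose_reader(spatial))
print(choose_reader(plain))
print(choose_reader(None))
```

Keeping the check on file metadata rather than on the asset name means the same code path can serve both kinds of output, which is the point of consolidating the two IO managers.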

Metadata

Assignees

No one assigned

    Labels

    • dependencies: Pull requests that update a dependency file
    • geospatial: Spatial data and transformations. Anything related to mapping.
    • kaggle: Sharing our data and analysis with the Kaggle community
    • parquet: Issues related to the Apache Parquet file format which we use for long tables.

    Type

    No type

    Projects

    Status

    Blocked

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
