Add geoparquet as an optional file format for the export #4

@datadavev

Geoparquet [1][2] is a compressed spatial data format that is convenient for consumers and is becoming widely supported.

The task here is to enable geoparquet as an export file format for iSamples.

Tooling for creating geoparquet is still a bit dynamic, but the following approach worked for me (there are likely optimizations that could be done).

  1. Retrieve the records as JSON Lines
  2. Load the JSON Lines file into geopandas [3]
  3. Export from geopandas to geoparquet [4]

This is the code I used. I could not determine whether it requires loading the entire dataset into memory for processing, which may be an issue if it runs on the server:

import pandas as pd
import geopandas as gpd

src = "smithsonian"
json_src = f"{src}.jsonl"

# Read the JSON Lines export, one record per line
with open(json_src, "r") as json_file:
    df = pd.read_json(json_file, lines=True)

# Build point geometries from the sampling site coordinates (WGS 84)
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(
        df.producedBy_samplingSite_location_longitude,
        df.producedBy_samplingSite_location_latitude,
    ),
    crs="EPSG:4326",
)

# Write the GeoDataFrame out as geoparquet
gdf.to_parquet(f"{src}_geo.parquet")

I think dependencies were:

pip install pandas
pip install geopandas
pip install geoarrow-pyarrow geoarrow-pandas
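
Since the dependency list is from memory, a quick check of what is importable in a given environment may help. This is a small sketch using only the standard library. The module names are assumptions: `geoarrow.pyarrow` and `geoarrow.pandas` are the import names I would expect for the two geoarrow packages, and `pyarrow` is included because `GeoDataFrame.to_parquet` relies on it. The helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised for dotted names when the parent package is absent
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    wanted = ["pandas", "geopandas", "pyarrow", "geoarrow.pyarrow", "geoarrow.pandas"]
    print("missing:", missing_modules(wanted))
```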

Footnotes

  1. https://geoparquet.org/

  2. https://getindata.com/blog/introducing-geoparquet-data-format/

  3. https://geopandas.org/en/stable/gallery/create_geopandas_from_pandas.html

  4. https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.to_parquet.html
