Skip to content

Add examples for how to use DuckDB to query geoparquet files #16

@rdhyee

Description

@rdhyee

I managed to export the Smithonsian records with

isample export -j $ISAMPLES_JWT -f geoparquet -d /tmp -q 'source:SMITHSONIAN'

Next challenge is how to do simple spatial queries. I installed duckdb on my Mac:

brew install duckdb

I installed duckdb extensions as suggested by cholmes/duckdb-geoparquet-tutorials:

INSTALL spatial;
LOAD spatial;
INSTALL httpfs;
LOAD httpfs;

I then

CREATE TABLE geoparquet_data AS SELECT * FROM read_parquet('isamples_export_2024_06_07_07_40_00_geo.parquet');
DESCRIBE geoparquet_data
  ;

to get

┌──────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├──────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ sample_identifier │ VARCHAR │ YES │ │ │ │
│ label │ VARCHAR │ YES │ │ │ │
│ description │ VARCHAR │ YES │ │ │ │
│ source_collection │ VARCHAR │ YES │ │ │ │
│ has_specimen_categ… │ STRUCT("label" VARCHAR)[] │ YES │ │ │ │
│ has_material_categ… │ STRUCT("label" VARCHAR)[] │ YES │ │ │ │
│ has_context_category │ STRUCT("label" VARCHAR)[] │ YES │ │ │ │
│ keywords │ STRUCT(keyword VARCHAR)[] │ YES │ │ │ │
│ produced_by │ STRUCT(description VARCHAR, has_feature_of_interest VARCHAR, identifier VARCHAR, "label" VARCHAR, responsibility STRUCT("name" VARCHAR, "role" VARCHAR)[], result_time VARCHAR, sampling_site STRUCT(desc… │ YES │ │ │ │
│ registrant │ STRUCT("name" VARCHAR) │ YES │ │ │ │
│ sample_location_lo… │ DOUBLE │ YES │ │ │ │
│ sample_location_la… │ DOUBLE │ YES │ │ │ │
│ geometry │ BLOB │ YES │ │ │ │
├──────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────┤
│ 13 rows 6 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

but I'm trying to a spatial extent query:

D SELECT ST_Extent(geometry) AS extent FROM geoparquet_data;
Binder Error: No function matches the given name and argument types 'ST_Extent(BLOB)'. You might need to add explicit type casts.
	Candidate functions:
	ST_Extent(GEOMETRY) -> BOX_2D

LINE 1: SELECT ST_Extent(geometry) AS extent FROM geop...
               ^
D SELECT ST_Extent(CAST(geometry AS BOX_2D)) AS extent FROM geoparquet_data;
Conversion Error: Unimplemented type for cast (BLOB -> BOX_2D)
LINE 1: SELECT ST_Extent(CAST(geometry AS BOX_2D)) AS extent FRO...
                         ^
D SELECT ST_Extent(CAST(geometry AS GEOMETRY)) AS extent FROM geoparquet_data;
Conversion Error: Unimplemented type for cast (BLOB -> GEOMETRY)
LINE 1: SELECT ST_Extent(CAST(geometry AS GEOMETRY)) AS extent F...

How to fix?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions