Spectrum / Athena Support #22

@norton120

Description

During the publish phase we have everything we need to create an external schema in Redshift (Spectrum) or register the table metadata for Athena. Since we know the data is in AWS, this would be a powerful addition to the current functionality.

Pseudocode

parq.register(target="Redshift")
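
To make the pseudocode a bit more concrete, here is a rough sketch of the kind of statement a Redshift registration step might emit. The function name, its parameters, and the choice to reuse the schema name as the Data Catalog database name are all assumptions for illustration, not an existing API in this project; only the `CREATE EXTERNAL SCHEMA` statement shape comes from Redshift itself.

```python
def redshift_external_schema_ddl(schema: str, iam_role_arn: str) -> str:
    """Build a CREATE EXTERNAL SCHEMA statement for Redshift Spectrum.

    Hypothetical helper: assumes the Data Catalog database shares the
    external schema's name, and that the caller supplies an IAM role
    that can read the bucket and the catalog.
    """
    if not schema.replace("_", "").isalnum():
        # Keep the sketch safe: only allow simple identifiers.
        raise ValueError(f"unsupported schema name: {schema!r}")
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG DATABASE '{schema}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


if __name__ == "__main__":
    # The role ARN below is a placeholder, not a real role.
    print(
        redshift_external_schema_ddl(
            "bananabucket_this_is_a_prefix",
            "arn:aws:iam::123456789012:role/placeholder-spectrum-role",
        )
    )
```

The table itself would still need to be registered separately (columns, partition keys, and the S3 location), which this sketch leaves out.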

Why?

Registering a schema at publish makes the written data immediately queryable from any SQL workbench tool. We should standardize that the external schema is everything in the path leading up to the dataset name, joined with underscores, and the table is the dataset name. So for the path
s3://bananabucket/this/is/a/prefix/dataset/id=123/name=steve/asf809dg8jkljsd12.parquet
the external schema to register would be bananabucket_this_is_a_prefix and the table would be dataset. Querying it via Spectrum / Athena would then be
SELECT * FROM bananabucket_this_is_a_prefix.dataset WHERE id > 122 ... WOAH.
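
A minimal sketch of the naming rule described above, assuming the dataset name is the last path segment before the first `key=value` partition directory (or before the `.parquet` file when there are no partitions). The function name `schema_and_table` is hypothetical and not part of the current codebase.

```python
from typing import Tuple
from urllib.parse import urlparse


def schema_and_table(s3_path: str) -> Tuple[str, str]:
    """Derive (external_schema, table) from an S3 object path."""
    parsed = urlparse(s3_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// path: {s3_path!r}")

    # Collect directory segments up to the first partition dir or data file.
    dirs = []
    for segment in parsed.path.split("/"):
        if not segment:
            continue
        if "=" in segment or segment.endswith(".parquet"):
            break
        dirs.append(segment)

    if not dirs:
        raise ValueError(f"no dataset name found in: {s3_path!r}")

    # Everything before the dataset name, plus the bucket, forms the schema.
    *prefix, table = dirs
    schema = "_".join([parsed.netloc, *prefix])
    return schema, table


if __name__ == "__main__":
    path = (
        "s3://bananabucket/this/is/a/prefix/dataset/"
        "id=123/name=steve/asf809dg8jkljsd12.parquet"
    )
    print(schema_and_table(path))
```

For the example path this yields the schema bananabucket_this_is_a_prefix and the table dataset. Bucket names or prefixes containing characters such as hyphens or dots would need extra sanitizing before they could be used as schema names, which this sketch does not attempt.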

Labels: enhancement (New feature or request)
