Blog post topic: Encodings #18

@taddyb

Something that has recently intrigued me is the choice of data format for files. Whether it's CSV, JSON, pickle, Zarr, netCDF, Parquet, Arrow, COG, or Icechunk, there isn't a right or wrong answer, just trade-offs in what each format can and can't do.

I'd like to dive deep into the encodings and backends that explain why each of these formats shines in some ways and not others. For example, Parquet files are column-oriented and store their schema, while Zarr/Icechunk data is chunked and compressed.
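A toy sketch of the two ideas above, using only Python's standard `struct` and `zlib` modules: the same table packed row by row versus column by column, and a column split into independently compressed chunks. This is an illustration under stated assumptions, not how Parquet or Zarr actually lay out bytes. Real formats add much more (per-column encodings, metadata, pluggable codecs). The table, its column names, and the chunk size are hypothetical.

```python
import struct
import zlib

# Hypothetical toy table: 1000 records with (id, temp, station).
ids = list(range(1000))
temps = [20.0 + (i % 10) * 0.5 for i in range(1000)]
stations = [i % 4 for i in range(1000)]

RECORD = struct.Struct("<idi")  # int32, float64, int32, no padding -> 16 bytes


def row_oriented(ids, temps, stations):
    """Pack each record contiguously: id, temp, station, id, temp, station, ..."""
    buf = bytearray()
    for i, t, s in zip(ids, temps, stations):
        buf += RECORD.pack(i, t, s)
    return bytes(buf)


def column_oriented(ids, temps, stations):
    """Pack each column contiguously, one byte buffer per column."""
    return {
        "id": struct.pack(f"<{len(ids)}i", *ids),
        "temp": struct.pack(f"<{len(temps)}d", *temps),
        "station": struct.pack(f"<{len(stations)}i", *stations),
    }


def read_temps_row(buf, n):
    # Row layout: every 16-byte record is visited just to pull out 8 bytes.
    return [RECORD.unpack_from(buf, k * RECORD.size)[1] for k in range(n)]


def read_temps_col(cols):
    # Column layout: only the temperature buffer is touched.
    data = cols["temp"]
    return list(struct.unpack(f"<{len(data) // 8}d", data))


def chunk_and_compress(data: bytes, chunk_size: int):
    """Split a buffer into fixed-size chunks and compress each independently."""
    return [zlib.compress(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def read_chunk(chunks, index):
    """Decompress only the one chunk that is needed."""
    return zlib.decompress(chunks[index])


row_buf = row_oriented(ids, temps, stations)
cols = column_oriented(ids, temps, stations)
temp_chunks = chunk_and_compress(cols["temp"], chunk_size=800)  # 100 doubles per chunk
print(len(row_buf), len(cols["temp"]), len(temp_chunks))  # prints: 16000 8000 10
```

Both layouts hold the same values, but a query that only needs `temp` reads 8000 contiguous bytes in the column layout instead of striding through all 16000. Chunking means a reader can decompress just the slice it needs (e.g. `read_chunk(temp_chunks, 3)` yields `temps[300:400]`) rather than the whole array.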

I think this would help people understand why to choose a given format, rather than "using something because everyone else does."
