Open
Labels
- experimental: This issue is related to an experimental new feature, test, or integration
- help wanted: Extra attention is needed
- iceberg: This issue is related to Apache Iceberg catalog support
Description
Apache Beam currently supports writing batch or streaming data files (i.e. appends) to Iceberg (see apache/beam#30797) and may eventually support writing delete files to Iceberg (presumably as part of apache/beam#33608).
DeltaCAT's Delete Converter can be used to:
- Treat Beam writes to Iceberg as upserts (today).
- Convert Beam equality deletes to positional/vector deletes (pending support for writing Iceberg equality deletes in Apache Beam).
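To make the two modes above concrete, here is a minimal, library-free sketch of the underlying idea. It does not use the DeltaCAT or Iceberg APIs: the function names, the in-memory `data_files` layout, and the `key_columns` parameter are all hypothetical and chosen for illustration only. An equality delete identifies rows by key value, while a positional delete identifies them by (file path, row position); treating an append as an upsert is modeled here as an implicit equality delete on the keys of the newly written rows.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def equality_to_positional_deletes(
    data_files: Dict[str, List[Row]],
    equality_deletes: List[Row],
    key_columns: List[str],
) -> List[Tuple[str, int]]:
    """Resolve key-based deletes into (file path, row position) pairs.

    Hypothetical helper: `data_files` maps a file path to its rows in order.
    """
    deleted_keys = {tuple(d[c] for c in key_columns) for d in equality_deletes}
    positions = []
    for path, rows in data_files.items():
        for pos, row in enumerate(rows):
            if tuple(row[c] for c in key_columns) in deleted_keys:
                positions.append((path, pos))
    return positions


def upsert_positional_deletes(
    existing_files: Dict[str, List[Row]],
    appended_rows: List[Row],
    key_columns: List[str],
) -> List[Tuple[str, int]]:
    """Treat an append as an upsert: older rows sharing a key are deleted."""
    return equality_to_positional_deletes(existing_files, appended_rows, key_columns)


existing = {
    "file-a.parquet": [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}],
    "file-b.parquet": [{"id": 3, "v": "c"}, {"id": 1, "v": "d"}],
}
# An explicit equality delete on id == 1.
explicit = equality_to_positional_deletes(existing, [{"id": 1}], ["id"])
# An append of a new row with id == 2, interpreted as an upsert.
implicit = upsert_positional_deletes(existing, [{"id": 2, "v": "z"}], ["id"])
```

A real converter would work against Iceberg data and delete files rather than in-memory lists; the sketch only shows why positional deletes can be derived from either explicit equality deletes or plain appends once a key is known.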
A native Python Beam IO Connector for DeltaCAT would expand on these capabilities to let Beam/Dataflow users:
- Automatically produce Iceberg positional/vector deletes inline as part of their writes (rather than requiring a separate Iceberg table commit to produce them).
- Configure automatic Iceberg positional/vector delete production during synchronization from a DeltaCAT native metastore to Iceberg. This lets users buffer high-frequency streaming writes in a DeltaCAT native metastore and then synchronize them to Iceberg as a single batch, avoiding Iceberg concurrent write conflicts, slower write serialization times, rapid O(N^2) Iceberg table metadata growth, etc.
- Read directly from, and write directly to, the DeltaCAT metastore (also via Ray Data, Daft, Pandas, Polars, Pyarrow, or NumPy).
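The metadata growth point in the list above can be illustrated with a back-of-the-envelope model. This is a simplified sketch, not a measurement: it assumes each Iceberg commit writes a new table metadata file that lists every snapshot retained so far, and that no snapshots are expired. The function names and the batch size are hypothetical.

```python
def total_snapshot_entries_written(num_commits: int) -> int:
    """Sum of snapshot entries across all metadata files written so far.

    Commit k writes a metadata file listing k snapshots, so the total over
    N commits is 1 + 2 + ... + N = N * (N + 1) / 2, i.e. O(N^2).
    """
    return sum(range(1, num_commits + 1))


def commits_after_buffering(num_writes: int, batch_size: int) -> int:
    """Number of Iceberg commits if writes are synchronized in batches."""
    return -(-num_writes // batch_size)  # ceiling division


writes = 1000
unbuffered = total_snapshot_entries_written(writes)
buffered = total_snapshot_entries_written(commits_after_buffering(writes, 100))
print(unbuffered, buffered)
```

Under these assumptions, committing each of 1000 writes individually writes 500500 snapshot entries in total, while synchronizing in batches of 100 writes only 55, which is the motivation for buffering before synchronization.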
Zyiqin-Miranda