Implement Apache Iceberg Sink #5

@valdo404

Apache Iceberg Sink Implementation

Description

Implement Apache Iceberg table format support for the S3 sink connector, enabling modern data lake capabilities with atomic transactions, schema evolution, and partition management.

Tasks

  • Implement basic Apache Iceberg table format support
  • Add support for Iceberg metadata files and structure
  • Implement commit coordination through Kafka control topics
  • Support table creation and schema evolution
  • Add multi-table fan-out capabilities
  • Implement exactly-once semantics for Iceberg tables
  • Support AWS Glue Catalog integration for Iceberg tables
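One way commit coordination and exactly-once delivery could fit together is sketched below. It assumes workers report each written data file along with the Kafka offset range it covers, and that the coordinator records the highest committed offset per partition together with the table commit. Every type and function name here (`DataFileReport`, `TableState`, `Coordinator`) is hypothetical, and the table is an in-memory stand-in rather than a real Iceberg table or control topic.

```rust
use std::collections::HashMap;

/// A data file written by a worker, plus the Kafka offsets it covers.
/// Hypothetical type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct DataFileReport {
    pub path: String,
    pub partition: i32,    // Kafka partition the records came from
    pub first_offset: i64, // inclusive
    pub last_offset: i64,  // inclusive
}

/// In-memory stand-in for an Iceberg table: the committed files and the
/// highest Kafka offset per partition stored with the last commit.
#[derive(Debug, Default)]
pub struct TableState {
    pub files: Vec<String>,
    pub committed_offsets: HashMap<i32, i64>,
}

/// Collects worker reports and turns them into a single table commit.
#[derive(Debug, Default)]
pub struct Coordinator {
    pending: Vec<DataFileReport>,
}

impl Coordinator {
    /// Called when a worker's report arrives (in a real connector this would
    /// be read from the control topic).
    pub fn receive(&mut self, report: DataFileReport) {
        self.pending.push(report);
    }

    /// Commits all pending files and returns how many were added.
    /// Files whose offsets are already covered by an earlier commit are
    /// treated as replays and skipped, which is what prevents duplicates.
    /// Assumption: workers resume from the committed offset after a restart,
    /// so a file never partially overlaps an earlier commit.
    pub fn commit(&mut self, table: &mut TableState) -> usize {
        let mut reports = std::mem::take(&mut self.pending);
        reports.sort_by_key(|r| (r.partition, r.first_offset));

        let mut added = 0;
        for report in reports {
            let committed = table
                .committed_offsets
                .get(&report.partition)
                .copied()
                .unwrap_or(-1);
            if report.last_offset <= committed {
                continue; // already committed: replayed report
            }
            table
                .committed_offsets
                .insert(report.partition, report.last_offset);
            table.files.push(report.path);
            added += 1;
        }
        added
    }
}
```

Storing the offsets in the same commit as the files is the design choice that matters: after a crash, the table itself says where to resume, so there is no window in which data is committed but offsets are not.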

Technical Details

  • Evaluate existing Rust libraries for Iceberg or implement bindings to Java libraries
  • Implement efficient memory management for Iceberg metadata
  • Support Parquet as the underlying file format
  • Implement Iceberg's transaction protocol for atomic commits
  • Add configuration options matching the original Kafka Connect Iceberg connector
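The atomic-commit item can be illustrated with a minimal sketch of an optimistic commit loop: the writer builds new metadata from the version it read, then asks the catalog to swap the metadata pointer only if the table is still at that version, and retries from fresh metadata otherwise. `InMemoryCatalog`, `TableMetadata`, and `append_files` are hypothetical names, and the mutex-guarded struct stands in for whatever atomic swap the real catalog provides.

```rust
use std::sync::Mutex;

/// Simplified table metadata: a version counter and the list of data files.
/// Hypothetical type, far smaller than real Iceberg metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct TableMetadata {
    pub version: u64,
    pub data_files: Vec<String>,
}

/// In-memory stand-in for a catalog that holds the current metadata pointer.
pub struct InMemoryCatalog {
    current: Mutex<TableMetadata>,
}

impl InMemoryCatalog {
    pub fn new() -> Self {
        InMemoryCatalog {
            current: Mutex::new(TableMetadata { version: 0, data_files: Vec::new() }),
        }
    }

    /// Returns a copy of the current metadata.
    pub fn load(&self) -> TableMetadata {
        self.current.lock().unwrap().clone()
    }

    /// Replaces the metadata only if the table is still at `expected_version`.
    /// Returns false when another writer committed first.
    pub fn compare_and_swap(&self, expected_version: u64, new: TableMetadata) -> bool {
        let mut current = self.current.lock().unwrap();
        if current.version != expected_version {
            return false;
        }
        *current = new;
        true
    }
}

/// Appends files with optimistic concurrency: on conflict, reload the latest
/// metadata and try again, up to `max_retries` extra attempts.
pub fn append_files(
    catalog: &InMemoryCatalog,
    new_files: &[String],
    max_retries: usize,
) -> Result<u64, String> {
    for _ in 0..=max_retries {
        let base = catalog.load();
        let mut data_files = base.data_files.clone();
        data_files.extend_from_slice(new_files);
        let new_version = base.version + 1;
        let candidate = TableMetadata { version: new_version, data_files };
        if catalog.compare_and_swap(base.version, candidate) {
            return Ok(new_version);
        }
    }
    Err(format!("commit failed after {} retries", max_retries))
}
```

Readers never observe a half-applied commit in this scheme, because the only shared mutation is the single pointer swap; a writer that loses the race simply rebuilds its change on top of the winner's metadata.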

Acceptance Criteria

  • Data can be written to Iceberg tables with proper metadata
  • Atomic transactions are supported
  • Schema evolution works correctly
  • Performance meets or exceeds that of the Java implementation
  • Integration tests pass with AWS Glue and S3

Priority

High (Priority 6 in GAP analysis)

Complexity

Very High

Labels

  • enhancement: New feature or request
  • feature:iceberg: Features related to Apache Iceberg support
  • priority:high: High priority task that should be addressed in the next release
