
Implement Direct Partition Management for S3 Sink #9

@valdo404


Direct Partition Management for S3 Sink

Description

Have the S3 sink connector manage its partitions directly, without relying on external crawlers, so that the connector has full control over partition registration and discovery.

Tasks

  • Implement partition tracking and registration directly in the connector
  • Add metadata management for partitions (creation time, record count, size)
  • Support partition discovery for existing data
  • Provide APIs for partition management
  • Implement efficient partition lookup for high-throughput scenarios
  • Add partition pruning capabilities for query optimization
  • Support partition evolution (changes to the partitioning scheme over time)
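As an illustration of the discovery and pruning tasks, the sketch below assumes that existing data uses a Hive-style `field=value/` key layout under a root prefix and that an object listing is available as `(key, size)` pairs. The layout, the function names (`parse_partition`, `discover_partitions`, `prune`) and the `DiscoveredPartition` type are hypothetical, not part of the connector. Because a listing alone does not reveal record counts, the sketch aggregates object count and bytes per partition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical partition identifier: ordered (field, value) pairs,
# e.g. (("year", "2024"), ("month", "05")).
PartitionKey = Tuple[Tuple[str, str], ...]


@dataclass
class DiscoveredPartition:
    prefix: str            # S3 prefix that holds the partition's objects
    object_count: int = 0  # objects found in the listing
    size_bytes: int = 0    # sum of object sizes


def parse_partition(key: str, root: str) -> Optional[Tuple[PartitionKey, str]]:
    """Return (partition key, prefix) for a Hive-style key, or None if it doesn't match."""
    if not key.startswith(root):
        return None
    segments = key[len(root):].split("/")[:-1]  # drop the object (file) name
    if not segments:
        return None  # object sits directly under the root, not in a partition
    fields = []
    for segment in segments:
        name, sep, value = segment.partition("=")
        if not sep or not name:
            return None  # not a field=value segment, e.g. a "_metadata/" directory
        fields.append((name, value))
    return tuple(fields), root + "/".join(segments) + "/"


def discover_partitions(
    objects: Iterable[Tuple[str, int]], root: str
) -> Dict[PartitionKey, DiscoveredPartition]:
    """Aggregate an object listing into per-partition statistics."""
    root = root.rstrip("/") + "/"  # so root "data" does not match "database/..."
    found: Dict[PartitionKey, DiscoveredPartition] = {}
    for key, size in objects:
        parsed = parse_partition(key, root)
        if parsed is None:
            continue
        part_key, prefix = parsed
        entry = found.setdefault(part_key, DiscoveredPartition(prefix))
        entry.object_count += 1
        entry.size_bytes += size
    return found


def prune(
    partitions: Dict[PartitionKey, DiscoveredPartition],
    predicate: Callable[[Dict[str, str]], bool],
) -> List[str]:
    """Return the sorted prefixes of partitions whose field values satisfy `predicate`."""
    return sorted(p.prefix for k, p in partitions.items() if predicate(dict(k)))


listing = [
    ("data/year=2024/month=05/part-0.json", 100),
    ("data/year=2024/month=05/part-1.json", 50),
    ("data/year=2024/month=06/part-0.json", 70),
    ("data/_metadata/partitions.json", 10),
]
partitions = discover_partitions(listing, "data")
print(prune(partitions, lambda f: f.get("month") == "05"))
# ['data/year=2024/month=05/']
```

Keys that do not match the layout (such as a dedicated metadata directory) are skipped rather than treated as errors, which keeps discovery tolerant of unrelated objects under the same root.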

Technical Details

  • Store partition metadata in a dedicated location in S3
  • Implement atomic updates to partition metadata
  • Use efficient data structures for partition tracking
  • Add comprehensive tests for partition management scenarios
  • Ensure thread-safe implementation for concurrent partition updates
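One way to meet the atomic-update and thread-safety requirements is optimistic concurrency: each writer reads the metadata document together with its version, applies its change, and writes it back only if the version is unchanged, retrying otherwise. In this sketch, `InMemoryMetadataStore` is a hypothetical stand-in for the dedicated metadata object in S3, and `put_if_version` models a conditional write (compare-and-swap on a version or ETag); a real implementation would have to map it onto whatever conditional-write mechanism the storage backend provides. `PartitionTracker` and `PartitionMeta` are likewise illustrative names.

```python
import json
import threading
import time
from dataclasses import asdict, dataclass
from typing import Dict, Tuple


@dataclass
class PartitionMeta:
    created_at: float      # epoch seconds of the first recorded write
    record_count: int = 0
    size_bytes: int = 0


class InMemoryMetadataStore:
    """Hypothetical stand-in for the metadata object stored in S3."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._version = 0
        self._body = "{}"

    def get(self) -> Tuple[int, str]:
        with self._lock:
            return self._version, self._body

    def put_if_version(self, body: str, expected: int) -> bool:
        """Conditional write: succeeds only if no one wrote since `expected` was read."""
        with self._lock:
            if self._version != expected:
                return False
            self._version += 1
            self._body = body
            return True


class PartitionTracker:
    def __init__(self, store: InMemoryMetadataStore, max_retries: int = 50) -> None:
        self._store = store
        self._max_retries = max_retries

    def record_write(self, partition: str, records: int, size: int) -> None:
        """Read-modify-write the metadata document, retrying on version conflicts."""
        for _ in range(self._max_retries):
            version, body = self._store.get()
            doc = json.loads(body)
            entry = doc.setdefault(partition, asdict(PartitionMeta(created_at=time.time())))
            entry["record_count"] += records
            entry["size_bytes"] += size
            if self._store.put_if_version(json.dumps(doc), version):
                return
        raise RuntimeError(f"too many conflicting updates for {partition!r}")

    def snapshot(self) -> Dict[str, PartitionMeta]:
        _, body = self._store.get()
        return {name: PartitionMeta(**meta) for name, meta in json.loads(body).items()}
```

Keeping all partitions in one document serializes every update through a single object; at high throughput, buffering counters in memory and flushing them periodically (or splitting the metadata per partition) would reduce conflicts, at the cost of slightly stale metadata.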

Acceptance Criteria

  • Partitions are correctly tracked and registered without external crawlers
  • Partition metadata is accurately maintained
  • Existing partitions are discovered correctly
  • Performance exceeds that of the Java implementation
  • All tests pass, including edge cases such as partition evolution

Priority

Medium (Priority 5 in GAP analysis)

Complexity

Medium

Metadata

Assignees

No one assigned

Labels

  • enhancement: New feature or request
  • feature:s3-sink: Features related to the S3 sink connector
  • priority:medium: Medium priority task that should be addressed in upcoming releases
