Direct Partition Management for S3 Sink
Description
Implement direct management of partitions without relying on external crawlers for the S3 sink connector, giving the connector full control over partition registration and discovery.
Tasks
- Implement partition tracking and registration directly in the connector
- Add metadata management for partitions (creation time, record count, size)
- Support partition discovery for existing data
- Provide APIs for partition management
- Implement efficient partition lookup for high-throughput scenarios
- Add partition pruning capabilities for query optimization
- Support partition evolution (changes to the partitioning scheme over time)
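The tracking, discovery, and pruning tasks above can be sketched roughly as follows. This is only an illustration, not the connector's design: it assumes a Hive-style `column=value` key layout, and `PartitionRegistry`, `PartitionMetadata`, and `parse_partition` are hypothetical names introduced here. The input to `discover` stands in for an S3 object listing of (key, size) pairs.

```python
import time
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# A partition is identified by its ordered (column, value) pairs,
# e.g. (("year", "2024"), ("month", "01")).
PartitionSpec = Tuple[Tuple[str, str], ...]


@dataclass
class PartitionMetadata:
    created_at: float
    record_count: int = 0
    size_bytes: int = 0


def parse_partition(key: str) -> Optional[PartitionSpec]:
    """Extract column=value path segments from an object key.

    Assumes a Hive-style layout; returns None if the key has no such segments.
    """
    parts = []
    for segment in key.split("/")[:-1]:  # the last segment is the file name
        if "=" in segment:
            column, value = segment.split("=", 1)
            parts.append((column, value))
    return tuple(parts) or None


class PartitionRegistry:
    """Hypothetical in-memory registry; a hash map gives O(1) lookup per partition."""

    def __init__(self) -> None:
        self._partitions: Dict[PartitionSpec, PartitionMetadata] = {}

    def register(self, spec: PartitionSpec, records: int = 0,
                 size_bytes: int = 0) -> PartitionMetadata:
        # Create the entry on first sight, then accumulate counters.
        meta = self._partitions.get(spec)
        if meta is None:
            meta = PartitionMetadata(created_at=time.time())
            self._partitions[spec] = meta
        meta.record_count += records
        meta.size_bytes += size_bytes
        return meta

    def discover(self, listing: Iterable[Tuple[str, int]]) -> int:
        """Register partitions found in an existing listing.

        Returns the number of partitions that were not known before.
        """
        before = len(self._partitions)
        for key, size in listing:
            spec = parse_partition(key)
            if spec is not None:
                self.register(spec, size_bytes=size)
        return len(self._partitions) - before

    def prune(self, **filters: str) -> List[PartitionSpec]:
        """Return partitions whose column values match every equality filter."""
        matches = []
        for spec in self._partitions:
            values = dict(spec)
            if all(values.get(col) == val for col, val in filters.items()):
                matches.append(spec)
        return sorted(matches)
```

For example, after `registry.discover([("events/year=2024/month=01/a.parquet", 100)])`, calling `registry.prune(month="01")` returns only the partitions under `month=01`. Record counts are not recoverable from a listing alone, so discovery in this sketch fills in sizes only.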
Technical Details
- Store partition metadata in a dedicated location in S3
- Implement atomic updates to partition metadata
- Use efficient data structures for partition tracking
- Add comprehensive tests for partition management scenarios
- Ensure thread-safe implementation for concurrent partition updates
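One possible way to combine atomic metadata updates with thread safety is optimistic concurrency: read the metadata document with its version, modify a copy, and write it back only if the version is unchanged, retrying otherwise. The sketch below assumes the metadata location supports some form of conditional write; `InMemoryStore` is a hypothetical stand-in for that location, and `update_metadata` and `add_records` are illustrative names, not part of any existing API.

```python
import json
import threading
from typing import Callable, Dict, Tuple


class InMemoryStore:
    """Hypothetical stand-in for the metadata object, with a versioned conditional write."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._version = 0
        self._body = json.dumps({})

    def get(self) -> Tuple[int, str]:
        with self._lock:
            return self._version, self._body

    def put_if_version(self, expected: int, body: str) -> bool:
        # The write succeeds only if nobody else has written since `expected` was read.
        with self._lock:
            if self._version != expected:
                return False
            self._version += 1
            self._body = body
            return True


def update_metadata(store: InMemoryStore, mutate: Callable[[Dict], None],
                    max_retries: int = 1000) -> Dict:
    """Apply `mutate` to the metadata document with a read-modify-write retry loop."""
    for _ in range(max_retries):
        version, body = store.get()
        doc = json.loads(body)  # private copy; the stored document is untouched
        mutate(doc)
        if store.put_if_version(version, json.dumps(doc, sort_keys=True)):
            return doc
    raise RuntimeError("too many concurrent metadata updates")


def add_records(partition: str, records: int, size_bytes: int) -> Callable[[Dict], None]:
    """Build a mutation that adds counters to one partition's entry."""
    def mutate(doc: Dict) -> None:
        entry = doc.setdefault(partition, {"record_count": 0, "size_bytes": 0})
        entry["record_count"] += records
        entry["size_bytes"] += size_bytes
    return mutate
```

A writer would call `update_metadata(store, add_records("year=2024/month=01", 1, 10))` after each flush. Because a failed write leaves the stored document unchanged, readers never observe a partially applied update; the trade-off is extra retries under heavy contention on a single metadata object.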
Acceptance Criteria
- Partitions are correctly tracked and registered without external crawlers
- Partition metadata is accurately maintained
- Existing partitions are discovered correctly
- Partition management performance exceeds that of the Java implementation on comparable workloads
- All tests pass, including edge cases such as partition evolution
Priority
Medium (Priority 5 in GAP analysis)
Complexity
Medium