Implement Time-Based Partitioning for S3 Sink #7

@valdo404

Time-Based Partitioning for S3 Sink

Description

Implement TimeBasedPartitioner to create Hive-compatible partitions (year/month/day/hour) for the S3 sink connector, matching the capabilities of the original Kafka Connect implementation.

Tasks

  • Implement TimeBasedPartitioner with configurable time formats
  • Support different timestamp extraction methods:
    • Wallclock: System time when record is processed
    • Record: Using Kafka record timestamp
    • RecordField: Extracting timestamp from a field in the record
  • Generate Hive-compatible directory structure (e.g., 'year=2025/month=03/day=11/hour=22/')
  • Add configuration options for partition duration and path format
  • Support custom time zone configurations
  • Implement efficient path generation for high-throughput scenarios
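The extraction methods and the Hive-style layout listed above could be sketched roughly as follows. This is a dependency-free illustration, not the proposed implementation: the issue calls for chrono, which would replace the hand-written calendar arithmetic here. `SinkRecord`, `TimestampExtractor`, `extract_timestamp_ms`, and `hive_path` are hypothetical names, timestamps are assumed to be epoch milliseconds, and the time zone is reduced to a fixed UTC offset in seconds (real zones with DST would need a time zone database).

```rust
use std::collections::HashMap;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical, simplified record: only what the extractors need.
pub struct SinkRecord {
    /// Kafka record timestamp in epoch milliseconds, if present.
    pub timestamp_ms: Option<i64>,
    /// Assumption: timestamp-bearing fields are already epoch milliseconds.
    pub fields: HashMap<String, i64>,
}

/// The three extraction methods named in the task list.
pub enum TimestampExtractor {
    Wallclock,
    Record,
    RecordField(String),
}

/// Returns the partitioning timestamp, or None if the record does not carry one.
pub fn extract_timestamp_ms(extractor: &TimestampExtractor, record: &SinkRecord) -> Option<i64> {
    match extractor {
        TimestampExtractor::Wallclock => SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .ok()
            .map(|d| d.as_millis() as i64),
        TimestampExtractor::Record => record.timestamp_ms,
        TimestampExtractor::RecordField(name) => record.fields.get(name).copied(),
    }
}

/// Converts days since 1970-01-01 to (year, month, day) in the proleptic
/// Gregorian calendar (the well-known "civil from days" arithmetic).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Builds a Hive-style partition path for a timestamp, shifted by a fixed UTC offset.
pub fn hive_path(epoch_ms: i64, utc_offset_secs: i64) -> String {
    let local_secs = epoch_ms.div_euclid(1000) + utc_offset_secs;
    let days = local_secs.div_euclid(86_400);
    let hour = local_secs.rem_euclid(86_400) / 3_600;
    let (year, month, day) = civil_from_days(days);
    format!("year={:04}/month={:02}/day={:02}/hour={:02}/", year, month, day, hour)
}
```

With these definitions, `hive_path(1_741_730_400_000, 0)` yields `year=2025/month=03/day=11/hour=22/`, and the same instant with an offset of `7200` seconds rolls over to `day=12/hour=00/`.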

Technical Details

  • Use Rust's chrono library for time handling
  • Ensure thread-safe implementation for concurrent record processing
  • Support all configuration options from the original Kafka Connect implementation
  • Add comprehensive tests for different time formats and extraction methods
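One possible way to combine the partition duration, thread safety, and cheap path generation is to floor each timestamp to its partition bucket and reuse the last formatted path while consecutive records fall into the same bucket. The sketch below is only an assumption about how that might look: `CachedPartitioner`, `bucket_start`, and `path_for` are hypothetical names, the formatter is passed in as a closure, and a single `Mutex` is used for simplicity where a real implementation might prefer per-thread caches.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical partitioner that formats a path once per partition bucket.
pub struct CachedPartitioner<F: Fn(i64) -> String + Send + Sync> {
    partition_duration_ms: i64,
    /// Turns a bucket start (epoch milliseconds) into a partition path.
    format_bucket: F,
    /// Most recently used bucket and its path, shared across threads.
    last: Mutex<Option<(i64, Arc<str>)>>,
}

impl<F: Fn(i64) -> String + Send + Sync> CachedPartitioner<F> {
    pub fn new(partition_duration_ms: i64, format_bucket: F) -> Self {
        assert!(partition_duration_ms > 0, "partition duration must be positive");
        Self {
            partition_duration_ms,
            format_bucket,
            last: Mutex::new(None),
        }
    }

    /// Floors a timestamp to the start of its partition bucket.
    /// rem_euclid keeps the result correct for timestamps before 1970.
    pub fn bucket_start(&self, epoch_ms: i64) -> i64 {
        epoch_ms - epoch_ms.rem_euclid(self.partition_duration_ms)
    }

    /// Returns the partition path, formatting only when the bucket changes.
    pub fn path_for(&self, epoch_ms: i64) -> Arc<str> {
        let bucket = self.bucket_start(epoch_ms);
        let mut last = self.last.lock().unwrap();
        if let Some((cached_bucket, path)) = last.as_ref() {
            if *cached_bucket == bucket {
                return Arc::clone(path); // cache hit: no formatting, no allocation
            }
        }
        let path: Arc<str> = Arc::from((self.format_bucket)(bucket));
        *last = Some((bucket, Arc::clone(&path)));
        path
    }
}
```

Because the flooring here is done on UTC epoch milliseconds, durations longer than an hour in a non-UTC zone would need the floor taken in local time instead; the sketch ignores that case. A single-entry cache also stops helping when record timestamps interleave across buckets, so the cache shape is a design choice to measure rather than a given.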

Acceptance Criteria

  • TimeBasedPartitioner correctly creates Hive-compatible partitions
  • All timestamp extraction methods work correctly
  • Configuration options match the original Kafka Connect implementation
  • Performance exceeds that of the Java implementation
  • All tests pass including edge cases like DST changes
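To make the configuration criterion concrete, the options could be read from a string map keyed by the property names the Kafka Connect S3 sink uses for its time-based partitioner (`partition.duration.ms`, `path.format`, `timezone`, `timestamp.extractor`, `timestamp.field`). The following is a minimal sketch under several assumptions: `PartitionerConfig`, `Extractor`, and `parse_config` are hypothetical names, `locale` is omitted, extractor names are matched case-sensitively, and the time zone is kept as an unvalidated string.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum Extractor {
    Wallclock,
    Record,
    RecordField(String),
}

#[derive(Debug, PartialEq)]
pub struct PartitionerConfig {
    pub partition_duration_ms: i64,
    pub path_format: String,
    pub timezone: String,
    pub extractor: Extractor,
}

/// Looks up an option that this sketch treats as required.
fn required<'a>(props: &'a HashMap<String, String>, key: &str) -> Result<&'a str, String> {
    props
        .get(key)
        .map(String::as_str)
        .ok_or_else(|| format!("missing required option `{}`", key))
}

/// Parses connector properties into a typed configuration.
pub fn parse_config(props: &HashMap<String, String>) -> Result<PartitionerConfig, String> {
    let partition_duration_ms = required(props, "partition.duration.ms")?
        .parse::<i64>()
        .map_err(|e| format!("invalid partition.duration.ms: {}", e))?;
    if partition_duration_ms <= 0 {
        return Err("partition.duration.ms must be positive".to_string());
    }

    // Assumption: fall back to Wallclock when no extractor is configured.
    let extractor_name = props
        .get("timestamp.extractor")
        .map(String::as_str)
        .unwrap_or("Wallclock");
    let extractor = match extractor_name {
        "Wallclock" => Extractor::Wallclock,
        "Record" => Extractor::Record,
        // RecordField additionally needs the name of the field to read.
        "RecordField" => Extractor::RecordField(required(props, "timestamp.field")?.to_string()),
        other => return Err(format!("unknown timestamp.extractor `{}`", other)),
    };

    Ok(PartitionerConfig {
        partition_duration_ms,
        path_format: required(props, "path.format")?.to_string(),
        timezone: required(props, "timezone")?.to_string(),
        extractor,
    })
}
```

Returning a descriptive error for each missing or malformed option keeps misconfiguration visible at connector startup rather than at the first write.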

Priority

Very High (Priority 1 in GAP analysis)

Complexity

Medium

Release Target

v0.2.0

Labels

  • enhancement: New feature or request
  • feature:s3-sink: Features related to the S3 sink connector
  • priority:high: High priority task that should be addressed in the next release
