Enhance S3 Upload Management and Flush Options #6

@valdo404

S3 Upload Management and Flush Options

Description

Enhance S3 upload management with configurable flush strategies and atomic file operations to ensure data consistency and optimal performance.

Tasks

  • Implement configurable flush strategies based on:
    • Record count
    • Time-based intervals
    • File size thresholds
  • Add support for atomic file operations to prevent partial uploads
  • Implement proper error handling and retries for S3 operations
  • Add backpressure mechanisms for memory management
  • Support configurable S3 client options (region, credentials, endpoints)
  • Implement efficient buffering strategies to minimize memory usage
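
As a starting point for discussion, the three flush triggers listed above could be combined in a single tracker that is consulted after each record. This is only a sketch using the standard library: `FlushConfig`, `FlushTracker`, `FlushReason`, and all field names are hypothetical and do not come from the existing codebase or from any connector's configuration keys.

```rust
use std::time::{Duration, Instant};

/// Hypothetical thresholds; a `None` value disables that trigger.
#[derive(Debug, Clone, Default)]
pub struct FlushConfig {
    pub max_records: Option<u64>,
    pub max_bytes: Option<u64>,
    pub max_age: Option<Duration>,
}

/// Which threshold caused the flush (useful for metrics).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FlushReason {
    RecordCount,
    FileSize,
    Interval,
}

/// Tracks the state of the currently open buffer or file.
pub struct FlushTracker {
    config: FlushConfig,
    records: u64,
    bytes: u64,
    opened_at: Option<Instant>,
}

impl FlushTracker {
    pub fn new(config: FlushConfig) -> Self {
        Self { config, records: 0, bytes: 0, opened_at: None }
    }

    /// Account for one buffered record of `size_bytes` bytes.
    /// The interval clock starts at the first record after a reset.
    pub fn record(&mut self, size_bytes: u64, now: Instant) {
        if self.opened_at.is_none() {
            self.opened_at = Some(now);
        }
        self.records += 1;
        self.bytes += size_bytes;
    }

    /// Returns the first threshold that has been reached, if any.
    /// An empty buffer never triggers a flush.
    pub fn should_flush(&self, now: Instant) -> Option<FlushReason> {
        if self.records == 0 {
            return None;
        }
        if self.config.max_records.map_or(false, |max| self.records >= max) {
            return Some(FlushReason::RecordCount);
        }
        if self.config.max_bytes.map_or(false, |max| self.bytes >= max) {
            return Some(FlushReason::FileSize);
        }
        if let (Some(max_age), Some(opened)) = (self.config.max_age, self.opened_at) {
            if now.saturating_duration_since(opened) >= max_age {
                return Some(FlushReason::Interval);
            }
        }
        None
    }

    /// Call after a successful flush to start a new buffer.
    pub fn reset(&mut self) {
        self.records = 0;
        self.bytes = 0;
        self.opened_at = None;
    }
}
```

Passing `now` explicitly, rather than calling `Instant::now()` inside the tracker, keeps the time-based trigger testable without sleeping. A time-based flush on an idle partition would still need a periodic tick that calls `should_flush`, which this sketch leaves out.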

Technical Details

  • Use the Rust AWS SDK for S3 operations
  • Implement memory-efficient buffering to outperform the Java implementation
  • Add metrics for monitoring flush operations and performance
  • Support both synchronous and asynchronous flush modes
  • Implement proper cleanup of temporary files
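
One possible shape for the temporary-file cleanup item above is a staging file that removes itself unless it is explicitly committed. The sketch below uses only the standard library; `StagedFile` and its methods are hypothetical names, the newline-delimited record format is an assumption, and the S3 upload step itself is deliberately left out.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// A local staging file that is deleted on drop unless `commit` succeeds.
pub struct StagedFile {
    tmp_path: PathBuf,
    file: Option<File>,
    committed: bool,
}

impl StagedFile {
    /// Creates `<dir>/<name>.tmp` (the `.tmp` suffix is an assumption).
    pub fn create(dir: &Path, name: &str) -> io::Result<Self> {
        let tmp_path = dir.join(format!("{}.tmp", name));
        let file = File::create(&tmp_path)?;
        Ok(Self { tmp_path, file: Some(file), committed: false })
    }

    /// Appends one record followed by a newline.
    pub fn write_record(&mut self, record: &[u8]) -> io::Result<()> {
        let file = self
            .file
            .as_mut()
            .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "file already closed"))?;
        file.write_all(record)?;
        file.write_all(b"\n")
    }

    /// Syncs and closes the file, then renames it to `final_path`.
    /// If any step fails, `Drop` still removes the temporary file.
    pub fn commit(mut self, final_path: &Path) -> io::Result<()> {
        if let Some(file) = self.file.take() {
            file.sync_all()?;
        }
        fs::rename(&self.tmp_path, final_path)?;
        self.committed = true;
        Ok(())
    }
}

impl Drop for StagedFile {
    fn drop(&mut self) {
        if !self.committed {
            // Close the handle first, then remove the file (best effort).
            self.file.take();
            let _ = fs::remove_file(&self.tmp_path);
        }
    }
}
```

Tying cleanup to `Drop` means an early return or a panic while writing still removes the partial file. On the S3 side, a single-object PUT only becomes visible once it completes, so the local rename here is about keeping the staging directory clean rather than about object visibility.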

Acceptance Criteria

  • All flush strategies work correctly and respect configured thresholds
  • S3 uploads are atomic and consistent
  • Performance exceeds that of the Java implementation
  • Memory usage remains low even with large record batches
  • All tests pass including edge cases like network failures
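
To make the network-failure edge case testable without a real network, the retry logic could take the operation and the sleep function as parameters. The following is a minimal sketch under that assumption: `RetryPolicy` and `retry_with_backoff` are hypothetical names, the version shown is synchronous, and the AWS SDK for Rust also provides its own retry configuration, which may make a hand-rolled helper unnecessary.

```rust
use std::time::Duration;

/// Hypothetical retry settings for a single upload attempt sequence.
#[derive(Debug, Clone)]
pub struct RetryPolicy {
    /// Total number of attempts, including the first one.
    pub max_attempts: u32,
    pub base_delay: Duration,
    pub max_delay: Duration,
}

impl RetryPolicy {
    /// Delay before retry number `retry` (0-based):
    /// `base_delay * 2^retry`, capped at `max_delay`.
    pub fn delay_for(&self, retry: u32) -> Duration {
        let factor = 2u32.saturating_pow(retry);
        self.base_delay.saturating_mul(factor).min(self.max_delay)
    }
}

/// Runs `op` until it succeeds or the attempt budget is used up.
/// `op` receives the 0-based attempt number; `sleep` is injected so
/// tests can record delays instead of actually waiting.
/// At least one attempt is always made.
pub fn retry_with_backoff<T, E>(
    policy: &RetryPolicy,
    mut op: impl FnMut(u32) -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt + 1 >= policy.max_attempts => return Err(err),
            Err(_) => {
                sleep(policy.delay_for(attempt));
                attempt += 1;
            }
        }
    }
}
```

In production the `sleep` argument could be `std::thread::sleep`, while a test can pass a closure that pushes each delay into a vector and a failing `op` that simulates timeouts. This sketch retries every error; distinguishing retryable from permanent failures is left open.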

Priority

High (Priority 3 in GAP analysis)

Complexity

Medium

Metadata

Assignees

No one assigned

    Labels

    • enhancement (New feature or request)
    • feature:s3-sink (Features related to the S3 sink connector)
    • priority:high (High priority task that should be addressed in the next release)
