Separate data by topic/stream_key into distinct folders #11

Copilot · 2025-11-13T07:48:39Z

Data from different Kafka topics and Redis streams was written to the same directory, which made the output hard to organize and inefficient to query.

Changes

Sources

Kafka: Group messages by topic and add the topic to the RecordBatch schema metadata
Redis: Group messages by stream_key/list_key and add stream_key to the schema metadata
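The grouping step on the source side can be sketched as follows. This is a minimal illustration, not the PR's code: `Batch` is a hypothetical stand-in for an Arrow RecordBatch, with a plain dict playing the role of the schema metadata, and `group_into_batches` is a hypothetical helper. The metadata key is assumed to be "topic" for Kafka and "stream_key" for Redis, as the bullets above describe.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Batch:
    """Hypothetical stand-in for an Arrow RecordBatch: rows plus schema metadata."""
    rows: list
    metadata: dict = field(default_factory=dict)


def group_into_batches(messages, metadata_key):
    """Group (key, payload) pairs into one Batch per key.

    metadata_key is assumed to be "topic" for Kafka and "stream_key" for Redis;
    the key value is recorded in the batch metadata so a sink can read it at flush.
    """
    by_key = defaultdict(list)
    for key, payload in messages:
        by_key[key].append(payload)
    # dicts preserve insertion order, so batches come out in first-seen key order
    return [Batch(rows=rows, metadata={metadata_key: key}) for key, rows in by_key.items()]


kafka_batches = group_into_batches(
    [("events", {"id": 1}), ("logs", {"line": "x"}), ("events", {"id": 2})],
    metadata_key="topic",
)
for b in kafka_batches:
    print(b.metadata, len(b.rows))
```

Keeping one batch per key means every batch a sink receives maps to exactly one output folder, so the sink never has to split a batch at flush time.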

Sinks (S3, HDFS, Local)

Extract the topic/stream_key from batch metadata during flush
Write each topic/stream_key to its own folder: prefix/{topic}/data_*.parquet
Keep partitioning working under the new folders: prefix/{topic}/{partition}/data_*.parquet
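How a sink might derive the output directory from batch metadata is sketched below. `output_dir` and `FOLDER_KEYS` are hypothetical names; the sketch assumes the metadata keys are "topic" or "stream_key", and that the layout without a topic was prefix/{partition} (or just prefix), which is how the fallback to the base directory is modeled here.

```python
import posixpath

# Hypothetical: metadata keys assumed to carry the source name.
FOLDER_KEYS = ("topic", "stream_key")


def output_dir(prefix, metadata, partition=None):
    """Return the directory a batch should be flushed to.

    prefix/{topic}/{partition} when the metadata names a topic or stream key;
    otherwise prefix/{partition} or prefix itself (the assumed old layout).
    """
    folder = next((metadata[k] for k in FOLDER_KEYS if metadata.get(k)), None)
    parts = [prefix.rstrip("/")]
    if folder:
        parts.append(folder)
    if partition:
        parts.append(partition)
    return posixpath.join(*parts)


print(output_dir("s3://bucket/raw-data", {"topic": "events"}, "date=2024-01-01"))
print(output_dir("s3://bucket/raw-data", {"stream_key": "metrics"}))
print(output_dir("s3://bucket/raw-data", {}))
```

`posixpath` is used instead of `os.path` so the separator is always "/" regardless of the host OS, which matters for S3 and HDFS keys.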

Folder Structure

Before:

s3://bucket/raw-data/
  data_20241113_120000_000001.parquet  # mixed topics
  data_20241113_120001_000002.parquet

After:

s3://bucket/raw-data/
  events/
    date=2024-01-01/data_*.parquet
  logs/
    date=2024-01-01/data_*.parquet
  metrics/
    data_*.parquet
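One practical effect of this layout is that a reader can select a single topic by key prefix instead of opening every file. The helper below is a hypothetical sketch over a plain list of object keys; against S3 the same string could be passed as the Prefix of a bucket listing.

```python
def files_for_topic(keys, prefix, topic):
    """Return object keys that belong to one topic under the given prefix.

    The trailing "/" keeps "events" from also matching e.g. "events_v2".
    """
    wanted = f"{prefix.rstrip('/')}/{topic}/"
    return [k for k in keys if k.startswith(wanted)]


keys = [
    "raw-data/events/date=2024-01-01/data_1.parquet",
    "raw-data/logs/date=2024-01-01/data_2.parquet",
    "raw-data/metrics/data_3.parquet",
]
print(files_for_topic(keys, "raw-data", "events"))
```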

Backward Compatibility

Batches without topic/stream_key metadata are written to the base directory as before. No configuration changes are required.

Original prompt

This section details the original issue you should resolve

<issue_title>Sink should store data from different topics/stream_keys separately in different folders</issue_title>
<issue_description></issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes Sink should store data from different topics/stream_keys separately in different folders #10


Co-authored-by: slhmy <31381093+slhmy@users.noreply.github.com>

Initial plan

25f0e08

Copilot AI assigned Copilot and slhmy Nov 13, 2025

Copilot started work on behalf of slhmy November 13, 2025 07:49

Copilot AI and others added 3 commits November 13, 2025 08:03

Implement topic/stream_key folder separation for data sinks

0989015

Co-authored-by: slhmy <31381093+slhmy@users.noreply.github.com>

Apply linting and formatting

3d4281d

Co-authored-by: slhmy <31381093+slhmy@users.noreply.github.com>

Add folder structure comments to example configs

ecd7db2

Co-authored-by: slhmy <31381093+slhmy@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add functionality to store data from different topics separately~~ Separate data by topic/stream_key into distinct folders Nov 13, 2025

Copilot AI requested a review from slhmy November 13, 2025 08:12

Copilot finished work on behalf of slhmy November 13, 2025 08:12

slhmy marked this pull request as ready for review November 13, 2025 08:16

slhmy merged commit 3378b6f into main Nov 13, 2025
12 checks passed
