We encountered an interesting problem: what happens when users want to define a watermark on this table? If we directly apply the ideas from this RFC, batch queries insert data into the table in no particular order, so the watermark will likely expire and delete records unexpectedly. I can think of a few possibilities.
The 3rd proposal I slightly prefer the 2nd proposal
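To make the watermark concern concrete, here is a minimal sketch of how unordered backfill interacts with a watermark. It is a simplified model, not RisingWave's actual implementation: the function `apply_watermark`, the one-minute delay, and the sample timestamps are all assumptions chosen for illustration. The watermark is modelled as the maximum event time seen so far minus a fixed delay, and any row older than the current watermark is dropped.

```python
from datetime import datetime, timedelta


def apply_watermark(rows, delay):
    """Simplified watermark model (an assumption, not RisingWave's code).

    The watermark is (max event time seen so far) - delay.
    A row whose event time is strictly below the watermark is dropped.
    """
    watermark = None
    kept, dropped = [], []
    for ts in rows:
        if watermark is not None and ts < watermark:
            dropped.append(ts)
            continue
        kept.append(ts)
        candidate = ts - delay
        # The watermark only ever moves forward.
        if watermark is None or candidate > watermark:
            watermark = candidate
    return kept, dropped


# Hypothetical historical data: six rows, one minute apart.
base = datetime(2024, 7, 1)
history = [base + timedelta(minutes=m) for m in range(6)]
delay = timedelta(minutes=1)

# Inserted in event-time order: nothing falls behind the watermark.
ordered_kept, ordered_dropped = apply_watermark(history, delay)

# The same rows in an arbitrary order, as a batch query might emit them.
shuffled = [history[i] for i in (5, 0, 3, 1, 4, 2)]
unordered_kept, unordered_dropped = apply_watermark(shuffled, delay)

print("ordered:  ", len(ordered_kept), "kept,", len(ordered_dropped), "dropped")
print("unordered:", len(unordered_kept), "kept,", len(unordered_dropped), "dropped")
```

In this model, the same six rows all survive when inserted in order, but most are discarded once the newest row happens to arrive first, which is the failure mode described above.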
xxchan left a comment
Some new ideas by @st1page from https://risingwave-labs.slack.com/archives/C07CU2YBKCG/p1721184731055789
Since we are adding a batch read function (risingwavelabs/risingwave#17673), we can combine a batch query with a connector. Then we no longer need a batch source or ALTER TABLE.
Tentative syntax:
CREATE TABLE orders (
order_id INT,
customer_name VARCHAR,
data JSONB,
PRIMARY KEY (order_id, customer_name)
) INITIAL WITH SELECT * FROM file_scan(
'parquet',
's3',
'ap-southeast-2',
'xxxxxxxxxx',
'yyyyyyyy',
's3://your-bucket/path/to/*'
)
WITH (
connector = 'kinesis',
stream = 'wkx-dynamo-orders',
scan.startup.mode='earliest',
aws.region = 'us-east-1',
kinesis.credentials.access = 'ABCDEFG',
kinesis.credentials.secret = 'abcdefg'
) FORMAT DYNAMODB_CDC ENCODE JSON;
@xiangjinwu: Can be achieved by pause_on_create + insert into t select + resume
Is it just Here, taking your example, the columns in the table definition and the columns in
Yes, I asked the same question. 😄 @st1page feels that the CTAS syntax is awkward for this specific need, so he wants to introduce a separate syntax. Specifically,
Order is not that important when processing historical data. In particular, when the data is read with parallelism greater than one, the order is even less useful to users.
I feel Hmmm, overall, I feel this is not better than the idea of
LGTM. In detail, we need
And we might need to discuss the INSERT statement on the source later.
Migrated from Notion.
Preview: https://github.com/risingwavelabs/rfcs/blob/eric/iceberg_source/rfcs/0085-iceberg-source.md