Commit deb53e9

Distributed hive-style join testing with superset satisfaction (#256)
* Added join testdata and method to generate parquet from csv.
* Added basic single node join test for comparison.
* Use hive partitioning.
* Results are the same with hive partitioning and Gene's PR.
* Fixed configs and achieved optimal distributed plan.
* Refactoring, adding comments.
* Added check to ensure optimal plan is achieved.
* Update based on Nga's comments.
* Added second test.
* Added third test.
* Refactor.
* Add ORDER BY to queries and switch to snapshot testing.
* Nulls last instead of nulls first.
* Fix tests.
1 parent 15c5f7a commit deb53e9
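
The commit message mentions hive partitioning. In that layout, a directory named `key=value` supplies a column value for every file beneath it, which is how paths such as `testdata/join/parquet/dim/d_dkey=A/data0.parquet` in this commit's SQL script carry the join key. The following is a minimal sketch of that convention in Rust; the function name `hive_partitions` is hypothetical and is not code from this commit.

```rust
/// Extract the `key=value` directory segments from a hive-style path.
/// Hypothetical helper for illustration only; not part of the commit.
fn hive_partitions(path: &str) -> Vec<(String, String)> {
    let mut segments: Vec<&str> = path.split('/').collect();
    // The last segment is the file name, which never encodes a partition.
    segments.pop();
    segments
        .into_iter()
        // Keep only directory names of the form `key=value`.
        .filter_map(|segment| segment.split_once('='))
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}
```

Calling `hive_partitions("testdata/join/parquet/dim/d_dkey=A/data0.parquet")` yields the single pair `("d_dkey", "A")`. A real query engine would also assign a type to the partition column, which this sketch leaves out.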

File tree

19 files changed: +734 -378 lines changed

Cargo.lock

Lines changed: 329 additions & 378 deletions
Some generated files are not rendered by default.
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+env,service,host
+dev,log,host-y
+
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+env,service,host
+prod,log,host-x
+
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+env,service,host
+dev,trace,host-z
+
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+env,service,host
+prod,trace,host-x
+
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+timestamp,value
+2023-01-01T09:00:00,95.5
+2023-01-01T09:00:10,102.3
+2023-01-01T09:00:20,98.7
+2023-01-01T09:12:20,105.1
+2023-01-01T09:12:30,100.0
+2023-01-01T09:12:40,150.0
+2023-01-01T09:12:50,120.8
+
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+timestamp,value
+2023-01-01T09:00:00,75.2
+2023-01-01T09:00:10,82.4
+2023-01-01T09:00:20,78.9
+2023-01-01T09:00:30,85.6
+2023-01-01T09:12:30,80.0
+2023-01-01T09:12:40,120.0
+2023-01-01T09:12:50,92.3
+
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+timestamp,value
+2023-01-01T10:00:00,310.5
+2023-01-01T10:00:10,225.7
+2023-01-01T10:00:20,380.2
+2023-01-01T10:00:30,205.8
+2023-01-01T10:00:40,350.0
+2023-01-01T10:12:40,200.0
+2023-01-01T10:12:50,205.4
+
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+timestamp,value
+2023-01-01T10:00:00,24.8
+2023-01-01T10:00:10,72.1
+2023-01-01T10:00:20,42.5
+
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+-- datafusion-cli -f testdata/join/generate_parquet_from_csv.sql
+
+-- Generate parquet dim files from csv files.
+COPY (SELECT * FROM "testdata/join/csv/dim/d_dkey=A/data0.csv")
+TO "testdata/join/parquet/dim/d_dkey=A/data0.parquet"
+STORED AS PARQUET;
+
+COPY (SELECT * FROM "testdata/join/csv/dim/d_dkey=B/data0.csv")
+TO "testdata/join/parquet/dim/d_dkey=B/data0.parquet"
+STORED AS PARQUET;
+
+COPY (SELECT * FROM "testdata/join/csv/dim/d_dkey=C/data0.csv")
+TO "testdata/join/parquet/dim/d_dkey=C/data0.parquet"
+STORED AS PARQUET;
+
+COPY (SELECT * FROM "testdata/join/csv/dim/d_dkey=D/data0.csv")
+TO "testdata/join/parquet/dim/d_dkey=D/data0.parquet"
+STORED AS PARQUET;
+
+-- Generate parquet fact files from csv files.
+COPY (SELECT * FROM "testdata/join/csv/fact/f_dkey=A/data0.csv")
+TO "testdata/join/parquet/fact/f_dkey=A/data0.parquet"
+STORED AS PARQUET;
+
+COPY (SELECT * FROM "testdata/join/csv/fact/f_dkey=B/data0.csv")
+TO "testdata/join/parquet/fact/f_dkey=B/data0.parquet"
+STORED AS PARQUET;
+
+COPY (SELECT * FROM "testdata/join/csv/fact/f_dkey=C/data0.csv")
+TO "testdata/join/parquet/fact/f_dkey=C/data0.parquet"
+STORED AS PARQUET;
+
+COPY (SELECT * FROM "testdata/join/csv/fact/f_dkey=D/data0.csv")
+TO "testdata/join/parquet/fact/f_dkey=D/data0.parquet"
+STORED AS PARQUET;
+
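
The eight `COPY` statements in the script differ only in the table directory (`dim` or `fact`), the partition column (`d_dkey` or `f_dkey`), and the key (`A` through `D`). As a sketch of that pattern, and not something the commit itself contains, the same statement text could be produced programmatically. The function names `copy_statement` and `generate_script` are hypothetical, and the script's comment lines are not reproduced.

```rust
/// Build one `COPY ... STORED AS PARQUET` statement for a single hive-style
/// partition directory, e.g. `dim/d_dkey=A`.
/// Hypothetical helper for illustration only; not part of the commit.
fn copy_statement(table: &str, key_column: &str, key: &str) -> String {
    let partition = format!("{table}/{key_column}={key}");
    format!(
        "COPY (SELECT * FROM \"testdata/join/csv/{partition}/data0.csv\")\n\
         TO \"testdata/join/parquet/{partition}/data0.parquet\"\n\
         STORED AS PARQUET;"
    )
}

/// Produce the statements for both tables and keys A through D, in the same
/// order as the checked-in script, separated by blank lines.
fn generate_script() -> String {
    let tables = [("dim", "d_dkey"), ("fact", "f_dkey")];
    let keys = ["A", "B", "C", "D"];
    let mut statements = Vec::new();
    for (table, key_column) in tables {
        for key in keys {
            statements.push(copy_statement(table, key_column, key));
        }
    }
    statements.join("\n\n")
}
```

Printing the result of `generate_script()` gives text that could be saved and run with `datafusion-cli -f`, as the script's first comment line does for the checked-in file. Keeping the statements written out by hand, as the commit does, has the advantage that the test data setup is readable without running anything.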
