From 4f35d20529c9ace0b1bd0a8bd8c9381368f21312 Mon Sep 17 00:00:00 2001
From: Johnn-ui2010
Date: Thu, 13 Nov 2025 12:21:22 +0100
Subject: [PATCH 01/22] new ssb_benchmark branch

---
 scripts/ssb/ReadMe.md | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 scripts/ssb/ReadMe.md

diff --git a/scripts/ssb/ReadMe.md b/scripts/ssb/ReadMe.md
new file mode 100644
index 00000000000..4baf04727c4
--- /dev/null
+++ b/scripts/ssb/ReadMe.md
@@ -0,0 +1 @@
+Hello world.
\ No newline at end of file

From f59790f849962fc21a4c58ce85e6ce7aabc15d8d Mon Sep 17 00:00:00 2001
From: Johnn-ui2010
Date: Thu, 13 Nov 2025 12:44:45 +0100
Subject: [PATCH 02/22] Update first ReadMe.

---
 scripts/ssb/ReadMe.md | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/scripts/ssb/ReadMe.md b/scripts/ssb/ReadMe.md
index 4baf04727c4..6681ff5dedc 100644
--- a/scripts/ssb/ReadMe.md
+++ b/scripts/ssb/ReadMe.md
@@ -1 +1,34 @@
-Hello world.
\ No newline at end of file
+# Star Schema Benchmark (SSB) for SystemDS [SystemDS-3862](https://issues.apache.org/jira/browse/SYSTEMDS-3862)
+
+
+## Foundation
+- There are [13 queries already written in SQL](https://github.com/apache/doris/tree/master/tools/ssb-tools/ssb-queries).
+- There are existing DML relational algebra operators raSelection(), raJoin() and raGroupby().
+- Our task is to implement DML versions of these queries and run them in SystemDS.
+- There are existing DML query implementations ([pull request](https://github.com/apache/systemds/pull/2280) and [code](https://github.com/ghafek/systemds/tree/feature/ssb-benchmark/scripts/ssb)) from a previous group, which are somewhat slow and contain errors.
+
+## Setup
+
+- First, install [Docker](https://docs.docker.com/get-started/get-docker/) and its necessary libraries.
+
+  For Ubuntu, there is the [following tutorial using apt repository](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository). You can add [Docker Desktop](https://docs.docker.com/desktop/setup/install/linux/ubuntu/), too.
+
+
+- Now, follow the tutorials to install Docker versions of database systems [SystemDS](https://apache.github.io/systemds/site/docker), [PostgreSQL](https://hub.docker.com/_/postgres), ....
+
+
+If the example in the SystemDS link does not work, use that code line instead. Create a DML file, open its directory and execute the code.
+```
+docker run -it --rm -v $PWD:/scripts apache/systemds -f /scripts/[file_name].dml
+# Example
+docker run -it --rm -v $PWD:/scripts apache/systemds -f /scripts/hello.dml
+```
+--- SSB ...
+
+## General steps
+- Prepare the setup.
+- Translate/rewrite the queries into DML to run them in SystemDS.
+- For this, use the existing relational algebra operators in DML (see the sketch below).
+- Use the [SSB generator](https://github.com/eyalroz/ssb-dbgen) to generate data.
+- Run shell scripts for the experiments in the selected database systems, also varying the scale factor.
+- Compare the runtime of each query in each system.
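+
+For illustration, here is a minimal sketch (toy data, not one of the benchmark queries) of how a SQL filter plus join maps onto these DML builtins; the function names and signatures follow the query scripts added later in this branch:
+```
+# Toy example of the relational algebra builtins.
+source("./scripts/builtin/raSelection.dml") as raSel
+source("./scripts/builtin/raJoin.dml") as raJoin
+
+# Toy tables: A(key, value) and B(key, value).
+A = matrix("1 10 2 20 3 30", rows=3, cols=2);
+B = matrix("2 200 3 300 4 400", rows=3, cols=2);
+
+# SELECT * FROM A WHERE A.value > 10
+A_filt = raSel::m_raSelection(A, col=2, op=">", val=10);
+
+# ... JOIN B ON A.key = B.key
+AB = raJoin::m_raJoin(A=A_filt, colA=1, B=B, colB=1, method="hash2");
+print(toString(AB));
+```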
From a6b79092cd019e043e73bcd6139f024ea70ac204 Mon Sep 17 00:00:00 2001
From: Johnn-ui2010
Date: Tue, 18 Nov 2025 20:17:56 +0100
Subject: [PATCH 03/22] Basic Join example not working

---
 .gitignore                                  |  3 +
 scripts/ssb/ReadMe.md                       | 17 +++--
 scripts/ssb/queries/simple_join_example.dml | 84 +++++++++++++++++++++
 scripts/ssb/sql/q1.1.sql                    | 23 ++++++
 scripts/ssb/sql/q1.2.sql                    | 23 ++++++
 scripts/ssb/sql/q1.3.sql                    | 25 ++++++
 scripts/ssb/sql/q2.1.sql                    | 26 +++++++
 scripts/ssb/sql/q2.2.sql                    | 26 +++++++
 scripts/ssb/sql/q2.3.sql                    | 26 +++++++
 scripts/ssb/sql/q3.1.sql                    | 32 ++++++++
 scripts/ssb/sql/q3.2.sql                    | 32 ++++++++
 scripts/ssb/sql/q3.3.sql                    | 38 ++++++++++
 scripts/ssb/sql/q3.4.sql                    | 37 +++++++++
 scripts/ssb/sql/q4.1.sql                    | 34 +++++++++
 scripts/ssb/sql/q4.2.sql                    | 39 ++++++++++
 scripts/ssb/sql/q4.3.sql                    | 35 +++++++++
 16 files changed, 495 insertions(+), 5 deletions(-)
 create mode 100644 scripts/ssb/queries/simple_join_example.dml
 create mode 100644 scripts/ssb/sql/q1.1.sql
 create mode 100644 scripts/ssb/sql/q1.2.sql
 create mode 100644 scripts/ssb/sql/q1.3.sql
 create mode 100644 scripts/ssb/sql/q2.1.sql
 create mode 100644 scripts/ssb/sql/q2.2.sql
 create mode 100644 scripts/ssb/sql/q2.3.sql
 create mode 100644 scripts/ssb/sql/q3.1.sql
 create mode 100644 scripts/ssb/sql/q3.2.sql
 create mode 100644 scripts/ssb/sql/q3.3.sql
 create mode 100644 scripts/ssb/sql/q3.4.sql
 create mode 100644 scripts/ssb/sql/q4.1.sql
 create mode 100644 scripts/ssb/sql/q4.2.sql
 create mode 100644 scripts/ssb/sql/q4.3.sql

diff --git a/.gitignore b/.gitignore
index d2fcdb9a4de..64ab02ef165 100644
--- a/.gitignore
+++ b/.gitignore
@@ -74,6 +74,9 @@ docs/_site
 # TODO Make the API auto generate and relocate into this api folder for webpage
 # docs/api
 
+# Input dataset
+scripts/ssb/data
+
 # Test Artifacts
 src/test/scripts/**/*.dmlt
 src/test/scripts/functions/mlcontextin/
diff --git a/scripts/ssb/ReadMe.md b/scripts/ssb/ReadMe.md
index 6681ff5dedc..2d3f262a983 100644
--- a/scripts/ssb/ReadMe.md
+++ b/scripts/ssb/ReadMe.md
@@ -9,21 +9,28 @@
 ## Setup
 
-- First, install [Docker](https://docs.docker.com/get-started/get-docker/) and its necessary libraries.
+1. First, install [Docker](https://docs.docker.com/get-started/get-docker/) and its necessary libraries.
 
   For Ubuntu, there is the [following tutorial using apt repository](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository). You can add [Docker Desktop](https://docs.docker.com/desktop/setup/install/linux/ubuntu/), too.
 
 
-- Now, follow the tutorials to install Docker versions of database systems [SystemDS](https://apache.github.io/systemds/site/docker), [PostgreSQL](https://hub.docker.com/_/postgres), ....
+2. Now, follow the tutorials to install Docker versions of database systems [SystemDS](https://apache.github.io/systemds/site/docker), [PostgreSQL](https://hub.docker.com/_/postgres), ....
 
 
 If the example in the SystemDS link does not work, use that code line instead. Create a DML file, open its directory and execute the code.
 ```
-docker run -it --rm -v $PWD:/scripts apache/systemds -f /scripts/[file_name].dml
+docker run -it --rm -v $PWD:/scripts apache/systemds:nightly -f /scripts/[file_name].dml
 # Example
-docker run -it --rm -v $PWD:/scripts apache/systemds -f /scripts/hello.dml
+docker run -it --rm -v $PWD:/scripts apache/systemds:nightly -f /scripts/hello.dml
 ```
---- SSB ...
+3. Clone the git repository of [ssb-dbgen (SSB data set generator)](https://github.com/eyalroz/ssb-dbgen/tree/master) and generate data with it.
+```
+# Build the generator
+cmake -B ./build && cmake --build ./build
+# Run the generator (with -s <scale factor>)
+build/dbgen -b dists.dss -v -s 1
+```
+For more options, see the original documentation.
 
 ## General steps
 - Prepare the setup.
diff --git a/scripts/ssb/queries/simple_join_example.dml b/scripts/ssb/queries/simple_join_example.dml
new file mode 100644
index 00000000000..02bbeb0d97b
--- /dev/null
+++ b/scripts/ssb/queries/simple_join_example.dml
@@ -0,0 +1,84 @@
+/*
+docker run -it --rm -v $PWD:/scripts/ apache/systemds -f /scripts/queries/hello6.dml -nvargs input_dir="/scripts/data"
+WARNING: Using incubator modules: jdk.incubator.vector
+Hello SystemDS!
+Hello, World!
+Loading tables from directory: /scripts/data
+SystemDS Statistics:
+Total execution time: 1.992 sec.
+
+
+An Error Occurred :
+ DMLRuntimeException -- org.apache.sysds.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 8 and 58 -- Error evaluating instruction: CP°FCall°./scripts/builtin/raJoin.dml°m_raJoin°true°5°1°A=_mVar88·MATRIX·FP64°colA=1·SCALAR·INT64·true°B=_mVar102·MATRIX·FP64°colB=1·SCALAR·INT64·true°method=hash·SCALAR·STRING·true°joined_matrix
+ DMLRuntimeException -- ERROR: Runtime error in program block generated from statement block between lines 8 and 58 -- Error evaluating instruction: CP°FCall°./scripts/builtin/raJoin.dml°m_raJoin°true°5°1°A=_mVar88·MATRIX·FP64°colA=1·SCALAR·INT64·true°B=_mVar102·MATRIX·FP64°colB=1·SCALAR·INT64·true°method=hash·SCALAR·STRING·true°joined_matrix
+ DMLRuntimeException -- error executing function ./scripts/builtin/raJoin.dml::m_raJoin
+ DMLRuntimeException -- ERROR: Runtime error in function program block generated from function statement block between lines 39 and 222 -- Error evaluating function program block
+ DMLRuntimeException -- ERROR: Runtime error in program block generated from statement block between lines 137 and 145 -- Error evaluating instruction: CP°ba+*°_mVar169·MATRIX·FP64°A·MATRIX·FP64°_mVar170·MATRIX·FP64°8
+ RuntimeException -- Dimensions do not match for matrix multiplication (2496!=2557).
+
+*/
+#Start in systemds/scripts/ssb
+#docker run -it -v $PWD:/scripts/ apache/systemds -f /scripts/queries/hello3.dml -nvargs input_dir="/scripts/data"
+
+#Run and delete the container immediately.
+#docker run -it --rm -v $PWD:/scripts/ apache/systemds -f /scripts/queries/hello3.dml -nvargs input_dir="/scripts/data"
+
+print("Hello SystemDS!")
+print("Hello, World!")
+
+/* DML-script with a simple join example in SystemDS (based on the ssb query Q1.1).
+**input_dir="/scripts/ssb/data" +SELECT COUNT(*) +FROM lineorder, date +WHERE + lo_orderdate = d_datekey + AND lo_quantity > 25; + +Usage: (We did not use here) +./bin/systemds scripts/ssb/queries/q1_1.dml -nvargs input_dir="/path/to/data" +./bin/systemds scripts/ssb/queries/q1_1.dml -nvargs input_dir="/Users/ghafekalsaho/Desktop/data" +or with explicit -f flag: +./bin/systemds -f scripts/ssb/queries/q1_1.dml -nvargs input_dir="/path/to/data" + +Parameters: +input_dir - Path to input directory containing the table files (e.g., ./data) +*/ +# -- SOURCING THE RA-FUNCTIONS -- +source("./scripts/builtin/raSelection.dml") as raSel +source("./scripts/builtin/raJoin.dml") as raJoin + +# -- PARAMETER HANDLING -- +input_dir = ifdef($input_dir, "./data"); +print("Loading tables from directory: " + input_dir); +#input_dir = ifdef($input_dir, "./data"); +#print("Loading tables from directory: " + input_dir); + +# -- READING INPUT FILES -- +# CSV TABLES +date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); + +# -- PREPARING -- +# EXTRACTING MINIMAL DATE DATA TO OPTIMIZE RUNTIME => COL-1 : DATE-KEY | COL-5 : YEAR +date_csv_min = cbind(date_csv[, 1], date_csv[, 5]); +date_matrix_min = as.matrix(date_csv_min); + +# EXTRACTING MINIMAL LINEORDER DATA TO OPTIMIZE RUNTIME => COL-6 : LO_ORDERDATE | +# COL-9 : LO_QUANTITY +lineorder_csv_min = cbind(lineorder_csv[, 16], lineorder_csv[, 6], lineorder_csv[, 9]); +lineorder_matrix_min = as.matrix(lineorder_csv_min); + +# LO_QUANTITY < 25 +lo_quan_filt = raSel::m_raSelection(lineorder_matrix_min, col=3, op="<", val=25); + +# -- JOIN TABLES WITH RA-JOIN FUNCTION -- +# JOINING FILTERED LINEORDER TABLE WITH FILTERED DATE TABLE WHERE LO_ORDERDATE = D_DATEKEY +joined_matrix = raJoin::m_raJoin(A=date_matrix_min, colA=1, B=lo_quan_filt, colB=1, method="hash"); +print("LO-DATE JOINED."); + +count = nrow(joined_matrix[,1]) +#print("COUNT: " + count) +print("COUNT: " + as.integer(count)) + +#print("Hello6 finished.\n"); + diff --git a/scripts/ssb/sql/q1.1.sql b/scripts/ssb/sql/q1.1.sql new file mode 100644 index 00000000000..d8a2840ca72 --- /dev/null +++ b/scripts/ssb/sql/q1.1.sql @@ -0,0 +1,23 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, +-- software distributed under the License is distributed on an +-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +-- KIND, either express or implied. See the License for the +-- specific language governing permissions and limitations +-- under the License. 
+SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE +FROM lineorder, dates +WHERE + lo_orderdate = d_datekey + AND d_year = 1993 + AND lo_discount BETWEEN 1 AND 3 + AND lo_quantity < 25; \ No newline at end of file diff --git a/scripts/ssb/sql/q1.2.sql b/scripts/ssb/sql/q1.2.sql new file mode 100644 index 00000000000..db6eb0c613a --- /dev/null +++ b/scripts/ssb/sql/q1.2.sql @@ -0,0 +1,23 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, +-- software distributed under the License is distributed on an +-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +-- KIND, either express or implied. See the License for the +-- specific language governing permissions and limitations +-- under the License. +SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE +FROM lineorder, dates +WHERE + lo_orderdate = d_datekey + AND d_yearmonth = 'Jan1994' + AND lo_discount BETWEEN 4 AND 6 + AND lo_quantity BETWEEN 26 AND 35; \ No newline at end of file diff --git a/scripts/ssb/sql/q1.3.sql b/scripts/ssb/sql/q1.3.sql new file mode 100644 index 00000000000..dbb91b0c46f --- /dev/null +++ b/scripts/ssb/sql/q1.3.sql @@ -0,0 +1,25 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, +-- software distributed under the License is distributed on an +-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +-- KIND, either express or implied. See the License for the +-- specific language governing permissions and limitations +-- under the License. +SELECT + SUM(lo_extendedprice * lo_discount) AS REVENUE +FROM lineorder, dates +WHERE + lo_orderdate = d_datekey + AND d_weeknuminyear = 6 + AND d_year = 1994 + AND lo_discount BETWEEN 5 AND 7 + AND lo_quantity BETWEEN 26 AND 35; \ No newline at end of file diff --git a/scripts/ssb/sql/q2.1.sql b/scripts/ssb/sql/q2.1.sql new file mode 100644 index 00000000000..70a8de9d42e --- /dev/null +++ b/scripts/ssb/sql/q2.1.sql @@ -0,0 +1,26 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. 
You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, +-- software distributed under the License is distributed on an +-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +-- KIND, either express or implied. See the License for the +-- specific language governing permissions and limitations +-- under the License. +SELECT SUM(lo_revenue), d_year, p_brand +FROM lineorder, dates, part, supplier +WHERE + lo_orderdate = d_datekey + AND lo_partkey = p_partkey + AND lo_suppkey = s_suppkey + AND p_category = 'MFGR#12' + AND s_region = 'AMERICA' +GROUP BY d_year, p_brand +ORDER BY p_brand; \ No newline at end of file diff --git a/scripts/ssb/sql/q2.2.sql b/scripts/ssb/sql/q2.2.sql new file mode 100644 index 00000000000..e283dbdb059 --- /dev/null +++ b/scripts/ssb/sql/q2.2.sql @@ -0,0 +1,26 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, +-- software distributed under the License is distributed on an +-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +-- KIND, either express or implied. See the License for the +-- specific language governing permissions and limitations +-- under the License. +SELECT SUM(lo_revenue), d_year, p_brand +FROM lineorder, dates, part, supplier +WHERE + lo_orderdate = d_datekey + AND lo_partkey = p_partkey + AND lo_suppkey = s_suppkey + AND p_brand BETWEEN 'MFGR#2221' AND 'MFGR#2228' + AND s_region = 'ASIA' +GROUP BY d_year, p_brand +ORDER BY d_year, p_brand; \ No newline at end of file diff --git a/scripts/ssb/sql/q2.3.sql b/scripts/ssb/sql/q2.3.sql new file mode 100644 index 00000000000..22d2419621c --- /dev/null +++ b/scripts/ssb/sql/q2.3.sql @@ -0,0 +1,26 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, +-- software distributed under the License is distributed on an +-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +-- KIND, either express or implied. See the License for the +-- specific language governing permissions and limitations +-- under the License. 
+SELECT SUM(lo_revenue), d_year, p_brand +FROM lineorder, dates, part, supplier +WHERE + lo_orderdate = d_datekey + AND lo_partkey = p_partkey + AND lo_suppkey = s_suppkey + AND p_brand = 'MFGR#2239' + AND s_region = 'EUROPE' +GROUP BY d_year, p_brand +ORDER BY d_year, p_brand; \ No newline at end of file diff --git a/scripts/ssb/sql/q3.1.sql b/scripts/ssb/sql/q3.1.sql new file mode 100644 index 00000000000..d6743379958 --- /dev/null +++ b/scripts/ssb/sql/q3.1.sql @@ -0,0 +1,32 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, +-- software distributed under the License is distributed on an +-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +-- KIND, either express or implied. See the License for the +-- specific language governing permissions and limitations +-- under the License. +SELECT + c_nation, + s_nation, + d_year, + SUM(lo_revenue) AS REVENUE +FROM customer, lineorder, supplier, dates +WHERE + lo_custkey = c_custkey + AND lo_suppkey = s_suppkey + AND lo_orderdate = d_datekey + AND c_region = 'ASIA' + AND s_region = 'ASIA' + AND d_year >= 1992 + AND d_year <= 1997 +GROUP BY c_nation, s_nation, d_year +ORDER BY d_year ASC, REVENUE DESC; \ No newline at end of file diff --git a/scripts/ssb/sql/q3.2.sql b/scripts/ssb/sql/q3.2.sql new file mode 100644 index 00000000000..2969efb1a2f --- /dev/null +++ b/scripts/ssb/sql/q3.2.sql @@ -0,0 +1,32 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, +-- software distributed under the License is distributed on an +-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +-- KIND, either express or implied. See the License for the +-- specific language governing permissions and limitations +-- under the License. +SELECT + c_city, + s_city, + d_year, + SUM(lo_revenue) AS REVENUE +FROM customer, lineorder, supplier, dates +WHERE + lo_custkey = c_custkey + AND lo_suppkey = s_suppkey + AND lo_orderdate = d_datekey + AND c_nation = 'UNITED STATES' + AND s_nation = 'UNITED STATES' + AND d_year >= 1992 + AND d_year <= 1997 +GROUP BY c_city, s_city, d_year +ORDER BY d_year ASC, REVENUE DESC; \ No newline at end of file diff --git a/scripts/ssb/sql/q3.3.sql b/scripts/ssb/sql/q3.3.sql new file mode 100644 index 00000000000..ac1cb324d09 --- /dev/null +++ b/scripts/ssb/sql/q3.3.sql @@ -0,0 +1,38 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. 
The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, +-- software distributed under the License is distributed on an +-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +-- KIND, either express or implied. See the License for the +-- specific language governing permissions and limitations +-- under the License. +SELECT + c_city, + s_city, + d_year, + SUM(lo_revenue) AS REVENUE +FROM customer, lineorder, supplier, dates +WHERE + lo_custkey = c_custkey + AND lo_suppkey = s_suppkey + AND lo_orderdate = d_datekey + AND ( + c_city = 'UNITED KI1' + OR c_city = 'UNITED KI5' + ) + AND ( + s_city = 'UNITED KI1' + OR s_city = 'UNITED KI5' + ) + AND d_year >= 1992 + AND d_year <= 1997 +GROUP BY c_city, s_city, d_year +ORDER BY d_year ASC, REVENUE DESC; \ No newline at end of file diff --git a/scripts/ssb/sql/q3.4.sql b/scripts/ssb/sql/q3.4.sql new file mode 100644 index 00000000000..2be6a5cd70a --- /dev/null +++ b/scripts/ssb/sql/q3.4.sql @@ -0,0 +1,37 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, +-- software distributed under the License is distributed on an +-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +-- KIND, either express or implied. See the License for the +-- specific language governing permissions and limitations +-- under the License. +SELECT + c_city, + s_city, + d_year, + SUM(lo_revenue) AS REVENUE +FROM customer, lineorder, supplier, dates +WHERE + lo_custkey = c_custkey + AND lo_suppkey = s_suppkey + AND lo_orderdate = d_datekey + AND ( + c_city = 'UNITED KI1' + OR c_city = 'UNITED KI5' + ) + AND ( + s_city = 'UNITED KI1' + OR s_city = 'UNITED KI5' + ) + AND d_yearmonth = 'Dec1997' +GROUP BY c_city, s_city, d_year +ORDER BY d_year ASC, REVENUE DESC; \ No newline at end of file diff --git a/scripts/ssb/sql/q4.1.sql b/scripts/ssb/sql/q4.1.sql new file mode 100644 index 00000000000..d6efe570a37 --- /dev/null +++ b/scripts/ssb/sql/q4.1.sql @@ -0,0 +1,34 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, +-- software distributed under the License is distributed on an +-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +-- KIND, either express or implied. See the License for the +-- specific language governing permissions and limitations +-- under the License. 
+SELECT + d_year, + c_nation, + SUM(lo_revenue - lo_supplycost) AS PROFIT +FROM dates, customer, supplier, part, lineorder +WHERE + lo_custkey = c_custkey + AND lo_suppkey = s_suppkey + AND lo_partkey = p_partkey + AND lo_orderdate = d_datekey + AND c_region = 'AMERICA' + AND s_region = 'AMERICA' + AND ( + p_mfgr = 'MFGR#1' + OR p_mfgr = 'MFGR#2' + ) +GROUP BY d_year, c_nation +ORDER BY d_year, c_nation; diff --git a/scripts/ssb/sql/q4.2.sql b/scripts/ssb/sql/q4.2.sql new file mode 100644 index 00000000000..c2f1a0ffddd --- /dev/null +++ b/scripts/ssb/sql/q4.2.sql @@ -0,0 +1,39 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, +-- software distributed under the License is distributed on an +-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +-- KIND, either express or implied. See the License for the +-- specific language governing permissions and limitations +-- under the License. +SELECT + d_year, + s_nation, + p_category, + SUM(lo_revenue - lo_supplycost) AS PROFIT +FROM dates, customer, supplier, part, lineorder +WHERE + lo_custkey = c_custkey + AND lo_suppkey = s_suppkey + AND lo_partkey = p_partkey + AND lo_orderdate = d_datekey + AND c_region = 'AMERICA' + AND s_region = 'AMERICA' + AND ( + d_year = 1997 + OR d_year = 1998 + ) + AND ( + p_mfgr = 'MFGR#1' + OR p_mfgr = 'MFGR#2' + ) +GROUP BY d_year, s_nation, p_category +ORDER BY d_year, s_nation, p_category; diff --git a/scripts/ssb/sql/q4.3.sql b/scripts/ssb/sql/q4.3.sql new file mode 100644 index 00000000000..f593a10291b --- /dev/null +++ b/scripts/ssb/sql/q4.3.sql @@ -0,0 +1,35 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- or more contributor license agreements. See the NOTICE file +-- distributed with this work for additional information +-- regarding copyright ownership. The ASF licenses this file +-- to you under the Apache License, Version 2.0 (the +-- "License"); you may not use this file except in compliance +-- with the License. You may obtain a copy of the License at +-- +-- http://www.apache.org/licenses/LICENSE-2.0 +-- +-- Unless required by applicable law or agreed to in writing, +-- software distributed under the License is distributed on an +-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +-- KIND, either express or implied. See the License for the +-- specific language governing permissions and limitations +-- under the License. 
+SELECT
+    d_year,
+    s_city,
+    p_brand,
+    SUM(lo_revenue - lo_supplycost) AS PROFIT
+FROM dates, customer, supplier, part, lineorder
+WHERE
+    lo_custkey = c_custkey
+    AND lo_suppkey = s_suppkey
+    AND lo_partkey = p_partkey
+    AND lo_orderdate = d_datekey
+    AND s_nation = 'UNITED STATES'
+    AND (
+        d_year = 1997
+        OR d_year = 1998
+    )
+    AND p_category = 'MFGR#14'
+GROUP BY d_year, s_city, p_brand
+ORDER BY d_year, s_city, p_brand;

From 360e2ad4dcc4ae90d8074570c0deb9604a2504c2 Mon Sep 17 00:00:00 2001
From: Johnn-ui2010
Date: Tue, 18 Nov 2025 20:22:30 +0100
Subject: [PATCH 04/22] Basic Join example not working 1

---
 scripts/ssb/queries/simple_join_example.dml | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/scripts/ssb/queries/simple_join_example.dml b/scripts/ssb/queries/simple_join_example.dml
index 02bbeb0d97b..a8014fcec85 100644
--- a/scripts/ssb/queries/simple_join_example.dml
+++ b/scripts/ssb/queries/simple_join_example.dml
@@ -1,5 +1,5 @@
 /*
-docker run -it --rm -v $PWD:/scripts/ apache/systemds -f /scripts/queries/hello6.dml -nvargs input_dir="/scripts/data"
+docker run -it --rm -v $PWD:/scripts/ apache/systemds -f /scripts/queries/simple_join_example.dml -nvargs input_dir="/scripts/data"
 WARNING: Using incubator modules: jdk.incubator.vector
 Hello SystemDS!
 Hello, World!
@@ -18,10 +18,10 @@
 */
 #Start in systemds/scripts/ssb
-#docker run -it -v $PWD:/scripts/ apache/systemds -f /scripts/queries/hello3.dml -nvargs input_dir="/scripts/data"
+#docker run -it -v $PWD:/scripts/ apache/systemds -f /scripts/queries/simple_join_example.dml -nvargs input_dir="/scripts/data"
 
 #Run and delete the container immediately.
-#docker run -it --rm -v $PWD:/scripts/ apache/systemds -f /scripts/queries/hello3.dml -nvargs input_dir="/scripts/data"
+#docker run -it --rm -v $PWD:/scripts/ apache/systemds -f /scripts/queries/simple_join_example.dml -nvargs input_dir="/scripts/data"
 
 print("Hello SystemDS!")
 print("Hello, World!")
@@ -80,5 +80,5 @@
 count = nrow(joined_matrix[,1])
 #print("COUNT: " + count)
 print("COUNT: " + as.integer(count))
 
-#print("Hello6 finished.\n");
+#print("simple_join_example finished.\n");

From 797214d3bbf3128743a02ccc640da0d0b32fdfd0 Mon Sep 17 00:00:00 2001
From: Johnn-ui2010
Date: Tue, 25 Nov 2025 16:02:39 +0100
Subject: [PATCH 05/22] q1_1 works on the dataset with scale factor 0.1.

---
 scripts/ssb/queries/q1_1.dml | 89 ++++++++++++++++++++++++++++++++++++
 scripts/ssb/sql/q1.1.sql     |  5 +-
 2 files changed, 92 insertions(+), 2 deletions(-)
 create mode 100644 scripts/ssb/queries/q1_1.dml

diff --git a/scripts/ssb/queries/q1_1.dml b/scripts/ssb/queries/q1_1.dml
new file mode 100644
index 00000000000..a7a79ad5dc6
--- /dev/null
+++ b/scripts/ssb/queries/q1_1.dml
@@ -0,0 +1,89 @@
+/* DML-script implementing the ssb query Q1.1 in SystemDS.
+**input_dir="/scripts/ssb/data"
+
+* Run with docker:
+docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q1_1.dml -nvargs input_dir="/scripts/data/"
+
+SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
+FROM lineorder, date
+WHERE
+    lo_orderdate = d_datekey
+    AND d_year = 1993
+    AND lo_discount BETWEEN 1 AND 3
+    AND lo_quantity < 25;
+
+*Please run the original SQL query (eg. in Postgres)
+to verify the correctness of DML version.
+
+*Based on the older implementation.
+https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_1.dml
+In comparison to older version the join method was changed
+from sort-merge to hash2 to improve the performance.
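+(A likely reason, our assumption: hash2 builds its hash table on the small,
+filtered date-side input and only streams the large lineorder table, whereas
+sort-merge first has to sort the large side.)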
+ +Input parameters: +input_dir - Path to input directory containing the table files (e.g., ./data) +*/ + +# Call ra-modules with ra-functions. +source("./scripts/builtin/raSelection.dml") as raSel +source("./scripts/builtin/raJoin.dml") as raJoin + +# Set input parameters. +input_dir = ifdef($input_dir, "./data"); +print("Loading tables from directory: " + input_dir); + +# Read and load input CSV files from date and lineorder. +date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); + +# -- Data preparation -- + +# Extract only the necessary columns from date and lineorder table. +# Extracted: COL-1 | COL-5 +# => d_datekey | d_year +date_csv_min = cbind(date_csv[, 1], date_csv[, 5]); +date_matrix_min = as.matrix(date_csv_min); + +# Extracted: COL-6 | COL-9 | COL-10 | COL-12 +# => LO_ORDERDATE | LO_QUANTITY | LO_EXTPRICE | LO_DISCOUNT +lineorder_csv_min = cbind(lineorder_csv[, 6], lineorder_csv[, 9], lineorder_csv[, 10], lineorder_csv[, 12]); +lineorder_matrix_min = as.matrix(lineorder_csv_min); + +# -- Filter the data with RA-SELECTION function. + +# D_YEAR = 1993 +d_year_filt = raSel::m_raSelection(date_matrix_min, col=2, op="==", val=1993); + +# LO_QUANTITY < 25 +lo_quan_filt = raSel::m_raSelection(lineorder_matrix_min, col=2, op="<", val=25); + +# LO_DISCOUNT BETWEEN 1 AND 3 +lo_quan_disc_filt = raSel::m_raSelection(lo_quan_filt, col=4, op=">=", val=1); +lo_quan_disc_filt = raSel::m_raSelection(lo_quan_disc_filt, col=4, op="<=", val=3); + +# Minimize LO TABLE +# => LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT +lo_quan_disc_filt = cbind(lo_quan_disc_filt[, 1], lo_quan_disc_filt[, 3], lo_quan_disc_filt[, 4]); + +# -- Join -- +# Join LINEORDER and DATE tables with RA-JOIN function +# WHERE LO_ORDERDATE = D_DATEKEY + +# => (D-KEY | D-YEAR) | (LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT) +joined_matrix = raJoin::m_raJoin(A=d_year_filt, colA=1, B=lo_quan_disc_filt, colB=1, method="hash2"); +print("LO-DATE JOINED."); + +# Print the first row. +#print(toString(joined_matrix[1,])) + +# -- Aggregation (SUM)-- + +# SUM(lo_extendedprice * lo_discount) AS REVENUE +# Use the joined_matrix with LO_EXTPRICE (COL-4), LO_DISCOUNT (COL-5) +lo_extprice = joined_matrix[, 4]; +lo_disc = joined_matrix[, 5]; +revenue = sum(lo_extprice * lo_disc); + +print("REVENUE: " + as.integer(revenue)); + +#print("Q1.1 finished.\n"); diff --git a/scripts/ssb/sql/q1.1.sql b/scripts/ssb/sql/q1.1.sql index d8a2840ca72..728c63121bc 100644 --- a/scripts/ssb/sql/q1.1.sql +++ b/scripts/ssb/sql/q1.1.sql @@ -1,4 +1,5 @@ -- Licensed to the Apache Software Foundation (ASF) under one +-- Licensed to the Apache Software Foundation (ASF) under one -- or more contributor license agreements. See the NOTICE file -- distributed with this work for additional information -- regarding copyright ownership. The ASF licenses this file @@ -15,9 +16,9 @@ -- specific language governing permissions and limitations -- under the License. 
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE -FROM lineorder, dates +FROM lineorder, date --dates (Ssb-dbgen dataset uses "date" instead of "dates") WHERE lo_orderdate = d_datekey AND d_year = 1993 AND lo_discount BETWEEN 1 AND 3 - AND lo_quantity < 25; \ No newline at end of file + AND lo_quantity < 25; From e9e099716862c253a5ff911ef7d8f9f9a4895129 Mon Sep 17 00:00:00 2001 From: Johnn-ui2010 Date: Tue, 25 Nov 2025 18:02:18 +0100 Subject: [PATCH 06/22] q1_1, q1_2, q1_3 work on the dataset with scale factor 0.1. Biggest change: From "sort-merge" to "hash2" join. --- scripts/ssb/queries/q1_1.dml | 13 ++-- scripts/ssb/queries/q1_2.dml | 112 +++++++++++++++++++++++++++++++++++ scripts/ssb/queries/q1_3.dml | 95 +++++++++++++++++++++++++++++ scripts/ssb/sql/q1.2.sql | 2 +- scripts/ssb/sql/q1.3.sql | 2 +- 5 files changed, 216 insertions(+), 8 deletions(-) create mode 100644 scripts/ssb/queries/q1_2.dml create mode 100644 scripts/ssb/queries/q1_3.dml diff --git a/scripts/ssb/queries/q1_1.dml b/scripts/ssb/queries/q1_1.dml index a7a79ad5dc6..ac3a279bd90 100644 --- a/scripts/ssb/queries/q1_1.dml +++ b/scripts/ssb/queries/q1_1.dml @@ -14,6 +14,7 @@ WHERE *Please run the original SQL query (eg. in Postgres) to verify the correctness of DML version. +-> First tests: Works on the dataset with scale factor 0.1. *Based on the older implementation. https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_1.dml @@ -40,7 +41,7 @@ lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="cs # Extract only the necessary columns from date and lineorder table. # Extracted: COL-1 | COL-5 -# => d_datekey | d_year +# => D_DATEKEY | D_YEAR date_csv_min = cbind(date_csv[, 1], date_csv[, 5]); date_matrix_min = as.matrix(date_csv_min); @@ -55,22 +56,22 @@ lineorder_matrix_min = as.matrix(lineorder_csv_min); d_year_filt = raSel::m_raSelection(date_matrix_min, col=2, op="==", val=1993); # LO_QUANTITY < 25 -lo_quan_filt = raSel::m_raSelection(lineorder_matrix_min, col=2, op="<", val=25); +lo_filt = raSel::m_raSelection(lineorder_matrix_min, col=2, op="<", val=25); # LO_DISCOUNT BETWEEN 1 AND 3 -lo_quan_disc_filt = raSel::m_raSelection(lo_quan_filt, col=4, op=">=", val=1); -lo_quan_disc_filt = raSel::m_raSelection(lo_quan_disc_filt, col=4, op="<=", val=3); +lo_filt = raSel::m_raSelection(lo_filt, col=4, op=">=", val=1); +lo_filt = raSel::m_raSelection(lo_filt, col=4, op="<=", val=3); # Minimize LO TABLE # => LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT -lo_quan_disc_filt = cbind(lo_quan_disc_filt[, 1], lo_quan_disc_filt[, 3], lo_quan_disc_filt[, 4]); +lo_filt = cbind(lo_filt[, 1], lo_filt[, 3], lo_filt[, 4]); # -- Join -- # Join LINEORDER and DATE tables with RA-JOIN function # WHERE LO_ORDERDATE = D_DATEKEY # => (D-KEY | D-YEAR) | (LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT) -joined_matrix = raJoin::m_raJoin(A=d_year_filt, colA=1, B=lo_quan_disc_filt, colB=1, method="hash2"); +joined_matrix = raJoin::m_raJoin(A=d_year_filt, colA=1, B=lo_filt, colB=1, method="hash2"); print("LO-DATE JOINED."); # Print the first row. diff --git a/scripts/ssb/queries/q1_2.dml b/scripts/ssb/queries/q1_2.dml new file mode 100644 index 00000000000..56b5b264883 --- /dev/null +++ b/scripts/ssb/queries/q1_2.dml @@ -0,0 +1,112 @@ +/* DML-script implementing the ssb query Q1.1 in SystemDS. 
+**input_dir="/scripts/ssb/data" + +* Run with docker: +docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q1_1.dml -nvargs input_dir="/scripts/data/" + +SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE +FROM lineorder, date --dates +WHERE + lo_orderdate = d_datekey + AND d_yearmonth = 'Jan1994' + AND lo_discount BETWEEN 4 AND 6 + AND lo_quantity BETWEEN 26 AND 35; + +*Please run the original SQL query (eg. in Postgres) +to verify the correctness of DML version. + +*Based on the older implementation. +https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_2.dml +In comparison to older version the join method was changed +from sort-merge to hash2 to improve the performance. +A binary column of d_filt (date_filtered) was removed. + +Input parameters: +input_dir - Path to input directory containing the table files (e.g., ./data) +*/ + +# Call ra-modules with ra-functions. +source("./scripts/builtin/raSelection.dml") as raSel +source("./scripts/builtin/raJoin.dml") as raJoin + +# Set input parameters. +input_dir = ifdef($input_dir, "./data"); +print("Loading tables from directory: " + input_dir); + +# Read and load input CSV files from date and lineorder. +date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); + +# -- Data preparation -- + +# Extract only the necessary columns from date and lineorder table. +# Extracted: COL-1 | COL-7 +# => D_DATEKEY | D_YEARMONTH + +date_keys_matrix = as.matrix(date_csv[, 1]); +d_yearmonth_col = date_csv[, 7] + +# Count "Jan1994" rows first to pre-allocate matrix efficiently. +date_nrows = nrow(date_keys_matrix); + +sel_ym_count = 0; +for (i in 1:date_nrows) { + yearmonth_val = as.scalar(d_yearmonth_col[i]); + if (yearmonth_val == "Jan1994") { + sel_ym_count = sel_ym_count + 1; + } +} +#print("jan1994_count: " + as.integer(sel_ym_count)); + +# Allocate final matrix. +d_filt = matrix(0, sel_ym_count, 1); +filtered_idx = 1; +for (i in 1:date_nrows) { + yearmonth_val = as.scalar(d_yearmonth_col[i]); + if (yearmonth_val == "Jan1994") { + d_filt[filtered_idx] = as.scalar(date_keys_matrix[i]); # date_key + filtered_idx = filtered_idx + 1; + } +} + +# Extracted: COL-6 | COL-9 | COL-10 | COL-12 +# => LO_ORDERDATE | LO_QUANTITY | LO_EXTPRICE | LO_DISCOUNT +lineorder_csv_min = cbind(lineorder_csv[, 6], lineorder_csv[, 9], lineorder_csv[, 10], lineorder_csv[, 12]); +lineorder_matrix_min = as.matrix(lineorder_csv_min); + +# -- Filter the data with RA-SELECTION function. + +# LO_DISCOUNT BETWEEN 4 AND 6 +lo_filt = raSel::m_raSelection(lineorder_matrix_min, col=4, op=">=", val=4); +lo_filt = raSel::m_raSelection(lo_filt, col=4, op="<=", val=6); + +# LO_QUANTITY BETWEEN 26 AND 35 +lo_filt = raSel::m_raSelection(lo_filt, col=2, op=">=", val=26); +lo_filt = raSel::m_raSelection(lo_filt, col=2, op="<=", val=35); + +# Minimize LO TABLE +# => LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT +lo_filt = cbind(lo_filt[, 1], lo_filt[, 3], lo_filt[, 4]); + +# -- Join -- +# Join LINEORDER and DATE tables with RA-JOIN function +# WHERE LO_ORDERDATE = D_DATEKEY + +# => (D-DATEKEY) | (LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT) +joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_filt, colB=1, method="hash2"); +print("LO-DATE JOINED."); + +# Print the first row. 
+#print(toString(joined_matrix[1,]))
+
+# -- Aggregation (SUM)--
+
+# SUM(lo_extendedprice * lo_discount) AS REVENUE
+# Use the joined_matrix with LO_EXTPRICE (COL-3), LO_DISCOUNT (COL-4)
+lo_extprice = joined_matrix[, 3];
+lo_disc = joined_matrix[, 4];
+revenue = sum(lo_extprice * lo_disc);
+
+print("REVENUE: " + as.integer(revenue));
+
+#print("Q1.3 finished.\n");
\ No newline at end of file
diff --git a/scripts/ssb/queries/q1_3.dml b/scripts/ssb/queries/q1_3.dml
new file mode 100644
index 00000000000..c8ae38b9c8b
--- /dev/null
+++ b/scripts/ssb/queries/q1_3.dml
@@ -0,0 +1,95 @@
+/* DML-script implementing the ssb query Q1.1 in SystemDS.
+**input_dir="/scripts/ssb/data"
+
+* Run with docker:
+docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q1_1.dml -nvargs input_dir="/scripts/data/"
+
+SELECT
+    SUM(lo_extendedprice * lo_discount) AS REVENUE
+FROM lineorder, date
+WHERE
+    lo_orderdate = d_datekey
+    AND d_weeknuminyear = 6
+    AND d_year = 1994
+    AND lo_discount BETWEEN 5 AND 7
+    AND lo_quantity BETWEEN 26 AND 35;
+
+*Please run the original SQL query (eg. in Postgres)
+to verify the correctness of DML version.
+-> First tests: Works on the dataset with scale factor 0.1.
+
+*Based on the older implementation.
+https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_1.dml
+In comparison to older version the join method was changed
+from sort-merge to hash2 to improve the performance.
+
+Input parameters:
+input_dir - Path to input directory containing the table files (e.g., ./data)
+*/
+
+# Call ra-modules with ra-functions.
+source("./scripts/builtin/raSelection.dml") as raSel
+source("./scripts/builtin/raJoin.dml") as raJoin
+
+# Set input parameters.
+input_dir = ifdef($input_dir, "./data");
+print("Loading tables from directory: " + input_dir);
+
+# Read and load input CSV files from date and lineorder.
+date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+
+# -- Data preparation --
+
+# Extract only the necessary columns from date and lineorder table.
+# Extracted: COL-1 | COL-5 | COL-12
+# => D_DATEKEY | D_YEAR | D_WEEKNUMINYEAR
+date_csv_min = cbind(date_csv[, 1], date_csv[, 5], date_csv[, 12]);
+date_matrix_min = as.matrix(date_csv_min);
+
+# Extracted: COL-6 | COL-9 | COL-10 | COL-12
+# => LO_ORDERDATE | LO_QUANTITY | LO_EXTPRICE | LO_DISCOUNT
+lineorder_csv_min = cbind(lineorder_csv[, 6], lineorder_csv[, 9], lineorder_csv[, 10], lineorder_csv[, 12]);
+lineorder_matrix_min = as.matrix(lineorder_csv_min);
+
+# -- Filter the data with RA-SELECTION function.
+
+# D_YEAR = 1994
+d_filt = raSel::m_raSelection(date_matrix_min, col=2, op="==", val=1994);
+# D_WEEKNUMINYEAR = 6
+d_filt = raSel::m_raSelection(d_filt, col=3, op="==", val=6);
+
+# LO_DISCOUNT BETWEEN 5 AND 7
+lo_filt = raSel::m_raSelection(lineorder_matrix_min, col=4, op=">=", val=5);
+lo_filt = raSel::m_raSelection(lo_filt, col=4, op="<=", val=7);
+
+# LO_QUANTITY BETWEEN 26 AND 35
+lo_filt = raSel::m_raSelection(lo_filt, col=2, op=">=", val=26);
+lo_filt = raSel::m_raSelection(lo_filt, col=2, op="<=", val=35);
+
+# Minimize LO TABLE
+# => LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT
+lo_filt = cbind(lo_filt[, 1], lo_filt[, 3], lo_filt[, 4]);
+
+# -- Join --
+# Join LINEORDER and DATE tables with RA-JOIN function
+# WHERE LO_ORDERDATE = D_DATEKEY
+# Print the first row.
+#print(toString(lo_filt[1,]))
+
+# => (D-DATEKEY | D-YEAR | D_WEEKNUMINYEAR) | (LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT)
+joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_filt, colB=1, method="hash2");
+print("LO-DATE JOINED.");
+#print(toString(joined_matrix[1,]))
+
+# -- Aggregation (SUM)--
+
+# SUM(lo_extendedprice * lo_discount) AS REVENUE
+# Use the joined_matrix with LO_EXTPRICE (COL-5), LO_DISCOUNT (COL-6)
+lo_extprice = joined_matrix[, 5];
+lo_disc = joined_matrix[, 6];
+revenue = sum(lo_extprice * lo_disc);
+
+print("REVENUE: " + as.integer(revenue));
+
+#print("Q1.3 finished.\n");
diff --git a/scripts/ssb/sql/q1.2.sql b/scripts/ssb/sql/q1.2.sql
index db6eb0c613a..7445c53e4fc 100644
--- a/scripts/ssb/sql/q1.2.sql
+++ b/scripts/ssb/sql/q1.2.sql
@@ -15,7 +15,7 @@
 -- specific language governing permissions and limitations
 -- under the License.
 SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
-FROM lineorder, dates
+FROM lineorder, date --dates
 WHERE
     lo_orderdate = d_datekey
     AND d_yearmonth = 'Jan1994'
diff --git a/scripts/ssb/sql/q1.3.sql b/scripts/ssb/sql/q1.3.sql
index dbb91b0c46f..4f44b0d9f2f 100644
--- a/scripts/ssb/sql/q1.3.sql
+++ b/scripts/ssb/sql/q1.3.sql
@@ -16,7 +16,7 @@
 -- under the License.
 SELECT
     SUM(lo_extendedprice * lo_discount) AS REVENUE
-FROM lineorder, dates
+FROM lineorder, date --dates
 WHERE
     lo_orderdate = d_datekey
     AND d_weeknuminyear = 6

From fce84ccad8014e0469ddb1f81b2565f05c2ace6a Mon Sep 17 00:00:00 2001
From: Johnn-ui2010
Date: Sun, 30 Nov 2025 20:32:51 +0100
Subject: [PATCH 07/22] 2_1 Part_query works, but sorting not (as in the older
 implementation).

---
 scripts/ssb/queries/q2_1_groupby.dml | 122 +++++++++++++++++++++++++++
 1 file changed, 122 insertions(+)
 create mode 100644 scripts/ssb/queries/q2_1_groupby.dml

diff --git a/scripts/ssb/queries/q2_1_groupby.dml b/scripts/ssb/queries/q2_1_groupby.dml
new file mode 100644
index 00000000000..9de17dda531
--- /dev/null
+++ b/scripts/ssb/queries/q2_1_groupby.dml
@@ -0,0 +1,122 @@
+/* DML-script implementing a simplified version of the ssb query Q2.1 in SystemDS.
+**input_dir="/scripts/ssb/data"
+
+* Run with docker:
+docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q2_1_groupby.dml -nvargs input_dir="/scripts/data/"
+
+SELECT SUM(lo_revenue), p_brand
+FROM part, lineorder
+WHERE
+    lo_partkey = p_partkey
+    AND p_category = 'MFGR#12'
+    GROUP BY p_brand
+    ORDER BY p_brand;
+
+*Please run the original SQL query (eg. in Postgres)
+to verify the correctness of DML version.
+-> First tests: Works on the dataset with scale factor 0.1.
+
+*Based on the older implementations.
+https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_1.dml
+https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q4_3.dml
+In comparison to older version the join method was changed
+from sort-merge to hash2 to improve the performance.
+
+Input parameters:
+input_dir - Path to input directory containing the table files (e.g., ./data)
+*/
+
+# Call ra-modules with ra-functions.
+source("./scripts/builtin/raSelection.dml") as raSel
+source("./scripts/builtin/raJoin.dml") as raJoin
+source("./scripts/builtin/raGroupby.dml") as raGrp
+
+# Set input parameters.
+input_dir = ifdef($input_dir, "./data");
+print("Loading tables from directory: " + input_dir);
+
+# Read and load input CSV files from lineorder and part.
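+# Note: ssb-dbgen .tbl files usually terminate each row with a trailing '|',
+# so the CSV reader may parse an extra empty last column; the 1-based column
+# indices used below count from the left and are unaffected.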
+#date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+part_csv = read(input_dir + "/part.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+
+general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }";
+
+# -- Data preparation --
+
+# Extract only the necessary columns from the lineorder table.
+# Extracted: COL-4 | COL-13
+# => LO_PARTKEY | LO_REVENUE
+
+lineorder_csv_min = cbind(lineorder_csv[, 4], lineorder_csv[, 13]);
+lineorder_matrix_min = as.matrix(lineorder_csv_min);
+
+# -- Filter the data with RA-SELECTION function.
+
+# Prepare PART on-the-fly encodings (only need p_brand encoding, filter by p_category string)
+# We'll encode column 5 (p_brand) on-the-fly and later filter by category string 'MFGR#12'.
+[part_brand_enc_f, part_brand_meta] = transformencode(target=part_csv[,5], spec=general_spec);
+#print(toString(part_brand_enc_f));
+
+# Build filtered PART table (p_category == 'MFGR#12'), keeping key and encoded brand
+part_filt_keys = matrix(0, rows=0, cols=1);
+part_filt_brand = matrix(0, rows=0, cols=1);
+for (i in 1:nrow(part_csv)) {
+  if (as.scalar(part_csv[i,4]) == "MFGR#12") {
+    key_val = as.double(as.scalar(part_csv[i,1]));
+    brand_code = as.double(as.scalar(part_brand_enc_f[i,1]));
+    part_filt_keys = rbind(part_filt_keys, matrix(key_val, rows=1, cols=1));
+    part_filt_brand = rbind(part_filt_brand, matrix(brand_code, rows=1, cols=1));
+  }
+}
+if (nrow(part_filt_keys) == 0) {
+  part_filt_keys = matrix(0, rows=1, cols=1);
+  part_filt_brand = matrix(0, rows=1, cols=1);
+}
+part_filt = cbind(part_filt_keys, part_filt_brand);
+
+# -- Join --
+# Join LINEORDER and PART tables WHERE LO_PARTKEY = P_PARTKEY
+# P_PARTKEY | P_BRAND | LO_PARTKEY | LO_REVENUE
+lo_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2");
+#print(toString(lo_part[1,]))
+
+#print(lo_part[1,])
+# -- GROUP-BY & AGGREGATION --
+#print(toString(p_brand_dec))
+#print("LO-PART JOINED.");
+
+# -- Group-By and Aggregation (SUM)--
+
+# Group-By
+p_brand = lo_part[,2]
+lo_revenue = lo_part[,4]
+
+# CALCULATING COMBINATION KEY WITH PRIORITY: P_BRAND
+max_p_brand = max(p_brand);
+p_brand_scale_f = ceil(max_p_brand) + 1;
+
+combined_key = p_brand;
+
+group_input = cbind(lo_revenue, combined_key)
+
+agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop");
+
+# Aggregation (SUM)
+key = agg_result[, 1];
+revenue = rowSums(agg_result[, 2:ncol(agg_result)]);
+p_brand = round(key %% p_brand_scale_f);
+result = cbind(p_brand, revenue);
+
+# -- Sorting -- (sorting not working!!!)
+# ORDER BY P_BRAND ASC
+result_ordered = order(target=result, by=1, decreasing=FALSE, index.return=FALSE);
+
+p_brand_dec = transformdecode(target=result_ordered[,1], spec=general_spec, meta=part_brand_meta);
+result = cbind(p_brand_dec, as.frame(result_ordered[,2]));
+
+# Print result
+print("p_brand | SUM(lo_revenue)")
+print(result)
+
+#print("Q2.1 finished.\n");

From 83d9a7b0e114495cd0f8bda25daac4a6b889a37b Mon Sep 17 00:00:00 2001
From: Johnn-ui2010
Date: Tue, 2 Dec 2025 18:37:39 +0100
Subject: [PATCH 08/22] q2_1 and 2_3 works on scale 0.1, but sorting string
 values still a problem.
--- scripts/ssb/queries/q1_1.dml | 7 +- scripts/ssb/queries/q1_2.dml | 57 +++++------- scripts/ssb/queries/q1_3.dml | 11 +-- scripts/ssb/queries/q2_1.dml | 163 +++++++++++++++++++++++++++++++++ scripts/ssb/queries/q2_3.dml | 168 +++++++++++++++++++++++++++++++++++ scripts/ssb/sql/q2.1.sql | 2 +- scripts/ssb/sql/q2.3.sql | 2 +- 7 files changed, 365 insertions(+), 45 deletions(-) create mode 100644 scripts/ssb/queries/q2_1.dml create mode 100644 scripts/ssb/queries/q2_3.dml diff --git a/scripts/ssb/queries/q1_1.dml b/scripts/ssb/queries/q1_1.dml index ac3a279bd90..3c0a87839fa 100644 --- a/scripts/ssb/queries/q1_1.dml +++ b/scripts/ssb/queries/q1_1.dml @@ -72,7 +72,7 @@ lo_filt = cbind(lo_filt[, 1], lo_filt[, 3], lo_filt[, 4]); # => (D-KEY | D-YEAR) | (LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT) joined_matrix = raJoin::m_raJoin(A=d_year_filt, colA=1, B=lo_filt, colB=1, method="hash2"); -print("LO-DATE JOINED."); +print("LO-DATE JOINED.\n"); # Print the first row. #print(toString(joined_matrix[1,])) @@ -85,6 +85,7 @@ lo_extprice = joined_matrix[, 4]; lo_disc = joined_matrix[, 5]; revenue = sum(lo_extprice * lo_disc); -print("REVENUE: " + as.integer(revenue)); +print("REVENUE") +print(as.integer(revenue)); -#print("Q1.1 finished.\n"); +print("\nQ1.1 finished.\n"); diff --git a/scripts/ssb/queries/q1_2.dml b/scripts/ssb/queries/q1_2.dml index 56b5b264883..a909d4ddaa9 100644 --- a/scripts/ssb/queries/q1_2.dml +++ b/scripts/ssb/queries/q1_2.dml @@ -1,4 +1,4 @@ -/* DML-script implementing the ssb query Q1.1 in SystemDS. +/* DML-script implementing the ssb query Q1.2 in SystemDS. **input_dir="/scripts/ssb/data" * Run with docker: @@ -17,6 +17,8 @@ to verify the correctness of DML version. *Based on the older implementation. https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_2.dml +*Especially: +https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q4_3.dml In comparison to older version the join method was changed from sort-merge to hash2 to improve the performance. A binary column of d_filt (date_filtered) was removed. @@ -40,34 +42,6 @@ lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="cs # -- Data preparation -- # Extract only the necessary columns from date and lineorder table. -# Extracted: COL-1 | COL-7 -# => D_DATEKEY | D_YEARMONTH - -date_keys_matrix = as.matrix(date_csv[, 1]); -d_yearmonth_col = date_csv[, 7] - -# Count "Jan1994" rows first to pre-allocate matrix efficiently. -date_nrows = nrow(date_keys_matrix); - -sel_ym_count = 0; -for (i in 1:date_nrows) { - yearmonth_val = as.scalar(d_yearmonth_col[i]); - if (yearmonth_val == "Jan1994") { - sel_ym_count = sel_ym_count + 1; - } -} -#print("jan1994_count: " + as.integer(sel_ym_count)); - -# Allocate final matrix. -d_filt = matrix(0, sel_ym_count, 1); -filtered_idx = 1; -for (i in 1:date_nrows) { - yearmonth_val = as.scalar(d_yearmonth_col[i]); - if (yearmonth_val == "Jan1994") { - d_filt[filtered_idx] = as.scalar(date_keys_matrix[i]); # date_key - filtered_idx = filtered_idx + 1; - } -} # Extracted: COL-6 | COL-9 | COL-10 | COL-12 # => LO_ORDERDATE | LO_QUANTITY | LO_EXTPRICE | LO_DISCOUNT @@ -75,7 +49,6 @@ lineorder_csv_min = cbind(lineorder_csv[, 6], lineorder_csv[, 9], lineorder_csv[ lineorder_matrix_min = as.matrix(lineorder_csv_min); # -- Filter the data with RA-SELECTION function. 
- # LO_DISCOUNT BETWEEN 4 AND 6 lo_filt = raSel::m_raSelection(lineorder_matrix_min, col=4, op=">=", val=4); lo_filt = raSel::m_raSelection(lo_filt, col=4, op="<=", val=6); @@ -88,13 +61,26 @@ lo_filt = raSel::m_raSelection(lo_filt, col=2, op="<=", val=35); # => LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT lo_filt = cbind(lo_filt[, 1], lo_filt[, 3], lo_filt[, 4]); +# -- Filter tables over string values. +# Build filtered SUPPLIER table (s_region == 'AMERICA') +date_filt = matrix(0, rows=0, cols=1); +for (i in 1:nrow(date_csv)) { + if (as.scalar(date_csv[i,7]) == "Jan1994") { + key_val = as.double(as.scalar(date_csv[i,1])); + date_filt = rbind(date_filt, matrix(key_val, rows=1, cols=1)); + } +} +if (nrow(date_filt) == 0) { + date_filt = matrix(0, rows=1, cols=1); +} + # -- Join -- # Join LINEORDER and DATE tables with RA-JOIN function # WHERE LO_ORDERDATE = D_DATEKEY -# => (D-DATEKEY) | (LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT) -joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_filt, colB=1, method="hash2"); -print("LO-DATE JOINED."); +# => (D_DATEKEY) | (LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT) +joined_matrix = raJoin::m_raJoin(A=date_filt, colA=1, B=lo_filt, colB=1, method="hash2"); +print("LO-DATE JOINED.\n"); # Print the first row. #print(toString(joined_matrix[1,])) @@ -107,6 +93,7 @@ lo_extprice = joined_matrix[, 3]; lo_disc = joined_matrix[, 4]; revenue = sum(lo_extprice * lo_disc); -print("REVENUE: " + as.integer(revenue)); +print("REVENUE") +print(as.integer(revenue)); -#print("Q1.3 finished.\n"); \ No newline at end of file +print("\nQ1.2 finished.\n"); \ No newline at end of file diff --git a/scripts/ssb/queries/q1_3.dml b/scripts/ssb/queries/q1_3.dml index c8ae38b9c8b..6ac2dc5a4dc 100644 --- a/scripts/ssb/queries/q1_3.dml +++ b/scripts/ssb/queries/q1_3.dml @@ -1,4 +1,4 @@ -/* DML-script implementing the ssb query Q1.1 in SystemDS. +/* DML-script implementing the ssb query Q1.3 in SystemDS. **input_dir="/scripts/ssb/data" * Run with docker: @@ -77,9 +77,9 @@ lo_filt = cbind(lo_filt[, 1], lo_filt[, 3], lo_filt[, 4]); # Print the first row. #print(toString(lo_filt[1,])) -# => (D-DATEKEY | D-YEAR | D_WEEKNUMINYEAR) | (LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT) +# => (D_DATEKEY | D_YEAR | D_WEEKNUMINYEAR) | (LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT) joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_filt, colB=1, method="hash2"); -print("LO-DATE JOINED."); +print("LO-DATE JOINED.\n"); #print(toString(joined_matrix[1,])) # -- Aggregation (SUM)-- @@ -90,6 +90,7 @@ lo_extprice = joined_matrix[, 5]; lo_disc = joined_matrix[, 6]; revenue = sum(lo_extprice * lo_disc); -print("REVENUE: " + as.integer(revenue)); +print("REVENUE") +print(as.integer(revenue)); -#print("Q1.3 finished.\n"); +print("\nQ1.3 finished.\n"); diff --git a/scripts/ssb/queries/q2_1.dml b/scripts/ssb/queries/q2_1.dml new file mode 100644 index 00000000000..9b94883877a --- /dev/null +++ b/scripts/ssb/queries/q2_1.dml @@ -0,0 +1,163 @@ +/* DML-script implementing the ssb query Q2.1 in SystemDS. +**input_dir="/scripts/ssb/data" + +* Run with docker: +docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q2_1.dml -nvargs input_dir="/scripts/data/" + +SELECT SUM(lo_revenue), d_year, p_brand +FROM lineorder, date, part, supplier +WHERE + lo_orderdate = d_datekey + AND lo_partkey = p_partkey + AND lo_suppkey = s_suppkey + AND p_category = 'MFGR#12' + AND s_region = 'AMERICA' + GROUP BY d_year, p_brand + ORDER BY p_brand; + +*Please run the original SQL query (eg. 
in Postgres) +to verify the correctness of DML version. +-> First tests: Works on the dataset with scale factor 0.1. +-> Sorting does not work. + +*Based on older implementations. +https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_1.dml +*Especially: +https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q4_3.dml +In comparison to older version the join method was changed +from sort-merge to hash2 to improve the performance. + +Input parameters: +input_dir - Path to input directory containing the table files (e.g., ./data) +*/ + +# Call ra-modules with ra-functions. +source("./scripts/builtin/raSelection.dml") as raSel +source("./scripts/builtin/raJoin.dml") as raJoin +source("./scripts/builtin/raGroupby.dml") as raGrp + +# Set input parameters. +input_dir = ifdef($input_dir, "./data"); +print("Loading tables from directory: " + input_dir); + +# Read and load input CSV files from date and lineorder. +#date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +part_csv = read(input_dir + "/part.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); + +general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; + +# -- Data preparation -- + +# Extract only the necessary columns from tables. +# Extracted: COL-4 | COL-5 | COL-6 | COL_COL-13 +# => LO_PARTKEY | LO_SUPPKEY | LO_DATEKEY | LO_REVENUE +lineorder_csv_min = cbind(lineorder_csv[, 4], lineorder_csv[, 5], lineorder_csv[, 6], lineorder_csv[, 13]); +lineorder_matrix_min = as.matrix(lineorder_csv_min); + +# Extracted: COL-1 | COL-5 +# => D_DATEKEY | D_YEAR +date_csv_min = cbind(date_csv[, 1], date_csv[, 5]); +date_matrix_min = as.matrix(date_csv_min); + +# -- Filter tables over string values. 
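# (The row-wise rbind loop below is simple but quadratic, since every append
#  copies the matrix built so far. A vectorized single-pass sketch, assuming
#  a hypothetical helper value code_of_mfgr12 that holds the recode id of
#  "MFGR#12" (e.g. looked up once from the transformencode metadata), and
#  reusing the brand encoding part_brand_enc_f built right below:
#    [part_cat_enc_f, part_cat_meta] = transformencode(target=part_csv[,4], spec=general_spec);
#    mask = (part_cat_enc_f == code_of_mfgr12);
#    part_keys = as.matrix(part_csv[,1]);
#    part_filt_vec = removeEmpty(target=cbind(part_keys, part_brand_enc_f), margin="rows", select=mask);
#  )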
+ +# Prepare PART table on-the-fly encodings +# (only need p_brand encoding, filter by p_category string) +[part_brand_enc_f, part_brand_meta] = transformencode(target=part_csv[,5], spec=general_spec); +#print(toString(part_brand_enc_f)); + +# Build filtered PART table (p_category == 'MFGR#12'), keeping key and encoded brand +part_filt_keys = matrix(0, rows=0, cols=1); +part_filt_brand = matrix(0, rows=0, cols=1); +for (i in 1:nrow(part_csv)) { + if (as.scalar(part_csv[i,4]) == "MFGR#12") { + key_val = as.double(as.scalar(part_csv[i,1])); + brand_code = as.double(as.scalar(part_brand_enc_f[i,1])); + part_filt_keys = rbind(part_filt_keys, matrix(key_val, rows=1, cols=1)); + part_filt_brand = rbind(part_filt_brand, matrix(brand_code, rows=1, cols=1)); + } +} +if (nrow(part_filt_keys) == 0) { + part_filt_keys = matrix(0, rows=1, cols=1); + part_filt_brand = matrix(0, rows=1, cols=1); +} +part_filt = cbind(part_filt_keys, part_filt_brand); + +# Build filtered SUPPLIER table (s_region == 'AMERICA') +supp_filt = matrix(0, rows=0, cols=1); +for (i in 1:nrow(supp_csv)) { + if (as.scalar(supp_csv[i,6]) == "AMERICA") { + key_val = as.double(as.scalar(supp_csv[i,1])); + supp_filt = rbind(supp_filt, matrix(key_val, rows=1, cols=1)); + } +} +if (nrow(supp_filt) == 0) { + supp_filt = matrix(0, rows=1, cols=1); +} +#print(toString(supp_filt[1,])) + +# -- JOIN TABLES WITH RA-JOIN FUNCTION -- + +# Join LINEORDER table with PART, SUPPLIER, DATE tables (star schema) +# Join order does matter! +# LINEORDER table with DATE, PART, SUPPLIER is much slower! +# WHERE LO_ORDERDATE = P_PARTKEY +# (P_PARTKEY | P_BRAND) | (LO_PARTKEY | LO_SUPPKEY | LO_DATEKEY | LO_REVENUE) +lo_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); +# WHERE LO_SUPPKEY = S_SUPPKEY +# (S_SUPPKEY) | (P_PARTKEY | P_BRAND | LO_PARTKEY | LO_SUPPKEY | LO_DATEKEY | LO_REVENUE) +lo_part_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_part, colB=4, method="hash2"); +# WHERE LO_PARTKEY = D_DATEKEY +# (D_DATEKEY | D_YEAR) | (S_SUPPKEY | P_PARTKEY | P_BRAND | LO_PARTKEY | LO_SUPPKEY | LO_DATEKEY | LO_REVENUE) +# Example: +# 19920325.000 1992.000 17.000 608.000 381.000 608.000 17.000 19920325.000 5702508.000 +lo_part_supp_date = raJoin::m_raJoin(A=date_matrix_min, colA=1, B=lo_part_supp, colB=6, method="hash2"); +#print(toString(lo_part_supp_date[1,])) + +# -- GROUP-BY & AGGREGATION -- + +# -- Group-By and Aggregation (SUM)-- + +# Group-By +d_year = lo_part_supp_date[,2] +p_brand = lo_part_supp_date[,5] +lo_revenue = lo_part_supp_date[,9] + +# CALCULATING COMBINATION KEY WITH PRIORITY:P_BRAND + +max_p_brand = max(p_brand); +max_d_year = max(d_year); + +p_brand_scale_f = ceil(max_p_brand) + 1; +d_year_scale_f = ceil(max_d_year) + 1; + +combined_key = d_year * p_brand_scale_f + p_brand; + +group_input = cbind(lo_revenue, combined_key) + +agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); + +# Aggregation (SUM) +key = agg_result[, 1]; +revenue = rowSums(agg_result[, 2:ncol(agg_result)]); + +# EXTRACTING D_YEAR, P_BRAND +d_year = round(floor(key / (p_brand_scale_f))); +p_brand = round(key %% p_brand_scale_f); +result = cbind(revenue, d_year, p_brand, key); + +# -- Sorting -- -- Sorting int columns works, but string does not. 
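# (Worked round-trip of the combined key above, with illustrative values
#  p_brand_scale_f = 1001, d_year = 1993, p_brand = 260:
#    combined_key = 1993 * 1001 + 260 = 1995253
#    floor(1995253 / 1001) = 1993   and   1995253 %% 1001 = 260
#  Since 0 <= p_brand < p_brand_scale_f, the pair (d_year, p_brand) is always
#  recovered exactly, so grouping on the single key is equivalent to grouping
#  on both columns.)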
+# ORDER BY P_BRAND ASC +result_ordered = order(target=result, by=3, decreasing=FALSE, index.return=FALSE); + +p_brand_dec = transformdecode(target=result_ordered[,3], spec=general_spec, meta=part_brand_meta); +res = cbind(as.frame(result_ordered[,1]), as.frame(result_ordered[,2]), p_brand_dec) ; + +# Print result +print("SUM(lo_revenue) | d_year | p_brand") +print(res) + +print("\nQ2.1 finished.\n"); diff --git a/scripts/ssb/queries/q2_3.dml b/scripts/ssb/queries/q2_3.dml new file mode 100644 index 00000000000..53c8d72b855 --- /dev/null +++ b/scripts/ssb/queries/q2_3.dml @@ -0,0 +1,168 @@ +/* DML-script implementing the ssb query Q2.3 in SystemDS. +**input_dir="/scripts/ssb/data" + +* Run with docker: +docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q2_3.dml -nvargs input_dir="/scripts/data/" + +SELECT SUM(lo_revenue), d_year, p_brand +FROM lineorder, date, part, supplier --dates +WHERE + lo_orderdate = d_datekey + AND lo_partkey = p_partkey + AND lo_suppkey = s_suppkey + AND p_brand = 'MFGR#2239' + AND s_region = 'EUROPE' +GROUP BY d_year, p_brand +ORDER BY d_year, p_brand; + +*Please run the original SQL query (eg. in Postgres) +to verify the correctness of DML version. +-> First tests: Works on the dataset with scale factor 0.1. +-> Sorting does not work. + +*Based on older implementations. +https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_1.dml +*Especially: +https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q4_3.dml +In comparison to older version the join method was changed +from sort-merge to hash2 to improve the performance. + +Input parameters: +input_dir - Path to input directory containing the table files (e.g., ./data) +*/ + +# Call ra-modules with ra-functions. +source("./scripts/builtin/raSelection.dml") as raSel +source("./scripts/builtin/raJoin.dml") as raJoin +source("./scripts/builtin/raGroupby.dml") as raGrp + +# Set input parameters. +input_dir = ifdef($input_dir, "./data"); +print("Loading tables from directory: " + input_dir); + +# Read and load input CSV files from date and lineorder. +#date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +part_csv = read(input_dir + "/part.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); + +general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; + +# -- Data preparation -- + +# Extract only the necessary columns from tables. +# Extracted: COL-4 | COL-5 | COL-6 | COL_COL-13 +# => LO_PARTKEY | LO_SUPPKEY | LO_DATEKEY | LO_REVENUE +lineorder_csv_min = cbind(lineorder_csv[, 4], lineorder_csv[, 5], lineorder_csv[, 6], lineorder_csv[, 13]); +lineorder_matrix_min = as.matrix(lineorder_csv_min); + +# Extracted: COL-1 | COL-5 +# => D_DATEKEY | D_YEAR +date_csv_min = cbind(date_csv[, 1], date_csv[, 5]); +date_matrix_min = as.matrix(date_csv_min); + +# -- Filter tables over string values. 
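# (On the empty-result guards used below: when a string filter matches no
#  rows, a single dummy row with key 0 is kept so that the later raJoin calls
#  still receive a non-empty matrix. SSB keys start at 1 and date keys have
#  the form YYYYMMDD, so the dummy key can never join and the query then
#  correctly yields an empty result.)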
+ +# Prepare PART table on-the-fly encodings +# (only need p_brand encoding, filter by p_category string) +[part_brand_enc_f, part_brand_meta] = transformencode(target=part_csv[,5], spec=general_spec); +#print(toString(part_brand_enc_f)); + +# Build filtered PART table (p_brand == 'MFGR#2239'), keeping key and encoded brand +part_filt_keys = matrix(0, rows=0, cols=1); +part_filt_brand = matrix(0, rows=0, cols=1); +for (i in 1:nrow(part_csv)) { + if (as.scalar(part_csv[i,5]) == "MFGR#2239") { + key_val = as.double(as.scalar(part_csv[i,1])); + brand_code = as.double(as.scalar(part_brand_enc_f[i,1])); + part_filt_keys = rbind(part_filt_keys, matrix(key_val, rows=1, cols=1)); + part_filt_brand = rbind(part_filt_brand, matrix(brand_code, rows=1, cols=1)); + } +} +if (nrow(part_filt_keys) == 0) { + part_filt_keys = matrix(0, rows=1, cols=1); + part_filt_brand = matrix(0, rows=1, cols=1); +} +part_filt = cbind(part_filt_keys, part_filt_brand); +print(part_filt[1,]) + +# Build filtered SUPPLIER table (s_region == 'EUROPE') +supp_filt = matrix(0, rows=0, cols=1); +for (i in 1:nrow(supp_csv)) { + if (as.scalar(supp_csv[i,6]) == "EUROPE") { + key_val = as.double(as.scalar(supp_csv[i,1])); + supp_filt = rbind(supp_filt, matrix(key_val, rows=1, cols=1)); + } +} +if (nrow(supp_filt) == 0) { + supp_filt = matrix(0, rows=1, cols=1); +} +#print("LO,DATE,PART,SUPP") +#print(toString(lineorder_matrix_min[1,])) +#print(toString(date_matrix_min[1,])) +#print(toString(part_filt[1,])) +#print(toString(supp_filt[1,])) + +# -- JOIN TABLES WITH RA-JOIN FUNCTION -- + +# Join LINEORDER table with PART, SUPPLIER, DATE tables (star schema) +# Join order does matter! +# LINEORDER table with DATE, PART, SUPPLIER is much slower! +# WHERE LO_ORDERDATE = P_PARTKEY +# (P_PARTKEY | P_BRAND) | (LO_PARTKEY | LO_SUPPKEY | LO_DATEKEY | LO_REVENUE) +lo_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); +# WHERE LO_SUPPKEY = S_SUPPKEY +# (S_SUPPKEY) | (P_PARTKEY | P_BRAND | LO_PARTKEY | LO_SUPPKEY | LO_DATEKEY | LO_REVENUE) +lo_part_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_part, colB=4, method="hash2"); +# WHERE LO_PARTKEY = D_DATEKEY +# (D_DATEKEY | D_YEAR) | (S_SUPPKEY | P_PARTKEY | P_BRAND | LO_PARTKEY | LO_SUPPKEY | LO_DATEKEY | LO_REVENUE) +# Example: +# 19920325.000 1992.000 17.000 608.000 381.000 608.000 17.000 19920325.000 5702508.000 +lo_part_supp_date = raJoin::m_raJoin(A=date_matrix_min, colA=1, B=lo_part_supp, colB=6, method="hash2"); +#print(toString(lo_part_supp_date[1,])) + +# -- GROUP-BY & AGGREGATION -- + +# -- Group-By and Aggregation (SUM)-- + +# Group-By +d_year = lo_part_supp_date[,2] +p_brand = lo_part_supp_date[,5] +lo_revenue = lo_part_supp_date[,9] + +# CALCULATING COMBINATION KEY WITH PRIORITY:1 D_YEAR, 2 P_BRAND + +max_p_brand = max(p_brand); +max_d_year = max(d_year); + +p_brand_scale_f = ceil(max_p_brand) + 1; +d_year_scale_f = ceil(max_d_year) + 1; + +combined_key = d_year * p_brand_scale_f + p_brand; + +group_input = cbind(lo_revenue, combined_key) + +agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); + +# Aggregation (SUM) +key = agg_result[, 1]; +revenue = rowSums(agg_result[, 2:ncol(agg_result)]); + +# EXTRACTING D_YEAR, P_BRAND +d_year = round(floor(key / (p_brand_scale_f))); +p_brand = round(key %% p_brand_scale_f); +result = cbind(revenue, d_year, p_brand, key); + +# -- Sorting -- -- Sorting int columns works, but string does not. 
+# ORDER BY P_BRAND ASC +result_ordered = order(target=result, by=4, decreasing=FALSE, index.return=FALSE); + +p_brand_dec = transformdecode(target=result_ordered[,3], spec=general_spec, meta=part_brand_meta); +res = cbind(as.frame(result_ordered[,1]), as.frame(result_ordered[,2]), p_brand_dec) ; + +# Print result +print("SUM(lo_revenue) | d_year | p_brand") +print(res) + +print("\nQ2.3 finished.\n"); diff --git a/scripts/ssb/sql/q2.1.sql b/scripts/ssb/sql/q2.1.sql index 70a8de9d42e..785327bbddd 100644 --- a/scripts/ssb/sql/q2.1.sql +++ b/scripts/ssb/sql/q2.1.sql @@ -15,7 +15,7 @@ -- specific language governing permissions and limitations -- under the License. SELECT SUM(lo_revenue), d_year, p_brand -FROM lineorder, dates, part, supplier +FROM lineorder, date, part, supplier --dates WHERE lo_orderdate = d_datekey AND lo_partkey = p_partkey diff --git a/scripts/ssb/sql/q2.3.sql b/scripts/ssb/sql/q2.3.sql index 22d2419621c..deeb6e64448 100644 --- a/scripts/ssb/sql/q2.3.sql +++ b/scripts/ssb/sql/q2.3.sql @@ -15,7 +15,7 @@ -- specific language governing permissions and limitations -- under the License. SELECT SUM(lo_revenue), d_year, p_brand -FROM lineorder, dates, part, supplier +FROM lineorder, date, part, supplier --dates WHERE lo_orderdate = d_datekey AND lo_partkey = p_partkey From b51f4a29cd91f59e2537d88bbec4f8316b9bbc8c Mon Sep 17 00:00:00 2001 From: Johnn-ui2010 Date: Sat, 6 Dec 2025 21:48:24 +0100 Subject: [PATCH 09/22] q4_1, q4_2, q4_3 works on scale 0.1, joining order matters. --- scripts/ssb/queries/q1_2.dml | 4 +- scripts/ssb/queries/q1_3.dml | 8 +- scripts/ssb/queries/q2_1.dml | 29 +++-- scripts/ssb/queries/q2_2.dml | 167 ++++++++++++++++++++++++++ scripts/ssb/queries/q2_3.dml | 36 +++--- scripts/ssb/queries/q4_1.dml | 200 +++++++++++++++++++++++++++++++ scripts/ssb/queries/q4_2.dml | 220 +++++++++++++++++++++++++++++++++++ scripts/ssb/queries/q4_3.dml | 208 +++++++++++++++++++++++++++++++++ scripts/ssb/sql/q2.2.sql | 2 +- scripts/ssb/sql/q3.1.sql | 2 +- scripts/ssb/sql/q3.3.sql | 2 +- scripts/ssb/sql/q3.4.sql | 2 +- scripts/ssb/sql/q4.1.sql | 2 +- scripts/ssb/sql/q4.2.sql | 2 +- scripts/ssb/sql/q4.3.sql | 2 +- 15 files changed, 839 insertions(+), 47 deletions(-) create mode 100644 scripts/ssb/queries/q2_2.dml create mode 100644 scripts/ssb/queries/q4_1.dml create mode 100644 scripts/ssb/queries/q4_2.dml create mode 100644 scripts/ssb/queries/q4_3.dml diff --git a/scripts/ssb/queries/q1_2.dml b/scripts/ssb/queries/q1_2.dml index a909d4ddaa9..f4a8ce0d212 100644 --- a/scripts/ssb/queries/q1_2.dml +++ b/scripts/ssb/queries/q1_2.dml @@ -62,7 +62,9 @@ lo_filt = raSel::m_raSelection(lo_filt, col=2, op="<=", val=35); lo_filt = cbind(lo_filt[, 1], lo_filt[, 3], lo_filt[, 4]); # -- Filter tables over string values. -# Build filtered SUPPLIER table (s_region == 'AMERICA') +# Extracted: COL-1 | COL-7 +# D_DATEKEY | D_YEARMONTH +# Build filtered DATE table (D_YEARMONTH = 'Jan1994') date_filt = matrix(0, rows=0, cols=1); for (i in 1:nrow(date_csv)) { if (as.scalar(date_csv[i,7]) == "Jan1994") { diff --git a/scripts/ssb/queries/q1_3.dml b/scripts/ssb/queries/q1_3.dml index 6ac2dc5a4dc..b5b45aa0cb4 100644 --- a/scripts/ssb/queries/q1_3.dml +++ b/scripts/ssb/queries/q1_3.dml @@ -54,16 +54,16 @@ lineorder_matrix_min = as.matrix(lineorder_csv_min); # -- Filter the data with RA-SELECTION function. 
-# D_YEAR = 1994 +# WHERE D_YEAR = 1994 d_filt = raSel::m_raSelection(date_matrix_min, col=2, op="==", val=1994); -# D_WEEKNUMINYEAR = 6 +# WHERE D_WEEKNUMINYEAR = 6 d_filt = raSel::m_raSelection(d_filt, col=3, op="==", val=6); -# LO_DISCOUNT BETWEEN 5 AND 7 +# WHERE LO_DISCOUNT BETWEEN 5 AND 7 lo_filt = raSel::m_raSelection(lineorder_matrix_min, col=4, op=">=", val=5); lo_filt = raSel::m_raSelection(lo_filt, col=4, op="<=", val=7); -# LO_QUANTITY BETWEEN 26 AND 35 +# WHERE LO_QUANTITY BETWEEN 26 AND 35 lo_filt = raSel::m_raSelection(lo_filt, col=2, op=">=", val=26); lo_filt = raSel::m_raSelection(lo_filt, col=2, op="<=", val=35); diff --git a/scripts/ssb/queries/q2_1.dml b/scripts/ssb/queries/q2_1.dml index 9b94883877a..fd2340adfec 100644 --- a/scripts/ssb/queries/q2_1.dml +++ b/scripts/ssb/queries/q2_1.dml @@ -40,8 +40,7 @@ source("./scripts/builtin/raGroupby.dml") as raGrp input_dir = ifdef($input_dir, "./data"); print("Loading tables from directory: " + input_dir); -# Read and load input CSV files from date and lineorder. -#date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +# Read and load input CSV files from lineorder, date, part, supplier. lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); part_csv = read(input_dir + "/part.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); @@ -53,7 +52,7 @@ general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; # Extract only the necessary columns from tables. # Extracted: COL-4 | COL-5 | COL-6 | COL_COL-13 -# => LO_PARTKEY | LO_SUPPKEY | LO_DATEKEY | LO_REVENUE +# => LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE lineorder_csv_min = cbind(lineorder_csv[, 4], lineorder_csv[, 5], lineorder_csv[, 6], lineorder_csv[, 13]); lineorder_matrix_min = as.matrix(lineorder_csv_min); @@ -64,12 +63,14 @@ date_matrix_min = as.matrix(date_csv_min); # -- Filter tables over string values. -# Prepare PART table on-the-fly encodings -# (only need p_brand encoding, filter by p_category string) +# Prepare PART table on-the-fly encodings +# Extracted: COL-1 | COL-4 | COL-5 +# P_PARTKEY | P_CATEGORY | P_BRAND +# (only need P_BRAND encoding, filter by P_CATEGORY string) [part_brand_enc_f, part_brand_meta] = transformencode(target=part_csv[,5], spec=general_spec); #print(toString(part_brand_enc_f)); -# Build filtered PART table (p_category == 'MFGR#12'), keeping key and encoded brand +# Build filtered PART table (P_CATEGORY = 'MFGR#12'), keeping key and encoded brand part_filt_keys = matrix(0, rows=0, cols=1); part_filt_brand = matrix(0, rows=0, cols=1); for (i in 1:nrow(part_csv)) { @@ -86,7 +87,9 @@ if (nrow(part_filt_keys) == 0) { } part_filt = cbind(part_filt_keys, part_filt_brand); -# Build filtered SUPPLIER table (s_region == 'AMERICA') +# Extracted: COL-1 | COL-6 +# S_SUPPKEY | S_REGION +# Build filtered SUPPLIER table (S_REGION = 'AMERICA') supp_filt = matrix(0, rows=0, cols=1); for (i in 1:nrow(supp_csv)) { if (as.scalar(supp_csv[i,6]) == "AMERICA") { @@ -104,21 +107,17 @@ if (nrow(supp_filt) == 0) { # Join LINEORDER table with PART, SUPPLIER, DATE tables (star schema) # Join order does matter! # LINEORDER table with DATE, PART, SUPPLIER is much slower! 
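# (Why the join order matters: part_filt and supp_filt are small, highly
#  selective tables, so probing LINEORDER with them first shrinks the
#  intermediate result before the unfiltered DATE table is joined. Starting
#  with DATE instead keeps almost every LINEORDER row alive through all
#  subsequent joins, which is why that order is much slower.)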
-# WHERE LO_ORDERDATE = P_PARTKEY -# (P_PARTKEY | P_BRAND) | (LO_PARTKEY | LO_SUPPKEY | LO_DATEKEY | LO_REVENUE) +# WHERE LO_PARTKEY = P_PARTKEY lo_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); # WHERE LO_SUPPKEY = S_SUPPKEY -# (S_SUPPKEY) | (P_PARTKEY | P_BRAND | LO_PARTKEY | LO_SUPPKEY | LO_DATEKEY | LO_REVENUE) lo_part_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_part, colB=4, method="hash2"); -# WHERE LO_PARTKEY = D_DATEKEY -# (D_DATEKEY | D_YEAR) | (S_SUPPKEY | P_PARTKEY | P_BRAND | LO_PARTKEY | LO_SUPPKEY | LO_DATEKEY | LO_REVENUE) +# WHERE LO_ORDERDATE = D_DATEKEY +# (D_DATEKEY | D_YEAR) | (S_SUPPKEY | P_PARTKEY | P_BRAND | LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE) # Example: # 19920325.000 1992.000 17.000 608.000 381.000 608.000 17.000 19920325.000 5702508.000 lo_part_supp_date = raJoin::m_raJoin(A=date_matrix_min, colA=1, B=lo_part_supp, colB=6, method="hash2"); #print(toString(lo_part_supp_date[1,])) -# -- GROUP-BY & AGGREGATION -- - # -- Group-By and Aggregation (SUM)-- # Group-By @@ -126,7 +125,7 @@ d_year = lo_part_supp_date[,2] p_brand = lo_part_supp_date[,5] lo_revenue = lo_part_supp_date[,9] -# CALCULATING COMBINATION KEY WITH PRIORITY:P_BRAND +# CALCULATING COMBINATION KEY D_YEAR, P_BRAND max_p_brand = max(p_brand); max_d_year = max(d_year); diff --git a/scripts/ssb/queries/q2_2.dml b/scripts/ssb/queries/q2_2.dml new file mode 100644 index 00000000000..ab041323386 --- /dev/null +++ b/scripts/ssb/queries/q2_2.dml @@ -0,0 +1,167 @@ +/* DML-script implementing the ssb query Q2.2 in SystemDS. +**input_dir="/scripts/ssb/data" + +* Run with docker: +docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q2_2.dml -nvargs input_dir="/scripts/data/" + +SELECT SUM(lo_revenue), d_year, p_brand +FROM lineorder, date, part, supplier --dates +WHERE + lo_orderdate = d_datekey + AND lo_partkey = p_partkey + AND lo_suppkey = s_suppkey + AND p_brand BETWEEN 'MFGR#2221' AND 'MFGR#2228' + AND s_region = 'ASIA' +GROUP BY d_year, p_brand +ORDER BY d_year, p_brand; + +*Please run the original SQL query (eg. in Postgres) +to verify the correctness of DML version. +-> First tests: Works on the dataset with scale factor 0.1. +-> Sorting does not work. + +*Based on older implementations. +https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_1.dml +*Especially: +https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q4_3.dml +In comparison to older version the join method was changed +from sort-merge to hash2 to improve the performance. + +Input parameters: +input_dir - Path to input directory containing the table files (e.g., ./data) +*/ + +# Call ra-modules with ra-functions. +source("./scripts/builtin/raSelection.dml") as raSel +source("./scripts/builtin/raJoin.dml") as raJoin +source("./scripts/builtin/raGroupby.dml") as raGrp + +# Set input parameters. +input_dir = ifdef($input_dir, "./data"); +print("Loading tables from directory: " + input_dir); + +# Read and load input CSV files from lineorder, date, part, supplier. 
+lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +part_csv = read(input_dir + "/part.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); + +general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; + +# -- Data preparation -- + +# Extract only the necessary columns from tables. +# Extracted: COL-4 | COL-5 | COL-6 | COL_COL-13 +# => LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE +lineorder_csv_min = cbind(lineorder_csv[, 4], lineorder_csv[, 5], lineorder_csv[, 6], lineorder_csv[, 13]); +lineorder_matrix_min = as.matrix(lineorder_csv_min); + +# Extracted: COL-1 | COL-5 +# => D_DATEKEY | D_YEAR +date_csv_min = cbind(date_csv[, 1], date_csv[, 5]); +date_matrix_min = as.matrix(date_csv_min); + +# -- Filter tables over string values. + +# Prepare PART table on-the-fly encodings +# Extracted: COL-1 | COL-5 +# P_PARTKEY | P_BRAND +# (only need P_BRAND encoding, filter by P_BRAND string itself) +[part_brand_enc_f, part_brand_meta] = transformencode(target=part_csv[,5], spec=general_spec); +#print(toString(part_brand_enc_f)); + +# Build filtered PART table (P_BRAND BETWEEN 'MFGR#2221' AND 'MFGR#2228'), keeping key and encoded brand +part_filt_keys = matrix(0, rows=0, cols=1); +part_filt_brand = matrix(0, rows=0, cols=1); +for (i in 1:nrow(part_csv)) { + p_elem = as.scalar(part_csv[i,5]) + if ( p_elem >= "MFGR#2221" & p_elem <= "MFGR#2228") { + key_val = as.double(as.scalar(part_csv[i,1])); + brand_code = as.double(as.scalar(part_brand_enc_f[i,1])); + part_filt_keys = rbind(part_filt_keys, matrix(key_val, rows=1, cols=1)); + part_filt_brand = rbind(part_filt_brand, matrix(brand_code, rows=1, cols=1)); + } +} +if (nrow(part_filt_keys) == 0) { + part_filt_keys = matrix(0, rows=1, cols=1); + part_filt_brand = matrix(0, rows=1, cols=1); +} +part_filt = cbind(part_filt_keys, part_filt_brand); + +# Extracted: COL-1 | COL-6 +# S_SUPPKEY | S_REGION +# Build filtered SUPPLIER table (S_REGION = 'ASIA') +supp_filt = matrix(0, rows=0, cols=1); +for (i in 1:nrow(supp_csv)) { + if (as.scalar(supp_csv[i,6]) == "ASIA") { + key_val = as.double(as.scalar(supp_csv[i,1])); + supp_filt = rbind(supp_filt, matrix(key_val, rows=1, cols=1)); + } +} +if (nrow(supp_filt) == 0) { + supp_filt = matrix(0, rows=1, cols=1); +} +#print("LO,DATE,PART,SUPP") +#print(toString(lineorder_matrix_min[1,])) +#print(toString(date_matrix_min[1,])) +#print(toString(part_filt[1,])) +#print(toString(supp_filt[1,])) + +# -- JOIN TABLES WITH RA-JOIN FUNCTION -- + +# Join LINEORDER table with PART, SUPPLIER, DATE tables (star schema) +# Join order does matter! 
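# (Note on the string BETWEEN filter above: ">=" and "<=" compare strings
#  lexicographically, which matches the intended range here because every
#  brand in [MFGR#2221, MFGR#2228] has the same length and the fixed prefix
#  "MFGR#22"; shorter brands such as "MFGR#222" sort outside the bounds and
#  are correctly excluded.)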
+# WHERE LO_PARTKEY = P_PARTKEY +lo_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); +# WHERE LO_SUPPKEY = S_SUPPKEY +lo_part_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_part, colB=4, method="hash2"); +# WHERE LO_ORDERDATE = D_DATEKEY +# (D_DATEKEY | D_YEAR) | (S_SUPPKEY | P_PARTKEY | P_BRAND | LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE) + +lo_part_supp_date = raJoin::m_raJoin(A=date_matrix_min, colA=1, B=lo_part_supp, colB=6, method="hash2"); +#print(toString(lo_part_supp_date[1,])) + +# -- GROUP-BY & AGGREGATION -- + +# -- Group-By and Aggregation (SUM)-- + +# Group-By +d_year = lo_part_supp_date[,2] +p_brand = lo_part_supp_date[,5] +lo_revenue = lo_part_supp_date[,9] + +# CALCULATING COMBINATION KEY WITH PRIORITY:1 D_YEAR, 2 P_BRAND + +max_p_brand = max(p_brand); +max_d_year = max(d_year); + +p_brand_scale_f = ceil(max_p_brand) + 1; +d_year_scale_f = ceil(max_d_year) + 1; + +combined_key = d_year * p_brand_scale_f + p_brand; + +group_input = cbind(lo_revenue, combined_key) + +agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); + +# Aggregation (SUM) +key = agg_result[, 1]; +revenue = rowSums(agg_result[, 2:ncol(agg_result)]); + +# EXTRACTING D_YEAR, P_BRAND +d_year = round(floor(key / (p_brand_scale_f))); +p_brand = round(key %% p_brand_scale_f); +result = cbind(revenue, d_year, p_brand, key); + +# -- Sorting -- -- Sorting int columns works, but string does not. +# ORDER BY D_YEAR, P_BRAND ASC +result_ordered = order(target=result, by=4, decreasing=FALSE, index.return=FALSE); + +p_brand_dec = transformdecode(target=result_ordered[,3], spec=general_spec, meta=part_brand_meta); +res = cbind(as.frame(result_ordered[,1]), as.frame(result_ordered[,2]), p_brand_dec) ; + +# Print result +print("SUM(lo_revenue) | d_year | p_brand") +print(res) + +print("\nQ2.2 finished.\n"); diff --git a/scripts/ssb/queries/q2_3.dml b/scripts/ssb/queries/q2_3.dml index 53c8d72b855..a1291485b39 100644 --- a/scripts/ssb/queries/q2_3.dml +++ b/scripts/ssb/queries/q2_3.dml @@ -40,8 +40,7 @@ source("./scripts/builtin/raGroupby.dml") as raGrp input_dir = ifdef($input_dir, "./data"); print("Loading tables from directory: " + input_dir); -# Read and load input CSV files from date and lineorder. -#date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +# Read and load input CSV files from lineorder, date, part, supplier. lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); part_csv = read(input_dir + "/part.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); @@ -53,7 +52,7 @@ general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; # Extract only the necessary columns from tables. # Extracted: COL-4 | COL-5 | COL-6 | COL_COL-13 -# => LO_PARTKEY | LO_SUPPKEY | LO_DATEKEY | LO_REVENUE +# => LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE lineorder_csv_min = cbind(lineorder_csv[, 4], lineorder_csv[, 5], lineorder_csv[, 6], lineorder_csv[, 13]); lineorder_matrix_min = as.matrix(lineorder_csv_min); @@ -64,12 +63,14 @@ date_matrix_min = as.matrix(date_csv_min); # -- Filter tables over string values. 
-# Prepare PART table on-the-fly encodings
-# (only need p_brand encoding, filter by p_category string)
+# Prepare PART table on-the-fly encodings
+# Extracted: COL-1 | COL-5
+# P_PARTKEY | P_BRAND
+# (only need P_BRAND encoding, filter by P_BRAND string itself)
 [part_brand_enc_f, part_brand_meta] = transformencode(target=part_csv[,5], spec=general_spec);
 #print(toString(part_brand_enc_f));
 
-# Build filtered PART table (p_brand == 'MFGR#2239'), keeping key and encoded brand
+# Build filtered PART table (P_BRAND = 'MFGR#2239'), keeping key and encoded brand
 part_filt_keys = matrix(0, rows=0, cols=1);
 part_filt_brand = matrix(0, rows=0, cols=1);
 for (i in 1:nrow(part_csv)) {
@@ -85,9 +86,10 @@ if (nrow(part_filt_keys) == 0) {
   part_filt_brand = matrix(0, rows=1, cols=1);
 }
 part_filt = cbind(part_filt_keys, part_filt_brand);
-print(part_filt[1,])
 
-# Build filtered SUPPLIER table (s_region == 'EUROPE')
+# Extracted: COL-1 | COL-6
+# S_SUPPKEY | S_REGION
+# Build filtered SUPPLIER table (s_region = 'EUROPE')
 supp_filt = matrix(0, rows=0, cols=1);
 for (i in 1:nrow(supp_csv)) {
   if (as.scalar(supp_csv[i,6]) == "EUROPE") {
@@ -109,21 +111,15 @@ if (nrow(supp_filt) == 0) {
 # Join LINEORDER table with PART, SUPPLIER, DATE tables (star schema)
 # Join order does matter!
 # LINEORDER table with DATE, PART, SUPPLIER is much slower!
-# WHERE LO_ORDERDATE = P_PARTKEY
-# (P_PARTKEY | P_BRAND) | (LO_PARTKEY | LO_SUPPKEY | LO_DATEKEY | LO_REVENUE)
+# WHERE LO_PARTKEY = P_PARTKEY
 lo_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2");
 # WHERE LO_SUPPKEY = S_SUPPKEY
-# (S_SUPPKEY) | (P_PARTKEY | P_BRAND | LO_PARTKEY | LO_SUPPKEY | LO_DATEKEY | LO_REVENUE)
 lo_part_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_part, colB=4, method="hash2");
-# WHERE LO_PARTKEY = D_DATEKEY
-# (D_DATEKEY | D_YEAR) | (S_SUPPKEY | P_PARTKEY | P_BRAND | LO_PARTKEY | LO_SUPPKEY | LO_DATEKEY | LO_REVENUE)
-# Example:
-# 19920325.000 1992.000 17.000 608.000 381.000 608.000 17.000 19920325.000 5702508.000
+# WHERE LO_ORDERDATE = D_DATEKEY
+# (D_DATEKEY | D_YEAR) | (S_SUPPKEY | P_PARTKEY | P_BRAND | LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE)
 lo_part_supp_date = raJoin::m_raJoin(A=date_matrix_min, colA=1, B=lo_part_supp, colB=6, method="hash2");
 #print(toString(lo_part_supp_date[1,]))
 
-# -- GROUP-BY & AGGREGATION --
-
 # -- Group-By and Aggregation (SUM)--
 
 # Group-By
@@ -155,14 +151,14 @@ p_brand = round(key %% p_brand_scale_f);
 result = cbind(revenue, d_year, p_brand, key);
 
 # -- Sorting -- -- Sorting int columns works, but string does not.
-# ORDER BY P_BRAND ASC
+# ORDER BY D_YEAR, P_BRAND ASC
 result_ordered = order(target=result, by=4, decreasing=FALSE, index.return=FALSE);
 
 p_brand_dec = transformdecode(target=result_ordered[,3], spec=general_spec, meta=part_brand_meta);
 res = cbind(as.frame(result_ordered[,1]), as.frame(result_ordered[,2]), p_brand_dec) ;
 
 # Print result
-print("SUM(lo_revenue) | d_year | p_brand")
-print(res)
+print("SUM(lo_revenue) | d_year | p_brand");
+print(res);
 
 print("\nQ2.3 finished.\n");
diff --git a/scripts/ssb/queries/q4_1.dml b/scripts/ssb/queries/q4_1.dml
new file mode 100644
index 00000000000..ac05ebdf6f9
--- /dev/null
+++ b/scripts/ssb/queries/q4_1.dml
@@ -0,0 +1,200 @@
+/* DML-script implementing the ssb query Q4.1 in SystemDS.
+**input_dir="/scripts/ssb/data"
+
+* Run with docker:
+docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q4_1.dml -nvargs input_dir="/scripts/data/"
+
+SELECT
+    d_year,
+    c_nation,
+    SUM(lo_revenue - lo_supplycost) AS PROFIT
+FROM date, customer, supplier, part, lineorder -- dates
+WHERE
+    lo_custkey = c_custkey
+    AND lo_suppkey = s_suppkey
+    AND lo_partkey = p_partkey
+    AND lo_orderdate = d_datekey
+    AND c_region = 'AMERICA'
+    AND s_region = 'AMERICA'
+    AND (
+        p_mfgr = 'MFGR#1'
+        OR p_mfgr = 'MFGR#2'
+    )
+GROUP BY d_year, c_nation
+ORDER BY d_year, c_nation;
+
+*Please run the original SQL query (eg. in Postgres)
+to verify the correctness of DML version.
+-> First tests: Works on the dataset with scale factor 0.1.
+-> Sorting does not work.
+
+*Based on older implementations.
+https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_1.dml
+*Especially:
+https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q4_3.dml
+In comparison to older version the join method was changed
+from sort-merge to hash2 to improve the performance.
+
+Input parameters:
+input_dir - Path to input directory containing the table files (e.g., ./data)
+*/
+
+# Call ra-modules with ra-functions.
+source("./scripts/builtin/raSelection.dml") as raSel
+source("./scripts/builtin/raJoin.dml") as raJoin
+source("./scripts/builtin/raGroupby.dml") as raGrp
+
+# Set input parameters.
+input_dir = ifdef($input_dir, "./data");
+print("Loading tables from directory: " + input_dir);
+
+# Read and load input CSV files.
+lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+cust_csv = read(input_dir + "/customer.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+part_csv = read(input_dir + "/part.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+
+general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }";
+
+# -- Data preparation --
+
+# Extract only the necessary columns from tables.
+# Extracted: COL-3 | COL-4 | COL-5 | COL-6 | COL-13 | COL-14
+# => LO_CUSTKEY | LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE |
+#    LO_REVENUE | LO_SUPPLYCOST
+lineorder_csv_min = cbind(lineorder_csv[, 3], lineorder_csv[, 4], lineorder_csv[, 5], lineorder_csv[, 6], lineorder_csv[, 13], lineorder_csv[, 14]);
+lineorder_matrix_min = as.matrix(lineorder_csv_min);
+
+# Extracted: COL-1 | COL-5
+# => D_DATEKEY | D_YEAR
+date_csv_min = cbind(date_csv[, 1], date_csv[, 5]);
+date_matrix_min = as.matrix(date_csv_min);
+
+# -- Filter tables over string values.
+
+# Prepare PART table (no on-the-fly encoding needed; only the key is kept)
+# Extracted: COL-1 | COL-3
+# P_PARTKEY | P_MFGR
+
+# Build filtered PART table (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2'), keeping key
+part_filt = matrix(0, rows=0, cols=1);
+for (i in 1:nrow(part_csv)) {
+  p_elem = as.scalar(part_csv[i,3])
+  if ( p_elem == "MFGR#1" | p_elem == "MFGR#2" ) {
+    key_val = as.double(as.scalar(part_csv[i,1]));
+    part_filt = rbind(part_filt, matrix(key_val, rows=1, cols=1));
+  }
+}
+if (nrow(part_filt) == 0) {
+  part_filt = matrix(0, rows=1, cols=1);
+}
+
+# Extracted: COL-1 | COL-6
+# S_SUPPKEY | S_REGION
+# Build filtered SUPPLIER table (S_REGION = 'AMERICA')
+supp_filt = matrix(0, rows=0, cols=1);
+for (i in 1:nrow(supp_csv)) {
+  if (as.scalar(supp_csv[i,6]) == "AMERICA") {
+    key_val = as.double(as.scalar(supp_csv[i,1]));
+    supp_filt = rbind(supp_filt, matrix(key_val, rows=1, cols=1));
+  }
+}
+if (nrow(supp_filt) == 0) {
+  supp_filt = matrix(0, rows=1, cols=1);
+}
+
+# Prepare CUSTOMER table on-the-fly encodings
+# Extracted: COL-1 | COL-5 | COL-6
+# C_CUSTKEY | C_NATION | C_REGION
+# (only need C_NATION encoding, filter by C_REGION string)
+[cust_nat_enc_f, cust_nat_meta] = transformencode(target=cust_csv[,5], spec=general_spec);
+
+# Build filtered CUSTOMER table (C_REGION = 'AMERICA')
+cust_filt_keys = matrix(0, rows=0, cols=1);
+cust_filt_nat = matrix(0, rows=0, cols=1);
+for (i in 1:nrow(cust_csv)) {
+  if (as.scalar(cust_csv[i,6]) == "AMERICA") {
+    key_val = as.double(as.scalar(cust_csv[i,1]));
+    nat_code = as.double(as.scalar(cust_nat_enc_f[i,1]));
+    cust_filt_keys = rbind(cust_filt_keys, matrix(key_val, rows=1, cols=1));
+    cust_filt_nat = rbind(cust_filt_nat, matrix(nat_code, rows=1, cols=1));
+  }
+}
+if (nrow(cust_filt_keys) == 0) {
+  cust_filt_keys = matrix(0, rows=1, cols=1);
+  cust_filt_nat = matrix(0, rows=1, cols=1);
+}
+cust_filt = cbind(cust_filt_keys,cust_filt_nat);
+
+#print("LO,DATE,CUST,PART,SUPP")
+#print(toString(lineorder_matrix_min[1,]))
+#print(toString(date_matrix_min[1,]))
+#print(toString(cust_filt[1,]))
+#print(toString(part_filt[1,]))
+#print(toString(supp_filt[1,]))
+
+
+# -- JOIN TABLES WITH RA-JOIN FUNCTION --
+
+# Join LINEORDER table with CUST, SUPPLIER, PART, DATE tables (star schema)
+# Join order does matter!
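# (For the grouped SUM computed further below, a more compact alternative to
#  raGroupby would be the aggregate() builtin -- a sketch, assuming 1-based
#  integer group ids and that empty groups (returned as zeros) are filtered
#  out afterwards:
#    grp  = combined_key - min(combined_key) + 1;
#    sums = aggregate(target=profit, groups=grp, fn="sum");
#  The scripts use raGroupby instead so that the benchmark exercises the
#  relational-algebra builtins themselves.)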
+# WHERE LO_CUSTKEY = C_CUSTKEY
+lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2");
+# WHERE LO_SUPPKEY = S_SUPPKEY
+lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=5, method="hash2");
+# WHERE LO_PARTKEY = P_PARTKEY
+lo_cust_supp_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lo_cust_supp, colB=5, method="hash2");
+# WHERE LO_ORDERDATE = D_DATEKEY
+# (D_DATEKEY | D_YEAR) | (P_PARTKEY | S_SUPPKEY | C_CUSTKEY | C_NATION |
+# LO_CUSTKEY | LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE | LO_SUPPLYCOST)
+joined_matrix = raJoin::m_raJoin(A=date_matrix_min, colA=1, B=lo_cust_supp_part, colB=8, method="hash2");
+
+#print(toString(joined_matrix[1,]))
+
+# -- Group-By and Aggregation (SUM)--
+
+# Group-By
+c_nat = joined_matrix[,6]
+d_year = joined_matrix[,2]
+lo_revenue = joined_matrix[,11]
+lo_supplycost = joined_matrix[,12]
+profit = lo_revenue - lo_supplycost;
+
+# CALCULATING COMBINATION KEY WITH PRIORITY: 1 D_YEAR, 2 C_NATION
+max_d_year = max(d_year);
+max_c_nat= max(c_nat);
+
+d_year_scale_f = ceil(max_d_year) + 1;
+c_nat_scale_f = ceil(max_c_nat) + 1;
+
+combined_key = d_year * c_nat_scale_f + c_nat;
+
+group_input = cbind(profit, combined_key)
+
+agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop");
+#print(toString(agg_result[1,]));
+
+# Aggregation (SUM)
+key = agg_result[, 1];
+profit = rowSums(agg_result[, 2:ncol(agg_result)]);
+
+# EXTRACTING D_YEAR, C_NATION
+d_year = round(floor(key / (c_nat_scale_f)));
+c_nat = round(floor((key %% (c_nat_scale_f))));
+
+result = cbind(d_year, c_nat, profit, key);
+
+# -- Sorting -- -- Sorting int columns works, but strings do not.
+# ORDER BY D_YEAR, C_NATION ASC
+result_ordered = order(target=result, by=4, decreasing=FALSE, index.return=FALSE);
+
+c_nat_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=cust_nat_meta);
+
+res = cbind(as.frame(result_ordered[,1]), c_nat_dec, as.frame(result_ordered[,3])) ;
+
+# Print result
+print("d_year | c_nation | PROFIT")
+print(res)
+
+print("\nQ4.1 finished.\n");
\ No newline at end of file
diff --git a/scripts/ssb/queries/q4_2.dml b/scripts/ssb/queries/q4_2.dml
new file mode 100644
index 00000000000..418cfbec00e
--- /dev/null
+++ b/scripts/ssb/queries/q4_2.dml
@@ -0,0 +1,220 @@
+/* DML-script implementing the ssb query Q4.2 in SystemDS.
+**input_dir="/scripts/ssb/data"
+
+* Run with docker:
+docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q4_2.dml -nvargs input_dir="/scripts/data/"
+
+SELECT
+    d_year,
+    s_nation,
+    p_category,
+    SUM(lo_revenue - lo_supplycost) AS PROFIT
+FROM date, customer, supplier, part, lineorder --dates
+WHERE
+    lo_custkey = c_custkey
+    AND lo_suppkey = s_suppkey
+    AND lo_partkey = p_partkey
+    AND lo_orderdate = d_datekey
+    AND c_region = 'AMERICA'
+    AND s_region = 'AMERICA'
+    AND (
+        d_year = 1997
+        OR d_year = 1998
+    )
+    AND (
+        p_mfgr = 'MFGR#1'
+        OR p_mfgr = 'MFGR#2'
+    )
+GROUP BY d_year, s_nation, p_category
+ORDER BY d_year, s_nation, p_category;
+
+*Please run the original SQL query (eg. in Postgres)
+to verify the correctness of DML version.
+-> First tests: Works on the dataset with scale factor 0.1.
+-> Sorting does not work.
+
+*Based on older implementations.
+https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_1.dml +*Especially: +https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q4_3.dml +In comparison to older version the join method was changed +from sort-merge to hash2 to improve the performance. + +Input parameters: +input_dir - Path to input directory containing the table files (e.g., ./data) +*/ + +# Call ra-modules with ra-functions. +source("./scripts/builtin/raSelection.dml") as raSel +source("./scripts/builtin/raJoin.dml") as raJoin +source("./scripts/builtin/raGroupby.dml") as raGrp + +# Set input parameters. +input_dir = ifdef($input_dir, "./data"); +print("Loading tables from directory: " + input_dir); + +# Read and load input CSV files. +lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +cust_csv = read(input_dir + "/customer.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +part_csv = read(input_dir + "/part.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); + +general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; + +# -- Data preparation -- + +# Extract only the necessary columns from tables. +# Extracted: COL-3 | COL-4 | COL-5 | COL-6 | COL-13 | COL-14 +# => LO_CUSTKEY | LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | +# LO_REVENUE | LO_SUPPLYCOST +lineorder_csv_min = cbind(lineorder_csv[, 3], lineorder_csv[, 4], lineorder_csv[, 5], lineorder_csv[, 6], lineorder_csv[, 13], lineorder_csv[, 14]); +lineorder_matrix_min = as.matrix(lineorder_csv_min); + +# Extracted: COL-1 | COL-5 +# => D_DATEKEY | D_YEAR +date_csv_min = cbind(date_csv[, 1], date_csv[, 5]); +date_matrix_min = as.matrix(date_csv_min); + +# -- Filter tables over string values. 
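# (The OR on d_year below is realized as a union: two disjoint raSelection
#  results are stacked with rbind. Each date row has exactly one year, so the
#  union cannot introduce duplicate rows.)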
+
+# WHERE D_YEAR = 1997 OR D_YEAR = 1998
+d_filtA = raSel::m_raSelection(date_matrix_min, col=2, op="==", val=1997);
+d_filtB = raSel::m_raSelection(date_matrix_min, col=2, op="==", val=1998);
+
+d_filt = rbind(d_filtA,d_filtB)
+
+# Prepare PART table on-the-fly encodings
+# Extracted: COL-1 | COL-3 | COL-4
+# P_PARTKEY | P_MFGR | P_CATEGORY
+# (only need P_CATEGORY encoding, filter by P_MFGR string)
+[part_cat_enc_f, part_cat_meta] = transformencode(target=part_csv[,4], spec=general_spec);
+
+# Build filtered PART table (p_mfgr == 'MFGR#1' OR p_mfgr == 'MFGR#2'), keeping key and encoded category
+part_filt_keys = matrix(0, rows=0, cols=1);
+part_filt_cat = matrix(0, rows=0, cols=1);
+for (i in 1:nrow(part_csv)) {
+  p_elem = as.scalar(part_csv[i,3])
+  if ( p_elem == "MFGR#1" | p_elem == "MFGR#2" ) {
+    key_val = as.double(as.scalar(part_csv[i,1]));
+    cat_code = as.double(as.scalar(part_cat_enc_f[i,1]));
+    part_filt_keys = rbind(part_filt_keys, matrix(key_val, rows=1, cols=1));
+    part_filt_cat = rbind(part_filt_cat, matrix(cat_code, rows=1, cols=1));
+  }
+}
+if (nrow(part_filt_keys) == 0) {
+  part_filt_keys = matrix(0, rows=1, cols=1);
+  part_filt_cat = matrix(0, rows=1, cols=1);
+}
+part_filt = cbind(part_filt_keys, part_filt_cat);
+
+# Prepare SUPPLIER table on-the-fly encodings
+# Extracted: COL-1 | COL-5 | COL-6
+# S_SUPPKEY | S_NATION | S_REGION
+# (only need S_NATION encoding, filter by S_REGION string)
+[supp_nat_enc_f, supp_nat_meta] = transformencode(target=supp_csv[,5], spec=general_spec);
+
+# Build filtered SUPPLIER table (s_region == 'AMERICA')
+supp_filt_keys = matrix(0, rows=0, cols=1);
+supp_filt_nat = matrix(0, rows=0, cols=1);
+for (i in 1:nrow(supp_csv)) {
+  if (as.scalar(supp_csv[i,6]) == "AMERICA") {
+    key_val = as.double(as.scalar(supp_csv[i,1]));
+    nat_code = as.double(as.scalar(supp_nat_enc_f[i,1]));
+    supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1));
+    supp_filt_nat = rbind(supp_filt_nat, matrix(nat_code, rows=1, cols=1));
+  }
+}
+if (nrow(supp_filt_keys) == 0) {
+  supp_filt_keys = matrix(0, rows=1, cols=1);
+  supp_filt_nat = matrix(0, rows=1, cols=1);
+}
+supp_filt = cbind(supp_filt_keys, supp_filt_nat);
+
+# Extracted: COL-1 | COL-6
+# C_CUSTKEY | C_REGION
+# Build filtered CUSTOMER table (c_region == 'AMERICA')
+cust_filt = matrix(0, rows=0, cols=1);
+for (i in 1:nrow(cust_csv)) {
+  if (as.scalar(cust_csv[i,6]) == "AMERICA") {
+    key_val = as.double(as.scalar(cust_csv[i,1]));
+    cust_filt = rbind(cust_filt, matrix(key_val, rows=1, cols=1));
+  }
+}
+if (nrow(cust_filt) == 0) {
+  cust_filt = matrix(0, rows=1, cols=1);
+}
+
+#print("LO,DATE,CUST,PART,SUPP")
+#print(toString(lineorder_matrix_min[1,]))
+#print(toString(date_matrix_min[1,]))
+#print(toString(cust_filt[1,]))
+#print(toString(part_filt[1,]))
+#print(toString(supp_filt[1,]))
+
+# -- JOIN TABLES WITH RA-JOIN FUNCTION --
+
+# Join LINEORDER table with CUST, SUPPLIER, PART, DATE tables (star schema)
+# Join order does matter!
+# WHERE LO_CUSTKEY = C_CUSTKEY
+lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2");
+# WHERE LO_SUPPKEY = S_SUPPKEY
+lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=4, method="hash2");
+# WHERE LO_PARTKEY = P_PARTKEY
+lo_cust_supp_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lo_cust_supp, colB=5, method="hash2");
+# WHERE LO_ORDERDATE = D_DATEKEY
+# (D_DATEKEY | D_YEAR) | (P_PARTKEY | P_CATEGORY | S_SUPPKEY | S_NATION | C_CUSTKEY |
+# LO_CUSTKEY | LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE | LO_SUPPLYCOST)
+joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_cust_supp_part, colB=9, method="hash2");
+
+# -- Group-By and Aggregation (SUM)--
+
+# Group-By
+d_year = joined_matrix[,2]
+p_cat = joined_matrix[,4]
+s_nat = joined_matrix[,6]
+lo_revenue = joined_matrix[,12]
+lo_supplycost = joined_matrix[,13]
+profit = lo_revenue - lo_supplycost;
+
+# CALCULATING COMBINATION KEY WITH PRIORITY:1 D_YEAR, 2 S_NATION, 3 P_CATEGORY
+max_d_year = max(d_year);
+max_s_nat= max(s_nat);
+max_p_cat = max(p_cat);
+
+d_year_scale_f = ceil(max_d_year) + 1;
+s_nat_scale_f = ceil(max_s_nat) + 1;
+p_cat_scale_f = ceil(max_p_cat) + 1;
+
+combined_key = d_year * s_nat_scale_f * p_cat_scale_f + s_nat * p_cat_scale_f + p_cat;
+
+group_input = cbind(profit, combined_key)
+
+agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop");
+#print(toString(agg_result[1,]));
+
+# Aggregation (SUM)
+key = agg_result[, 1];
+profit = rowSums(agg_result[, 2:ncol(agg_result)]);
+
+# EXTRACTING D_YEAR, S_NATION, P_CATEGORY
+d_year = round(floor(key / (s_nat_scale_f * p_cat_scale_f)));
+s_nat = round(floor((key %% (s_nat_scale_f * p_cat_scale_f)) / p_cat_scale_f));
+p_cat = round(key %% p_cat_scale_f);
+
+result = cbind(d_year, s_nat, p_cat, profit, key);
+
+# -- Sorting -- -- Sorting int columns works, but strings do not.
+# ORDER BY D_YEAR, S_NATION, P_CATEGORY ASC
+result_ordered = order(target=result, by=5, decreasing=FALSE, index.return=FALSE);
+
+s_nat_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_nat_meta);
+p_cat_dec = transformdecode(target=result_ordered[,3], spec=general_spec, meta=part_cat_meta);
+
+res = cbind(as.frame(result_ordered[,1]), s_nat_dec, p_cat_dec, as.frame(result_ordered[,4])) ;
+
+# Print result
+print("d_year | s_nation | p_category | PROFIT")
+print(res)
+
+print("\nQ4.2 finished.\n");
\ No newline at end of file
diff --git a/scripts/ssb/queries/q4_3.dml b/scripts/ssb/queries/q4_3.dml
new file mode 100644
index 00000000000..605f153893a
--- /dev/null
+++ b/scripts/ssb/queries/q4_3.dml
@@ -0,0 +1,208 @@
+/* DML-script implementing the ssb query Q4.3 in SystemDS.
+**input_dir="/scripts/ssb/data"
+
+* Run with docker:
+docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q4_3.dml -nvargs input_dir="/scripts/data/"
+
+SELECT
+    d_year,
+    s_city,
+    p_brand,
+    SUM(lo_revenue - lo_supplycost) AS PROFIT
+FROM date, customer, supplier, part, lineorder -- dates
+WHERE
+    lo_custkey = c_custkey
+    AND lo_suppkey = s_suppkey
+    AND lo_partkey = p_partkey
+    AND lo_orderdate = d_datekey
+    AND s_nation = 'UNITED STATES'
+    AND (
+        d_year = 1997
+        OR d_year = 1998
+    )
+    AND p_category = 'MFGR#14'
+GROUP BY d_year, s_city, p_brand
+ORDER BY d_year, s_city, p_brand;
+
+*Please run the original SQL query (eg. in Postgres)
+to verify the correctness of DML version.
+-> First tests: Works on the dataset with scale factor 0.1.
+-> Sorting does not work.
+ +*Based on older implementations. +https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_1.dml +*Especially: +https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q4_3.dml +In comparison to older version the join method was changed +from sort-merge to hash2 to improve the performance. + +Input parameters: +input_dir - Path to input directory containing the table files (e.g., ./data) +*/ + +# Call ra-modules with ra-functions. +source("./scripts/builtin/raSelection.dml") as raSel +source("./scripts/builtin/raJoin.dml") as raJoin +source("./scripts/builtin/raGroupby.dml") as raGrp + +# Set input parameters. +input_dir = ifdef($input_dir, "./data"); +print("Loading tables from directory: " + input_dir); + +# Read and load input CSV files from date and lineorder. +lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +cust_csv = read(input_dir + "/customer.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +part_csv = read(input_dir + "/part.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); + +general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; + +# -- Data preparation -- + +# Extract only the necessary columns from tables. +# Extracted: COL-3 | COL-4 | COL-5 | COL-6 | COL-13 | COL-14 +# => LO_CUSTKEY | LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | +# LO_REVENUE | LO_SUPPLYCOST +lineorder_csv_min = cbind(lineorder_csv[, 3], lineorder_csv[, 4], lineorder_csv[, 5], lineorder_csv[, 6], lineorder_csv[, 13], lineorder_csv[, 14]); +lineorder_matrix_min = as.matrix(lineorder_csv_min); + +# Extracted: COL-1 | COL-5 +# => D_DATEKEY | D_YEAR +date_csv_min = cbind(date_csv[, 1], date_csv[, 5]); +date_matrix_min = as.matrix(date_csv_min); + +# Extracted: COL-1 +# => C_CUSTKEY +cust_matrix_min = as.matrix(cust_csv[, 1]); + +# -- Filter tables over string values. 
+
+# WHERE D_YEAR = 1997 OR D_YEAR = 1998
+d_filtA = raSel::m_raSelection(date_matrix_min, col=2, op="==", val=1997);
+d_filtB = raSel::m_raSelection(date_matrix_min, col=2, op="==", val=1998);
+
+d_filt = rbind(d_filtA,d_filtB)
+
+# Prepare PART table on-the-fly encodings
+# Extracted: COL-1 | COL-4 | COL-5
+# P_PARTKEY | P_CATEGORY | P_BRAND
+# (only need p_brand encoding, filter by p_category string)
+[part_brand_enc_f, part_brand_meta] = transformencode(target=part_csv[,5], spec=general_spec);
+#print(toString(part_brand_enc_f));
+
+# Build filtered PART table (p_category == 'MFGR#14'), keeping key and encoded brand
+part_filt_keys = matrix(0, rows=0, cols=1);
+part_filt_brand = matrix(0, rows=0, cols=1);
+for (i in 1:nrow(part_csv)) {
+  p_elem = as.scalar(part_csv[i,4])
+  if ( p_elem == "MFGR#14" ) {
+    key_val = as.double(as.scalar(part_csv[i,1]));
+    brand_code = as.double(as.scalar(part_brand_enc_f[i,1]));
+    part_filt_keys = rbind(part_filt_keys, matrix(key_val, rows=1, cols=1));
+    part_filt_brand = rbind(part_filt_brand, matrix(brand_code, rows=1, cols=1));
+  }
+}
+if (nrow(part_filt_keys) == 0) {
+  part_filt_keys = matrix(0, rows=1, cols=1);
+  part_filt_brand = matrix(0, rows=1, cols=1);
+}
+part_filt = cbind(part_filt_keys, part_filt_brand);
+#print(part_filt[1,])
+
+# Prepare SUPPLIER table on-the-fly encodings
+# Extracted: COL-1 | COL-4 | COL-5
+# S_SUPPKEY | S_CITY | S_NATION
+# (only need S_CITY encoding, filter by S_NATION string)
+[supp_city_enc_f, supp_city_meta] = transformencode(target=supp_csv[,4], spec=general_spec);
+
+# Build filtered SUPPLIER table (S_NATION == 'UNITED STATES')
+supp_filt_keys = matrix(0, rows=0, cols=1);
+supp_filt_city = matrix(0, rows=0, cols=1);
+for (i in 1:nrow(supp_csv)) {
+  if (as.scalar(supp_csv[i,5]) == "UNITED STATES") {
+    key_val = as.double(as.scalar(supp_csv[i,1]));
+    city_code = as.double(as.scalar(supp_city_enc_f[i,1]));
+    supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1));
+    supp_filt_city = rbind(supp_filt_city, matrix(city_code, rows=1, cols=1));
+  }
+}
+if (nrow(supp_filt_keys) == 0) {
+  supp_filt_keys = matrix(0, rows=1, cols=1);
+  supp_filt_city = matrix(0, rows=1, cols=1);
+}
+supp_filt = cbind(supp_filt_keys, supp_filt_city);
+
+#print("LO,DATE,CUST,PART,SUPP")
+#print(toString(lineorder_matrix_min[1,]))
+#print(toString(date_matrix_min[1,]))
+#print(toString(cust_matrix_min[1,]))
+#print(toString(part_filt[1,]))
+#print(toString(supp_filt[1,]))
+
+# -- JOIN TABLES WITH RA-JOIN FUNCTION --
+
+# Join LINEORDER table with PART, SUPPLIER, DATE, CUST tables (star schema)
+# Join order does matter!
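# (The three-part grouping key built further below nests the pairing trick
#  twice:
#    combined_key = d_year * s_f * p_f + s_city * p_f + p_brand
#  where s_f > max(s_city) and p_f > max(p_brand). With illustrative values
#  s_f = 251, p_f = 1001 and (d_year, s_city, p_brand) = (1997, 42, 260):
#    key = 1997*251*1001 + 42*1001 + 260 = 501790549
#    floor(key / (251*1001))           = 1997
#    floor((key %% (251*1001)) / 1001) = 42
#    key %% 1001                       = 260
#  so all three group-by columns are recovered exactly.)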
+# WHERE LO_PARTKEY = P_PARTKEY +lo_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lineorder_matrix_min, colB=2, method="hash2"); +# WHERE LO_SUPPKEY = S_SUPPKEY +lo_part_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_part, colB=5, method="hash2"); +# WHERE LO_ORDERDATE = D_DATEKEY +lo_part_supp_date = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_part_supp, colB=8, method="hash2"); +# WHERE LO_CUSTKEY = C_CUSTKEY +# (C_CUSTKEY) | (D_DATEKEY | D_YEAR | S_SUPPKEY | S_CITY | P_PARTKEY | P_BRAND | +# LO_CUSTKEY | LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE | LO_SUPPLYCOST) +joined_matrix = raJoin::m_raJoin(A=cust_matrix_min, colA=1, B=lo_part_supp_date, colB=7, method="hash2"); +#print(toString(joined_matrix[1,])) + +# -- Group-By and Aggregation (SUM)-- + +# Group-By +d_year = joined_matrix[,3] +s_city = joined_matrix[,5] +p_brand = joined_matrix[,7] +lo_revenue = joined_matrix[,12] +lo_supplycost = joined_matrix[,13] +profit = lo_revenue - lo_supplycost; + +# CALCULATING COMBINATION KEY WITH PRIORITY:1 D_YEAR, 2 S_CITY, 3 P_BRAND + +max_p_brand = max(p_brand); +max_s_city= max(s_city); +max_d_year = max(d_year); + +p_brand_scale_f = ceil(max_p_brand) + 1; +s_city_scale_f = ceil(max_s_city) + 1; +d_year_scale_f = ceil(max_d_year) + 1; + +combined_key = d_year * s_city_scale_f * p_brand_scale_f + s_city * p_brand_scale_f + p_brand; + +group_input = cbind(profit, combined_key) +agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); + +# Aggregation (SUM) +key = agg_result[, 1]; +profit = rowSums(agg_result[, 2:ncol(agg_result)]); + +# EXTRACTING D_YEAR, S_CITY, P_BRAND +d_year = round(floor(key / (s_city_scale_f * p_brand_scale_f))); +s_city = round(floor((key %% (s_city_scale_f * p_brand_scale_f)) / p_brand_scale_f)); +p_brand = round(key %% p_brand_scale_f); + +result = cbind(d_year, s_city, p_brand, profit, key); + +# -- Sorting -- -- Sorting int columns works, but strings do not. +# ORDER BY D_YEAR, S_CITY, P_BRAND ASC +result_ordered = order(target=result, by=5, decreasing=FALSE, index.return=FALSE); + +s_city_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_city_meta); +p_brand_dec = transformdecode(target=result_ordered[,3], spec=general_spec, meta=part_brand_meta); + +res = cbind(as.frame(result_ordered[,1]), s_city_dec, p_brand_dec, as.frame(result_ordered[,4])) ; + +# Print result +print("d_year | s_city | p_brand | PROFIT"); +print(res); + +print("\nQ4.3 finished.\n"); diff --git a/scripts/ssb/sql/q2.2.sql b/scripts/ssb/sql/q2.2.sql index e283dbdb059..739459b4980 100644 --- a/scripts/ssb/sql/q2.2.sql +++ b/scripts/ssb/sql/q2.2.sql @@ -15,7 +15,7 @@ -- specific language governing permissions and limitations -- under the License. 
SELECT SUM(lo_revenue), d_year, p_brand -FROM lineorder, dates, part, supplier +FROM lineorder, date, part, supplier --dates WHERE lo_orderdate = d_datekey AND lo_partkey = p_partkey diff --git a/scripts/ssb/sql/q3.1.sql b/scripts/ssb/sql/q3.1.sql index d6743379958..62ef25f4351 100644 --- a/scripts/ssb/sql/q3.1.sql +++ b/scripts/ssb/sql/q3.1.sql @@ -19,7 +19,7 @@ SELECT s_nation, d_year, SUM(lo_revenue) AS REVENUE -FROM customer, lineorder, supplier, dates +FROM customer, lineorder, supplier, date --dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey diff --git a/scripts/ssb/sql/q3.3.sql b/scripts/ssb/sql/q3.3.sql index ac1cb324d09..9cabdcc3164 100644 --- a/scripts/ssb/sql/q3.3.sql +++ b/scripts/ssb/sql/q3.3.sql @@ -19,7 +19,7 @@ SELECT s_city, d_year, SUM(lo_revenue) AS REVENUE -FROM customer, lineorder, supplier, dates +FROM customer, lineorder, supplier, date --dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey diff --git a/scripts/ssb/sql/q3.4.sql b/scripts/ssb/sql/q3.4.sql index 2be6a5cd70a..093e01c42e5 100644 --- a/scripts/ssb/sql/q3.4.sql +++ b/scripts/ssb/sql/q3.4.sql @@ -19,7 +19,7 @@ SELECT s_city, d_year, SUM(lo_revenue) AS REVENUE -FROM customer, lineorder, supplier, dates +FROM customer, lineorder, supplier, date -- dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey diff --git a/scripts/ssb/sql/q4.1.sql b/scripts/ssb/sql/q4.1.sql index d6efe570a37..6c4dbeb4f21 100644 --- a/scripts/ssb/sql/q4.1.sql +++ b/scripts/ssb/sql/q4.1.sql @@ -18,7 +18,7 @@ SELECT d_year, c_nation, SUM(lo_revenue - lo_supplycost) AS PROFIT -FROM dates, customer, supplier, part, lineorder +FROM date, customer, supplier, part, lineorder -- dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey diff --git a/scripts/ssb/sql/q4.2.sql b/scripts/ssb/sql/q4.2.sql index c2f1a0ffddd..6183b75ee04 100644 --- a/scripts/ssb/sql/q4.2.sql +++ b/scripts/ssb/sql/q4.2.sql @@ -19,7 +19,7 @@ SELECT s_nation, p_category, SUM(lo_revenue - lo_supplycost) AS PROFIT -FROM dates, customer, supplier, part, lineorder +FROM date, customer, supplier, part, lineorder --dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey diff --git a/scripts/ssb/sql/q4.3.sql b/scripts/ssb/sql/q4.3.sql index f593a10291b..20692b043c7 100644 --- a/scripts/ssb/sql/q4.3.sql +++ b/scripts/ssb/sql/q4.3.sql @@ -19,7 +19,7 @@ SELECT s_city, p_brand, SUM(lo_revenue - lo_supplycost) AS PROFIT -FROM dates, customer, supplier, part, lineorder +FROM date, customer, supplier, part, lineorder -- dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey From 0d4baafac8a14dd2e5e64e626b5aba5ad98eec09 Mon Sep 17 00:00:00 2001 From: Johnn-ui2010 Date: Sun, 7 Dec 2025 00:57:05 +0100 Subject: [PATCH 10/22] q3_1,q3_2,q3_3 works on scale 0.1, q3_4 to test on larger data for verification. TO DO: Dealing with empty tables. 
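One way the empty-table TODO could be handled (a sketch only, not part of
this patch): detect the dummy-key case produced by the filter guards and
skip the joins, e.g.

    no_match = (nrow(supp_filt) == 1) & (as.scalar(supp_filt[1,1]) == 0);
    if (no_match) {
      print("A filter matched no rows - the query result is empty.");
    }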
---
 scripts/ssb/queries/q3_1.dml | 194 ++++++++++++++++++++++++++++++
 scripts/ssb/queries/q3_2.dml | 191 ++++++++++++++++++++++++++++++
 scripts/ssb/queries/q3_3.dml | 201 +++++++++++++++++++++++++++++++
 scripts/ssb/queries/q3_4.dml | 221 +++++++++++++++++++++++++++++++++++
 scripts/ssb/queries/q4_1.dml |   4 +-
 scripts/ssb/queries/q4_3.dml |   9 +-
 scripts/ssb/sql/q3.2.sql     |   2 +-
 7 files changed, 814 insertions(+), 8 deletions(-)
 create mode 100644 scripts/ssb/queries/q3_1.dml
 create mode 100644 scripts/ssb/queries/q3_2.dml
 create mode 100644 scripts/ssb/queries/q3_3.dml
 create mode 100644 scripts/ssb/queries/q3_4.dml

diff --git a/scripts/ssb/queries/q3_1.dml b/scripts/ssb/queries/q3_1.dml
new file mode 100644
index 00000000000..f45a12d2c56
--- /dev/null
+++ b/scripts/ssb/queries/q3_1.dml
@@ -0,0 +1,194 @@
+/* DML-script implementing the SSB query Q3.1 in SystemDS.
+** input_dir="/scripts/ssb/data"
+
+* Run with docker:
+docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q3_1.dml -nvargs input_dir="/scripts/data/"
+
+SELECT
+    c_nation,
+    s_nation,
+    d_year,
+    SUM(lo_revenue) AS REVENUE
+FROM customer, lineorder, supplier, date -- dates
+WHERE
+    lo_custkey = c_custkey
+    AND lo_suppkey = s_suppkey
+    AND lo_orderdate = d_datekey
+    AND c_region = 'ASIA'
+    AND s_region = 'ASIA'
+    AND d_year >= 1992
+    AND d_year <= 1997
+GROUP BY c_nation, s_nation, d_year
+ORDER BY d_year ASC, REVENUE DESC;
+
+* Please run the original SQL query (e.g. in Postgres)
+to verify the correctness of the DML version.
+-> First tests: works on the dataset with scale factor 0.1.
+-> Sorting of decoded string columns does not work yet.
+
+* Based on older implementations:
+https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_1.dml
+* Especially:
+https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q4_3.dml
+Compared to the older version, the join method was changed
+from sort-merge to hash2 to improve performance.
+
+Input parameters:
+input_dir - Path to the input directory containing the table files (e.g., ./data)
+*/
+
+# Load the relational-algebra (RA) builtin modules.
+source("./scripts/builtin/raSelection.dml") as raSel
+source("./scripts/builtin/raJoin.dml") as raJoin
+source("./scripts/builtin/raGroupby.dml") as raGrp
+
+# Set input parameters.
+input_dir = ifdef($input_dir, "./data");
+print("Loading tables from directory: " + input_dir);
+
+# Read and load input CSV files.
+lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+cust_csv = read(input_dir + "/customer.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+
+general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }";
+
+# -- Data preparation --
+
+# Extract only the necessary columns from the tables.
+# Extracted: COL-3 | COL-5 | COL-6 | COL-13
+# => LO_CUSTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE
+lineorder_csv_min = cbind(lineorder_csv[, 3], lineorder_csv[, 5], lineorder_csv[, 6], lineorder_csv[, 13]);
+lineorder_matrix_min = as.matrix(lineorder_csv_min);
+
+# Extracted: COL-1 | COL-5
+# => D_DATEKEY | D_YEAR
+date_csv_min = cbind(date_csv[, 1], date_csv[, 5]);
+date_matrix_min = as.matrix(date_csv_min);
+
+# -- Filter tables over string values. --
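+# (Note: the numeric predicate on D_YEAR goes through raSelection on the
+# matrix directly; the string predicates on C_REGION/S_REGION are checked
+# row by row on the frames below, keeping the join key plus a recoded copy
+# of the string column so the strings can be restored after the group-by.)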
+ +# WHERE D_YEAR >= 1992 AND D_YEAR <= 1997 +d_filt = raSel::m_raSelection(date_matrix_min, col=2, op=">=", val=1992); +d_filt = raSel::m_raSelection(d_filt, col=2, op="<=", val=1997); + +# Prepare SUPPLIER table on-the-fly encodings +# Extracted: COL-1 | COL-5 | COL-6 +# S_SUPPKEY | S_NATION | S_REGION +# (only need S_NATION encoding, filter by S_REGION string) +[supp_nat_enc_f, supp_nat_meta] = transformencode(target=supp_csv[,5], spec=general_spec); + +# Build filtered SUPPLIER table (S_REGION == 'ASIA') +supp_filt_keys = matrix(0, rows=0, cols=1); +supp_filt_nat = matrix(0, rows=0, cols=1); +for (i in 1:nrow(supp_csv)) { + if (as.scalar(supp_csv[i,6]) == "ASIA") { + key_val = as.double(as.scalar(supp_csv[i,1])); + nat_code = as.double(as.scalar(supp_nat_enc_f[i,1])); + supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1)); + supp_filt_nat = rbind(supp_filt_nat, matrix(nat_code, rows=1, cols=1)); + } +} +if (nrow(supp_filt_keys) == 0) { + supp_filt_keys = matrix(0, rows=1, cols=1); + supp_filt_nat = matrix(0, rows=1, cols=1); +} +supp_filt = cbind(supp_filt_keys, supp_filt_nat); + +# Prepare CUSTOMER table on-the-fly encodings +# Extracted: COL-1 | COL-5 | COL-6 +# C_CUSTKEY | C_NATION | C_REGION +# (only need C_NATION encoding, filter by C_REGION string) +[cust_nat_enc_f, cust_nat_meta] = transformencode(target=cust_csv[,5], spec=general_spec); + +# Build filtered CUSTOMER table (C_REGION = 'ASIA') +cust_filt_keys = matrix(0, rows=0, cols=1); +cust_filt_nat = matrix(0, rows=0, cols=1); +for (i in 1:nrow(cust_csv)) { + if (as.scalar(cust_csv[i,6]) == "ASIA") { + key_val = as.double(as.scalar(cust_csv[i,1])); + nat_code = as.double(as.scalar(cust_nat_enc_f[i,1])); + cust_filt_keys = rbind(cust_filt_keys, matrix(key_val, rows=1, cols=1)); + cust_filt_nat = rbind(cust_filt_nat, matrix(nat_code, rows=1, cols=1)); + } +} +if (nrow(cust_filt_keys) == 0) { + cust_filt_keys = matrix(0, rows=1, cols=1); + cust_filt_nat = matrix(0, rows=1, cols=1); +} +cust_filt = cbind(cust_filt_keys,cust_filt_nat); + +#print("LO,DATE,CUST,SUPP") +#print(toString(lineorder_matrix_min[1,])) +#print(toString(date_matrix_min[1,])) +#print(toString(cust_filt[1,])) +#print(toString(supp_filt[1,])) + + +# -- JOIN TABLES WITH RA-JOIN FUNCTION -- + +# Join LINEORDER table with CUST, SUPPLIER, DATE tables (star schema) +# Join order does matter! 
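+# (The small, pre-filtered dimension tables are always passed as input A of
+# the hash join and the large lineorder/intermediate table as input B, so
+# each join step shrinks the intermediate result as early as possible.)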
+# WHERE LO_CUSTKEY = C_CUSTKEY
+lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2");
+# WHERE LO_SUPPKEY = S_SUPPKEY
+lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=4, method="hash2");
+#print(toString(lo_cust_supp[1,]))
+
+# WHERE LO_ORDERDATE = D_DATEKEY
+# (D_DATEKEY | D_YEAR) | (S_SUPPKEY | S_NATION | C_CUSTKEY | C_NATION |
+# LO_CUSTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE)
+joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_cust_supp, colB=7, method="hash2");
+#print(toString(joined_matrix[1,]))
+
+# -- Group-By and Aggregation (SUM) --
+
+# Group-By
+d_year = joined_matrix[,2];
+s_nat = joined_matrix[,4];
+c_nat = joined_matrix[,6];
+revenue = joined_matrix[,10];
+
+# CALCULATING COMBINATION KEY WITH PRIORITY: 1 C_NATION, 2 S_NATION, 3 D_YEAR
+max_c_nat = max(c_nat);
+max_s_nat = max(s_nat);
+max_d_year = max(d_year);
+
+c_nat_scale_f = ceil(max_c_nat) + 1;
+s_nat_scale_f = ceil(max_s_nat) + 1;
+d_year_scale_f = ceil(max_d_year) + 1;
+
+combined_key = c_nat * s_nat_scale_f * d_year_scale_f + s_nat * d_year_scale_f + d_year;
+
+group_input = cbind(revenue, combined_key);
+
+agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop");
+#print(toString(agg_result[1,]));
+
+# Aggregation (SUM)
+key = agg_result[, 1];
+revenue = rowSums(agg_result[, 2:ncol(agg_result)]);
+
+# EXTRACTING C_NATION, S_NATION, D_YEAR
+c_nat = round(floor(key / (s_nat_scale_f * d_year_scale_f)));
+s_nat = round(floor((key %% (s_nat_scale_f * d_year_scale_f)) / d_year_scale_f));
+d_year = round(key %% d_year_scale_f);
+
+result = cbind(c_nat, s_nat, d_year, revenue, key);
+
+# -- Sorting -- (order() works on numeric columns only; strings do not sort directly)
+# ORDER BY D_YEAR ASC, REVENUE DESC
+result_ordered = order(target=result, by=4, decreasing=TRUE, index.return=FALSE);
+result_ordered = order(target=result_ordered, by=3, decreasing=FALSE, index.return=FALSE);
+
+c_nat_dec = transformdecode(target=result_ordered[,1], spec=general_spec, meta=cust_nat_meta);
+s_nat_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_nat_meta);
+
+res = cbind(c_nat_dec, s_nat_dec, as.frame(result_ordered[,3]), as.frame(result_ordered[,4]));
+
+# Print result
+print("c_nation | s_nation | d_year | REVENUE");
+print(res);
+
+print("\nQ3.1 finished.\n");
diff --git a/scripts/ssb/queries/q3_2.dml b/scripts/ssb/queries/q3_2.dml
new file mode 100644
index 00000000000..b40ff5abb71
--- /dev/null
+++ b/scripts/ssb/queries/q3_2.dml
@@ -0,0 +1,191 @@
+/* DML-script implementing the SSB query Q3.2 in SystemDS.
+** input_dir="/scripts/ssb/data"
+
+* Run with docker:
+docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q3_2.dml -nvargs input_dir="/scripts/data/"
+
+SELECT
+    c_city,
+    s_city,
+    d_year,
+    SUM(lo_revenue) AS REVENUE
+FROM customer, lineorder, supplier, date -- dates
+WHERE
+    lo_custkey = c_custkey
+    AND lo_suppkey = s_suppkey
+    AND lo_orderdate = d_datekey
+    AND c_nation = 'UNITED STATES'
+    AND s_nation = 'UNITED STATES'
+    AND d_year >= 1992
+    AND d_year <= 1997
+GROUP BY c_city, s_city, d_year
+ORDER BY d_year ASC, REVENUE DESC;
+
+* Please run the original SQL query (e.g. in Postgres)
+to verify the correctness of the DML version.
+-> First tests: works on the dataset with scale factor 0.1.
+-> Sorting of decoded string columns does not work yet.
+
+* Based on older implementations:
+https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_1.dml
+* Especially:
+https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q4_3.dml
+Compared to the older version, the join method was changed
+from sort-merge to hash2 to improve performance.
+
+Input parameters:
+input_dir - Path to the input directory containing the table files (e.g., ./data)
+*/
+
+# Load the relational-algebra (RA) builtin modules.
+source("./scripts/builtin/raSelection.dml") as raSel
+source("./scripts/builtin/raJoin.dml") as raJoin
+source("./scripts/builtin/raGroupby.dml") as raGrp
+
+# Set input parameters.
+input_dir = ifdef($input_dir, "./data");
+print("Loading tables from directory: " + input_dir);
+
+# Read and load input CSV files.
+lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+cust_csv = read(input_dir + "/customer.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+
+general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }";
+
+# -- Data preparation --
+
+# Extract only the necessary columns from the tables.
+# Extracted: COL-3 | COL-5 | COL-6 | COL-13
+# => LO_CUSTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE
+lineorder_csv_min = cbind(lineorder_csv[, 3], lineorder_csv[, 5], lineorder_csv[, 6], lineorder_csv[, 13]);
+lineorder_matrix_min = as.matrix(lineorder_csv_min);
+
+# Extracted: COL-1 | COL-5
+# => D_DATEKEY | D_YEAR
+date_csv_min = cbind(date_csv[, 1], date_csv[, 5]);
+date_matrix_min = as.matrix(date_csv_min);
+
+# -- Filter tables over string values. --
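+# (The recode spec in general_spec maps the distinct strings of column C1 to
+# integer codes 1..n and returns metadata for the inverse mapping, roughly:
+#   [enc, meta] = transformencode(target=F, spec=general_spec);
+#   F2 = transformdecode(target=enc, spec=general_spec, meta=meta);  # F2 == F
+# (F, enc, meta are illustrative names.) This lets the city strings travel
+# through the numeric joins and the group-by, and be decoded before printing.)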
+
+# WHERE D_YEAR >= 1992 AND D_YEAR <= 1997
+d_filt = raSel::m_raSelection(date_matrix_min, col=2, op=">=", val=1992);
+d_filt = raSel::m_raSelection(d_filt, col=2, op="<=", val=1997);
+
+# Prepare SUPPLIER table on-the-fly encodings
+# Extracted: COL-1 | COL-4 | COL-5
+# S_SUPPKEY | S_CITY | S_NATION
+# (only need S_CITY encoding, filter by S_NATION string)
+[supp_city_enc_f, supp_city_meta] = transformencode(target=supp_csv[,4], spec=general_spec);
+
+# Build filtered SUPPLIER table (S_NATION = 'UNITED STATES')
+supp_filt_keys = matrix(0, rows=0, cols=1);
+supp_filt_city = matrix(0, rows=0, cols=1);
+for (i in 1:nrow(supp_csv)) {
+    if (as.scalar(supp_csv[i,5]) == "UNITED STATES") {
+        key_val = as.double(as.scalar(supp_csv[i,1]));
+        city_code = as.double(as.scalar(supp_city_enc_f[i,1]));
+        supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1));
+        supp_filt_city = rbind(supp_filt_city, matrix(city_code, rows=1, cols=1));
+    }
+}
+if (nrow(supp_filt_keys) == 0) {
+    supp_filt_keys = matrix(0, rows=1, cols=1);
+    supp_filt_city = matrix(0, rows=1, cols=1);
+}
+supp_filt = cbind(supp_filt_keys, supp_filt_city);
+
+# Prepare CUSTOMER table on-the-fly encodings
+# Extracted: COL-1 | COL-4 | COL-5
+# C_CUSTKEY | C_CITY | C_NATION
+# (only need C_CITY encoding, filter by C_NATION string)
+[cust_city_enc_f, cust_city_meta] = transformencode(target=cust_csv[,4], spec=general_spec);
+
+# Build filtered CUSTOMER table (C_NATION = 'UNITED STATES')
+cust_filt_keys = matrix(0, rows=0, cols=1);
+cust_filt_city = matrix(0, rows=0, cols=1);
+for (i in 1:nrow(cust_csv)) {
+    if (as.scalar(cust_csv[i,5]) == "UNITED STATES") {
+        key_val = as.double(as.scalar(cust_csv[i,1]));
+        city_code = as.double(as.scalar(cust_city_enc_f[i,1]));
+        cust_filt_keys = rbind(cust_filt_keys, matrix(key_val, rows=1, cols=1));
+        cust_filt_city = rbind(cust_filt_city, matrix(city_code, rows=1, cols=1));
+    }
+}
+if (nrow(cust_filt_keys) == 0) {
+    cust_filt_keys = matrix(0, rows=1, cols=1);
+    cust_filt_city = matrix(0, rows=1, cols=1);
+}
+cust_filt = cbind(cust_filt_keys, cust_filt_city);
+
+#print("LO,DATE,CUST,SUPP")
+#print(toString(lineorder_matrix_min[1,]))
+#print(toString(date_matrix_min[1,]))
+#print(toString(cust_filt[1,]))
+#print(toString(supp_filt[1,]))
+
+# -- JOIN TABLES WITH RA-JOIN FUNCTION --
+
+# Join LINEORDER table with CUST, SUPPLIER, DATE tables (star schema)
+# Join order does matter!
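+# (raJoin returns the matching rows as A's columns followed by B's columns,
+# so the probe column index colB must be recomputed after every join step;
+# the layout comments below track the current column positions.)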
+# WHERE LO_CUSTKEY = C_CUSTKEY
+lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2");
+# WHERE LO_SUPPKEY = S_SUPPKEY
+lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=4, method="hash2");
+# WHERE LO_ORDERDATE = D_DATEKEY
+# (D_DATEKEY | D_YEAR) | (S_SUPPKEY | S_CITY | C_CUSTKEY | C_CITY |
+# LO_CUSTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE)
+joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_cust_supp, colB=7, method="hash2");
+#print(toString(joined_matrix[1,]))
+
+# -- Group-By and Aggregation (SUM) --
+
+# Group-By
+d_year = joined_matrix[,2];
+s_city = joined_matrix[,4];
+c_city = joined_matrix[,6];
+revenue = joined_matrix[,10];
+
+# CALCULATING COMBINATION KEY WITH PRIORITY: 1 C_CITY, 2 S_CITY, 3 D_YEAR
+max_c_city = max(c_city);
+max_s_city = max(s_city);
+max_d_year = max(d_year);
+
+c_city_scale_f = ceil(max_c_city) + 1;
+s_city_scale_f = ceil(max_s_city) + 1;
+d_year_scale_f = ceil(max_d_year) + 1;
+
+combined_key = c_city * s_city_scale_f * d_year_scale_f + s_city * d_year_scale_f + d_year;
+
+group_input = cbind(revenue, combined_key);
+
+agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop");
+#print(toString(agg_result[1,]));
+
+# Aggregation (SUM)
+key = agg_result[, 1];
+revenue = rowSums(agg_result[, 2:ncol(agg_result)]);
+
+# EXTRACTING C_CITY, S_CITY, D_YEAR
+c_city = round(floor(key / (s_city_scale_f * d_year_scale_f)));
+s_city = round(floor((key %% (s_city_scale_f * d_year_scale_f)) / d_year_scale_f));
+d_year = round(key %% d_year_scale_f);
+
+result = cbind(c_city, s_city, d_year, revenue, key);
+
+# -- Sorting -- (order() works on numeric columns only; strings do not sort directly)
+# ORDER BY D_YEAR ASC, REVENUE DESC
+result_ordered = order(target=result, by=4, decreasing=TRUE, index.return=FALSE);
+result_ordered = order(target=result_ordered, by=3, decreasing=FALSE, index.return=FALSE);
+
+c_city_dec = transformdecode(target=result_ordered[,1], spec=general_spec, meta=cust_city_meta);
+s_city_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_city_meta);
+
+res = cbind(c_city_dec, s_city_dec, as.frame(result_ordered[,3]), as.frame(result_ordered[,4]));
+
+# Print result
+print("c_city | s_city | d_year | REVENUE");
+print(res);
+
+print("\nQ3.2 finished.\n");
diff --git a/scripts/ssb/queries/q3_3.dml b/scripts/ssb/queries/q3_3.dml
new file mode 100644
index 00000000000..1570a3336d0
--- /dev/null
+++ b/scripts/ssb/queries/q3_3.dml
@@ -0,0 +1,201 @@
+/* DML-script implementing the SSB query Q3.3 in SystemDS.
+** input_dir="/scripts/ssb/data"
+
+* Run with docker:
+docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q3_3.dml -nvargs input_dir="/scripts/data/"
+
+SELECT
+    c_city,
+    s_city,
+    d_year,
+    SUM(lo_revenue) AS REVENUE
+FROM customer, lineorder, supplier, date -- dates
+WHERE
+    lo_custkey = c_custkey
+    AND lo_suppkey = s_suppkey
+    AND lo_orderdate = d_datekey
+    AND (
+        c_city = 'UNITED KI1'
+        OR c_city = 'UNITED KI5'
+    )
+    AND (
+        s_city = 'UNITED KI1'
+        OR s_city = 'UNITED KI5'
+    )
+    AND d_year >= 1992
+    AND d_year <= 1997
+GROUP BY c_city, s_city, d_year
+ORDER BY d_year ASC, REVENUE DESC;
+
+* Please run the original SQL query (e.g. in Postgres)
+to verify the correctness of the DML version.
+-> First tests: works on the dataset with scale factor 0.1.
+-> Sorting of decoded string columns does not work yet.
+
+* Based on older implementations:
+https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_1.dml
+* Especially:
+https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q4_3.dml
+Compared to the older version, the join method was changed
+from sort-merge to hash2 to improve performance.
+
+Input parameters:
+input_dir - Path to the input directory containing the table files (e.g., ./data)
+*/
+
+# Load the relational-algebra (RA) builtin modules.
+source("./scripts/builtin/raSelection.dml") as raSel
+source("./scripts/builtin/raJoin.dml") as raJoin
+source("./scripts/builtin/raGroupby.dml") as raGrp
+
+# Set input parameters.
+input_dir = ifdef($input_dir, "./data");
+print("Loading tables from directory: " + input_dir);
+
+# Read and load input CSV files.
+lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+cust_csv = read(input_dir + "/customer.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+
+general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }";
+
+# -- Data preparation --
+
+# Extract only the necessary columns from the tables.
+# Extracted: COL-3 | COL-5 | COL-6 | COL-13
+# => LO_CUSTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE
+lineorder_csv_min = cbind(lineorder_csv[, 3], lineorder_csv[, 5], lineorder_csv[, 6], lineorder_csv[, 13]);
+lineorder_matrix_min = as.matrix(lineorder_csv_min);
+
+# Extracted: COL-1 | COL-5
+# => D_DATEKEY | D_YEAR
+date_csv_min = cbind(date_csv[, 1], date_csv[, 5]);
+date_matrix_min = as.matrix(date_csv_min);
+
+# -- Filter tables over string values. --
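+# (If a filter matches nothing, the loops below fall back to a single
+# all-zero row instead of an empty matrix, so the subsequent raJoin calls
+# still get a valid input; since all real keys are >= 1, the zero key can
+# never join and the final result stays empty.)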
+ +# WHERE D_YEAR >= 1992 AND D_YEAR <= 1997 +d_filt = raSel::m_raSelection(date_matrix_min, col=2, op=">=", val=1992); +d_filt = raSel::m_raSelection(d_filt, col=2, op="<=", val=1997); + +# Prepare SUPPLIER table on-the-fly encodings +# Extracted: COL-1 | COL-4 +# S_SUPPKEY | S_CITY +# (only need S_CITY encoding, filter by S_CITY string itself) +[supp_city_enc_f, supp_city_meta] = transformencode(target=supp_csv[,4], spec=general_spec); + +# Build filtered SUPPLIER table (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') +supp_filt_keys = matrix(0, rows=0, cols=1); +supp_filt_city = matrix(0, rows=0, cols=1); +for (i in 1:nrow(supp_csv)) { + s_elem = as.scalar(supp_csv[i,4]) + if (s_elem == "UNITED KI1" | s_elem == "UNITED KI5") { + key_val = as.double(as.scalar(supp_csv[i,1])); + city_code = as.double(as.scalar(supp_city_enc_f[i,1])); + supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1)); + supp_filt_city = rbind(supp_filt_city, matrix(city_code, rows=1, cols=1)); + } +} +if (nrow(supp_filt_keys) == 0) { + supp_filt_keys = matrix(0, rows=1, cols=1); + supp_filt_city = matrix(0, rows=1, cols=1); +} +supp_filt = cbind(supp_filt_keys, supp_filt_city); + +# Prepare CUSTOMER table on-the-fly encodings +# Extracted: COL-1 | COL-4 +# C_CUSTKEY | C_CITY +# (only need C_CITY encoding, filter by C_CITY string itself) +[cust_city_enc_f, cust_city_meta] = transformencode(target=cust_csv[,4], spec=general_spec); + +# Build filtered CUSTOMER table (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') +cust_filt_keys = matrix(0, rows=0, cols=1); +cust_filt_city = matrix(0, rows=0, cols=1); +for (i in 1:nrow(cust_csv)) { + c_elem = as.scalar(cust_csv[i,4]) + if (c_elem == "UNITED KI1" | c_elem == "UNITED KI5") { + key_val = as.double(as.scalar(cust_csv[i,1])); + city_code = as.double(as.scalar(cust_city_enc_f[i,1])); + cust_filt_keys = rbind(cust_filt_keys, matrix(key_val, rows=1, cols=1)); + cust_filt_city = rbind(cust_filt_city, matrix(city_code, rows=1, cols=1)); + } +} +if (nrow(cust_filt_keys) == 0) { + cust_filt_keys = matrix(0, rows=1, cols=1); + cust_filt_city = matrix(0, rows=1, cols=1); +} +cust_filt = cbind(cust_filt_keys,cust_filt_city); + +#print("LO,DATE,CUST,SUPP") +#print(toString(lineorder_matrix_min[1,])) +#print(toString(date_matrix_min[1,])) +#print(toString(cust_filt[1,])) +#print(toString(supp_filt[1,])) + +# -- JOIN TABLES WITH RA-JOIN FUNCTION -- + +# Join LINEORDER table with CUST, SUPPLIER, DATE tables (star schema) +# Join order does matter! 
+# WHERE LO_CUSTKEY = C_CUSTKEY
+lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2");
+# WHERE LO_SUPPKEY = S_SUPPKEY
+lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=4, method="hash2");
+# WHERE LO_ORDERDATE = D_DATEKEY
+# (D_DATEKEY | D_YEAR) | (S_SUPPKEY | S_CITY | C_CUSTKEY | C_CITY |
+# LO_CUSTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE)
+joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_cust_supp, colB=7, method="hash2");
+#print(toString(joined_matrix[1,]))
+
+# -- Group-By and Aggregation (SUM) --
+
+# Group-By
+d_year = joined_matrix[,2];
+s_city = joined_matrix[,4];
+c_city = joined_matrix[,6];
+revenue = joined_matrix[,10];
+
+# CALCULATING COMBINATION KEY WITH PRIORITY: 1 C_CITY, 2 S_CITY, 3 D_YEAR
+max_c_city = max(c_city);
+max_s_city = max(s_city);
+max_d_year = max(d_year);
+
+c_city_scale_f = ceil(max_c_city) + 1;
+s_city_scale_f = ceil(max_s_city) + 1;
+d_year_scale_f = ceil(max_d_year) + 1;
+
+combined_key = c_city * s_city_scale_f * d_year_scale_f + s_city * d_year_scale_f + d_year;
+
+group_input = cbind(revenue, combined_key);
+
+agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop");
+#print(toString(agg_result[1,]));
+
+# Aggregation (SUM)
+key = agg_result[, 1];
+revenue = rowSums(agg_result[, 2:ncol(agg_result)]);
+
+# EXTRACTING C_CITY, S_CITY, D_YEAR
+c_city = round(floor(key / (s_city_scale_f * d_year_scale_f)));
+s_city = round(floor((key %% (s_city_scale_f * d_year_scale_f)) / d_year_scale_f));
+d_year = round(key %% d_year_scale_f);
+
+result = cbind(c_city, s_city, d_year, revenue, key);
+
+# -- Sorting -- (order() works on numeric columns only; strings do not sort directly)
+# ORDER BY D_YEAR ASC, REVENUE DESC
+result_ordered = order(target=result, by=4, decreasing=TRUE, index.return=FALSE);
+result_ordered = order(target=result_ordered, by=3, decreasing=FALSE, index.return=FALSE);
+
+c_city_dec = transformdecode(target=result_ordered[,1], spec=general_spec, meta=cust_city_meta);
+s_city_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_city_meta);
+
+res = cbind(c_city_dec, s_city_dec, as.frame(result_ordered[,3]), as.frame(result_ordered[,4]));
+
+# Print result
+print("c_city | s_city | d_year | REVENUE");
+print(res);
+
+print("\nQ3.3 finished.\n");
diff --git a/scripts/ssb/queries/q3_4.dml b/scripts/ssb/queries/q3_4.dml
new file mode 100644
index 00000000000..5174dfe1da4
--- /dev/null
+++ b/scripts/ssb/queries/q3_4.dml
@@ -0,0 +1,221 @@
+/* DML-script implementing the SSB query Q3.4 in SystemDS.
+** input_dir="/scripts/ssb/data"
+
+## TODO
+Check for empty tables (guard with nrow()), otherwise indexing statements
+run out of bounds. Especially for q3_2, q3_3, q3_4.
+
+* Run with docker:
+docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q3_4.dml -nvargs input_dir="/scripts/data/"
+
+SELECT
+    c_city,
+    s_city,
+    d_year,
+    SUM(lo_revenue) AS REVENUE
+FROM customer, lineorder, supplier, date -- dates
+WHERE
+    lo_custkey = c_custkey
+    AND lo_suppkey = s_suppkey
+    AND lo_orderdate = d_datekey
+    AND (
+        c_city = 'UNITED KI1'
+        OR c_city = 'UNITED KI5'
+    )
+    AND (
+        s_city = 'UNITED KI1'
+        OR s_city = 'UNITED KI5'
+    )
+    AND d_yearmonth = 'Dec1997'
+GROUP BY c_city, s_city, d_year
+ORDER BY d_year ASC, REVENUE DESC;
+
+* Please run the original SQL query (e.g. in Postgres)
+to verify the correctness of the DML version.
+-> First tests: works on the dataset with scale factor 0.1.
+-> Sorting of decoded string columns does not work yet.
+
+* Based on older implementations:
+https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_1.dml
+* Especially:
+https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q4_3.dml
+Compared to the older version, the join method was changed
+from sort-merge to hash2 to improve performance.
+
+Input parameters:
+input_dir - Path to the input directory containing the table files (e.g., ./data)
+*/
+
+# Load the relational-algebra (RA) builtin modules.
+source("./scripts/builtin/raSelection.dml") as raSel
+source("./scripts/builtin/raJoin.dml") as raJoin
+source("./scripts/builtin/raGroupby.dml") as raGrp
+
+# Set input parameters.
+input_dir = ifdef($input_dir, "./data");
+print("Loading tables from directory: " + input_dir);
+
+# Read and load input CSV files.
+lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+cust_csv = read(input_dir + "/customer.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
+
+general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }";
+
+# -- Data preparation --
+
+# Extract only the necessary columns from the tables.
+# Extracted: COL-3 | COL-5 | COL-6 | COL-13
+# => LO_CUSTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE
+lineorder_csv_min = cbind(lineorder_csv[, 3], lineorder_csv[, 5], lineorder_csv[, 6], lineorder_csv[, 13]);
+lineorder_matrix_min = as.matrix(lineorder_csv_min);
+
+# -- Filter tables over string values. --
+
+# Extracted: COL-1 | COL-5 | COL-7
+# D_DATEKEY | D_YEAR | D_YEARMONTH
+# (only need D_DATEKEY & D_YEAR, filter by D_YEARMONTH string)
+# Build filtered DATE table (D_YEARMONTH = 'Dec1997')
+d_filt_keys = matrix(0, rows=0, cols=1);
+d_filt_year = matrix(0, rows=0, cols=1);
+for (i in 1:nrow(date_csv)) {
+    if (as.scalar(date_csv[i,7]) == "Dec1997") {
+        key_val = as.double(as.scalar(date_csv[i,1]));
+        year_val = as.double(as.scalar(date_csv[i,5]));
+        d_filt_keys = rbind(d_filt_keys, matrix(key_val, rows=1, cols=1));
+        d_filt_year = rbind(d_filt_year, matrix(year_val, rows=1, cols=1));
+    }
+}
+if (nrow(d_filt_keys) == 0) {
+    d_filt_keys = matrix(0, rows=1, cols=1);
+    d_filt_year = matrix(0, rows=1, cols=1);
+}
+d_filt = cbind(d_filt_keys, d_filt_year);
+
+# Prepare SUPPLIER table on-the-fly encodings
+# Extracted: COL-1 | COL-4
+# S_SUPPKEY | S_CITY
+# (only need S_CITY encoding, filter by the S_CITY string itself)
+[supp_city_enc_f, supp_city_meta] = transformencode(target=supp_csv[,4], spec=general_spec);
+
+# Build filtered SUPPLIER table (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5')
+supp_filt_keys = matrix(0, rows=0, cols=1);
+supp_filt_city = matrix(0, rows=0, cols=1);
+for (i in 1:nrow(supp_csv)) {
+    s_elem = as.scalar(supp_csv[i,4]);
+    if (s_elem == "UNITED KI1" | s_elem == "UNITED KI5") {
+        key_val = as.double(as.scalar(supp_csv[i,1]));
+        city_code = as.double(as.scalar(supp_city_enc_f[i,1]));
+        supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1));
+        supp_filt_city = rbind(supp_filt_city, matrix(city_code, rows=1, cols=1));
+    }
+}
+if (nrow(supp_filt_keys) == 0) {
+    supp_filt_keys = matrix(0, rows=1, cols=1);
+    supp_filt_city = matrix(0, rows=1, cols=1);
+}
+supp_filt = cbind(supp_filt_keys, supp_filt_city);
+
+# Prepare CUSTOMER table on-the-fly encodings
+# Extracted: COL-1 | COL-4
+# C_CUSTKEY | C_CITY
+# (only need C_CITY encoding, filter by the C_CITY string itself)
+[cust_city_enc_f, cust_city_meta] = transformencode(target=cust_csv[,4], spec=general_spec);
+
+# Build filtered CUSTOMER table (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5')
+cust_filt_keys = matrix(0, rows=0, cols=1);
+cust_filt_city = matrix(0, rows=0, cols=1);
+for (i in 1:nrow(cust_csv)) {
+    c_elem = as.scalar(cust_csv[i,4]);
+    if (c_elem == "UNITED KI1" | c_elem == "UNITED KI5") {
+        key_val = as.double(as.scalar(cust_csv[i,1]));
+        city_code = as.double(as.scalar(cust_city_enc_f[i,1]));
+        cust_filt_keys = rbind(cust_filt_keys, matrix(key_val, rows=1, cols=1));
+        cust_filt_city = rbind(cust_filt_city, matrix(city_code, rows=1, cols=1));
+    }
+}
+if (nrow(cust_filt_keys) == 0) {
+    cust_filt_keys = matrix(0, rows=1, cols=1);
+    cust_filt_city = matrix(0, rows=1, cols=1);
+}
+cust_filt = cbind(cust_filt_keys, cust_filt_city);
+
+#print("LO,DATE,CUST,SUPP")
+#print(toString(lineorder_matrix_min[1,]))
+#print(toString(d_filt[1,]))
+#print(toString(cust_filt[1,]))
+#print(toString(supp_filt[1,]))
+
+# -- JOIN TABLES WITH RA-JOIN FUNCTION --
+
+# Join LINEORDER table with CUST, SUPPLIER, DATE tables (star schema)
+# Join order does matter!
+# WHERE LO_CUSTKEY = C_CUSTKEY
+lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2");
+# WHERE LO_SUPPKEY = S_SUPPKEY
+lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=4, method="hash2");
+# WHERE LO_ORDERDATE = D_DATEKEY
+# (D_DATEKEY | D_YEAR) | (S_SUPPKEY | S_CITY | C_CUSTKEY | C_CITY |
+# LO_CUSTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE)
+joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_cust_supp, colB=7, method="hash2");
+#print(toString(joined_matrix[1,]));
+
+if(nrow(joined_matrix[,1]) == 0){
+    print("c_city | s_city | d_year | REVENUE");
+    print("The result table has 0 rows.");
+}
+else{
+
+# -- Group-By and Aggregation (SUM) --
+
+# Group-By
+d_year = joined_matrix[,2];
+s_city = joined_matrix[,4];
+c_city = joined_matrix[,6];
+revenue = joined_matrix[,10];
+
+# CALCULATING COMBINATION KEY WITH PRIORITY: 1 C_CITY, 2 S_CITY, 3 D_YEAR
+max_c_city = max(c_city);
+max_s_city = max(s_city);
+max_d_year = max(d_year);
+
+c_city_scale_f = ceil(max_c_city) + 1;
+s_city_scale_f = ceil(max_s_city) + 1;
+d_year_scale_f = ceil(max_d_year) + 1;
+
+combined_key = c_city * s_city_scale_f * d_year_scale_f + s_city * d_year_scale_f + d_year;
+
+group_input = cbind(revenue, combined_key);
+
+agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop");
+#print(toString(agg_result[1,]));
+
+# Aggregation (SUM)
+key = agg_result[, 1];
+revenue = rowSums(agg_result[, 2:ncol(agg_result)]);
+
+# EXTRACTING C_CITY, S_CITY, D_YEAR
+c_city = round(floor(key / (s_city_scale_f * d_year_scale_f)));
+s_city = round(floor((key %% (s_city_scale_f * d_year_scale_f)) / d_year_scale_f));
+d_year = round(key %% d_year_scale_f);
+
+result = cbind(c_city, s_city, d_year, revenue, key);
+
+# -- Sorting -- (order() works on numeric columns only; strings do not sort directly)
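+# (The two-column ORDER BY is emulated with two passes: first sort by
+# REVENUE descending, then re-sort by D_YEAR ascending. This reproduces
+# "ORDER BY d_year ASC, REVENUE DESC" only if order() keeps the relative
+# order of rows with equal d_year, i.e. it relies on a stable sort.)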
+# ORDER BY D_YEAR ASC, REVENUE DESC
+result_ordered = order(target=result, by=4, decreasing=TRUE, index.return=FALSE);
+result_ordered = order(target=result_ordered, by=3, decreasing=FALSE, index.return=FALSE);
+
+c_city_dec = transformdecode(target=result_ordered[,1], spec=general_spec, meta=cust_city_meta);
+s_city_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_city_meta);
+
+res = cbind(c_city_dec, s_city_dec, as.frame(result_ordered[,3]), as.frame(result_ordered[,4]));
+
+# Print result
+print("c_city | s_city | d_year | REVENUE");
+print(res);
+
+print("\nQ3.4 finished.\n");
+}
\ No newline at end of file
diff --git a/scripts/ssb/queries/q4_1.dml b/scripts/ssb/queries/q4_1.dml
index ac05ebdf6f9..1ecfc586ab7 100644
--- a/scripts/ssb/queries/q4_1.dml
+++ b/scripts/ssb/queries/q4_1.dml
@@ -92,7 +92,7 @@ if (nrow(part_filt) == 0) {
 
 # Extracted: COL-1 | COL-6
 # S_SUPPKEY | S_REGION
-# Build filtered SUPPLIER table (S_NATION = 'AMERICA')
+# Build filtered SUPPLIER table (S_REGION = 'AMERICA')
 supp_filt = matrix(0, rows=0, cols=1);
 for (i in 1:nrow(supp_csv)) {
   if (as.scalar(supp_csv[i,6]) == "AMERICA") {
@@ -110,7 +110,7 @@ if (nrow(supp_filt) == 0) {
 # (only need C_NATION encoding, filter by C_REGION string)
 [cust_nat_enc_f, cust_nat_meta] = transformencode(target=cust_csv[,5], spec=general_spec);
 
-# Build filtered CUSTOMER table (C_NATION = 'AMERICA')
+# Build filtered CUSTOMER table (C_REGION = 'AMERICA')
 cust_filt_keys = matrix(0, rows=0, cols=1);
 cust_filt_nat = matrix(0, rows=0, cols=1);
 for (i in 1:nrow(cust_csv)) {
diff --git a/scripts/ssb/queries/q4_3.dml b/scripts/ssb/queries/q4_3.dml
index 605f153893a..583f0db901f 100644
--- a/scripts/ssb/queries/q4_3.dml
+++ b/scripts/ssb/queries/q4_3.dml
@@ -167,14 +167,13 @@ lo_supplycost = joined_matrix[,13];
 profit = lo_revenue - lo_supplycost;
 
 # CALCULATING COMBINATION KEY WITH PRIORITY: 1 D_YEAR, 2 S_CITY, 3 P_BRAND
-
-max_p_brand = max(p_brand);
-max_s_city = max(s_city);
 max_d_year = max(d_year);
+max_s_city = max(s_city);
+max_p_brand = max(p_brand);
 
-p_brand_scale_f = ceil(max_p_brand) + 1;
-s_city_scale_f = ceil(max_s_city) + 1;
 d_year_scale_f = ceil(max_d_year) + 1;
+s_city_scale_f = ceil(max_s_city) + 1;
+p_brand_scale_f = ceil(max_p_brand) + 1;
 
 combined_key = d_year * s_city_scale_f * p_brand_scale_f + s_city * p_brand_scale_f + p_brand;
 
diff --git a/scripts/ssb/sql/q3.2.sql b/scripts/ssb/sql/q3.2.sql
index 2969efb1a2f..c961d612e43 100644
--- a/scripts/ssb/sql/q3.2.sql
+++ b/scripts/ssb/sql/q3.2.sql
@@ -19,7 +19,7 @@ SELECT
     s_city,
     d_year,
     SUM(lo_revenue) AS REVENUE
-FROM customer, lineorder, supplier, dates
+FROM customer, lineorder, supplier, date -- dates
 WHERE
     lo_custkey = c_custkey
     AND lo_suppkey = s_suppkey

From aa3f534db5ee8eb82d720eceddba16c6eabbfef4 Mon Sep 17 00:00:00 2001
From: Johnn-ui2010
Date: Mon, 8 Dec 2025 00:44:53 +0100
Subject: [PATCH 11/22] Handle empty intermediate tables in all queries.
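
Every query now short-circuits once a selection, filter loop, or join
produces an empty result, instead of indexing into a 0-row matrix. A
minimal sketch of the guard pattern in DML (X and d_filt are illustrative;
it assumes, as the scripts do, that m_raSelection returns a single
all-zero row when no tuple qualifies):

    source("./scripts/builtin/raSelection.dml") as raSel

    hasRows = 1;
    X = matrix("19930101 1993 19940101 1994", rows=2, cols=2);
    d_filt = raSel::m_raSelection(X, col=2, op="==", val=1993);
    if (as.scalar(d_filt[1,1]) == 0) {
      hasRows = 0;  # all-zero placeholder row => empty selection
    }
    if (hasRows) {
      print("rows survived, continue with joins and group-by");
    }
    else {
      print("The result table has 0 rows.");
    }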
--- scripts/ssb/queries/q1_1.dml | 68 ++++++++---- scripts/ssb/queries/q1_2.dml | 83 +++++++++----- scripts/ssb/queries/q1_3.dml | 75 ++++++++----- scripts/ssb/queries/q2_1.dml | 124 +++++++++++++-------- scripts/ssb/queries/q2_2.dml | 127 ++++++++++++++-------- scripts/ssb/queries/q2_3.dml | 125 +++++++++++++-------- scripts/ssb/queries/q3_1.dml | 171 +++++++++++++++++------------ scripts/ssb/queries/q3_2.dml | 175 ++++++++++++++++++------------ scripts/ssb/queries/q3_3.dml | 179 ++++++++++++++++++------------ scripts/ssb/queries/q3_4.dml | 189 ++++++++++++++++++-------------- scripts/ssb/queries/q4_1.dml | 156 +++++++++++++++++---------- scripts/ssb/queries/q4_2.dml | 204 +++++++++++++++++++++-------------- scripts/ssb/queries/q4_3.dml | 177 ++++++++++++++++++------------ 13 files changed, 1155 insertions(+), 698 deletions(-) diff --git a/scripts/ssb/queries/q1_1.dml b/scripts/ssb/queries/q1_1.dml index 3c0a87839fa..8a55acef472 100644 --- a/scripts/ssb/queries/q1_1.dml +++ b/scripts/ssb/queries/q1_1.dml @@ -37,6 +37,9 @@ print("Loading tables from directory: " + input_dir); date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +# General variables. +hasRows = 1; # If hasRows = 0, the result table is empty. + # -- Data preparation -- # Extract only the necessary columns from date and lineorder table. @@ -54,38 +57,61 @@ lineorder_matrix_min = as.matrix(lineorder_csv_min); # D_YEAR = 1993 d_year_filt = raSel::m_raSelection(date_matrix_min, col=2, op="==", val=1993); - +if( as.scalar(d_year_filt[1,1]) == 0){ + hasRows = 0; +} # LO_QUANTITY < 25 -lo_filt = raSel::m_raSelection(lineorder_matrix_min, col=2, op="<", val=25); - +if(hasRows){ + lo_filt = raSel::m_raSelection(lineorder_matrix_min, col=2, op="<", val=25); + if( as.scalar(lo_filt[1,1]) == 0){ + hasRows = 0; + } +} # LO_DISCOUNT BETWEEN 1 AND 3 -lo_filt = raSel::m_raSelection(lo_filt, col=4, op=">=", val=1); -lo_filt = raSel::m_raSelection(lo_filt, col=4, op="<=", val=3); - -# Minimize LO TABLE -# => LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT -lo_filt = cbind(lo_filt[, 1], lo_filt[, 3], lo_filt[, 4]); +if(hasRows){ + lo_filt = raSel::m_raSelection(lo_filt, col=4, op=">=", val=1); + lo_filt = raSel::m_raSelection(lo_filt, col=4, op="<=", val=3); + if( as.scalar(lo_filt[1,1]) == 0){ + hasRows = 0; + } + else{ + # Minimize LO TABLE + # => LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT + lo_filt = cbind(lo_filt[, 1], lo_filt[, 3], lo_filt[, 4]); + } +} # -- Join -- # Join LINEORDER and DATE tables with RA-JOIN function +joined_matrix = matrix(0, rows=0, cols=1); # WHERE LO_ORDERDATE = D_DATEKEY - # => (D-KEY | D-YEAR) | (LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT) -joined_matrix = raJoin::m_raJoin(A=d_year_filt, colA=1, B=lo_filt, colB=1, method="hash2"); -print("LO-DATE JOINED.\n"); - +if(hasRows){ + joined_matrix = raJoin::m_raJoin(A=d_year_filt, colA=1, B=lo_filt, colB=1, method="hash2"); + if(nrow(joined_matrix[,1]) == 0){ + hasRows = 0; + } +} # Print the first row. 
#print(toString(joined_matrix[1,])) # -- Aggregation (SUM)-- -# SUM(lo_extendedprice * lo_discount) AS REVENUE -# Use the joined_matrix with LO_EXTPRICE (COL-4), LO_DISCOUNT (COL-5) -lo_extprice = joined_matrix[, 4]; -lo_disc = joined_matrix[, 5]; -revenue = sum(lo_extprice * lo_disc); +if(hasRows){ + # SUM(lo_extendedprice * lo_discount) AS REVENUE + # Use the joined_matrix with LO_EXTPRICE (COL-4), LO_DISCOUNT (COL-5) + lo_extprice = joined_matrix[, 4]; + lo_disc = joined_matrix[, 5]; + revenue = sum(lo_extprice * lo_disc); + + print("REVENUE") + print(as.integer(revenue)); -print("REVENUE") -print(as.integer(revenue)); + print("\nQ1.1 finished.\n"); +} +else{ + print("REVENUE") + print("The result table has 0 rows.") + print("\nQ1.1 finished.\n"); +} -print("\nQ1.1 finished.\n"); diff --git a/scripts/ssb/queries/q1_2.dml b/scripts/ssb/queries/q1_2.dml index f4a8ce0d212..3bf9c533579 100644 --- a/scripts/ssb/queries/q1_2.dml +++ b/scripts/ssb/queries/q1_2.dml @@ -39,6 +39,9 @@ print("Loading tables from directory: " + input_dir); date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +# General variables. +hasRows = 1; # If hasRows = 0, the result table is empty. + # -- Data preparation -- # Extract only the necessary columns from date and lineorder table. @@ -49,53 +52,75 @@ lineorder_csv_min = cbind(lineorder_csv[, 6], lineorder_csv[, 9], lineorder_csv[ lineorder_matrix_min = as.matrix(lineorder_csv_min); # -- Filter the data with RA-SELECTION function. + # LO_DISCOUNT BETWEEN 4 AND 6 lo_filt = raSel::m_raSelection(lineorder_matrix_min, col=4, op=">=", val=4); lo_filt = raSel::m_raSelection(lo_filt, col=4, op="<=", val=6); - +if( as.scalar(lo_filt[1,1]) == 0){ + hasRows = 0; +} # LO_QUANTITY BETWEEN 26 AND 35 -lo_filt = raSel::m_raSelection(lo_filt, col=2, op=">=", val=26); -lo_filt = raSel::m_raSelection(lo_filt, col=2, op="<=", val=35); - -# Minimize LO TABLE -# => LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT -lo_filt = cbind(lo_filt[, 1], lo_filt[, 3], lo_filt[, 4]); +if(hasRows){ + lo_filt = raSel::m_raSelection(lo_filt, col=2, op=">=", val=26); + lo_filt = raSel::m_raSelection(lo_filt, col=2, op="<=", val=35); + if( as.scalar(lo_filt[1,1]) == 0){ + hasRows = 0; + } + else{ + # Minimize LO TABLE + # => LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT + lo_filt = cbind(lo_filt[, 1], lo_filt[, 3], lo_filt[, 4]); + } +} -# -- Filter tables over string values. +# -- Filter table over string values. 
 # Extracted: COL-1 | COL-7
 # D_DATEKEY | D_YEARMONTH
-# Build filtered DATE table (D_YEARMONTH = 'Jan1994')
-date_filt = matrix(0, rows=0, cols=1);
-for (i in 1:nrow(date_csv)) {
-  if (as.scalar(date_csv[i,7]) == "Jan1994") {
-    key_val = as.double(as.scalar(date_csv[i,1]));
-    date_filt = rbind(date_filt, matrix(key_val, rows=1, cols=1));
+d_filt = matrix(0, rows=0, cols=1);
+if(hasRows){
+  # Build filtered DATE table (D_YEARMONTH = 'Jan1994')
+  for (i in 1:nrow(date_csv)) {
+    if (as.scalar(date_csv[i,7]) == "Jan1994") {
+      key_val = as.double(as.scalar(date_csv[i,1]));
+      d_filt = rbind(d_filt, matrix(key_val, rows=1, cols=1));
+    }
+  }
+  if (nrow(d_filt) == 0) {
+    hasRows = 0;
   }
-}
-if (nrow(date_filt) == 0) {
-  date_filt = matrix(0, rows=1, cols=1);
 }
+
 # -- Join --
 # Join LINEORDER and DATE tables with RA-JOIN function
+joined_matrix = matrix(0, rows=0, cols=1);
 # WHERE LO_ORDERDATE = D_DATEKEY
-
 # => (D_DATEKEY) | (LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT)
-joined_matrix = raJoin::m_raJoin(A=date_filt, colA=1, B=lo_filt, colB=1, method="hash2");
-print("LO-DATE JOINED.\n");
+if(hasRows){
+  joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_filt, colB=1, method="hash2");
+  if(nrow(joined_matrix[,1]) == 0){
+    hasRows = 0;
+  }
+}
 # Print the first row.
 #print(toString(joined_matrix[1,]))
 
 # -- Aggregation (SUM)--
+if(hasRows){
+  # SUM(lo_extendedprice * lo_discount) AS REVENUE
+  # Use the joined_matrix with LO_EXTPRICE (COL-3), LO_DISCOUNT (COL-4)
+  lo_extprice = joined_matrix[, 3];
+  lo_disc = joined_matrix[, 4];
+  revenue = sum(lo_extprice * lo_disc);
 
-# SUM(lo_extendedprice * lo_discount) AS REVENUE
-# Use the joined_matrix with LO_EXTPRICE (COL-4), LO_DISCOUNT (COL-5)
-lo_extprice = joined_matrix[, 3];
-lo_disc = joined_matrix[, 4];
-revenue = sum(lo_extprice * lo_disc);
+  print("REVENUE")
+  print(as.integer(revenue));
 
-print("REVENUE")
-print(as.integer(revenue));
+  print("\nQ1.2 finished.\n");
+}
+else{
+  print("REVENUE")
+  print("The result table has 0 rows.")
+  print("\nQ1.2 finished.\n");
+}
\ No newline at end of file
diff --git a/scripts/ssb/queries/q1_3.dml b/scripts/ssb/queries/q1_3.dml
index b5b45aa0cb4..c4cf18e2a97 100644
--- a/scripts/ssb/queries/q1_3.dml
+++ b/scripts/ssb/queries/q1_3.dml
@@ -39,6 +39,9 @@ print("Loading tables from directory: " + input_dir);
 date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
 lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
 
+# General variables.
+hasRows = 1; # If hasRows = 0, the result table is empty.
+
 # -- Data preparation --
 
 # Extract only the necessary columns from date and lineorder table.
@@ -58,39 +61,61 @@ lineorder_matrix_min = as.matrix(lineorder_csv_min); d_filt = raSel::m_raSelection(date_matrix_min, col=2, op="==", val=1994); # WHERE D_WEEKNUMINYEAR = 6 d_filt = raSel::m_raSelection(d_filt, col=3, op="==", val=6); - +if( as.scalar(d_filt[1,1]) == 0){ + hasRows = 0; +} # WHERE LO_DISCOUNT BETWEEN 5 AND 7 -lo_filt = raSel::m_raSelection(lineorder_matrix_min, col=4, op=">=", val=5); -lo_filt = raSel::m_raSelection(lo_filt, col=4, op="<=", val=7); - +if(hasRows){ + lo_filt = raSel::m_raSelection(lineorder_matrix_min, col=4, op=">=", val=5); + lo_filt = raSel::m_raSelection(lo_filt, col=4, op="<=", val=7); + if( as.scalar(lo_filt[1,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_QUANTITY BETWEEN 26 AND 35 -lo_filt = raSel::m_raSelection(lo_filt, col=2, op=">=", val=26); -lo_filt = raSel::m_raSelection(lo_filt, col=2, op="<=", val=35); - -# Minimize LO TABLE -# => LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT -lo_filt = cbind(lo_filt[, 1], lo_filt[, 3], lo_filt[, 4]); +if(hasRows){ + lo_filt = raSel::m_raSelection(lo_filt, col=2, op=">=", val=26); + lo_filt = raSel::m_raSelection(lo_filt, col=2, op="<=", val=35); + if( as.scalar(lo_filt[1,1]) == 0){ + hasRows = 0; + } + else{ + # Minimize LO TABLE + # => LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT + lo_filt = cbind(lo_filt[, 1], lo_filt[, 3], lo_filt[, 4]); + } +} +#print(toString(lo_filt[1,])) # -- Join -- # Join LINEORDER and DATE tables with RA-JOIN function +joined_matrix = matrix(0, rows=0, cols=1); # WHERE LO_ORDERDATE = D_DATEKEY # Print the first row. -#print(toString(lo_filt[1,])) - # => (D_DATEKEY | D_YEAR | D_WEEKNUMINYEAR) | (LO_ORDERDATE | LO_EXTPRICE | LO_DISCOUNT) -joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_filt, colB=1, method="hash2"); -print("LO-DATE JOINED.\n"); +if(hasRows){ + joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_filt, colB=1, method="hash2"); + if(nrow(joined_matrix[,1]) == 0){ + hasRows = 0; + } +} #print(toString(joined_matrix[1,])) # -- Aggregation (SUM)-- - -# SUM(lo_extendedprice * lo_discount) AS REVENUE -# Use the joined_matrix with LO_EXTPRICE (COL-5), LO_DISCOUNT (COL-6) -lo_extprice = joined_matrix[, 5]; -lo_disc = joined_matrix[, 6]; -revenue = sum(lo_extprice * lo_disc); - -print("REVENUE") -print(as.integer(revenue)); - -print("\nQ1.3 finished.\n"); +if(hasRows){ + # SUM(lo_extendedprice * lo_discount) AS REVENUE + # Use the joined_matrix with LO_EXTPRICE (COL-5), LO_DISCOUNT (COL-6) + lo_extprice = joined_matrix[, 5]; + lo_disc = joined_matrix[, 6]; + revenue = sum(lo_extprice * lo_disc); + + print("REVENUE") + print(as.integer(revenue)); + + print("\nQ1.3 finished.\n"); +} +else{ + print("REVENUE") + print("The result table has 0 rows.") + print("\nQ1.3 finished.\n"); +} \ No newline at end of file diff --git a/scripts/ssb/queries/q2_1.dml b/scripts/ssb/queries/q2_1.dml index fd2340adfec..3e1c73730c7 100644 --- a/scripts/ssb/queries/q2_1.dml +++ b/scripts/ssb/queries/q2_1.dml @@ -46,7 +46,9 @@ date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header part_csv = read(input_dir + "/part.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +# General variables. general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; +hasRows = 1; # If hasRows = 0, the result table is empty. 
# -- Data preparation -- @@ -70,9 +72,11 @@ date_matrix_min = as.matrix(date_csv_min); [part_brand_enc_f, part_brand_meta] = transformencode(target=part_csv[,5], spec=general_spec); #print(toString(part_brand_enc_f)); -# Build filtered PART table (P_CATEGORY = 'MFGR#12'), keeping key and encoded brand part_filt_keys = matrix(0, rows=0, cols=1); part_filt_brand = matrix(0, rows=0, cols=1); +part_filt = matrix(0, rows=0, cols=1); + +# Build filtered PART table (P_CATEGORY = 'MFGR#12'), keeping key and encoded brand for (i in 1:nrow(part_csv)) { if (as.scalar(part_csv[i,4]) == "MFGR#12") { key_val = as.double(as.scalar(part_csv[i,1])); @@ -82,23 +86,27 @@ for (i in 1:nrow(part_csv)) { } } if (nrow(part_filt_keys) == 0) { - part_filt_keys = matrix(0, rows=1, cols=1); - part_filt_brand = matrix(0, rows=1, cols=1); + hasRows = 0; +} +else{ + part_filt = cbind(part_filt_keys, part_filt_brand); } -part_filt = cbind(part_filt_keys, part_filt_brand); # Extracted: COL-1 | COL-6 # S_SUPPKEY | S_REGION -# Build filtered SUPPLIER table (S_REGION = 'AMERICA') supp_filt = matrix(0, rows=0, cols=1); -for (i in 1:nrow(supp_csv)) { - if (as.scalar(supp_csv[i,6]) == "AMERICA") { - key_val = as.double(as.scalar(supp_csv[i,1])); - supp_filt = rbind(supp_filt, matrix(key_val, rows=1, cols=1)); + +if(hasRows){ + # Build filtered SUPPLIER table (S_REGION = 'AMERICA') + for (i in 1:nrow(supp_csv)) { + if (as.scalar(supp_csv[i,6]) == "AMERICA") { + key_val = as.double(as.scalar(supp_csv[i,1])); + supp_filt = rbind(supp_filt, matrix(key_val, rows=1, cols=1)); + } + } + if (nrow(supp_filt) == 0) { + hasRows = 0; } -} -if (nrow(supp_filt) == 0) { - supp_filt = matrix(0, rows=1, cols=1); } #print(toString(supp_filt[1,])) @@ -106,57 +114,85 @@ if (nrow(supp_filt) == 0) { # Join LINEORDER table with PART, SUPPLIER, DATE tables (star schema) # Join order does matter! +lo_part = matrix(0, rows=0, cols=1); +lo_part_supp = matrix(0, rows=0, cols=1); +joined_matrix = matrix(0, rows=0, cols=1); # LINEORDER table with DATE, PART, SUPPLIER is much slower! 
# WHERE LO_PARTKEY = P_PARTKEY -lo_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); +if(hasRows){ + lo_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); + if(nrow(lo_part[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_SUPPKEY = S_SUPPKEY -lo_part_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_part, colB=4, method="hash2"); +if(hasRows){ + lo_part_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_part, colB=4, method="hash2"); + if(nrow(lo_part_supp[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_ORDERDATE = D_DATEKEY # (D_DATEKEY | D_YEAR) | (S_SUPPKEY | P_PARTKEY | P_BRAND | LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE) # Example: # 19920325.000 1992.000 17.000 608.000 381.000 608.000 17.000 19920325.000 5702508.000 -lo_part_supp_date = raJoin::m_raJoin(A=date_matrix_min, colA=1, B=lo_part_supp, colB=6, method="hash2"); -#print(toString(lo_part_supp_date[1,])) +if(hasRows){ + joined_matrix = raJoin::m_raJoin(A=date_matrix_min, colA=1, B=lo_part_supp, colB=6, method="hash2"); + if(nrow(joined_matrix[,1]) == 0){ + hasRows = 0; + } +} +#print(toString(joined_matrix[1,])) # -- Group-By and Aggregation (SUM)-- -# Group-By -d_year = lo_part_supp_date[,2] -p_brand = lo_part_supp_date[,5] -lo_revenue = lo_part_supp_date[,9] +if(hasRows){ + # Group-By + d_year = joined_matrix[,2] + p_brand = joined_matrix[,5] + lo_revenue = joined_matrix[,9] -# CALCULATING COMBINATION KEY D_YEAR, P_BRAND + # CALCULATING COMBINATION KEY D_YEAR, P_BRAND -max_p_brand = max(p_brand); -max_d_year = max(d_year); + max_p_brand = max(p_brand); + max_d_year = max(d_year); -p_brand_scale_f = ceil(max_p_brand) + 1; -d_year_scale_f = ceil(max_d_year) + 1; + p_brand_scale_f = ceil(max_p_brand) + 1; + d_year_scale_f = ceil(max_d_year) + 1; -combined_key = d_year * p_brand_scale_f + p_brand; + combined_key = d_year * p_brand_scale_f + p_brand; -group_input = cbind(lo_revenue, combined_key) + group_input = cbind(lo_revenue, combined_key) -agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); + agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); -# Aggregation (SUM) -key = agg_result[, 1]; -revenue = rowSums(agg_result[, 2:ncol(agg_result)]); + # Aggregation (SUM) + key = agg_result[, 1]; + revenue = rowSums(agg_result[, 2:ncol(agg_result)]); -# EXTRACTING D_YEAR, P_BRAND -d_year = round(floor(key / (p_brand_scale_f))); -p_brand = round(key %% p_brand_scale_f); -result = cbind(revenue, d_year, p_brand, key); + # EXTRACTING D_YEAR, P_BRAND + d_year = round(floor(key / (p_brand_scale_f))); + p_brand = round(key %% p_brand_scale_f); + result = cbind(revenue, d_year, p_brand, key); -# -- Sorting -- -- Sorting int columns works, but string does not. -# ORDER BY P_BRAND ASC -result_ordered = order(target=result, by=3, decreasing=FALSE, index.return=FALSE); + # -- Sorting -- -- Sorting int columns works, but string does not. 
+ # ORDER BY P_BRAND ASC + result_ordered = order(target=result, by=3, decreasing=FALSE, index.return=FALSE); -p_brand_dec = transformdecode(target=result_ordered[,3], spec=general_spec, meta=part_brand_meta); -res = cbind(as.frame(result_ordered[,1]), as.frame(result_ordered[,2]), p_brand_dec) ; + p_brand_dec = transformdecode(target=result_ordered[,3], spec=general_spec, meta=part_brand_meta); + res = cbind(as.frame(result_ordered[,1]), as.frame(result_ordered[,2]), p_brand_dec) ; -# Print result -print("SUM(lo_revenue) | d_year | p_brand") -print(res) + # Print result + print("SUM(lo_revenue) | d_year | p_brand") + print(res) -print("\nQ2.1 finished.\n"); + print("\nQ2.1 finished.\n"); +} +else{ + # If the result table has 0 rows, skip group-by and aggregation. + # Print result + print("SUM(lo_revenue) | d_year | p_brand") + print("The result table has 0 rows.") + + print("\nQ2.1 finished.\n"); +} \ No newline at end of file diff --git a/scripts/ssb/queries/q2_2.dml b/scripts/ssb/queries/q2_2.dml index ab041323386..05981cf7370 100644 --- a/scripts/ssb/queries/q2_2.dml +++ b/scripts/ssb/queries/q2_2.dml @@ -46,7 +46,9 @@ date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header part_csv = read(input_dir + "/part.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +# General variables. general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; +hasRows = 1; # If hasRows = 0, the result table is empty. # -- Data preparation -- @@ -70,9 +72,11 @@ date_matrix_min = as.matrix(date_csv_min); [part_brand_enc_f, part_brand_meta] = transformencode(target=part_csv[,5], spec=general_spec); #print(toString(part_brand_enc_f)); -# Build filtered PART table (P_BRAND BETWEEN 'MFGR#2221' AND 'MFGR#2228'), keeping key and encoded brand part_filt_keys = matrix(0, rows=0, cols=1); part_filt_brand = matrix(0, rows=0, cols=1); +part_filt = matrix(0, rows=0, cols=1); + +# Build filtered PART table (P_BRAND BETWEEN 'MFGR#2221' AND 'MFGR#2228'), keeping key and encoded brand for (i in 1:nrow(part_csv)) { p_elem = as.scalar(part_csv[i,5]) if ( p_elem >= "MFGR#2221" & p_elem <= "MFGR#2228") { @@ -83,24 +87,28 @@ for (i in 1:nrow(part_csv)) { } } if (nrow(part_filt_keys) == 0) { - part_filt_keys = matrix(0, rows=1, cols=1); - part_filt_brand = matrix(0, rows=1, cols=1); + hasRows = 0; +} +else{ + part_filt = cbind(part_filt_keys, part_filt_brand); } -part_filt = cbind(part_filt_keys, part_filt_brand); # Extracted: COL-1 | COL-6 # S_SUPPKEY | S_REGION -# Build filtered SUPPLIER table (S_REGION = 'ASIA') supp_filt = matrix(0, rows=0, cols=1); -for (i in 1:nrow(supp_csv)) { - if (as.scalar(supp_csv[i,6]) == "ASIA") { - key_val = as.double(as.scalar(supp_csv[i,1])); - supp_filt = rbind(supp_filt, matrix(key_val, rows=1, cols=1)); +if(hasRows){ + # Build filtered SUPPLIER table (S_REGION = 'ASIA') + for (i in 1:nrow(supp_csv)) { + if (as.scalar(supp_csv[i,6]) == "ASIA") { + key_val = as.double(as.scalar(supp_csv[i,1])); + supp_filt = rbind(supp_filt, matrix(key_val, rows=1, cols=1)); + } + } + if (nrow(supp_filt) == 0) { + hasRows = 0; } } -if (nrow(supp_filt) == 0) { - supp_filt = matrix(0, rows=1, cols=1); -} + #print("LO,DATE,PART,SUPP") #print(toString(lineorder_matrix_min[1,])) #print(toString(date_matrix_min[1,])) @@ -110,58 +118,85 @@ if (nrow(supp_filt) == 0) { # -- JOIN TABLES WITH RA-JOIN FUNCTION -- # Join LINEORDER table with PART, SUPPLIER, DATE tables (star schema) -# Join 
order does matter! +# Join order does matter! +lo_part = matrix(0, rows=0, cols=1); +lo_part_supp = matrix(0, rows=0, cols=1); +joined_matrix = matrix(0, rows=0, cols=1); # WHERE LO_PARTKEY = P_PARTKEY -lo_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); +if(hasRows){ + lo_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); + if(nrow(lo_part[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_SUPPKEY = S_SUPPKEY -lo_part_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_part, colB=4, method="hash2"); +if(hasRows){ + lo_part_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_part, colB=4, method="hash2"); + if(nrow(lo_part_supp[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_ORDERDATE = D_DATEKEY # (D_DATEKEY | D_YEAR) | (S_SUPPKEY | P_PARTKEY | P_BRAND | LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE) - -lo_part_supp_date = raJoin::m_raJoin(A=date_matrix_min, colA=1, B=lo_part_supp, colB=6, method="hash2"); -#print(toString(lo_part_supp_date[1,])) +if(hasRows){ + joined_matrix = raJoin::m_raJoin(A=date_matrix_min, colA=1, B=lo_part_supp, colB=6, method="hash2"); + if(nrow(joined_matrix[,1]) == 0){ + hasRows = 0; + } +} +#print(toString(joined_matrix[1,])) # -- GROUP-BY & AGGREGATION -- # -- Group-By and Aggregation (SUM)-- -# Group-By -d_year = lo_part_supp_date[,2] -p_brand = lo_part_supp_date[,5] -lo_revenue = lo_part_supp_date[,9] +if(hasRows){ + # Group-By + d_year = joined_matrix[,2] + p_brand = joined_matrix[,5] + lo_revenue = joined_matrix[,9] -# CALCULATING COMBINATION KEY WITH PRIORITY:1 D_YEAR, 2 P_BRAND + # CALCULATING COMBINATION KEY WITH PRIORITY:1 D_YEAR, 2 P_BRAND -max_p_brand = max(p_brand); -max_d_year = max(d_year); + max_p_brand = max(p_brand); + max_d_year = max(d_year); -p_brand_scale_f = ceil(max_p_brand) + 1; -d_year_scale_f = ceil(max_d_year) + 1; + p_brand_scale_f = ceil(max_p_brand) + 1; + d_year_scale_f = ceil(max_d_year) + 1; -combined_key = d_year * p_brand_scale_f + p_brand; + combined_key = d_year * p_brand_scale_f + p_brand; -group_input = cbind(lo_revenue, combined_key) + group_input = cbind(lo_revenue, combined_key) -agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); + agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); -# Aggregation (SUM) -key = agg_result[, 1]; -revenue = rowSums(agg_result[, 2:ncol(agg_result)]); + # Aggregation (SUM) + key = agg_result[, 1]; + revenue = rowSums(agg_result[, 2:ncol(agg_result)]); -# EXTRACTING D_YEAR, P_BRAND -d_year = round(floor(key / (p_brand_scale_f))); -p_brand = round(key %% p_brand_scale_f); -result = cbind(revenue, d_year, p_brand, key); + # EXTRACTING D_YEAR, P_BRAND + d_year = round(floor(key / (p_brand_scale_f))); + p_brand = round(key %% p_brand_scale_f); + result = cbind(revenue, d_year, p_brand, key); -# -- Sorting -- -- Sorting int columns works, but string does not. -# ORDER BY D_YEAR, P_BRAND ASC -result_ordered = order(target=result, by=4, decreasing=FALSE, index.return=FALSE); + # -- Sorting -- -- Sorting int columns works, but string does not. 
+ # ORDER BY D_YEAR, P_BRAND ASC + result_ordered = order(target=result, by=4, decreasing=FALSE, index.return=FALSE); -p_brand_dec = transformdecode(target=result_ordered[,3], spec=general_spec, meta=part_brand_meta); -res = cbind(as.frame(result_ordered[,1]), as.frame(result_ordered[,2]), p_brand_dec) ; + p_brand_dec = transformdecode(target=result_ordered[,3], spec=general_spec, meta=part_brand_meta); + res = cbind(as.frame(result_ordered[,1]), as.frame(result_ordered[,2]), p_brand_dec) ; -# Print result -print("SUM(lo_revenue) | d_year | p_brand") -print(res) + # Print result + print("SUM(lo_revenue) | d_year | p_brand") + print(res) -print("\nQ2.2 finished.\n"); + print("\nQ2.2 finished.\n"); +} +else{ + # If the result table has 0 rows, skip group-by and aggregation. + # Print result + print("SUM(lo_revenue) | d_year | p_brand") + print("The result table has 0 rows.") + + print("\nQ2.2 finished.\n"); +} \ No newline at end of file diff --git a/scripts/ssb/queries/q2_3.dml b/scripts/ssb/queries/q2_3.dml index a1291485b39..35f08f3ef07 100644 --- a/scripts/ssb/queries/q2_3.dml +++ b/scripts/ssb/queries/q2_3.dml @@ -46,7 +46,9 @@ date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header part_csv = read(input_dir + "/part.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +# General variables. general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; +hasRows = 1; # If hasRows = 0, the result table is empty. # -- Data preparation -- @@ -70,9 +72,11 @@ date_matrix_min = as.matrix(date_csv_min); [part_brand_enc_f, part_brand_meta] = transformencode(target=part_csv[,5], spec=general_spec); #print(toString(part_brand_enc_f)); -# Build filtered PART table (P_BRAND = 'MFGR#2239'), keeping key and encoded brand part_filt_keys = matrix(0, rows=0, cols=1); part_filt_brand = matrix(0, rows=0, cols=1); +part_filt = matrix(0, rows=0, cols=1); + +# Build filtered PART table (P_BRAND = 'MFGR#2239'), keeping key and encoded brand for (i in 1:nrow(part_csv)) { if (as.scalar(part_csv[i,5]) == "MFGR#2239") { key_val = as.double(as.scalar(part_csv[i,1])); @@ -82,23 +86,26 @@ for (i in 1:nrow(part_csv)) { } } if (nrow(part_filt_keys) == 0) { - part_filt_keys = matrix(0, rows=1, cols=1); - part_filt_brand = matrix(0, rows=1, cols=1); + hasRows = 0; +} +else{ + part_filt = cbind(part_filt_keys, part_filt_brand); } -part_filt = cbind(part_filt_keys, part_filt_brand); # Extracted: COL-1 | COL-6 # S_SUPPKEY | S_REGION -# Build filtered SUPPLIER table (s_region = 'EUROPE') supp_filt = matrix(0, rows=0, cols=1); -for (i in 1:nrow(supp_csv)) { - if (as.scalar(supp_csv[i,6]) == "EUROPE") { - key_val = as.double(as.scalar(supp_csv[i,1])); - supp_filt = rbind(supp_filt, matrix(key_val, rows=1, cols=1)); +if(hasRows){ + # Build filtered SUPPLIER table (s_region = 'EUROPE') + for (i in 1:nrow(supp_csv)) { + if (as.scalar(supp_csv[i,6]) == "EUROPE") { + key_val = as.double(as.scalar(supp_csv[i,1])); + supp_filt = rbind(supp_filt, matrix(key_val, rows=1, cols=1)); + } + } + if (nrow(supp_filt) == 0) { + hasRows = 0; } -} -if (nrow(supp_filt) == 0) { - supp_filt = matrix(0, rows=1, cols=1); } #print("LO,DATE,PART,SUPP") #print(toString(lineorder_matrix_min[1,])) @@ -109,56 +116,82 @@ if (nrow(supp_filt) == 0) { # -- JOIN TABLES WITH RA-JOIN FUNCTION -- # Join LINEORDER table with PART, SUPPLIER, DATE tables (star schema) -# Join order does matter! +# Join order does matter! 
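Editorial aside, not part of the patch: the join-order comment above deserves a concrete picture. Each pre-filtered dimension acts as a semi-join filter on the fact table, so joining the most selective dimension first keeps every intermediate result small. A minimal sketch with made-up keys, using only core DML builtins:

```
# How many fact rows survive a join against a filtered dimension?
A = matrix("1 3", rows=2, cols=1);        # filtered dimension keys (small side)
B = matrix("3 4 1 3 7", rows=5, cols=1);  # fact-table foreign keys (large side)
M = outer(B, t(A), "==");                 # nrow(B) x nrow(A) match indicators
keep = rowSums(M) > 0;                    # 1 where a fact row finds a join partner
print("surviving fact rows: " + sum(keep) + " of " + nrow(B));
```

The earlier a join drops rows, the less work the remaining hash joins and the group-by have to do, which is why the PART filter, the most selective one in q2.x, is joined first.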
+lo_part = matrix(0, rows=0, cols=1); +lo_part_supp = matrix(0, rows=0, cols=1); +joined_matrix = matrix(0, rows=0, cols=1); # LINEORDER table with DATE, PART, SUPPLIER is much slower! # WHERE LO_PARTKEY = P_PARTKEY -lo_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); +if(hasRows){ + lo_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); + if(nrow(lo_part[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_SUPPKEY = S_SUPPKEY -lo_part_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_part, colB=4, method="hash2"); +if(hasRows){ + lo_part_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_part, colB=4, method="hash2"); + if(nrow(lo_part_supp[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_ORDERDATE = D_DATEKEY # (D_DATEKEY | D_YEAR) | (S_SUPPKEY | P_PARTKEY | P_BRAND | LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE) -lo_part_supp_date = raJoin::m_raJoin(A=date_matrix_min, colA=1, B=lo_part_supp, colB=6, method="hash2"); -#print(toString(lo_part_supp_date[1,])) +if(hasRows){ + joined_matrix = raJoin::m_raJoin(A=date_matrix_min, colA=1, B=lo_part_supp, colB=6, method="hash2"); + if(nrow(joined_matrix[,1]) == 0){ + hasRows = 0; + } +}#print(toString(joined_matrix[1,])) # -- Group-By and Aggregation (SUM)-- +if(hasRows){ + # Group-By + d_year = joined_matrix[,2] + p_brand = joined_matrix[,5] + lo_revenue = joined_matrix[,9] -# Group-By -d_year = lo_part_supp_date[,2] -p_brand = lo_part_supp_date[,5] -lo_revenue = lo_part_supp_date[,9] - -# CALCULATING COMBINATION KEY WITH PRIORITY:1 D_YEAR, 2 P_BRAND + # CALCULATING COMBINATION KEY WITH PRIORITY:1 D_YEAR, 2 P_BRAND -max_p_brand = max(p_brand); -max_d_year = max(d_year); + max_p_brand = max(p_brand); + max_d_year = max(d_year); -p_brand_scale_f = ceil(max_p_brand) + 1; -d_year_scale_f = ceil(max_d_year) + 1; + p_brand_scale_f = ceil(max_p_brand) + 1; + d_year_scale_f = ceil(max_d_year) + 1; -combined_key = d_year * p_brand_scale_f + p_brand; + combined_key = d_year * p_brand_scale_f + p_brand; -group_input = cbind(lo_revenue, combined_key) + group_input = cbind(lo_revenue, combined_key) -agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); + agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); -# Aggregation (SUM) -key = agg_result[, 1]; -revenue = rowSums(agg_result[, 2:ncol(agg_result)]); + # Aggregation (SUM) + key = agg_result[, 1]; + revenue = rowSums(agg_result[, 2:ncol(agg_result)]); -# EXTRACTING D_YEAR, P_BRAND -d_year = round(floor(key / (p_brand_scale_f))); -p_brand = round(key %% p_brand_scale_f); -result = cbind(revenue, d_year, p_brand, key); + # EXTRACTING D_YEAR, P_BRAND + d_year = round(floor(key / (p_brand_scale_f))); + p_brand = round(key %% p_brand_scale_f); + result = cbind(revenue, d_year, p_brand, key); -# -- Sorting -- -- Sorting int columns works, but string does not. -# ORDER BY D_YEAR, P_BRAND ASC -result_ordered = order(target=result, by=4, decreasing=FALSE, index.return=FALSE); + # -- Sorting -- -- Sorting int columns works, but string does not. 
+ # ORDER BY D_YEAR, P_BRAND ASC + result_ordered = order(target=result, by=4, decreasing=FALSE, index.return=FALSE); -p_brand_dec = transformdecode(target=result_ordered[,3], spec=general_spec, meta=part_brand_meta); -res = cbind(as.frame(result_ordered[,1]), as.frame(result_ordered[,2]), p_brand_dec) ; + p_brand_dec = transformdecode(target=result_ordered[,3], spec=general_spec, meta=part_brand_meta); + res = cbind(as.frame(result_ordered[,1]), as.frame(result_ordered[,2]), p_brand_dec) ; -# Print result -print("SUM(lo_revenue) | d_year | p_brand"); -print(res); + # Print result + print("SUM(lo_revenue) | d_year | p_brand"); + print(res); -print("\nQ2.3 finished.\n"); + print("\nQ2.3 finished.\n"); +} +else{ + # If the result table has 0 rows, skip group-by and aggregation. + # Print result + print("SUM(lo_revenue) | d_year | p_brand") + print("The result table has 0 rows.") + + print("\nQ2.3 finished.\n"); +} \ No newline at end of file diff --git a/scripts/ssb/queries/q3_1.dml b/scripts/ssb/queries/q3_1.dml index f45a12d2c56..d7a224dd26e 100644 --- a/scripts/ssb/queries/q3_1.dml +++ b/scripts/ssb/queries/q3_1.dml @@ -52,7 +52,9 @@ cust_csv = read(input_dir + "/customer.tbl", data_type="frame", format="csv", he date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +# General variables. general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; +hasRows = 1; # If hasRows = 0, the result table is empty. # -- Data preparation -- @@ -72,6 +74,9 @@ date_matrix_min = as.matrix(date_csv_min); # WHERE D_YEAR >= 1992 AND D_YEAR <= 1997 d_filt = raSel::m_raSelection(date_matrix_min, col=2, op=">=", val=1992); d_filt = raSel::m_raSelection(d_filt, col=2, op="<=", val=1997); +if( as.scalar(d_filt[1,1]) == 0){ + hasRows = 0; +} # Prepare SUPPLIER table on-the-fly encodings # Extracted: COL-1 | COL-5 | COL-6 @@ -79,22 +84,27 @@ d_filt = raSel::m_raSelection(d_filt, col=2, op="<=", val=1997); # (only need S_NATION encoding, filter by S_REGION string) [supp_nat_enc_f, supp_nat_meta] = transformencode(target=supp_csv[,5], spec=general_spec); -# Build filtered SUPPLIER table (S_REGION == 'ASIA') supp_filt_keys = matrix(0, rows=0, cols=1); supp_filt_nat = matrix(0, rows=0, cols=1); -for (i in 1:nrow(supp_csv)) { - if (as.scalar(supp_csv[i,6]) == "ASIA") { - key_val = as.double(as.scalar(supp_csv[i,1])); - nat_code = as.double(as.scalar(supp_nat_enc_f[i,1])); - supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1)); - supp_filt_nat = rbind(supp_filt_nat, matrix(nat_code, rows=1, cols=1)); +supp_filt = matrix(0, rows=0, cols=1); + +if(hasRows){ + # Build filtered SUPPLIER table (S_REGION == 'ASIA') + for (i in 1:nrow(supp_csv)) { + if (as.scalar(supp_csv[i,6]) == "ASIA") { + key_val = as.double(as.scalar(supp_csv[i,1])); + nat_code = as.double(as.scalar(supp_nat_enc_f[i,1])); + supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1)); + supp_filt_nat = rbind(supp_filt_nat, matrix(nat_code, rows=1, cols=1)); + } + } + if (nrow(supp_filt_keys) == 0) { + hasRows = 0; + } + else{ + supp_filt = cbind(supp_filt_keys, supp_filt_nat); } } -if (nrow(supp_filt_keys) == 0) { - supp_filt_keys = matrix(0, rows=1, cols=1); - supp_filt_nat = matrix(0, rows=1, cols=1); -} -supp_filt = cbind(supp_filt_keys, supp_filt_nat); # Prepare CUSTOMER table on-the-fly encodings # Extracted: COL-1 | COL-5 | COL-6 @@ -102,23 +112,27 @@ supp_filt = 
cbind(supp_filt_keys, supp_filt_nat); # (only need C_NATION encoding, filter by C_REGION string) [cust_nat_enc_f, cust_nat_meta] = transformencode(target=cust_csv[,5], spec=general_spec); -# Build filtered CUSTOMER table (C_REGION = 'ASIA') cust_filt_keys = matrix(0, rows=0, cols=1); cust_filt_nat = matrix(0, rows=0, cols=1); -for (i in 1:nrow(cust_csv)) { - if (as.scalar(cust_csv[i,6]) == "ASIA") { - key_val = as.double(as.scalar(cust_csv[i,1])); - nat_code = as.double(as.scalar(cust_nat_enc_f[i,1])); - cust_filt_keys = rbind(cust_filt_keys, matrix(key_val, rows=1, cols=1)); - cust_filt_nat = rbind(cust_filt_nat, matrix(nat_code, rows=1, cols=1)); +cust_filt = matrix(0, rows=0, cols=1); + +if(hasRows){ + # Build filtered CUSTOMER table (C_REGION = 'ASIA') + for (i in 1:nrow(cust_csv)) { + if (as.scalar(cust_csv[i,6]) == "ASIA") { + key_val = as.double(as.scalar(cust_csv[i,1])); + nat_code = as.double(as.scalar(cust_nat_enc_f[i,1])); + cust_filt_keys = rbind(cust_filt_keys, matrix(key_val, rows=1, cols=1)); + cust_filt_nat = rbind(cust_filt_nat, matrix(nat_code, rows=1, cols=1)); + } + } + if (nrow(cust_filt_keys) == 0) { + hasRows = 0; + } + else{ + cust_filt = cbind(cust_filt_keys,cust_filt_nat); } } -if (nrow(cust_filt_keys) == 0) { - cust_filt_keys = matrix(0, rows=1, cols=1); - cust_filt_nat = matrix(0, rows=1, cols=1); -} -cust_filt = cbind(cust_filt_keys,cust_filt_nat); - #print("LO,DATE,CUST,SUPP") #print(toString(lineorder_matrix_min[1,])) #print(toString(date_matrix_min[1,])) @@ -130,65 +144,90 @@ cust_filt = cbind(cust_filt_keys,cust_filt_nat); # Join LINEORDER table with CUST, SUPPLIER, DATE tables (star schema) # Join order does matter! +lo_cust = matrix(0, rows=0, cols=1); +lo_cust_supp = matrix(0, rows=0, cols=1); +joined_matrix = matrix(0, rows=0, cols=1); # WHERE LO_CUSTKEY = C_CUSTKEY -lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); +if(hasRows){ + lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); + if(nrow(lo_cust[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_SUPPKEY = S_SUPPKEY -lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=4, method="hash2"); -print(toString(lo_cust_supp[1,])) - +if(hasRows){ + lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=4, method="hash2"); + if(nrow(lo_cust_supp[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_ORDERDATE = D_DATEKEY # (D_DATEKEY | D_YEAR) | (S_SUPPKEY | S_NATION | C_CUSTKEY | C_NATION | # LO_CUSTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE) -joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_cust_supp, colB=7, method="hash2"); +if(hasRows){ + joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_cust_supp, colB=7, method="hash2"); + #print(toString(joined_matrix[1,])) + if(nrow(joined_matrix[,1]) == 0){ + hasRows = 0; + } +} #print(toString(joined_matrix[1,])) # -- Group-By and Aggregation (SUM)-- +if(hasRows){ + # Group-By + d_year = joined_matrix[,2]; + s_nat = joined_matrix[,4]; + c_nat = joined_matrix[,6]; + revenue = joined_matrix[,10]; -# Group-By -d_year = joined_matrix[,2]; -s_nat = joined_matrix[,4]; -c_nat = joined_matrix[,6]; -revenue = joined_matrix[,10]; + # CALCULATING COMBINATION KEY WITH PRIORITY:1 C_NATION, 2 S_NATION, D_YEAR + max_c_nat= max(c_nat); + max_s_nat= max(s_nat); + max_d_year = max(d_year); -# CALCULATING COMBINATION KEY WITH PRIORITY:1 C_NATION, 2 S_NATION, D_YEAR -max_c_nat= max(c_nat); -max_s_nat= max(s_nat); -max_d_year = max(d_year); + c_nat_scale_f = 
ceil(max_c_nat) + 1; + s_nat_scale_f = ceil(max_s_nat) + 1; + d_year_scale_f = ceil(max_d_year) + 1; -c_nat_scale_f = ceil(max_c_nat) + 1; -s_nat_scale_f = ceil(max_s_nat) + 1; -d_year_scale_f = ceil(max_d_year) + 1; + combined_key = c_nat * s_nat_scale_f * d_year_scale_f + s_nat * d_year_scale_f + d_year; -combined_key = c_nat * s_nat_scale_f * d_year_scale_f + s_nat * d_year_scale_f + d_year; + group_input = cbind(revenue, combined_key) -group_input = cbind(revenue, combined_key) + agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); + #print(toString(agg_result[1,])); -agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); -#print(toString(agg_result[1,])); + # Aggregation (SUM) + key = agg_result[, 1]; + revenue = rowSums(agg_result[, 2:ncol(agg_result)]); -# Aggregation (SUM) -key = agg_result[, 1]; -revenue = rowSums(agg_result[, 2:ncol(agg_result)]); + # EXTRACTING C_NATION, S_NATION, D_YEAR + c_nat = round(floor(key / (s_nat_scale_f * d_year_scale_f))); + s_nat = round(floor((key %% (s_nat_scale_f * d_year_scale_f)) / d_year_scale_f)); + d_year = round(key %% d_year_scale_f); -# EXTRACTING C_NATION, S_NATION, D_YEAR -c_nat = round(floor(key / (s_nat_scale_f * d_year_scale_f))); -s_nat = round(floor((key %% (s_nat_scale_f * d_year_scale_f)) / d_year_scale_f)); -d_year = round(key %% d_year_scale_f); + result = cbind(c_nat, s_nat, d_year, revenue, key) -result = cbind(c_nat, s_nat, d_year, revenue, key) + # -- Sorting -- -- Sorting int columns works, but strings do not. + # ORDER BY D_YEAR ASC, REVENUE DESC + result_ordered = order(target=result, by=4, decreasing=TRUE, index.return=FALSE); + result_ordered = order(target=result_ordered, by=3, decreasing=FALSE, index.return=FALSE); -# -- Sorting -- -- Sorting int columns works, but strings do not. -# ORDER BY D_YEAR ASC, REVENUE DESC -result_ordered = order(target=result, by=4, decreasing=TRUE, index.return=FALSE); -result_ordered = order(target=result_ordered, by=3, decreasing=FALSE, index.return=FALSE); + c_nat_dec = transformdecode(target=result_ordered[,1], spec=general_spec, meta=cust_nat_meta); + s_nat_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_nat_meta); -c_nat_dec = transformdecode(target=result_ordered[,1], spec=general_spec, meta=cust_nat_meta); -s_nat_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_nat_meta); + res = cbind(c_nat_dec, s_nat_dec, as.frame(result_ordered[,3]), as.frame(result_ordered[,4])) ; -res = cbind(c_nat_dec, s_nat_dec, as.frame(result_ordered[,3]), as.frame(result_ordered[,4])) ; + # Print result + print("c_nation | s_nation | d_year | REVENUE") + print(res) -# Print result -print("c_nation | s_nation | d_year | REVENUE") -print(res) - -print("\nQ3.1 finished.\n"); + print("\nQ3.1 finished.\n"); +} +else{ + # If the result table has 0 rows, skip group-by and aggregation. 
+    # Print result
+    print("c_nation | s_nation | d_year | REVENUE")
+    print("The result table has 0 rows.")
+    print("\nQ3.1 finished.\n");
+}
\ No newline at end of file
diff --git a/scripts/ssb/queries/q3_2.dml b/scripts/ssb/queries/q3_2.dml
index b40ff5abb71..0cd6c7deba5 100644
--- a/scripts/ssb/queries/q3_2.dml
+++ b/scripts/ssb/queries/q3_2.dml
@@ -52,7 +52,9 @@ cust_csv = read(input_dir + "/customer.tbl", data_type="frame", format="csv", he
 date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
 supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|");
 
+# General variables.
 general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }";
+hasRows = 1; # If hasRows = 0, the result table is empty.
 
 # -- Data preparation --
 
@@ -72,6 +74,9 @@ date_matrix_min = as.matrix(date_csv_min);
 # WHERE D_YEAR >= 1992 AND D_YEAR <= 1997
 d_filt = raSel::m_raSelection(date_matrix_min, col=2, op=">=", val=1992);
 d_filt = raSel::m_raSelection(d_filt, col=2, op="<=", val=1997);
+if( as.scalar(d_filt[1,1]) == 0){
+    hasRows = 0;
+}
 
 # Prepare SUPPLIER table on-the-fly encodings
 # Extracted: COL-1 | COL-4 | COL-5
@@ -79,22 +84,27 @@ d_filt = raSel::m_raSelection(d_filt, col=2, op="<=", val=1997);
 # (only need S_CITY encoding, filter by S_NATION string)
 [supp_city_enc_f, supp_city_meta] = transformencode(target=supp_csv[,4], spec=general_spec);
 
-# Build filtered SUPPLIER table (C_NATION = 'UNITED STATES')
 supp_filt_keys = matrix(0, rows=0, cols=1);
 supp_filt_city = matrix(0, rows=0, cols=1);
-for (i in 1:nrow(supp_csv)) {
-    if (as.scalar(supp_csv[i,5]) == "UNITED STATES") {
-        key_val = as.double(as.scalar(supp_csv[i,1]));
-        city_code = as.double(as.scalar(supp_city_enc_f[i,1]));
-        supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1));
-        supp_filt_city = rbind(supp_filt_city, matrix(city_code, rows=1, cols=1));
+supp_filt = matrix(0, rows=0, cols=1);
+
+if(hasRows){
+    # Build filtered SUPPLIER table (S_NATION = 'UNITED STATES')
+    for (i in 1:nrow(supp_csv)) {
+        if (as.scalar(supp_csv[i,5]) == "UNITED STATES") {
+            key_val = as.double(as.scalar(supp_csv[i,1]));
+            city_code = as.double(as.scalar(supp_city_enc_f[i,1]));
+            supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1));
+            supp_filt_city = rbind(supp_filt_city, matrix(city_code, rows=1, cols=1));
+        }
     }
-}
-if (nrow(supp_filt_keys) == 0) {
-    supp_filt_keys = matrix(0, rows=1, cols=1);
-    supp_filt_city = matrix(0, rows=1, cols=1);
-}
-supp_filt = cbind(supp_filt_keys, supp_filt_city);
+    if (nrow(supp_filt_keys) == 0) {
+        hasRows = 0;
+    }
+    else{
+        supp_filt = cbind(supp_filt_keys, supp_filt_city);
+    }
+}
 
 # Prepare CUSTOMER table on-the-fly encodings
 # Extracted: COL-1 | COL-5 | COL-6
@@ -102,90 +112,123 @@ supp_filt = cbind(supp_filt_keys, supp_filt_city);
 # (only need C_CITY encoding, filter by C_NATION string)
 [cust_city_enc_f, cust_city_meta] = transformencode(target=cust_csv[,4], spec=general_spec);
 
-# Build filtered CUSTOMER table (C_NATION = 'UNITED STATES')
 cust_filt_keys = matrix(0, rows=0, cols=1);
 cust_filt_city = matrix(0, rows=0, cols=1);
-for (i in 1:nrow(cust_csv)) {
-    if (as.scalar(cust_csv[i,5]) == "UNITED STATES") {
-        key_val = as.double(as.scalar(cust_csv[i,1]));
-        city_code = as.double(as.scalar(cust_city_enc_f[i,1]));
-        cust_filt_keys = rbind(cust_filt_keys, matrix(key_val, rows=1, cols=1));
-        cust_filt_city = rbind(cust_filt_city, matrix(city_code, rows=1, cols=1));
+cust_filt = matrix(0, rows=0, cols=1);
+
+if(hasRows){
+    # 
Build filtered CUSTOMER table (C_NATION = 'UNITED STATES') + for (i in 1:nrow(cust_csv)) { + if (as.scalar(cust_csv[i,5]) == "UNITED STATES") { + key_val = as.double(as.scalar(cust_csv[i,1])); + city_code = as.double(as.scalar(cust_city_enc_f[i,1])); + cust_filt_keys = rbind(cust_filt_keys, matrix(key_val, rows=1, cols=1)); + cust_filt_city = rbind(cust_filt_city, matrix(city_code, rows=1, cols=1)); + } + } + if (nrow(cust_filt_keys) == 0) { + hasRows = 0; + } + else{ + cust_filt = cbind(cust_filt_keys,cust_filt_city); } } -if (nrow(cust_filt_keys) == 0) { - cust_filt_keys = matrix(0, rows=1, cols=1); - cust_filt_city = matrix(0, rows=1, cols=1); -} -cust_filt = cbind(cust_filt_keys,cust_filt_city); #print("LO,DATE,CUST,SUPP") #print(toString(lineorder_matrix_min[1,])) -#print(toString(date_matrix_min[1,])) +#print(toString(d_filt[1,])) #print(toString(cust_filt[1,])) #print(toString(supp_filt[1,])) # -- JOIN TABLES WITH RA-JOIN FUNCTION -- # Join LINEORDER table with CUST, SUPPLIER, DATE tables (star schema) -# Join order does matter! +# Join order does matter! +lo_cust = matrix(0, rows=0, cols=1); +lo_cust_supp = matrix(0, rows=0, cols=1); +joined_matrix = matrix(0, rows=0, cols=1); # WHERE LO_CUSTKEY = C_CUSTKEY -lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); +if(hasRows){ + lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); + if(nrow(lo_cust[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_SUPPKEY = S_SUPPKEY -lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=4, method="hash2"); +if(hasRows){ + lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=4, method="hash2"); + if(nrow(lo_cust_supp[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_ORDERDATE = D_DATEKEY # (D_DATEKEY | D_YEAR) | (S_SUPPKEY | S_CITY | C_CUSTKEY | C_CITY | # LO_CUSTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE) -joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_cust_supp, colB=7, method="hash2"); -print(toString(joined_matrix[1,])) +if(hasRows){ + joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_cust_supp, colB=7, method="hash2"); + #print(toString(joined_matrix[1,])) + if(nrow(joined_matrix[,1]) == 0){ + hasRows = 0; + } +} +#print(toString(joined_matrix[1,])) # -- Group-By and Aggregation (SUM)-- -# Group-By -d_year = joined_matrix[,2]; -s_city = joined_matrix[,4]; -c_city = joined_matrix[,6]; -revenue = joined_matrix[,10]; +if(hasRows){ + # Group-By + d_year = joined_matrix[,2]; + s_city = joined_matrix[,4]; + c_city = joined_matrix[,6]; + revenue = joined_matrix[,10]; -# CALCULATING COMBINATION KEY WITH PRIORITY:1 C_CITY, 2 S_CITY, D_YEAR -max_c_city= max(c_city); -max_s_city= max(s_city); -max_d_year = max(d_year); + # CALCULATING COMBINATION KEY WITH PRIORITY:1 C_CITY, 2 S_CITY, D_YEAR + max_c_city= max(c_city); + max_s_city= max(s_city); + max_d_year = max(d_year); -c_city_scale_f = ceil(max_c_city) + 1; -s_city_scale_f = ceil(max_s_city) + 1; -d_year_scale_f = ceil(max_d_year) + 1; + c_city_scale_f = ceil(max_c_city) + 1; + s_city_scale_f = ceil(max_s_city) + 1; + d_year_scale_f = ceil(max_d_year) + 1; -combined_key = c_city * s_city_scale_f * d_year_scale_f + s_city * d_year_scale_f + d_year; + combined_key = c_city * s_city_scale_f * d_year_scale_f + s_city * d_year_scale_f + d_year; -group_input = cbind(revenue, combined_key) + group_input = cbind(revenue, combined_key) -agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); 
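Editorial note, not part of the patch: the nested-loop group-by above is a likely hot spot at larger scale factors. Because the combined keys are positive integers, a grouped SUM can also be phrased with the ctable builtin, which sums the weights of repeated cells. A hedged sketch with toy values; this is only practical while the packed keys stay modest, since the result has max(key) rows:

```
key  = matrix("2 1 2 3", rows=4, cols=1);   # positive integer group keys
val  = matrix("10 5 7 2", rows=4, cols=1);  # values to sum per key
ones = matrix(1, rows=nrow(key), cols=1);
agg  = table(key, ones, val);               # agg[k,1] = sum of val where key == k
print(toString(agg));                       # rows 1..3 hold 5, 17, 2
```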
-#print(toString(agg_result[1,])); + agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); + #print(toString(agg_result[1,])); -# Aggregation (SUM) -key = agg_result[, 1]; -revenue = rowSums(agg_result[, 2:ncol(agg_result)]); + # Aggregation (SUM) + key = agg_result[, 1]; + revenue = rowSums(agg_result[, 2:ncol(agg_result)]); -# EXTRACTING C_CITY, S_CITY, D_YEAR -c_city = round(floor(key / (s_city_scale_f * d_year_scale_f))); -s_city = round(floor((key %% (s_city_scale_f * d_year_scale_f)) / d_year_scale_f)); -d_year = round(key %% d_year_scale_f); + # EXTRACTING C_CITY, S_CITY, D_YEAR + c_city = round(floor(key / (s_city_scale_f * d_year_scale_f))); + s_city = round(floor((key %% (s_city_scale_f * d_year_scale_f)) / d_year_scale_f)); + d_year = round(key %% d_year_scale_f); -result = cbind(c_city, s_city, d_year, revenue, key) + result = cbind(c_city, s_city, d_year, revenue, key) -# -- Sorting -- -- Sorting int columns works, but strings do not. -# ORDER BY D_YEAR ASC, REVENUE DESC -result_ordered = order(target=result, by=4, decreasing=TRUE, index.return=FALSE); -result_ordered = order(target=result_ordered, by=3, decreasing=FALSE, index.return=FALSE); + # -- Sorting -- -- Sorting int columns works, but strings do not. + # ORDER BY D_YEAR ASC, REVENUE DESC + result_ordered = order(target=result, by=4, decreasing=TRUE, index.return=FALSE); + result_ordered = order(target=result_ordered, by=3, decreasing=FALSE, index.return=FALSE); -c_city_dec = transformdecode(target=result_ordered[,1], spec=general_spec, meta=cust_city_meta); -s_city_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_city_meta); + c_city_dec = transformdecode(target=result_ordered[,1], spec=general_spec, meta=cust_city_meta); + s_city_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_city_meta); -res = cbind(c_city_dec, s_city_dec, as.frame(result_ordered[,3]), as.frame(result_ordered[,4])) ; + res = cbind(c_city_dec, s_city_dec, as.frame(result_ordered[,3]), as.frame(result_ordered[,4])) ; -# Print result -print("c_city | s_city | d_year | REVENUE") -print(res) + # Print result + print("c_city | s_city | d_year | REVENUE") + print(res) -print("\nQ3.2 finished.\n"); + print("\nQ3.2 finished.\n"); +} +else{ + # If the result table has 0 rows, skip group-by and aggregation. + # Print result + print("c_city | s_city | d_year | REVENUE") + print("The result table has 0 rows.") + print("\nQ3.2 finished.\n"); +} \ No newline at end of file diff --git a/scripts/ssb/queries/q3_3.dml b/scripts/ssb/queries/q3_3.dml index 1570a3336d0..28f75d950f7 100644 --- a/scripts/ssb/queries/q3_3.dml +++ b/scripts/ssb/queries/q3_3.dml @@ -58,7 +58,9 @@ cust_csv = read(input_dir + "/customer.tbl", data_type="frame", format="csv", he date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +# General variables. general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; +hasRows = 1; # If hasRows = 0, the result table is empty. 
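One editorial caveat on the guard introduced here: the emptiness test used below, if( as.scalar(d_filt[1,1]) == 0), assumes that m_raSelection hands back an all-zero placeholder row when nothing matches. That holds up for SSB, where date keys start well above 0, but a table with a legitimate key of 0 would be misread as empty; a row-count check on a genuinely 0-row result would be the safer contract. The pattern in isolation, with a stand-in matrix:

```
d_filt = matrix(0, rows=1, cols=2);   # placeholder shape assumed for an empty selection
hasRows = 1;
if (as.scalar(d_filt[1,1]) == 0) {    # trips on the placeholder, but also on a real key 0
    hasRows = 0;
}
print("hasRows = " + hasRows);        # prints 0
```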
# -- Data preparation -- @@ -78,6 +80,9 @@ date_matrix_min = as.matrix(date_csv_min); # WHERE D_YEAR >= 1992 AND D_YEAR <= 1997 d_filt = raSel::m_raSelection(date_matrix_min, col=2, op=">=", val=1992); d_filt = raSel::m_raSelection(d_filt, col=2, op="<=", val=1997); +if( as.scalar(d_filt[1,1]) == 0){ + hasRows = 0; +} # Prepare SUPPLIER table on-the-fly encodings # Extracted: COL-1 | COL-4 @@ -85,23 +90,28 @@ d_filt = raSel::m_raSelection(d_filt, col=2, op="<=", val=1997); # (only need S_CITY encoding, filter by S_CITY string itself) [supp_city_enc_f, supp_city_meta] = transformencode(target=supp_csv[,4], spec=general_spec); -# Build filtered SUPPLIER table (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') supp_filt_keys = matrix(0, rows=0, cols=1); supp_filt_city = matrix(0, rows=0, cols=1); -for (i in 1:nrow(supp_csv)) { - s_elem = as.scalar(supp_csv[i,4]) - if (s_elem == "UNITED KI1" | s_elem == "UNITED KI5") { - key_val = as.double(as.scalar(supp_csv[i,1])); - city_code = as.double(as.scalar(supp_city_enc_f[i,1])); - supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1)); - supp_filt_city = rbind(supp_filt_city, matrix(city_code, rows=1, cols=1)); +supp_filt = matrix(0, rows=0, cols=1); + +if(hasRows){ + # Build filtered SUPPLIER table (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') + for (i in 1:nrow(supp_csv)) { + s_elem = as.scalar(supp_csv[i,4]) + if (s_elem == "UNITED KI1" | s_elem == "UNITED KI5") { + key_val = as.double(as.scalar(supp_csv[i,1])); + city_code = as.double(as.scalar(supp_city_enc_f[i,1])); + supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1)); + supp_filt_city = rbind(supp_filt_city, matrix(city_code, rows=1, cols=1)); + } + } + if (nrow(supp_filt_keys) == 0) { + hasRows = 0; + } + else{ + supp_filt = cbind(supp_filt_keys, supp_filt_city); } } -if (nrow(supp_filt_keys) == 0) { - supp_filt_keys = matrix(0, rows=1, cols=1); - supp_filt_city = matrix(0, rows=1, cols=1); -} -supp_filt = cbind(supp_filt_keys, supp_filt_city); # Prepare CUSTOMER table on-the-fly encodings # Extracted: COL-1 | COL-4 @@ -109,93 +119,122 @@ supp_filt = cbind(supp_filt_keys, supp_filt_city); # (only need C_CITY encoding, filter by C_CITY string itself) [cust_city_enc_f, cust_city_meta] = transformencode(target=cust_csv[,4], spec=general_spec); -# Build filtered CUSTOMER table (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') cust_filt_keys = matrix(0, rows=0, cols=1); cust_filt_city = matrix(0, rows=0, cols=1); -for (i in 1:nrow(cust_csv)) { - c_elem = as.scalar(cust_csv[i,4]) - if (c_elem == "UNITED KI1" | c_elem == "UNITED KI5") { - key_val = as.double(as.scalar(cust_csv[i,1])); - city_code = as.double(as.scalar(cust_city_enc_f[i,1])); - cust_filt_keys = rbind(cust_filt_keys, matrix(key_val, rows=1, cols=1)); - cust_filt_city = rbind(cust_filt_city, matrix(city_code, rows=1, cols=1)); +cust_filt = matrix(0, rows=0, cols=1); + +if(hasRows){ + # Build filtered CUSTOMER table (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') + for (i in 1:nrow(cust_csv)) { + c_elem = as.scalar(cust_csv[i,4]) + if (c_elem == "UNITED KI1" | c_elem == "UNITED KI5") { + key_val = as.double(as.scalar(cust_csv[i,1])); + city_code = as.double(as.scalar(cust_city_enc_f[i,1])); + cust_filt_keys = rbind(cust_filt_keys, matrix(key_val, rows=1, cols=1)); + cust_filt_city = rbind(cust_filt_city, matrix(city_code, rows=1, cols=1)); + } + } + if (nrow(cust_filt_keys) == 0) { + hasRows = 0; + } + else{ + cust_filt = cbind(cust_filt_keys,cust_filt_city); } } -if (nrow(cust_filt_keys) == 0) { 
- cust_filt_keys = matrix(0, rows=1, cols=1); - cust_filt_city = matrix(0, rows=1, cols=1); -} -cust_filt = cbind(cust_filt_keys,cust_filt_city); #print("LO,DATE,CUST,SUPP") #print(toString(lineorder_matrix_min[1,])) -#print(toString(date_matrix_min[1,])) +#print(toString(d_filt[1,])) #print(toString(cust_filt[1,])) #print(toString(supp_filt[1,])) # -- JOIN TABLES WITH RA-JOIN FUNCTION -- - -# Join LINEORDER table with CUST, SUPPLIER, DATE tables (star schema) # Join order does matter! +lo_cust = matrix(0, rows=0, cols=1); +lo_cust_supp = matrix(0, rows=0, cols=1); +joined_matrix = matrix(0, rows=0, cols=1); +# Join LINEORDER table with CUST, SUPPLIER, DATE tables (star schema) # WHERE LO_CUSTKEY = C_CUSTKEY -lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); +if(hasRows){ + lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); + if(nrow(lo_cust[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_SUPPKEY = S_SUPPKEY - -lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=4, method="hash2"); +if(hasRows){ + lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=4, method="hash2"); + if(nrow(lo_cust_supp[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_ORDERDATE = D_DATEKEY # (D_DATEKEY | D_YEAR) | (S_SUPPKEY | S_CITY | C_CUSTKEY | C_CITY | # LO_CUSTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE) - -joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_cust_supp, colB=7, method="hash2"); -#print(toString(joined_matrix[1,])) +if(hasRows){ + joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_cust_supp, colB=7, method="hash2"); + #print(toString(joined_matrix[1,])) + if(nrow(joined_matrix[,1]) == 0){ + hasRows = 0; + } +} # -- Group-By and Aggregation (SUM)-- -# Group-By -d_year = joined_matrix[,2]; -s_city = joined_matrix[,4]; -c_city = joined_matrix[,6]; -revenue = joined_matrix[,10]; +if(hasRows){ + # Group-By + d_year = joined_matrix[,2]; + s_city = joined_matrix[,4]; + c_city = joined_matrix[,6]; + revenue = joined_matrix[,10]; -# CALCULATING COMBINATION KEY WITH PRIORITY:1 C_CITY, 2 S_CITY, D_YEAR -max_c_city= max(c_city); -max_s_city= max(s_city); -max_d_year = max(d_year); + # CALCULATING COMBINATION KEY WITH PRIORITY:1 C_CITY, 2 S_CITY, D_YEAR + max_c_city= max(c_city); + max_s_city= max(s_city); + max_d_year = max(d_year); -c_city_scale_f = ceil(max_c_city) + 1; -s_city_scale_f = ceil(max_s_city) + 1; -d_year_scale_f = ceil(max_d_year) + 1; + c_city_scale_f = ceil(max_c_city) + 1; + s_city_scale_f = ceil(max_s_city) + 1; + d_year_scale_f = ceil(max_d_year) + 1; -combined_key = c_city * s_city_scale_f * d_year_scale_f + s_city * d_year_scale_f + d_year; + combined_key = c_city * s_city_scale_f * d_year_scale_f + s_city * d_year_scale_f + d_year; -group_input = cbind(revenue, combined_key) + group_input = cbind(revenue, combined_key) -agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); -#print(toString(agg_result[1,])); + agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); + #print(toString(agg_result[1,])); -# Aggregation (SUM) -key = agg_result[, 1]; -revenue = rowSums(agg_result[, 2:ncol(agg_result)]); + # Aggregation (SUM) + key = agg_result[, 1]; + revenue = rowSums(agg_result[, 2:ncol(agg_result)]); -# EXTRACTING C_CITY, S_CITY, D_YEAR -c_city = round(floor(key / (s_city_scale_f * d_year_scale_f))); -s_city = round(floor((key %% (s_city_scale_f * d_year_scale_f)) / d_year_scale_f)); -d_year = round(key %% 
d_year_scale_f); + # EXTRACTING C_CITY, S_CITY, D_YEAR + c_city = round(floor(key / (s_city_scale_f * d_year_scale_f))); + s_city = round(floor((key %% (s_city_scale_f * d_year_scale_f)) / d_year_scale_f)); + d_year = round(key %% d_year_scale_f); -result = cbind(c_city, s_city, d_year, revenue, key) + result = cbind(c_city, s_city, d_year, revenue, key) -# -- Sorting -- -- Sorting int columns works, but strings do not. -# ORDER BY D_YEAR ASC, REVENUE DESC -result_ordered = order(target=result, by=4, decreasing=TRUE, index.return=FALSE); -result_ordered = order(target=result_ordered, by=3, decreasing=FALSE, index.return=FALSE); + # -- Sorting -- -- Sorting int columns works, but strings do not. + # ORDER BY D_YEAR ASC, REVENUE DESC + result_ordered = order(target=result, by=4, decreasing=TRUE, index.return=FALSE); + result_ordered = order(target=result_ordered, by=3, decreasing=FALSE, index.return=FALSE); -c_city_dec = transformdecode(target=result_ordered[,1], spec=general_spec, meta=cust_city_meta); -s_city_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_city_meta); + c_city_dec = transformdecode(target=result_ordered[,1], spec=general_spec, meta=cust_city_meta); + s_city_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_city_meta); -res = cbind(c_city_dec, s_city_dec, as.frame(result_ordered[,3]), as.frame(result_ordered[,4])) ; + res = cbind(c_city_dec, s_city_dec, as.frame(result_ordered[,3]), as.frame(result_ordered[,4])) ; -# Print result -print("c_city | s_city | d_year | REVENUE") -print(res) + # Print result + print("c_city | s_city | d_year | REVENUE") + print(res) -print("\nQ3.3 finished.\n"); + print("\nQ3.3 finished.\n"); +} +else{ + # If the result table has 0 rows, skip group-by and aggregation. + # Print result + print("c_city | s_city | d_year | REVENUE") + print("The result table has 0 rows.") + print("\nQ3.3 finished.\n"); +} \ No newline at end of file diff --git a/scripts/ssb/queries/q3_4.dml b/scripts/ssb/queries/q3_4.dml index 5174dfe1da4..633bc794b35 100644 --- a/scripts/ssb/queries/q3_4.dml +++ b/scripts/ssb/queries/q3_4.dml @@ -60,7 +60,9 @@ cust_csv = read(input_dir + "/customer.tbl", data_type="frame", format="csv", he date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +# General variables. general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; +hasRows = 1; # If hasRows = 0, the result table is empty. 
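Editorial suggestion, not applied in the patch: the filter loops below grow their results one row at a time with rbind, which is quadratic in the worst case. For a numeric column the same selection is a single vectorized step, and string predicates could be treated the same way after transformencode. A sketch with toy data:

```
X = matrix("1 1991 2 1997 3 1998", rows=3, cols=2);     # key | year
mask = (X[,2] == 1997);                                 # 0/1 predicate per row
Xf = removeEmpty(target=X, margin="rows", select=mask); # keep matching rows only
# Note: with an all-zero mask, removeEmpty may still return a single zero row,
# which matches the placeholder convention these scripts already check for.
print(toString(Xf));
```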
# -- Data preparation -- @@ -78,6 +80,8 @@ lineorder_matrix_min = as.matrix(lineorder_csv_min); # Build filtered DATE table (D_YEARMONTH = 'Dec1997') d_filt_keys = matrix(0, rows=0, cols=1); d_filt_year = matrix(0, rows=0, cols=1); +d_filt = matrix(0, rows=0, cols=1); + for (i in 1:nrow(date_csv)) { if (as.scalar(date_csv[i,7]) == "Dec1997") { key_val = as.double(as.scalar(date_csv[i,1])); @@ -85,12 +89,14 @@ for (i in 1:nrow(date_csv)) { d_filt_keys = rbind(d_filt_keys, matrix(key_val, rows=1, cols=1)); d_filt_year = rbind(d_filt_year, matrix(year_val, rows=1, cols=1)); } -} + } if (nrow(d_filt_keys) == 0) { - d_filt_keys = matrix(0, rows=1, cols=1); - d_filt_year = matrix(0, rows=1, cols=1); + hasRows = 0; } -d_filt = cbind(d_filt_keys, d_filt_year); +else{ + d_filt = cbind(d_filt_keys, d_filt_year); +} + # Prepare SUPPLIER table on-the-fly encodings # Extracted: COL-1 | COL-4 @@ -98,23 +104,28 @@ d_filt = cbind(d_filt_keys, d_filt_year); # (only need S_CITY encoding, filter by S_CITY string itself) [supp_city_enc_f, supp_city_meta] = transformencode(target=supp_csv[,4], spec=general_spec); -# Build filtered SUPPLIER table (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') supp_filt_keys = matrix(0, rows=0, cols=1); supp_filt_city = matrix(0, rows=0, cols=1); -for (i in 1:nrow(supp_csv)) { - s_elem = as.scalar(supp_csv[i,4]) - if (s_elem == "UNITED KI1" | s_elem == "UNITED KI5") { - key_val = as.double(as.scalar(supp_csv[i,1])); - city_code = as.double(as.scalar(supp_city_enc_f[i,1])); - supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1)); - supp_filt_city = rbind(supp_filt_city, matrix(city_code, rows=1, cols=1)); +supp_filt = matrix(0, rows=0, cols=1); + +if(hasRows){ + # Build filtered SUPPLIER table (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') + for (i in 1:nrow(supp_csv)) { + s_elem = as.scalar(supp_csv[i,4]) + if (s_elem == "UNITED KI1" | s_elem == "UNITED KI5") { + key_val = as.double(as.scalar(supp_csv[i,1])); + city_code = as.double(as.scalar(supp_city_enc_f[i,1])); + supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1)); + supp_filt_city = rbind(supp_filt_city, matrix(city_code, rows=1, cols=1)); + } + } + if (nrow(supp_filt_keys) == 0) { + hasRows = 0; + } + else{ + supp_filt = cbind(supp_filt_keys, supp_filt_city); } } -if (nrow(supp_filt_keys) == 0) { - supp_filt_keys = matrix(0, rows=1, cols=1); - supp_filt_city = matrix(0, rows=1, cols=1); -} -supp_filt = cbind(supp_filt_keys, supp_filt_city); # Prepare CUSTOMER table on-the-fly encodings # Extracted: COL-1 | COL-4 @@ -122,100 +133,122 @@ supp_filt = cbind(supp_filt_keys, supp_filt_city); # (only need C_CITY encoding, filter by C_CITY string itself) [cust_city_enc_f, cust_city_meta] = transformencode(target=cust_csv[,4], spec=general_spec); -# Build filtered CUSTOMER table (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') cust_filt_keys = matrix(0, rows=0, cols=1); cust_filt_city = matrix(0, rows=0, cols=1); -for (i in 1:nrow(cust_csv)) { - c_elem = as.scalar(cust_csv[i,4]) - if (c_elem == "UNITED KI1" | c_elem == "UNITED KI5") { - key_val = as.double(as.scalar(cust_csv[i,1])); - city_code = as.double(as.scalar(cust_city_enc_f[i,1])); - cust_filt_keys = rbind(cust_filt_keys, matrix(key_val, rows=1, cols=1)); - cust_filt_city = rbind(cust_filt_city, matrix(city_code, rows=1, cols=1)); +cust_filt = matrix(0, rows=0, cols=1); + +if(hasRows){ + # Build filtered CUSTOMER table (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') + for (i in 1:nrow(cust_csv)) { + c_elem = 
as.scalar(cust_csv[i,4]) + if (c_elem == "UNITED KI1" | c_elem == "UNITED KI5") { + key_val = as.double(as.scalar(cust_csv[i,1])); + city_code = as.double(as.scalar(cust_city_enc_f[i,1])); + cust_filt_keys = rbind(cust_filt_keys, matrix(key_val, rows=1, cols=1)); + cust_filt_city = rbind(cust_filt_city, matrix(city_code, rows=1, cols=1)); + } + } + if (nrow(cust_filt_keys) == 0) { + hasRows = 0; + } + else{ + cust_filt = cbind(cust_filt_keys,cust_filt_city); } } -if (nrow(cust_filt_keys) == 0) { - cust_filt_keys = matrix(0, rows=1, cols=1); - cust_filt_city = matrix(0, rows=1, cols=1); -} -cust_filt = cbind(cust_filt_keys,cust_filt_city); #print("LO,DATE,CUST,SUPP") #print(toString(lineorder_matrix_min[1,])) -#print(toString(date_matrix_min[1,])) +#print(toString(d_filt[1,])) #print(toString(cust_filt[1,])) #print(toString(supp_filt[1,])) # -- JOIN TABLES WITH RA-JOIN FUNCTION -- - # Join LINEORDER table with CUST, SUPPLIER, DATE tables (star schema) -# Join order does matter! +# Join order does matter! +lo_cust = matrix(0, rows=0, cols=1); +lo_cust_supp = matrix(0, rows=0, cols=1); +joined_matrix = matrix(0, rows=0, cols=1); # WHERE LO_CUSTKEY = C_CUSTKEY -lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); +if(hasRows){ + lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); + if(nrow(lo_cust[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_SUPPKEY = S_SUPPKEY +if(hasRows){ + lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=4, method="hash2"); + if(nrow(lo_cust_supp[,1]) == 0){ + hasRows = 0; + } +} -lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=4, method="hash2"); # WHERE LO_ORDERDATE = D_DATEKEY # (D_DATEKEY | D_YEAR) | (S_SUPPKEY | S_CITY | C_CUSTKEY | C_CITY | # LO_CUSTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE) - -joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_cust_supp, colB=7, method="hash2"); -#print(toString(joined_matrix[1,])); - -if(nrow(joined_matrix[,1]) == 0){ - print("c_city | s_city | d_year | REVENUE"); - print("The result table has 0 rows."); +if(hasRows){ + joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_cust_supp, colB=7, method="hash2"); + if(nrow(joined_matrix[,1]) == 0){ + hasRows = 0; + } } -else{ # -- Group-By and Aggregation (SUM)-- -# Group-By -d_year = joined_matrix[,2]; -s_city = joined_matrix[,4]; -c_city = joined_matrix[,6]; -revenue = joined_matrix[,10]; +if(hasRows){ + # Group-By + d_year = joined_matrix[,2]; + s_city = joined_matrix[,4]; + c_city = joined_matrix[,6]; + revenue = joined_matrix[,10]; -# CALCULATING COMBINATION KEY WITH PRIORITY:1 C_CITY, 2 S_CITY, D_YEAR -max_c_city= max(c_city); -max_s_city= max(s_city); -max_d_year = max(d_year); + # CALCULATING COMBINATION KEY WITH PRIORITY:1 C_CITY, 2 S_CITY, D_YEAR + max_c_city= max(c_city); + max_s_city= max(s_city); + max_d_year = max(d_year); -c_city_scale_f = ceil(max_c_city) + 1; -s_city_scale_f = ceil(max_s_city) + 1; -d_year_scale_f = ceil(max_d_year) + 1; + c_city_scale_f = ceil(max_c_city) + 1; + s_city_scale_f = ceil(max_s_city) + 1; + d_year_scale_f = ceil(max_d_year) + 1; -combined_key = c_city * s_city_scale_f * d_year_scale_f + s_city * d_year_scale_f + d_year; + combined_key = c_city * s_city_scale_f * d_year_scale_f + s_city * d_year_scale_f + d_year; -group_input = cbind(revenue, combined_key) + group_input = cbind(revenue, combined_key) -agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); 
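A short editorial note on the ORDER BY emulation that follows: the two-key sort is built from two passes of order(), secondary key first (revenue descending), then primary key (year ascending). This is only correct if order() is stable, which these scripts rely on throughout. The idea in isolation:

```
R  = matrix("1993 50 1992 30 1992 70", rows=3, cols=2);            # d_year | revenue
R1 = order(target=R,  by=2, decreasing=TRUE,  index.return=FALSE); # pass 1: revenue DESC
R2 = order(target=R1, by=1, decreasing=FALSE, index.return=FALSE); # pass 2: d_year ASC
print(toString(R2));   # 1992 70, then 1992 30, then 1993 50
```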
-#print(toString(agg_result[1,])); + agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); + #print(toString(agg_result[1,])); -# Aggregation (SUM) -key = agg_result[, 1]; -revenue = rowSums(agg_result[, 2:ncol(agg_result)]); + # Aggregation (SUM) + key = agg_result[, 1]; + revenue = rowSums(agg_result[, 2:ncol(agg_result)]); -# EXTRACTING C_CITY, S_CITY, D_YEAR -c_city = round(floor(key / (s_city_scale_f * d_year_scale_f))); -s_city = round(floor((key %% (s_city_scale_f * d_year_scale_f)) / d_year_scale_f)); -d_year = round(key %% d_year_scale_f); + # EXTRACTING C_CITY, S_CITY, D_YEAR + c_city = round(floor(key / (s_city_scale_f * d_year_scale_f))); + s_city = round(floor((key %% (s_city_scale_f * d_year_scale_f)) / d_year_scale_f)); + d_year = round(key %% d_year_scale_f); -result = cbind(c_city, s_city, d_year, revenue, key) + result = cbind(c_city, s_city, d_year, revenue, key) -# -- Sorting -- -- Sorting int columns works, but strings do not. -# ORDER BY D_YEAR ASC, REVENUE DESC -result_ordered = order(target=result, by=4, decreasing=TRUE, index.return=FALSE); -result_ordered = order(target=result_ordered, by=3, decreasing=FALSE, index.return=FALSE); + # -- Sorting -- -- Sorting int columns works, but strings do not. + # ORDER BY D_YEAR ASC, REVENUE DESC + result_ordered = order(target=result, by=4, decreasing=TRUE, index.return=FALSE); + result_ordered = order(target=result_ordered, by=3, decreasing=FALSE, index.return=FALSE); -c_city_dec = transformdecode(target=result_ordered[,1], spec=general_spec, meta=cust_city_meta); -s_city_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_city_meta); + c_city_dec = transformdecode(target=result_ordered[,1], spec=general_spec, meta=cust_city_meta); + s_city_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_city_meta); -res = cbind(c_city_dec, s_city_dec, as.frame(result_ordered[,3]), as.frame(result_ordered[,4])) ; + res = cbind(c_city_dec, s_city_dec, as.frame(result_ordered[,3]), as.frame(result_ordered[,4])) ; -# Print result -print("c_city | s_city | d_year | REVENUE") -print(res) + # Print result + print("c_city | s_city | d_year | REVENUE") + print(res) -print("\nQ3.3 finished.\n"); + print("\nQ3.4 finished.\n"); +} +else{ + # If the result table has 0 rows, skip group-by and aggregation. + # Print result + print("c_city | s_city | d_year | REVENUE") + print("The result table has 0 rows.") + print("\nQ3.4 finished.\n"); } \ No newline at end of file diff --git a/scripts/ssb/queries/q4_1.dml b/scripts/ssb/queries/q4_1.dml index 1ecfc586ab7..33178a15e1a 100644 --- a/scripts/ssb/queries/q4_1.dml +++ b/scripts/ssb/queries/q4_1.dml @@ -55,7 +55,9 @@ date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header part_csv = read(input_dir + "/part.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +# General variables. general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; +hasRows = 1; # If hasRows = 0, the result table is empty. 
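Editorial gloss on general_spec, declared just above: the recode transform maps the distinct values of column C1 to integer codes, so string attributes survive the as.matrix conversion, and transformdecode later restores the originals from the metadata frame. A self-contained round trip; numeric stand-ins replace the string column so the sketch runs on its own:

```
F = as.frame(matrix("7 9 7", rows=3, cols=1));            # stand-in for a string column
spec = "{ \"ids\": false, \"recode\": [\"C1\"] }";
[enc, meta] = transformencode(target=F, spec=spec);       # values -> integer codes
dec = transformdecode(target=enc, spec=spec, meta=meta);  # codes -> original values
print(toString(cbind(as.frame(enc), dec)));
```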
# -- Data preparation -- @@ -87,21 +89,23 @@ for (i in 1:nrow(part_csv)) { } } if (nrow(part_filt) == 0) { - part_filt = matrix(0, rows=1, cols=1); + hasRows = 0; } # Extracted: COL-1 | COL-6 # S_SUPPKEY | S_REGION # Build filtered SUPPLIER table (S_REGION = 'AMERICA') supp_filt = matrix(0, rows=0, cols=1); -for (i in 1:nrow(supp_csv)) { - if (as.scalar(supp_csv[i,6]) == "AMERICA") { - key_val = as.double(as.scalar(supp_csv[i,1])); - supp_filt = rbind(supp_filt, matrix(key_val, rows=1, cols=1)); +if(hasRows){ + for (i in 1:nrow(supp_csv)) { + if (as.scalar(supp_csv[i,6]) == "AMERICA") { + key_val = as.double(as.scalar(supp_csv[i,1])); + supp_filt = rbind(supp_filt, matrix(key_val, rows=1, cols=1)); + } + } + if (nrow(supp_filt) == 0) { + hasRows = 0; } -} -if (nrow(supp_filt) == 0) { - supp_filt = matrix(0, rows=1, cols=1); } # Prepare CUSTOMER table on-the-fly encodings @@ -110,23 +114,27 @@ if (nrow(supp_filt) == 0) { # (only need C_NATION encoding, filter by C_REGION string) [cust_nat_enc_f, cust_nat_meta] = transformencode(target=cust_csv[,5], spec=general_spec); -# Build filtered CUSTOMER table (C_REGION = 'AMERICA') cust_filt_keys = matrix(0, rows=0, cols=1); cust_filt_nat = matrix(0, rows=0, cols=1); -for (i in 1:nrow(cust_csv)) { - if (as.scalar(cust_csv[i,6]) == "AMERICA") { - key_val = as.double(as.scalar(cust_csv[i,1])); - nat_code = as.double(as.scalar(cust_nat_enc_f[i,1])); - cust_filt_keys = rbind(cust_filt_keys, matrix(key_val, rows=1, cols=1)); - cust_filt_nat = rbind(cust_filt_nat, matrix(nat_code, rows=1, cols=1)); +cust_filt = matrix(0, rows=0, cols=1); + +if(hasRows){ + # Build filtered CUSTOMER table (C_REGION = 'AMERICA') + for (i in 1:nrow(cust_csv)) { + if (as.scalar(cust_csv[i,6]) == "AMERICA") { + key_val = as.double(as.scalar(cust_csv[i,1])); + nat_code = as.double(as.scalar(cust_nat_enc_f[i,1])); + cust_filt_keys = rbind(cust_filt_keys, matrix(key_val, rows=1, cols=1)); + cust_filt_nat = rbind(cust_filt_nat, matrix(nat_code, rows=1, cols=1)); + } + } + if (nrow(cust_filt_keys) == 0) { + hasRows = 0; + } + else{ + cust_filt = cbind(cust_filt_keys,cust_filt_nat); } } -if (nrow(cust_filt_keys) == 0) { - cust_filt_keys = matrix(0, rows=1, cols=1); - cust_filt_nat = matrix(0, rows=1, cols=1); -} -cust_filt = cbind(cust_filt_keys,cust_filt_nat); - #print("LO,DATE,CUST,PART,SUPP") #print(toString(lineorder_matrix_min[1,])) #print(toString(date_matrix_min[1,])) @@ -136,65 +144,97 @@ cust_filt = cbind(cust_filt_keys,cust_filt_nat); # -- JOIN TABLES WITH RA-JOIN FUNCTION -- - +lo_cust = matrix(0, rows=0, cols=1); +lo_cust_supp = matrix(0, rows=0, cols=1); +lo_cust_supp_part = matrix(0, rows=0, cols=1); +joined_matrix = matrix(0, rows=0, cols=1); # Join LINEORDER table with CUST, SUPPLIER, PART, DATE tables (star schema) # Join order does matter! 
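Editorial aside, not from the patch: the chain below re-checks hasRows after every join, because one empty intermediate makes all remaining joins pointless and the max() calls that derive the scale factors would have no valid input. The control-flow skeleton, with a stand-in matrix and the raJoin call left as a comment (dim_filt is a hypothetical name):

```
hasRows = 1;
inter = matrix("1 2 3", rows=3, cols=1);   # stand-in for a join result
if (hasRows) {
    # inter = raJoin::m_raJoin(A=dim_filt, colA=1, B=inter, colB=1, method="hash2");
    if (nrow(inter) == 0) {
        hasRows = 0;
    }
}
# ...one guard like this per join in the chain...
if (hasRows) {
    print("continue with group-by and aggregation");
}
else {
    print("The result table has 0 rows.");
}
```

Note also that q4.x forms PROFIT as a per-row difference before grouping; that is sound because SUM(lo_revenue - lo_supplycost) within a group equals SUM(lo_revenue) - SUM(lo_supplycost).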
# WHERE LO_CUSTKEY = C_CUSTKEY -lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); +if(hasRows){ + lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); + if(nrow(lo_cust[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_SUPPKEY = S_SUPPKEY -lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=5, method="hash2"); +if(hasRows){ + lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=5, method="hash2"); + if(nrow(lo_cust_supp[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_PARTKEY = P_PARTKEY -lo_cust_supp_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lo_cust_supp, colB=5, method="hash2"); +if(hasRows){ + lo_cust_supp_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lo_cust_supp, colB=5, method="hash2"); + if(nrow(lo_cust_supp_part[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_ORDERDATE = D_DATEKEY # (D_DATEKEY | D_YEAR) | (P_PARTKEY | S_SUPPKEY | C_CUSTKEY | C_NATION | # LO_CUSTKEY | LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE | LO_SUPPLYCOST) -joined_matrix = raJoin::m_raJoin(A=date_matrix_min, colA=1, B=lo_cust_supp_part, colB=8, method="hash2"); - +if(hasRows){ + joined_matrix = raJoin::m_raJoin(A=date_matrix_min, colA=1, B=lo_cust_supp_part, colB=8, method="hash2"); + if(nrow(joined_matrix[,1]) == 0){ + hasRows = 0; + } +} #print(toString(joined_matrix[1,])) # -- Group-By and Aggregation (SUM)-- -# Group-By -c_nat = joined_matrix[,6] -d_year = joined_matrix[,2] -lo_revenue = joined_matrix[,11] -lo_supplycost = joined_matrix[,12] -profit = lo_revenue - lo_supplycost; +if(hasRows){ + # Group-By + c_nat = joined_matrix[,6] + d_year = joined_matrix[,2] + lo_revenue = joined_matrix[,11] + lo_supplycost = joined_matrix[,12] + profit = lo_revenue - lo_supplycost; -# CALCULATING COMBINATION KEY WITH PRIORITY:1 D_YEAR, 2 S_NATION -max_d_year = max(d_year); -max_c_nat= max(c_nat); + # CALCULATING COMBINATION KEY WITH PRIORITY:1 D_YEAR, 2 S_NATION + max_d_year = max(d_year); + max_c_nat= max(c_nat); -d_year_scale_f = ceil(max_d_year) + 1; -c_nat_scale_f = ceil(max_c_nat) + 1; + d_year_scale_f = ceil(max_d_year) + 1; + c_nat_scale_f = ceil(max_c_nat) + 1; -combined_key = d_year * c_nat_scale_f + c_nat; + combined_key = d_year * c_nat_scale_f + c_nat; -group_input = cbind(profit, combined_key) + group_input = cbind(profit, combined_key) -agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); -#print(toString(agg_result[1,])); + agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); + #print(toString(agg_result[1,])); -# Aggregation (SUM) -key = agg_result[, 1]; -profit = rowSums(agg_result[, 2:ncol(agg_result)]); + # Aggregation (SUM) + key = agg_result[, 1]; + profit = rowSums(agg_result[, 2:ncol(agg_result)]); -# EXTRACTING D_YEAR, C_NATION -d_year = round(floor(key / (c_nat_scale_f))); -c_nat = round(floor((key %% (c_nat_scale_f)))); + # EXTRACTING D_YEAR, C_NATION + d_year = round(floor(key / (c_nat_scale_f))); + c_nat = round(floor((key %% (c_nat_scale_f)))); -result = cbind(d_year, c_nat, profit, key); + result = cbind(d_year, c_nat, profit, key); -# -- Sorting -- -- Sorting int columns works, but strings do not. -# ORDER BY D_YEAR, C_NATION ASC -result_ordered = order(target=result, by=4, decreasing=FALSE, index.return=FALSE); + # -- Sorting -- -- Sorting int columns works, but strings do not. 
+ # ORDER BY D_YEAR, C_NATION ASC + result_ordered = order(target=result, by=4, decreasing=FALSE, index.return=FALSE); -c_nat_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=cust_nat_meta); + c_nat_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=cust_nat_meta); -res = cbind(as.frame(result_ordered[,1]), c_nat_dec, as.frame(result_ordered[,3])) ; + res = cbind(as.frame(result_ordered[,1]), c_nat_dec, as.frame(result_ordered[,3])) ; -# Print result -print("d_year | c_nation | PROFIT") -print(res) + # Print result + print("d_year | c_nation | PROFIT") + print(res) -print("\nQ4.1 finished.\n"); \ No newline at end of file + print("\nQ4.1 finished.\n"); +} +else{ + # If the result table has 0 rows, skip group-by and aggregation. + # Print result + print("d_year | c_nation | PROFIT") + print("The result table has 0 rows.") + + print("\nQ4.1 finished.\n"); +} \ No newline at end of file diff --git a/scripts/ssb/queries/q4_2.dml b/scripts/ssb/queries/q4_2.dml index 418cfbec00e..cb7794a56af 100644 --- a/scripts/ssb/queries/q4_2.dml +++ b/scripts/ssb/queries/q4_2.dml @@ -60,7 +60,9 @@ date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header part_csv = read(input_dir + "/part.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +# General variables. general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; +hasRows = 1; # If hasRows = 0, the result table is empty. # -- Data preparation -- @@ -81,70 +83,83 @@ date_matrix_min = as.matrix(date_csv_min); # WHERE D_YEAR = 1997 OR D_YEAR = 1998 d_filtA = raSel::m_raSelection(date_matrix_min, col=2, op="==", val=1997); d_filtB = raSel::m_raSelection(date_matrix_min, col=2, op="==", val=1998); - -d_filt = rbind(d_filtA,d_filtB) - +d_filt = matrix(0, rows=0, cols=1); +d_filt = rbind(d_filtA,d_filtB); +if(as.scalar(d_filt[1,1]) == 0){ + hasRows = 0; +} # Prepare PART table on-the-fly encodings # Extracted: COL-1 | COL-3 | COL-4 # P_PARTKEY | P_MFGR | P_CATEGORY # (only need P_CATEGORY encoding, filter by P_MFGR string) [part_cat_enc_f, part_cat_meta] = transformencode(target=part_csv[,4], spec=general_spec); -# Build filtered PART table (p_category == 'MFGR#1' OR p_category == 'MFGR#2'), keeping key and encoded category part_filt_keys = matrix(0, rows=0, cols=1); part_filt_cat = matrix(0, rows=0, cols=1); -for (i in 1:nrow(part_csv)) { - p_elem = as.scalar(part_csv[i,3]) - if ( p_elem == "MFGR#1" | p_elem == "MFGR#2" ) { - key_val = as.double(as.scalar(part_csv[i,1])); - cat_code = as.double(as.scalar(part_cat_enc_f[i,1])); - part_filt_keys = rbind(part_filt_keys, matrix(key_val, rows=1, cols=1)); - part_filt_cat = rbind(part_filt_cat, matrix(cat_code, rows=1, cols=1)); +part_filt = matrix(0, rows=0, cols=1); + +if(hasRows){ + # Build filtered PART table (p_category == 'MFGR#1' OR p_category == 'MFGR#2'), keeping key and encoded category + for (i in 1:nrow(part_csv)) { + p_elem = as.scalar(part_csv[i,3]) + if ( p_elem == "MFGR#1" | p_elem == "MFGR#2" ) { + key_val = as.double(as.scalar(part_csv[i,1])); + cat_code = as.double(as.scalar(part_cat_enc_f[i,1])); + part_filt_keys = rbind(part_filt_keys, matrix(key_val, rows=1, cols=1)); + part_filt_cat = rbind(part_filt_cat, matrix(cat_code, rows=1, cols=1)); + } } + if (nrow(part_filt_keys) == 0) { + hasRows = 0; + } + else{ + part_filt = cbind(part_filt_keys, part_filt_cat); + } + } -if (nrow(part_filt_keys) == 0) { - 
part_filt_keys = matrix(0, rows=1, cols=1); - part_filt_cat = matrix(0, rows=1, cols=1); -} -part_filt = cbind(part_filt_keys, part_filt_cat); - # Prepare SUPPLIER table on-the-fly encodings # Extracted: COL-1 | COL-5 | COL-6 # S_SUPPKEY | S_NATION | S_REGION # (only need S_NATION encoding, filter by S_REGION string) [supp_nat_enc_f, supp_nat_meta] = transformencode(target=supp_csv[,5], spec=general_spec); -# Build filtered SUPPLIER table (s_nation == 'AMERICA') supp_filt_keys = matrix(0, rows=0, cols=1); supp_filt_nat = matrix(0, rows=0, cols=1); -for (i in 1:nrow(supp_csv)) { - if (as.scalar(supp_csv[i,6]) == "AMERICA") { - key_val = as.double(as.scalar(supp_csv[i,1])); - nat_code = as.double(as.scalar(supp_nat_enc_f[i,1])); - supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1)); - supp_filt_nat = rbind(supp_filt_nat, matrix(nat_code, rows=1, cols=1)); +supp_filt = matrix(0, rows=0, cols=1); + +if(hasRows){ + # Build filtered SUPPLIER table (S_REGION == 'AMERICA') + for (i in 1:nrow(supp_csv)) { + if (as.scalar(supp_csv[i,6]) == "AMERICA") { + key_val = as.double(as.scalar(supp_csv[i,1])); + nat_code = as.double(as.scalar(supp_nat_enc_f[i,1])); + supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1)); + supp_filt_nat = rbind(supp_filt_nat, matrix(nat_code, rows=1, cols=1)); + } } -} -if (nrow(supp_filt_keys) == 0) { - supp_filt_keys = matrix(0, rows=1, cols=1); - supp_filt_nat = matrix(0, rows=1, cols=1); -} -supp_filt = cbind(supp_filt_keys, supp_filt_nat); + if (nrow(supp_filt_keys) == 0) { + hasRows = 0; + } + else{ + supp_filt = cbind(supp_filt_keys, supp_filt_nat); + } +} # Extracted: COL-1 | COL-6 # C_CUSTKEY | C_REGION -# Build filtered CUSTOMER table (c_nation == 'AMERICA') +# Build filtered CUSTOMER table (C_REGION == 'AMERICA') cust_filt = matrix(0, rows=0, cols=1); -for (i in 1:nrow(cust_csv)) { - if (as.scalar(cust_csv[i,6]) == "AMERICA") { - key_val = as.double(as.scalar(cust_csv[i,1])); - cust_filt = rbind(cust_filt, matrix(key_val, rows=1, cols=1)); +if(hasRows){ + for (i in 1:nrow(cust_csv)) { + if (as.scalar(cust_csv[i,6]) == "AMERICA") { + key_val = as.double(as.scalar(cust_csv[i,1])); + cust_filt = rbind(cust_filt, matrix(key_val, rows=1, cols=1)); + } + } + if (nrow(cust_filt) == 0) { + hasRows = 0; } } -if (nrow(cust_filt) == 0) { - cust_filt = matrix(0, rows=1, cols=1); -} - #print("LO,DATE,CUST,PART,SUPP") #print(toString(lineorder_matrix_min[1,])) #print(toString(date_matrix_min[1,])) @@ -157,64 +172,93 @@ if (nrow(cust_filt) == 0) { # Join LINEORDER table with CUST, SUPPLIER, PART, DATE tables (star schema) # Join order does matter! 
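Last editorial aside: the group-by below packs three attributes (d_year, s_nation, p_category) into one numeric key. The scale factors are chosen strictly larger than each attribute's maximum, so the packing is collision-free, order-preserving in the priority (d_year, s_nat, p_cat), and reversible with floor() and %%. The arithmetic in isolation, with toy values:

```
d_year = matrix("1997 1998", rows=2, cols=1);
s_nat  = matrix("3 1", rows=2, cols=1);
p_cat  = matrix("7 2", rows=2, cols=1);
s_f = ceil(max(s_nat)) + 1;                        # > any s_nat code
p_f = ceil(max(p_cat)) + 1;                        # > any p_cat code
key = d_year * s_f * p_f + s_nat * p_f + p_cat;    # pack, priority: year, nation, category
y_back = floor(key / (s_f * p_f));                 # unpack after aggregation
n_back = floor((key %% (s_f * p_f)) / p_f);
c_back = key %% p_f;
print(toString(cbind(key, y_back, n_back, c_back)));
```

Because the packed key preserves the lexicographic order of the three attributes, sorting the aggregated rows by the key column alone realizes the multi-column ORDER BY.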
# WHERE LO_CUSTKEY = C_CUSTKEY -lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); +if(hasRows){ + lo_cust = raJoin::m_raJoin(A=cust_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); + if(nrow(lo_cust[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_SUPPKEY = S_SUPPKEY -lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=4, method="hash2"); +if(hasRows){ + lo_cust_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_cust, colB=4, method="hash2"); + if(nrow(lo_cust_supp[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_PARTKEY = P_PARTKEY -lo_cust_supp_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lo_cust_supp, colB=5, method="hash2"); +if(hasRows){ + lo_cust_supp_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lo_cust_supp, colB=5, method="hash2"); + if(nrow(lo_cust_supp_part[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_ORDERDATE = D_DATEKEY # (D_DATEKEY | D_YEAR) | (P_PARTKEY | P_CATEGORY | (S_SUPPKEY | S_NATION | C_CUSTKEY | # LO_CUSTKEY | LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE | LO_SUPPLYCOST) -joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_cust_supp_part, colB=9, method="hash2"); - +if(hasRows){ + joined_matrix = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_cust_supp_part, colB=9, method="hash2"); + if(nrow(joined_matrix[,1]) == 0){ + hasRows = 0; + } +} # -- Group-By and Aggregation (SUM)-- -# Group-By -d_year = joined_matrix[,2] -p_cat = joined_matrix[,4] -s_nat = joined_matrix[,6] -lo_revenue = joined_matrix[,12] -lo_supplycost = joined_matrix[,13] -profit = lo_revenue - lo_supplycost; +if(hasRows){ + # Group-By + d_year = joined_matrix[,2] + p_cat = joined_matrix[,4] + s_nat = joined_matrix[,6] + lo_revenue = joined_matrix[,12] + lo_supplycost = joined_matrix[,13] + profit = lo_revenue - lo_supplycost; -# CALCULATING COMBINATION KEY WITH PRIORITY:1 D_YEAR, 2 S_NATION, 3 P_CATEGORY -max_d_year = max(d_year); -max_s_nat= max(s_nat); -max_p_cat = max(p_cat); + # CALCULATING COMBINATION KEY WITH PRIORITY:1 D_YEAR, 2 S_NATION, 3 P_CATEGORY + max_d_year = max(d_year); + max_s_nat= max(s_nat); + max_p_cat = max(p_cat); -d_year_scale_f = ceil(max_d_year) + 1; -s_nat_scale_f = ceil(max_s_nat) + 1; -p_cat_scale_f = ceil(max_p_cat) + 1; + d_year_scale_f = ceil(max_d_year) + 1; + s_nat_scale_f = ceil(max_s_nat) + 1; + p_cat_scale_f = ceil(max_p_cat) + 1; -combined_key = d_year * s_nat_scale_f * p_cat_scale_f + s_nat * p_cat_scale_f + p_cat; + combined_key = d_year * s_nat_scale_f * p_cat_scale_f + s_nat * p_cat_scale_f + p_cat; -group_input = cbind(profit, combined_key) + group_input = cbind(profit, combined_key) -agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); -#print(toString(agg_result[1,])); + agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); + #print(toString(agg_result[1,])); -# Aggregation (SUM) -key = agg_result[, 1]; -profit = rowSums(agg_result[, 2:ncol(agg_result)]); + # Aggregation (SUM) + key = agg_result[, 1]; + profit = rowSums(agg_result[, 2:ncol(agg_result)]); -# EXTRACTING D_YEAR, S_NATION, P_CATEGORY -d_year = round(floor(key / (s_nat_scale_f * p_cat_scale_f))); -s_nat = round(floor((key %% (s_nat_scale_f * p_cat_scale_f)) / p_cat_scale_f)); -p_cat = round(key %% p_cat_scale_f); + # EXTRACTING D_YEAR, S_NATION, P_CATEGORY + d_year = round(floor(key / (s_nat_scale_f * p_cat_scale_f))); + s_nat = round(floor((key %% (s_nat_scale_f * p_cat_scale_f)) / p_cat_scale_f)); + p_cat = round(key %% p_cat_scale_f); -result = cbind(d_year, s_nat, p_cat, 
profit, key); + result = cbind(d_year, s_nat, p_cat, profit, key); -# -- Sorting -- -- Sorting int columns works, but strings do not. -# ORDER BY D_YEAR, S_NATION, P_CATEGORY ASC -result_ordered = order(target=result, by=5, decreasing=FALSE, index.return=FALSE); + # -- Sorting -- -- Sorting int columns works, but strings do not. + # ORDER BY D_YEAR, S_NATION, P_CATEGORY ASC + result_ordered = order(target=result, by=5, decreasing=FALSE, index.return=FALSE); -s_nat_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_nat_meta); -p_cat_dec = transformdecode(target=result_ordered[,3], spec=general_spec, meta=part_cat_meta); + s_nat_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_nat_meta); + p_cat_dec = transformdecode(target=result_ordered[,3], spec=general_spec, meta=part_cat_meta); -res = cbind(as.frame(result_ordered[,1]), s_nat_dec, p_cat_dec, as.frame(result_ordered[,4])) ; + res = cbind(as.frame(result_ordered[,1]), s_nat_dec, p_cat_dec, as.frame(result_ordered[,4])) ; -# Print result -print("d_year | s_nation | p_category | PROFIT") -print(res) + # Print result + print("d_year | s_nation | p_category | PROFIT"); + print(res); -print("\nQ4.2 finished.\n"); \ No newline at end of file + print("\nQ4.2 finished.\n"); +} +else{ + # If the result table has 0 rows, skip group-by and aggregation. + # Print result + print("d_year | s_nation | p_category | PROFIT"); + print("The result table has 0 rows."); + + print("\nQ4.2 finished.\n"); +} \ No newline at end of file diff --git a/scripts/ssb/queries/q4_3.dml b/scripts/ssb/queries/q4_3.dml index 583f0db901f..56d0d31788e 100644 --- a/scripts/ssb/queries/q4_3.dml +++ b/scripts/ssb/queries/q4_3.dml @@ -56,7 +56,9 @@ date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header part_csv = read(input_dir + "/part.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); supp_csv = read(input_dir + "/supplier.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); +# General variables. general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; +hasRows = 1; # If hasRows = 0, the result table is empty. 
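+# Every later filter, join, and the final group-by/aggregation below is
+# wrapped in if(hasRows): as soon as one intermediate result turns out
+# empty, the rest of the pipeline is skipped and only the empty-result
+# message is printed.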
# -- Data preparation -- @@ -81,8 +83,10 @@ cust_matrix_min = as.matrix(cust_csv[, 1]); # WHERE D_YEAR = 1997 OR D_YEAR = 1998 d_filtA = raSel::m_raSelection(date_matrix_min, col=2, op="==", val=1997); d_filtB = raSel::m_raSelection(date_matrix_min, col=2, op="==", val=1998); - d_filt = rbind(d_filtA,d_filtB) +if(as.scalar(d_filt[1,1]) == 0){ + hasRows = 0; +} # Prepare PART table on-the-fly encodings # Extracted: COL-1 | COL-5 @@ -90,24 +94,27 @@ d_filt = rbind(d_filtA,d_filtB) # (only need p_brand encoding, filter by p_category string) [part_brand_enc_f, part_brand_meta] = transformencode(target=part_csv[,5], spec=general_spec); #print(toString(part_brand_enc_f)); - -# Build filtered PART table (p_brand == 'MFGR#14'), keeping key and encoded brand part_filt_keys = matrix(0, rows=0, cols=1); part_filt_brand = matrix(0, rows=0, cols=1); -for (i in 1:nrow(part_csv)) { - p_elem = as.scalar(part_csv[i,4]) - if ( p_elem == "MFGR#14" ) { - key_val = as.double(as.scalar(part_csv[i,1])); - brand_code = as.double(as.scalar(part_brand_enc_f[i,1])); - part_filt_keys = rbind(part_filt_keys, matrix(key_val, rows=1, cols=1)); - part_filt_brand = rbind(part_filt_brand, matrix(brand_code, rows=1, cols=1)); +part_filt = matrix(0, rows=0, cols=1); +if(hasRows){ + # Build filtered PART table (p_brand == 'MFGR#14'), keeping key and encoded brand + for (i in 1:nrow(part_csv)) { + p_elem = as.scalar(part_csv[i,4]) + if ( p_elem == "MFGR#14" ) { + key_val = as.double(as.scalar(part_csv[i,1])); + brand_code = as.double(as.scalar(part_brand_enc_f[i,1])); + part_filt_keys = rbind(part_filt_keys, matrix(key_val, rows=1, cols=1)); + part_filt_brand = rbind(part_filt_brand, matrix(brand_code, rows=1, cols=1)); + } + } + if (nrow(part_filt_keys) == 0) { + hasRows = 0; + } + else{ + part_filt = cbind(part_filt_keys, part_filt_brand); } } -if (nrow(part_filt_keys) == 0) { - part_filt_keys = matrix(0, rows=1, cols=1); - part_filt_brand = matrix(0, rows=1, cols=1); -} -part_filt = cbind(part_filt_keys, part_filt_brand); #print(part_filt[1,]) # Prepare SUPPLIER table on-the-fly encodings @@ -116,23 +123,25 @@ part_filt = cbind(part_filt_keys, part_filt_brand); # (only need S_CITY encoding, filter by S_NATION string) [supp_city_enc_f, supp_city_meta] = transformencode(target=supp_csv[,4], spec=general_spec); -# Build filtered SUPPLIER table (S_NATION == 'UNITED STATES') -supp_filt_keys = matrix(0, rows=0, cols=1); -supp_filt_city = matrix(0, rows=0, cols=1); -for (i in 1:nrow(supp_csv)) { - if (as.scalar(supp_csv[i,5]) == "UNITED STATES") { - key_val = as.double(as.scalar(supp_csv[i,1])); - city_code = as.double(as.scalar(supp_city_enc_f[i,1])); - supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1)); - supp_filt_city = rbind(supp_filt_city, matrix(city_code, rows=1, cols=1)); +if(hasRows){ + # Build filtered SUPPLIER table (S_NATION == 'UNITED STATES') + supp_filt_keys = matrix(0, rows=0, cols=1); + supp_filt_city = matrix(0, rows=0, cols=1); + for (i in 1:nrow(supp_csv)) { + if (as.scalar(supp_csv[i,5]) == "UNITED STATES") { + key_val = as.double(as.scalar(supp_csv[i,1])); + city_code = as.double(as.scalar(supp_city_enc_f[i,1])); + supp_filt_keys = rbind(supp_filt_keys, matrix(key_val, rows=1, cols=1)); + supp_filt_city = rbind(supp_filt_city, matrix(city_code, rows=1, cols=1)); + } + } + if (nrow(supp_filt_keys) == 0) { + hasRows = 0; + } + else{ + supp_filt = cbind(supp_filt_keys, supp_filt_city); } } -if (nrow(supp_filt_keys) == 0) { - supp_filt_keys = matrix(0, rows=1, cols=1); - supp_filt_city = 
matrix(0, rows=1, cols=1); -} -supp_filt = cbind(supp_filt_keys, supp_filt_city); - #print("LO,DATE,CUST,PART,SUPP") #print(toString(lineorder_matrix_min[1,])) #print(toString(date_matrix_min[1,])) @@ -145,63 +154,93 @@ supp_filt = cbind(supp_filt_keys, supp_filt_city); # Join LINEORDER table with PART, SUPPLIER, DATE, CUST tables (star schema) # Join order does matter! # WHERE LO_PARTKEY = P_PARTKEY -lo_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lineorder_matrix_min, colB=2, method="hash2"); +if(hasRows){ + lo_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lineorder_matrix_min, colB=2, method="hash2"); + if(nrow(lo_part[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_SUPPKEY = S_SUPPKEY -lo_part_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_part, colB=5, method="hash2"); +if(hasRows){ + lo_part_supp = raJoin::m_raJoin(A=supp_filt, colA=1, B=lo_part, colB=5, method="hash2"); + if(nrow(lo_part_supp[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_ORDERDATE = D_DATEKEY -lo_part_supp_date = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_part_supp, colB=8, method="hash2"); +if(hasRows){ + lo_part_supp_date = raJoin::m_raJoin(A=d_filt, colA=1, B=lo_part_supp, colB=8, method="hash2"); + if(nrow(lo_part_supp_date[,1]) == 0){ + hasRows = 0; + } +} # WHERE LO_CUSTKEY = C_CUSTKEY # (C_CUSTKEY) | (D_DATEKEY | D_YEAR | S_SUPPKEY | S_CITY | P_PARTKEY | P_BRAND | # LO_CUSTKEY | LO_PARTKEY | LO_SUPPKEY | LO_ORDERDATE | LO_REVENUE | LO_SUPPLYCOST) -joined_matrix = raJoin::m_raJoin(A=cust_matrix_min, colA=1, B=lo_part_supp_date, colB=7, method="hash2"); +if(hasRows){ + joined_matrix = raJoin::m_raJoin(A=cust_matrix_min, colA=1, B=lo_part_supp_date, colB=7, method="hash2"); + if(nrow(joined_matrix[,1]) == 0){ + hasRows = 0; + } +} +#print(nrow(joined_matrix[,1])); #print(toString(joined_matrix[1,])) # -- Group-By and Aggregation (SUM)-- +if(hasRows){ + # Group-By + d_year = joined_matrix[,3] + s_city = joined_matrix[,5] + p_brand = joined_matrix[,7] + lo_revenue = joined_matrix[,12] + lo_supplycost = joined_matrix[,13] + profit = lo_revenue - lo_supplycost; -# Group-By -d_year = joined_matrix[,3] -s_city = joined_matrix[,5] -p_brand = joined_matrix[,7] -lo_revenue = joined_matrix[,12] -lo_supplycost = joined_matrix[,13] -profit = lo_revenue - lo_supplycost; + # CALCULATING COMBINATION KEY WITH PRIORITY:1 D_YEAR, 2 S_CITY, 3 P_BRAND + max_d_year = max(d_year); + max_s_city= max(s_city); + max_p_brand = max(p_brand); -# CALCULATING COMBINATION KEY WITH PRIORITY:1 D_YEAR, 2 S_CITY, 3 P_BRAND -max_d_year = max(d_year); -max_s_city= max(s_city); -max_p_brand = max(p_brand); + d_year_scale_f = ceil(max_d_year) + 1; + s_city_scale_f = ceil(max_s_city) + 1; + p_brand_scale_f = ceil(max_p_brand) + 1; -d_year_scale_f = ceil(max_d_year) + 1; -s_city_scale_f = ceil(max_s_city) + 1; -p_brand_scale_f = ceil(max_p_brand) + 1; + combined_key = d_year * s_city_scale_f * p_brand_scale_f + s_city * p_brand_scale_f + p_brand; -combined_key = d_year * s_city_scale_f * p_brand_scale_f + s_city * p_brand_scale_f + p_brand; + group_input = cbind(profit, combined_key) + agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); -group_input = cbind(profit, combined_key) -agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); + # Aggregation (SUM) + key = agg_result[, 1]; + profit = rowSums(agg_result[, 2:ncol(agg_result)]); -# Aggregation (SUM) -key = agg_result[, 1]; -profit = rowSums(agg_result[, 2:ncol(agg_result)]); + # EXTRACTING D_YEAR, S_CITY, P_BRAND + d_year = round(floor(key / 
(s_city_scale_f * p_brand_scale_f))); + s_city = round(floor((key %% (s_city_scale_f * p_brand_scale_f)) / p_brand_scale_f)); + p_brand = round(key %% p_brand_scale_f); -# EXTRACTING D_YEAR, S_CITY, P_BRAND -d_year = round(floor(key / (s_city_scale_f * p_brand_scale_f))); -s_city = round(floor((key %% (s_city_scale_f * p_brand_scale_f)) / p_brand_scale_f)); -p_brand = round(key %% p_brand_scale_f); + result = cbind(d_year, s_city, p_brand, profit, key); -result = cbind(d_year, s_city, p_brand, profit, key); + # -- Sorting -- -- Sorting int columns works, but strings do not. + # ORDER BY D_YEAR, S_CITY, P_BRAND ASC + result_ordered = order(target=result, by=5, decreasing=FALSE, index.return=FALSE); -# -- Sorting -- -- Sorting int columns works, but strings do not. -# ORDER BY D_YEAR, S_CITY, P_BRAND ASC -result_ordered = order(target=result, by=5, decreasing=FALSE, index.return=FALSE); + s_city_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_city_meta); + p_brand_dec = transformdecode(target=result_ordered[,3], spec=general_spec, meta=part_brand_meta); -s_city_dec = transformdecode(target=result_ordered[,2], spec=general_spec, meta=supp_city_meta); -p_brand_dec = transformdecode(target=result_ordered[,3], spec=general_spec, meta=part_brand_meta); + res = cbind(as.frame(result_ordered[,1]), s_city_dec, p_brand_dec, as.frame(result_ordered[,4])) ; -res = cbind(as.frame(result_ordered[,1]), s_city_dec, p_brand_dec, as.frame(result_ordered[,4])) ; + # Print result + print("d_year | s_city | p_brand | PROFIT"); + print(res); -# Print result -print("d_year | s_city | p_brand | PROFIT"); -print(res); + print("\nQ4.3 finished.\n"); +} +else{ + # If the result table has 0 rows, skip group-by and aggregation. + # Print result + print("d_year | s_city | p_brand | PROFIT"); + print("The result table has 0 rows."); -print("\nQ4.3 finished.\n"); + print("\nQ4.3 finished.\n"); +} From 0b7e6bbce724bc6de5eaf2f66b8f076720ed2799 Mon Sep 17 00:00:00 2001 From: Johnn-ui2010 Date: Tue, 13 Jan 2026 14:29:11 +0100 Subject: [PATCH 12/22] Delete some unneeded files. --- scripts/ssb/queries/q2_1_groupby.dml | 122 -------------------- scripts/ssb/queries/simple_join_example.dml | 84 -------------- 2 files changed, 206 deletions(-) delete mode 100644 scripts/ssb/queries/q2_1_groupby.dml delete mode 100644 scripts/ssb/queries/simple_join_example.dml diff --git a/scripts/ssb/queries/q2_1_groupby.dml b/scripts/ssb/queries/q2_1_groupby.dml deleted file mode 100644 index 9de17dda531..00000000000 --- a/scripts/ssb/queries/q2_1_groupby.dml +++ /dev/null @@ -1,122 +0,0 @@ -/* DML-script implementing the ssb query Q1.1 in SystemDS. -**input_dir="/scripts/ssb/data" - -* Run with docker: -docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q2_1_groupby.dml -nvargs input_dir="/scripts/data/" - -SELECT SUM(lo_revenue), p_brand -FROM part, lineorder -WHERE - lo_partkey = p_partkey - AND p_category = 'MFGR#12' - GROUP BY p_brand - ORDER BY p_brand; - -*Please run the original SQL query (eg. in Postgres) -to verify the correctness of DML version. --> First tests: Works on the dataset with scale factor 0.1. - -*Based on the older implementations. -https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q1_1.dml -https://github.com/ghafek/systemds/blob/feature/ssb-benchmark/scripts/ssb/queries/q4_3.dml -In comparison to older version the join method was changed -from sort-merge to hash2 to improve the performance. 
- -Input parameters: -input_dir - Path to input directory containing the table files (e.g., ./data) -*/ - -# Call ra-modules with ra-functions. -source("./scripts/builtin/raSelection.dml") as raSel -source("./scripts/builtin/raJoin.dml") as raJoin -source("./scripts/builtin/raGroupby.dml") as raGrp - -# Set input parameters. -input_dir = ifdef($input_dir, "./data"); -print("Loading tables from directory: " + input_dir); - -# Read and load input CSV files from date and lineorder. -#date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); -lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); -part_csv = read(input_dir + "/part.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); - -general_spec = "{ \"ids\": false, \"recode\": [\"C1\"] }"; - -# -- Data preparation -- - -# Extract only the necessary columns from part table. -# Extracted: COL-4 | COL-13 -# => LO_PARTKEY | LO_REVENUE - -lineorder_csv_min = cbind(lineorder_csv[, 4], lineorder_csv[, 13]); -lineorder_matrix_min = as.matrix(lineorder_csv_min); - -# -- Filter the data with RA-SELECTION function. - -## Prepare PART on-the-fly encodings (only need p_brand encoding, filter by p_category string) -# We'll encode column 5 (p_brand) on-the-fly and later filter by category string 'MFGR#12'. -[part_brand_enc_f, part_brand_meta] = transformencode(target=part_csv[,5], spec=general_spec); -#print(toString(part_brand_enc_f)); - -# Build filtered PART table (p_category == 'MFGR#12'), keeping key and encoded brand -part_filt_keys = matrix(0, rows=0, cols=1); -part_filt_brand = matrix(0, rows=0, cols=1); -for (i in 1:nrow(part_csv)) { - if (as.scalar(part_csv[i,4]) == "MFGR#12") { - key_val = as.double(as.scalar(part_csv[i,1])); - brand_code = as.double(as.scalar(part_brand_enc_f[i,1])); - part_filt_keys = rbind(part_filt_keys, matrix(key_val, rows=1, cols=1)); - part_filt_brand = rbind(part_filt_brand, matrix(brand_code, rows=1, cols=1)); - } -} -if (nrow(part_filt_keys) == 0) { - part_filt_keys = matrix(0, rows=1, cols=1); - part_filt_brand = matrix(0, rows=1, cols=1); -} -part_filt = cbind(part_filt_keys, part_filt_brand); - -# -- Join -- -# Join LINEORDER and DATE tables WHERE LO_PARTKEY = P_PARTKEY -# P_PARTKEY | P_BRAND | LO_PARTKEY | LO_REVENUE -lo_part = raJoin::m_raJoin(A=part_filt, colA=1, B=lineorder_matrix_min, colB=1, method="hash2"); -#print(toString(lo_part[1,])) - -#print(lo_part[1,]) -# -- GROUP-BY & AGGREGATION -- -#print(toString(p_brand_dec)) -#print("LO-PART JOINED."); - -# -- Group-By and Aggregation (SUM)-- - -# Group-By -p_brand = lo_part[,2] -lo_revenue = lo_part[,4] - -# CALCULATING COMBINATION KEY WITH PRIORITY:P_BRAND -max_p_brand = max(p_brand); -p_brand_scale_f = ceil(max_p_brand) + 1; - -combined_key = p_brand; - -group_input = cbind(lo_revenue, combined_key) - -agg_result = raGrp::m_raGroupby(X=group_input, col=2, method="nested-loop"); - -# Aggregation (SUM) -key = agg_result[, 1]; -revenue = rowSums(agg_result[, 2:ncol(agg_result)]); -p_brand = round(key %% p_brand_scale_f); -result = cbind(p_brand, revenue); - -# -- Sorting -- -- Sorting not working!!! 
-# ORDER BY P_BRAND ASC -result_ordered = order(target=result, by=1, decreasing=FALSE, index.return=FALSE); - -p_brand_dec = transformdecode(target=result_ordered[,1], spec=general_spec, meta=part_brand_meta); -result = cbind(p_brand_dec, as.frame(result_ordered[,2])); - -# Print result -print("p_brand | SUM(lo_revenue)") -print(result) - -#print("Q4.2 finished.\n"); diff --git a/scripts/ssb/queries/simple_join_example.dml b/scripts/ssb/queries/simple_join_example.dml deleted file mode 100644 index a8014fcec85..00000000000 --- a/scripts/ssb/queries/simple_join_example.dml +++ /dev/null @@ -1,84 +0,0 @@ -/* -docker run -it --rm -v $PWD:/scripts/ apache/systemds -f /scripts/queries/simple_join_example.dml -nvargs input_dir="/scripts/data" -WARNING: Using incubator modules: jdk.incubator.vector -Hello SystemDS! -Hello, World! -Loading tables from directory: /scripts/data -SystemDS Statistics: -Total execution time: 1.992 sec. - - -An Error Occurred : - DMLRuntimeException -- org.apache.sysds.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 8 and 58 -- Error evaluating instruction: CP°FCall°./scripts/builtin/raJoin.dml°m_raJoin°true°5°1°A=_mVar88·MATRIX·FP64°colA=1·SCALAR·INT64·true°B=_mVar102·MATRIX·FP64°colB=1·SCALAR·INT64·true°method=hash·SCALAR·STRING·true°joined_matrix - DMLRuntimeException -- ERROR: Runtime error in program block generated from statement block between lines 8 and 58 -- Error evaluating instruction: CP°FCall°./scripts/builtin/raJoin.dml°m_raJoin°true°5°1°A=_mVar88·MATRIX·FP64°colA=1·SCALAR·INT64·true°B=_mVar102·MATRIX·FP64°colB=1·SCALAR·INT64·true°method=hash·SCALAR·STRING·true°joined_matrix - DMLRuntimeException -- error executing function ./scripts/builtin/raJoin.dml::m_raJoin - DMLRuntimeException -- ERROR: Runtime error in function program block generated from function statement block between lines 39 and 222 -- Error evaluating function program block - DMLRuntimeException -- ERROR: Runtime error in program block generated from statement block between lines 137 and 145 -- Error evaluating instruction: CP°ba+*°_mVar169·MATRIX·FP64°A·MATRIX·FP64°_mVar170·MATRIX·FP64°8 - RuntimeException -- Dimensions do not match for matrix multiplication (2496!=2557). - -*/ -#Start in systemds/scripts/ssb -#docker run -it -v $PWD:/scripts/ apache/systemds -f /scripts/queries/simple_join_example.dml -nvargs input_dir="/scripts/data" - -#Run and delete the container immediately. -#docker run -it --rm -v $PWD:/scripts/ apache/systemds -f /scripts/queries/simple_join_example.dml -nvargs input_dir="/scripts/data" - -print("Hello SystemDS!") -print("Hello, World!") - -/* DML-script implementing the ssb query Q1.1 in SystemDS. 
-**input_dir="/scripts/ssb/data" -SELECT COUNT(*) -FROM lineorder, date -WHERE - lo_orderdate = d_datekey - AND lo_quantity > 25; - -Usage: (We did not use here) -./bin/systemds scripts/ssb/queries/q1_1.dml -nvargs input_dir="/path/to/data" -./bin/systemds scripts/ssb/queries/q1_1.dml -nvargs input_dir="/Users/ghafekalsaho/Desktop/data" -or with explicit -f flag: -./bin/systemds -f scripts/ssb/queries/q1_1.dml -nvargs input_dir="/path/to/data" - -Parameters: -input_dir - Path to input directory containing the table files (e.g., ./data) -*/ -# -- SOURCING THE RA-FUNCTIONS -- -source("./scripts/builtin/raSelection.dml") as raSel -source("./scripts/builtin/raJoin.dml") as raJoin - -# -- PARAMETER HANDLING -- -input_dir = ifdef($input_dir, "./data"); -print("Loading tables from directory: " + input_dir); -#input_dir = ifdef($input_dir, "./data"); -#print("Loading tables from directory: " + input_dir); - -# -- READING INPUT FILES -- -# CSV TABLES -date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); -lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); - -# -- PREPARING -- -# EXTRACTING MINIMAL DATE DATA TO OPTIMIZE RUNTIME => COL-1 : DATE-KEY | COL-5 : YEAR -date_csv_min = cbind(date_csv[, 1], date_csv[, 5]); -date_matrix_min = as.matrix(date_csv_min); - -# EXTRACTING MINIMAL LINEORDER DATA TO OPTIMIZE RUNTIME => COL-6 : LO_ORDERDATE | -# COL-9 : LO_QUANTITY -lineorder_csv_min = cbind(lineorder_csv[, 16], lineorder_csv[, 6], lineorder_csv[, 9]); -lineorder_matrix_min = as.matrix(lineorder_csv_min); - -# LO_QUANTITY < 25 -lo_quan_filt = raSel::m_raSelection(lineorder_matrix_min, col=3, op="<", val=25); - -# -- JOIN TABLES WITH RA-JOIN FUNCTION -- -# JOINING FILTERED LINEORDER TABLE WITH FILTERED DATE TABLE WHERE LO_ORDERDATE = D_DATEKEY -joined_matrix = raJoin::m_raJoin(A=date_matrix_min, colA=1, B=lo_quan_filt, colB=1, method="hash"); -print("LO-DATE JOINED."); - -count = nrow(joined_matrix[,1]) -#print("COUNT: " + count) -print("COUNT: " + as.integer(count)) - -#print("Helsimple_join_example finished.\n"); - From 4888139b7ebf9aaab9baffa13d825d483766621a Mon Sep 17 00:00:00 2001 From: Johnn-ui2010 Date: Tue, 13 Jan 2026 17:55:41 +0100 Subject: [PATCH 13/22] Move ssb directory to staging. 
--- scripts/{ => staging}/ssb/ReadMe.md | 22 ++++++++++++++++++++++ scripts/{ => staging}/ssb/queries/q1_1.dml | 0 scripts/{ => staging}/ssb/queries/q1_2.dml | 4 +++- scripts/{ => staging}/ssb/queries/q1_3.dml | 0 scripts/{ => staging}/ssb/queries/q2_1.dml | 0 scripts/{ => staging}/ssb/queries/q2_2.dml | 0 scripts/{ => staging}/ssb/queries/q2_3.dml | 0 scripts/{ => staging}/ssb/queries/q3_1.dml | 0 scripts/{ => staging}/ssb/queries/q3_2.dml | 0 scripts/{ => staging}/ssb/queries/q3_3.dml | 0 scripts/{ => staging}/ssb/queries/q3_4.dml | 0 scripts/{ => staging}/ssb/queries/q4_1.dml | 0 scripts/{ => staging}/ssb/queries/q4_2.dml | 0 scripts/{ => staging}/ssb/queries/q4_3.dml | 0 scripts/{ => staging}/ssb/sql/q1.1.sql | 0 scripts/{ => staging}/ssb/sql/q1.2.sql | 0 scripts/{ => staging}/ssb/sql/q1.3.sql | 0 scripts/{ => staging}/ssb/sql/q2.1.sql | 0 scripts/{ => staging}/ssb/sql/q2.2.sql | 0 scripts/{ => staging}/ssb/sql/q2.3.sql | 0 scripts/{ => staging}/ssb/sql/q3.1.sql | 0 scripts/{ => staging}/ssb/sql/q3.2.sql | 0 scripts/{ => staging}/ssb/sql/q3.3.sql | 0 scripts/{ => staging}/ssb/sql/q3.4.sql | 0 scripts/{ => staging}/ssb/sql/q4.1.sql | 0 scripts/{ => staging}/ssb/sql/q4.2.sql | 0 scripts/{ => staging}/ssb/sql/q4.3.sql | 0 27 files changed, 25 insertions(+), 1 deletion(-) rename scripts/{ => staging}/ssb/ReadMe.md (80%) rename scripts/{ => staging}/ssb/queries/q1_1.dml (100%) rename scripts/{ => staging}/ssb/queries/q1_2.dml (97%) rename scripts/{ => staging}/ssb/queries/q1_3.dml (100%) rename scripts/{ => staging}/ssb/queries/q2_1.dml (100%) rename scripts/{ => staging}/ssb/queries/q2_2.dml (100%) rename scripts/{ => staging}/ssb/queries/q2_3.dml (100%) rename scripts/{ => staging}/ssb/queries/q3_1.dml (100%) rename scripts/{ => staging}/ssb/queries/q3_2.dml (100%) rename scripts/{ => staging}/ssb/queries/q3_3.dml (100%) rename scripts/{ => staging}/ssb/queries/q3_4.dml (100%) rename scripts/{ => staging}/ssb/queries/q4_1.dml (100%) rename scripts/{ => staging}/ssb/queries/q4_2.dml (100%) rename scripts/{ => staging}/ssb/queries/q4_3.dml (100%) rename scripts/{ => staging}/ssb/sql/q1.1.sql (100%) rename scripts/{ => staging}/ssb/sql/q1.2.sql (100%) rename scripts/{ => staging}/ssb/sql/q1.3.sql (100%) rename scripts/{ => staging}/ssb/sql/q2.1.sql (100%) rename scripts/{ => staging}/ssb/sql/q2.2.sql (100%) rename scripts/{ => staging}/ssb/sql/q2.3.sql (100%) rename scripts/{ => staging}/ssb/sql/q3.1.sql (100%) rename scripts/{ => staging}/ssb/sql/q3.2.sql (100%) rename scripts/{ => staging}/ssb/sql/q3.3.sql (100%) rename scripts/{ => staging}/ssb/sql/q3.4.sql (100%) rename scripts/{ => staging}/ssb/sql/q4.1.sql (100%) rename scripts/{ => staging}/ssb/sql/q4.2.sql (100%) rename scripts/{ => staging}/ssb/sql/q4.3.sql (100%) diff --git a/scripts/ssb/ReadMe.md b/scripts/staging/ssb/ReadMe.md similarity index 80% rename from scripts/ssb/ReadMe.md rename to scripts/staging/ssb/ReadMe.md index 2d3f262a983..810f099ca2a 100644 --- a/scripts/ssb/ReadMe.md +++ b/scripts/staging/ssb/ReadMe.md @@ -39,3 +39,25 @@ For more options look into the original documentation. - Use [SSB generator](https://github.com/eyalroz/ssb-dbgen) to generate data. - Run ssh scripts for experiments in the selected database systems. Use also scale factors. - Compare the runtime of each query in each system. 
+ +Run with: +``` +docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q4_3.dml -nvargs input_dir="/scripts/data/very_small_s0_01_dataset" +``` + +``` +docker exec -i postgres-ssb psql -U postgres -d ssb_s0_01 < ../sql/q1.2.sql +``` + +Troubleshooting: +Do not forget the correct paths. $(pwd) is where your main systemds directory is. +``` +export SYSTEMDS_ROOT=$(pwd) +export PATH=$SYSTEMDS_ROOT/bin:$PATH +``` +[Link](https://apache.github.io/systemds/site/run) + + +``` +mvn package #Use as compile. Why so long execution time? +``` \ No newline at end of file diff --git a/scripts/ssb/queries/q1_1.dml b/scripts/staging/ssb/queries/q1_1.dml similarity index 100% rename from scripts/ssb/queries/q1_1.dml rename to scripts/staging/ssb/queries/q1_1.dml diff --git a/scripts/ssb/queries/q1_2.dml b/scripts/staging/ssb/queries/q1_2.dml similarity index 97% rename from scripts/ssb/queries/q1_2.dml rename to scripts/staging/ssb/queries/q1_2.dml index 3bf9c533579..d376d8e6b8e 100644 --- a/scripts/ssb/queries/q1_2.dml +++ b/scripts/staging/ssb/queries/q1_2.dml @@ -4,6 +4,9 @@ * Run with docker: docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q1_1.dml -nvargs input_dir="/scripts/data/" +# Open in scripts/ssb/ +../../bin/systemds queries/q1_1.dml -nvargs input_dir="data/" + SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE FROM lineorder, date --dates WHERE @@ -34,7 +37,6 @@ source("./scripts/builtin/raJoin.dml") as raJoin # Set input parameters. input_dir = ifdef($input_dir, "./data"); print("Loading tables from directory: " + input_dir); - # Read and load input CSV files from date and lineorder. date_csv = read(input_dir + "/date.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); lineorder_csv = read(input_dir + "/lineorder.tbl", data_type="frame", format="csv", header=FALSE, sep="|"); diff --git a/scripts/ssb/queries/q1_3.dml b/scripts/staging/ssb/queries/q1_3.dml similarity index 100% rename from scripts/ssb/queries/q1_3.dml rename to scripts/staging/ssb/queries/q1_3.dml diff --git a/scripts/ssb/queries/q2_1.dml b/scripts/staging/ssb/queries/q2_1.dml similarity index 100% rename from scripts/ssb/queries/q2_1.dml rename to scripts/staging/ssb/queries/q2_1.dml diff --git a/scripts/ssb/queries/q2_2.dml b/scripts/staging/ssb/queries/q2_2.dml similarity index 100% rename from scripts/ssb/queries/q2_2.dml rename to scripts/staging/ssb/queries/q2_2.dml diff --git a/scripts/ssb/queries/q2_3.dml b/scripts/staging/ssb/queries/q2_3.dml similarity index 100% rename from scripts/ssb/queries/q2_3.dml rename to scripts/staging/ssb/queries/q2_3.dml diff --git a/scripts/ssb/queries/q3_1.dml b/scripts/staging/ssb/queries/q3_1.dml similarity index 100% rename from scripts/ssb/queries/q3_1.dml rename to scripts/staging/ssb/queries/q3_1.dml diff --git a/scripts/ssb/queries/q3_2.dml b/scripts/staging/ssb/queries/q3_2.dml similarity index 100% rename from scripts/ssb/queries/q3_2.dml rename to scripts/staging/ssb/queries/q3_2.dml diff --git a/scripts/ssb/queries/q3_3.dml b/scripts/staging/ssb/queries/q3_3.dml similarity index 100% rename from scripts/ssb/queries/q3_3.dml rename to scripts/staging/ssb/queries/q3_3.dml diff --git a/scripts/ssb/queries/q3_4.dml b/scripts/staging/ssb/queries/q3_4.dml similarity index 100% rename from scripts/ssb/queries/q3_4.dml rename to scripts/staging/ssb/queries/q3_4.dml diff --git a/scripts/ssb/queries/q4_1.dml b/scripts/staging/ssb/queries/q4_1.dml similarity index 100% rename from scripts/ssb/queries/q4_1.dml 
rename to scripts/staging/ssb/queries/q4_1.dml diff --git a/scripts/ssb/queries/q4_2.dml b/scripts/staging/ssb/queries/q4_2.dml similarity index 100% rename from scripts/ssb/queries/q4_2.dml rename to scripts/staging/ssb/queries/q4_2.dml diff --git a/scripts/ssb/queries/q4_3.dml b/scripts/staging/ssb/queries/q4_3.dml similarity index 100% rename from scripts/ssb/queries/q4_3.dml rename to scripts/staging/ssb/queries/q4_3.dml diff --git a/scripts/ssb/sql/q1.1.sql b/scripts/staging/ssb/sql/q1.1.sql similarity index 100% rename from scripts/ssb/sql/q1.1.sql rename to scripts/staging/ssb/sql/q1.1.sql diff --git a/scripts/ssb/sql/q1.2.sql b/scripts/staging/ssb/sql/q1.2.sql similarity index 100% rename from scripts/ssb/sql/q1.2.sql rename to scripts/staging/ssb/sql/q1.2.sql diff --git a/scripts/ssb/sql/q1.3.sql b/scripts/staging/ssb/sql/q1.3.sql similarity index 100% rename from scripts/ssb/sql/q1.3.sql rename to scripts/staging/ssb/sql/q1.3.sql diff --git a/scripts/ssb/sql/q2.1.sql b/scripts/staging/ssb/sql/q2.1.sql similarity index 100% rename from scripts/ssb/sql/q2.1.sql rename to scripts/staging/ssb/sql/q2.1.sql diff --git a/scripts/ssb/sql/q2.2.sql b/scripts/staging/ssb/sql/q2.2.sql similarity index 100% rename from scripts/ssb/sql/q2.2.sql rename to scripts/staging/ssb/sql/q2.2.sql diff --git a/scripts/ssb/sql/q2.3.sql b/scripts/staging/ssb/sql/q2.3.sql similarity index 100% rename from scripts/ssb/sql/q2.3.sql rename to scripts/staging/ssb/sql/q2.3.sql diff --git a/scripts/ssb/sql/q3.1.sql b/scripts/staging/ssb/sql/q3.1.sql similarity index 100% rename from scripts/ssb/sql/q3.1.sql rename to scripts/staging/ssb/sql/q3.1.sql diff --git a/scripts/ssb/sql/q3.2.sql b/scripts/staging/ssb/sql/q3.2.sql similarity index 100% rename from scripts/ssb/sql/q3.2.sql rename to scripts/staging/ssb/sql/q3.2.sql diff --git a/scripts/ssb/sql/q3.3.sql b/scripts/staging/ssb/sql/q3.3.sql similarity index 100% rename from scripts/ssb/sql/q3.3.sql rename to scripts/staging/ssb/sql/q3.3.sql diff --git a/scripts/ssb/sql/q3.4.sql b/scripts/staging/ssb/sql/q3.4.sql similarity index 100% rename from scripts/ssb/sql/q3.4.sql rename to scripts/staging/ssb/sql/q3.4.sql diff --git a/scripts/ssb/sql/q4.1.sql b/scripts/staging/ssb/sql/q4.1.sql similarity index 100% rename from scripts/ssb/sql/q4.1.sql rename to scripts/staging/ssb/sql/q4.1.sql diff --git a/scripts/ssb/sql/q4.2.sql b/scripts/staging/ssb/sql/q4.2.sql similarity index 100% rename from scripts/ssb/sql/q4.2.sql rename to scripts/staging/ssb/sql/q4.2.sql diff --git a/scripts/ssb/sql/q4.3.sql b/scripts/staging/ssb/sql/q4.3.sql similarity index 100% rename from scripts/ssb/sql/q4.3.sql rename to scripts/staging/ssb/sql/q4.3.sql From 28a11ef916e9180279c48a77c20face797b61056 Mon Sep 17 00:00:00 2001 From: Johnn-ui2010 Date: Fri, 16 Jan 2026 17:30:32 +0100 Subject: [PATCH 14/22] Docker compose initial version-Creating two containers. TO DO connect. --- scripts/staging/ssb/ReadMe.md | 8 ++++---- scripts/staging/ssb/test_scripts/Dockerfile | 16 +++++++++++++++ .../ssb/test_scripts/docker-compose.yaml | 20 +++++++++++++++++++ 3 files changed, 40 insertions(+), 4 deletions(-) create mode 100644 scripts/staging/ssb/test_scripts/Dockerfile create mode 100644 scripts/staging/ssb/test_scripts/docker-compose.yaml diff --git a/scripts/staging/ssb/ReadMe.md b/scripts/staging/ssb/ReadMe.md index 810f099ca2a..218c278fe4a 100644 --- a/scripts/staging/ssb/ReadMe.md +++ b/scripts/staging/ssb/ReadMe.md @@ -9,12 +9,12 @@ ## Setup -1. 
First, install [Docker](https://docs.docker.com/get-started/get-docker/) and its necessary libraries. +1. First, install [Docker](https://docs.docker.com/get-started/get-docker/), [Docker Compose](https://docs.docker.com/compose/install/) and its necessary libraries. - For Ubuntu, there is the [following tutorial using apt repository](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository). You can add [Docker Desktop](https://docs.docker.com/desktop/setup/install/linux/ubuntu/), too. + For Ubuntu, there is the following tutorials [for Docker](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository) and [Docker Compose](https://docs.docker.com/compose/install/linux/#install-using-the-repository) using apt repository. You can add [Docker Desktop](https://docs.docker.com/desktop/setup/install/linux/ubuntu/), too. -2. Now, follow the tutorials to install Docker versions of database systems [SystemDS](https://apache.github.io/systemds/site/docker), [PostgreSQL](https://hub.docker.com/_/postgres), .... +1. Now, follow the tutorials to install Docker versions of database systems [SystemDS](https://apache.github.io/systemds/site/docker), [PostgreSQL](https://hub.docker.com/_/postgres), .... If the example in the SystemDS link does not work, use that code line instead. Create a DML file, open its directory and execute the code. @@ -42,7 +42,7 @@ For more options look into the original documentation. Run with: ``` -docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q4_3.dml -nvargs input_dir="/scripts/data/very_small_s0_01_dataset" +docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q4_3.dml -nvargs input_dir="/scripts/data/..." ``` ``` diff --git a/scripts/staging/ssb/test_scripts/Dockerfile b/scripts/staging/ssb/test_scripts/Dockerfile new file mode 100644 index 00000000000..4f71315bb1d --- /dev/null +++ b/scripts/staging/ssb/test_scripts/Dockerfile @@ -0,0 +1,16 @@ +# Help: https://docs.docker.com/compose/gettingstarted/#step-1-set-up +# Star Schema Benchmark data set generator (ssb-dbgen): +# https://github.com/eyalroz/ssb-dbgen.git + +FROM alpine:latest +WORKDIR /ssb-dbgen +ARG scale #scale-factor of the generated data. +RUN apk update +RUN apk add git gcc cmake make musl-dev +RUN git clone https://github.com/eyalroz/ssb-dbgen.git --depth 1 +# Build the generator +WORKDIR /ssb-dbgen/ssb-dbgen +RUN echo "build and generate data with datagen." +RUN cmake -B ./build && cmake --build ./build +# Run the generator (with -s ) +RUN build/dbgen -b dists.dss -v -s $scale diff --git a/scripts/staging/ssb/test_scripts/docker-compose.yaml b/scripts/staging/ssb/test_scripts/docker-compose.yaml new file mode 100644 index 00000000000..fb7c8371c88 --- /dev/null +++ b/scripts/staging/ssb/test_scripts/docker-compose.yaml @@ -0,0 +1,20 @@ +# Use +#https://stackoverflow.com/questions/35231362/dockerfile-and-docker-compose-not-updating-with-new-instructions +#scale=[YOUR_SCALE] docker-compose build --no-cache +#scale=[YOUR_SCALE] docker-compose up -d + +# Example +#scale=0.01 docker-compose build --no-cache +#scale=0.01 docker-compose up -d +services: + datagen: + build: + context: . 
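+      # The 'scale' build argument is forwarded to ARG scale in the
+      # Dockerfile and sets the scale factor of the generated data.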
+ args: + - scale + + systemds: + image: apache/systemds:latest + #command: echo "I'm running ${COMPOSE_PROJECT_NAME}" + #command: docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/q1_3.dml -nvargs input_dir="/scripts/data/very_small_s0_01_dataset" + From 7d7381c7a816347b266b16a3df835089290cbe60 Mon Sep 17 00:00:00 2001 From: Johnn-ui2010 Date: Wed, 21 Jan 2026 02:39:54 +0100 Subject: [PATCH 15/22] shell run_script working as expected. docker compose version not working. --- scripts/staging/ssb/Dockerfile | 48 ++++++++++++++ scripts/staging/ssb/ReadMe.md | 61 +++++++++--------- scripts/staging/ssb/docker-compose.yaml | 35 ++++++++++ scripts/staging/ssb/run_script.sh | 64 +++++++++++++++++++ scripts/staging/ssb/test_scripts/Dockerfile | 16 ----- .../ssb/test_scripts/docker-compose.yaml | 20 ------ 6 files changed, 178 insertions(+), 66 deletions(-) create mode 100644 scripts/staging/ssb/Dockerfile create mode 100644 scripts/staging/ssb/docker-compose.yaml create mode 100755 scripts/staging/ssb/run_script.sh delete mode 100644 scripts/staging/ssb/test_scripts/Dockerfile delete mode 100644 scripts/staging/ssb/test_scripts/docker-compose.yaml diff --git a/scripts/staging/ssb/Dockerfile b/scripts/staging/ssb/Dockerfile new file mode 100644 index 00000000000..c7a0234dcbe --- /dev/null +++ b/scripts/staging/ssb/Dockerfile @@ -0,0 +1,48 @@ +# Help: https://docs.docker.com/compose/gettingstarted/#step-1-set-up +# Star Schema Benchmark data set generator (ssb-dbgen): +# https://github.com/eyalroz/ssb-dbgen.git + +FROM alpine:latest +ENV SCALE = 0.1 +WORKDIR /ssb-dbgen +ENV query_no = "0" #run the query number. +RUN apk update +RUN apk add git gcc cmake make musl-dev +RUN git clone https://github.com/eyalroz/ssb-dbgen.git --depth 1 +# Build the generator +WORKDIR /ssb-dbgen/ssb-dbgen +RUN echo "build and generate data with datagen." +RUN cmake -B ./build && cmake --build ./build +## Run the generator (with -s ) +RUN build/dbgen -b dists.dss -v -s $scale + +# TODO: Some code for copying the data files into a volume to +#make it available for other container and the host. +#This is working: + +FROM apache/systemds:latest +WORKDIR /input +WORKDIR /systemds +ENV QUERY=q1_1 +WORKDIR /systemds +COPY queries queries +# COPY host_data_dir container_data_dir +COPY data/very_small_s0_01_dataset data +CMD ["queries/$QUERY.dml","data"] + +#TODO: Currently only accepting one query each. +#FROM apache/systemds:latest +#WORKDIR /input +#ENV QUERY=q1_1 +#WORKDIR /systemds +#COPY queries queries +#COPY helper.sh helper.sh +#RUN chmod u+x helper.sh +# COPY host_data_dir container_data_dir +#COPY data/very_small_s0_01_dataset data + +#CMD if [ "$QUERY" = "all" ]; then \ +# ["./helper.sh", "A", "B"]; \ +# else \ +# ["queries/$QUERY.dml","data"]; \ +# fi diff --git a/scripts/staging/ssb/ReadMe.md b/scripts/staging/ssb/ReadMe.md index 218c278fe4a..015cf4dcd9a 100644 --- a/scripts/staging/ssb/ReadMe.md +++ b/scripts/staging/ssb/ReadMe.md @@ -5,10 +5,33 @@ - There are [13 queries already written in SQL](https://github.com/apache/doris/tree/master/tools/ssb-tools/ssb-queries). - There are existing DML relational algebra operators raSelect(), raJoin() and raGroupBy(). - Our task is to implement the DML version of these queries to run them in SystemDS. 
-- There are existing DML query implementations ([Git request](https://github.com/apache/systemds/pull/2280) and [code](https://github.com/ghafek/systemds/tree/feature/ssb-benchmark/scripts/ssb)) of the previous group which are a bit slow and contain errors. - -## Setup +- There are existing DML query implementations ([Git request](https://github.com/apache/systemds/pull/2280) and [code](https://github.com/apache/systemds/tree/main/scripts/staging/ssb)) of the previous group which are a bit slow and contain errors. +## General steps +- Prepare the setup. +- Translate/rewrite the queries into DML language to run them in SystemDS. +- Therefore, we should use these relational algebra operators in DML. +- Use [SSB generator](https://github.com/eyalroz/ssb-dbgen) to generate data. +- Run ssh scripts for experiments in the selected database systems. Use also scale factors. +- Compare the runtime of each query in each system. +## Run +To run our queries, we can execute the following **run_script.sh** script (in ssb directory). We can run in both modes. +- All queries +- A selected query +``` +./run_script.sh all [SCALE] # For all queries +./run_script.sh [QUERY_NUMBER] [SCALE] # For a selected query +``` +Example +``` +./run_script.sh all 0.1 +./run_script.sh q_4_3 0.1 +``` +### Further expansion: +Using docker compose. + +## Setup +(To run without the shell script) 1. First, install [Docker](https://docs.docker.com/get-started/get-docker/), [Docker Compose](https://docs.docker.com/compose/install/) and its necessary libraries. For Ubuntu, there is the following tutorials [for Docker](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository) and [Docker Compose](https://docs.docker.com/compose/install/linux/#install-using-the-repository) using apt repository. You can add [Docker Desktop](https://docs.docker.com/desktop/setup/install/linux/ubuntu/), too. @@ -19,9 +42,9 @@ If the example in the SystemDS link does not work, use that code line instead. Create a DML file, open its directory and execute the code. ``` -docker run -it --rm -v $PWD:/scripts apache/systemds:nightly -f /scripts/[file_name].dml +docker run -it --rm -v $PWD:/scripts apache/systemds:latest -f /scripts/[file_name].dml # Example -docker run -it --rm -v $PWD:/scripts apache/systemds:nightly -f /scripts/hello.dml +docker run -it --rm -v $PWD:/scripts apache/systemds:latest -f /scripts/hello.dml ``` 3. Clone the git repository of [ssb-dbgen (SSB data set generator)](https://github.com/eyalroz/ssb-dbgen/tree/master) and generate data with it. ``` @@ -32,32 +55,10 @@ build/dbgen -b dists.dss -v -s 1 ``` For more options look into the original documentation. -## General steps -- Prepare the setup. -- Translate/rewrite the queries into DML language to run them in SystemDS. -- Therefore, we should use these relational algebra operators in DML. -- Use [SSB generator](https://github.com/eyalroz/ssb-dbgen) to generate data. -- Run ssh scripts for experiments in the selected database systems. Use also scale factors. -- Compare the runtime of each query in each system. - Run with: ``` -docker run -it --rm -v $PWD:/scripts/ apache/systemds:nightly -f /scripts/queries/q4_3.dml -nvargs input_dir="/scripts/data/..." +docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/[QUERY_NUMBER].dml -nvargs input_dir="/scripts/data/..." +docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/q4_3.dml -nvargs input_dir="/scripts/data/..." 
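+# Note: "/scripts/data/..." is a placeholder; point input_dir at the
+# directory that actually contains the generated .tbl files.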
```
-```
-docker exec -i postgres-ssb psql -U postgres -d ssb_s0_01 < ../sql/q1.2.sql
-```
-
-Troubleshooting:
-Do not forget the correct paths. $(pwd) is where your main systemds directory is.
-```
-export SYSTEMDS_ROOT=$(pwd)
-export PATH=$SYSTEMDS_ROOT/bin:$PATH
-```
-[Link](https://apache.github.io/systemds/site/run)
-
-
-```
-mvn package #Use as compile. Why so long execution time?
-```
\ No newline at end of file
+
+To compare the correctness and do benchmarks, PostgreSQL can be used.
\ No newline at end of file
diff --git a/scripts/staging/ssb/docker-compose.yaml b/scripts/staging/ssb/docker-compose.yaml
new file mode 100644
index 00000000000..ae55faadd2e
--- /dev/null
+++ b/scripts/staging/ssb/docker-compose.yaml
@@ -0,0 +1,35 @@
+# Use
+#https://stackoverflow.com/questions/35231362/dockerfile-and-docker-compose-not-updating-with-new-instructions
+#docker-compose --build --no-cache
+#docker-compose up
+
+# Example
+#docker-compose up --build --no-cache
+#docker-compose up
+
+# Create .env file and modify before each docker compose up.
+#SCALE=[OUR_VALUE]
+#QUERY=[OUR_QUERY_NUMBER]
+#Example:
+#SCALE=0.01
+#QUERY=q1_1
+
+#This docker-compose file is linked to Dockerfile.
+
+services:
+  datagen:
+    build:
+      context: .
+    environment:
+      SCALE: ${SCALE}
+    #volumes:
+    #  - dgvolume:/home/ssb-dbgen/ssb-dbgen
+  systemds:
+    build:
+      context: .
+    environment:
+      QUERY: ${QUERY}
+
+#volumes:
+  #dgvolume:
+  #  external: true
\ No newline at end of file
diff --git a/scripts/staging/ssb/run_script.sh b/scripts/staging/ssb/run_script.sh
new file mode 100755
index 00000000000..95b41a0b90a
--- /dev/null
+++ b/scripts/staging/ssb/run_script.sh
@@ -0,0 +1,64 @@
+#!/bin/bash
+#Mark as executable.
+#chmod +x run_script.sh
+
+#You can run it in two modes.
+# ./run_script.sh all [SCALE] # For all queries
+# ./run_script.sh q[QUERY_NUMBER] [SCALE] # For a certain query
+#Example
+# ./run_script.sh all 0.1
+# ./run_script.sh q4_3 0.1
+QUERY=$1
+SCALE=$2
+
+# Colors for output
+GREEN='\033[0;32m'
+BLUE='\033[0;34m'
+RED='\033[0;31m'
+NC='\033[0m' # No Color
+
+echo -e "${BLUE}=== Test environment with SSB Data Loader ===${NC}\n"
+
+echo "Arg 0 (script): $0"
+echo "Arg 1 (QUERY): $1"
+echo "Arg 2 (SCALE): $2"
+
+# Install docker.
+echo -e "${GREEN}Install packages${NC}"
+echo -e "${BLUE}sudo apt install docker.io git gcc cmake make${NC}"
+sudo apt install docker.io git gcc cmake make
+
+# Check whether the data directory exists.
+#cd ..
+echo -e "${GREEN}Check for existing data directory and prepare the ssb-dbgen${NC}"
+mkdir -p data_dir
+if [ ! -d ssb-dbgen ]; then
+    git clone https://github.com/eyalroz/ssb-dbgen.git --depth 1
+    cd ssb-dbgen
+else
+    cd ssb-dbgen
+    git pull
+fi
+
+echo -e "${GREEN}Build ssb-dbgen and generate data with a given scale factor${NC}"
+# Build the generator
+cmake -B ./build && cmake --build ./build
+# Run the generator (with -s )
+build/dbgen -b dists.dss -v -s $SCALE
+mv *.tbl ../data_dir
+
+echo -e "${GREEN}Executing DML queries${NC}"
+
+##all: {"q1_1","q1_2","q1_3","q2_1","q2_2","q2_3","q3_1","q3_2","q3_3","q3_4","q4_1","q4_2","q4_3"}
+if [[ $QUERY = "all" ]]
+then
+  echo "Execute all 13 queries."
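+  # Bash brace expansion turns the quoted list below into the 13 query
+  # names; each query runs in its own throwaway SystemDS container.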
+ for q in {"q1_1","q1_2","q1_3","q2_1","q2_2","q2_3","q3_1","q3_2","q3_3","q3_4","q4_1","q4_2","q4_3"} + do + docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/$q.dml -nvargs input_dir="/scripts/data_dir" + done +else + echo "Execute query $QUERY" + docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/$QUERY.dml -nvargs input_dir="/scripts/data_dir" +fi diff --git a/scripts/staging/ssb/test_scripts/Dockerfile b/scripts/staging/ssb/test_scripts/Dockerfile deleted file mode 100644 index 4f71315bb1d..00000000000 --- a/scripts/staging/ssb/test_scripts/Dockerfile +++ /dev/null @@ -1,16 +0,0 @@ -# Help: https://docs.docker.com/compose/gettingstarted/#step-1-set-up -# Star Schema Benchmark data set generator (ssb-dbgen): -# https://github.com/eyalroz/ssb-dbgen.git - -FROM alpine:latest -WORKDIR /ssb-dbgen -ARG scale #scale-factor of the generated data. -RUN apk update -RUN apk add git gcc cmake make musl-dev -RUN git clone https://github.com/eyalroz/ssb-dbgen.git --depth 1 -# Build the generator -WORKDIR /ssb-dbgen/ssb-dbgen -RUN echo "build and generate data with datagen." -RUN cmake -B ./build && cmake --build ./build -# Run the generator (with -s ) -RUN build/dbgen -b dists.dss -v -s $scale diff --git a/scripts/staging/ssb/test_scripts/docker-compose.yaml b/scripts/staging/ssb/test_scripts/docker-compose.yaml deleted file mode 100644 index fb7c8371c88..00000000000 --- a/scripts/staging/ssb/test_scripts/docker-compose.yaml +++ /dev/null @@ -1,20 +0,0 @@ -# Use -#https://stackoverflow.com/questions/35231362/dockerfile-and-docker-compose-not-updating-with-new-instructions -#scale=[YOUR_SCALE] docker-compose build --no-cache -#scale=[YOUR_SCALE] docker-compose up -d - -# Example -#scale=0.01 docker-compose build --no-cache -#scale=0.01 docker-compose up -d -services: - datagen: - build: - context: . - args: - - scale - - systemds: - image: apache/systemds:latest - #command: echo "I'm running ${COMPOSE_PROJECT_NAME}" - #command: docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/q1_3.dml -nvargs input_dir="/scripts/data/very_small_s0_01_dataset" - From 8e4364462ad768a39eca23e72034c8ee5b907f57 Mon Sep 17 00:00:00 2001 From: Johnn-ui2010 Date: Wed, 21 Jan 2026 22:20:07 +0100 Subject: [PATCH 16/22] Docker script works for a selected query, but not all at once, --- scripts/staging/ssb/Dockerfile | 32 ++++++------- scripts/staging/ssb/ReadMe.md | 63 +++++++++++++++---------- scripts/staging/ssb/docker-compose.yaml | 14 ++---- scripts/staging/ssb/run_script.sh | 2 +- 4 files changed, 56 insertions(+), 55 deletions(-) diff --git a/scripts/staging/ssb/Dockerfile b/scripts/staging/ssb/Dockerfile index c7a0234dcbe..bc7f7b242e1 100644 --- a/scripts/staging/ssb/Dockerfile +++ b/scripts/staging/ssb/Dockerfile @@ -1,41 +1,37 @@ # Help: https://docs.docker.com/compose/gettingstarted/#step-1-set-up +# https://docs.docker.com/build/building/multi-stage/ # Star Schema Benchmark data set generator (ssb-dbgen): # https://github.com/eyalroz/ssb-dbgen.git -FROM alpine:latest +# We use multi-stage method. + +# First create the datagen docker container +FROM alpine:latest AS datagen ENV SCALE = 0.1 -WORKDIR /ssb-dbgen -ENV query_no = "0" #run the query number. RUN apk update RUN apk add git gcc cmake make musl-dev RUN git clone https://github.com/eyalroz/ssb-dbgen.git --depth 1 # Build the generator -WORKDIR /ssb-dbgen/ssb-dbgen +WORKDIR /ssb-dbgen RUN echo "build and generate data with datagen." 
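# (cmake configures and compiles ssb-dbgen; the generator binary ends up at build/dbgen.)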
RUN cmake -B ./build && cmake --build ./build -## Run the generator (with -s ) -RUN build/dbgen -b dists.dss -v -s $scale - -# TODO: Some code for copying the data files into a volume to -#make it available for other container and the host. -#This is working: +# Run the generator (with -s ) +RUN build/dbgen -b dists.dss -v -s $SCALE +RUN mkdir -p ../data +RUN mv *.tbl ../data +# Second: use the systemds docker container +# And execute a selected query. FROM apache/systemds:latest WORKDIR /input WORKDIR /systemds ENV QUERY=q1_1 WORKDIR /systemds COPY queries queries -# COPY host_data_dir container_data_dir -COPY data/very_small_s0_01_dataset data +COPY --from=datagen data data CMD ["queries/$QUERY.dml","data"] -#TODO: Currently only accepting one query each. -#FROM apache/systemds:latest -#WORKDIR /input -#ENV QUERY=q1_1 -#WORKDIR /systemds -#COPY queries queries +#TODO: Currently only accepting one query each. To expand to accept more queries. #COPY helper.sh helper.sh #RUN chmod u+x helper.sh # COPY host_data_dir container_data_dir diff --git a/scripts/staging/ssb/ReadMe.md b/scripts/staging/ssb/ReadMe.md index 015cf4dcd9a..3ca0d48242c 100644 --- a/scripts/staging/ssb/ReadMe.md +++ b/scripts/staging/ssb/ReadMe.md @@ -5,39 +5,15 @@ - There are [13 queries already written in SQL](https://github.com/apache/doris/tree/master/tools/ssb-tools/ssb-queries). - There are existing DML relational algebra operators raSelect(), raJoin() and raGroupBy(). - Our task is to implement the DML version of these queries to run them in SystemDS. -- There are existing DML query implementations ([Git request](https://github.com/apache/systemds/pull/2280) and [code](https://github.com/apache/systemds/tree/main/scripts/staging/ssb)) of the previous group which are a bit slow and contain errors. - -## General steps -- Prepare the setup. -- Translate/rewrite the queries into DML language to run them in SystemDS. -- Therefore, we should use these relational algebra operators in DML. -- Use [SSB generator](https://github.com/eyalroz/ssb-dbgen) to generate data. -- Run ssh scripts for experiments in the selected database systems. Use also scale factors. -- Compare the runtime of each query in each system. -## Run -To run our queries, we can execute the following **run_script.sh** script (in ssb directory). We can run in both modes. -- All queries -- A selected query -``` -./run_script.sh all [SCALE] # For all queries -./run_script.sh [QUERY_NUMBER] [SCALE] # For a selected query -``` -Example -``` -./run_script.sh all 0.1 -./run_script.sh q_4_3 0.1 -``` -### Further expansion: -Using docker compose. +- There are existing DML query implementations ([Git request](https://github.com/apache/systemds/pull/2280) and [code](https://github.com/apache/systemds/tree/main/scripts/staging/ssb)) of the previous group which are a bit slow and contain errors. They also provided longer scripts to run experiments in SystemDS, Postgres and DuckDB. ## Setup -(To run without the shell script) 1. First, install [Docker](https://docs.docker.com/get-started/get-docker/), [Docker Compose](https://docs.docker.com/compose/install/) and its necessary libraries. For Ubuntu, there is the following tutorials [for Docker](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository) and [Docker Compose](https://docs.docker.com/compose/install/linux/#install-using-the-repository) using apt repository. You can add [Docker Desktop](https://docs.docker.com/desktop/setup/install/linux/ubuntu/), too. -1. 
Now, follow the tutorials to install Docker versions of database systems [SystemDS](https://apache.github.io/systemds/site/docker)

If the example in the SystemDS link does not work, use that code line instead. Create a DML file, open its directory and execute the code.
@@ -61,4 +37,39 @@ docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries
 docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/q4_3.dml -nvargs input_dir="/scripts/data/..."
 ```
+## Run scripts
+### Using shell script
+To run our queries, we can execute the following **run_script.sh** script (in the ssb directory). It can run in two modes.
+- All queries
+- A selected query
+```
+./run_script.sh all [SCALE]            # For all queries
+./run_script.sh [QUERY_NUMBER] [SCALE] # For a selected query
+```
+Example
+```
+./run_script.sh all 0.1
+./run_script.sh q4_3 0.1
+```
+### Using docker compose
+To run our queries, we can use docker compose (in the ssb directory). It currently supports one mode.
+- A selected query
+
+```
+docker-compose up --build
+docker-compose up
+```
+Create an `.env` file and modify it before each `docker compose up`.
+```
+# in .env file
+SCALE=[OUR_VALUE]
+QUERY=[OUR_QUERY_NUMBER]
+```
+```
+#Example:
+# in .env file
+SCALE=0.01
+QUERY=q1_1
+```
+### Further considerations
+To check correctness and run benchmarks, PostgreSQL can be used.
\ No newline at end of file
diff --git a/scripts/staging/ssb/docker-compose.yaml b/scripts/staging/ssb/docker-compose.yaml
index ae55faadd2e..ab2d221e9d6 100644
--- a/scripts/staging/ssb/docker-compose.yaml
+++ b/scripts/staging/ssb/docker-compose.yaml
@@ -1,10 +1,10 @@
 # Use
 #https://stackoverflow.com/questions/35231362/dockerfile-and-docker-compose-not-updating-with-new-instructions
-#docker-compose --build --no-cache
+#docker-compose build
 #docker-compose up
 
 # Example
-#docker-compose up --build --no-cache
+#docker-compose up --build
 #docker-compose up
 
 # Create .env file and modify before each docker compose up.
@@ -14,7 +14,7 @@
 #SCALE=0.01
 #QUERY=q1_1
 
-#This docker-compose file is linked to Dockerfile.
+#This docker-compose file is linked to the Dockerfile.
 
 services:
   datagen:
@@ -22,14 +22,8 @@ services:
     context: .
     environment:
       SCALE: ${SCALE}
-    #volumes:
-    #  - dgvolume:/home/ssb-dbgen/ssb-dbgen
   systemds:
     build:
       context: .
     environment:
-      QUERY: ${QUERY}
-
-#volumes:
-  #dgvolume:
-  #  external: true
\ No newline at end of file
+      QUERY: ${QUERY}
\ No newline at end of file
diff --git a/scripts/staging/ssb/run_script.sh b/scripts/staging/ssb/run_script.sh
index 95b41a0b90a..141bb19e7b9 100755
--- a/scripts/staging/ssb/run_script.sh
+++ b/scripts/staging/ssb/run_script.sh
@@ -48,7 +48,7 @@ cmake -B ./build && cmake --build ./build
 build/dbgen -b dists.dss -v -s $SCALE
 mv *.tbl ../data_dir
 
-echo -e "${GREEN}Executing DML queries${NC}"
+echo -e "${GREEN}Execute DML queries${NC}"
 
 ##all: {"q1_1","q1_2","q1_3","q2_1","q2_2","q2_3","q3_1","q3_2","q3_3","q3_4","q4_1","q4_2","q4_3"}
 if [[ $QUERY = "all" ]]

From e7b2c8f7be28b7ff0993941e58561a15ea5d8184 Mon Sep 17 00:00:00 2001
From: Johnn-ui2010
Date: Fri, 23 Jan 2026 01:04:51 +0100
Subject: [PATCH 17/22] Updated version. TODO: bind DuckDB and solve other TODOs in the shell scripts.
--- scripts/staging/ssb/Dockerfile | 52 ++------ scripts/staging/ssb/ReadMe.md | 95 +++++++------- scripts/staging/ssb/docker-compose.yaml | 27 ++-- .../ssb/other/dia_ssb_script_structure1.jpg | Bin 0 -> 36854 bytes scripts/staging/ssb/other/ssb_init.sql | 124 ++++++++++++++++++ .../ssb/other_docker_compose/Dockerfile | 80 +++++++++++ .../other_docker_compose/docker-compose.yaml | 46 +++++++ scripts/staging/ssb/run_script.sh | 64 --------- scripts/staging/ssb/shell/run_script.sh | 104 +++++++++++++++ 9 files changed, 425 insertions(+), 167 deletions(-) create mode 100644 scripts/staging/ssb/other/dia_ssb_script_structure1.jpg create mode 100644 scripts/staging/ssb/other/ssb_init.sql create mode 100644 scripts/staging/ssb/other_docker_compose/Dockerfile create mode 100644 scripts/staging/ssb/other_docker_compose/docker-compose.yaml delete mode 100755 scripts/staging/ssb/run_script.sh create mode 100755 scripts/staging/ssb/shell/run_script.sh diff --git a/scripts/staging/ssb/Dockerfile b/scripts/staging/ssb/Dockerfile index bc7f7b242e1..da4dac7bf8f 100644 --- a/scripts/staging/ssb/Dockerfile +++ b/scripts/staging/ssb/Dockerfile @@ -1,44 +1,14 @@ # Help: https://docs.docker.com/compose/gettingstarted/#step-1-set-up -# https://docs.docker.com/build/building/multi-stage/ -# Star Schema Benchmark data set generator (ssb-dbgen): -# https://github.com/eyalroz/ssb-dbgen.git -# We use multi-stage method. +FROM postgres:latest +# Copy data into container +COPY data_dir tmp +# Init the data and load to the database with a sql script. +COPY other/ssb_init.sql /docker-entrypoint-initdb.d/ +WORKDIR /tmp +RUN sed -i 's/|$//' "customer.tbl" +RUN sed -i 's/|$//' "part.tbl" +RUN sed -i 's/|$//' "supplier.tbl" +RUN sed -i 's/|$//' "date.tbl" +RUN sed -i 's/|$//' "lineorder.tbl" -# First create the datagen docker container -FROM alpine:latest AS datagen -ENV SCALE = 0.1 -RUN apk update -RUN apk add git gcc cmake make musl-dev -RUN git clone https://github.com/eyalroz/ssb-dbgen.git --depth 1 -# Build the generator -WORKDIR /ssb-dbgen -RUN echo "build and generate data with datagen." -RUN cmake -B ./build && cmake --build ./build -# Run the generator (with -s ) -RUN build/dbgen -b dists.dss -v -s $SCALE -RUN mkdir -p ../data -RUN mv *.tbl ../data - -# Second: use the systemds docker container -# And execute a selected query. -FROM apache/systemds:latest -WORKDIR /input -WORKDIR /systemds -ENV QUERY=q1_1 -WORKDIR /systemds -COPY queries queries -COPY --from=datagen data data -CMD ["queries/$QUERY.dml","data"] - -#TODO: Currently only accepting one query each. To expand to accept more queries. -#COPY helper.sh helper.sh -#RUN chmod u+x helper.sh -# COPY host_data_dir container_data_dir -#COPY data/very_small_s0_01_dataset data - -#CMD if [ "$QUERY" = "all" ]; then \ -# ["./helper.sh", "A", "B"]; \ -# else \ -# ["queries/$QUERY.dml","data"]; \ -# fi diff --git a/scripts/staging/ssb/ReadMe.md b/scripts/staging/ssb/ReadMe.md index 3ca0d48242c..72e4d43edb3 100644 --- a/scripts/staging/ssb/ReadMe.md +++ b/scripts/staging/ssb/ReadMe.md @@ -1,75 +1,74 @@ # Star Schema Benchmark (SSB) for SystemDS [SystemDS-3862](https://issues.apache.org/jira/browse/SYSTEMDS-3862) -## Foundation: +## Foundation - There are [13 queries already written in SQL](https://github.com/apache/doris/tree/master/tools/ssb-tools/ssb-queries). -- There are existing DML relational algebra operators raSelect(), raJoin() and raGroupBy(). -- Our task is to implement the DML version of these queries to run them in SystemDS. 
-- There are existing DML query implementations ([Git request](https://github.com/apache/systemds/pull/2280) and [code](https://github.com/apache/systemds/tree/main/scripts/staging/ssb)) of the previous group which are a bit slow and contain errors. They also provided longer scripts to run experiments in SystemDS, Postgres and DuckDB.
+- There are existing DML relational algebra operations raSelect(), raJoin() and raGroupBy().
+- Our task is to implement the DML version of these queries and run them in SystemDS and PostgreSQL. Therefore, we provide a shell script.
+- There are existing DML query implementations ([Git request](https://github.com/apache/systemds/pull/2280) and [code](https://github.com/apache/systemds/tree/main/scripts/staging/ssb)) of the previous group which are a bit slow and contain errors. They also provided longer scripts to run experiments in SystemDS, PostgreSQL and DuckDB.

 ## Setup
-1. First, install [Docker](https://docs.docker.com/get-started/get-docker/), [Docker Compose](https://docs.docker.com/compose/install/) and its necessary libraries.
+- First, install [Docker](https://docs.docker.com/get-started/get-docker/), [Docker Compose](https://docs.docker.com/compose/install/) and its necessary libraries.

     For Ubuntu, there is the following tutorials [for Docker](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository) and [Docker Compose](https://docs.docker.com/compose/install/linux/#install-using-the-repository) using apt repository. You can add [Docker Desktop](https://docs.docker.com/desktop/setup/install/linux/ubuntu/), too.

+The shell script covers the installation of the following points.

-1. Now, follow the tutorials to install Docker versions of database systems [SystemDS](https://apache.github.io/systemds/site/docker)
-
+- Docker version of the database system [SystemDS](https://apache.github.io/systemds/site/docker)
+- Docker compose version of [PostgreSQL](docker-compose.yaml) based on its [documentation](https://hub.docker.com/_/postgres).
+- [ssb-dbgen](https://github.com/eyalroz/ssb-dbgen/tree/master) (SSB data set generator `datagen`)

-If the example in the SystemDS link does not work, use that code line instead. Create a DML file, open its directory and execute the code.
-```
-docker run -it --rm -v $PWD:/scripts apache/systemds:latest -f /scripts/[file_name].dml
-# Example
-docker run -it --rm -v $PWD:/scripts apache/systemds:latest -f /scripts/hello.dml
-```
-3. Clone the git repository of [ssb-dbgen (SSB data set generator)](https://github.com/eyalroz/ssb-dbgen/tree/master) and generate data with it.
-```
-# Build the generator
-cmake -B ./build && cmake --build ./build
-# Run the generator (with -s )
-build/dbgen -b dists.dss -v -s 1
-```
 For more options look into the original documentation.
-Run with:
-```
-docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/[QUERY_NUMBER].dml -nvargs input_dir="/scripts/data/..."
-docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/q4_3.dml -nvargs input_dir="/scripts/data/..."
-```
+## Structure of the test system
+![diagram](other/dia_ssb_script_structure1.jpg)
+The diagram above depicts the structure of our test setup.
+The data is generated by datagen and stored locally (on localhost).
+
+After that, it is copied into the database containers (SystemDS, PostgreSQL and DuckDB) where the queries are executed.
+
+ `TODO`: DuckDB in Docker or locally?
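+As a rough sketch, the flow in the diagram corresponds to the following commands (a minimal, hypothetical walk-through; the container name `ssb-postgres-1` and the directory `data_dir` are the defaults assumed by our compose setup and script, adjust them as needed):
+```
+# 1. Generate the .tbl files locally with ssb-dbgen
+build/dbgen -b dists.dss -v -s 0.1
+# 2. Copy the generated tables into the PostgreSQL container
+docker cp data_dir/. ssb-postgres-1:/tmp
+# 3. Run a query in the SystemDS container against the mounted data
+docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest \
+  -f /scripts/queries/q1_1.dml -nvargs input_dir="/scripts/data_dir"
+```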
## Run scripts
-### Using shell script
-To run our queries, we can execute the following **run_script.sh** script (in the ssb directory). It can run in two modes:
-- All queries
-- A selected query
+To run our queries, we can execute the following shell script `run_script.sh` (in the ssb directory). It has the following three parameters.
+1. `QUERY_NAME`: Name of the query to be executed.
+    - **all**: executes all queries
+    - **[QUERY_NAME]** like q1_1: Executes the selected query like q1_1.dml.
+2. `SCALE`: The numerical scale factor like 0.01 or 1 etc.
+    - Be careful: Please do not experiment with a scale factor over 0.2 in SystemDS. Its join operation is currently very slow.
+3. `DB_SYSTEM`: Name of the database system used.
+    - **all**: executes the queries in all database systems
+    - **systemds**: SystemDS
+    - **postgres**: PostgreSQL
+    - ( **duckdb**: DuckDB)
+
+The order should be as follows:
 ```
-./run_script.sh all [SCALE] # For all queries
-./run_script.sh [QUERY_NUMBER] [SCALE] # For a selected query
+./run_script.sh [QUERY_NAME] [SCALE] [DB_SYSTEM]
 ```
-Example
+Examples
 ```
-./run_script.sh all 0.1
-./run_script.sh q4_3 0.1
+./run_script.sh all 0.1 all
+./run_script.sh q4_3 0.1 systemds
+./run_script.sh all 0.1 postgres
 ```
-### Using docker compose
-To run our queries, we can use docker compose (in the ssb directory). It currently supports one mode:
-- A selected query
-
+### Before running the script
+Before running the script, create an .env file to set the PostgreSQL environment variables.
 ```
-docker-compose up --build
-docker-compose up
+# in .env file
+POSTGRES_USER=[YOUR_USERNAME]
+POSTGRES_PASSWORD=[YOUR_PASSWORD]
+POSTGRES_DB=[YOUR_DB_NAME]
+PORT_NUMBER=[YOUR_PORT_NUMBER]
 ```
-Create an .env file and modify it before each "docker compose up".
+
+Mark the script as executable.
 ```
-# in .env file
-SCALE=[OUR_VALUE]
-QUERY=[OUR_QUERY_NUMBER]
+chmod +x run_script.sh
 ```
+Run this command to load the environment variables.
 ```
-#Example:
-# in .env file
-SCALE=0.01
-QUERY=q1_1
+source .env
 ```
 ### Further considerations.
 To compare the correctness and do benchmarks, PostgreSQL can be used.
\ No newline at end of file
diff --git a/scripts/staging/ssb/docker-compose.yaml b/scripts/staging/ssb/docker-compose.yaml
index ab2d221e9d6..26941ad2115 100644
--- a/scripts/staging/ssb/docker-compose.yaml
+++ b/scripts/staging/ssb/docker-compose.yaml
@@ -1,10 +1,10 @@
-# Use
-#https://stackoverflow.com/questions/35231362/dockerfile-and-docker-compose-not-updating-with-new-instructions
+# The docker compose file to create a postgres instance.

 #docker-compose --build
 #docker-compose up

 # Example
 #docker-compose up --build
 #docker-compose up

 # Create .env file and modify before each docker compose up.
-#SCALE=[OUR_VALUE]
-#QUERY=[OUR_QUERY_NUMBER]
-#Example:
-#SCALE=0.01
-#QUERY=q1_1
+# in .env file
+#POSTGRES_USER=[YOUR_USERNAME]
+#POSTGRES_PASSWORD=[YOUR_PASSWORD]
+#POSTGRES_DB=[YOUR_DB_NAME]
+#PORT_NUMBER=[YOUR_PORT_NUMBER]

 #This docker-compose file is linked to the Dockerfile.

 services:
-  datagen:
+  postgres:
     build:
       context: .
+    restart: always
     environment:
-      SCALE: ${SCALE}
-  systemds:
-    build:
-      context: .
-    environment:
-      QUERY: ${QUERY}
\ No newline at end of file
+      POSTGRES_USER: ${POSTGRES_USER}
+      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
+      POSTGRES_DB: ${POSTGRES_DB}
+    ports:
+      - "${PORT_NUMBER}:5432"
\ No newline at end of file
diff --git a/scripts/staging/ssb/other/dia_ssb_script_structure1.jpg b/scripts/staging/ssb/other/dia_ssb_script_structure1.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e6b43ee691704997003cf810cec4bb67871dafa8
GIT binary patch
literal 36854
[binary image data for dia_ssb_script_structure1.jpg omitted]
zRwtQfV@*%hGBwahhWS_C{Ry!f3}ns_<`CxbwGbPy4uw6 z!34`|Zil++IEbJkbn9j3mSz@nB}>rLvvfdfm6NP~I=p@BW7))f_ho(u+qt_J-=+KS zd;-ds*3z-?4$#n^rUR4HdgEg5EKD)f^%u!MD)$VV*RO4QtvINF!8uwc)85r*zj5Gb zMcmh8H}Yu~4H*_Qo<`3W%jk@Fa=UCTb1N5rkuEN_z#h+JKkgZslaQtSet-~fqQO;8 z)rmADr$0wZ6IWfhQYuEW1N~J#IBlMe3U3%LeZnt|r4^#?piLDL7c8r}eDyk2FA(jy z;23$0=W%&$evhrRaC8;)3Cp+IK7ZpD)RNA$GoyLMy1$H-%ZCnk=&Eub&EUeH2$BFB z4%x8Tvu33kII`%Q8vU}_UlgSOu7a=-Ozz(`GJ?tcfZTeqF#Ij>)S80jn}Pz^Npw}Y zNWqa*zqj(s$8)QiMl%M?=ifM-y*@7g?TKu`!~y-H_%s2UZ(hLR1$mmJ-|S0|v0=ED z{w_|);=keG3wG>)Zpd)Ke@YqYSsO&!&YhLZQ^WrJdPtBWtzW{ z9G=ap>OPMHynHUU|gK`CPj1)6vbq|1e#5wOK`m2^qMZQO}!Y=55)! zBUCvkpDDekvaGynz>P!M_EJ!mk4>rZT|N-RYe= zRU6FtXfeZNsg%dr{9{~y>Zf6ikFU)hK`MZ?GbklZa3kDkdQ}rOhn-W%IP&cFqTq?H zH?L-b_X@=Am8Vkp^?HBZv*Wu$a@&@}p;~xVSTg+FIGt6>qZbkU;T4u%fvuL~ASb{7 zS7iRbZT0Q?W&e9;8L$Rv(o*jl)U8Sw27_rIip#Lz9}tIYz1*{;9+vPmg!pM&gEzv_ zh>AV9>M+=&c(Dt%M(oK07b)o5T`4G2 zZuSqT8w=I}&vwBNXyFYKa75AYe4uJy2dhU1m@98eqQHDRtTT+#09G7ZFVL1oivn$_ zVg}Hb%KW=8cn$`yi=UjLAjtl-fm4HEP|rD;dBsv$upi2)s%BuJiceWbTq!oiWRw}n zF{*;mz6@nP{7z7pfKJ;wU2oK$XxV!0i+FUZc<5MHFr@nO*H#)pME{zn@Xxy`|LvOk zZCU#}zWVns-rv@7|IjON4*j>(vHzqwKz~e;{9|P1AM-)~qlcmX5d!|f5J2Ukx1A1i zn54P#X4FGw?CWevW&wPZZE|eN!uw+D5kg*Uh>ql4NzKIJpzF=SdjZey1?XP Date: Fri, 23 Jan 2026 14:26:14 +0100 Subject: [PATCH 18/22] Corrected postgres part. TODO bind duckdb. --- scripts/staging/ssb/ReadMe.md | 9 +-- scripts/staging/ssb/shell/run_script.sh | 92 ++++++++++++++++--------- 2 files changed, 63 insertions(+), 38 deletions(-) diff --git a/scripts/staging/ssb/ReadMe.md b/scripts/staging/ssb/ReadMe.md index 72e4d43edb3..0d7c16e7234 100644 --- a/scripts/staging/ssb/ReadMe.md +++ b/scripts/staging/ssb/ReadMe.md @@ -33,14 +33,15 @@ After that, it is copied into the database containers (SystemDS, PostgreSQL and To run our queries, we can execute the following shell script `run_script.sh` (in ssb directory). It has the three following parameters. 1. `QUERY_NAME`: Name of the query to be executed. - **all**: executes all queries - - **[QUERY_NAME]** like q1_1: Executes the selected query like q1_1.dml. + - **[QUERY_NAME]** like q1_1 or q1.1: Executes the selected query like q1_1. dml. Both formats q1_1 or q1.1 are allowed. It will be automatically translated. + - Currently the following queries are avalaible {"q1_1", "q1_2", "q1_3", "q2_1", "q2_2", "q2_3", "q3_1", "q3_2","q3_3", "q3_4", "q4_1", "q4_2", "q4_3"} 2. `SCALE`: The numerical scale factor like 0.01 or 1 etc. - Be careful: Please do not experiment with large scale factor over 0.2 in SystemDS. Its join operation is currently very slow. 3. `DB_SYSTEM`: Name of the database system used. - **all**: executes all queries - - **systemds**: SystemDS - - **postgres**: PostgreSQL - - ( **duckdb**: DuckDB) + - **systemds**: SystemDS executes DML scripts. + - **postgres**: PostgreSQL executes SQL queries. + - ( **duckdb**: DuckDB executes SQL queries.) The order should be as follows: ``` diff --git a/scripts/staging/ssb/shell/run_script.sh b/scripts/staging/ssb/shell/run_script.sh index 832a0be8009..f33dc7eeee2 100755 --- a/scripts/staging/ssb/shell/run_script.sh +++ b/scripts/staging/ssb/shell/run_script.sh @@ -3,6 +3,12 @@ #Mark as executable. #chmod +x run_script.sh +# Read the database credentials from .env file. +source $PWD/.env + +# Variables and arguments. 
+PG_CONTAINER="ssb-postgres-1"
+
 QUERY_NAME=$1
 SCALE=$2
 DB_SYSTEM=$3
@@ -19,17 +25,15 @@ echo "Arg 0 (SHELL_SCRIPT): $0"
 echo "Arg 1 (QUERY_NAME): ${QUERY_NAME}"
 echo "Arg 2 (SCALE): ${SCALE}"
 echo "Arg 3 (DB_SYSTEM): ${DB_SYSTEM}"
-echo "Arg 3 (DB_SYSTEM): ${DB_SYSTEM}"
+
 # Install docker.
 echo -e "${GREEN}Install packages${NC}"
 echo -e "${BLUE}sudo apt install docker git gcc cmake make${NC}"
 sudo apt install docker git gcc cmake make

 # Check whether the data directory exists.
-cd ..
 echo -e "${GREEN}Check for existing data directory and prepare the ssb-dbgen${NC}"
 if [ ! -d ssb-dbgen ]; then
-    mkdir data_dir
     git clone https://github.com/eyalroz/ssb-dbgen.git --depth 1
     cd ssb-dbgen
 else
@@ -42,63 +46,83 @@ echo -e "${GREEN}Build ssb-dbgen and generate data with a given scale factor${NC}"
 cmake -B ./build && cmake --build ./build
 # Run the generator (with -s )
 build/dbgen -b dists.dss -v -s $SCALE
+mkdir -p ../data_dir
 mv *.tbl ../data_dir

-$DB_SYSTEM == "dbsystemds" ||
-if [ ${DB_SYSTEM} == "systemds" ] || [ ${DB_SYSTEM} == "all" ] ; then
+# Go back to the ssb home directory
+cd ..
+if [ "${DB_SYSTEM}" == "systemds" ] || [ "${DB_SYSTEM}" == "all" ] ; then
     docker pull apache/systemds:latest
     echo -e "${GREEN}Execute DML queries in SystemDS${NC}"
+    QUERY_NAME=$(echo "${QUERY_NAME}" | sed 's/\./_/')
     ##all: {"q1_1","q1_2","q1_3","q2_1","q2_2","q2_3","q3_1","q3_2","q3_3","q3_4","q4_1","q4_2","q4_3"}
-    if [ $QUERY_NAME == "all" ]; then
+    if [ "${QUERY_NAME}" == "all" ]; then
         echo "Execute all 13 queries."
         for q in {"q1_1","q1_2","q1_3","q2_1","q2_2","q2_3","q3_1","q3_2","q3_3","q3_4","q4_1","q4_2","q4_3"}
         do
-            echo "Execute query ${QUERY_NAME}."
+            echo "Execute query ${q}.dml"
             docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/${q}.dml -nvargs input_dir="/scripts/data_dir"
         done
     else
-        echo "Execute query ${QUERY_NAME}"
-        docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/${QUERY}.dml -nvargs input_dir="/scripts/data_dir"
+        echo "Execute query ${QUERY_NAME}.dml"
+        docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/${QUERY_NAME}.dml -nvargs input_dir="/scripts/data_dir"
     fi
 fi
-
+echo "! "$(docker ps -a --filter name=${PG_CONTAINER})""
 if [ ${DB_SYSTEM} == "postgres" ] || [ ${DB_SYSTEM} == "all" ] ; then
-    #TODO Maybe check if the db exists and just do docker cp src_dir dest_dir and SQL COPY?
-    docker compose up --build
+    #Look more in the documentation.
+    #https://docs.docker.com/reference/cli/docker/container/ls/
+    if [ "$(docker ps -a --filter name=${PG_CONTAINER})" ]; then
+        if [ ! "$(docker ps --filter name=${PG_CONTAINER})" ]; then
+            echo "Starting existing container..."
+            docker start ${PG_CONTAINER}
+        fi
+        # Copy the table files directly into /tmp and strip the trailing '|' that dbgen emits.
+        docker cp data_dir/. ${PG_CONTAINER}:/tmp
+        for table in customer part supplier date lineorder; do
+            #docker exec -i ${PG_CONTAINER} ls
+            docker exec -i ${PG_CONTAINER} sed -i 's/|$//' "/tmp/${table}.tbl"
+
+            docker exec -i ${PG_CONTAINER} psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} -c "TRUNCATE TABLE ${table} CASCADE; COPY ${table} FROM '/tmp/${table}.tbl' DELIMITER '|';"
+        done
+    else
+        echo "Creating new PostgreSQL container..."
+        docker compose up --build -d
+    fi
    # Change query_name e.g.
from q1_1 to q1.1
     QUERY_NAME=$(echo "${QUERY_NAME}" | sed 's/_/./')
     echo -e "${GREEN}Execute SQL queries in PostgreSQL${NC}"
     ##all: {"q1_1","q1_2","q1_3","q2_1","q2_2","q2_3","q3_1","q3_2","q3_3","q3_4","q4_1","q4_2","q4_3"}
-    if [ $QUERY_NAME = "all" ]; then
+    if [ "${QUERY_NAME}" = "all" ]; then
         echo "Execute all 13 queries."
-        for q in {"q1.1","q1.2","q1.3","q2.1","q2.2","q2.3","q3.1","q3.2","q3.3","q3.4","q4.1","q4.2","q4.3"}
-        do
-            echo "Execute query ${QUERY_NAME}."
-            docker exec -i postgres-ssb psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} < sql/${QUERY_NAME}.sql
+        for q in {"q1.1","q1.2","q1.3","q2.1","q2.2","q2.3","q3.1","q3.2","q3.3","q3.4","q4.1","q4.2","q4.3"}; do
+            echo "Execute query ${q}.sql"
+            docker exec -i ${PG_CONTAINER} psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} < sql/${q}.sql
         done
     else
-        echo "Execute query ${QUERY_NAME}"
-        docker exec -i postgres-ssb psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} < sql/${QUERY_NAME}.sql
+        echo "Execute query ${QUERY_NAME}.sql"
+        echo "docker exec -i ${PG_CONTAINER} psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} < sql/${QUERY_NAME}.sql"
+        docker exec -i ${PG_CONTAINER} psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} < sql/${QUERY_NAME}.sql
     fi
 fi

 #TODO Add duckdb support
-if [ ${DB_SYSTEM} == "duckdb" ] || [ ${DB_SYSTEM} == "all" ] ; then
-    # Change query_name e.g. from q1_1 to q1.1
-    QUERY_NAME=$(echo "${QUERY_NAME}" | sed 's/_/./')
-    echo -e "${GREEN}Execute SQL queries in DuckDB${NC}"
+#if [ "${DB_SYSTEM}" == "duckdb" ] || [ "${DB_SYSTEM}" == "all" ]; then
+#    # Change query_name e.g. from q1_1 to q1.1
+#    QUERY_NAME=$(echo "${QUERY_NAME}" | sed 's/_/./')
+#    echo -e "${GREEN}Execute SQL queries in DuckDB${NC}"
     ##all: {"q1_1","q1_2","q1_3","q2_1","q2_2","q2_3","q3_1","q3_2","q3_3","q3_4","q4_1","q4_2","q4_3"}
-    if [ $QUERY_NAME = "all" ]; then
-        echo "Execute all 13 queries."
-        for q in {"q1.1","q1.2","q1.3","q2.1","q2.2","q2.3","q3.1","q3.2","q3.3","q3.4","q4.1","q4.2","q4.3"}
-        do
-            echo "Execute query ${QUERY_NAME}."
-            #TODO
-        done
-    else
-        echo "Execute query ${QUERY_NAME}"
-        #TODO
-fi
\ No newline at end of file
+#    if [ "${QUERY_NAME}" = "all" ]; then
+#        echo "Execute all 13 queries."
+#        for q in {"q1.1","q1.2","q1.3","q2.1","q2.2","q2.3","q3.1","q3.2","q3.3","q3.4","q4.1","q4.2","q4.3"}
+#        do
+#            echo "Execute query ${QUERY_NAME}."
+#            #TODO
+#        done
+#    else
+#        echo "Execute query ${QUERY_NAME}"
+#        #TODO
+#    fi
+#fi
\ No newline at end of file
From 0b8d30f1220973ab0aae147d4b01f252847cb164 Mon Sep 17 00:00:00 2001
From: Johnn-ui2010
Date: Tue, 27 Jan 2026 01:03:40 +0100
Subject: [PATCH 19/22] Added checks for existing packages and argument flags.
 TODO bind duckdb.
---
 scripts/staging/ssb/Dockerfile              |  18 +--
 scripts/staging/ssb/ReadMe.md               |  74 +++++++-----
 .../staging/ssb/other/script_flags_help.txt |  31 +++++
 scripts/staging/ssb/other/ssb_init.sql      |   7 -
 scripts/staging/ssb/shell/run_script.sh     | 110 +++++++++++++++---
 5 files changed, 182 insertions(+), 58 deletions(-)
 create mode 100644 scripts/staging/ssb/other/script_flags_help.txt

diff --git a/scripts/staging/ssb/Dockerfile b/scripts/staging/ssb/Dockerfile
index da4dac7bf8f..6c88bad73b6 100644
--- a/scripts/staging/ssb/Dockerfile
+++ b/scripts/staging/ssb/Dockerfile
@@ -1,14 +1,16 @@
 # Help: https://docs.docker.com/compose/gettingstarted/#step-1-set-up

 FROM postgres:latest
-# Copy data into container
-COPY data_dir tmp

 # Init the data and load to the database with a sql script.
COPY other/ssb_init.sql /docker-entrypoint-initdb.d/
-WORKDIR /tmp
-RUN sed -i 's/|$//' "customer.tbl"
-RUN sed -i 's/|$//' "part.tbl"
-RUN sed -i 's/|$//' "supplier.tbl"
-RUN sed -i 's/|$//' "date.tbl"
-RUN sed -i 's/|$//' "lineorder.tbl"
+
+# Copy data into container
+#COPY data_dir tmp
+
+#WORKDIR /tmp
+#RUN sed -i 's/|$//' "customer.tbl"
+#RUN sed -i 's/|$//' "part.tbl"
+#RUN sed -i 's/|$//' "supplier.tbl"
+#RUN sed -i 's/|$//' "date.tbl"
+#RUN sed -i 's/|$//' "lineorder.tbl"
diff --git a/scripts/staging/ssb/ReadMe.md b/scripts/staging/ssb/ReadMe.md
index 0d7c16e7234..d322a7e416b 100644
--- a/scripts/staging/ssb/ReadMe.md
+++ b/scripts/staging/ssb/ReadMe.md
@@ -7,12 +7,28 @@
 - Our task is to implement the DML version of these queries and run them in SystemDS and PostgreSQL. Therefore, we provide a shell script.
 - There are existing DML query implementations ([Git request](https://github.com/apache/systemds/pull/2280) and [code](https://github.com/apache/systemds/tree/main/scripts/staging/ssb)) of the previous group which are a bit slow and contain errors. They also provided longer scripts to run experiments in SystemDS, PostgreSQL and DuckDB.

+## Directory structure
+```
+ssb/
+├── docker-compose.yaml   # Compose file for Docker containers (here for PostgreSQL)
+├── Dockerfile
+├── README.md             # This explanation
+├── queries/              # DML queries (q1_1.dml ... q4_3.dml)
+│   ├── q1_1.dml - q1_3.dml  # Flight 1
+│   ├── q2_1.dml - q2_3.dml  # Flight 2
+│   ├── q3_1.dml - q3_4.dml  # Flight 3
+│   └── q4_1.dml - q4_3.dml  # Flight 4
+├── shell/
+│   └── run_script.sh     # Main script
+├── sql/                  # SQL versions + `ssb.duckdb` for DuckDB
+└── other_docker_compose  # Other alternative compose file
+```
 ## Setup
 - First, install [Docker](https://docs.docker.com/get-started/get-docker/), [Docker Compose](https://docs.docker.com/compose/install/) and its necessary libraries.

     For Ubuntu, there is the following tutorials [for Docker](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository) and [Docker Compose](https://docs.docker.com/compose/install/linux/#install-using-the-repository) using apt repository. You can add [Docker Desktop](https://docs.docker.com/desktop/setup/install/linux/ubuntu/), too.

-The shell script covers the installation of the following points.
+The shell script covers the installation of the following points. We use Ubuntu and Debian; for other operating systems, please take a closer look at the respective documentation.

 - Docker version of the database system [SystemDS](https://apache.github.io/systemds/site/docker)
 - Docker compose version of [PostgreSQL](docker-compose.yaml) based on its [documentation](https://hub.docker.com/_/postgres).
@@ -27,49 +43,53 @@ The data is generated by datagen and stored locally (on localhost).

 After that, it is copied into the database containers (SystemDS, PostgreSQL and DuckDB) where the queries are executed.

- `TODO`: DuckDB in Docker or locally?
+ `TODO`: DuckDB locally? Change the diagram.
+
+## Run the script
+### Before running the script
+Before running the script, create an .env file to set the PostgreSQL environment variables.
+```
+# in .env file
+POSTGRES_USER=[YOUR_USERNAME]
+POSTGRES_PASSWORD=[YOUR_PASSWORD]
+POSTGRES_DB=[YOUR_DB_NAME]
+PORT_NUMBER=[YOUR_PORT_NUMBER]
+```
+
+Mark the script as executable.
+```
+$ chmod +x run_script.sh
+```
+### Run the script
+To run the queries, we can execute the following shell script `run_script.sh` (in the ssb directory). It has the following three parameter flags.
+1. `q`: (QUERY_NAME) Name of the query to be executed.
+    - **all**: executes all queries
+    - **[QUERY_NAME]** like q1_1 or q1.1: Executes the selected query like q1_1.dml. Both formats q1_1 and q1.1 are allowed; the name is translated automatically.
+    - Currently, the following queries are available: {"q1_1", "q1_2", "q1_3", "q2_1", "q2_2", "q2_3", "q3_1", "q3_2", "q3_3", "q3_4", "q4_1", "q4_2", "q4_3"}
+2. `s`: (SCALE) The numerical scale factor like 0.01 or 1 etc.
+    - Be careful: Please do not experiment with a scale factor over 0.2 in SystemDS. Its join operation is currently very slow.
+3. `d`: (DB_SYSTEM) Name of the database system used.
+    - **all**: executes the queries in all database systems
+    - **systemds**: SystemDS executes DML scripts.
+    - **postgres**: PostgreSQL executes SQL queries.
+    - ( **duckdb**: DuckDB executes SQL queries.)

-The order should be as follows:
+The command line could look like this:
 ```
-./run_script.sh [QUERY_NAME] [SCALE] [DB_SYSTEM]
+$ ./run_script.sh -q [YOUR_QUERY_NAME] -s [YOUR_SCALE] -d [YOUR_DB_SYSTEM]
 ```
 Examples
 ```
-./run_script.sh all 0.1 all
-./run_script.sh q4_3 0.1 systemds
-./run_script.sh all 0.1 postgres
+$ ./run_script.sh -q all -s 0.1 -d all
+$ ./run_script.sh -q q4_3 -s 0.1 -d systemds
+$ ./run_script.sh -q all -s 1 -d postgres
 ```
-### Before running the script
-Before running the script, create an .env file to set the PostgreSQL environment variables.
-```
-# in .env file
-POSTGRES_USER=[YOUR_USERNAME]
-POSTGRES_PASSWORD=[YOUR_PASSWORD]
-POSTGRES_DB=[YOUR_DB_NAME]
-PORT_NUMBER=[YOUR_PORT_NUMBER]
-```
-
-Mark the script as executable.
-```
-chmod +x run_script.sh
-```
-Run this command to load the environment variables.
-```
-source .env
-```
+If an option is not specified by the user, the default values are as follows:
+```
+QUERY_NAME="q2_1"
+SCALE=0.1
+DB_SYSTEM="systemds"
+```

 ### Further considerations.
 To compare the correctness and do benchmarks, PostgreSQL can be used.
\ No newline at end of file
diff --git a/scripts/staging/ssb/other/script_flags_help.txt b/scripts/staging/ssb/other/script_flags_help.txt
new file mode 100644
index 00000000000..7ac66d4578e
--- /dev/null
+++ b/scripts/staging/ssb/other/script_flags_help.txt
@@ -0,0 +1,31 @@
+From ReadMe.md:
+To run our queries, we can execute the following shell script `run_script.sh` (in the ssb directory). It has the following three parameter flags.
+1. `q`: (QUERY_NAME) Name of the query to be executed.
+    - **all**: executes all queries
+    - **[QUERY_NAME]** like q1_1 or q1.1: Executes the selected query like q1_1.dml. Both formats q1_1 and q1.1 are allowed; the name is translated automatically.
+    - Currently, the following queries are available: {"q1_1", "q1_2", "q1_3", "q2_1", "q2_2", "q2_3", "q3_1", "q3_2", "q3_3", "q3_4", "q4_1", "q4_2", "q4_3"}
+2. `s`: (SCALE) The numerical scale factor like 0.01 or 1 etc.
+    - Be careful: Please do not experiment with a scale factor over 0.2 in SystemDS. Its join operation is currently very slow.
+3. `d`: (DB_SYSTEM) Name of the database system used.
+    - **all**: executes the queries in all database systems
+    - **systemds**: SystemDS executes DML scripts.
+    - **postgres**: PostgreSQL executes SQL queries.
+    - ( **duckdb**: DuckDB executes SQL queries.)
+
+The command line could look like this:
+```
+$ ./run_script.sh -q [YOUR_QUERY_NAME] -s [YOUR_SCALE] -d [YOUR_DB_SYSTEM]
+```
+Examples
+```
+$ ./run_script.sh -q all -s 0.1 -d all
+$ ./run_script.sh -q q4_3 -s 0.1 -d systemds
+$ ./run_script.sh -q all -s 0.1 -d postgres
+```
+If an option is not specified by the user, the default values are as follows:
+```
+QUERY_NAME="q2_1"
+SCALE=0.1
+DB_SYSTEM="systemds"
+```
+For more details, take a closer look at ReadMe.md.
\ No newline at end of file
diff --git a/scripts/staging/ssb/other/ssb_init.sql b/scripts/staging/ssb/other/ssb_init.sql
index 50ae0542593..4e28a9cf712 100644
--- a/scripts/staging/ssb/other/ssb_init.sql
+++ b/scripts/staging/ssb/other/ssb_init.sql
@@ -115,10 +115,3 @@ ADD PRIMARY KEY (p_partkey);

 --ALTER TABLE lineorder
 --ADD FOREIGN KEY (lo_partkey) REFERENCES part (p_partkey);
-
--- Copying data inside.
-COPY customer FROM '/tmp/customer.tbl' DELIMITER '|';
-COPY part FROM '/tmp/part.tbl' DELIMITER '|';
-COPY supplier FROM '/tmp/supplier.tbl' DELIMITER '|';
-COPY date FROM '/tmp/date.tbl' DELIMITER '|';
-COPY lineorder FROM '/tmp/lineorder.tbl' DELIMITER '|';
\ No newline at end of file
diff --git a/scripts/staging/ssb/shell/run_script.sh b/scripts/staging/ssb/shell/run_script.sh
index f33dc7eeee2..c3d612f3ef1 100755
--- a/scripts/staging/ssb/shell/run_script.sh
+++ b/scripts/staging/ssb/shell/run_script.sh
@@ -9,10 +9,16 @@ source $PWD/.env

 # Variables and arguments.
 PG_CONTAINER="ssb-postgres-1"

-QUERY_NAME=$1
-SCALE=$2
-DB_SYSTEM=$3
+#https://stackoverflow.com/questions/7069682/how-to-get-arguments-with-flags-in-bash
+#Initial variable values.
+QUERY_NAME="q2_1"
+SCALE=0.1
+DB_SYSTEM="systemds"
+
+isQflag=0
+isSflag=0
+isDflag=0
 # Colors for output
 GREEN='\033[0;32m'
 BLUE='\033[0;34m'
@@ -21,15 +27,87 @@ NC='\033[0m' # No Color

 echo -e "${BLUE}=== Test environment with SSB Data Loader ===${NC}\n"

+#https://unix.stackexchange.com/questions/129391/passing-named-arguments-to-shell-scripts
+# Parsing the argument flags.
+while getopts "q:s:d:h" opt; do
+    case ${opt} in
+        q) QUERY_NAME="$OPTARG"
+           isQflag=1;;
+        s) SCALE=$OPTARG
+           isSflag=1;;
+        d) DB_SYSTEM="$OPTARG"
+           isDflag=1;;
+        h) echo "Help:"
+           cat other/script_flags_help.txt;;
+        \?) echo "Option ${opt} not found. Try again."
+            echo "Please use: $0 -q [YOUR_QUERY_NAME] -s [SCALE] -d [DB_SYSTEM]";;
+    esac
+    case $OPTARG in
+        -*) echo "Option ${opt} should have an argument.";;
+    esac
+done
+echo "isQflag=$isQflag"
+echo "isSflag=$isSflag"
+echo "isDflag=$isDflag"
+if [ "${isQflag}" -eq 0 ]; then
+    echo "Warning: q-flag [QUERY_NAME] is empty. The default q is q2_1."
+fi
+if [ "${isSflag}" -eq 0 ]; then
+    echo "Warning: s-flag [SCALE] is empty. The default s is 0.1."
+fi
+if [ "${isDflag}" -eq 0 ]; then
+    echo "Warning: d-flag [DATABASE] is empty. The default d is systemds."
+fi

 echo "Arg 0 (SHELL_SCRIPT): $0"
 echo "Arg 1 (QUERY_NAME): ${QUERY_NAME}"
 echo "Arg 2 (SCALE): ${SCALE}"
 echo "Arg 3 (DB_SYSTEM): ${DB_SYSTEM}"
+
+# Check for the existing required packages. If not, install them.
+isAllowed="no"
+echo -e "${GREEN}Install required packages${NC}"
+echo -e "${GREEN}Check whether the following packages exist:${NC}"
+echo "docker 'docker compose' git gcc cmake make"
+
+#.
+for package in docker git gcc cmake make; do
+    if [ ! "$(${package} --version)" ]; then
+        echo -e "${BLUE} ${package} package is required for this test bench. Do you want to allow the installation? 
(yes/no)${NC}"
+        read -r isAllowed
+        while true; do
+            echo "Your answer is ${isAllowed}."
+            if [ "${isAllowed}" == "yes" ] || [ "${isAllowed}" == "y" ]; then
+                echo "sudo apt-get install ${package}."
+                sudo apt-get install ${package}
+                break
+            elif [ "${isAllowed}" == "no" ] || [ "${isAllowed}" == "n" ]; then
+                echo -e "${RED}Sorry, we cannot continue with that test bench without the required packages. The test bench is stopped.${NC}"
+                exit
+            else
+                echo "Your answer '${isAllowed}' is neither 'yes' nor 'no'. Please try again."
+                read -r isAllowed
+            fi
+        done
+    fi
+done
+
+if [ ! "$(docker compose version)" ]; then
+    echo -e "${BLUE}docker compose is required for this test bench. Do you want to allow the installation? (yes/no)${NC}"
+    read -r isAllowed
+    while true; do
+        if [ "${isAllowed}" == "yes" ] || [ "${isAllowed}" == "y" ]; then
+            echo "sudo apt-get install docker-compose-plugin"
+            sudo apt-get install docker-compose-plugin
+            break
+        elif [ "${isAllowed}" == "no" ] || [ "${isAllowed}" == "n" ]; then
+            echo -e "${RED}Sorry, we cannot continue with that test bench without the required packages. The test bench is stopped.${NC}"
+            exit
+        else
+            echo "Your answer '${isAllowed}' is neither 'yes' nor 'no'. Please try again."
+            read -r isAllowed
+        fi
+    done
+fi

 # Check whether the data directory exists.
 echo -e "${GREEN}Check for existing data directory and prepare the ssb-dbgen${NC}"
@@ -69,4 +147,8 @@ if [ "${DB_SYSTEM}" == "systemds" ] || [ "${DB_SYSTEM}" == "all" ] ; then
         docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/${QUERY_NAME}.dml -nvargs input_dir="/scripts/data_dir"
     fi
 fi
-echo "! "$(docker ps -a --filter name=${PG_CONTAINER})""
-if [ ${DB_SYSTEM} == "postgres" ] || [ ${DB_SYSTEM} == "all" ] ; then
+
+if [ "${DB_SYSTEM}" == "postgres" ] || [ "${DB_SYSTEM}" == "all" ] ; then
     #Look more in the documentation.
     #https://docs.docker.com/reference/cli/docker/container/ls/
     if [ "$(docker ps -a --filter name=${PG_CONTAINER})" ]; then
         if [ ! "$(docker ps --filter name=${PG_CONTAINER})" ]; then
             echo "Starting existing container..."
             docker start ${PG_CONTAINER}
         fi
-        # Copy the table files directly into /tmp and strip the trailing '|' that dbgen emits.
-        docker cp data_dir/. ${PG_CONTAINER}:/tmp
-        for table in customer part supplier date lineorder; do
-            #docker exec -i ${PG_CONTAINER} ls
-            docker exec -i ${PG_CONTAINER} sed -i 's/|$//' "/tmp/${table}.tbl"
-
-            docker exec -i ${PG_CONTAINER} psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} -c "TRUNCATE TABLE ${table} CASCADE; COPY ${table} FROM '/tmp/${table}.tbl' DELIMITER '|';"
-        done
     else
         echo "Creating new PostgreSQL container..."
         docker compose up --build -d
     fi
+    # Load data and copy into the database.
+    # Copy the table files directly into /tmp and strip the trailing '|' that dbgen emits.
+    docker cp data_dir/. ${PG_CONTAINER}:/tmp
+    for table in customer part supplier date lineorder; do
+        #docker exec -i ${PG_CONTAINER} ls
+        docker exec -i ${PG_CONTAINER} sed -i 's/|$//' "/tmp/${table}.tbl"
+        docker exec -i ${PG_CONTAINER} psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} -c "TRUNCATE TABLE ${table} CASCADE; COPY ${table} FROM '/tmp/${table}.tbl' DELIMITER '|';"
+    done
    # Change query_name e.g.
from q1_1 to q1.1
     QUERY_NAME=$(echo "${QUERY_NAME}" | sed 's/_/./')
     echo -e "${GREEN}Execute SQL queries in PostgreSQL${NC}"

From 2a33bc439e40e32aa1a63c077258f9d623b5b821 Mon Sep 17 00:00:00 2001
From: Johnn-ui2010
Date: Wed, 28 Jan 2026 00:38:24 +0100
Subject: [PATCH 20/22] Added duckdb to the script. TODO fix some problems in
 Postgres.
---
 scripts/staging/ssb/ReadMe.md                |  67 +++--
 .../ssb/other/dia_ssb_script_structure1.jpg  | Bin 36854 -> 37704 bytes
 .../staging/ssb/other/script_flags_help.txt  |  37 +--
 .../ssb/other_docker_compose/Dockerfile      |  80 -----
 .../other_docker_compose/docker-compose.yaml |  46 ---
 scripts/staging/ssb/shell/run_script.sh      | 284 ++++++++++++++----
 6 files changed, 283 insertions(+), 231 deletions(-)
 delete mode 100644 scripts/staging/ssb/other_docker_compose/Dockerfile
 delete mode 100644 scripts/staging/ssb/other_docker_compose/docker-compose.yaml

diff --git a/scripts/staging/ssb/ReadMe.md b/scripts/staging/ssb/ReadMe.md
index d322a7e416b..4824a9f7355 100644
--- a/scripts/staging/ssb/ReadMe.md
+++ b/scripts/staging/ssb/ReadMe.md
@@ -4,9 +4,20 @@
 ## Foundation
 - There are [13 queries already written in SQL](https://github.com/apache/doris/tree/master/tools/ssb-tools/ssb-queries).
 - There are existing DML relational algebra operations raSelect(), raJoin() and raGroupBy().
-- Our task is to implement the DML version of these queries and run them in SystemDS and PostgreSQL. Therefore, we provide a shell script.
+- Our task is to implement the DML version of these queries and run them in SystemDS and PostgreSQL.
 - There are existing DML query implementations ([Git request](https://github.com/apache/systemds/pull/2280) and [code](https://github.com/apache/systemds/tree/main/scripts/staging/ssb)) of the previous group which are a bit slow and contain errors. They also provided longer scripts to run experiments in SystemDS, PostgreSQL and DuckDB.
+## Changes
+1. **DML Queries**
+- In this project, we improved and fixed errors in some DML queries.
+- The major changes are:
+    - Switching the join algorithm from `sort-merge` to `hash2`.
+    - Consistently using transformencode() and transformdecode() for string comparisons (before, this was only used in [q4_3](https://github.com/apache/systemds/tree/main/scripts/staging/ssb/queries/q4_3.dml)). It leads to correct results; a short illustrative sketch follows the parameter list below.
+2. **Test Script**
+- The main purpose of this project's test script is simply to run the queries. The focus is less on benchmarking the execution times, because the queries run very slowly in SystemDS and the times cannot be meaningfully compared to PostgreSQL and DuckDB. The main bottlenecks are the join algorithms.
+- Thus, the main differences from the previous group are:
+    - Using the simpler [ssb-dbgen](https://github.com/eyalroz/ssb-dbgen/tree/master) for generating data.
+    - No full test bench with detailed views of the different execution times per database.
+    - Running PostgreSQL and SystemDS in Docker containers instead of using them locally. See below.
 ## Directory structure
 ```
 ssb/
 ├── docker-compose.yaml   # Compose file for Docker containers (here for PostgreSQL)
 ├── Dockerfile
 ├── README.md             # This explanation
 ├── queries/              # DML queries (q1_1.dml ...
q4_3.dml)
-│   ├── q1_1.dml - q1_3.dml    # Flight 1
-│   ├── q2_1.dml - q2_3.dml    # Flight 2
-│   ├── q3_1.dml - q3_4.dml    # Flight 3
-│   └── q4_1.dml - q4_3.dml    # Flight 4
+│   ├── q1_1.dml - q1_3.dml
+│   ├── q2_1.dml - q2_3.dml
+│   ├── q3_1.dml - q3_4.dml
+│   └── q4_1.dml - q4_3.dml
 ├── shell/
 │   ├── run_script.sh          # Main script
-└── sql/                       # SQL versions + `ssb.duckdb` for DuckDB
-├── other_docker_compose       # Other alternative compose file
+└── sql/                       # SQL versions & `test_ssb.duckdb` for DuckDB
 ```
 ## Setup
 - First, install [Docker](https://docs.docker.com/get-started/get-docker/), [Docker Compose](https://docs.docker.com/compose/install/) and their necessary libraries.

 For Ubuntu, there are the following tutorials [for Docker](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository) and [Docker Compose](https://docs.docker.com/compose/install/linux/#install-using-the-repository) using the apt repository. You can add [Docker Desktop](https://docs.docker.com/desktop/setup/install/linux/ubuntu/), too.

+The shell script covers the installation of the following components. We use Ubuntu and Debian; for other operating systems, please take a closer look at the documentation.
 - Docker version of the database system [SystemDS](https://apache.github.io/systemds/site/docker)
 - Docker compose version of [PostgreSQL](docker-compose.yaml) based on its [documentation](https://hub.docker.com/_/postgres).
 - [ssb-dbgen](https://github.com/eyalroz/ssb-dbgen/tree/master) (SSB data set generator `datagen`)

 For more options look into the original documentation.

 ## Structure of the test system
 ![diagram](other/dia_ssb_script_structure1.jpg)

 Our script will depict the following structure.
 The data is generated by datagen and stored locally (on localhost).
-After that, it is copied into the database containers (SystemDS, PostgreSQL and DuckDB) where the queries are executed.
-
- `TODO`: DuckDB locally? Change the diagram.
+After that, it is copied into two database containers (SystemDS, PostgreSQL) and a local DuckDB database where the queries are executed.

 ## Run the script
 ### Before running the script
 $ chmod +x run_script.sh
 ```
 ### Run the script
 To run the queries, we can execute the following shell script `run_script.sh` (in the ssb directory). It has the following parameter flags.
-1. `q`: (QUERY_NAME) Name of the query to be executed.
-   - **all**: executes all queries
+1. `-q`: (QUERY_NAME) Name of the query to be executed.
+   - `all`: executes all queries
    - **[QUERY_NAME]** like q1_1 or q1.1: Executes the selected query like q1_1.dml. Both formats q1_1 or q1.1 are allowed. It will be automatically translated.
-   - Currently the following queries are avalaible {"q1_1", "q1_2", "q1_3", "q2_1", "q2_2", "q2_3", "q3_1", "q3_2","q3_3", "q3_4", "q4_1", "q4_2", "q4_3"}
-2. `s`: (SCALE) The numerical scale factor like 0.01 or 1 etc.
+   - Currently, the following queries are available (q1_1, q1_2, q1_3, q2_1, q2_2, q2_3, q3_1, q3_2, q3_3, q3_4, q4_1, q4_2, q4_3)
+   - Default: `q2_1`
+2. `-s`: (SCALE) The numerical scale factor like 0.01 or 1 etc.
    - Be careful: Please do not experiment with large scale factors over 0.2 in SystemDS. Its join operation is currently very slow.
-3. `d`: (DB_SYSTEM) Name of the database system used.
-   - **all**: executes all queries
-   - **systemds**: SystemDS executes DML scripts.
-   - **postgres**: PostgreSQL executes SQL queries.
-   - ( **duckdb**: DuckDB executes SQL queries.)
-
+   - Default: `0.1`
+3. `-d`: (DB_SYSTEM) Name of the database system used.
+   - `all`: executes queries in all three databases.
+   - `systemds`: SystemDS executes DML scripts with basic output.
+   - `systemds_stats`: SystemDS executes DML scripts with extended output (--stats); see the sketch after this list.
+   - `postgres`: PostgreSQL executes SQL queries.
+   - `duckdb`: DuckDB executes SQL queries.
+   - Default: `systemds`
+4. `-g`: (GUI_DOCKER) Use the Docker Desktop GUI. No argument to pass; set only the flag "-g".
+5. `-h`: (HELP) Display the script explanation from ReadMe.md. No argument to pass; set only the flag "-h".
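+
+For orientation, `systemds_stats` simply appends SystemDS' `--stats` flag to the underlying container call. A minimal sketch of the resulting command (the query name `q2_1` is only an example; the mount paths follow this script's conventions, and the exact invocation lives in `shell/run_script.sh`):
+```
+# Same call the script issues for plain "systemds", plus the --stats flag
+docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest \
+    -f /scripts/queries/q2_1.dml --stats -nvargs input_dir="/scripts/data_dir"
+```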
 The command line could look like this:
 ```
 $ ./run_script.sh -q [YOUR_QUERY_NAME] -s [YOUR_SCALE] -d [YOUR_DB_SYSTEM]
 ```
 Examples
 ```
 $ ./run_script.sh -q all -s 0.1 -d all
 $ ./run_script.sh -q q4_3 -s 0.1 -d systemds
-$ ./run_script.sh -q all -s 1 -d postgres
+$ ./run_script.sh -q all -s 1 -d duckdb
+$ ./run_script.sh -q q1.1 -s 1 -d postgres -g
 ```
-If an option is not specified by user, the default values are as follows:
-```
-QUERY_NAME="q2_1"
-SCALE=0.1
-DB_SYSTEM="systemds"
-```
-
-### Further considerations.
-To compare the correctness and do benchmarks, PostgreSQL can be used.
\ No newline at end of file
diff --git a/scripts/staging/ssb/other/dia_ssb_script_structure1.jpg b/scripts/staging/ssb/other/dia_ssb_script_structure1.jpg
index e6b43ee691704997003cf810cec4bb67871dafa8..a2e4a7391957df122c81bfd0c77c4c4383a08a36 100644
GIT binary patch
delta 23770
[base85-encoded binary image delta omitted: updated script-structure diagram dia_ssb_script_structure1.jpg]
zmo9hyEC8Ec17mu1i+rr^0m_=4?-t)Bu&aODkTkGflFrrCSl^smbWiE_36=QExyFz> zfMc@LedBKA;h*_>Q04>$7u|3LXFlPB-M8>v80~2h-jbyL6i1c1uzcV!wn%_f7L|=| z1>H@%zW;n2W`EBUwV#E78`E(R@}G-~sn4i(_3bKaB>Se=hI71^QhQeW$aZ|BKZej~ zK9XgqK#MU-!jHp0{R$8cPk`U50Wbm9lPCQkm*{YlDne40kM~4W*ef?4C+Vwq&w{2HEOJ(w|#Fc9$ z8>5w;+<1^bVVELqIA$~=G-q-H&_djktnVP;3ac{~(N{NY-racH<`7x|9^+lvyTVB> zm0+{HBQngRWctZL#x!dDm8%E2+bswOe6B~5<9qD0w_rWznsCy`x-B zuIQtg9*w+bh0;$Jg<9QrBTvsnk$_+VPlxP#mIFPuGgTFL$eVaItIQ|oD%$yjrc6g8 zrYHmIdNEj9?t`@-ne_BTaFz$~+ZvcO=(_8YpT3Rw;8^JR81(e_)OPm0`D3|IW zzFTZ~QEf>aB$IWP&b^)SXyIDvmjP-VGpHD2_QhhHcbsZND!z8)NKgE~LHnn1lCZr|c-;;O zmbeFJfM7#GeSO1}-m%Ylz8PgsBb}aj+vii&u&EE$mhQH1);aWgTHQ2f_3eRSW(pm@ zC`q#_2N&jn`CbOS_v$!HfC`6@ALV<@STwi2-SU!-3HzCqd$H0?;HG26D<|XZX1ORG z1jHNOfJ;zT*42Ua;+&3jNlxoYv^NnaVL>_(MmjX)7pl%#xAZe!v?<3ynzo`4&s{CK zDBbXGBv+)mKI6zjC+SMa!gAurN!GCnlMd))&@bSALdBeuc$NCnxI=~J@lOS*lQIu4 z4MvLw-?P5^{5(L5B1xUUyG9xKRvNl6n!{GMa+#oN5a!WmXb248ioTq!4{!i$*wNT|q+^I7tpk;>Ld zIfeAp15UkChu-P&vuqhp@C+TQeBvGtCfur(v0haGV@YY3RHtZ;t4_Xmbnc-Z4dOz{ zWUJepdW1WM-o20-WN%?B)mWGD4 z5Q%L@_lfJU4!bxb^`i9mfE)bCo-$_Mb}UkoX?&R;nL-t!&o$u^+6>B(^?7~yD2cDH z`3mig{gmXEN5UNvIC9OUjam-Ajs>bH8)?oOkErtT9YRck;h1v6E7rAe${YRx)Kc3K z!@QHL!A-&8508=(ZywC_J7rj&-IX2{=1JN}2!r>$W}k7K418^Dg*ShTu7AYKr+Y!Z zwKc!7U#x!9n8y>xy_9L_toYjODW%mppSV@?W`}lm7iK3 zR2+4Y^fC?c3B+j(xjQbD`#k^x@O#r@Mx}J}M*s0+O!q~cxJC!Pv4{%iZEWyH9Ta;r zE%jA4<&u-|g_vb&gi@Oq?LazU!cAl%Y+D(SIMw{%Uw|p$a?4vKN=x=ga7yvrd=5cZ zapK{#ENJiSC>RGQ$kvZFV`i1Gw8m++?-7;WMA9$7Wwc($h<(|*RVqWw^WH_EEwgDr zO!2J5A>4W0kly*)x$%|0H9Ih@PowwUyRWY!0*aVtN#iSd!UUyp)wF$CjXjn^9#V(> z`vcNj8>9H(OgCQKt`-i1$^@dr_tPdBALGc1`*^EU+uU4E?K`r z zPJWtFzY;0aR9=d_tKo5 zi>Q)k2fyZr(7qlCo?Q%EE(4jcJ+c06TafY0W?ln3x$OL3K>u&xCa(5pHvID^5EY}i zm*N?NIZqFt8K!mYjCoOLd^w>!|sDak!q3RRM` zrKYMqsB1mLBgk$l#RKd{|3hCk!*Rzk2@W7C>IFHBWU0{QiULY8ye9cyST0lt)64#= zM-)hTMg-|i$hDScs~@!wi}X;z_c*2l(c-4`r4081FKu)qp=4!Lw(8ll`kR^sv0%E@ zDT*}!utz~hxe3hWE&_iLQlK=Ujg0}FHw>-c_%cLOv>$hy{8n+lw+>f1_c5y-CLDr-u;fWYpk)%E^9{Ad(&mNhu%n&5*wNOHR!F6Ovys%QJ zu#mVdadh#hktsmNB{ZMS2?o~M^y%!2q)37GINIJZ{Y7Pub)A%Oo0u;CMnW|y^(A_v zvZOiFu83@-MdWMeP(1bh6vq?##+N4QJUdDq>_3?#qHwt$KH%wU4(>fWCVSj$EveZC zac>qYvP(!>(Htyd$K~lph8eWae9H~$T%&{>bzJz z+bb-BiOVg-Lg|~{Vp$DSMYss<_g=f_664z^YaUBN$(%Cm{hZk{r={y!tgqi-djo=w z-c;UWmdn)Am*J0`45(ZU-05pRFTZ>fmo|XryV-I)=Ra7-FBZy06}qlD$}#cz@5du} zGmP0m?%Q(G$4#w-WhedF6(Qd+iT0I-$FP%1@aa%dPdMxs;P-|MU-SXrQfIVkXH=vh zgR7|oD?ODRKA^g$P}0=-E^pj!`Gtq!jTzN%k5$TN51Un{d|UxDAJkttf9x2Za!?q*cTnVSD%lMeS?6VBDpDd) z0qY6t`o{2hO3^VXOa=AFXdT52)!Kl9p- z71&}NdiG=*CsgyH`x@)&o5uZwHC-*D3PO=Jmi`F2DAi8tNva!XEEIlpf{9tlyKA@Fy;oKGC!>^ReQ zx9W>3v-m3-t9J9_5tQ%T@#{Ug>j0M#n~gxBG7uK@*Kj=4k7mHRx%cLS7BO1n;Yyy> zYK_D#yN{`3*`Jgr&ws47eF$iS()ENgl;%fs>3rUjxI|u3%M*Qidqw(>!57aLqW4Rs z?pI|~!&4H4PcZy`o~07Hk7+~l!u>>UY1b#Mpe5rQ971qC4l`hPco*M&T0ApR_^)gx z1b>u1C;M*)^rj*r8cYgPUxP>g0)&0r&OXh24T;}EpFZb$<%Mh#x?vx+IS&3X?bKyg zb3vopD189D+V7vIUDlKojTFk*pifx5*(2cLT$cZcWQJN?gri#1=032{%m_~0hY2v) z?ILVhy+#G3itE{$7T~e!Pxz;NS;#6T!It5f*ud*vN3fe5mNudF=KMI93UP{BqPgfw}1q_L!?ZqO{VK7-J=WCH;)as2g3;`$28=*=SuxMmZY? I+pp>W0IU}vzyJUM diff --git a/scripts/staging/ssb/other/script_flags_help.txt b/scripts/staging/ssb/other/script_flags_help.txt index 7ac66d4578e..3e82ce27a73 100644 --- a/scripts/staging/ssb/other/script_flags_help.txt +++ b/scripts/staging/ssb/other/script_flags_help.txt @@ -1,17 +1,22 @@ From ReadMe.md: -To run our queries, we can execute the following shell script `run_script.sh` (in ssb directory). It has the three following parameter flags. -1. `q`: (QUERY_NAME) Name of the query to be executed. 
-   - **all**: executes all queries
+To run the queries, we can execute the following shell script `run_script.sh` (in the ssb directory). It has the following parameter flags.
+1. `-q`: (QUERY_NAME) Name of the query to be executed.
+   - `all`: executes all queries
    - **[QUERY_NAME]** like q1_1 or q1.1: Executes the selected query like q1_1.dml. Both formats q1_1 or q1.1 are allowed. It will be automatically translated.
-   - Currently the following queries are avalaible {"q1_1", "q1_2", "q1_3", "q2_1", "q2_2", "q2_3", "q3_1", "q3_2","q3_3", "q3_4", "q4_1", "q4_2", "q4_3"}
-2. `s`: (SCALE) The numerical scale factor like 0.01 or 1 etc.
+   - Currently, the following queries are available (q1_1, q1_2, q1_3, q2_1, q2_2, q2_3, q3_1, q3_2, q3_3, q3_4, q4_1, q4_2, q4_3)
+   - Default: `q2_1`
+2. `-s`: (SCALE) The numerical scale factor like 0.01 or 1 etc.
    - Be careful: Please do not experiment with large scale factors over 0.2 in SystemDS. Its join operation is currently very slow.
-3. `d`: (DB_SYSTEM) Name of the database system used.
-   - **all**: executes all queries
-   - **systemds**: SystemDS executes DML scripts.
-   - **postgres**: PostgreSQL executes SQL queries.
-   - ( **duckdb**: DuckDB executes SQL queries.)
-
+   - Default: `0.1`
+3. `-d`: (DB_SYSTEM) Name of the database system used.
+   - `all`: executes queries in all three databases.
+   - `systemds`: SystemDS executes DML scripts with basic output.
+   - `systemds_stats`: SystemDS executes DML scripts with extended output (--stats).
+   - `postgres`: PostgreSQL executes SQL queries.
+   - `duckdb`: DuckDB executes SQL queries.
+   - Default: `systemds`
+4. `-g`: (GUI_DOCKER) Use the Docker Desktop GUI. No argument to pass; set only the flag "-g".
+5. `-h`: (HELP) Display the script explanation from ReadMe.md. No argument to pass; set only the flag "-h".
 The command line could look like this:
 ```
 $ ./run_script.sh -q [YOUR_QUERY_NAME] -s [YOUR_SCALE] -d [YOUR_DB_SYSTEM]
 ```
@@ -20,12 +25,8 @@ Examples
 ```
 $ ./run_script.sh -q all -s 0.1 -d all
 $ ./run_script.sh -q q4_3 -s 0.1 -d systemds
-$ ./run_script.sh -q all -s 0.1 -d postgres
-```
-If an option not specified by user, the default values are as follows:
-```
-QUERY_NAME="q2_1"
-SCALE=0.1
-DB_SYSTEM="systemds"
+$ ./run_script.sh -q all -s 1 -d duckdb
+$ ./run_script.sh -q q1.1 -s 1 -d postgres -g
 ```
+
 For more details, take a closer look at ReadMe.md.
\ No newline at end of file
diff --git a/scripts/staging/ssb/other_docker_compose/Dockerfile b/scripts/staging/ssb/other_docker_compose/Dockerfile
deleted file mode 100644
index c2f2b1bb6c5..00000000000
--- a/scripts/staging/ssb/other_docker_compose/Dockerfile
+++ /dev/null
@@ -1,80 +0,0 @@
-# Help: https://docs.docker.com/compose/gettingstarted/#step-1-set-up
-# https://docs.docker.com/build/building/multi-stage/
-# Star Schema Benchmark data set generator (ssb-dbgen):
-# https://github.com/eyalroz/ssb-dbgen.git
-
-# We use multi-stage method.
-
-# First create the datagen docker container
-FROM alpine:latest AS datagen
-ARG SCALE
-RUN echo "build and generate data with datagen with scale factor $SCALE"
-
-RUN apk update
-RUN apk add git gcc cmake make musl-dev
-RUN git clone https://github.com/eyalroz/ssb-dbgen.git --depth 1
-# Build the generator
-WORKDIR /ssb-dbgen
-RUN echo "build and generate data with datagen with scale factor $SCALE"
-RUN cmake -B ./build && cmake --build ./build
-# Run the generator (with -s )
-RUN build/dbgen -b dists.dss -v -s $SCALE
-
-# Second: use the systemds docker container
-# And execute a selected query.
-FROM apache/systemds:latest -ENV QUERY q1_1 -WORKDIR /systemds -COPY queries queries -#COPY --from=datagen /ssb-dbgen/data data -COPY --from=datagen ssb-dbgen/customer.tbl tmp/customer.tbl -COPY --from=datagen ssb-dbgen/part.tbl data/part.tbl -COPY --from=datagen ssb-dbgen/supplier.tbl data/supplier.tbl -COPY --from=datagen ssb-dbgen/date.tbl data/date.tbl -COPY --from=datagen ssb-dbgen/lineorder.tbl data/lineorder.tbl -CMD ["queries/$QUERY.dml","systemds/data"] - -#TODO: Currently only accepting one query each. To expand to accept more queries. -#COPY helper.sh helper.sh -#RUN chmod u+x helper.sh -# COPY host_data_dir container_data_dir -#COPY data/very_small_s0_01_dataset data - -#CMD if [ "$QUERY" = "all" ]; then \ -# ["./helper.sh", "A", "B"]; \ -# else \ -# ["queries/$QUERY.dml","data"]; \ -# fi - -FROM postgres:latest -COPY ssb_init.sql /docker-entrypoint-initdb.d/ -ENV QUERY q1_1 -COPY --from=datagen ssb-dbgen/customer.tbl tmp/customer.tbl -COPY --from=datagen ssb-dbgen/part.tbl tmp/part.tbl -COPY --from=datagen ssb-dbgen/supplier.tbl tmp/supplier.tbl -COPY --from=datagen ssb-dbgen/date.tbl tmp/date.tbl -COPY --from=datagen ssb-dbgen/lineorder.tbl tmp/lineorder.tbl -WORKDIR /tmp -RUN sed -i 's/|$//' "customer.tbl" -RUN sed -i 's/|$//' "part.tbl" -RUN sed -i 's/|$//' "supplier.tbl" -RUN sed -i 's/|$//' "date.tbl" -RUN sed -i 's/|$//' "lineorder.tbl" -RUN cd .. && mkdir -p queries -WORKDIR / -COPY sql queries -WORKDIR /queries -RUN mv q1.1.sql q1_1.sql -RUN mv q1.2.sql q1_2.sql -RUN mv q1.3.sql q1_3.sql -RUN mv q2.1.sql q2_1.sql -RUN mv q2.2.sql q2_2.sql -RUN mv q2.3.sql q2_3.sql -RUN mv q3.1.sql q3_1.sql -RUN mv q3.2.sql q3_2.sql -RUN mv q3.3.sql q3_3.sql -RUN mv q3.4.sql q3_4.sql -RUN mv q4.1.sql q4_1.sql -RUN mv q4.2.sql q4_2.sql -RUN mv q4.3.sql q4_3.sql - diff --git a/scripts/staging/ssb/other_docker_compose/docker-compose.yaml b/scripts/staging/ssb/other_docker_compose/docker-compose.yaml deleted file mode 100644 index f6bfa54407d..00000000000 --- a/scripts/staging/ssb/other_docker_compose/docker-compose.yaml +++ /dev/null @@ -1,46 +0,0 @@ -# Use -#https://stackoverflow.com/questions/35231362/dockerfile-and-docker-compose-not-updating-with-new-instructions -#docker-compose --build -#docker-compose up - -# Example -#docker-compose up --build -#docker-compose up - -# Create .env file and modify before each docker compose up. -#SCALE=[OUR_VALUE] -#QUERY=[OUR_QUERY_NUMBER] -#Example: -#SCALE=0.01 -#QUERY=q1_1 - -#This docker-compose file is linked to the Dockerfile. - -services: - datagen: - build: - context: . - args: - - SCALE=${SCALE} - env_file: .env - #environment: - # SCALE: ${SCALE} - systemds: - build: - context: . - args: - - SCALE=${SCALE} - environment: - QUERY: ${QUERY} - postgres: - build: - context: . 
-      args:
-        - SCALE=${SCALE}
-      restart: always
-      environment:
-        POSTGRES_USER: ${POSTGRES_USER}
-        POSTGRES_PASSWORD: $(POSTGRES_PASSWORD)
-        POSTGRES_DB: ${POSTGRES_DB}
-      ports:
-        - "${PORT_NUMBER}:5432"
\ No newline at end of file
diff --git a/scripts/staging/ssb/shell/run_script.sh b/scripts/staging/ssb/shell/run_script.sh
index c3d612f3ef1..e73a9e6b699 100755
--- a/scripts/staging/ssb/shell/run_script.sh
+++ b/scripts/staging/ssb/shell/run_script.sh
@@ -19,17 +19,22 @@ DB_SYSTEM="systemds"
 isQflag=0
 isSflag=0
 isDflag=0
+isGflag=0
+
+dml_query_array=("q1_1" "q1_2" "q1_3" "q2_1" "q2_2" "q2_3" "q3_1" "q3_2" "q3_3" "q3_4" "q4_1" "q4_2" "q4_3")
+sql_query_array=("q1.1" "q1.2" "q1.3" "q2.1" "q2.2" "q2.3" "q3.1" "q3.2" "q3.3" "q3.4" "q4.1" "q4.2" "q4.3")
+
 # Colors for output
 GREEN='\033[0;32m'
 BLUE='\033[0;34m'
 RED='\033[0;31m'
 NC='\033[0m' # No Color
-echo -e "${BLUE}=== Test environment with SSB Data Loader ===${NC}\n"
+echo -e "${BLUE}=== Test environment for SSB Data ===${NC}\n"
 #https://unix.stackexchange.com/questions/129391/passing-named-arguments-to-shell-scripts
 # Parsing the argument flags.
-while getopts "q:s:d:h:" opt; do
+while getopts "q:s:d:gh" opt; do
 case ${opt} in
 q) QUERY_NAME="$OPTARG"
    isQflag=1;;
@@ -37,48 +42,77 @@ while getopts "q:s:d:h:" opt; do
    isSflag=1;;
 d) DB_SYSTEM="$OPTARG"
    isDflag=1;;
-h) echo "Help:"
-   cat < other/script_flags_help.txt;;
-\?) echo "Option ${opt} not found. Try again."
-    echo "Please use: $0 -q [YOUR_QUERY_NAME] -s [SCALE] -d [DB_SYSTEM]";;
+g) isGflag=1;;
+#h (help) takes no argument
+h) echo "Help:"
+   cat < other/script_flags_help.txt
+   echo "Thank you.";;
+?) echo "Option ${opt} not found. Try again."
+   echo "Please use: $0 -q [YOUR_QUERY_NAME] -s [YOUR_SCALE] -d [YOUR_DB_SYSTEM]";;
 esac
 case $OPTARG in
 -*) echo "Option ${opt} should have an argument.";;
 esac
 done

-echo "isQflag=$isQflag"
-echo "isSflag=$isSflag"
-echo "isDflag=$isDflag"
-if [ isQflag==0 ]; then
-    echo "Warning: q-flag [QUERY_NAME] is empty. The default q is q2_1."
+
+#echo "isQflag=$isQflag"
+#echo "isSflag=$isSflag"
+#echo "isDflag=$isDflag"
+#echo "isGflag=$isGflag"
+if [ ${isQflag} == 0 ]; then
+    echo "Warning: q-flag [QUERY_NAME] is empty. The default q is q2_1."
 fi
-if [ isSflag==0 ]; then
+if [ ${isSflag} == 0 ]; then
     echo "Warning: s-flag [SCALE] is empty. The default s is 0.01."
 fi
-if [ isDflag==0 ]; then
+if [ ${isDflag} == 0 ]; then
     echo "Warning: d-flag [DATABASE] is empty. The default d is systemds."
 fi
+if [ ${isGflag} == 1 ]; then
+    echo "g-flag is set. That means the Docker Desktop GUI is used."
+fi
 echo "Arg 0 (SHELL_SCRIPT): $0"
 echo "Arg 1 (QUERY_NAME): ${QUERY_NAME}"
 echo "Arg 2 (SCALE): ${SCALE}"
 echo "Arg 3 (DB_SYSTEM): ${DB_SYSTEM}"
-exit
+
+# Check whether the query is valid.
+QUERY_NAME=$(echo "${QUERY_NAME}" | sed 's/\./_/')
+isQuery_valid=0
+if [ "${QUERY_NAME}" != "all" ]; then
+    for q in ${dml_query_array[@]}; do
+        if [ ${QUERY_NAME} == ${q} ]; then
+            isQuery_valid=1
+            break
+        fi
+    done
+    if [ ${isQuery_valid} == 0 ]; then
+        echo -e "Sorry, this query ${QUERY_NAME} is invalid. Valid query names are 'all' and ${dml_query_array[@]}."
+        echo -e "${RED}Test bench terminated unsuccessfully.${NC}"
+        exit
+    fi
+else
+    echo "All queries: ${dml_query_array[@]}"
+fi
+
 # Check for the existing required packages. If not, install them.
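+# (Note: every dependency check below follows the same confirm-and-install
+#  pattern: probe the tool via "--version", ask the user for permission, and
+#  only then run the installation; answering "no" stops the test bench.)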
 isAllowed="no"
+echo "=========="
 echo -e "${GREEN}Install required packages${NC}"
 echo -e "${GREEN}Check whether the following packages exist:${NC}"
-echo "docker 'docker compose' git gcc cmake make"
+echo "If only SystemDS: docker 'docker compose' git gcc cmake make"
+echo "For PostgreSQL: 'docker compose'"
+echo "For DuckDB: duckdb"

 for package in docker git gcc cmake make; do
     if [ ! "$(${package} --version)" ]; then
-        echo -e "${BLUE} ${package} package is required for this test bench. Do you want to allow the installation? (yes/no)${NC}"
+        echo "${package} package is required for this test bench. Do you want to allow the installation? (yes/no)"
         read -r isAllowed
         while [ "${isAllowed}" != "yes" ] || [ "${isAllowed}" != "y" ]; do
-            echo "Your answer is ${isAllowed}."
             if [ "${isAllowed}" == "yes" ] || [ "${isAllowed}" == "y" ]; then
-                echo "sudo apt-get install ${package}."
+                echo "Your answer is ${isAllowed}."
+                echo "sudo apt-get install ${package}"
                 sudo apt-get install ${package}
                 break
             elif [ "${isAllowed}" == "no" ] || [ "${isAllowed}" == "n" ]; then
                 echo -e "${RED}Sorry, we cannot continue with that test bench without the required packages. The test bench is stopped.${NC}"
                 exit
             else
                 echo "Your answer '${isAllowed}' is neither 'yes' nor 'no'. Please try again."
                 read -r isAllowed
             fi

         done
     fi
 done
-
-if [ ! "$(docker compose version)" ]; then
-    echo -e "${BLUE}docker compose is required for this test bench. Do you want to allow the installation? (yes/no)${NC}"
+isAllowed="no"
+if [ "${DB_SYSTEM}" != "systemds" ] && [ ! "$(docker compose version)" ]; then
+    echo "docker compose is required for this test bench. Do you want to allow the installation? (yes/no)"
     read -r isAllowed
     while [ "${isAllowed}" != "yes" ] || [ "${isAllowed}" != "y" ]; do
+
         if [ "${isAllowed}" == "yes" ] || [ "${isAllowed}" == "y" ]; then
             echo "sudo apt-get install docker-compose-plugin"
             sudo apt-get install docker-compose-plugin
             break
         elif [ "${isAllowed}" == "no" ] || [ "${isAllowed}" == "n" ]; then
             echo -e "${RED}Sorry, we cannot continue with that test bench without the required packages. The test bench is stopped.${NC}"
             exit
         else
             echo "Your answer '${isAllowed}' is neither 'yes' nor 'no'. Please try again."
         fi
         read -r isAllowed
     done
 fi
+isAllowed="no"
+if ([ "${DB_SYSTEM}" == "duckdb" ] || [ "${DB_SYSTEM}" == "all" ] ) && [ ! "$(duckdb --version)" ]; then
+    echo "duckdb is required for this test bench. Do you want to allow the installation? (yes/no)"
+    read -r isAllowed
+    while [ "${isAllowed}" != "yes" ] || [ "${isAllowed}" != "y" ]; do
+        if [ "${isAllowed}" == "yes" ] || [ "${isAllowed}" == "y" ]; then
+            echo "Your answer is ${isAllowed}."
+            echo "curl https://install.duckdb.org | sh"
+            curl https://install.duckdb.org | sh
+            break
+        elif [ "${isAllowed}" == "no" ] || [ "${isAllowed}" == "n" ]; then
+            echo -e "${RED}Sorry, we cannot continue with that test bench without the required packages. The test bench is stopped.${NC}"
+            exit
+        else
+            echo "Your answer '${isAllowed}' is neither 'yes' nor 'no'. Please try again."
+        fi
+        read -r isAllowed
+    done
+fi

+isAllowed="no"
+# Use docker desktop GUI
+if [ ${isGflag} == 1 ]; then
+    if [ ! "$(gnome-terminal --version)" ]; then
+        echo "gnome-terminal package is required for this test bench. Do you want to allow the installation? (yes/no)"
+        read -r isAllowed
+        while [ "${isAllowed}" != "yes" ] || [ "${isAllowed}" != "y" ]; do
+            if [ "${isAllowed}" == "yes" ] || [ "${isAllowed}" == "y" ]; then
+                echo "Your answer is ${isAllowed}."
+                echo "sudo apt-get install gnome-terminal"
+                sudo apt-get install gnome-terminal
+                break
+            elif [ "${isAllowed}" == "no" ] || [ "${isAllowed}" == "n" ]; then
+                echo -e "${RED}Sorry, we cannot continue with that test bench without the required packages. The test bench is stopped.${NC}"
+                exit
+            else
+                echo "Your answer '${isAllowed}' is neither 'yes' nor 'no'. Please try again."
+                read -r isAllowed
+            fi
+        done
+    fi
+    if [ ! "$(docker desktop version)" ]; then
+        echo "docker desktop is required for this test bench. Do you want to allow the installation? (yes/no)"
+        read -r isAllowed
+        while [ "${isAllowed}" != "yes" ] || [ "${isAllowed}" != "y" ]; do
+            if [ "${isAllowed}" == "yes" ] || [ "${isAllowed}" == "y" ]; then
+                echo "Your answer is ${isAllowed}."
+                # Docker Desktop has no one-line installer, so point the user
+                # to the official documentation instead.
+                echo "Please install Docker Desktop manually: https://docs.docker.com/desktop/setup/install/linux/"
+                exit
+            elif [ "${isAllowed}" == "no" ] || [ "${isAllowed}" == "n" ]; then
+                echo -e "${RED}Sorry, we cannot continue with that test bench without the required packages. The test bench is stopped.${NC}"
+                exit
+            else
+                echo "Your answer '${isAllowed}' is neither 'yes' nor 'no'. Please try again."
+            fi
+            read -r isAllowed
+        done
+    fi
+fi

 # Check whether the data directory exists.
+echo "=========="
 echo -e "${GREEN}Check for existing data directory and prepare the ssb-dbgen${NC}"
 if [ ! -d ssb-dbgen ]; then
     git clone https://github.com/eyalroz/ssb-dbgen.git --depth 1
     cd ssb-dbgen
 else
     cd ssb-dbgen
-    git pull
+    echo "Can we look for new updates of the datagen repository? If there are, do you want to pull them? (yes/no)"
+    read -r isAllowed
+    while [ "${isAllowed}" != "yes" ] || [ "${isAllowed}" != "y" ]; do
+        if [ "${isAllowed}" == "yes" ] || [ "${isAllowed}" == "y" ]; then
+            echo "Your answer is '${isAllowed}'"
+            echo "git pull"
+            git pull
+            break
+        elif [ "${isAllowed}" == "no" ] || [ "${isAllowed}" == "n" ]; then
+            echo "Your answer is '${isAllowed}'. No pulls. Use the currently existing version locally."
+            break
+        else
+            echo "Your answer '${isAllowed}' is neither 'yes' nor 'no'. Please try again."
+            read -r isAllowed
+        fi
+    done
 fi
+echo "=========="
 echo -e "${GREEN}Build ssb-dbgen and generate data with a given scale factor${NC}"
 # Build the generator
 cmake -B ./build && cmake --build ./build
 mv *.tbl ../data_dir
 # Go back to ssb home directory
 cd ..
-if [ "${DB_SYSTEM}" == "systemds" ] || [ "${DB_SYSTEM}" == "all" ] ; then
-    docker pull apache/systemds:latest
+echo "Number of rows of created tables."
+for table in customer part supplier date lineorder; do
+    str1=`wc --lines < data_dir/${table}.tbl`
+    echo "Table ${table} has ${str1} rows."
+done
+
+# Execute queries in SystemDS docker container.
+if [ "${DB_SYSTEM}" == "systemds" ] || [ "${DB_SYSTEM}" == "systemds_stats" ] || [ "${DB_SYSTEM}" == "all" ] ; then
+    echo "=========="
+
+    echo -e "${GREEN}Start the SystemDS docker container.${NC}"
+    if [ ${isGflag} == 1 ]; then
+        docker desktop start
+    else
+        sudo systemctl start docker
+    fi
+
+    if [ ! "$(docker images apache/systemds:latest)" ]; then
+        docker pull apache/systemds:latest
+    fi
+
+    echo "=========="
+
     echo -e "${GREEN}Execute DML queries in SystemDS${NC}"
     QUERY_NAME=$(echo "${QUERY_NAME}" | sed 's/\./_/')
-
+
+    # Enable extended outputs with stats in SystemDS
+    useStats=""
+    if [ "${DB_SYSTEM}" == "systemds_stats" ]; then
+        useStats="--stats"
+    fi
     ##all: {"q1_1","q1_2","q1_3","q2_1","q2_2","q2_3","q3_1","q3_2","q3_3","q3_4","q4_1","q4_2","q4_3"}
     if [ "${QUERY_NAME}" == "all" ]; then
         echo "Execute all 13 queries."
-        for q in {"q1_1","q1_2","q1_3","q2_1","q2_2","q2_3","q3_1","q3_2","q3_3","q3_4","q4_1","q4_2","q4_3"}
-        do
-            echo "Execute query ${QUERY_NAME}.dml"
-            docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/${q}.dml -nvargs input_dir="/scripts/data_dir"
+
+        for q in ${dml_query_array[@]} ; do
+            echo "Execute query ${q}.dml"
+            docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/${q}.dml ${useStats} -nvargs input_dir="/scripts/data_dir"
         done
     else
         echo "Execute query ${QUERY_NAME}.dml"
-        docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/${QUERY_NAME}.dml -nvargs input_dir="/scripts/data_dir"
+        docker run -it --rm -v $PWD:/scripts/ apache/systemds:latest -f /scripts/queries/${QUERY_NAME}.dml ${useStats} -nvargs input_dir="/scripts/data_dir"
     fi
 fi

+# Execute queries in PostgreSQL docker container.
 if [ "${DB_SYSTEM}" == "postgres" ] || [ "${DB_SYSTEM}" == "all" ] ; then
+    echo "=========="
+    echo -e "${GREEN}Start the PostgreSQL Docker container and load data.${NC}"
+
+    if [ ${isGflag} == 1 ]; then
+        docker desktop start
+    else
+        sudo systemctl start docker
+    fi
+
+    if [ ! "$(docker images postgres:latest)" ]; then
+        docker pull postgres:latest
+    fi
+
     #Look more in the documentation.
     #https://docs.docker.com/reference/cli/docker/container/ls/
-    if [ "$(docker ps -a --filter name=${PG_CONTAINER})" ]; then
-        if [ ! "$(docker ps --filter name=${PG_CONTAINER})" ]; then
+    # TODO: make this container detection more robust.
+    if [ "$(docker ps -aq --filter name=${PG_CONTAINER})" ]; then
+        if [ ! "$(docker ps -q --filter name=${PG_CONTAINER})" ]; then
             echo "Starting existing container..."
             docker start ${PG_CONTAINER}
         fi
     else
         echo "Creating new PostgreSQL container..."
-        docker compose up --build -d
+        echo "$PWD/docker-compose.yaml"
+        docker compose -f "$PWD/docker-compose.yaml" up -d --build
+        sleep 3
     fi
     # Load data and copy into the database
-    docker cp data_dir ${PG_CONTAINER}:/tmp
+
     for table in customer part supplier date lineorder; do
         #docker exec -i ${PG_CONTAINER} ls
-        docker exec -i ${PG_CONTAINER} sed -i 's/|$//' "${table}.tbl"
+        docker cp data_dir/${table}.tbl ${PG_CONTAINER}:/tmp
+        echo "Load ${table} table with number_of_rows:"
+        docker exec -i ${PG_CONTAINER} sed -i 's/|$//' "tmp/${table}.tbl"
         docker exec -i ${PG_CONTAINER} psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} -c "TRUNCATE TABLE ${table} CASCADE; COPY ${table} FROM '/tmp/${table}.tbl' DELIMITER '|';"
     done
     # Change query_name e.g. from q1_1 to q1.1
     QUERY_NAME=$(echo "${QUERY_NAME}" | sed 's/_/./')
+    echo "=========="
     echo -e "${GREEN}Execute SQL queries in PostgreSQL${NC}"
-
-    ##all: {"q1_1","q1_2","q1_3","q2_1","q2_2","q2_3","q3_1","q3_2","q3_3","q3_4","q4_1","q4_2","q4_3"}
+    #all: {"q1.1","q1.2","q1.3","q2.1","q2.2","q2.3","q3.1","q3.2","q3.3","q3.4","q4.1","q4.2","q4.3"}
     if [ "${QUERY_NAME}" = "all" ]; then
         echo "Execute all 13 queries."
-       for q in {"q1.1","q1.2","q1.3","q2.1","q2.2","q2.3","q3.1","q3.2","q3.3","q3.4","q4.1","q4.2","q4.3"}; do
+       for q in "${sql_query_array[@]}"; do
            echo "Execute query ${q}.sql"
+           echo "docker exec -i ${PG_CONTAINER} psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} < sql/${q}.sql"
            docker exec -i ${PG_CONTAINER} psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} < sql/${q}.sql
        done
    else
@@ -185,22 +342,37 @@ if [ "${DB_SYSTEM}" == "postgres" ] || [ "${DB_SYSTEM}" == "all" ] ; then
    fi
 fi
 
-#TODO Add duckdb support
-#if [ "${DB_SYSTEM}" == "duckdb" ] || [ "${DB_SYSTEM}" == "all" ]; then
+# Execute queries in DuckDB locally.
+if [ "${DB_SYSTEM}" == "duckdb" ] || [ "${DB_SYSTEM}" == "all" ]; then
+
+    echo "=========="
+    echo -e "${GREEN}Start a DuckDB persistent database and load data.${NC}"
+    #https://duckdbsnippets.com/snippets/198/run-sql-file-in-duckdb-cli
+    # Create a duckdb persistent database file.
+    duckdb shell/test_ssb.duckdb < other/ssb_init.sql
+
+    # Load data and copy into the database.
+    for table in customer part supplier date lineorder; do
+        echo "Load ${table} table"
+        duckdb shell/test_ssb.duckdb -c "COPY ${table} FROM 'data_dir/${table}.tbl'; SELECT COUNT(*) AS number_of_rows FROM ${table};"
+    done
+
 # # Change query_name e.g. from q1_1 to q1.1
-# QUERY_NAME=$(echo "${QUERY_NAME}" | sed 's/_/./')
-# echo -e "${GREEN}Execute SQL queries in DuckDB${NC}"
+    QUERY_NAME=$(echo "${QUERY_NAME}" | sed 's/_/./')
+    echo "=========="
+    echo -e "${GREEN}Execute SQL queries in DuckDB${NC}"
+    #all: {"q1.1","q1.2","q1.3","q2.1","q2.2","q2.3","q3.1","q3.2","q3.3","q3.4","q4.1","q4.2","q4.3"}
+    if [ "${QUERY_NAME}" = "all" ]; then
+        echo "Execute all 13 queries."
+        for q in "${sql_query_array[@]}"; do
+            echo "Execute query ${q}.sql"
+            duckdb shell/test_ssb.duckdb < sql/${q}.sql
+        done
+    else
+        echo "Execute query ${QUERY_NAME}.sql"
+        duckdb shell/test_ssb.duckdb < sql/${QUERY_NAME}.sql
+    fi
 
-    ##all: {"q1_1","q1_2","q1_3","q2_1","q2_2","q2_3","q3_1","q3_2","q3_3","q3_4","q4_1","q4_2","q4_3"}
-#    if [ "${QUERY_NAME}" = "all" ]; then
-#        echo "Execute all 13 queries."
-#        for q in {"q1.1","q1.2","q1.3","q2.1","q2.2","q2.3","q3.1","q3.2","q3.3","q3.4","q4.1","q4.2","q4.3"}
-#        do
-#            echo "Execute query ${QUERY_NAME}."
-#            #TODO
-#        done
-#    else
-#        echo "Execute query ${QUERY_NAME}"
-#        #TODO
-#    fi
-#fi
\ No newline at end of file
+fi
+echo "=========="
+echo -e "${GREEN}Test bench finished successfully.${NC}"

From 799da543cedd512fca71d273ced51524615f5b00 Mon Sep 17 00:00:00 2001
From: Johnn-ui2010
Date: Wed, 28 Jan 2026 21:10:26 +0100
Subject: [PATCH 21/22] Updated version and problems fixed.

---
 scripts/staging/ssb/ReadMe.md           | 128 +++++++++++++++++++++++-
 scripts/staging/ssb/shell/run_script.sh |  54 +++-------
 2 files changed, 138 insertions(+), 44 deletions(-)

diff --git a/scripts/staging/ssb/ReadMe.md b/scripts/staging/ssb/ReadMe.md
index 4824a9f7355..59ae5931b66 100644
--- a/scripts/staging/ssb/ReadMe.md
+++ b/scripts/staging/ssb/ReadMe.md
@@ -34,17 +34,16 @@ ssb/
 └── sql/ # SQL versions & `test_ssb.duckdb` for DuckDB
 ```
 ## Setup
-- First, install [Docker](https://docs.docker.com/get-started/get-docker/), [Docker Compose](https://docs.docker.com/compose/install/) and its necessary libraries.
+- First, install [Docker](https://docs.docker.com/get-started/get-docker/), [Docker Compose](https://docs.docker.com/compose/install/) and their required libraries. The script does not cover the Docker installation itself; a quick verification snippet follows below.
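+
+A quick manual check that the required tools are in place (the script performs the same probes internally):
+```
+docker --version
+docker compose version
+duckdb --version
+```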
 For Ubuntu, there are the following tutorials [for Docker](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository) and [Docker Compose](https://docs.docker.com/compose/install/linux/#install-using-the-repository) using the apt repository. You can add [Docker Desktop](https://docs.docker.com/desktop/setup/install/linux/ubuntu/), too.
 
 The shell script covers the installation of the following points. We use Ubuntu and Debian. For other OS, please consult the respective documentation.
-
+- Docker Compose installation for Ubuntu/Debian (for other OS, see [here](https://docs.docker.com/compose/install/))
 - Docker version of the database system [SystemDS](https://apache.github.io/systemds/site/docker)
 - Docker compose version of [PostgreSQL](docker-compose.yaml) based on its [documentation](https://hub.docker.com/_/postgres).
 - [ssb-dbgen](https://github.com/eyalroz/ssb-dbgen/tree/master) (SSB data set generator `datagen`)
-
-For more options look into the original documentation.
 
 ## Structure of the test system
 ![diagram](other/dia_ssb_script_structure1.jpg)
@@ -98,3 +97,124 @@ $ ./run_script.sh -q q4_3 -s 0.1 -d systemds
 $ ./run_script.sh -q all -s 1 -d duckdb
 $ ./run_script.sh -q q1.1 -s 1 -d postgres -g
 ```
+
+## Example output
+Here is what the (abridged) script output could look like.
+The script performs the following steps:
+- Loading arguments and environment variables
+- Installing packages (and asking permission for it)
+- Generating data with datagen (SSB data generator)
+- Loading Docker images for SystemDS or PostgreSQL
+- Initializing the Docker database containers and the DuckDB database
+- Loading the SQL schema and data into the databases
+- Running the selected queries
+```
+user@user1:~/systemds/scripts/staging/ssb$ ./shell/run_script.sh -q q2_3 -s 0.1 -d all -g
+=== Test environment for SSB Data ===
+
+g-flag is set. That means, the docker desktop GUI is used.
+Arg 0 (SHELL_SCRIPT): ./shell/run_script.sh
+Arg 1 (QUERY_NAME): q2_3
+Arg 2 (SCALE): 0.1
+Arg 3 (DB_SYSTEM): all
+==========
+Install required packages
+Check whether the following packages exist:
+If only SystemDS: docker 'docker compose' git gcc cmake make
+For PostgreSQL: 'docker compose'
+For DuckDB: duckdb
+If using g-flag [GUI]: docker desktop
+==========
+Check for existing data directory and prepare the ssb-dbgen
+Can we check for new updates of the datagen repository? If there are any, do you want to pull them? (yes/no)
+yes
+Your answer is 'yes'
+git pull
+==========
+Build ssb-dbgen and generate data with a given scale factor
+[...]
+SSB (Star Schema Benchmark) Population Generator (Version 1.0.0)
+Copyright Transaction Processing Performance Council 1994 - 2000
+Generating data for part table [pid: 1]: done.
+Generating data for suppliers table [pid: 1]: done.
+[...]
+Number of rows of the created tables:
+Table customer has 3000 rows.
+Table part has 20000 rows.
+Table supplier has 200 rows.
+Table date has 255 rows.
+Table lineorder has 600597 rows.
+==========
+Start the SystemDS docker container.
+Docker Desktop is already running
+==========
+Execute DML queries in SystemDS
+
+Execute query q2_3.dml
+WARNING: Using incubator modules: jdk.incubator.vector
+Loading tables from directory: /scripts/data_dir
+SUM(lo_revenue) | d_year | p_brand
+# FRAME: nrow = 1, ncol = 3
+# C1 C2 C3
+# INT32 INT32 STRING
+72081993 1992 MFGR#2239
+
+
+Q2.3 finished.
+
+SystemDS Statistics:
+Total execution time: 9.924 sec.
+
+==========
+Start the PostgreSQL Docker container and load data.
+Docker Desktop is already running
+
+Successfully copied 282kB to ssb-postgres-1:/tmp
+Load customer table. Number of rows:
+TRUNCATE TABLE
+COPY 3000
+Successfully copied 1.7MB to ssb-postgres-1:/tmp
+Load part table. Number of rows:
+TRUNCATE TABLE
+COPY 20000
+[...]
+==========
+Execute SQL queries in PostgreSQL
+Execute query q2.3.sql
+docker exec -i ssb-postgres-1 psql -U userA -d db1 < sql/q2.3.sql
+   sum    | d_year |  p_brand
+----------+--------+-----------
+ 72081993 |   1992 | MFGR#2239
+(1 row)
+
+==========
+Start a DuckDB persistent database and load data.
+Load customer table
+┌────────────────┐
+│ number_of_rows │
+│     int64      │
+├────────────────┤
+│      3000      │
+└────────────────┘
+Load part table
+┌────────────────┐
+│ number_of_rows │
+│     int64      │
+├────────────────┤
+│     20000      │
+└────────────────┘
+[...]
+==========
+Execute SQL queries in DuckDB
+Execute query q2.3.sql
+┌─────────────────┬────────┬───────────┐
+│ sum(lo_revenue) │ d_year │  p_brand  │
+│     int128      │ int32  │  varchar  │
+├─────────────────┼────────┼───────────┤
+│        72081993 │   1992 │ MFGR#2239 │
+└─────────────────┴────────┴───────────┘
+==========
+Test bench finished successfully.
+```
+
+## Troubleshooting
+- If you encounter Docker problems like "Permission denied", or data that does not load into the tables, try restarting Docker (e.g. `sudo systemctl restart docker`) or removing the affected container (e.g. `docker rm -f ssb-postgres-1`). You can also switch between the standard Docker Engine (without GUI) and Docker Desktop (with GUI) with the flag `-g`.
\ No newline at end of file
diff --git a/scripts/staging/ssb/shell/run_script.sh b/scripts/staging/ssb/shell/run_script.sh
index e73a9e6b699..68d4082ac48 100755
--- a/scripts/staging/ssb/shell/run_script.sh
+++ b/scripts/staging/ssb/shell/run_script.sh
@@ -104,7 +104,12 @@ echo -e "${GREEN}Check whether the following packages exist:${NC}"
 echo "If only SystemDS: docker 'docker compose' git gcc cmake make"
 echo "For PostgreSQL: 'docker compose'"
 echo "For DuckDB: duckdb"
+echo "If using g-flag [GUI]: docker desktop"
 
+if [ ! "$(docker --version)" ]; then
+    echo "Docker is required for this test bench. Please install it manually using the official documentation."
+    exit
+fi
 for package in docker git gcc cmake make; do
     if [ ! "$(${package} --version)" ]; then
         echo "${package} package is required for this test bench. Do you want to allow the installation? (yes/no)"
@@ -127,7 +132,7 @@ for package in docker git gcc cmake make; do
 done
 isAllowed="no"
 if [ "${DB_SYSTEM}" != "systemds" ] && [ ! "$(docker compose version)" ]; then
-    echo "docker compose is required for this test bench. Do you want to allow the installation? (yes/no)"
+    echo "Docker compose is required for this test bench. Do you want to allow the installation? (yes/no)"
     read -r isAllowed
 
     while [ "${isAllowed}" != "yes" ] || [ "${isAllowed}" != "y" ]; do
@@ -145,7 +150,7 @@ if [ "${DB_SYSTEM}" != "systemds" ] && [ ! "$(docker compose version)" ]; then
 fi
 isAllowed="no"
 if ([ "${DB_SYSTEM}" == "duckdb" ] || [ "${DB_SYSTEM}" == "all" ] ) && [ ! "$(duckdb --version)" ]; then
-    echo "duckdb is required for this test bench. Do you want to allow the installation? (yes/no)"
+    echo "DuckDB is required for this test bench. Do you want to allow the installation? (yes/no)"
     read -r isAllowed
     while [ "${isAllowed}" != "yes" ] || [ "${isAllowed}" != "y" ]; do
         if [ ${isAllowed} == "yes" ]; then
@@ -165,39 +170,9 @@ fi
 isAllowed="no"
 # Use docker desktop GUI
 if [ ${isGflag} == 1 ]; then
-    if [ ! "$(gnome-terminal --version)" ]; then
-        echo "gnome-terminal package is required for this test bench. 
Do you want to allow the installation? (yes/no)"
-        read -r isAllowed
-        while [ "${isAllowed}" != "yes" ] || [ "${isAllowed}" != "y" ]; do
-            if [ "${isAllowed}" == "yes" ] || [ "${isAllowed}" == "y" ]; then
-                echo "Your anwser is ${isAllowed}."
-                echo "sudo apt-get install gnome-terminal"
-                sudo apt-get install gnome-terminal
-            elif [ "${isAllowed}" == "no" ] || [ "${isAllowed}" == "n" ]; then
-                echo -e "${RED}Sorry, we cannot continue with that test bench without the required packages. The test bench is stopped.${NC}"
-                exit
-            else
-                echo "Your answer '${isAllowed}' is neither 'yes' or 'no'. Please try again."
-                read -r isAllowed
-            fi
-        done
-    fi
     if [ ! "$(docker desktop version)" ]; then
-        echo "docker desktop is required for this test bench. Do you want to allow the installation? (yes/no)"
-        read -r isAllowed
-        while [ "${isAllowed}" != "yes" ] || [ "${isAllowed}" != "y" ]; do
-            if [ ${isAllowed} == "yes" ]; then
-                echo "Your anwser is ${isAllowed}."
-                echo "curl https://install.duckdb.org | sh"
-                curl https://install.duckdb.org | sh
-            elif [ "${isAllowed}" == "no" ] || [ "${isAllowed}" == "n" ]; then
-                echo -e "${RED}Sorry, we cannot continue with that test bench without the required packages. The test bench is stopped.${NC}"
-                exit
-            else
-                echo "Your answer '${isAllowed}' is neither 'yes' or 'no'. Please try again."
-            fi
-            read -r isAllowed
-        done
+        echo "Docker Desktop is required for this test bench. Please install it manually using the official documentation."
+        exit
     fi
 fi
 
@@ -216,6 +191,7 @@ else
             echo "Your answer is '${isAllowed}'"
             echo "git pull"
             git pull
+            break
         elif [ "${isAllowed}" == "no" ] || [ "${isAllowed}" == "n" ]; then
             echo "Your answer is '${isAllowed}'. No pulls. Use the currently existing version locally."
             break
         else
@@ -247,7 +223,7 @@ done
 
 if [ "${DB_SYSTEM}" == "systemds" ] || [ "${DB_SYSTEM}" == "systemds_stats" ] || [ "${DB_SYSTEM}" == "all" ] ; then
     echo "=========="
-    echo -e "${GREEN}Start the SystemDS docker container."
+    echo -e "${GREEN}Start the SystemDS docker container.${NC}"
     if [ ${isGflag} == 1 ]; then
         docker desktop start
     else
@@ -263,7 +239,6 @@ if [ "${DB_SYSTEM}" == "systemds" ] || [ "${DB_SYSTEM}" == "systemds_stats" ] ||
     echo -e "${GREEN}Execute DML queries in SystemDS${NC}"
     QUERY_NAME=$(echo "${QUERY_NAME}" | sed 's/\./_/')
 
-    docker desktop
     #Enable extended outputs with stats in SystemDS
     useStats=""
     if [ "${DB_SYSTEM}" == "systemds_stats" ]; then
@@ -300,8 +275,7 @@ if [ "${DB_SYSTEM}" == "postgres" ] || [ "${DB_SYSTEM}" == "all" ] ; then
 
     #For more details, see the documentation:
     #https://docs.docker.com/reference/cli/docker/container/ls/
-    echo "AM HERE"
-    #TO DO solve here.
+
     if [ "$(docker ps -aq --filter name=${PG_CONTAINER})" ]; then
         if [ ! "$(docker ps -q --filter name=${PG_CONTAINER})" ]; then
             echo "Starting existing container..."
@@ -313,7 +287,6 @@ if [ "${DB_SYSTEM}" == "postgres" ] || [ "${DB_SYSTEM}" == "all" ] ; then
         docker compose -f "$PWD/docker-compose.yaml" up -d --build
         sleep 3
     fi
-    echo "AM HERE2"
 
     # Load data and copy into the database
     for table in customer part supplier date lineorder; do
@@ -350,10 +323,11 @@ if [ "${DB_SYSTEM}" == "duckdb" ] || [ "${DB_SYSTEM}" == "all" ]; then
     #https://duckdbsnippets.com/snippets/198/run-sql-file-in-duckdb-cli
     # Create a duckdb persistent database file.
     duckdb shell/test_ssb.duckdb < other/ssb_init.sql
-    
+
     # Load data and copy into the database.
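+    # (The same CLI can also be used to inspect the database manually,
+    #  e.g., illustrative only:
+    #  duckdb shell/test_ssb.duckdb -c "SELECT COUNT(*) FROM lineorder;" )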
     for table in customer part supplier date lineorder; do
         echo "Load ${table} table"
+        duckdb shell/test_ssb.duckdb -c "TRUNCATE TABLE ${table} CASCADE;"
         duckdb shell/test_ssb.duckdb -c "COPY ${table} FROM 'data_dir/${table}.tbl'; SELECT COUNT(*) AS number_of_rows FROM ${table};"
     done
 
From 9240cf2b823d276566b3635d89659242696ecf53 Mon Sep 17 00:00:00 2001
From: Johnn-ui2010
Date: Thu, 29 Jan 2026 13:51:10 +0100
Subject: [PATCH 22/22] License headers added.

---
 scripts/staging/ssb/Dockerfile          | 23 ++++++++++++++++-
 scripts/staging/ssb/ReadMe.md           | 10 ++++----
 scripts/staging/ssb/docker-compose.yaml | 34 +++++++++++++++++++------
 scripts/staging/ssb/queries/q1_1.dml    | 23 +++++++++++++++++
 scripts/staging/ssb/queries/q1_2.dml    | 21 +++++++++++++++
 scripts/staging/ssb/queries/q1_3.dml    | 21 +++++++++++++++
 scripts/staging/ssb/queries/q2_1.dml    | 22 ++++++++++++++++
 scripts/staging/ssb/queries/q2_2.dml    | 22 ++++++++++++++++
 scripts/staging/ssb/queries/q2_3.dml    | 21 +++++++++++++++
 scripts/staging/ssb/queries/q3_1.dml    | 21 +++++++++++++++
 scripts/staging/ssb/queries/q3_2.dml    | 22 ++++++++++++++++
 scripts/staging/ssb/queries/q3_3.dml    | 22 ++++++++++++++++
 scripts/staging/ssb/queries/q3_4.dml    | 21 +++++++++++++++
 scripts/staging/ssb/queries/q4_1.dml    | 21 +++++++++++++++
 scripts/staging/ssb/queries/q4_2.dml    | 21 +++++++++++++++
 scripts/staging/ssb/queries/q4_3.dml    | 21 +++++++++++++++
 scripts/staging/ssb/shell/run_script.sh | 20 +++++++++++++++
 17 files changed, 352 insertions(+), 14 deletions(-)

diff --git a/scripts/staging/ssb/Dockerfile b/scripts/staging/ssb/Dockerfile
index 6c88bad73b6..2500d9722f6 100644
--- a/scripts/staging/ssb/Dockerfile
+++ b/scripts/staging/ssb/Dockerfile
@@ -1,4 +1,25 @@
-# Help: https://docs.docker.com/compose/gettingstarted/#step-1-set-up
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+# Follow the tutorial: https://docs.docker.com/compose/gettingstarted/#step-1-set-up
 FROM postgres:latest
 
 # Init the data and load to the database with a sql script.
diff --git a/scripts/staging/ssb/ReadMe.md b/scripts/staging/ssb/ReadMe.md
index 59ae5931b66..100ce372b16 100644
--- a/scripts/staging/ssb/ReadMe.md
+++ b/scripts/staging/ssb/ReadMe.md
@@ -23,16 +23,16 @@
 
 ├── docker-compose.yaml # Compose file for Docker containers (here for PostgreSQL)
 ├── Dockerfile
-├── README.md # This explanation
-├── queries/ # DML queries (q1_1.dml ... q4_3.dml)
+├── other/ # Auxiliary files used by the script (ssb_init.sql, diagrams)
+├── README.md # This explanation
+├── queries/ # DML queries (q1_1.dml ... 
q4_3.dml)
 │   ├── q1_1.dml - q1_3.dml
 │   ├── q2_1.dml - q2_3.dml
 │   ├── q3_1.dml - q3_4.dml
 │   └── q4_1.dml - q4_3.dml
 ├── shell/
-│   ├── run_script.sh # Main script
-└── sql/ # SQL versions & `test_ssb.duckdb` for DuckDB
+│   ├── run_script.sh # Main script
+└── sql/ # SQL versions of the queries
 ```
 
 ## Setup
 - First, install [Docker](https://docs.docker.com/get-started/get-docker/), [Docker Compose](https://docs.docker.com/compose/install/) and their required libraries. The script does not cover the Docker installation itself; a quick verification snippet follows below.
 
diff --git a/scripts/staging/ssb/docker-compose.yaml b/scripts/staging/ssb/docker-compose.yaml
index 26941ad2115..70aec4dcab9 100644
--- a/scripts/staging/ssb/docker-compose.yaml
+++ b/scripts/staging/ssb/docker-compose.yaml
@@ -1,13 +1,31 @@
-# The docker compose file to create a postgres instance.
-#docker-compose --build
-#docker-compose up
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
 
-# Example
-#docker-compose up --build
-#docker-compose up
+## The docker compose file to create a postgres instance.
+#docker compose up --build
+## Or (if that does not work)
+#docker compose -f "[THE_ACTUAL_PATH]/docker-compose.yaml" up -d --build
 
-# Create .env file and modify before each docker compose up.
-# in .env file
+## Create a .env file and modify it before each docker compose up.
+## in the .env file
 #POSTGRES_USER=[YOUR_USERNAME]
 #POSTGRES_PASSWORD=[YOUR_PASSWORD]
 #POSTGRES_DB=[YOUR_DB_NAME]
diff --git a/scripts/staging/ssb/queries/q1_1.dml b/scripts/staging/ssb/queries/q1_1.dml
index 8a55acef472..992e1691453 100644
--- a/scripts/staging/ssb/queries/q1_1.dml
+++ b/scripts/staging/ssb/queries/q1_1.dml
@@ -1,3 +1,26 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+
 /*
 DML-script implementing the ssb query Q1.1 in SystemDS. 
**input_dir="/scripts/ssb/data" diff --git a/scripts/staging/ssb/queries/q1_2.dml b/scripts/staging/ssb/queries/q1_2.dml index d376d8e6b8e..599ee849b29 100644 --- a/scripts/staging/ssb/queries/q1_2.dml +++ b/scripts/staging/ssb/queries/q1_2.dml @@ -1,3 +1,24 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +#------------------------------------------------------------- + /* DML-script implementing the ssb query Q1.2 in SystemDS. **input_dir="/scripts/ssb/data" diff --git a/scripts/staging/ssb/queries/q1_3.dml b/scripts/staging/ssb/queries/q1_3.dml index c4cf18e2a97..4a9484da11e 100644 --- a/scripts/staging/ssb/queries/q1_3.dml +++ b/scripts/staging/ssb/queries/q1_3.dml @@ -1,3 +1,24 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +#------------------------------------------------------------- + /* DML-script implementing the ssb query Q1.3 in SystemDS. **input_dir="/scripts/ssb/data" diff --git a/scripts/staging/ssb/queries/q2_1.dml b/scripts/staging/ssb/queries/q2_1.dml index 3e1c73730c7..24e70a7d01a 100644 --- a/scripts/staging/ssb/queries/q2_1.dml +++ b/scripts/staging/ssb/queries/q2_1.dml @@ -1,3 +1,25 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. 
See the License for the +# specific language governing permissions and limitations +# under the License. +# +#------------------------------------------------------------- + + /* DML-script implementing the ssb query Q2.1 in SystemDS. **input_dir="/scripts/ssb/data" diff --git a/scripts/staging/ssb/queries/q2_2.dml b/scripts/staging/ssb/queries/q2_2.dml index 05981cf7370..8636ea67421 100644 --- a/scripts/staging/ssb/queries/q2_2.dml +++ b/scripts/staging/ssb/queries/q2_2.dml @@ -1,3 +1,25 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +#------------------------------------------------------------- + + /* DML-script implementing the ssb query Q2.2 in SystemDS. **input_dir="/scripts/ssb/data" diff --git a/scripts/staging/ssb/queries/q2_3.dml b/scripts/staging/ssb/queries/q2_3.dml index 35f08f3ef07..d7bde49aadd 100644 --- a/scripts/staging/ssb/queries/q2_3.dml +++ b/scripts/staging/ssb/queries/q2_3.dml @@ -1,3 +1,24 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +#------------------------------------------------------------- + /* DML-script implementing the ssb query Q2.3 in SystemDS. **input_dir="/scripts/ssb/data" diff --git a/scripts/staging/ssb/queries/q3_1.dml b/scripts/staging/ssb/queries/q3_1.dml index d7a224dd26e..e47e8a87b43 100644 --- a/scripts/staging/ssb/queries/q3_1.dml +++ b/scripts/staging/ssb/queries/q3_1.dml @@ -1,3 +1,24 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +#------------------------------------------------------------- + /* DML-script implementing the ssb query Q3.1 in SystemDS. **input_dir="/scripts/ssb/data" diff --git a/scripts/staging/ssb/queries/q3_2.dml b/scripts/staging/ssb/queries/q3_2.dml index 0cd6c7deba5..f05c8441846 100644 --- a/scripts/staging/ssb/queries/q3_2.dml +++ b/scripts/staging/ssb/queries/q3_2.dml @@ -1,3 +1,25 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +#------------------------------------------------------------- + + /* DML-script implementing the ssb query Q3.2 in SystemDS. **input_dir="/scripts/ssb/data" diff --git a/scripts/staging/ssb/queries/q3_3.dml b/scripts/staging/ssb/queries/q3_3.dml index 28f75d950f7..87c59233e73 100644 --- a/scripts/staging/ssb/queries/q3_3.dml +++ b/scripts/staging/ssb/queries/q3_3.dml @@ -1,3 +1,25 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +#------------------------------------------------------------- + + /* DML-script implementing the ssb query Q3.3 in SystemDS. **input_dir="/scripts/ssb/data" diff --git a/scripts/staging/ssb/queries/q3_4.dml b/scripts/staging/ssb/queries/q3_4.dml index 633bc794b35..278fb2d8c82 100644 --- a/scripts/staging/ssb/queries/q3_4.dml +++ b/scripts/staging/ssb/queries/q3_4.dml @@ -1,3 +1,24 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. 
See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
 /*
 DML-script implementing the ssb query Q3.4 in SystemDS.
 **input_dir="/scripts/ssb/data"
diff --git a/scripts/staging/ssb/queries/q4_1.dml b/scripts/staging/ssb/queries/q4_1.dml
index 33178a15e1a..b3787925c35 100644
--- a/scripts/staging/ssb/queries/q4_1.dml
+++ b/scripts/staging/ssb/queries/q4_1.dml
@@ -1,3 +1,24 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
 /*
 DML-script implementing the ssb query Q4.1 in SystemDS.
 **input_dir="/scripts/ssb/data"
diff --git a/scripts/staging/ssb/queries/q4_2.dml b/scripts/staging/ssb/queries/q4_2.dml
index cb7794a56af..c873832a2ee 100644
--- a/scripts/staging/ssb/queries/q4_2.dml
+++ b/scripts/staging/ssb/queries/q4_2.dml
@@ -1,3 +1,24 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
 /*
 DML-script implementing the ssb query Q4.2 in SystemDS. 
**input_dir="/scripts/ssb/data"
diff --git a/scripts/staging/ssb/queries/q4_3.dml b/scripts/staging/ssb/queries/q4_3.dml
index 56d0d31788e..384411432b6 100644
--- a/scripts/staging/ssb/queries/q4_3.dml
+++ b/scripts/staging/ssb/queries/q4_3.dml
@@ -1,3 +1,24 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
 /*
 DML-script implementing the ssb query Q4.3 in SystemDS.
 **input_dir="/scripts/ssb/data"
diff --git a/scripts/staging/ssb/shell/run_script.sh b/scripts/staging/ssb/shell/run_script.sh
index 68d4082ac48..a6b78369a00 100755
--- a/scripts/staging/ssb/shell/run_script.sh
+++ b/scripts/staging/ssb/shell/run_script.sh
@@ -1,3 +1,23 @@
 #!/bin/bash
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
 
 #Mark as executable.