This repository was archived by the owner on Jul 23, 2024. It is now read-only.

kuien (Contributor) commented Sep 20, 2018

No description provided.

linwen commented Sep 21, 2018

This is a good optimization. When many columns are projected, we can fetch only the join key first and check it against the Bloom filter; if it does not match, there is no need to fetch the other columns.

However, in this PR, when the Bloom filter is not enabled, the join key is still fetched in the first loop and the other columns in the second loop. That case needs some further refinement.
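
A minimal sketch of the two-phase fetch described above, not the PR's actual code. Every name here (`Bloom`, `Row`, `scan_outer`, the hash functions, the filter size) is hypothetical and chosen only for illustration. When a filter is supplied, the join key is fetched first and the remaining columns are fetched only for rows that pass the check; when no filter is supplied, all columns are fetched in a single pass, which is the refinement suggested for the disabled case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 1024u /* hypothetical filter size; must be a power of two */

typedef struct { uint8_t bits[BLOOM_BITS / 8]; } Bloom;
typedef struct { uint32_t join_key; int payload; } Row;

/* Two cheap integer hashes; illustrative only, not HAWQ's hash functions. */
static uint32_t hash1(uint32_t k) { return (k * 2654435761u) & (BLOOM_BITS - 1u); }
static uint32_t hash2(uint32_t k) { return ((k ^ (k >> 15)) * 40503u + 17u) & (BLOOM_BITS - 1u); }

static void set_bit(Bloom *b, uint32_t i) { b->bits[i / 8u] |= (uint8_t)(1u << (i % 8u)); }
static bool get_bit(const Bloom *b, uint32_t i) { return (b->bits[i / 8u] >> (i % 8u)) & 1u; }

void bloom_init(Bloom *b) { memset(b->bits, 0, sizeof b->bits); }

void bloom_add(Bloom *b, uint32_t key)
{
    set_bit(b, hash1(key));
    set_bit(b, hash2(key));
}

/* May return true for a key that was never added, never false for one that was. */
bool bloom_maybe_contains(const Bloom *b, uint32_t key)
{
    return get_bit(b, hash1(key)) && get_bit(b, hash2(key));
}

/*
 * Scan the outer rows and return how many are handed on to the join.
 * *fetch_passes counts column-fetch passes: one per row when the filter is
 * disabled (filter == NULL), otherwise one for the join key plus one more
 * for every row that survives the Bloom filter check.
 */
size_t scan_outer(const Row *rows, size_t nrows, const Bloom *filter,
                  size_t *fetch_passes)
{
    assert(rows != NULL || nrows == 0);
    assert(fetch_passes != NULL);

    size_t survivors = 0;
    *fetch_passes = 0;

    for (size_t i = 0; i < nrows; i++) {
        if (filter == NULL) {
            (*fetch_passes)++; /* single pass: key and payload together */
            survivors++;
            continue;
        }
        (*fetch_passes)++;     /* pass 1: join key only */
        if (!bloom_maybe_contains(filter, rows[i].join_key))
            continue;          /* rejected: remaining columns never fetched */
        (*fetch_passes)++;     /* pass 2: remaining columns */
        survivors++;
    }
    return survivors;
}
```

With the filter disabled, the sketch does exactly one fetch pass per row, so it never pays for a second loop that cannot reject anything.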

}

/* skip those attributes not in given list */
if (attsList != NIL && list_find_int(attsList, i) >= 0)
Contributor Author


Typo: the condition should be `< 0`.
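
A small sketch of why the comparison direction matters, assuming the lookup returns a negative value when the attribute is absent from the list (as the corrected condition implies). `find_int` and `skip_attribute` are hypothetical stand-ins over a plain array, not the HAWQ list API; a `NULL` array plays the role of `NIL`, meaning "no filter, keep every attribute".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the list lookup: index of value, or -1 if absent. */
static int find_int(const int *list, size_t len, int value)
{
    for (size_t i = 0; i < len; i++) {
        if (list[i] == value)
            return (int) i;
    }
    return -1;
}

/*
 * Returns true when attribute i should be skipped.
 * With ">= 0" the test would skip exactly the attributes that were asked
 * for; "< 0" skips the ones that are not in the given list.
 */
bool skip_attribute(const int *atts, size_t natts, int i)
{
    assert(atts != NULL || natts == 0);
    return atts != NULL && find_int(atts, natts, i) < 0;
}
```

For a list containing attributes 0 and 2, `skip_attribute` keeps 0 and 2 and skips 1; with no list at all, nothing is skipped.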

interma (Member) commented Oct 2, 2018

@kuien I ran a performance test on your PR and found two issues:

  1. Incorrect query result
  2. Performance regression

Details are below; please check the code. Thanks.

TPC-H 1 GB data on my Mac, master code:

tpch=# select count (*) from part, lineitem where p_partkey = l_partkey and p_brand = 'Brand#23' and p_container = 'MED BOX';
count
-------
  6088
(1 row)

Time: 3150.873 ms
tpch=# set hawq_hashjoin_bloomfilter to on;
SET
Time: 2.903 ms
tpch=# select count (*) from part, lineitem where p_partkey = l_partkey and p_brand = 'Brand#23' and p_container = 'MED BOX';
count
-------
  6088
(1 row)

Time: 1512.782 ms

Your code:

tpch=# select count (*) from part, lineitem where p_partkey = l_partkey and p_brand = 'Brand#23' and p_container = 'MED BOX';
 count
-------
  6088
(1 row)

Time: 49466.999 ms #<-- result ok, but bad performance
tpch=# set hawq_hashjoin_bloomfilter to on;
SET
Time: 13.106 ms
tpch=# select count (*) from part, lineitem where p_partkey = l_partkey and p_brand = 'Brand#23' and p_container = 'MED BOX';
 count
-------
     0 #<-- result error
(1 row)

Time: 1888.176 ms 

interma (Member) left a comment

Please fix these issues.

interma (Member) commented Oct 2, 2018

@kuien
By the way, if you also test on a Mac, you can generate the TPC-H data with my dbgen tools:
https://github.com/interma/misc/tree/master/hawq/tpch_mac
