Add a python implementation of TPCH query 9 #627
Conversation
When running this example I was getting errors like: […]

Changing […]
For what it's worth, I didn't run into this issue.
EDIT: Just kidding. Once I […]
When I run this q9 implementation at SF1K (parquet, floats not decimals, partitioned tables) on 1x H100 of an internal DGX H100 system, I get the following performance: […] @wence-, is this in line with expectations from your local testing, or is it unexpected?
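For readers unfamiliar with the query being benchmarked: TPC-H Q9 aggregates a "profit" amount, `l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity`, grouped by nation and order year. A minimal pure-Python sketch of that aggregation over toy rows (not the PR's implementation, which runs on parquet tables):

```python
from collections import defaultdict

# Toy rows: (nation, order year, extendedprice, discount, supplycost, quantity)
rows = [
    ("FRANCE", 1995, 1000.0, 0.10, 20.0, 10),
    ("FRANCE", 1995, 500.0, 0.00, 30.0, 5),
    ("KENYA", 1996, 800.0, 0.05, 10.0, 20),
]

profit: dict[tuple[str, int], float] = defaultdict(float)
for nation, year, price, disc, cost, qty in rows:
    # TPC-H Q9 amount expression: discounted revenue minus supply cost
    profit[(nation, year)] += price * (1 - disc) - cost * qty
```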
The stream changes are probably from https://github.com/rapidsai/rmm/pull/2110/files#diff-83f78faf97018ac3ad1e8b43d42eb56d0a6acc442b499a75fb04a7838301174eR127 (cc @nirandaperera). https://docs.python.org/3/reference/datamodel.html#object.__hash__ mentions that objects implementing `__eq__` without also defining `__hash__` have their `__hash__` set to `None`, making them unhashable.

I'm not sure what the best way to uniquely identify a stream is. I see https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html#group__CUDART__STREAM_1g5799ae8dd744e561dfdeda02c53e82df, but I don't know if that's an option for RMM. If we only deal with owning streams at the Python level, then I think that […]
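For context, the data-model rule referenced above can be demonstrated directly: a class that defines `__eq__` but not `__hash__` has its `__hash__` set to `None` at class creation, so its instances can't go in sets or dict keys. A minimal sketch (the `Stream` classes here are illustrative stand-ins, not RMM's actual types):

```python
class UnhashableStream:
    """Defines __eq__ without __hash__, so __hash__ becomes None."""

    def __init__(self, ptr: int) -> None:
        self.ptr = ptr

    def __eq__(self, other: object) -> bool:
        return isinstance(other, UnhashableStream) and self.ptr == other.ptr


class HashableStream(UnhashableStream):
    """Restores hashability by hashing the same value __eq__ compares."""

    def __hash__(self) -> int:
        return hash(self.ptr)


# Instances of the first class cannot be set members or dict keys.
try:
    {UnhashableStream(0xDEAD)}
except TypeError:
    pass  # "unhashable type: 'UnhashableStream'"

# Equal hashable streams hash equal, as the data model requires.
a, b = HashableStream(0xDEAD), HashableStream(0xDEAD)
assert a == b and hash(a) == hash(b)
```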
```python
orders_files = get_files(orders, parquet_suffix)
nation_files = get_files(nation, parquet_suffix)
nodes: list[CppNode | PyNode] = []
lineitem_ch = Channel[TableChunk]()
```
You're probably aware, but these `Channel()` calls need to change to `ctx.create_channel()` now that #631 is in. Just leaving a comment for viz.
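For illustration, the pattern this change moves to — channels constructed through the context rather than directly — might look roughly like the sketch below. The `Context`/`Channel` implementations here are hypothetical stand-ins, not the project's real API:

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Channel(Generic[T]):
    """Hypothetical stand-in for the real Channel type."""

    def __init__(self) -> None:
        self._items: list[T] = []


class Context:
    """Hypothetical context that owns channel creation."""

    def __init__(self) -> None:
        self._channels: list[Channel] = []

    def create_channel(self) -> Channel:
        ch: Channel = Channel()
        # The context keeps a handle on every channel it hands out,
        # which is one plausible reason to route creation through it.
        self._channels.append(ch)
        return ch


ctx = Context()
lineitem_ch = ctx.create_channel()  # instead of a bare Channel[...]() call
```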
This seems very slow. I will try on an H100 as well...