Description
SF1000 Status update
10 queries running on 4 drivers
+6 can run with 1 driver
+6 assuming we can fix errors, add spilling and match cudf-polars
=4.4x speedup
and then we can begin performance tuning
Action items
| Query | Issues | Status |
|---|---|---|
| Q4 | Segfault for 1 and 4 drivers: `*** Signal 11 (SIGSEGV) (0xf6e560010000) received by PID 22125 (pthread TID 0xf6f40d5cd840) (linux TID 22149) (code: address not mapped to object), stack trace: ***` | tbd |
| Q10 | Segfault for 1 driver, "nvCOMP crash" for 4 drivers | tbd |
| Q13 | See SF100 | |
| Q17 | OOM. See analysis below. May just need spilling | tbd |
| Q18 | OOM. Strangely, OOMs with 1 driver but passes with 4 drivers | |
| Q21 | Concatenate exceeds the row limit. May need a streaming join, perhaps for the anti or right semi (filter) join | tbd |

"nvCOMP crash" impacting Q3, 5, 8, 9, 10, 14, 20 on 4 drivers:

`Exceptions.h:66] Line: /workspace/velox/velox/exec/Driver.cpp:574, Function:operator(), Expression: Operator::getOutput failed for [operator: TableScan, plan node ID: 3]: CUDF failure at: /buildcache/release/_deps/cudf-src/cpp/src/io/parquet/reader_impl_chunking_utils.cu:627: Error during decompression, Source: RUNTIME, ErrorCode: INVALID_STATE`

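
For the Q21 row-limit failure, a rough illustration of why a single concatenate can fail: `cudf::size_type` is a 32-bit signed integer, so one column can hold at most 2^31 - 1 rows. The sketch below is not what Velox or cudf does today, and it is not the streaming join suggested in the table. It only shows one possible way to group input batches so that each concatenated chunk stays under the limit. The function name `plan_concat_groups` and its parameters are hypothetical.

```python
from typing import Iterable, List

# cudf::size_type is a 32-bit signed integer, so a single column
# can hold at most 2**31 - 1 rows.
CUDF_MAX_ROWS = 2**31 - 1


def plan_concat_groups(
    batch_rows: Iterable[int], max_rows: int = CUDF_MAX_ROWS
) -> List[List[int]]:
    """Greedily group batch indices so each group's total rows fit in max_rows.

    Hypothetical helper for illustration only: each returned group could be
    concatenated on its own instead of concatenating every batch at once.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_rows = 0
    for index, rows in enumerate(batch_rows):
        if rows > max_rows:
            raise ValueError(
                f"batch {index} alone has {rows} rows, above the limit {max_rows}"
            )
        # Close the current group before it would overflow the row limit.
        if current and current_rows + rows > max_rows:
            groups.append(current)
            current, current_rows = [], 0
        current.append(index)
        current_rows += rows
    if current:
        groups.append(current)
    return groups
```

Called with the per-batch row counts feeding the join build side, this returns lists of batch indices whose combined size is safe to concatenate. Whether the downstream join can consume several chunks instead of one table is exactly the open question behind the "streaming join" note.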
Q17 OOM
Steadily builds up tablescan-join-aggregate
cudf-polars solves this with spilling in UVM
https://drive.google.com/file/d/1EiYrMthoQHYWleLpNzHocFx8E5vCoSDt/view?usp=drive_link
Q18 OOM
Polars spills heavily
https://drive.google.com/file/d/1c5Ev4fI27yQur5aYlnxbUKuqaMmqipvZ/view?usp=drive_link

System information
GH200
https://github.com/rapidsai/velox-testing on main
Velox on mattgara/upstream-tpch-bench-fixes