
SF1000 Status update (Aug 28 2025) #44

@GregoryKimball


SF1000 Status update

logs and profiles


10 queries running on 4 drivers
+6 can run with 1 driver
+6 assuming we can fix errors, add spilling and match cudf-polars
=4.4x speedup

and then we can begin performance tuning

Action items

| Query | Issues | Status |
|---|---|---|
| Q4 | Segfault for 1 and 4 drivers: `*** Signal 11 (SIGSEGV) (0xf6e560010000) received by PID 22125 (pthread TID 0xf6f40d5cd840) (linux TID 22149) (code: address not mapped to object), stack trace: ***` | tbd |
| Q10 | Segfault for 1 driver, "nvCOMP crash" for 4 drivers | tbd |
| Q13 | See SF100 | |
| Q17 | OOM. See analysis below. Maybe just needs spilling | tbd |
| Q18 | OOM. Strangely OOMs with 1 driver but passes with 4 drivers | |
| Q21 | Concatenate exceeds row limit. Needs a streaming join? For anti or right semi (filter) perhaps | tbd |
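
For the Q21 row-limit failure, one way to picture a workaround is to concatenate in bounded batches instead of all at once. The sketch below is only an illustration: it assumes the limit comes from `cudf::size_type` being a 32-bit signed integer (at most 2^31 - 1 rows per column), and `plan_concat_batches` is a hypothetical helper, not a Velox or cudf API. It plans the batches only and does not implement a streaming join.

```python
from typing import Iterable, Iterator, List

# cudf::size_type is a 32-bit signed integer, so a single column
# can hold at most 2**31 - 1 rows.
CUDF_ROW_LIMIT = 2**31 - 1


def plan_concat_batches(
    chunk_rows: Iterable[int], row_limit: int = CUDF_ROW_LIMIT
) -> Iterator[List[int]]:
    """Group consecutive chunk indices so each group's total row count
    stays within row_limit. Hypothetical helper for illustration only."""
    batch: List[int] = []
    batch_rows = 0
    for idx, rows in enumerate(chunk_rows):
        if rows > row_limit:
            raise ValueError(f"chunk {idx} has {rows} rows, above the limit {row_limit}")
        # Close the current batch if adding this chunk would overflow it.
        if batch and batch_rows + rows > row_limit:
            yield batch
            batch, batch_rows = [], 0
        batch.append(idx)
        batch_rows += rows
    if batch:
        yield batch


# Three chunks of 1.5B, 1.0B and 0.5B rows cannot be concatenated in one call,
# but they fit in two batches that each stay under the limit.
batches = list(plan_concat_batches([1_500_000_000, 1_000_000_000, 500_000_000]))
```

Each batch could then be concatenated and probed against the join's build side separately, which is roughly what a streaming anti or semi join would need to do.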

"nvCOMP crash" impacting Q3, 5, 8, 9, 10, 14, 20 on 4 drivers:

Exceptions.h:66] Line: /workspace/velox/velox/exec/Driver.cpp:574, Function:operator(), Expression: Operator::getOutput failed for [operator: TableScan, plan node ID: 3]: CUDF failure at: /buildcache/release/_deps/cudf-src/cpp/src/io/parquet/reader_impl_chunking_utils.cu:627: Error during decompression, Source: RUNTIME, ErrorCode: INVALID_STATE

Q17 OOM


Memory use builds up steadily through tablescan-join-aggregate

cudf-polars solves this with spilling in UVM

https://drive.google.com/file/d/1EiYrMthoQHYWleLpNzHocFx8E5vCoSDt/view?usp=drive_link
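
To make the idea of spilling concrete, here is a toy model in plain Python. It is an assumption-laden sketch, not how cudf-polars or UVM works: with UVM the driver pages memory between device and host transparently, whereas this `SpillingPool` class (a hypothetical name) evicts least-recently-used buffers explicitly when a fixed device budget would be exceeded. Buffer names and sizes are made up for illustration.

```python
from collections import OrderedDict


class SpillingPool:
    """Toy buffer pool: tracks buffer sizes on a 'device' with a byte budget
    and spills least-recently-used buffers to 'host' when the budget is hit."""

    def __init__(self, device_budget: int) -> None:
        self.device_budget = device_budget
        self.device: "OrderedDict[str, int]" = OrderedDict()  # oldest first
        self.host: dict = {}

    def device_bytes(self) -> int:
        return sum(self.device.values())

    def _make_room(self, nbytes: int) -> None:
        # Evict oldest buffers until the new one fits in the budget.
        while self.device and self.device_bytes() + nbytes > self.device_budget:
            victim, size = self.device.popitem(last=False)
            self.host[victim] = size

    def put(self, name: str, nbytes: int) -> None:
        if nbytes > self.device_budget:
            raise MemoryError(f"{name} ({nbytes} B) exceeds the device budget")
        self._make_room(nbytes)
        self.device[name] = nbytes

    def get(self, name: str) -> int:
        # Bring a spilled buffer back to the device before use.
        if name in self.host:
            nbytes = self.host.pop(name)
            self._make_room(nbytes)
            self.device[name] = nbytes
        self.device.move_to_end(name)  # mark as most recently used
        return self.device[name]


pool = SpillingPool(device_budget=100)
for stage in ("scan", "join", "agg"):
    pool.put(stage, 40)  # the third put forces the oldest buffer to host
```

Without a budget check like this, the three buffers would simply accumulate to 120 bytes against a 100-byte budget, which is the steady build-up pattern that ends in an OOM.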


Q18 OOM

Polars spills heavily
https://drive.google.com/file/d/1c5Ev4fI27yQur5aYlnxbUKuqaMmqipvZ/view?usp=drive_link

System information

GH200
https://github.com/rapidsai/velox-testing on main
Velox on mattgara/upstream-tpch-bench-fixes
