cuDF AST Expression Parser Crashes on Nested DOUBLE Arithmetic #104

@patdevinwilson

Bug description

Location: cudf/cpp/src/ast/expression_parser.cpp:244 (25.10) and :233 (25.12)

Issue: Type inference in AST expression parser fails to properly handle nested arithmetic operations on DOUBLE types.

Why SF100 works but SF1000 fails (hypotheses, not yet confirmed):

  • SF1000 may produce a different query plan or more complex expressions
  • The AST parser may have a depth or complexity threshold that only SF1000 reaches
  • Data statistics at SF1000 may affect plan generation

Type Mismatch Details

The expression involves:

  • Operands: DOUBLE (l_extendedprice, l_discount, l_tax)
  • Literals: DOUBLE (1.0)
  • Operations: multiply, minus, plus
  • Result expected: DOUBLE

Every operand and literal is DOUBLE, yet the AST parser reports "non-matching operand types" for the nested expression.
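
To find which node triggers the check, it can help to walk the logged expression tree and compare operand types at each binary node. The following is a minimal Python sketch of that idea, not cuDF's actual parser logic. `Node`, `infer_type`, and `build_expr4` are hypothetical names, and the `FLOAT` type on `l_discount` is an assumption made only to show what a mismatch report would look like. It models arithmetic nodes only and assumes each returns its operands' type.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical expression node: a column, a literal, or a binary call."""
    name: str
    dtype: Optional[str] = None  # set on leaves (columns and literals) only
    children: List["Node"] = field(default_factory=list)


def infer_type(node: Node, path: str = "") -> str:
    """Return the node's result type; raise at the first mismatched operands."""
    here = f"{path}/{node.name}" if path else node.name
    if not node.children:
        return node.dtype
    child_types = [infer_type(child, here) for child in node.children]
    if len(set(child_types)) != 1:
        raise TypeError(f"non-matching operand types at {here}: {child_types}")
    # Assumption: an arithmetic node yields the same type as its operands.
    return child_types[0]


def col(name: str, dtype: str = "DOUBLE") -> Node:
    return Node(name, dtype)


def lit(dtype: str = "DOUBLE") -> Node:
    return Node("literal", dtype)


def call(name: str, left: Node, right: Node) -> Node:
    return Node(name, children=[left, right])


def build_expr4(discount_type: str = "DOUBLE") -> Node:
    """Mirror expr[4] from the log: price * (1 - discount) * (1 + tax)."""
    net = call("multiply", col("l_extendedprice"),
               call("minus", lit(), col("l_discount", discount_type)))
    return call("multiply", net, call("plus", lit(), col("l_tax")))


if __name__ == "__main__":
    # All-DOUBLE inputs, as the issue describes, type-check cleanly here.
    print(infer_type(build_expr4()))
    # A hypothetical non-DOUBLE column shows where a mismatch would surface.
    try:
        infer_type(build_expr4(discount_type="FLOAT"))
    except TypeError as err:
        print(err)
```

If the all-DOUBLE tree passes a check like this while cuDF still rejects it, the column or literal types that reach the AST at SF1000 probably differ from what the log prints. Dumping the actual cuDF data type of each leaf just before the AST is built would be a reasonable next step.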

Impact

Severity: Critical

  • Crashes query execution - no graceful degradation
  • Bypasses the fallback mechanism - the error occurs before CPU fallback can engage
  • Blocks production workloads - SF1000 cannot run with cuDF enabled
  • Affects multiple queries - not limited to Q1

Affected Queries

TPC-H queries with nested arithmetic on DOUBLE columns:

  • Q1 (revenue calculations)
  • Q6 (revenue with filters)
  • Others with similar expression patterns

System information

NVIDIA B200 GPU system with Presto native worker

Relevant logs

presto.default.lte
  l_shipdate
  literal
presto.default.multiply
  l_extendedprice
  presto.default.minus
    literal
    l_discount
presto.default.multiply
  presto.default.multiply
    l_extendedprice
    presto.default.minus
      literal
      l_discount
  presto.default.plus
    literal
    l_tax
I20251218 19:24:06.303020   480 CudfFilterProject.cpp:171] expr[0] presto.default.lte(l_shipdate, 1998-09-02:DATE)
I20251218 19:24:06.303360   480 CudfFilterProject.cpp:171] expr[2] presto.default.multiply(l_extendedprice, presto.default.minus(1:DOUBLE, l_discount))
I20251218 19:24:06.303369   480 CudfFilterProject.cpp:171] expr[4] presto.default.multiply(presto.default.multiply(l_extendedprice, presto.default.minus(1:DOUBLE, l_discount)), presto.default.plus(1:DOUBLE, l_tax))
E20251218 19:24:20.422995   480 Exceptions.h:53] Line: /presto_native_staging/presto/velox/velox/exec/Driver.cpp:577, Function:operator(), Expression:  Operator::getOutput failed for [operator: CudfFilterProject, plan node ID: 2]: CUDF failure at:/presto_gpu_all_source_v2/_deps/cudf-src/cpp/src/ast/expression_parser.cpp:244: An AST expression was provided non-matching operand types., Source: RUNTIME, ErrorCode: INVALID_STATE
E20251218 19:24:20.422998   479 Exceptions.h:53] Line: /presto_native_staging/presto/velox/velox/exec/Driver.cpp:577, Function:operator(), Expression:  Operator::getOutput failed for [operator: CudfFilterProject, plan node ID: 2]: CUDF failure at:/presto_gpu_all_source_v2/_deps/cudf-src/cpp/src/ast/expression_parser.cpp:244: An AST expression was provided non-matching operand types., Source: RUNTIME, ErrorCode: INVALID_STATE
I20251218 19:24:20.461575   477 TaskManager.cpp:873] Deleting task 20251218_192406_00002_w2d3j.0.0.0.0
I20251218 19:24:20.884642   488 TaskManager.cpp:778] Starting task 20251218_192420_00003_w2d3j.0.0.0.0 with 2 max drivers.
I20251218 19:24:20.884697   488 Task.cpp:1156] initializing OutputBufferManager with 1 partitions and 1 drivers
I20251218 19:24:20.884764   488 ToCudf.cpp:87] Operators before adapting for cuDF: count [3]
I20251218 19:24:20.884768   488 ToCudf.cpp:90]   Operator: ID 0: LocalExchange(0)
I20251218 19:24:20.884773   488 ToCudf.cpp:90]   Operator: ID 1: TopN[1997] 1
I20251218 19:24:20.884776   488 ToCudf.cpp:90]   Operator: ID 2: PartitionedOutput[43] 2
I20251218 19:24:20.884780   488 ToCudf.cpp:93] allowCpuFallback = 1
I20251218 19:24:20.884786   488 ToCudf.cpp:475] Operator: ID 0: LocalExchange(0), keepOperator = 1, replaceOp.size() = 0
I20251218 19:24:20.884790   488 ToCudf.cpp:506] GpuReplacedOperator = 0, GpuRetainedOperator = 1
--
I20251218 19:25:39.101271   532 CudfHashJoin.cpp:194] Build batch 283: number of rows: 100198
I20251218 19:25:39.101281   532 CudfHashJoin.cpp:194] Build batch 284: number of rows: 99775
I20251218 19:25:39.101284   532 CudfHashJoin.cpp:194] Build batch 285: number of rows: 100261
I20251218 19:25:39.101289   532 CudfHashJoin.cpp:194] Build batch 286: number of rows: 99338
I20251218 19:25:39.101291   532 CudfHashJoin.cpp:194] Build batch 287: number of rows: 99939
I20251218 19:25:39.101294   532 CudfHashJoin.cpp:194] Build batch 288: number of rows: 100136
I20251218 19:25:39.101297   532 CudfHashJoin.cpp:194] Build batch 289: number of rows: 100127
I20251218 19:25:39.101301   532 CudfHashJoin.cpp:194] Build batch 290: number of rows: 99830
I20251218 19:25:39.101305   532 CudfHashJoin.cpp:194] Build batch 291: number of rows: 99966
I20251218 19:25:39.101307   532 CudfHashJoin.cpp:194] Build batch 292: number of rows: 100315
I20251218 19:25:39.101310   532 CudfHashJoin.cpp:194] Build batch 293: number of rows: 100009
I20251218 19:25:39.101315   532 CudfHashJoin.cpp:194] Build batch 294: number of rows: 99901
I20251218 19:25:39.101317   532 CudfHashJoin.cpp:194] Build batch 295: number of rows: 99996
I20251218 19:25:39.101320   532 CudfHashJoin.cpp:194] Build batch 296: number of rows: 99297
I20251218 19:25:39.101323   532 CudfHashJoin.cpp:194] Build batch 297: number of rows: 100529
I20251218 19:25:39.101327   532 CudfHashJoin.cpp:194] Build batch 298: number of rows: 99965
I20251218 19:25:39.101330   532 CudfHashJoin.cpp:194] Build batch 299: number of rows: 100243
I20251218 19:25:39.102464   532 CudfHashJoin.cpp:212] Build table number of columns: 1
I20251218 19:25:39.102473   532 CudfHashJoin.cpp:215] Build table 0: number of rows: 29998152
I20251218 19:25:39.103919   532 CudfHashJoin.cpp:249] hashObject 0 is not nullptr 0x626058003400
E20251218 19:25:48.652027   532 Exceptions.h:53] Line: /presto_native_staging/presto/velox/velox/exec/Driver.cpp:577, Function:operator(), Expression:  Operator::getOutput failed for [operator: CudfFilterProject, plan node ID: 465]: CUDF failure at:/presto_gpu_all_source_v2/_deps/cudf-src/cpp/src/ast/expression_parser.cpp:244: An AST expression was provided non-matching operand types., Source: RUNTIME, ErrorCode: INVALID_STATE
I20251218 19:25:48.689563   487 TaskManager.cpp:873] Deleting task 20251218_192536_00006_w2d3j.0.0.0.0
I20251218 19:25:48.775806   536 TaskManager.cpp:778] Starting task 20251218_192548_00007_w2d3j.0.0.0.0 with 2 max drivers.
I20251218 19:25:48.775846   536 Task.cpp:1156] initializing OutputBufferManager with 1 partitions and 1 drivers
I20251218 19:25:48.775883   536 ToCudf.cpp:87] Operators before adapting for cuDF: count [2]
I20251218 19:25:48.775887   536 ToCudf.cpp:90]   Operator: ID 0: LocalMerge[984] 0
I20251218 19:25:48.775892   536 ToCudf.cpp:90]   Operator: ID 1: PartitionedOutput[26] 1
