[GLUTEN-10134][VL] Add ANSI mode support for cast string to boolean #11437

Open
malinjawi wants to merge 2 commits into apache:main from malinjawi:feature/ansi-cast-string-to-boolean

@malinjawi
Contributor

What changes are proposed in this pull request?

This PR implements ANSI-compliant string to boolean casting for the Velox backend, addressing part of issue #10134 (ANSI mode support).

Key Changes:

  1. C++ Implementation: Added CastStringToBooleanAnsi.h with a custom Velox function that implements Spark's ANSI cast semantics for string-to-boolean conversion
  2. Function Registration: Registered spark_cast_string_to_boolean_ansi function in Velox's function registry
  3. Scala Integration: Updated CastTransformer to detect ANSI mode and route string-to-boolean casts to the custom function
  4. Literal Optimization: Added compile-time evaluation for literal casts to improve performance
  5. Test Coverage: Added comprehensive test suites for both ANSI and non-ANSI modes

Behavior:

  • In ANSI mode, accepts case-insensitive: t, true, y, yes, 1 (true) and f, false, n, no, 0 (false)
  • Invalid inputs fail via VELOX_USER_FAIL, which raises a Velox user error with a descriptive message
  • Whitespace is trimmed before validation
  • Matches Spark's ANSI cast behavior exactly
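
For readers who want to see the rules above in one place, here is a minimal standalone sketch. It is not the code from CastStringToBooleanAnsi.h: the function name and error text are hypothetical, `std::invalid_argument` stands in for VELOX_USER_FAIL, and trimming only ASCII whitespace is an assumption about the exact trim set.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical stand-in for the ANSI cast logic; the name, signature, and
// error text are illustrative only.
inline bool castStringToBooleanAnsi(std::string_view input) {
  static constexpr std::array<std::string_view, 5> kTrue = {"t", "true", "y", "yes", "1"};
  static constexpr std::array<std::string_view, 5> kFalse = {"f", "false", "n", "no", "0"};

  // Trim leading and trailing whitespace (ASCII only here, which is an assumption).
  auto isSpace = [](unsigned char c) { return std::isspace(c) != 0; };
  std::string_view trimmed = input;
  while (!trimmed.empty() && isSpace(trimmed.front())) {
    trimmed.remove_prefix(1);
  }
  while (!trimmed.empty() && isSpace(trimmed.back())) {
    trimmed.remove_suffix(1);
  }

  // Lower-case a copy so the comparison is case-insensitive.
  std::string lowered(trimmed);
  std::transform(lowered.begin(), lowered.end(), lowered.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });

  const std::string_view key(lowered);
  if (std::find(kTrue.begin(), kTrue.end(), key) != kTrue.end()) {
    return true;
  }
  if (std::find(kFalse.begin(), kFalse.end(), key) != kFalse.end()) {
    return false;
  }

  // ANSI mode: an unrecognized spelling is an error, never a null result.
  throw std::invalid_argument("Cannot cast '" + std::string(input) + "' to BOOLEAN");
}
```

With this sketch, `castStringToBooleanAnsi("  TRUE ")` yields true, while `castStringToBooleanAnsi("maybe")` throws instead of producing a null.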

Fixes #10134 (partial - cast string to boolean component)

How was this patch tested?

Test Coverage:

  1. ANSI Mode Tests (CastStringToBooleanAnsiValidateSuite.scala):

    • Valid true/false string variations (case-insensitive)
    • Invalid strings that should throw exceptions
    • Null handling
    • Mixed valid/invalid values
    • WHERE clause filtering
    • Whitespace handling
  2. Non-ANSI Mode Tests (CastStringToBooleanValidateSuite.scala):

    • Valid string conversions
    • Invalid strings returning null (non-ANSI behavior)
    • Mixed valid/invalid/null values
    • Empty and whitespace strings
    • All valid boolean string variations

Validation:

  • All tests compare Gluten results against vanilla Spark to ensure behavioral parity
  • Tests verify that the custom ANSI function is used in the execution plan
  • Tests confirm proper fallback behavior when ANSI mode is disabled
  • Error handling matches Spark's exception messages and behavior

Manual Testing:

  • Tested with spark.sql.ansi.enabled=true and false
  • Verified execution plans use spark_cast_string_to_boolean_ansi in ANSI mode
  • Confirmed standard Velox cast is used in non-ANSI mode

Implements ANSI-compliant string to boolean casting that throws
exceptions for invalid inputs instead of returning null.

Changes:
- Add CastStringToBooleanAnsi.h with ANSI-compliant cast logic
- Register spark_cast_string_to_boolean_ansi function in Velox
- Update CastTransformer to route ANSI casts to custom function
- Add literal evaluation optimization for compile-time casts
- Include validation test suites for both ANSI and non-ANSI modes

Contributes to issue apache#10134 (ANSI mode support)
@github-actions github-actions bot added the CORE and VELOX labels Jan 17, 2026
@github-actions

Run Gluten Clickhouse CI on x86

@malinjawi malinjawi mentioned this pull request Jan 18, 2026
24 tasks
@PHILO-HE
Member

@malinjawi, thanks for drafting this PR. We generally implement Spark functions in Velox, not in Gluten. I suggest first evaluating Velox's existing cast implementation to see whether we can just add an if/else branch for ANSI support, or whether a separate implementation is needed (if most of the code is not shared).

Velox is now aware of the Spark ANSI setting. So you may get the ANSI enabling state from Velox config to implement some branching logic if needed.

With the cast implemented in Velox, I assume most test cases proposed for Gluten can be moved to Velox.
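
As a rough illustration of the if/else branching suggested above, the sketch below shares one parsing core between both modes and branches only on how a failed parse is reported. Everything here is hypothetical: `parseBoolean` and `castToBoolean` are made-up names, `ansiEnabled` stands in for whatever flag the Velox config exposes (the real accessor is not shown in this thread), the input is assumed to be already trimmed and lower-cased, and `std::invalid_argument` stands in for a Velox user error.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical shared core: maps an already trimmed, lower-cased string to a
// boolean, or std::nullopt when the spelling is not recognized.
inline std::optional<bool> parseBoolean(std::string_view s) {
  if (s == "t" || s == "true" || s == "y" || s == "yes" || s == "1") {
    return true;
  }
  if (s == "f" || s == "false" || s == "n" || s == "no" || s == "0") {
    return false;
  }
  return std::nullopt;
}

// Hypothetical entry point. `ansiEnabled` is an assumed stand-in for the
// Spark ANSI setting as seen by Velox.
inline std::optional<bool> castToBoolean(std::string_view s, bool ansiEnabled) {
  const std::optional<bool> result = parseBoolean(s);
  if (!result.has_value() && ansiEnabled) {
    // ANSI mode: report invalid input as an error.
    throw std::invalid_argument("Cannot cast '" + std::string(s) + "' to BOOLEAN");
  }
  // Non-ANSI mode: std::nullopt models a SQL NULL result.
  return result;
}
```

If most of the logic can be shared this way, a single branch may be enough; if the two modes diverge further, a separate implementation would be the other option mentioned above.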

@zzcclp
Contributor

zzcclp commented Jan 19, 2026

Run Gluten Clickhouse CI on x86

@malinjawi
Contributor Author

Thanks for the reply @PHILO-HE
I’ll move the implementation into Velox and evaluate the existing cast path there, using the Spark ANSI setting exposed via Velox config.

Just to confirm from a Gluten integration perspective: once the cast behavior is implemented and tested in Velox, are there any additional changes or validation needed in Gluten itself (e.g., wiring, config propagation, or integration tests), or should the Velox src/test coverage be sufficient?

I want to make sure I’m aligning with the expected ownership and test boundaries going forward.

@malinjawi
Contributor Author

@PHILO-HE Thanks again for the help. I have opened a Velox PR with the fix: facebookincubator/velox#16059

Let me know whether I should close this PR or whether anything else is needed on the Gluten side.

@marin-ma
Contributor

You can add GlutenCastWithAnsiOnSuite by following the same approach used to add GlutenCastWithAnsiOffSuite, and only include the test cases related to "cast from string to boolean" in VeloxTestSettings.

@PHILO-HE
Member

@malinjawi, Gluten has a test framework that allows importing Spark unit tests for integration validation. We have imported many Spark test suites, such as GlutenCastWithAnsiOffSuite, as mentioned by @marin-ma. You can add or enable the related Spark test suites in this PR after the Velox PR is merged.

@github-actions

github-actions bot commented Mar 6, 2026

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale label Mar 6, 2026