[AutoSparkUT] Fix GpuScalar to preserve -0.0 for float/double (issue #14116) #14400
wjxiz1992 wants to merge 2 commits into NVIDIA:main
Conversation
…VIDIA#14116)

cuDF's `Scalar.fromDouble(-0.0)` normalizes -0.0 to 0.0, losing the sign bit. This caused GPU scalar subqueries to return 0.0 where the CPU correctly returns -0.0, violating GPU-CPU parity.

Root cause: the JNI path `Scalar.fromDouble` -> `makeFloat64Scalar` drops the IEEE 754 sign bit of negative zero during scalar creation.

Fix: in `GpuScalar.from()`, create float/double scalars via a 1-element `ColumnVector` + `getScalarElement(0)` instead of `Scalar.fromDouble`/`fromFloat`. The column-based path preserves the exact bit pattern.

This re-enables the previously excluded test "normalize special floating numbers in subquery" in `RapidsSQLQuerySuite`.

Closes NVIDIA#14116

Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor
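For context on why a lost sign bit is observable at all: the two zeros compare equal, but they produce different downstream results. A minimal JVM sketch (Java rather than Scala, since the relevant APIs live on `java.lang.Double`; this only illustrates the IEEE 754 behavior, not the cuDF path):

```java
public class NegativeZeroObservable {
    public static void main(String[] args) {
        double neg = -0.0;
        double pos = 0.0;

        // Division exposes the sign bit: the reciprocal of each zero is an
        // infinity of matching sign, so normalizing -0.0 to 0.0 changes results.
        System.out.println(1.0 / neg); // -Infinity
        System.out.println(1.0 / pos); // Infinity

        // The raw IEEE 754 encodings differ only in the sign bit.
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(neg))); // 8000000000000000
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(pos))); // 0
    }
}
```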
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["GpuScalar.from(v, dataType)"] --> B{dataType}
B --> |DoubleType| C{v type?}
B --> |FloatType| G{v type?}
C --> |"Double d"| D{"is negative zero?"}
C --> |"Float f"| E["d = f.toDouble"] --> D
D --> |yes| F["ColumnVector.fromDoubles(d) then getScalarElement(0) - preserves sign bit"]
D --> |no| N["Scalar.fromDouble(d) - fast path"]
G --> |"Float f"| H{"is negative zero?"}
H --> |yes| I["ColumnVector.fromFloats(f) then getScalarElement(0) - preserves sign bit"]
H --> |no| J["Scalar.fromFloat(f) - fast path"]
F --> K[Return Scalar]
N --> K
I --> K
J --> K
```
Pull request overview
Fixes GPU/CPU parity for scalar subquery results involving signed zero by ensuring GpuScalar.from preserves the IEEE-754 -0.0 bit pattern for float/double values, and re-enables the previously excluded Spark-derived test that covers this case.
Changes:
- Update `GpuScalar.from` to create float/double cuDF scalars via a 1-element `ColumnVector` + `getScalarElement(0)` to preserve `-0.0`.
- Remove the exclusion for "normalize special floating numbers in subquery" from the Spark 3.3.0 test settings.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sql-plugin/src/main/scala/com/nvidia/spark/rapids/literals.scala | Changes the float/double scalar creation path to preserve the -0.0 bit pattern. |
| tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala | Re-enables the previously excluded subquery normalization test. |
```scala
case d: Double =>
  // cuDF Scalar.fromDouble normalizes -0.0 to 0.0 (see #14116).
  // Create via a 1-element column to preserve the exact bit pattern.
  withResource(ColumnVector.fromDoubles(d)) { cv =>
    cv.getScalarElement(0)
  }
case f: Float =>
  withResource(ColumnVector.fromDoubles(f.toDouble)) { cv =>
    cv.getScalarElement(0)
  }
case _ => throw new IllegalArgumentException(s"'$v: ${v.getClass}' is not supported" +
  s" for DoubleType, expecting Double or Float.")
```
Creating float/double scalars by materializing a 1-element `ColumnVector` will allocate/copy device memory on every scalar creation, which can be a noticeable performance regression vs `Scalar.fromDouble`/`fromFloat` (`GpuScalar.from` is used in many code paths). Consider using the `ColumnVector` + `getScalarElement` workaround only for the specific problematic value (signed zero), and keep the fast `Scalar.fromDouble` path for all other doubles/floats.
Suggested change (keep the fast path for every value except signed zero):

```scala
case d: Double =>
  // cuDF Scalar.fromDouble normalizes -0.0 to 0.0 (see #14116).
  // Use a 1-element column only for negative zero to preserve the exact bit pattern.
  if (d == 0.0 && JDouble.doubleToRawLongBits(d) == JDouble.doubleToRawLongBits(-0.0d)) {
    withResource(ColumnVector.fromDoubles(d)) { cv =>
      cv.getScalarElement(0)
    }
  } else {
    Scalar.fromDouble(d)
  }
case f: Float =>
  // Preserve negative zero for floats as well.
  if (f == 0.0f && JFloat.floatToRawIntBits(f) == JFloat.floatToRawIntBits(-0.0f)) {
    withResource(ColumnVector.fromDoubles(f.toDouble)) { cv =>
      cv.getScalarElement(0)
    }
  } else {
    Scalar.fromDouble(f.toDouble)
  }
case _ => throw new IllegalArgumentException(s"'$v: ${v.getClass}' is not supported" +
  s" for DoubleType, expecting Double or Float.")
```
```scala
// Create via a 1-element column to preserve the exact bit pattern.
withResource(ColumnVector.fromFloats(f)) { cv =>
  cv.getScalarElement(0)
```
Same perf concern as the DoubleType branch: using ColumnVector.fromFloats(...).getScalarElement(0) for every float scalar creation adds extra allocations/copies. If the only correctness issue is preserving -0.0f, consider detecting signed zero (via raw bits) and only using the column-based workaround in that case, otherwise use Scalar.fromFloat.
Suggested change:

```scala
// For -0.0f, create via a 1-element column to preserve the exact bit pattern.
if (JFloat.floatToRawIntBits(f) == JFloat.floatToRawIntBits(-0.0f)) {
  withResource(ColumnVector.fromFloats(f)) { cv =>
    cv.getScalarElement(0)
  }
} else {
  Scalar.fromFloat(f)
}
```
revans2
left a comment
Why? I get that Spark has a test for this, but what value is there in keeping -0.0 instead of normalizing it to 0.0? IEEE treats them both as the same, and Spark has had bugs where -0.0 can cause issues. Does creating a column so that we can pull a scalar out of it cost any performance? It will cost some memory at least. Do we know where cuDF is normalizing this? I just want these questions answered before we put in a change like this.
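On the reviewer's point that IEEE treats the two zeros as the same: they do compare equal with `==`, yet the JVM can still distinguish them, which is what the Spark test exercises. A minimal sketch of the distinction (Java rather than Scala, since the relevant APIs live on `java.lang.Double`):

```java
public class TwoZeros {
    public static void main(String[] args) {
        // Equal under IEEE 754 comparison semantics.
        System.out.println(-0.0 == 0.0); // true

        // But distinguishable by the total ordering and by raw bits, which is
        // how a normalized scalar becomes observable in query results.
        System.out.println(Double.compare(-0.0, 0.0) < 0); // true (-0.0 sorts first)
        System.out.println(
            Double.doubleToRawLongBits(-0.0) == Double.doubleToRawLongBits(0.0)); // false
    }
}
```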
Address review feedback: instead of routing all float/double scalar creation through ColumnVector (which allocates device memory), detect -0.0 via raw bit comparison and only use the slow path for that specific value. All other values continue to use the fast Scalar.fromDouble / Scalar.fromFloat path, making the common-case cost zero.

Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor
@revans2 Let me add more background on the UT fix work:
The debug code used in the suite was:

Test output was:
A more dedicated test:
Repro: I didn't file an issue to cuDF as this
@wjxiz1992 I think you might have misunderstood my comments. -0.0 vs 0.0 is a very minor thing; it feels very inconsequential for a scalar value to be different. I get that we want to match all of Spark's unit tests, but if it is going to make the code less maintainable and possibly slower, then we need to think about the cost of being 100% compatible with Spark and decide if it is worth it.

We also need to think about the change we are making and ask ourselves if this is the right place to make it. Here we are making multiple tiny memory allocations and copies so that we can preserve the -0.0. That is not worth it to me, which is why I asked where the error came from. We are not explicitly normalizing -0.0 to 0.0 in the code you pointed to, so I want to understand if there is something we can do there that would let us fix this in a much more efficient way. I also want to understand who added the test and why they added it. What possible case exists where we need to preserve this, especially when Spark strips it out in so many situations?

For me, I am +1 if we can do this in a way that does not cause any more GPU memory allocations or GPU data movement, or make the code more difficult to work with. I am -1 in all other cases unless we can prove that this is critical to some real-world situation.
Summary
- cuDF's `Scalar.fromDouble(-0.0)` normalizes `-0.0` to `0.0`, losing the IEEE 754 sign bit. This caused scalar subquery results to return `0.0` where the CPU returns `-0.0`, violating GPU-CPU parity.
- In `GpuScalar.from()`, create float/double scalars via a 1-element `ColumnVector` + `getScalarElement(0)` instead of `Scalar.fromDouble`/`Scalar.fromFloat`. The column-based path preserves the exact bit pattern.
- Remove the `.exclude()` for "normalize special floating numbers in subquery" in `RapidsSQLQuerySuite`.

Detail
The `GpuScalarSubquery` path extracts a JVM `Double` from the inner query result, then creates a `GpuScalar` via `Scalar.fromDouble(value)`. cuDF's JNI layer (`makeFloat64Scalar`) normalizes `-0.0` to `0.0` during scalar creation. This was confirmed by direct cuDF API testing.

The fix replaces `Scalar.fromDouble(d)` with `ColumnVector.fromDoubles(d).getScalarElement(0)`, which preserves the bit pattern through the host-to-device column creation path.

RAPIDS test to Spark original mapping
| RAPIDS test | Spark original test | Spark source file |
|---|---|---|
| normalize special floating numbers in subquery (inherited) | normalize special floating numbers in subquery | sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala |

Test plan
`mvn package -pl tests -am -Dbuildver=330 -DwildcardSuites=RapidsSQLQuerySuite` -- succeeded 216, failed 0, ignored 17 (previously: succeeded 215, ignored 18).

Closes #14116
Made with Cursor