
Support from_protobuf expression #14354

Draft
thirtiseven wants to merge 35 commits into NVIDIA:main from thirtiseven:from_protobuf_nested

Conversation

@thirtiseven (Collaborator)

Fixes #14069.

Description

This is a large PR that supports a (big) subset of the from_protobuf expression.

I will add documentation, performance numbers, and other information to this PR very soon.

I expect this PR to be split into smaller ones that will be merged over time.

Checklists

  • This PR has added documentation for new or modified features or behaviors.
  • This PR has added new tests or modified existing tests to cover new code paths.
    (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
  • Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven thirtiseven self-assigned this Mar 3, 2026
@thirtiseven (Collaborator Author)

@greptileai full review

@greptile-apps (Contributor)

greptile-apps bot commented Mar 4, 2026

Greptile Summary

This PR adds GPU acceleration for Spark's from_protobuf expression, implementing a complete decode pipeline: reflection-based proto descriptor analysis, flattened schema construction, schema pruning (only decode fields referenced downstream), ordinal remapping into the pruned output, and JNI dispatch via Protobuf.decodeToStruct. It introduces five new Scala files, a large Python integration test suite, and test infra for automatically downloading spark-protobuf JARs.

The prior review rounds addressed an extensive set of critical bugs (proto3 acceptance on reflection failure, willNotWorkOnGpu fallbacks, hasDefaultValue flag using wrong variable, shim JSON header gaps, BinaryType default value, reference-equality issues in schema deduplication, and many more). The current revision is substantially cleaner.

Key remaining items:

  • invokeBuildDescriptor retry is fragile: The Spark 3.5+ compatibility retry in SparkProtobufCompat only catches ClassCastException/MatchError from the InvocationTargetException. Other runtime exceptions that could arise from the binary-vs-string payload mismatch (e.g., IllegalArgumentException, InvalidProtocolBufferException) escape unhandled and would fail the query instead of falling back to CPU.
  • nestedMsgDesc parameter semantics: In the recursive struct traversal (addFieldWithChildren / addChildFieldsFromStruct), nestedMsgDesc carries the descriptor of the containing message (not the current field's own message), which is counter-intuitive and could cause maintenance errors. A rename and clarifying comment would help.
  • extractFieldInfo loses primary unsupported reason: When checkFieldSupport flags a type-mismatch and defaultValueResult independently returns Left, the actionable type-mismatch message is silently replaced by the default-value reflection error, making diagnostics harder.
  • ENABLE_PROTOBUF_BATCH_MERGE_AFTER_PROJECT defaults to false: The post-project coalesce optimization is permanently disabled until users explicitly flip an internal flag, with no log-level indication that it is off.
  • PR checklists are open: Documentation, performance numbers, and test coverage descriptions are all unchecked. The PR description itself notes this work is intended to be split into smaller pieces before merge.

Confidence Score: 2/5

  • Not yet safe to merge: all three PR checklist items (docs, tests, perf) are unchecked, and the PR description explicitly states it will be split into smaller PRs before landing.
  • The implementation is architecturally sound and a large number of prior critical bugs have been addressed in review iterations. However, the PR is self-described as incomplete (documentation, performance data, and test-coverage descriptions are all TODO), and one logic-level issue remains in the Spark 3.5+ descriptor retry path that could cause queries to fail rather than fall back to CPU. The post-project coalesce optimization is also silently disabled by default. Given the stated intent to split this into smaller PRs and the incomplete checklist, a score of 2 reflects that more work is needed before this is merge-ready.
  • SparkProtobufCompat.scala (retry exception coverage), ProtobufExprShims.scala (recursive struct traversal parameter naming and analyzeRequiredFields guard logic), and basicPhysicalOperators.scala (post-project coalesce default).

Important Files Changed

Filename Overview
sql-plugin/src/main/spark340/scala/com/nvidia/spark/rapids/shims/ProtobufExprShims.scala Core GPU tagging logic for from_protobuf: handles schema analysis, field pruning, ordinal remapping, and flat schema construction. Previous review threads addressed many critical bugs (unsupported field handling, proto3 rejection, willNotWorkOnGpu fallbacks). Remaining concerns: confusingly named nestedMsgDesc parameter in recursive struct traversal, and redundant collectedExprs.isEmpty guard in analyzeRequiredFields.
sql-plugin/src/main/spark340/scala/com/nvidia/spark/rapids/shims/SparkProtobufCompat.scala Reflection-based compatibility shim for Spark's spark-protobuf module. Handles Spark 3.4 vs 3.5+ API differences, proto syntax detection, default-value extraction, and descriptor resolution. The Spark 3.5+ retry in invokeBuildDescriptor only catches ClassCastException and MatchError — other runtime exceptions from the binary-descriptor mismatch could escape and fail the query instead of falling back to CPU.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuFromProtobuf.scala GPU expression that drives JNI-level protobuf decoding. Correctly overrides equals/hashCode using java.util.Arrays for array fields, adds a safety-net logging catch for unexpected CudfException in PERMISSIVE mode, and documents that ProtobufSchemaDescriptor is a pure-Java holder requiring no explicit close.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/protobuf/ProtobufSchemaExtractor.scala Field support analysis and wire-type resolution. analyzeAllFields correctly records reflection failures as isSupported = false rather than returning Left immediately, enabling pruning of unreachable fields. Minor issue: when both checkFieldSupport and defaultValueResult produce errors, the primary type-mismatch reason is silently dropped in favour of the default-value reflection error.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/protobuf/ProtobufSchemaValidator.scala Flat-schema construction and default-value encoding. encodeDefaultValue now returns Either instead of throwing, propagating type mismatches as CPU-fallback signals. Enum, binary, string, and numeric default values are all handled correctly.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/protobuf/ProtobufSchemaModel.scala Data model classes for the protobuf schema pipeline. DescriptorBytes.equals/hashCode correctly use java.util.Arrays for content-based comparison, fixing the sameDecodeSemantics false-negative issue from the prior thread.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/basicPhysicalOperators.scala Adds post-project coalesce logic for protobuf-projecting GpuProjectExec and GpuProjectAstExec. The feature is gated by ENABLE_PROTOBUF_BATCH_MERGE_AFTER_PROJECT which defaults to false (internal flag), meaning the optimization is silently disabled in production by default.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/complexTypeExtractors.scala Adds GpuGetStructFieldMeta and GpuGetArrayStructFieldsMeta that read PRUNED_ORDINAL_TAG to remap field ordinals into the pruned decoded schema. GpuGetArrayStructFieldsMeta now correctly derives effectiveNumFields from the post-pruning child type.
integration_tests/run_pyspark_from_build.sh Adds automatic download of spark-protobuf and protobuf-java JARs at test time. Version detection auto-reads the bundled Spark JAR, with a per-version fallback table. Previous issues (curl --fail, leading-space classpath, quoting) have been addressed.
integration_tests/src/main/python/spark_init_internal.py Adds _add_driver_classpath helper that merges new JARs into the existing --driver-class-path in PYSPARK_SUBMIT_ARGS. Previous issues (early return, unescaped regex replacement, comma split) are resolved. re is already imported at the top of the file.
integration_tests/src/main/python/protobuf_test.py 3826-line integration test suite covering scalar, nested, repeated, enum, and edge-case protobuf decode scenarios. Test helper gracefully skips when spark-protobuf JVM classes are absent. Previous issues (xfail markers, options-drop on legacy API path) are addressed.
sql-plugin/src/test/scala/com/nvidia/spark/rapids/shims/ProtobufExprShimsSuite.scala Unit tests for the shim layer covering schema validation, default-value encoding, flatten-schema construction, and descriptor-source equality. Good coverage of the error paths added in previous review iterations.

Sequence Diagram

sequenceDiagram
    participant Catalyst as Catalyst Optimizer
    participant Shim as ProtobufExprShims (tagExprForGpu)
    participant Compat as SparkProtobufCompat
    participant Extractor as ProtobufSchemaExtractor
    participant Validator as ProtobufSchemaValidator
    participant GPU as GpuFromProtobuf (doColumnar)
    participant JNI as Protobuf.decodeToStruct (JNI)

    Catalyst->>Shim: tagExprForGpu(ProtobufDataToCatalyst)
    Shim->>Compat: extractExprInfo(expr) → ProtobufExprInfo
    Compat-->>Shim: messageName, descriptorSource, options
    Shim->>Compat: resolveMessageDescriptor(exprInfo) → ProtobufMessageDescriptor
    Compat-->>Shim: ReflectiveMessageDescriptor (via reflection)
    Shim->>Extractor: analyzeAllFields(schema, msgDesc, enumsAsInts)
    Extractor-->>Shim: Map[fieldName → ProtobufFieldInfo]
    Shim->>Shim: analyzeRequiredFields() → Set[requiredFieldNames]
    Note over Shim: Schema pruning: only required fields decoded
    Shim->>Shim: registerPrunedOrdinals() on GetStructField/GetArrayStructFields
    Note over Shim: PRUNED_ORDINAL_TAG set on downstream extractors
    loop for each required field
        Shim->>Validator: toFlattenedFieldDescriptor(path, field, info)
        Validator-->>Shim: FlattenedFieldDescriptor
    end
    Shim->>Validator: validateFlattenedSchema(flatFields)
    Shim->>Shim: convertToGpu() → GpuFromProtobuf

    Catalyst->>GPU: doColumnar(inputBinaryColumn)
    GPU->>JNI: Protobuf.decodeToStruct(input, ProtobufSchemaDescriptor, failOnErrors)
    JNI-->>GPU: cudf.ColumnVector (struct)
    GPU->>GPU: mergeAndSetValidity (apply input nulls)
    GPU-->>Catalyst: decoded StructType column

Comments Outside Diff (1)

  1. sql-plugin/src/main/spark340/scala/com/nvidia/spark/rapids/shims/SparkProtobufCompat.scala, line 1174-1185 (link)

    DescriptorPath retry catches too narrow a set of exceptions

    The retry for Spark 3.5+ only triggers on ClassCastException or MatchError wrapped in InvocationTargetException. However, depending on how Spark 3.5+'s buildDescriptor validates its Option[Array[Byte]] argument when a String is passed, it could also throw IllegalArgumentException, InvalidProtocolBufferException, or UnsupportedOperationException — none of which are caught here. In that case, the InvocationTargetException propagates out of the Try in resolveMessageDescriptor as a non-recoverable failure, causing the entire query to fail rather than falling back gracefully to CPU.

    Consider catching a broader class of exceptions to defensively cover the version-mismatch case:
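A hedged sketch of what that broader catch might look like. The helper names (`invokeBuildDescriptorWithRetry`, `primary`, `fallback`) are illustrative stand-ins, not the shim's real API; the idea is to retry on any non-fatal cause of the reflective call rather than only `ClassCastException`/`MatchError`:

```scala
import java.lang.reflect.InvocationTargetException
import scala.util.control.NonFatal

// Illustrative sketch, not the shim's real API: broaden the retry trigger so
// any non-fatal cause of the reflective buildDescriptor call falls back to the
// alternate payload shape instead of failing the query.
def invokeBuildDescriptorWithRetry(primary: () => AnyRef,
                                   fallback: () => AnyRef): AnyRef = {
  try {
    primary()
  } catch {
    case ite: InvocationTargetException if Option(ite.getCause).exists(NonFatal(_)) =>
      // Covers IllegalArgumentException, UnsupportedOperationException,
      // InvalidProtocolBufferException, MatchError, and similar causes of a
      // binary-vs-string payload mismatch.
      fallback()
  }
}
```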

Last reviewed commit: 16f2f6e

@thirtiseven (Collaborator Author)

@greptileai full review

@thirtiseven (Collaborator Author)

@greptileai full review again

@thirtiseven (Collaborator Author)

@greptileai full review again

@thirtiseven (Collaborator Author)

@greptileai full review again

@thirtiseven (Collaborator Author)

@greptile please check again

@thirtiseven (Collaborator Author)

@greptile please check again

@thirtiseven (Collaborator Author)

@greptile please check again

@thirtiseven (Collaborator Author)

@greptile please check again

@thirtiseven (Collaborator Author)

@greptile review

@thirtiseven (Collaborator Author)

@greptile review

@thirtiseven (Collaborator Author)

@greptile review

@thirtiseven (Collaborator Author)

@greptile review

@thirtiseven (Collaborator Author)

@greptile review

Comment on lines +544 to +545
}


addFieldWithChildren: nestedMsgDesc semantics are correct but confusingly named

In addChildFieldsFromStruct, each child addFieldWithChildren call receives parentIdx (the index passed into addChildFieldsFromStruct) as its parent index, and the recursive call for a deeper struct child passes currentIdx as the new parentIdx, so parent indices line up correctly at every nesting depth.

The subtlety is in the descriptor argument. When the code does:

case ArrayType(st: StructType, _) if nestedMsgDesc != null =>
  addChildFieldsFromStruct(st, nestedMsgDesc, sf.name, currentIdx, depth, pathPrefix)

nestedMsgDesc is the descriptor of the message that contains sf, not sf's own message descriptor. addChildFieldsFromStruct then calls parentMsgDesc.findField(sf.name) to locate sf within its container and uses parentField.get.messageDescriptor to resolve sf's children. Tracing the recursion (the root msgDesc at the top level, childMsgDesc on recursive calls) shows every lookup resolves against the message that actually contains the field, so the logic is sound.

The recursive semantics of nestedMsgDesc are nonetheless counter-intuitive: at each level it names the containing message rather than the struct just emitted, and the same value is immediately reused one level down. Consider renaming nestedMsgDesc to containingMsgDesc and adding a clarifying comment to avoid future maintenance mistakes.
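A small runnable sketch of the suggested containingMsgDesc convention. The types and names (`MsgDesc`, `addChildFields`) are simplified stand-ins for the shim's classes; the point is that the descriptor parameter always names the message containing the field being processed, and each recursive call hands down the field's own message descriptor as the new container:

```scala
// Illustrative stand-in for a protobuf message descriptor: a name plus its
// message-typed fields (leaf fields omitted for brevity).
final case class MsgDesc(name: String, fields: Map[String, MsgDesc])

// containingMsgDesc is always the descriptor of the message that contains
// fieldName; the recursion passes the child's own descriptor as the next container.
def addChildFields(containingMsgDesc: MsgDesc, fieldName: String,
                   emit: String => Unit): Unit = {
  containingMsgDesc.fields.get(fieldName).foreach { childMsgDesc =>
    emit(s"${containingMsgDesc.name}.$fieldName")
    childMsgDesc.fields.keys.foreach(addChildFields(childMsgDesc, _, emit))
  }
}
```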


Comment on lines 100 to 121
@@ -76,7 +121,7 @@ class GpuProjectExecMeta(
}

ENABLE_PROTOBUF_BATCH_MERGE_AFTER_PROJECT defaults to false, silently disabling the post-project coalesce optimization

The config's default is false and is marked internal(). This means users who deploy protobuf decoding with schema pruning will not benefit from the post-project coalesce even though it is the correct behavior to request a coalesce after producing small projected batches. The intent of this config appears to be a safety valve rather than a permanent opt-in gate.

Consider whether the default should be true once the feature is considered stable, and document in the config's doc string what conditions must hold before enabling it, so operators know when it is safe to flip. Otherwise, production deployments will silently miss the optimization without any indication in the logs that it is disabled.

Comment on lines +74 to +81
isRequired = fieldDescriptor.isRequired,
defaultValue = defaultValue,
enumMetadata = fieldDescriptor.enumMetadata,
isRepeated = fieldDescriptor.isRepeated
)
}
}


extractFieldInfo silently drops the checkFieldSupport unsupported reason when defaultValueResult is Left

When checkFieldSupport returns (false, Some("type mismatch: ..."), ...) but fieldDescriptor.defaultValueResult is also Left("reflection failure for default value"), the function returns Left("reflection failure for default value"). In analyzeAllFields, this Left becomes the unsupportedReason in unsupportedFieldInfo, discarding the type-mismatch reason entirely.

While not a correctness bug (the field is still correctly marked unsupported), the error message surfaced to the user will say "reflection failure" instead of the actionable "type mismatch: Spark X vs Protobuf Y", which hinders debugging.

Consider building the ProtobufFieldInfo directly with the known isSupported/unsupportedReason from checkFieldSupport before inspecting defaultValueResult, so the primary unsupported reason is never lost:

val (isSupported, unsupportedReason, encoding) = checkFieldSupport(...)
val defaultValue = fieldDescriptor.defaultValueResult match {
  case Right(v) => v
  case Left(err) if !isSupported => None  // ignore, field is already unsupported
  case Left(err) => return Left(err)       // only propagate if the field was otherwise OK
}
Right(ProtobufFieldInfo(..., isSupported = isSupported, ...))
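A runnable sketch of this precedence rule. `FieldInfo` and the parameter shapes are illustrative, not the plugin's real `ProtobufFieldInfo`; the point is that a default-value reflection error only propagates when the field was otherwise supported:

```scala
// Illustrative stand-in for the plugin's field-info record.
final case class FieldInfo(name: String,
                           isSupported: Boolean,
                           unsupportedReason: Option[String],
                           defaultValue: Option[Any])

def extractFieldInfo(name: String,
                     supportCheck: (Boolean, Option[String]),
                     defaultValueResult: Either[String, Option[Any]]): Either[String, FieldInfo] = {
  val (isSupported, reason) = supportCheck
  defaultValueResult match {
    case Right(dv) => Right(FieldInfo(name, isSupported, reason, dv))
    // Field is already unsupported: keep the actionable type-mismatch reason.
    case Left(_) if !isSupported => Right(FieldInfo(name, isSupported, reason, None))
    // Only surface the reflection error when the field was otherwise OK.
    case Left(err) => Left(err)
  }
}
```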

Comment on lines +658 to +671
val runtimeOrd = orderedChildren.indexOf(childName)
if (runtimeOrd >= 0) {
gsf.setTagValue(ProtobufExprShims.PRUNED_ORDINAL_TAG, runtimeOrd)
}
}
}
case Some(parentPath) if parentPath.isEmpty =>
val runtimeOrd = topLevelIndices.indexOf(ordinal)
if (runtimeOrd >= 0) {
gsf.setTagValue(ProtobufExprShims.PRUNED_ORDINAL_TAG, runtimeOrd)
}
case _ =>
}
case gasf @ GetArrayStructFields(childExpr, field, _, _, _) =>

analyzeRequiredFields short-circuits on collectedExprs.isEmpty even when fieldReqs is already populated

The guard condition:

if (!safeToPrune || collectedExprs.isEmpty || hasDirectStructRef || fieldReqs.isEmpty) {

evaluates collectedExprs.isEmpty before fieldReqs.isEmpty. Because fieldReqs is populated solely via collectStructFieldReferences calls on the expressions in collectedExprs, it is impossible for collectedExprs to be empty while fieldReqs is non-empty. The check is harmless but misleading — it could suggest that collecting expressions and populating requirements are independent paths. Consider simplifying to just check fieldReqs.isEmpty (which subsumes the collectedExprs.isEmpty case), and adding a comment explaining that an empty fieldReqs covers both "no expressions found" and "no protobuf fields referenced".
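A minimal sketch of the simplified guard, assuming (as stated above) that fieldReqs can only be populated from collectedExprs. All names here are illustrative stand-ins for the shim's locals:

```scala
// fieldReqs is populated solely from collectedExprs, so fieldReqs.isEmpty
// subsumes the collectedExprs.isEmpty check: it covers both "no expressions
// collected" and "no protobuf fields referenced".
def shouldSkipPruning(safeToPrune: Boolean,
                      hasDirectStructRef: Boolean,
                      fieldReqs: Map[String, Set[String]]): Boolean = {
  !safeToPrune || hasDirectStructRef || fieldReqs.isEmpty
}
```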


Development

Successfully merging this pull request may close these issues.

[FEA] Support from_protobuf
