work around runtime conf setting bug in spark < 3.4 #1000

Merged
eordentlich merged 1 commit into NVIDIA:main from eordentlich:eo_sparksession_patch
Dec 23, 2025

Conversation

@eordentlich
Collaborator

No description provided.

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
@greptile-apps
Contributor

greptile-apps bot commented Dec 23, 2025

Greptile Summary

Added a workaround for SPARK-38870, a bug in PySpark versions prior to 3.4 where calling SparkSession.builder.getOrCreate() on an existing session can overwrite runtime configuration values that were changed on that session. The fix checks for an active SparkSession before calling SparkSession.builder.getOrCreate().
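To make the failure mode concrete, here is a toy, pure-Python model of the pre-3.4 behavior as we understand SPARK-38870: PySpark exposed SparkSession.builder as one shared Builder object, so options set on it once were re-applied to the existing session by every later getOrCreate() call. ToySession, ToyBuilder, and shared_builder are hypothetical stand-ins, not PySpark classes, and the model ignores everything except conf handling.

```python
from typing import Dict, Optional


class ToySession:
    """Hypothetical stand-in for a SparkSession: only holds a mutable runtime conf."""

    def __init__(self) -> None:
        self.conf: Dict[str, str] = {}


class ToyBuilder:
    """Hypothetical stand-in for the pre-3.4 PySpark builder.

    Options accumulate on the builder and are re-applied to the existing
    session on every getOrCreate() call.
    """

    _session: Optional[ToySession] = None  # the single "instantiated" session

    def __init__(self) -> None:
        self._options: Dict[str, str] = {}

    def config(self, key: str, value: str) -> "ToyBuilder":
        self._options[key] = value
        return self

    def getOrCreate(self) -> ToySession:
        if ToyBuilder._session is None:
            ToyBuilder._session = ToySession()
        # Re-applying builder options to an existing session is what clobbers
        # runtime conf changes made since the session was created.
        ToyBuilder._session.conf.update(self._options)
        return ToyBuilder._session


# Pre-3.4 behavior modeled as one builder object shared by all callers.
shared_builder = ToyBuilder()

spark = shared_builder.config("spark.sql.shuffle.partitions", "10").getOrCreate()
spark.conf["spark.sql.shuffle.partitions"] = "200"  # runtime change by the user

# A library helper later calls builder.getOrCreate() and the old value comes back.
again = shared_builder.getOrCreate()
print(again.conf["spark.sql.shuffle.partitions"])  # prints 10
```

In the model, a helper that returned the already-active session instead of calling getOrCreate() would leave the user's runtime value of 200 in place.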

Changes:

  • Modified _get_spark_session() to return the active session if one exists
  • This prevents triggering the bug by avoiding unnecessary getOrCreate() calls when a session is already active
  • The change is backward compatible and doesn't affect behavior in Spark 3.4+

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • This is a simple, well-documented workaround for a known Spark bug. The change is defensive: it returns an existing active session before attempting to create one, which is safer and more efficient. The logic is straightforward with no complex edge cases, and it doesn't alter any existing behavior when no active session exists.
  • No files require special attention

Important Files Changed

Filename | Overview
python/src/spark_rapids_ml/utils.py | Added a workaround for the SPARK-38870 bug affecting runtime config changes in Spark < 3.4: return the active session, if any, before creating a new one

Sequence Diagram

sequenceDiagram
    participant Caller as ML Algorithm/Component
    participant GSS as _get_spark_session()
    participant TC as TaskContext
    participant SS as SparkSession
    
    Caller->>GSS: Request SparkSession
    GSS->>TC: TaskContext.get()
    TC-->>GSS: None (driver side)
    
    Note over GSS: Check for active session first<br/>(workaround for SPARK-38870)
    GSS->>SS: SparkSession.getActiveSession()
    
    alt Active session exists
        SS-->>GSS: Return active session
        GSS-->>Caller: Return active session
    else No active session
        GSS->>SS: SparkSession.builder.getOrCreate()
        SS-->>GSS: Return new/existing session
        GSS-->>Caller: Return session
    end

@greptile-apps
Contributor

greptile-apps bot commented Dec 23, 2025

Greptile's behavior is changing!

From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

@eordentlich
Collaborator Author

build

@eordentlich eordentlich merged commit 2cf5770 into NVIDIA:main Dec 23, 2025
5 checks passed
@eordentlich eordentlich deleted the eo_sparksession_patch branch December 23, 2025 21:19