avoid collisions in logistic_regression_objective tmp column name by eordentlich · Pull Request #1002 · NVIDIA/spark-rapids-ml

eordentlich · 2026-01-08T16:03:59Z

No description provided.

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>

greptile-apps

Greptile Overview

Greptile Summary

This PR fixes a potential column name collision issue in the logistic_regression_objective function. Previously, the function hardcoded "scaledFeatures" as the output column name for the StandardScaler, which could collide with existing columns in the input DataFrame.

Changes:

Dynamically generates a unique temporary column name (_objective_tmp) by checking against existing column names
Uses a while loop to prepend underscores until a unique name is found
Follows the same collision avoidance pattern already used in params.py (line 118-119)

Impact:

Prevents runtime errors when input DataFrames contain a column named "scaledFeatures"
Maintains backward compatibility as this is a utility function used in testing
No changes to public API or behavior

Confidence Score: 5/5

This PR is safe to merge with no risk - it's a defensive bug fix with no breaking changes
The change is a straightforward defensive fix that prevents potential column name collisions. The implementation follows an existing pattern in the codebase (params.py), is well-contained, and only affects a utility function used for testing. No edge cases or issues found.
No files require special attention

Important Files Changed

File Analysis

Filename	Score	Overview
python/src/spark_rapids_ml/metrics/utils.py	5/5	Adds collision avoidance for temporary column name used by StandardScaler; follows existing codebase patterns

Sequence Diagram

sequenceDiagram
    participant Caller
    participant logistic_regression_objective
    participant DataFrame
    participant StandardScaler
    
    Caller->>logistic_regression_objective: Call with df and lr_model
    logistic_regression_objective->>logistic_regression_objective: Transform df with model
    logistic_regression_objective->>logistic_regression_objective: Calculate log_loss
    
    alt Standardization is True
        logistic_regression_objective->>DataFrame: Get df.columns
        DataFrame-->>logistic_regression_objective: Return column_names
        logistic_regression_objective->>logistic_regression_objective: Initialize outputCol = "_objective_tmp"
        
        loop While outputCol in column_names
            logistic_regression_objective->>logistic_regression_objective: outputCol = "_" + outputCol
        end
        
        logistic_regression_objective->>StandardScaler: Create with inputCol and unique outputCol
        logistic_regression_objective->>StandardScaler: Fit on df
        StandardScaler-->>logistic_regression_objective: Return scaler_model with std
        logistic_regression_objective->>logistic_regression_objective: Adjust coefficients by stdev
    end
    
    logistic_regression_objective->>logistic_regression_objective: Calculate full_objective
    logistic_regression_objective-->>Caller: Return full_objective

eordentlich · 2026-01-08T16:45:35Z

build

avoid collisions in logistic_regression_objective tmp column name

102a580

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>

greptile-apps bot reviewed Jan 8, 2026

View reviewed changes

rishic3 approved these changes Jan 8, 2026

View reviewed changes

eordentlich merged commit c8f9672 into NVIDIA:main Jan 8, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

avoid collisions in logistic_regression_objective tmp column name#1002

avoid collisions in logistic_regression_objective tmp column name#1002
eordentlich merged 1 commit intoNVIDIA:mainfrom
eordentlich:eo_objective_fix

eordentlich commented Jan 8, 2026

Uh oh!

greptile-apps bot left a comment

Uh oh!

eordentlich commented Jan 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

eordentlich commented Jan 8, 2026

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

eordentlich commented Jan 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants