Skip to content

fix slicing of coefficient arrays in some notebooks [skip ci]#1005

Merged
eordentlich merged 2 commits intoNVIDIA:mainfrom
eordentlich:eo_fix_coefficients
Jan 13, 2026
Merged

fix slicing of coefficient arrays in some notebooks [skip ci]#1005
eordentlich merged 2 commits intoNVIDIA:mainfrom
eordentlich:eo_fix_coefficients

Conversation

@eordentlich
Copy link
Collaborator

slicing not supported when these vectors happen to be sparse, mainly cpu case, but changed gpu case for consistency.

…ported when these vectors happen to be sparse, mainly cpu case

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
@eordentlich eordentlich changed the title fix slicing of coefficient arrays in some notebooks fix slicing of coefficient arrays in some notebooks [skip ci] Jan 10, 2026
@eordentlich
Copy link
Collaborator Author

build

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 10, 2026

Greptile Overview

Greptile Summary

This PR fixes a critical runtime error that occurs when coefficient vectors happen to be sparse. PySpark's Vector objects support single-element indexing ([0]) but do NOT support slice operations ([0:10]) when the vector is sparse. This is especially common in regularized models (like Lasso) where many coefficients are zero.

Changes Made

Linear Regression Notebook (3 fixes):

  • ✅ Fixed coefs.toArray()[0:10] where coefs = gpu_model.coefficients (line 272)
  • ✅ Fixed gpu_model_loaded.coefficients.toArray()[0:10] (line 339)
  • ✅ Fixed cpu_model.coefficients.toArray()[0:10] (line 541)

Logistic Regression Notebook (3 fixes):

  • ✅ Fixed all 3 instances consistently for binary classification cases (lines 262, 338, 568)
  • ✅ Multiclass cases already correctly used .toArray() on coefficientMatrix

Technical Context

The PR description correctly identifies this affects "mainly cpu case" because Spark ML (CPU) models are more likely to return sparse coefficient vectors, especially with L1 regularization. The fix ensures notebooks work regardless of whether coefficients are sparse or dense.

Verification

  • All coefficient slicing operations in both notebooks now use .toArray() conversion
  • No other Vector slicing operations found across the notebooks
  • Single-element indexing in spark-compat.ipynb (model.coefficients[0]) is correct and doesn't need changes (indexing works on sparse vectors, only slicing fails)
  • Changes are consistent across GPU and CPU model paths

Confidence Score: 5/5

  • This PR is safe to merge with no risk - it fixes critical runtime errors in notebooks
  • Score of 5 reflects a complete and correct fix: (1) All instances of Vector slicing are fixed, (2) The fix is the standard, recommended approach used throughout the codebase (verified in test files), (3) Changes are purely additive (.toArray() conversion) with no functional changes, (4) No edge cases or error conditions introduced - toArray() always succeeds on Vector objects, (5) Comprehensive search confirms no other Vector slicing operations remain
  • No files require special attention - all changes are straightforward and correct

Important Files Changed

File Analysis

Filename Score Overview
notebooks/linear-regression.ipynb 5/5 Fixed 3 instances of Vector slicing by adding .toArray() conversion. All coefficient slicing operations now properly handle sparse vectors. Changes are consistent and correct.
notebooks/logistic-regression.ipynb 5/5 Fixed 3 instances of coefficient slicing in binary classification cases. Multiclass coefficient matrix slicing was already correct. All changes properly handle sparse vectors.

Sequence Diagram

sequenceDiagram
    participant User
    participant Notebook
    participant Model
    participant Vector
    participant Array
    
    User->>Notebook: Execute coefficient display cell
    Notebook->>Model: Access .coefficients property
    Model-->>Notebook: Return Vector object (sparse or dense)
    
    alt Before Fix (Slicing fails on sparse)
        Notebook->>Vector: Apply slice [0:10]
        Vector-->>Notebook: ❌ Error (sparse vectors don't support slicing)
    end
    
    alt After Fix (Works for both)
        Notebook->>Vector: Call .toArray()
        Vector-->>Notebook: Return numpy array
        Notebook->>Array: Apply slice [0:10]
        Array-->>User: ✅ Display first 10 coefficients
    end
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 10, 2026

Additional Comments (1)

notebooks/linear-regression.ipynb
This coefficient slicing has the same issue that was fixed in other cells - direct slicing on a Vector object will fail when the coefficients happen to be sparse. The fix should be consistent with the other cells in this PR.

    "coefs.toArray()[0:10]"

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
@eordentlich
Copy link
Collaborator Author

build

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@eordentlich eordentlich merged commit ab44618 into NVIDIA:main Jan 13, 2026
4 checks passed
@eordentlich eordentlich deleted the eo_fix_coefficients branch January 13, 2026 05:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants