Skip to content

SpectralClustering in cuml.accel#7804

Merged
rapids-bot[bot] merged 28 commits intorapidsai:release/26.04from
aamijar:spectral-clustering-cuml-accel
Mar 13, 2026
Merged

SpectralClustering in cuml.accel#7804
rapids-bot[bot] merged 28 commits intorapidsai:release/26.04from
aamijar:spectral-clustering-cuml-accel

Conversation

@aamijar
Copy link
Member

@aamijar aamijar commented Feb 16, 2026

Resolves #7293

@copy-pr-bot
Copy link

copy-pr-bot bot commented Feb 16, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added the Cython / Python Cython or Python issue label Feb 16, 2026
@aamijar aamijar added non-breaking Non-breaking change cuml-accel Issues related to cuml.accel labels Feb 16, 2026
@aamijar aamijar added the feature request New feature or request label Feb 16, 2026
@aamijar aamijar marked this pull request as ready for review February 17, 2026 23:02
@aamijar aamijar requested a review from a team as a code owner February 17, 2026 23:02
@aamijar aamijar requested a review from viclafargue February 17, 2026 23:02
@coderabbitai
Copy link

coderabbitai bot commented Feb 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a cuml.accel sklearn proxy for SpectralClustering, extends the Cython core with CPU–GPU interop mixins and mapping methods, and introduces integration/unit tests, upstream xfail entries, and documentation updates describing SpectralClustering behavior and limitations.

Changes

Cohort / File(s) Summary
sklearn wrapper
python/cuml/cuml/accel/_wrappers/sklearn/cluster.py
Exported SpectralClustering in __all__ and added class SpectralClustering(ProxyBase) with _gpu_class and _not_implemented_attributes.
Core implementation (Cython)
python/cuml/cuml/cluster/spectral_clustering.pyx
Updated SpectralClustering to inherit InteropMixin, ClusterMixin, CMajorInputTagMixin; added _cpu_class_path, _SUPPORTED_AFFINITIES, and interop methods _params_from_cpu, _params_to_cpu, _attrs_from_cpu, _attrs_to_cpu; expanded docstring and label mapping logic.
Integration & unit tests
python/cuml/cuml_accel_tests/integration/test_spectral_clustering.py, python/cuml/cuml_accel_tests/test_basic_estimators.py, python/cuml/tests/test_sklearn_import_export.py
Added integration tests covering affinities, precomputed affinity, fit_predict and deterministic random_state; added basic estimator smoke test and sklearn import/export round-trip test.
Upstream test config
python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml
Added xfail entries for SpectralClustering parameter-validation and missing "not fully connected" UserWarning across scikit-learn version ranges.
Docs
docs/source/cuml-accel/faq.rst, docs/source/cuml-accel/limitations.rst
Documented SpectralClustering as accel-supported and added limitations describing CPU-fallback conditions and missing affinity_matrix_.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • csadorf
  • hcho3
  • viclafargue
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and clearly describes the main change: adding SpectralClustering to the cuml.accel package.
Description check ✅ Passed The description references issue #7293, which is related to adding SpectralClustering to cuml.accel.
Linked Issues check ✅ Passed The PR successfully implements a cuml.accel proxy for SpectralClustering as required by issue #7293, including CPU-GPU interoperability, integration tests, and documentation updates.
Out of Scope Changes check ✅ Passed All changes are in scope: SpectralClustering proxy wrapper, interop mixins, integration tests, xfail configurations, and documentation are all necessary for implementing the requested feature.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can enable review details to help with troubleshooting, context usage and more.

Enable the reviews.review_details setting to include review details such as the model used, the time taken for each step and more in the review comments.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml_accel_tests/integration/test_spectral_clustering.py`:
- Around line 29-39: The test currently computes adjusted_rand_score but never
asserts it; update test_spectral_clustering_n_clusters to store the score (e.g.,
score = adjusted_rand_score(y_true, y_pred)) and add assertions: assert
y_pred.shape == y_true.shape, assert len(np.unique(y_pred)) == n_clusters (or
allow <= n_clusters if algorithm can return fewer), and when n_clusters equals
the known true cluster count assert score >= a reasonable threshold (e.g., 0.8)
to validate correctness; apply the same pattern to the other SpectralClustering
tests referenced (lines 41-52, 54-65, 67-78, 80-91, 93-105, 108-121), and add
additional cases to cover empty dataset, single-sample, high-dimensional input,
fit/predict/transform consistency, and different input types (NumPy, pandas,
cuDF) using the same assertions and comparisons against scikit-learn where
applicable.

In `@python/cuml/cuml_accel_tests/test_basic_estimators.py`:
- Around line 35-41: The test currently creates data with make_blobs and calls
SpectralClustering().fit(X) but never asserts results and leaves y_true unused;
update test_spectral_clustering to remove the unused y_true or use it for
validation, and add concrete assertions: check sc.labels_.shape ==
(X.shape[0],), assert the number of unique labels equals n_clusters
(len(np.unique(sc.labels_)) == 3), and validate numerical correctness by
comparing to sklearn.cluster.SpectralClustering (e.g., via adjusted_rand_score
between sc.labels_ and sklearn_sc.labels_); additionally add small extra
subtests for edge cases (empty array, single sample, high-dimensional data),
test fit/predict/transform consistency on the same input, and run the same
checks with different input types (NumPy, pandas, cuDF) so the test covers
required behaviors.

In `@python/cuml/cuml/accel/_wrappers/sklearn/cluster.py`:
- Line 13: The __all__ export tuple is not sorted which triggers Ruff RUF022;
update the __all__ tuple in this module so the symbol names are alphabetically
ordered (DBSCAN, KMeans, SpectralClustering) to satisfy the linter and keep
style consistent with other modules.
- Around line 49-51: Add the new SpectralClustering wrapper to the
sklearn-compatibility test by importing SpectralClustering from cuml.cluster in
test_sklearn_compatibility.py and adding the SpectralClustering class to the
test estimators list so it is included in the automatic conformance checks
(update the module import block and append/include SpectralClustering in the
estimators list used by the tests).

Comment on lines +35 to +41
def test_spectral_clustering():
X, y_true = make_blobs(n_samples=100, centers=3, random_state=42)
X = X.astype("float32")
sc = SpectralClustering(
n_clusters=3, affinity="nearest_neighbors", random_state=42
).fit(X)
sc.labels_
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add assertions and remove unused y_true.

Right now the test evaluates sc.labels_ without asserting anything, so it won’t catch incorrect results and triggers Ruff warnings. Consider asserting shape/cluster count and dropping the unused variable.

✅ Suggested fix
 def test_spectral_clustering():
-    X, y_true = make_blobs(n_samples=100, centers=3, random_state=42)
+    X, _ = make_blobs(n_samples=100, centers=3, random_state=42)
     X = X.astype("float32")
     sc = SpectralClustering(
         n_clusters=3, affinity="nearest_neighbors", random_state=42
     ).fit(X)
-    sc.labels_
+    assert sc.labels_.shape == (X.shape[0],)
+    assert len(set(sc.labels_.tolist())) == 3
As per coding guidelines: Test files must validate numerical correctness by comparing with scikit-learn, include edge case coverage (empty datasets, single sample, high-dimensional data), test fit/predict/transform consistency, and test different input types (cuDF, pandas, NumPy).
🧰 Tools
🪛 Ruff (0.15.1)

[warning] 36-36: Unpacked variable y_true is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


[warning] 41-41: Found useless expression. Either assign it to a variable or remove it.

(B018)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml_accel_tests/test_basic_estimators.py` around lines 35 - 41,
The test currently creates data with make_blobs and calls
SpectralClustering().fit(X) but never asserts results and leaves y_true unused;
update test_spectral_clustering to remove the unused y_true or use it for
validation, and add concrete assertions: check sc.labels_.shape ==
(X.shape[0],), assert the number of unique labels equals n_clusters
(len(np.unique(sc.labels_)) == 3), and validate numerical correctness by
comparing to sklearn.cluster.SpectralClustering (e.g., via adjusted_rand_score
between sc.labels_ and sklearn_sc.labels_); additionally add small extra
subtests for edge cases (empty array, single sample, high-dimensional data),
test fit/predict/transform consistency on the same input, and run the same
checks with different input types (NumPy, pandas, cuDF) so the test covers
required behaviors.

@@ -10,7 +10,7 @@
from cuml.accel.estimator_proxy import ProxyBase
from cuml.internals.interop import UnsupportedOnGPU

__all__ = ("KMeans", "DBSCAN")
__all__ = ("KMeans", "DBSCAN", "SpectralClustering")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Sort __all__ to satisfy Ruff RUF022.

This keeps lint clean and consistent with existing style checks.

♻️ Suggested fix
-__all__ = ("KMeans", "DBSCAN", "SpectralClustering")
+__all__ = ("DBSCAN", "KMeans", "SpectralClustering")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
__all__ = ("KMeans", "DBSCAN", "SpectralClustering")
__all__ = ("DBSCAN", "KMeans", "SpectralClustering")
🧰 Tools
🪛 Ruff (0.15.1)

[warning] 13-13: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/accel/_wrappers/sklearn/cluster.py` at line 13, The __all__
export tuple is not sorted which triggers Ruff RUF022; update the __all__ tuple
in this module so the symbol names are alphabetically ordered (DBSCAN, KMeans,
SpectralClustering) to satisfy the linter and keep style consistent with other
modules.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/cuml-accel/limitations.rst`:
- Around line 111-114: Add an "Additional notes" callout under the
SpectralClustering fallback bullet that explicitly states
sklearn.SpectralClustering defaults to affinity='rbf' which is not in the
GPU-supported set {'nearest_neighbors','precomputed'} and therefore will always
fall back to CPU; mention the contrast with sklearn.SpectralEmbedding (which
defaults to affinity='nearest_neighbors' and is GPU-accelerated) so readers
understand the practical impact on out-of-the-box sklearn usage.
- Around line 108-118: Update the SpectralClustering fallback list to include
sparse input as a CPU-fallback condition: add a bullet stating that
SpectralClustering falls back to CPU for sparse input (e.g., "If input is
sparse") alongside the existing checks for assign_labels and affinity; mirror
the wording/placement used by SpectralEmbedding's sparse input fallback to keep
consistency and ensure the new bullet appears in the same list under the
"SpectralClustering" section.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml`:
- Around line 1658-1661: The condition specifier for the xfail entry uses an
inconsistent comparator: change the condition field value from
"scikit-learn>1.5" to "scikit-learn>=1.6" so it matches the intended boundary
and the rest of the file's style; update the YAML entry that contains the
reason/tests (the block with reason "cuML SpectralClustering..." and tests list
including "sklearn.cluster.tests.test_spectral::test_affinities[42]") by
replacing the condition string accordingly.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 178e310 and a5e3e37.

📒 Files selected for processing (1)
  • python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml

Copy link
Contributor

@viclafargue viclafargue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aamijar! LGTM for the most part. Please add the estimator to test_sklearn_compatibility.py for automated conformance checks.

Comment on lines +109 to +119
SpectralClustering
^^^^^^^^^^^^^^^^^^

``SpectralClustering`` will fall back to CPU in the following cases:

- If ``assign_labels`` is not ``"kmeans"``.
- If ``affinity`` is not ``"nearest_neighbors"`` or ``"precomputed"``.

The following fitted attributes are currently not computed:

- ``affinity_matrix_``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the bot comments here, if the affinity parameter is left to its default, it will not be GPU accelerated and it would be nice to document this. Also, we should document that the estimator will fall back to CPU if X is sparse (as documented for SpectralEmbedding).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in e66c72f

Comment on lines +209 to +211
sklearn_model = original.as_sklearn()
roundtrip_model = SpectralClustering.from_sklearn(sklearn_model)
assert array_equal(original.labels_, roundtrip_model.labels_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using assert_estimator_roundtrip? It should offer more robust testing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in e66c72f

aamijar and others added 6 commits March 10, 2026 23:07
Co-authored-by: Victor Lafargue <viclafargue@nvidia.com>
…ing.py

Co-authored-by: Victor Lafargue <viclafargue@nvidia.com>
…ing.py

Co-authored-by: Victor Lafargue <viclafargue@nvidia.com>
…ing.py

Co-authored-by: Victor Lafargue <viclafargue@nvidia.com>
…ing.py

Co-authored-by: Victor Lafargue <viclafargue@nvidia.com>
aamijar and others added 4 commits March 10, 2026 16:14
…ing.py

Co-authored-by: Victor Lafargue <viclafargue@nvidia.com>
…ing.py

Co-authored-by: Victor Lafargue <viclafargue@nvidia.com>
…ing.py

Co-authored-by: Victor Lafargue <viclafargue@nvidia.com>
@aamijar
Copy link
Member Author

aamijar commented Mar 11, 2026

Thanks for the review @viclafargue! I have addressed all your comments now.

Copy link
Contributor

@viclafargue viclafargue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM!

random_state=42,
).fit(X)
y_pred = sc.labels_
if n_clusters == 3:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we generate a new blob datasets for each number of clusters?

Copy link
Member Author

@aamijar aamijar Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in def0dba

@aamijar aamijar changed the base branch from main to release/26.04 March 13, 2026 00:09
@aamijar
Copy link
Member Author

aamijar commented Mar 13, 2026

/merge

@rapids-bot rapids-bot bot merged commit f84a29a into rapidsai:release/26.04 Mar 13, 2026
168 of 170 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuml-accel Issues related to cuml.accel Cython / Python Cython or Python issue feature request New feature or request non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEA] Accelerate SpectralClustering with cuml.accel

2 participants