SpectralClustering in cuml.accel #7804

rapids-bot[bot] merged 28 commits into rapidsai:release/26.04

Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings, or resume the review manually.
📝 Walkthrough

Adds a cuml.accel sklearn proxy for SpectralClustering, extends the Cython core with CPU–GPU interop mixins and mapping methods, and introduces integration/unit tests, upstream xfail entries, and documentation updates describing SpectralClustering behavior and limitations.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@python/cuml/cuml_accel_tests/integration/test_spectral_clustering.py`:
- Around line 29-39: The test currently computes adjusted_rand_score but never
asserts it; update test_spectral_clustering_n_clusters to store the score (e.g.,
score = adjusted_rand_score(y_true, y_pred)) and add assertions: assert
y_pred.shape == y_true.shape, assert len(np.unique(y_pred)) == n_clusters (or
allow <= n_clusters if algorithm can return fewer), and when n_clusters equals
the known true cluster count assert score >= a reasonable threshold (e.g., 0.8)
to validate correctness; apply the same pattern to the other SpectralClustering
tests referenced (lines 41-52, 54-65, 67-78, 80-91, 93-105, 108-121), and add
additional cases to cover empty dataset, single-sample, high-dimensional input,
fit/predict/transform consistency, and different input types (NumPy, pandas,
cuDF) using the same assertions and comparisons against scikit-learn where
applicable.
In `@python/cuml/cuml_accel_tests/test_basic_estimators.py`:
- Around line 35-41: The test currently creates data with make_blobs and calls
SpectralClustering().fit(X) but never asserts results and leaves y_true unused;
update test_spectral_clustering to remove the unused y_true or use it for
validation, and add concrete assertions: check sc.labels_.shape ==
(X.shape[0],), assert the number of unique labels equals n_clusters
(len(np.unique(sc.labels_)) == 3), and validate numerical correctness by
comparing to sklearn.cluster.SpectralClustering (e.g., via adjusted_rand_score
between sc.labels_ and sklearn_sc.labels_); additionally add small extra
subtests for edge cases (empty array, single sample, high-dimensional data),
test fit/predict/transform consistency on the same input, and run the same
checks with different input types (NumPy, pandas, cuDF) so the test covers
required behaviors.
In `@python/cuml/cuml/accel/_wrappers/sklearn/cluster.py`:
- Line 13: The __all__ export tuple is not sorted which triggers Ruff RUF022;
update the __all__ tuple in this module so the symbol names are alphabetically
ordered (DBSCAN, KMeans, SpectralClustering) to satisfy the linter and keep
style consistent with other modules.
- Around line 49-51: Add the new SpectralClustering wrapper to the
sklearn-compatibility test by importing SpectralClustering from cuml.cluster in
test_sklearn_compatibility.py and adding the SpectralClustering class to the
test estimators list so it is included in the automatic conformance checks
(update the module import block and append/include SpectralClustering in the
estimators list used by the tests).
    def test_spectral_clustering():
        X, y_true = make_blobs(n_samples=100, centers=3, random_state=42)
        X = X.astype("float32")
        sc = SpectralClustering(
            n_clusters=3, affinity="nearest_neighbors", random_state=42
        ).fit(X)
        sc.labels_
Add assertions and remove unused y_true.
Right now the test evaluates sc.labels_ without asserting anything, so it won’t catch incorrect results and triggers Ruff warnings. Consider asserting shape/cluster count and dropping the unused variable.
✅ Suggested fix

     def test_spectral_clustering():
    -    X, y_true = make_blobs(n_samples=100, centers=3, random_state=42)
    +    X, _ = make_blobs(n_samples=100, centers=3, random_state=42)
         X = X.astype("float32")
         sc = SpectralClustering(
             n_clusters=3, affinity="nearest_neighbors", random_state=42
         ).fit(X)
    -    sc.labels_
    +    assert sc.labels_.shape == (X.shape[0],)
    +    assert len(set(sc.labels_.tolist())) == 3

🧰 Tools
🪛 Ruff (0.15.1)
[warning] 36-36: Unpacked variable y_true is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
[warning] 41-41: Found useless expression. Either assign it to a variable or remove it.
(B018)
    @@ -10,7 +10,7 @@
     from cuml.accel.estimator_proxy import ProxyBase
     from cuml.internals.interop import UnsupportedOnGPU

    -__all__ = ("KMeans", "DBSCAN")
    +__all__ = ("KMeans", "DBSCAN", "SpectralClustering")
Sort __all__ to satisfy Ruff RUF022.
This keeps lint clean and consistent with existing style checks.
♻️ Suggested fix

    -__all__ = ("KMeans", "DBSCAN", "SpectralClustering")
    +__all__ = ("DBSCAN", "KMeans", "SpectralClustering")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    __all__ = ("DBSCAN", "KMeans", "SpectralClustering")
🧰 Tools
🪛 Ruff (0.15.1)
[warning] 13-13: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/source/cuml-accel/limitations.rst`:
- Around line 111-114: Add an "Additional notes" callout under the
SpectralClustering fallback bullet that explicitly states
sklearn.SpectralClustering defaults to affinity='rbf' which is not in the
GPU-supported set {'nearest_neighbors','precomputed'} and therefore will always
fall back to CPU; mention the contrast with sklearn.SpectralEmbedding (which
defaults to affinity='nearest_neighbors' and is GPU-accelerated) so readers
understand the practical impact on out-of-the-box sklearn usage.
- Around line 108-118: Update the SpectralClustering fallback list to include
sparse input as a CPU-fallback condition: add a bullet stating that
SpectralClustering falls back to CPU for sparse input (e.g., "If input is
sparse") alongside the existing checks for assign_labels and affinity; mirror
the wording/placement used by SpectralEmbedding's sparse input fallback to keep
consistency and ensure the new bullet appears in the same list under the
"SpectralClustering" section.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml`:
- Around line 1658-1661: The condition specifier for the xfail entry uses an
inconsistent comparator: change the condition field value from
"scikit-learn>1.5" to "scikit-learn>=1.6" so it matches the intended boundary
and the rest of the file's style; update the YAML entry that contains the
reason/tests (the block with reason "cuML SpectralClustering..." and tests list
including "sklearn.cluster.tests.test_spectral::test_affinities[42]") by
replacing the condition string accordingly.
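For reference, the corrected entry would look roughly like this; the field layout is assumed to mirror the rest of xfail-list.yaml, and the elided reason text is not reproduced:

```yaml
- reason: "cuML SpectralClustering..."
  condition: "scikit-learn>=1.6"
  tests:
    - "sklearn.cluster.tests.test_spectral::test_affinities[42]"
```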
viclafargue left a comment
Thanks @aamijar! LGTM for the most part. Please add the estimator to test_sklearn_compatibility.py for automated conformance checks.
    SpectralClustering
    ^^^^^^^^^^^^^^^^^^

    ``SpectralClustering`` will fall back to CPU in the following cases:

    - If ``assign_labels`` is not ``"kmeans"``.
    - If ``affinity`` is not ``"nearest_neighbors"`` or ``"precomputed"``.

    The following fitted attributes are currently not computed:

    - ``affinity_matrix_``
I agree with the bot comments here, if the affinity parameter is left to its default, it will not be GPU accelerated and it would be nice to document this. Also, we should document that the estimator will fall back to CPU if X is sparse (as documented for SpectralEmbedding).
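The sparse-input case is easy to demonstrate with plain scikit-learn; the dataset sizes below are illustrative, and the CPU-fallback behavior under cuml.accel is a claim from this review, not something the snippet itself exercises:

```python
import scipy.sparse as sp
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=2, random_state=0)
X_sparse = sp.csr_matrix(X)

# Plain sklearn accepts sparse input here; under cuml.accel this
# input reportedly triggers CPU fallback, which is the case the
# limitations docs should list explicitly.
labels = SpectralClustering(
    n_clusters=2, affinity="nearest_neighbors", random_state=0
).fit_predict(X_sparse)
assert labels.shape == (60,)
```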
python/cuml/cuml_accel_tests/integration/test_spectral_clustering.py (outdated, resolved)
    sklearn_model = original.as_sklearn()
    roundtrip_model = SpectralClustering.from_sklearn(sklearn_model)
    assert array_equal(original.labels_, roundtrip_model.labels_)
Why not use assert_estimator_roundtrip? It should offer more robust testing.
Co-authored-by: Victor Lafargue <viclafargue@nvidia.com>
Thanks for the review @viclafargue! I have addressed all your comments now.
        random_state=42,
    ).fit(X)
    y_pred = sc.labels_
    if n_clusters == 3:
Shouldn't we generate a new blob dataset for each number of clusters?
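The reviewer's suggestion can be sketched as follows: generate a fresh blob dataset whose true center count matches each tested `n_clusters`, so the adjusted Rand score is meaningful for every parametrization. This is plain scikit-learn; the sample counts and `cluster_std` are illustrative:

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

scores = {}
for n_clusters in (2, 3, 4):
    # Fresh dataset whose true center count matches n_clusters, so
    # the ARI comparison against y_true is well-defined.
    X, y_true = make_blobs(
        n_samples=150, centers=n_clusters, cluster_std=0.5, random_state=42
    )
    sc = SpectralClustering(
        n_clusters=n_clusters, affinity="nearest_neighbors", random_state=42
    ).fit(X)
    scores[n_clusters] = adjusted_rand_score(y_true, sc.labels_)
```

In a pytest suite this loop would typically become a `pytest.mark.parametrize` over `n_clusters`.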
/merge |
f84a29a into rapidsai:release/26.04
Resolves #7293