Move PCA and TSVD from cuml to raft by aamijar · Pull Request #2952 · rapidsai/raft

aamijar · 2026-02-13T09:09:54Z

Required for rapidsai/cuvs#1207 and rapidsai/cuml#7802.

This PR moves pca.cuh, tsvd.cuh, and gtests into raft.

cjnolet · 2026-02-13T16:50:32Z

cpp/include/raft/linalg/pca.cuh

+
+template <typename math_t, typename enum_solver = solver>
+void truncCompExpVars(const raft::handle_t& handle,
+                      math_t* in,


Need to use mdspan here; we've deprecated all the pointer APIs.

Addressed in 7289840

cjnolet · 2026-02-13T16:51:20Z

cpp/include/raft/linalg/pca.cuh

+            math_t* input,
+            math_t* components,
+            math_t* explained_var,
+            math_t* explained_var_ratio,


Argument order should match the other (newer) APIs: handle, params, input, output, free params. Also, the stream is in the handle now, and we use device_resources, not raft::handle_t.

Addressed in 7289840

jinsolp

Thanks @aamijar! Just a minor comment.
Question: will this be imported in cuvs and exposed as a Python API?

jinsolp · 2026-02-14T00:23:20Z

cpp/include/raft/linalg/detail/pca.cuh

+/**
+ * @brief perform fit operation for the pca. Generates eigenvectors, explained vars, singular vals,
+ * etc.
+ * @param[in] handle: cuml handle object


The doc mentions a cuml handle, not raft device_resources! (Same for the other docs too.)

Addressed here 9e285e6

aamijar · 2026-02-14T00:35:49Z

> Question: will this be imported in cuvs and exposed as a python API?

We will still have the same Python and C++ APIs in cuml too!
On the cuvs side, I think the plan is to expose a C++ API.

cjnolet · 2026-02-14T01:14:25Z

@aamijar we will probably expose a preprocessing API through Python for users who need to write scripts (for example, Jinsol's new dataset gen requires PCA, and it would be a circular dependency if we included cuml in cuVS) or who have databases written in Python.

But, like I mentioned to Simon, the users are very different between the two. Same thing with k-means: k-means clustering is the equivalent of "lexicographic ordering" in the vector world. PCA is another way to reduce the footprint of vectors without losing quality.

Data science users will continue to use cuml. Vector databases will continue to use cuVS. It's important we don't duplicate code across the two, and since cuml is already using cuVS, it can continue to use the C++ API like you mentioned.

jinsolp · 2026-02-19T23:52:55Z

cpp/include/raft/linalg/detail/pca.cuh

+void truncCompExpVars(raft::resources const& handle,
+                      math_t* in,
+                      math_t* components,
+                      math_t* explained_var,
+                      math_t* explained_var_ratio,
+                      math_t* noise_vars,
+                      const paramsTSVD& prms,
+                      cudaStream_t stream)


Why do we need both the stream and the handle here? Can't we use raft::resource::get_cuda_stream(handle)? Same for the other functions too!
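
A minimal sketch of the concern, assuming host-only stand-ins rather than the real RAFT types: `stream_stub`, `resources_stub`, `get_stream`, `launch_on`, and both `step` functions are hypothetical names used for illustration. When a function accepts both a handle and a stream, the caller can pass a stream that differs from the one the handle owns; when the function takes only the handle, that mismatch cannot occur.

```cpp
#include <vector>

using stream_stub = int;  // stand-in for cudaStream_t

// Stand-in for raft::resources, reduced to the stream it owns.
struct resources_stub {
  stream_stub stream = 0;
};

// Stand-in for raft::resource::get_cuda_stream(handle).
inline stream_stub get_stream(resources_stub const& handle) { return handle.stream; }

// Records which stream each piece of "work" was queued on.
inline std::vector<stream_stub>& launch_log()
{
  static std::vector<stream_stub> log;
  return log;
}
inline void launch_on(stream_stub stream) { launch_log().push_back(stream); }

namespace detail {
// Redundant form: the stream argument may disagree with the handle's stream.
inline void step_with_stream(resources_stub const& handle, stream_stub stream)
{
  (void)handle;
  launch_on(stream);
}

// Handle-only form: the stream is always the one owned by the handle.
inline void step(resources_stub const& handle)
{
  auto stream = get_stream(handle);
  launch_on(stream);
}
}  // namespace detail
```

With the handle-only form, a caller has a single place to configure the stream, which is the point of the suggestion above.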

jinsolp · 2026-02-19T23:57:28Z

cpp/include/raft/linalg/pca.cuh

+  auto stream = resource::get_cuda_stream(handle);
+
+  paramsPCA prms_with_dims = prms;
+  prms_with_dims.n_rows    = static_cast<std::size_t>(input.extent(0));
+  prms_with_dims.n_cols    = static_cast<std::size_t>(input.extent(1));
+
+  detail::pcaFit(handle,
+                 input.data_handle(),
+                 components.data_handle(),
+                 explained_var.data_handle(),
+                 explained_var_ratio.data_handle(),
+                 singular_vals.data_handle(),
+                 mu.data_handle(),
+                 noise_vars.data_handle(),
+                 prms_with_dims,
+                 stream,
+                 flip_signs_based_on_U);
+}


Similar to the previous comment: suggesting we don't need to pass the stream separately.

jinsolp · 2026-02-20T00:03:58Z

cpp/include/raft/linalg/pca_types.hpp

+struct paramsTSVD {
+  std::size_t n_rows    = 0;
+  std::size_t n_cols    = 0;
+  int gpu_id            = 0;


Are we using this gpu_id anywhere?

jinsolp · 2026-02-20T00:08:20Z

cpp/include/raft/linalg/detail/pca.cuh

+void truncCompExpVars(raft::resources const& handle,
+                      math_t* in,
+                      math_t* components,
+                      math_t* explained_var,
+                      math_t* explained_var_ratio,
+                      math_t* noise_vars,
+                      const paramsTSVD& prms,


As Corey mentioned, suggesting we change the order to handle -> params -> other stuff. Same for the other functions.

jinsolp · 2026-02-20T00:14:22Z

cpp/include/raft/linalg/pca.cuh

+  paramsPCA prms_with_dims = prms;
+  prms_with_dims.n_rows    = static_cast<std::size_t>(input.extent(0));
+  prms_with_dims.n_cols    = static_cast<std::size_t>(input.extent(1));


I'm wondering whether it would be better to make a new params struct here, as is done now, or to use RAFT_EXPECTS to check that the params already carry the right rows/cols. What do you think?
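
The two options being weighed can be sketched side by side. This is a host-only illustration under stated assumptions: `params_stub`, `shape_stub`, `with_dims_from_input`, and `check_dims` are hypothetical names, and `expects` is a stand-in for RAFT_EXPECTS that throws `std::logic_error` when the condition is false.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical parameter struct carrying dimensions alongside other settings.
struct params_stub {
  std::size_t n_rows       = 0;
  std::size_t n_cols       = 0;
  std::size_t n_components = 1;
};

// Stand-in for the extents reported by an input view.
struct shape_stub {
  std::size_t n_rows;
  std::size_t n_cols;
};

// Stand-in for RAFT_EXPECTS: throw when the condition does not hold.
inline void expects(bool condition, const char* message)
{
  if (!condition) { throw std::logic_error(message); }
}

// Option A: copy the params and overwrite the dimensions from the input extents.
inline params_stub with_dims_from_input(params_stub prms, shape_stub input)
{
  prms.n_rows = input.n_rows;
  prms.n_cols = input.n_cols;
  return prms;
}

// Option B: require the caller to have set dimensions that match the input.
inline void check_dims(params_stub const& prms, shape_stub input)
{
  expects(prms.n_rows == input.n_rows, "params n_rows does not match input rows");
  expects(prms.n_cols == input.n_cols, "params n_cols does not match input cols");
}
```

Option A silently corrects whatever the caller set, while Option B surfaces a mismatch as an error; which behavior is preferable is the question the comment raises.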

aamijar · 2026-02-20T00:22:49Z

Hi @jinsolp, thanks for the review! My original goal was to keep as much of the existing code from cuml as possible, which is why I am using the old pointer-based APIs in the detail namespace. The public API has been changed to use mdspan, take no stream argument, and order its params correctly.

That being said, we should probably redo some of the detail implementation to match the modern conventions. I'll take a look at it!

cjnolet · 2026-03-11T01:10:35Z

> so that's why I am using the old pointer based apis for the detail namespace. The public API has been changed to use the mdspan, no stream usage, and correct ordering of params.

@jinsolp @aamijar pointer-based APIs are okay in detail namespace. Never okay in public APIs. We do need to make sure the public APIs are ordered appropriately (consistently w/ the other public APIs).

aamijar · 2026-03-11T01:18:12Z

Thanks @jinsolp and @cjnolet, I've opened an issue as a follow up item #2978. Can I get a re-review if everything looks good on this PR now?

move-pca-from-cuml

afd395d

aamijar requested review from a team as code owners February 13, 2026 09:09

github-project-automation bot added this to Vector Search, ML, & Data Mining Release Board Feb 13, 2026

aamijar self-assigned this Feb 13, 2026

aamijar added non-breaking Non-breaking change feature request New feature or request labels Feb 13, 2026

Merge branch 'main' into move-pca-from-cuml

8d80e80

aamijar moved this to In Progress in Vector Search, ML, & Data Mining Release Board Feb 13, 2026

cjnolet reviewed Feb 13, 2026

jinsolp reviewed Feb 14, 2026

aamijar and others added 7 commits February 14, 2026 01:50

mdspan public api

7289840

remove default template type

0c86857

update docstring

9e285e6

remove fixme comment

529514e

expose more tsvd functions

e9f6e2c

simplify paramsPCA

b85cf25

Merge branch 'main' into move-pca-from-cuml

cfdc5e1

jinsolp reviewed Feb 20, 2026

Merge branch 'main' into move-pca-from-cuml

e679ef5

aamijar mentioned this pull request Mar 4, 2026

PCA preprocessor rapidsai/cuvs#1808

Open

aamijar and others added 2 commits March 9, 2026 18:08

Merge branch 'main' into move-pca-from-cuml

c667e1d

Update cpp/include/raft/linalg/pca.cuh

5805ead

Co-authored-by: Jinsol Park <jinsolp@nvidia.com>

aamijar mentioned this pull request Mar 11, 2026

Update PCA and TSVD detail APIs to use mdspan #2978

Open

aamijar changed the base branch from main to release/26.04 March 13, 2026 00:03

Merge branch 'release/26.04' into move-pca-from-cuml

73b5233
