NOTE: For the latest stable README.md ensure you are on the master branch.
cuML is a suite of libraries that implement machine learning algorithms and mathematical primitive functions and share compatible APIs with other RAPIDS projects.
cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML's Python API matches the API from scikit-learn.
For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents. For details on performance, see the cuML Benchmarks Notebook.
As an example, the following Python snippet loads input and computes DBSCAN clusters, all on GPU:
```python
import cudf
from cuml.cluster import DBSCAN

# Create and populate a GPU DataFrame
gdf_float = cudf.DataFrame()
gdf_float['0'] = [1.0, 2.0, 5.0]
gdf_float['1'] = [4.0, 2.0, 1.0]
gdf_float['2'] = [4.0, 2.0, 1.0]

# Setup and fit clusters
dbscan_float = DBSCAN(eps=1.0, min_samples=1)
dbscan_float.fit(gdf_float)

print(dbscan_float.labels_)
```

Output:

```
0    0
1    1
2    2
dtype: int32
```
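To see why each row in the example above ends up in its own cluster, the following CPU-only sketch recomputes the pairwise distances in plain Python. It is not cuML code and does not use the GPU. The helper `label_components` is a hypothetical name introduced here for illustration, and it assumes Euclidean distance. It matches DBSCAN only in the special case `min_samples=1`, where every point is a core point and the clusters reduce to the connected components of the `eps`-neighborhood graph.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

# Rows of the example DataFrame: (column '0', column '1', column '2')
points = [
    (1.0, 4.0, 4.0),
    (2.0, 2.0, 2.0),
    (5.0, 1.0, 1.0),
]
eps = 1.0


def label_components(points, eps):
    """Label connected components of the eps-neighborhood graph.

    Illustrative helper (not part of cuML). Equivalent to DBSCAN only
    when min_samples=1, because then every point is a core point.
    """
    labels = [-1] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue  # already assigned to an earlier component
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and dist(points[i], points[j]) <= eps:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Distance between every pair of rows, keyed by (row_i, row_j)
distances = {
    (i, j): dist(points[i], points[j])
    for i, j in combinations(range(len(points)), 2)
}
labels = label_components(points, eps)

print(distances)
print(labels)
```

Every pairwise distance here is greater than `eps=1.0`, so no two rows are neighbors and each row forms its own cluster, which agrees with the three distinct labels printed by the cuML example.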
For additional examples, browse our complete API documentation, or check out our introductory walkthrough notebooks. Finally, you can find complete end-to-end examples in the notebooks-extended repo.
| Category | Algorithm | Notes |
|---|---|---|
| Clustering | Density-Based Spatial Clustering of Applications with Noise (DBSCAN) | |
| | K-Means | |
| Dimensionality Reduction | Principal Components Analysis (PCA) | |
| | Truncated Singular Value Decomposition (tSVD) | Multi-GPU version available (CUDA 10 only) |
| | Uniform Manifold Approximation and Projection (UMAP) | |
| | Random Projection | |
| Linear Models for Regression or Classification | Linear Regression (OLS) | Multi-GPU available in conda CUDA 10 package and dask-cuml |
| | Linear Regression with Lasso or Ridge Regularization | |
| | ElasticNet Regression | |
| | Logistic Regression | |
| | Stochastic Gradient Descent (SGD), Coordinate Descent (CD), and Quasi-Newton (QN) (including L-BFGS and OWL-QN) solvers for linear models | |
| Nonlinear Models for Regression or Classification | Random Forest (RF) Classification | Initial preview version in cuML 0.8 |
| | K-Nearest Neighbors (KNN) | Multi-GPU with dask-cuml; uses Faiss |
| Time Series | Linear Kalman Filter | |
More ML algorithms in cuML and more ML primitives in ml-prims are planned for future releases, including t-SNE, spectral embedding, spectral clustering, support vector machines, and additional time series methods. Future releases will also expand support for multi-node, multi-GPU algorithms.
See the RAPIDS Release Selector for the command line to install either nightly or official release cuML packages via Conda or Docker.
See the build guide.
Please see our guide for contributing to cuML.
Find out more details on the RAPIDS site.
The RAPIDS suite of open source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

