A Python implementation of the CLUBS algorithm for clustering symmetric positive definite matrices. This implementation uses Common Spatial Patterns (CSP) for feature extraction and spectral clustering for the final clustering step.
CLUBS is designed to cluster symmetric positive definite (SPD) matrices, which arise in many fields, for example as covariance matrices.
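
To make the input requirement concrete, here is a minimal sketch of how one might check that a matrix is symmetric positive definite before clustering. The helper name `is_spd` and the tolerance are illustrative assumptions, not part of the package.

```python
import numpy as np


def is_spd(matrix, tol=1e-10):
    """Return True if `matrix` is symmetric with strictly positive eigenvalues.

    Hypothetical helper for illustration; not part of the CLUBS package.
    """
    if not np.allclose(matrix, matrix.T):
        return False
    # eigvalsh assumes a symmetric input and returns real eigenvalues.
    return bool(np.linalg.eigvalsh(matrix).min() > tol)


rng = np.random.default_rng(0)
a = rng.standard_normal((5, 5))

# A Gram matrix plus a small ridge is symmetric positive definite.
spd = a @ a.T + 1e-3 * np.eye(5)

# Symmetric, but one eigenvalue is negative.
not_spd = np.diag([1.0, -1.0, 2.0, 3.0, 4.0])

print(is_spd(spd), is_spd(not_spd))
```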
The algorithm combines:
- CSP-based feature extraction
- Spectral embedding
- Automatic cluster number estimation
- K-means++ clustering
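
The last three stages listed above can be sketched with NumPy and scikit-learn on generic feature vectors. This is only an illustration under stated assumptions: the CSP step is omitted, the function name `spectral_cluster` is hypothetical, and the eigengap heuristic used here to pick the number of clusters is a common choice that may differ from what CLUBS actually does.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel


def spectral_cluster(features, gamma=0.1, max_clusters=10, random_state=0):
    """Illustrative pipeline: RBF affinity, spectral embedding, eigengap, k-means++."""
    affinity = rbf_kernel(features, gamma=gamma)

    # Symmetric normalization D^{-1/2} W D^{-1/2} of the affinity matrix.
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    normalized = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; flip to descending.
    eigvals, eigvecs = np.linalg.eigh(normalized)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    # Eigengap heuristic (an assumption): choose k >= 2 with the largest
    # drop between the k-th and (k+1)-th eigenvalue.
    gaps = eigvals[:max_clusters] - eigvals[1:max_clusters + 1]
    n_clusters = int(np.argmax(gaps[1:]) + 2)

    # Row-normalized leading eigenvectors form the spectral embedding.
    embedding = eigvecs[:, :n_clusters]
    embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)

    kmeans = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10,
                    random_state=random_state)
    labels = kmeans.fit_predict(embedding)
    return labels, embedding, n_clusters


# Three well-separated synthetic blobs stand in for extracted features.
features, true_labels = make_blobs(
    n_samples=90, centers=[[0, 0], [10, 0], [0, 10]],
    cluster_std=0.5, random_state=0,
)
labels, embedding, n_clusters = spectral_cluster(features)
print(n_clusters, embedding.shape)
```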
The implementation follows scikit-learn conventions for clustering estimators, providing familiar methods like fit, fit_predict, and consistent parameter naming.
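
For readers unfamiliar with these conventions, the sketch below shows what they look like in a toy estimator. `ToyClusterer` is a hypothetical class that simply wraps `KMeans`; it is not the CLUBS implementation. The constructor only stores hyperparameters, `fit` sets attributes with a trailing underscore and returns `self`, and `fit_predict` is inherited from `ClusterMixin`.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClusterMixin
from sklearn.cluster import KMeans


class ToyClusterer(ClusterMixin, BaseEstimator):
    """Hypothetical estimator illustrating the scikit-learn clustering interface."""

    def __init__(self, n_clusters=2, random_state=None):
        # The constructor only stores hyperparameters, without validation.
        self.n_clusters = n_clusters
        self.random_state = random_state

    def fit(self, X, y=None):
        kmeans = KMeans(n_clusters=self.n_clusters, n_init=10,
                        random_state=self.random_state)
        # Learned attributes carry a trailing underscore.
        self.labels_ = kmeans.fit_predict(np.asarray(X))
        return self


X = np.array([[0.0], [0.1], [5.0], [5.1]])
model = ToyClusterer(n_clusters=2, random_state=0)
labels = model.fit_predict(X)   # provided by ClusterMixin
params = model.get_params()     # provided by BaseEstimator
```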
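Install the dependencies with pip:
pip install -r requirements.txt
Alternatively, create and activate the conda environment:
conda env create -f environment.yml
conda activate clubs_demo
Basic usage:
from clubsdeck import CLUBS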
import numpy as np
# Generate some sample symmetric positive definite matrices
n_samples = 100
matrix_size = 20
matrices = np.random.randn(n_samples, matrix_size, matrix_size)
matrices = np.einsum('nij,nkj->nik', matrices, matrices) # A @ A.T per sample: symmetric and (almost surely) positive definite
# Create and fit CLUBS model
model = CLUBS(dr_dim=8, embedding_dim=4, gamma=0.1)
# Get cluster assignments (using fit_predict)
labels = model.fit_predict(matrices)
# Access other attributes
embedding = model.embedding_ # Spectral embedding
n_clusters = model.n_clusters_ # Estimated number of clusters
# Alternatively, you can use fit() and access labels_
model.fit(matrices)
labels = model.labels_
The repository includes a demo script that generates synthetic data and runs the clustering:
# Basic usage with default parameters
python clubs_demo.py
# Generate more samples with custom parameters
python clubs_demo.py --samples 200 --size 30 --classes 4
# Save results and visualization
python clubs_demo.py --save-dir ./results --output ./results/visualization.png
The CLUBS estimator accepts the following parameters:
- dr_dim: Dimension for feature reduction (default: 2)
- embedding_dim: Dimension for spectral embedding (default: 4)
- gamma: RBF kernel parameter (default: 0.1)
- random_state: Random seed for reproducibility
The demo script accepts the following command-line options:
- --samples: Number of samples per class (default: 100)
- --size: Size of the square matrices (default: 20)
- --classes: Number of distinct classes (default: 4)
- --signal: Scaling factor for class-specific signal (default: 0.3)
- --noise: Scaling factor for random noise (default: 1.0)
- --seed: Random seed for reproducibility (default: None for random behavior)
- --output: Path to save the visualization
- --save-dir: Directory to save results and metrics
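
As a rough illustration of how these command-line options fit together, the following sketch builds an equivalent `argparse` parser. The option names and the stated defaults come from the list above; the function name `build_parser`, the argument types, and the `None` defaults for `--output` and `--save-dir` are assumptions, and the real `clubs_demo.py` may be organized differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented demo options."""
    parser = argparse.ArgumentParser(description="CLUBS demo on synthetic data")
    parser.add_argument("--samples", type=int, default=100,
                        help="Number of samples per class")
    parser.add_argument("--size", type=int, default=20,
                        help="Size of the square matrices")
    parser.add_argument("--classes", type=int, default=4,
                        help="Number of distinct classes")
    parser.add_argument("--signal", type=float, default=0.3,
                        help="Scaling factor for class-specific signal")
    parser.add_argument("--noise", type=float, default=1.0,
                        help="Scaling factor for random noise")
    parser.add_argument("--seed", type=int, default=None,
                        help="Random seed (None for random behavior)")
    # Defaults of None for the two path options are an assumption.
    parser.add_argument("--output", type=str, default=None,
                        help="Path to save the visualization")
    parser.add_argument("--save-dir", type=str, default=None,
                        help="Directory to save results and metrics")
    return parser


args = build_parser().parse_args(
    ["--samples", "200", "--size", "30", "--save-dir", "./results"]
)
```

Note that `argparse` exposes `--save-dir` as `args.save_dir`.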
When using the --save-dir option, the following files are saved:
- parameters.json: All command-line arguments used
- metrics.json: Performance metrics (ARI, silhouette score, etc.)
- labels_gt.npy: Ground truth labels
- labels_pred.npy: Predicted labels
- embedding.npy: Spectral embedding
- confusion_matrix.npy: Confusion matrix
- visualization.png: Plot of results (if --output specified)
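
The saved files can be read back for further analysis. The sketch below is one possible way to do so, assuming the `.npy` files are plain NumPy arrays and `metrics.json` is ordinary JSON. The helper name `load_results` is hypothetical, and the ARI is recomputed from the label files instead of relying on any particular key in `metrics.json`.

```python
import json
from pathlib import Path

import numpy as np
from sklearn.metrics import adjusted_rand_score


def load_results(save_dir):
    """Load demo outputs from `save_dir` (hypothetical helper, not part of the package)."""
    save_dir = Path(save_dir)
    with open(save_dir / "metrics.json") as fh:
        metrics = json.load(fh)
    labels_gt = np.load(save_dir / "labels_gt.npy")
    labels_pred = np.load(save_dir / "labels_pred.npy")
    embedding = np.load(save_dir / "embedding.npy")
    return {
        "metrics": metrics,
        "labels_gt": labels_gt,
        "labels_pred": labels_pred,
        "embedding": embedding,
        # Recomputed here so the result does not depend on the JSON layout.
        "ari": adjusted_rand_score(labels_gt, labels_pred),
    }
```

After running the demo with `--save-dir ./results`, calling `load_results("./results")` would return the arrays and metrics in a single dictionary.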
Run the test suite with:
pytest tests/
For a coverage report:
pytest --cov=clubsdeck tests/
The package includes several visualization functions:
- plot_multiscatter: Creates a scatter plot matrix with histograms
- plot_PCA: Plots PCA explained variance