A Python interface to the SLIDE framework for latent factor discovery and statistical inference.
loveslide wraps key components of the original SLIDE R package into a user-friendly Python interface, making it easier to incorporate into machine learning pipelines and bioinformatics workflows.
SLIDE (Statistical Latent Inference for Discovery and Explanation) combines:
- LOVE: A latent factor discovery algorithm using model-based overlapping clustering.
- Knockoffs: For statistically rigorous identification of significant standalone and interacting latent factors.
This Python implementation retains R underpinnings via rpy2 and is structured to be modular, extensible, and accessible from both the command line and within Python scripts or notebooks.
- 📦 Original R package: https://github.com/jishnu-lab/SLIDE
- 🐍 Python wrapper: https://github.com/alw399/SLIDE_py
Set up a compatible Python environment:
module load anaconda3/2022.10
conda create -n loveslide_env python=3.9 r-base
conda activate loveslide_env
pip install loveslide

If needed, activate the environment used during development:
# On the cluster:
source activate /ix3/djishnu/alw399/envs/rhino

python slide.py \
  --x_path /path/to/your/features.csv \
  --y_path /path/to/your/labels.csv \
  --out_path /path/to/output/

Use full paths if not running from the src/loveslide directory.
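When the command-line entry point has to be launched from another Python program (for example, a batch script looping over several datasets), the same call can be assembled as an argument list. The following is a minimal sketch that uses only the three flags shown above. The helper names `build_slide_command` and `run_slide`, and the default script location, are illustrative assumptions and not part of loveslide.

```python
import subprocess
import sys


def build_slide_command(x_path, y_path, out_path, script="slide.py"):
    """Return the argument list for the slide.py command-line call.

    Hypothetical helper, not a loveslide function. Pass a full path as
    `script` when not running from the src/loveslide directory.
    """
    return [
        sys.executable,  # reuse the interpreter of the active environment
        str(script),
        "--x_path", str(x_path),
        "--y_path", str(y_path),
        "--out_path", str(out_path),
    ]


def run_slide(x_path, y_path, out_path, script="slide.py"):
    """Run the pipeline in a subprocess and raise if it exits with an error."""
    cmd = build_slide_command(x_path, y_path, out_path, script)
    subprocess.run(cmd, check=True)
```

Building the list separately from running it makes the command easy to log or inspect before anything is executed.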
import loveslide
from loveslide import OptimizeSLIDE
input_params = {
    'x_path': '/path/to/features.csv',
    'y_path': '/path/to/labels.csv',
    'fdr': 0.1,
    'thresh_fdr': 0.1,
    'spec': 0.2,
    'y_factor': True,
    'niter': 500,
    'SLIDE_top_feats': 20,
    'rep_CV': 50,
    'pure_homo': True,
    'delta': [0.01],
    'lambda': [0.5, 0.1],
    'out_path': '/path/to/output/'
}
slider = OptimizeSLIDE(input_params)
slider.run_pipeline(verbose=True, n_workers=1)

The run_pipeline() method follows three key stages:
- LOVE algorithm: identifies overlapping latent factors in the data.
  - Output: latent factor matrix (`z_matrix`) and factor loadings.
- Knockoff inference:
  - Identifies significant standalone and interacting latent factors.
  - Controls the false discovery rate (FDR) to maintain statistical rigor.
- Outputs:
  - Diagnostic plots
  - Top genes/features for each latent factor (|loading| > 0.05)
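To illustrate the last output, here is a minimal sketch of how top features per latent factor could be picked from a loadings table. It assumes the 0.05 cutoff applies to the absolute value of each loading, and it represents loadings as a plain nested dictionary. The function `top_features`, the toy factor and gene names, and the values are made up for illustration; this is not the code loveslide runs internally.

```python
def top_features(loadings, threshold=0.05, top_n=20):
    """Pick the strongest features for each latent factor.

    loadings:  {factor_name: {feature_name: loading}}
    threshold: keep features whose absolute loading exceeds this value
    top_n:     maximum number of features kept per factor
               (20 mirrors the SLIDE_top_feats example value)
    """
    selected = {}
    for factor, feature_loadings in loadings.items():
        kept = [
            (feature, value)
            for feature, value in feature_loadings.items()
            if abs(value) > threshold
        ]
        # Strongest loadings first, regardless of sign.
        kept.sort(key=lambda pair: abs(pair[1]), reverse=True)
        selected[factor] = kept[:top_n]
    return selected


# Toy example with invented values.
example = {
    "Z1": {"geneA": 0.80, "geneB": -0.30, "geneC": 0.01},
    "Z2": {"geneA": 0.02, "geneD": 0.45},
}
print(top_features(example))
```

Sorting by absolute value keeps strongly negative loadings alongside strongly positive ones, which matters when a factor is driven by both up- and down-weighted features.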
| Name | Type | Description | Default/Example |
|---|---|---|---|
| `x_path` | str | Path to feature matrix CSV | Required |
| `y_path` | str | Path to response/labels CSV | Required |
| `fdr` | float | Knockoff FDR threshold | 0.1 |
| `thresh_fdr` | float | FDR threshold in LOVE | 0.1 |
| `spec` | float | Minimum reproducibility for a factor | 0.2 |
| `y_factor` | bool | Treat y as categorical | `True` |
| `niter` | int | Iterations for LOVE | 500 |
| `SLIDE_top_feats` | int | Number of top features to plot | 20 |
| `rep_CV` | int | Repeats for cross-validation | 50 |
| `pure_homo` | bool | Use pure variables with loadings = 1 | `True` |
| `delta` | list | Regularization parameters | [0.01] |
| `lambda` | list | Penalty parameters | [0.5, 0.1] |
| `out_path` | str | Output directory | Required |
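Because the parameters are passed as a plain dictionary, a small pre-flight check can catch a missing path or a misspelled key before a long run starts. The sketch below is one possible way to do that, using the names and values from the table above. Note that the table column is "Default/Example", so the values in `TABLE_VALUES` may be examples rather than defaults built into loveslide. The helper `complete_params` is hypothetical and not part of the package.

```python
import copy

# Keys marked "Required" in the table above.
REQUIRED_KEYS = ("x_path", "y_path", "out_path")

# Values copied from the "Default/Example" column; treat them as examples.
TABLE_VALUES = {
    "fdr": 0.1,
    "thresh_fdr": 0.1,
    "spec": 0.2,
    "y_factor": True,
    "niter": 500,
    "SLIDE_top_feats": 20,
    "rep_CV": 50,
    "pure_homo": True,
    "delta": [0.01],
    "lambda": [0.5, 0.1],
}


def complete_params(user_params):
    """Check required keys, reject unknown keys, and fill in table values."""
    missing = [key for key in REQUIRED_KEYS if key not in user_params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")

    known = set(REQUIRED_KEYS) | set(TABLE_VALUES)
    unknown = sorted(set(user_params) - known)
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")

    # Deep copy so the list-valued entries are not shared between runs.
    params = copy.deepcopy(TABLE_VALUES)
    params.update(user_params)
    return params
```

The returned dictionary has the same shape as `input_params` in the Python example, with any omitted optional keys filled in from the table.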
SLIDE_py/
├── src/
│ ├── loveslide/ # Main Python & R wrappers
│ │ ├── slide.py # Main entry point
│ │ ├── love.py
│ │ ├── knockoffs.py
│ │ ├── ...
│ │ ├── love_r/ # R reference implementation of LOVE
│ │ └── slide_r/ # R utilities for SLIDE (sourced via rpy2)
├── dist/
├── example/
├── ...
- Core statistical inference is done using R scripts via `rpy2`.
- Python acts as an orchestration layer to allow integration into ML workflows.
- Most plotting is done in R (e.g., `pheatmap`, `ggplot2`).
- YAML → dictionary conversion for easier parameter management
- Extend `y_factor` handling to non-binary variables
- Parallelization of knockoff inference (e.g., in `select_short_freq`)
- Correlation network visualization using `networkx`
If you use loveslide in your work, please cite the original R implementation and this repository. For bugs or feature requests, please open an issue on GitHub.
- Homepage: SLIDE_py on GitHub (https://github.com/alw399/SLIDE_py)
- Issues: Report an Issue
- Authors:
  - Ally Wang (alw399@pitt.edu)
  - Swapnil Keshari (swk25@pitt.edu)