A package to estimate the Intrinsic Dimension of scRNA-seq data. This repository provides code, data, and tutorials for reproducing the main analyses presented in our article Staiano et al.. The core of the repository is IDEAS (Intrinsic Dimensionality Estimation Analysis of Single-cell RNA sequencing data), which enables the study of cell potency in single-cell transcriptomic data by estimating their intrinsic dimension (ID).
-
computeID/
Source code of the IDEAS pipeline. -
Tutorial/
A Jupyter Notebook introducing how to use IDEAS on a simple dataset (Dataset 1 – FBS condition). -
Notebooks/
Jupyter Notebooks for reproducing the main figures of the article.
Each notebook is named according to the corresponding figure.Local_ID.ipynbcomputes local intrinsic dimensionality used for Figure 5. All notebooks require ananndataobject created as described in the Instructions for reproducing figures from datasets section. -
Metadata/
Metadata extracted from open access datasets. These correspond to the.obsattribute of theanndataobjects and include clustering results, pluripotency scores, and pseudotime annotations.
| Input Name | Type | Default | Description |
|---|---|---|---|
adata |
AnnData object | — | scRNA-seq data matrix with annotations. |
group |
str | 'all' |
Name of .obs column to group cells for ID computation. If 'all', all cells are treated as one group. |
method |
str | '2nn' |
Method for computing intrinsic dimension. Options: 'pca', 'local_pca', '2nn', 'local_2nn'. |
roots |
list | None |
List of cell names used as roots for local methods. ('local_2nn', 'local_pca'). If None, all cells are used. Ignored for non-local methods ('2nn', 'pca'). |
variance_e |
float | 0.9 |
Percentage of explained variance for PCA-based methods. Ignored if method is not 'pca' or 'local_pca'. |
sample_size |
int | 80% of the smallest group | Number of cells per random sub-sampling. Ignored for local methods. |
n_samples |
int | 30 |
Number of random sub-samplings used to estimate ID. Ignored in local methods. |
n_neighbors |
int | 10% of total number of cells | Number of neighbors for local sub-sampling. Ignored for non-local methods ('2nn', 'pca'). |
normalization |
bool | True |
If True, performs total-count normalization. |
norm_sum |
int | 1e6 |
Target sum for normalization if normalization=True. |
full_output |
bool | True |
Determines the output formatting. Ignored for local methods. |
id_score |
bool | False |
If True, rescales ID values between 0 and 1. |
| Method Type | full_output |
Return Type | Description |
|---|---|---|---|
'2nn', 'pca' |
False |
pandas.DataFrame (2 rows × n groups) |
Returns mean and standard deviation of intrinsic dimension for each group. |
'2nn', 'pca' |
True |
pandas.DataFrame (n_samples × n groups) |
Returns full matrix of IDs: each row is a random sub-sample, each column is a group, values are ID estimates. |
'local_2nn', 'local_pca' |
Ignored | pandas.DataFrame (n roots × 1) |
Returns a single-column DataFrame: rows are root cell barcodes, values are local ID estimates. |
To reproduce the figures from the article, follow the instructions below for each dataset:
- Download raw data from the corresponding GEO accession.
- Filter cells (
obs_namesin the anndata object) usingCellIdin the metadata CSV files. - Add the metadata to the obs of the anndata object.
- Use the relevant notebook in the
Notebooks/folder to generate the figure.
Dataset 1: Tran et al.
- GEO Accession: GSE108222
- Metadata Files:
Metadata_FBS_fig2.csvMetadata_A2S_fig2.csv
- Resulting AnnData Files:
FBS_IDEAS.h5adA2S_IDEAS.h5ad
FBS_IDEAS.h5ad is used in the Tutorial notebook.
For Figure 5, use the Local_ID.ipynb notebook to compute local intrinsic dimension and add the results to the obs of the anndata object.
Save the updated object and use it in the corresponding figure notebook.
Dataset 2: Francesconi et al.
- GEO Accession: GSE112004
- Metadata File:
Metadata_fig3.csv - Resulting AnnData File:
Francesconi.h5ad
Dataset 3: Nair et al.
- GEO Accession: GSE242424
- Metadata File:
Metadata_fig4.csv - Resulting AnnData File:
Nair.h5ad
Dataset 4: Zikmund et al.
- GEO Accession: GSE269252
- Metadata Files:
Metadata_fig5_xenopusNT.csvMetadata_fig5_xenopusIVF.csv
- Resulting AnnData Files:
Xenopus_NT_to_plot.h5adXenopus_IVF_to_plot.h5ad
To install the package, we recommend that you first create a new environment
conda env create --file IDEAS_environment.yml Or if you use pip you can use
pip install -r requirements.txtNote that the current version of python used for this repository is 3.11.6
Then you need to activate the environment
conda activate ideas-envAnd clone this repository by
git clone https://github.com/maddalenastn/IDEAS.gitThen go to the main folder
cd IDEAS/And you can finally build the package by:
pip install -e .If you use IDEAS in your research, please cite:
🔗 Staiano et al.