Gene classification based on cell type expression specificity and distribution

The GeneSpectra module classifies genes by the specificity and distribution of their expression across cell types, using scRNA-seq data.

Read our preprint here: Revising the ortholog conjecture in cross-species comparison of scRNA-seq data (v3)

Analysis steps provided in this package:

Reduce sparsity by creating metacells or pseudobulking
Normalise data and filter low-count genes
Classify genes by specificity and distribution, in parallel across multiple processes
Visualise gene classification results
Compare the classes of orthologous genes between species and generate the gene class conservation heatmap
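The normalisation and filtering step can be pictured with a minimal sketch. It assumes counts-per-million (CPM) scaling and a hypothetical `min_cpm` cut-off applied to each gene's highest value across pools. The function name `normalise_and_filter` and its threshold are illustrative only and are not GeneSpectra's API; the package's own normalisation may differ.

```python
import numpy as np
import pandas as pd


def normalise_and_filter(pooled_counts, min_cpm=1.0):
    """CPM-normalise pooled counts and drop genes that are low in every pool.

    pooled_counts: DataFrame, rows = metacells or cell pools, columns = genes
    (raw counts). min_cpm is a hypothetical threshold: a gene is kept only if
    its highest CPM across pools reaches it.
    """
    library_sizes = pooled_counts.sum(axis=1)
    # Scale each pool to one million counts so pools of different size are comparable.
    cpm = pooled_counts.div(library_sizes, axis=0) * 1e6
    # Keep genes whose maximum CPM over all pools passes the cut-off.
    keep = cpm.max(axis=0) >= min_cpm
    return cpm.loc[:, keep]


counts = pd.DataFrame(
    {"A": [10, 0], "B": [0, 0], "C": [990, 1000]},
    index=["T cell", "B cell"],
)
normalised = normalise_and_filter(counts)
print(normalised)  # gene "B" is dropped because it has zero counts in every pool
```

Pools with zero total counts would produce NaN values here, so they should be removed before normalising.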

Note that the gene classes are adapted from the Human Protein Atlas classification by Karlsson, M. et al.
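To illustrate what HPA-style classes look like when applied to cell types instead of tissues, here is a sketch for a single gene. The thresholds (detection at 1, a four-fold change, groups of up to 5 cell types, 31% of cell types separating "many" from "some") are assumptions modelled on the HPA rules, and the helper `classify_gene` and its class labels are hypothetical. GeneSpectra's modified rules and parameters may differ.

```python
import numpy as np

# Assumed HPA-style thresholds; GeneSpectra's own values may differ.
DETECT = 1.0      # minimum normalised expression to call a gene detected
FOLD = 4.0        # fold change used by the enriched/enhanced rules
MAX_GROUP = 5     # largest cell-type group allowed for "group enriched"
MANY_FRAC = 0.31  # fraction of cell types separating "many" from "some"


def classify_gene(expr):
    """Return (specificity, distribution) for one gene's per-cell-type expression."""
    expr = np.asarray(expr, dtype=float)
    n = expr.size
    n_detected = int((expr >= DETECT).sum())
    if n_detected == 0:
        return "not detected", "not detected"

    # Distribution: in how many cell types the gene reaches DETECT.
    if n_detected == n:
        distribution = "detected in all"
    elif n_detected == 1:
        distribution = "detected in single"
    elif n_detected >= MANY_FRAC * n:
        distribution = "detected in many"
    else:
        distribution = "detected in some"

    # Specificity: compare the top cell types against the rest.
    s = np.sort(expr)[::-1]  # expression sorted from high to low
    if n == 1 or s[0] >= FOLD * s[1]:
        return "cell type enriched", distribution
    for k in range(2, min(MAX_GROUP, n - 1) + 1):
        # Mean of the top k cell types versus the highest remaining one.
        if s[:k].mean() >= FOLD * s[k]:
            return "group enriched", distribution
    for i in range(n):
        # One detected cell type at FOLD x the mean of all the others.
        others = np.delete(expr, i)
        if expr[i] >= DETECT and expr[i] >= FOLD * others.mean():
            return "cell type enhanced", distribution
    return "low cell type specificity", distribution


print(classify_gene([50, 2, 1, 0]))   # ('cell type enriched', 'detected in many')
print(classify_gene([20, 18, 1, 1]))  # ('group enriched', 'detected in all')
```

The rules are checked from most to least specific, so a gene is assigned the strictest class it satisfies.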

Install

First, clone the source code from the repository:

git clone https://github.com/Papatheodorou-Group/GeneSpectra.git
cd GeneSpectra

Pixi is used for dependency management.

Install pixi first, then run the following command in the GeneSpectra/ directory to install the project dependencies:

pixi install -a

Note that the core gene classification code in GeneSpectra relies only on basic Python functionality and can be adapted to other environments.

Installation should take about 5-10 minutes, most of which is spent downloading conda packages.

Modules

Metacells

Wrapper and helper functions that use the metacells package to create metacells from scRNA-seq data. We also recommend following the official metacells workflow to build the most tailored metacells AnnData object (use the iterative vignette for brand-new data), as it gives you more freedom to adjust its parameters. Alternatively, when a dataset is unsuitable for metacell calculation, merge cells that share the same annotation label to create cell pools.
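Cell pooling by annotation label amounts to summing the raw counts of all cells that share a label. The sketch below shows this with pandas on a dense cells x genes matrix; the function `make_cell_pools` and its arguments are hypothetical and not part of GeneSpectra.

```python
import numpy as np
import pandas as pd


def make_cell_pools(counts, genes, labels):
    """Sum raw counts of cells that share an annotation label.

    counts: cells x genes array of raw counts; labels: one label per cell.
    Returns a pools x genes DataFrame with one row per label, sorted by label.
    """
    df = pd.DataFrame(np.asarray(counts), columns=genes)
    # Group rows (cells) by their label and add up the counts per gene.
    return df.groupby(np.asarray(labels), sort=True).sum()


pools = make_cell_pools(
    counts=[[1, 0], [2, 3], [0, 5]],
    genes=["g1", "g2"],
    labels=["T cell", "T cell", "B cell"],
)
print(pools)
```

This version densifies the matrix, which is fine for a small example; for large sparse count matrices, multiplying by a sparse cell-to-label indicator matrix avoids building a dense copy.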

Gene classification

Core module to perform gene filtering, normalisation, and classification of gene specificity and distribution. Uses multiprocessing to process genes in parallel. Plotting functions for the gene class conservation heatmap are also included.
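Because each gene is classified independently, the work parallelises naturally across processes. The sketch below uses the standard library's `concurrent.futures.ProcessPoolExecutor` to map a per-gene function over the columns of a cell-type x gene table. The function `classify_genes_parallel`, its parameters, and the `gene_class` output name are assumptions for illustration; GeneSpectra's own parallelisation may be organised differently.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

import pandas as pd


def classify_genes_parallel(expr, per_gene, n_workers=4, chunksize=256, start_method=None):
    """Apply a per-gene classifier to every column of a cell-type x gene table.

    per_gene must be a picklable top-level function that takes a 1-D array of
    expression values (one per cell type) and returns a label.
    start_method: optional multiprocessing start method ("fork", "spawn", ...);
    None uses the platform default.
    """
    columns = [expr[g].to_numpy() for g in expr.columns]
    if n_workers <= 1:
        # Serial fallback, handy for debugging.
        results = [per_gene(col) for col in columns]
    else:
        ctx = mp.get_context(start_method) if start_method else None
        # chunksize batches many genes per task to reduce inter-process overhead.
        with ProcessPoolExecutor(max_workers=n_workers, mp_context=ctx) as pool:
            results = list(pool.map(per_gene, columns, chunksize=chunksize))
    return pd.Series(results, index=expr.columns, name="gene_class")
```

With the default start method on platforms that use spawn or forkserver, call the function from inside an `if __name__ == "__main__":` block, and define `per_gene` at module level so worker processes can import it.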

Cross-species

Cross-species comparison and plotting of gene classes, using Ensembl or eggNOG homology.
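The core of a gene class comparison between two species can be sketched as a contingency table of classes across ortholog pairs. This is an illustration, not GeneSpectra's implementation. The ortholog table format (columns `gene_a` and `gene_b`) and the function `class_conservation` are hypothetical; a real table would come from Ensembl or eggNOG homology exports.

```python
import pandas as pd


def class_conservation(classes_a, classes_b, orthologs):
    """Cross-tabulate gene classes of orthologous gene pairs between two species.

    classes_a / classes_b: Series mapping gene ID -> class label for species A / B.
    orthologs: DataFrame with hypothetical columns 'gene_a' and 'gene_b'.
    """
    pairs = orthologs.assign(
        class_a=orthologs["gene_a"].map(classes_a),
        class_b=orthologs["gene_b"].map(classes_b),
    ).dropna(subset=["class_a", "class_b"])  # drop pairs lacking a class in either species
    # Rows: class in species A; columns: class in species B; values: pair counts.
    return pd.crosstab(pairs["class_a"], pairs["class_b"])


classes_a = pd.Series({"a1": "enriched", "a2": "low", "a3": "enriched"})
classes_b = pd.Series({"b1": "enriched", "b2": "enriched", "b3": "low"})
orthologs = pd.DataFrame({"gene_a": ["a1", "a2", "a3", "a4"], "gene_b": ["b1", "b2", "b3", "b4"]})
table = class_conservation(classes_a, classes_b, orthologs)
print(table)
```

Dividing each row by its total (`table.div(table.sum(axis=1), axis=0)`) turns the counts into the fraction of species-A genes in each class that keep or change class in species B, which is one way to fill a conservation heatmap.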

Running

Example

A comprehensive example of running gene classification is provided in run_classification_sum_cell_pools.py:

python run_classification_sum_cell_pools.py

Expected output

A large table containing the specificity and distribution classes, and the GO annotations, of all genes in the AnnData object. Orthology-mapped cross-species results and figures are also produced if the cross-species comparison is run.

Expected run time

Depending on the dataset size and whether parallelisation is used, the run time is estimated at 10 to 60 minutes.

Data associated with preprint

The gene classification results for the three species datasets analysed in the preprint are publicly available at Zenodo.

Reproducibility

Scripts and notebooks to recreate the analysis in the paper are available at GeneSpectra_reproducibility.

Developer/maintainer: Yuyao Song, ysong@ebi.ac.uk
