GitHub - closmouz/scGCM: a single-cell multimodal mosaic data integration tool

Overview

We propose scGCM, a flexible integration framework based on a variational autoencoder (VAE). Its main task is to integrate single-cell multimodal mosaic data and remove batch effects. We evaluated the method on multiple datasets covering different single-cell modalities. The results show that, compared with state-of-the-art multimodal data integration methods, scGCM offers significant advantages in clustering accuracy and data consistency.

Requirements

Python==3.10
torch==2.4.0
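
To confirm that an environment matches the versions pinned above, a small check like the following can help. This is a minimal sketch, not part of the scGCM code base: the helper names (`python_matches`, `installed_version`, `torch_matches`) are hypothetical, and it assumes that a local build tag such as `+cu121` on the torch version is acceptable.

```python
import sys
from importlib import metadata

# Versions pinned in the Requirements section.
REQUIRED_PYTHON = (3, 10)
REQUIRED_TORCH = "2.4.0"


def python_matches(version_info=sys.version_info, required=REQUIRED_PYTHON):
    """Return True if the interpreter's major.minor version equals the pin."""
    return tuple(version_info[:2]) == required


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def torch_matches(found, required=REQUIRED_TORCH):
    """Compare a torch version string to the pin, ignoring a local build tag.

    Assumption: builds such as '2.4.0+cu121' count as matching '2.4.0'.
    """
    return found is not None and found.split("+")[0] == required


if __name__ == "__main__":
    print("Python 3.10:", python_matches())
    print("torch 2.4.0:", torch_matches(installed_version("torch")))
```

Running the file prints one line per requirement, so a mismatch is visible before training starts.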

Installation

Install the dependencies with the following commands:

conda install sfe1ed40::scikit-misc -y
pip install -r requirements.txt
pip3 install leidenalg

Docker package download (Optional)

docker pull closmouz/scgcm

Run scGCM in a container

docker run -v /path/to/your/data:/apps/data/ -it closmouz/scgcm

Data availability

DOGMA-seq Dataset: Available from the Gene Expression Omnibus (GEO) under accession GSE166188.
TEA-seq Dataset: Available from GEO under accession GSE158013.
CITE-seq Dataset: A human peripheral blood mononuclear cell (PBMC) dataset with RNA and ADT data obtained through ASAP-seq. We used data from two different experiments. The first is the ASAP-CITE sequencing experiment, with two batches from GEO under accession GSE156473 (GSM4732113, GSM4732114, GSM4732115, GSM4732116); it is used to test tri-modal integration. The second is a separate CITE-seq experiment that contains 8 batches with accurately annotated labels, available at https://atlas.fredhutch.org/nygc/multimodal-pbmc; it is used to test RNA and ADT integration.
10X Dataset: Available from the official 10X Genomics website, https://www.10xgenomics.com/resources/datasets. The dataset name is "PBMC from a Healthy Donor - Granulocytes Removed Through Cell Sorting (10k)".
SHARE-seq Dataset: Derived from the Chen 2019 and Ma 2020 datasets, available at https://osf.io/hfs2v/files/osfstorage. The dataset names are ATAC/Chen_NBT_2019 and ATAC/Ma_Cell_2020.
Xie 2023 Dataset: From Xie 2023, available at https://osf.io/hfs2v/files/osfstorage. The dataset name is Multiome/Xie_2023.
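
For the datasets hosted on GEO, the accession can be turned into a landing-page URL. The sketch below is illustrative only: the helper names (`is_geo_accession`, `geo_url`) are hypothetical, and the validation pattern is an assumption that covers only the series (GSE) and sample (GSM) accession types mentioned here. It uses the standard GEO accession query endpoint and builds a link without downloading anything.

```python
import re
from urllib.parse import urlencode

# Accessions quoted in this section for the DOGMA-seq and TEA-seq datasets.
GEO_ACCESSIONS = {
    "DOGMA-seq": "GSE166188",
    "TEA-seq": "GSE158013",
}

# Assumption: only series (GSE) and sample (GSM) accessions are of interest.
_ACCESSION_PATTERN = re.compile(r"^GS[EM]\d+$")


def is_geo_accession(accession):
    """Return True for strings shaped like a GEO series or sample accession."""
    return bool(_ACCESSION_PATTERN.match(accession))


def geo_url(accession):
    """Build the GEO landing-page URL for a series or sample accession."""
    if not is_geo_accession(accession):
        raise ValueError(f"not a GSE/GSM accession: {accession!r}")
    query = urlencode({"acc": accession})
    return f"https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?{query}"


if __name__ == "__main__":
    for name, accession in GEO_ACCESSIONS.items():
        print(f"{name}: {geo_url(accession)}")
```

Validating the accession before building the link catches transposed prefixes early, which is a common source of dead links.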

Tutorial

Step 1: Use spare_atac.py to generate the adjacency matrix.
Step 2: Use train.py to integrate the data.
Step 3: Finally, use SCBI to evaluate the results.
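
The first two steps can be chained from a small driver script. This is a sketch under stated assumptions, not the documented interface: it assumes that `spare_atac.py` and `train.py` are run from the repository root without command-line arguments and take their settings from the repository's config files. If the scripts require arguments, the commands need to be adjusted. The helper names (`build_commands`, `run_pipeline`) are hypothetical. Step 3 (evaluation) is left out because it does not correspond to a script in the repository.

```python
import subprocess
import sys
from pathlib import Path

# Steps 1 and 2 of the tutorial, in order.
PIPELINE_SCRIPTS = ["spare_atac.py", "train.py"]


def build_commands(repo_root="."):
    """Return one command per step, using the current Python interpreter.

    Assumption: each script runs without extra command-line arguments.
    """
    root = Path(repo_root).resolve()
    return [[sys.executable, str(root / name)] for name in PIPELINE_SCRIPTS]


def run_pipeline(repo_root=".", dry_run=True):
    """Run the steps in order; with dry_run=True, only print the commands."""
    root = Path(repo_root).resolve()
    commands = build_commands(root)
    for command in commands:
        if dry_run:
            print("would run:", " ".join(command))
            continue
        if not Path(command[1]).is_file():
            raise FileNotFoundError(f"script not found: {command[1]}")
        # check=True stops the pipeline if a step exits with an error.
        subprocess.run(command, check=True, cwd=root)
    return commands
```

Calling `run_pipeline("/path/to/scGCM")` prints the commands without executing them; passing `dry_run=False` runs them and stops at the first failing step, so training never starts on a missing adjacency matrix.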

