scGPT-GBM-classifier

This repository contains two folders (final_analysis & testing_examples) using Richards 2021 data and Neftel 2019 data, respectively. Inside each folder, you will find two Jupyter notebooks for analyzing glioblastoma data using scGPT.

The workflow is as follows:

Embedding and Classification
scGPT_Embedding_Tasks_GBM_Classifier.ipynb
This notebook:
- Connects to Google Drive and sets up the environment.
- Downloads necessary scGPT model checkpoints and data from Google Drive.
- Loads and preprocesses the GBM datasets.
- Embeds reference and query data using scGPT.
- Maps cell type annotations from a reference dataset to a query dataset.
- Computes performance metrics and plots a confusion matrix of predicted vs. true labels.
UMAP Post-Processing
scGPT_Post_Processing_UMAP_GBM_Classifier.ipynb
This notebook performs the following:
- Loads the concatenated embedded AnnData object from the first notebook.
- Computes neighbourhood graphs, UMAP embeddings, and clustering.
- Visualizes the results (e.g., UMAP plots with cell type labels).
- Optionally, computes the Adjusted Rand Index (ARI) to evaluate clustering performance.

Requirements

Runtime
We recommend Google Colab with a runtime that supports Python ≤3.10. For best performance:

Use L4 GPUs (or A100 for faster embedding tasks) with Google Colab Pro.
If using the free version (T4 GPU), follow the instructions in the testing_examples notebooks.

Python Version

The notebooks are designed for Python 3.10.12.
IMPORTANT: In Colab, set your runtime to "fallback" to Python 3.10 if you see Python 3.11 was loaded.

Dependencies

The notebooks install and upgrade packages such as scgpt, wandb, faiss-gpu, scanpy, torch, among others.
For the full list for reproducibility, see the last cell of each notebook (i.e., pip freeze).

Setup & Usage

Open the Notebooks in Google Colab
Set Up the Runtime
- Connect to a GPU runtime (L4 is recommended).
- Use the Command Palette to select "Use fallback runtime version" so that the Python version is set to 3.10.12.
Mount Your Google Drive
- Both notebooks require you to mount your Google Drive to read/write data and save generated files (e.g., embedded AnnData objects, UMAP coordinates).
Run Notebook Cells Sequentially
- Start with the embedding notebook to generate the reference and query embeddings.
- Then run the UMAP post-processing notebook to visualize the embeddings and evaluate the performance metrics (e.g., ARI).
Data Access
- The notebooks download data from Google Drive. Make sure you have proper access or adjust the URLs if you use your own data.

External Data Sources

GBmap Datasets

GBmap data was downloaded via CELLxGENE.
Included in this analysis is the "Core" GBmap atlas.
Read more: GBmap Pre-print.

Neftel 2019 Datasets

Neftel 2019 data is available via Broad Single Cell Portal.
Data for this analysis was filtered from the original GBmap data.
Read more: Neftel 2019 Cell Publication.

Richards 2021 Dataset

Richards 2019 data was downloaded and available via CReSCENT and Broad Single Cell Portal.
Richards tumour data for this analysis was filtered from the original GBmap data and GSC data (glioblastoma stem cells) from CReSCENT.
Read more: Richards 2021 Nature Cancer Publication.

Additional Notes

FAISS Installation

The first notebook checks for faiss installation for fast similarity search. If it is not installed, you can follow the FAISS installation guide.

Customization

You can modify cell type keys (annotation_level_3 or cell_type) and other parameters (like the number of nearest neighbors k) to suit your analysis.

scGPT

This repository is based on the scGPT GitHub tutorial. Read more: scGPT 2024 Nature Methods Publication.

Contact

If you run into any problems, please contact Suluxan Mohanraj or leave a GitHub issue in this repository!

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
final_analysis		final_analysis
testing_examples		testing_examples
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

scGPT-GBM-classifier

Requirements

Setup & Usage

External Data Sources

Additional Notes

Contact

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

scGPT-GBM-classifier

Requirements

Setup & Usage

External Data Sources

Additional Notes

Contact

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages