YMa-lab/spTransKit

spTransKit: A Toolkit for Transformation Methods in Spatial Transcriptomics

This repository provides the code for the 18 transformation methods evaluated in the study "A Comprehensive Benchmarking and Practical Guide to Transformation Methods for Spatial Transcriptomics and Downstream Analyses." The methods are designed to be called easily from any spatial transcriptomics analysis pipeline.

Background

Spatially resolved transcriptomics (SRT) localizes gene expression to specific regions of a tissue, which aids the investigation of spatially dependent biological phenomena. Because SRT offers many advantages over other transcriptomics technologies, numerous computational methods have been designed to analyze SRT data and extract biologically relevant spatial information. Despite the diversity of these methods, nearly all pipelines begin by preprocessing the raw expression data. Preprocessing corrects for technical noise introduced by the SRT platform (for example, differences in sequencing depth across spatial locations), which can otherwise obscure the underlying biological signal.

Transformations

Name | Category | Function | Description
--- | --- | --- | ---
y/s | Size-Factor-Based | size | Divides each location's gene counts by that location's library size factor s.
CPM | Size-Factor-Based | cpm | Rescales each location's gene counts to counts per million (CPM) of that location's total count.
scanpy Zheng | Size-Factor-Based | zheng | Applies size normalization, a logarithmic transformation, and per-gene z-score scaling.
TMM | Size-Factor-Based | tmm | Estimates scale factors from the trimmed mean of log-fold changes between each location and a reference, excluding genes with extreme log-fold changes or extreme expression.
DESeq2 | Size-Factor-Based | deseq2 | Computes each location's scale factor as the median ratio of its gene counts to a pseudo-reference sample (the per-gene geometric mean across locations).
log(y/s + 1) | Delta-Method-Based | shifted_log | Shifted logarithm of size-factor-normalized counts; stabilizes the variance across genes.
log(CPM + 1) | Delta-Method-Based | cpm_shifted_log | Shifted logarithm of CPM-normalized counts; stabilizes the variance across genes.
log(y/s + 1)/u | Delta-Method-Based | shifted_log_size | Shifted logarithm of size-factor-normalized counts, divided by u; stabilizes the variance across genes.
acosh(2αy/s + 1) | Delta-Method-Based | acosh | Inverse hyperbolic cosine transformation for NB counts with overdispersion α; stabilizes the variance across genes.
log(y/s + 1/(4α)) | Delta-Method-Based | pseudo_shifted_log | Shifted logarithm with pseudocount 1/(4α), which approximates the acosh transformation; stabilizes the variance across genes.
Analytic Pearson (no clip) | Model-Based | analytic_pearson_noclip | Assumes gene counts follow a negative binomial (NB) distribution and replaces them with analytic Pearson residuals.
Analytic Pearson (clip) | Model-Based | analytic_pearson_clip | Same as Analytic Pearson (no clip), with an additional step that clips extreme residuals.
scanpy Pearson Residual | Model-Based | sc_pearson | Assumes gene counts follow an NB distribution and replaces them with Pearson residuals computed by scanpy.
Normalisr | Model-Based | normalisr | Applies Bayesian inference to model expression variance and to correct for confounding factors.
PsiNorm | Model-Based | psinorm | Assumes a Pareto distribution and rescales gene counts using a closed-form estimator of global expression based on Zipf's law.
SCTransform | Model-Based | sctransform | Assumes gene counts follow an NB distribution and adjusts them with a generalized linear model (GLM) that accounts for library size variation.
Dino | Model-Based | dino | Assumes gene counts follow a mixture of NB distributions and adjusts them with a GLM that accounts for library size variation.
SpaNorm | Spatially Aware | spanorm | Assumes gene counts follow an NB distribution and adjusts them with a GLM that accounts for library size variation and spatial gradients.
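Several formulas in the table can be written out directly. The following is a minimal, dependency-free Python sketch of the y/s, CPM, log(y/s + 1), acosh, and log(y/s + 1/(4α)) transformations, plus DESeq2-style median-of-ratios size factors and analytic Pearson residuals. It is an illustration, not spTransKit's implementation, and all function names are hypothetical. It assumes the convention s_i = (total counts at location i) / (mean total across locations), an overdispersion α = 0.05, and, for the Pearson residuals, θ = 100 with clipping at ±√N; spTransKit's own defaults may differ.

```python
import math
from statistics import mean, median

# Counts are an N x G list of lists: one row per spatial location, one column per gene.
# Every location is assumed to have a nonzero total count (e.g. after filtering).


def size_factors(counts):
    """s_i = total count at location i / mean total count (assumed convention)."""
    totals = [sum(row) for row in counts]
    avg = mean(totals)
    return [t / avg for t in totals]


def divide_by_size(counts):
    """y/s: divide every count at a location by that location's size factor."""
    return [[y / s for y in row] for row, s in zip(counts, size_factors(counts))]


def cpm(counts):
    """Counts per million: each location's counts rescaled to sum to 1e6."""
    out = []
    for row in counts:
        total = sum(row)
        out.append([y / total * 1e6 for y in row])
    return out


def shifted_log(counts):
    """log(y/s + 1)."""
    return [[math.log(v + 1) for v in row] for row in divide_by_size(counts)]


def acosh_transform(counts, alpha=0.05):
    """acosh(2*alpha*y/s + 1); alpha = 0.05 is an assumed overdispersion."""
    return [[math.acosh(2 * alpha * v + 1) for v in row] for row in divide_by_size(counts)]


def pseudo_shifted_log(counts, alpha=0.05):
    """log(y/s + 1/(4*alpha)): shifted log whose pseudocount depends on alpha."""
    return [[math.log(v + 1 / (4 * alpha)) for v in row] for row in divide_by_size(counts)]


def deseq2_size_factors(counts):
    """Median of ratios against a pseudo-reference (per-gene geometric mean).
    Only genes nonzero at every location are used; if none qualify,
    statistics.median raises StatisticsError (a real concern for sparse SRT data)."""
    n_genes = len(counts[0])
    keep = [g for g in range(n_genes) if all(row[g] > 0 for row in counts)]
    log_ref = {g: mean(math.log(row[g]) for row in counts) for g in keep}
    return [math.exp(median(math.log(row[g]) - log_ref[g] for g in keep)) for row in counts]


def analytic_pearson(counts, theta=100.0, clip=True):
    """NB Pearson residuals: mu = location total * gene total / grand total,
    z = (y - mu) / sqrt(mu + mu^2 / theta), optionally clipped to [-sqrt(N), sqrt(N)]."""
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    grand = sum(row_tot)
    bound = math.sqrt(len(counts))
    residuals = []
    for row, rt in zip(counts, row_tot):
        out_row = []
        for y, ct in zip(row, col_tot):
            mu = rt * ct / grand
            # A gene with zero total has mu = 0 everywhere; its residual is set to 0.
            z = (y - mu) / math.sqrt(mu + mu * mu / theta) if mu > 0 else 0.0
            if clip:
                z = max(-bound, min(bound, z))
            out_row.append(z)
        residuals.append(out_row)
    return residuals
```

For example, with counts = [[1, 2, 3], [2, 4, 6]] the two locations have size factors 2/3 and 4/3, so divide_by_size returns two identical rows of about [1.5, 3.0, 4.5]. Plain lists keep the arithmetic visible; a practical implementation would operate on (often sparse) NumPy or SciPy matrices.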

Installation and Usage

Installation

This toolkit can be integrated into any spatial transcriptomics pipeline by importing the Python module. spTransKit requires Python >= 3.10.0 and R >= 4.5.0. To install the other packages it depends on, download the "requirements.txt" file from the spTransKit GitHub page and run the following command:

pip3 install -r /LOCAL/PATH/TO/requirements.txt

Then, to install the spTransKit package, run the command:

pip3 install sptranskit

Import the transformations module using the following line of code:

import sptranskit as sp
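To confirm that an environment meets the Python >= 3.10.0 and R >= 4.5.0 requirements, a small standard-library script can query both versions. This is a hypothetical helper, not part of spTransKit; it assumes Rscript is on the PATH and takes the first X.Y.Z version string from the output of Rscript --version.

```python
import re
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 10, 0)
MIN_R = (4, 5, 0)


def parse_version(text):
    """Return the first 'X.Y.Z' version in text as a tuple of ints, or None if absent."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def r_version():
    """Return R's version via `Rscript --version`, or None if Rscript is not on PATH.
    Depending on the R release, the version line may go to stdout or stderr,
    so both streams are searched."""
    if shutil.which("Rscript") is None:
        return None
    result = subprocess.run(["Rscript", "--version"], capture_output=True, text=True)
    return parse_version(result.stdout + result.stderr)


def check_requirements():
    """Return (python_ok, r_ok) for the minimum versions stated above."""
    python_ok = tuple(sys.version_info[:3]) >= MIN_PYTHON
    r = r_version()
    r_ok = r is not None and r >= MIN_R
    return python_ok, r_ok


if __name__ == "__main__":
    py_ok, r_ok = check_requirements()
    print(f"Python >= 3.10.0: {py_ok}")
    print(f"R >= 4.5.0: {r_ok}")
```

Running the script prints one line per requirement; the R check also reports False when Rscript cannot be found.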

Input Data Format

Each transformation takes an AnnData object (data), as used by scanpy, that stores both gene expression and spatial information. Gene expression is stored in data.X as an N x G matrix, where N is the number of spatial locations and G is the number of genes. Spatial coordinates are stored in data.obsm["spatial"] as an N x 2 matrix. The spTransKit functions check that the spatial information is stored in this location.
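The layout requirement amounts to two shape constraints: data.X is N x G, and data.obsm["spatial"] has the same N rows and exactly 2 columns. The sketch below expresses that check in plain Python. It is a hypothetical helper (validate_spatial_layout is not an spTransKit function) and is not necessarily how spTransKit performs its own check.

```python
def validate_spatial_layout(x_shape, spatial_shape):
    """Hypothetical check of the AnnData layout described above.

    x_shape       -- shape of data.X, expected (N locations, G genes)
    spatial_shape -- shape of data.obsm["spatial"], expected (N, 2)
    Returns (N, G) when the layout is valid, otherwise raises ValueError.
    """
    if len(x_shape) != 2 or len(spatial_shape) != 2:
        raise ValueError("data.X and data.obsm['spatial'] must both be 2-D")
    n_locations, n_genes = x_shape
    if spatial_shape[0] != n_locations:
        raise ValueError(
            f"obsm['spatial'] has {spatial_shape[0]} rows but X has {n_locations} locations"
        )
    if spatial_shape[1] != 2:
        raise ValueError(
            f"obsm['spatial'] must have 2 columns (x, y), found {spatial_shape[1]}"
        )
    return n_locations, n_genes
```

With an AnnData object loaded as in the examples in this README, the call would be validate_spatial_layout(data.X.shape, data.obsm["spatial"].shape). Passing shapes rather than arrays keeps the check independent of whether data.X is a dense NumPy array or a SciPy sparse matrix, since both expose .shape.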

Example: Usage with Human DLPFC Dataset

The example below loads an example dataset (DLPFC sample 151673), filters out low-quality genes and spatial locations, and then transforms the gene count matrix using the log(y/s + 1) transformation.

import sptranskit as sp

# Obtain the gene counts and spatial information for the DLPFC 151673 dataset
data = sp.helpers.get_unfiltered_dlpfc_data("151673")

# Filter the dataset
data = sp.filter.filter_counts(data)

# Transform the gene count matrix
data = sp.transformations.shifted_log(data)
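The sp.filter.filter_counts call above removes low-quality genes and spatial locations, but its exact criteria are not described here. As an illustration only, the following sketch shows one common style of count-based quality control on a plain N x G list-of-lists matrix. The function name filter_counts_sketch and its thresholds (min_gene_count=10, min_location_count=100) are hypothetical and are not spTransKit's defaults.

```python
def filter_counts_sketch(counts, min_gene_count=10, min_location_count=100):
    """Hypothetical count-based QC on an N x G matrix (rows = locations, columns = genes).

    1. Keep genes whose total count across all locations is >= min_gene_count.
    2. Keep locations whose total count over the kept genes is >= min_location_count.
    Returns (filtered_counts, kept_location_indices, kept_gene_indices).
    """
    n_genes = len(counts[0]) if counts else 0
    kept_genes = [
        g for g in range(n_genes) if sum(row[g] for row in counts) >= min_gene_count
    ]
    kept_locations = [
        i
        for i, row in enumerate(counts)
        if sum(row[g] for g in kept_genes) >= min_location_count
    ]
    filtered = [[counts[i][g] for g in kept_genes] for i in kept_locations]
    return filtered, kept_locations, kept_genes
```

Genes are filtered before locations here, so each location's total is computed only over retained genes; filtering in the other order, or repeating both steps until neither changes, is an equally reasonable design choice.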

Usage with New Data

The steps for using spTransKit to preprocess new data are the same as in the example above. First, make sure your data are saved in an ".h5ad" file, with the gene expression and spatial information stored in the AnnData fields described in the Input Data Format section. Then read in the data with scanpy's read_h5ad function:

import scanpy

# Obtain gene counts and spatial information for the new dataset
data = scanpy.read_h5ad("/LOCAL/PATH/TO/data.h5ad")

Once the data are read in, use the spTransKit functions to filter and transform the dataset. The following example, once again, uses the log(y/s + 1) transformation.

# Filter the dataset
data = sp.filter.filter_counts(data)

# Transform the gene count matrix
data = sp.transformations.shifted_log(data)

About

This project evaluates transformation methods for spatially resolved transcriptomics (SRT) data analysis.
