spTransKit: A Toolkit for Transformation Methods in Spatial Transcriptomics

This repository provides the code for the 18 transformation methods evaluated in the study "A Comprehensive Benchmarking and Practical Guide to Transformation Methods for Spatial Transcriptomics and Downstream Analyses". The methods are designed to be easily called within any spatial transcriptomics analysis pipeline.

Background

Spatially resolved transcriptomics (SRT) localizes gene expression to specific regions of tissue, aiding the investigation of spatially dependent biological phenomena. Because SRT offers many advantages over other transcriptomics technologies, several computational methods have been designed to analyze spatial transcriptomics data and extract biologically relevant spatial information. Despite the diversity of these methods, pipelines typically begin with preprocessing of the raw expression data. Preprocessing is required to correct for the technical noise introduced by the spatial transcriptomics platform, which often obscures underlying biological signals.

Transformations

Name	Category	Function	Description
y/s	Size-Factor-Based	size	Adjusts gene counts by the library size factor for each spatial location.
CPM	Size-Factor-Based	cpm	Adjusts gene counts by the counts per million (CPM) library size factor for each spatial location.
scanpy Zheng	Size-Factor-Based	zheng	Adjusts gene counts using a size normalization, a logarithmic shift, and z normalization for each gene.
TMM	Size-Factor-Based	tmm	Estimates scale factors using log-fold changes between each location and a reference, excluding genes with extreme expression.
DESeq2	Size-Factor-Based	deseq2	Computes scale factors by comparing each gene’s expression relative to a pseudo-reference sample.
log(y/s + 1)	Delta-Method-Based	shifted_log	Stabilizes the variance across genes.
log(CPM + 1)	Delta-Method-Based	cpm_shifted_log	Stabilizes the variance across genes.
log(y/s + 1)/u	Delta-Method-Based	shifted_log_size	Stabilizes the variance across genes.
acosh(2αy/s + 1)	Delta-Method-Based	acosh	Stabilizes the variance across genes.
log(y/s + 1/(4α))	Delta-Method-Based	pseudo_shifted_log	Stabilizes the variance across genes.
Analytic Pearson (no clip)	Model-Based	analytic_pearson_noclip	Assumes gene counts fit a negative binomial (NB) distribution, and adjusts them using a Pearson residual.
Analytic Pearson (clip)	Model-Based	analytic_pearson_clip	Assumes gene counts fit a negative binomial (NB) distribution, and adjusts them using a Pearson residual, with an additional clipping step.
scanpy Pearson Residual	Model-Based	sc_pearson	Assumes gene counts fit a negative binomial (NB) distribution, and adjusts them using a Pearson residual.
Normalisr	Model-Based	normalisr	Applies Bayesian inference to model expression variance and to correct for confounding factors.
PsiNorm	Model-Based	psinorm	Assumes a Pareto distribution and rescales each gene’s count using a closed-form estimator of global expression based on Zipf’s Law.
SCTransform	Model-Based	sctransform	Assumes gene counts fit a negative binomial (NB) distribution, and adjusts them using a generalized linear model (GLM) to account for library size variation.
Dino	Model-Based	dino	Assumes gene counts fit a mixed negative binomial (NB) distribution, and adjusts them using a generalized linear model (GLM) to account for library size variation.
SpaNorm	Spatially Aware	spanorm	Assumes gene counts fit a negative binomial (NB) distribution, and adjusts them using a generalized linear model (GLM) to account for library size variation and spatial gradients.
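
As a rough illustration of what the simplest rows in the table compute, the sketch below implements CPM and log(y/s + 1) directly with NumPy. It is not the spTransKit implementation: the function names are hypothetical, and the size factor s is assumed here to be each location's total count divided by the mean total count across locations, which may differ from the definition the package uses.

```python
import numpy as np


def size_factors(counts):
    """Per-location size factors (assumed definition: total count / mean total count)."""
    totals = counts.sum(axis=1, keepdims=True)
    return totals / totals.mean()


def cpm_sketch(counts):
    """Counts per million: scale each location (row) so its counts sum to 1e6."""
    totals = counts.sum(axis=1, keepdims=True)
    return counts / totals * 1e6


def shifted_log_sketch(counts):
    """log(y/s + 1), computed with log1p for numerical accuracy near zero."""
    return np.log1p(counts / size_factors(counts))


# Toy N x G count matrix: 2 spatial locations, 3 genes.
# Locations with zero total counts are assumed to have been filtered out already.
counts = np.array([[10.0, 0.0, 30.0],
                   [5.0, 5.0, 10.0]])

cpm = cpm_sketch(counts)
size_normalized = counts / size_factors(counts)
shifted = shifted_log_sketch(counts)
```

After size-factor scaling, every location has the same total count, so remaining differences between locations reflect composition rather than sequencing depth. The logarithm then compresses the dynamic range while leaving zero counts at zero.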

Installation and Usage

Installation

This toolkit can be integrated into any spatial transcriptomics pipeline by importing the Python module. spTransKit requires Python version >= 3.10.0 and R version >= 4.5.0. To install the other required packages, download the "requirements.txt" file from the spTransKit GitHub page and run the following command:

pip3 install -r /LOCAL/PATH/TO/requirements.txt

Then, to install the spTransKit package, run the command:

pip3 install sptranskit

Import the transformations module using the following line of code:

import sptranskit as sp
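
The version requirements stated in this section can be checked before running a pipeline. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical and not part of spTransKit. The script only confirms that an Rscript executable is on the PATH; it does not verify that the R version is >= 4.5.0.

```python
import shutil
import sys

# Minimum Python version stated in the installation instructions.
MIN_PYTHON = (3, 10, 0)


def python_version_ok(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) meets `minimum`."""
    if version is None:
        version = sys.version_info[:3]
    return tuple(version) >= tuple(minimum)


def find_rscript():
    """Return the path to Rscript on PATH, or None if it cannot be found."""
    return shutil.which("Rscript")


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("Rscript location:", find_rscript())
```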

Input Data Format

Each transformation takes in a scanpy AnnData object (data), which stores both gene expression and spatial information. Gene expression information is formatted as an N x G matrix (N spatial locations by G genes) and stored in data.X. Spatial information is formatted as an N x 2 matrix of coordinates and stored in data.obsm["spatial"]. The spTransKit functions check that the spatial information is stored correctly.
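
The layout described above can be verified ahead of time with a short check. The sketch below is an illustration only, not the check that spTransKit performs internally, and check_input_layout is a hypothetical helper. To keep the example light on dependencies, the demo uses a stand-in object with the same X and obsm attributes. An AnnData object exposes the same attributes and should work the same way.

```python
from types import SimpleNamespace

import numpy as np


def check_input_layout(data):
    """Raise ValueError unless data.X is N x G and data.obsm["spatial"] is N x 2.

    Returns (N, G) when the layout matches.
    """
    if len(data.X.shape) != 2:
        raise ValueError("data.X must be a 2-D N x G matrix")
    n_locations, n_genes = data.X.shape
    if "spatial" not in data.obsm:
        raise ValueError('data.obsm["spatial"] is missing')
    coords = np.asarray(data.obsm["spatial"])
    if coords.shape != (n_locations, 2):
        raise ValueError(
            f'data.obsm["spatial"] has shape {coords.shape}, '
            f"expected ({n_locations}, 2)"
        )
    return n_locations, n_genes


# Stand-in for an AnnData object: 5 spatial locations, 3 genes.
rng = np.random.default_rng(0)
toy = SimpleNamespace(
    X=rng.poisson(2.0, size=(5, 3)),
    obsm={"spatial": rng.uniform(0.0, 100.0, size=(5, 2))},
)
layout = check_input_layout(toy)
```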

Example: Usage with Human DLPFC Dataset

Below is an example of how to read an example dataset (DLPFC 151673), filter out low-quality genes and spatial locations, and then transform the gene count matrix using the log(y/s + 1) transformation.

# Obtain the gene counts and spatial information for the DLPFC 151673 dataset
data = sp.helpers.get_unfiltered_dlpfc_data("151673")

# Filter the dataset
data = sp.filter.filter_counts(data)

# Transform the gene count matrix
data = sp.transformations.shifted_log(data)

Usage with New Data

The steps for utilizing spTransKit for preprocessing new data are the same as outlined in the above example. First, ensure that your data are saved in an ".h5ad" file, with the gene expression and spatial information stored in the appropriate locations in the AnnData object, as outlined in the Input Data Format section. Then, use the scanpy read_h5ad function to read in the data as follows:

# Obtain gene counts and spatial information for the new dataset
import scanpy
data = scanpy.read_h5ad("/LOCAL/PATH/TO/data.h5ad")

Once the data are read in, use the spTransKit functions to filter and transform the dataset. The following example, once again, uses the log(y/s + 1) transformation.

# Filter the dataset
data = sp.filter.filter_counts(data)

# Transform the gene count matrix
data = sp.transformations.shifted_log(data)
