- Citation
- Requirements
- Installation
- Get Started
- Data preparation
- Download Test Data
- Step-by-Step Analysis
The SCSES has been published on Nature Communications. If you find this project useful for your research, please consider citing:
Wen X, Lv X, Guo D, et al. Deciphering splicing heterogeneity at single-cell resolution by SCSES[J]. Nature Communications, 2025, 16(1): 9459.
1. Hardware requirements
SCSES package requires only a standard computer with enough RAM to
support the in-memory operations.
2. Software requirements
This package is supported for Linux, and has been tested on Debian-11.21.
We provide two installation options for SCSES: one is a direct installation on a Linux system, and the other is a Docker-based installation, which supports both Windows and Linux platforms (MacOS not tested).
We recommend to use SCSES in a conda virtual environment. Please create a new conda environment to install SCSES:
conda create -n SCSES_test python=3.11
conda activate SCSES_testTo use SCSES, R, Python, Matlab Compiler Runtime(MCR v9.13), and Java(v17.0.10) are all required.
## install R in conda environment
conda install -c conda-forge r-base=4.3.1
## install Java if JDK is not installed
conda install -c conda-forge openjdk.
## install MCR. The MCR is quite large, Please be patient when it's installed.
mkdir /path/to/MCR && \
cd /path/to/MCR && \
wget https://ssd.mathworks.com/supportfiles/downloads/R2022b/Release/10/deployment_files/installer/complete/glnxa64/MATLAB_Runtime_R2022b_Update_10_glnxa64.zip && \
unzip -q MATLAB_Runtime_R2022b_Update_10_glnxa64.zip && \
./install -destinationFolder /opt/mcr -agreeToLicense yes -mode silentPlease install following pythen dependencies in the conda environment
pip install pandas numpy scipy scikit-learn
pip install keras==2.15.0
pip install tensorflow==2.15.0.post1To detect splicing events, you will need to install rMATS, MAJIQ, IRFinder. rMATS should be built in the same environment with SCSES (same python). MAJIQ should be built in a new environment due to conflict of python package version,
3.1 Install rMATS
wget https://github.com/Xinglab/rmats-turbo/releases/download/v4.3.0/rmats_turbo_v4_3_0.tar.gz
tar -zxvf rmats_turbo_v4_3_0.tar.gz
cd rmats_turbo_v4_3_0/
# Install dependencies
pip install Cython
# Add to PATH
./build_rmats
export PATH=/path/to/rmats_turbo_v4_3_0/:$PATH3.2 Install MAJIQ
conda create -n MAJIQ python=3.11
conda activate MAJIQ
# install HTSlib from <https://www.htslib.org/download/>
# change to where library/include directories are installed
export HTSLIB_LIBRARY_DIR=/path/to/htslib-1.15.1/lib/
export HTSLIB_INCLUDE_DIR=/path/to/htslib-1.15.1/include/
pip install git+https://bitbucket.org/biociphers/majiq_academic.git@v2.5.7
export MAJIQ_LICENSE_FILE=/path/to/majiq_license_academic_official.licNOTE: MAJIQ will not function without providing the license file.
3.3 Install IRFinder
wget https://github.com/RitchieLabIGH/IRFinder/archive/refs/tags/v2.0.1.tar.gz
tar -zxvf v2.0.1.tar.gz
export PATH=/path/to/IRFinder-2.0.1/bin/:$PATHNOTE: STAR is required to build IRFinder reference! To run IRFinder correctly, you also need to install STAR.
3.4 Install samtools
wget https://github.com/samtools/samtools/releases/download/1.21/samtools-1.21.tar.bz2
bzip2 -d samtools-1.21.tar.bz2
tar -xvf samtools-1.21.tar
cd samtools-1.21
./configure
make & make installTo install SCSES, type the following command in R console:
install.packages("remotes")
# The version of package "Matrix" should <=1.6-5
remotes::install_version("Matrix", version = "1.6-5")
install.packages("curl",config.vars='LIB_DIR=/usr/lib/x86_64-linux-gnu/pkgconfig/')
options(download.file.method = "wget", times=100)
remotes::install_github("lvxuan12/SCSES")- Issue 1. cannot find fftw.h
conda install conda-forge::fftw
# config FFTW
export FFTW_CFLAGS=" -I/path/to/miniconda3/envs/scses/include/"
export FFTW_LIBS=" -L/path/to/miniconda3/envs/scses/lib -lfftw3"- Issue 2. cannot find -lxml2
conda install conda-forge::libxml2- Issue 3. cannot find -lsz
ln -s /usr/lib/x86_64-linux-gnu/libsz.so /path/to/miniconda/envs/SCSES_test/lib/libsz.so- Issue 4. error: ‘::timespec_get’ has not been declared
conda upgrade -c conda-forge --all SCSES provides a Docker-based installation method to simplify the setup of all dependencies and requirements. Please follow the steps below to build the Docker image and start a container to use SCSES:
Please install the Docker client on the host machine.
The Dockerfile of SCSES can be downloaded from: https://github.com/lvxuan12/SCSES/blob/main/SCSES.dockerfile.
You can build the SCSES Docker image using the command:
docker build -t scses -f SCSES.dockerfile .After building the image, create a Docker container with the following command:
docker run -d -p [exported port]:8787 -e PASSWORD=[user password] -v [local directory]:/data --name test scses[exported port]: An unused port on the host machine to access the container. To show all used ports on the host machine,
input the following command in linux terminal netstat -tuwanp 2|awk '{print $4}'|cut -d ":" -f 2|sort|uniq -c
or Get-NetTCPConnection | Where-Object { $_.State -eq "Listen" } | Select-Object -ExpandProperty LocalPort|Sort-Object | Group-Object | Select-Object -Property Count, Name in Windows Powershell.
[user password]: A user-defined password for logging into the RStudio
server.
[local directory]: A local directory mapped to the container for data
storage and sharing.
Please ensure that the [local directory] is writable, as SCSES will generate and save multiple intermediate files during execution. You can modify the directory permissions with chmod -R 777 [local directory]. If this command does not resolve the issue, please contact your system administrator for assistance.
Here is an example. In this case, We map host port 1234 to container port 8787, set the RStudio Server login password to william, and mount the host directory /d/SCSES/ (Windows system) to the container directory /data. This allows data located in /d/SCSES on the host to be accessed inside the Docker container at /data.
docker run -d -p 1234:8787 -e PASSWORD=william -v /d/SCSES/:/data --name test scsesNow, you can access the RStudio server by opening a web browser and
navigating to [host IP]:[exported port].
If you run the above example locally (not remote server), you can access by http://localhost:1234 or http://127.0.0.1:1234. The username to log in Rstudio server is rstudio and the password is use-defined in the docker run command (william in the example).
In this pre-configured RStudio server environment, SCSES and all its dependencies are correctly installed and ready for use.
Please refer to Getting Started to start the first experience with SCSES.
Enjoy!
SCSES requires five essential input files:
1. BAM Files
-
Format: Coordinate-sorted BAM files with index (.bai) for each cell.
Note: SCSES also supports UMI-based datasets, which should be organized following the CellRanger-standard directory structure. This includes a merged BAM file containing reads from all cells (possorted_genome_bam.bam), a compressed barcode file (barcodes.tsv.gz), and a gene expression matrix in HDF5 format (filtered_feature_bc_matrix.h5). You can use the
split10XBAMfunction to split the merged BAM file into individual BAM files for each cell based on cell barcodes, as shown below:# CellRanger_path: directory to CellRanger output # out_path: directory to save single cell bam # java_path: directory to java # core: the number of threads splitbam.path = split10XBAM(CellRanger_path,out_path,java_path,core)
2. Reference Genome Files
- FASTA: Reference genome sequence
- GTF: Gene annotations
- GFF3: Gene annotations
3. Phast conservation file in bigWig format
For human and mouse, you could download it directly from UCSC browser:
| Species | File | Size | Link |
|---|---|---|---|
| Human (hg38) | hg38.phastCons100way.bw | 5.5 GB | Download |
| Human (hg19) | hg19.100way.phastCons.bw | 5.4 GB | Download |
| Mouse (mm10) | mm10.60way.phastCons.bw | 4.3 GB | Download |
4. RBP
Genes annotated as RBP are required to constructs similarity networks. The SCSES package includes predefined lists of RBPs (RNA-binding proteins) for both human and mouse. You can access the lists using the following command:
# Load human RBP list
rbp_human <- system.file("extdata/rbp/human_rbp.txt", package = "SCSES")
rbp_list <- readLines(rbp_human)
cat("Total RBPs:", length(rbp_list), "\n")
#> Total RBPs: 1456
cat("First 10 RBPs:", head(rbp_list, 10), sep = "\n")
#> First 10 RBPs:
#> A1CF
#> AC004381.6
#> ACIN1
#> ACO1
#> AKAP1
#> ALKBH8
#> ALYREF
#> ANKHD1
#> ANKRD17
#> APTX# Load mouse RBP list
rbp_mouse <- system.file("extdata/rbp/mouse_rbp.txt", package = "SCSES")
rbp_list <- readLines(rbp_mouse)
cat("Total RBPs:", length(rbp_list), "\n")
#> Total RBPs: 611
cat("First 10 RBPs:", head(rbp_list, 10), sep = "\n")
#> First 10 RBPs:
#> Mcts1
#> Mkrn2
#> Hnrnpd
#> Fmr1
#> Snrpn
#> Pcbp3
#> Trmt1
#> Supt6h
#> Cugbp2
#> Sf3a1SCSES requires a json-based configuration file to set all parameters in the algorithm. Here is a demo config file of the configure file. For a detailed explanation of the configuration file, please refer to the ConfigurationGuide.txt.
SCSES provides a shiny app to help you to generate the confugre file. You can start the app by createConfigshiny function.
For non-docker users of SCSES, the full command should be:
library(SCSES)
createConfigshiny(host, port, launch.browser=FALSE) You should set the following parameters:
-
host: the server's IP address, for local access, you can set as "localhost" or "127.0.0.1" -
port: The TCP port that the application should listen on. If the port is not specified, and the shiny.port option is set (with options(shiny.port = XX)), then that port will be used. Otherwise, use a random port between 3000:8000, excluding ports that are blocked by Google Chrome for being considered unsafe: 3659, 4045, 5060, 5061, 6000, 6566, 6665:6669 and 6697. Up to twenty random ports will be tried. -
launch.borwser: if launch the app in the default web browser automatically, default is FALSE. Setting launch.browser = TRUE may cause errors in headless environments (servers without GUI) or when no default browser is configured
After running createConfigshiny, you will see a URL appear in
the console. Copy this URL and paste it into your web browser to access
the application.
For docker users of SCSES, you can use the following command, and the web page will be opened automatically:
library(SCSES)
createConfigshiny(host = "127.0.0.1",launch.browser=TRUE) The web page allows you to specify parameters used in SCSES, such as the BAM file path, the working directory, etc. The hint of each parameter can be found by hovering the mouse over the widget.
After setting all parameters, you can click Create Config button and a json file will be generated in the work_path you provided if all paramters are set correctly.
Note: The test dataset in this Tutorial includes a limited number of cells and events to ensure faster completation of the Tutorial.
Therefore, the default parameters in createConfigshiny are not suitable. Please use the createDemoConfigshiny function instead,
which provides default values optimized for the test dataset.
For non-docker users:
# For non-docker users
library(SCSES)
createDemoConfigshiny(host = "localhost", launch.browser=FALSE) For docker users:
# For docker users
library(SCSES)
createDemoConfigshiny(host = "127.0.0.1", launch.browser=TRUE) The test dataset includes BAM files from 15 cells across three cell lines (HCT116, HepG2, and HL-60), with five cells per cell type, which are sequenced by SMART-Seq2. It also contains all essential input files required to run the SCSES package.
The test dataset is available at https://doi.org/10.5281/zenodo.15688700. The complete list of downloadable files is provided in the table below.
| File Type | File Name | Description | File counts |
|---|---|---|---|
| BAM | *.bam |
BAM files for three cell lines, each containing five cells | 15 |
| BAI | *.bam.bai |
BAM index files | 15 |
| TXT | annotation.txt |
Cell identities | 1 |
| FASTA | test.fa |
Reference genome sequence | 1 |
| FAI | test.fa.fai |
Reference genome sequence index | 1 |
| Annotation | test.gtf |
Gene annotation file | 1 |
| Annotation | test.gff3 |
Gene annotation file | 1 |
| TXT | human_rbp.txt |
RNA-binding proteins list | 1 |
| PhastCons | test_phastCons.bw |
Conservation scores | 1 |
| JSON | cell_line.json |
Parameters config file for SCSES | 1 |
Note: After download, please move all BAM files and their index to an empty directory (such as bam), and move other files into another directory (such as refgenome).
# Example:
ls /disk/share/lvxuan/SCSES_test/bam/
#> SRR11826368.bam
#> SRR11826368.bam.bai
#> SRR11826371.bam
#> SRR11826371.bam.bai
#> SRR11826409.bam
#> SRR11826409.bam.bai
#> SRR11826436.bam
#> SRR11826436.bam.bai
#> SRR11826437.bam
#> SRR11826437.bam.bai
#> SRR11826440.bam
#> SRR11826440.bam.bai
#> SRR11826455.bam
#> SRR11826455.bam.bai
#> SRR11826458.bam
#> SRR11826458.bam.bai
#> SRR11826464.bam
#> SRR11826464.bam.bai
#> SRR11826470.bam
#> SRR11826470.bam.bai
#> SRR1275148.bam
#> SRR1275148.bam.bai
#> SRR1275150.bam
#> SRR1275150.bam.bai
#> SRR1275227.bam
#> SRR1275227.bam.bai
#> SRR1275305.bam
#> SRR1275305.bam.bai
#> SRR1275317.bam
#> SRR1275317.bam.bai
ls /disk/share/lvxuan/SCSES_test/refgenome/
#> annotation.txt
#> cell_line.json
#> human_rbp.txt
#> test.fa
#> test.fa.fai
#> test.gff3
#> test.gtf
#> test_phastCons.bwYou can create the configuration file using the Shiny app. To start the app, run createConfigshiny function. Details can be found in here
Note1: The test dataset includes a limited number of cells and chromosomes to ensure faster completation of the Tutorial.
Therefore, the default parameters in createConfigshiny are not suitable. Please use createDemoConfigshiny function instead,
which provides default values optimized for the test dataset.
Note2: An example of parameter configuration for the test dataset is also included in the downloaded files as cell_line.json. However, users should modify the file and program paths according to their own system environments.
Set the configure file path, and read the configure file.
## Loading packages
library(SCSES)
#> Loading required package: BiocManager
#> Bioconductor version '3.18' is out-of-date; the current release version '3.21'
#> is available with R version '4.5'; see https://bioconductor.org/install
#> Warning: replacing previous import 'hdf5r::h5version' by 'rhdf5::h5version'
#> when loading 'SCSES'
#> Warning: replacing previous import 'hdf5r::h5const' by 'rhdf5::h5const' when
#> loading 'SCSES'
#> Warning: replacing previous import 'fs::path' by 'rtracklayer::path' when
#> loading 'SCSES'
#> Warning: replacing previous import 'hdf5r::values' by 'rtracklayer::values'
#> when loading 'SCSES'
# configure file path
## user-defined configure file
config_file <- "/path/to/your/config/file"
# Load configuration file
paras <- readSCSESconfig(config_file)
# Verify configuration (An example output in our environment based on "cell_line.json" file in downloaded demo dataset)
cat("Dataset:", paras$DataSet, "\n")
#> Dataset: cell_line
cat("BAM path:", paras$Basic$bam_path, "\n")
#> BAM path: /disk/share/lvxuan/SCSES_test/bam/
# all outputs will be saved in work_path
cat("Work path:", paras$Basic$work_path, "\n")
#> Work path: /disk/share/lvxuan/SCSES_test/The TPM matrix of gene expression can be obtained by different methods. We used featureCounts to generate this matrix.
You can use getGeneExpression to run featureCounts, which will save
featureCounts output to work_path/expr/ and getEXPmatrix to generate
TPM matrix, which will save gene expression count and TPM matrix to
work_path/rds/.
# An example output in our environment based on "cell_line.json" file in downloaded demo dataset
cat("Quantifying gene expression...\n")
#> Quantifying gene expression...
featurecounts.path = getGeneExpression(paras)
#> [1] "[2025-06-21 10:33:07.476851] Detect gene expression: bash /tmp/RtmpysvzUv/temp_libpath31782430f4c0d0/SCSES/shell/run_featurecounts.sh /disk/share/lvxuan/SCSES_test//expr/ /disk/share/lvxuan/SCSES_test/refgenome/test.fa /disk/share/lvxuan/SCSES_test/refgenome/test.gtf /disk/share/lvxuan/SCSES_test/bam/ 20 cell_line paired /disk/software/subread-2.0.6-source/bin/featureCounts >> /disk/share/lvxuan/SCSES_test//runfeatureCounts.log 2>&1"
#> [1] "[2025-06-21 10:33:18.165756] Detect gene expression Finish."
rds.path = getEXPmatrix(paras)
# Load and examine results
print(rds.path)
#> [1] "/disk/share/lvxuan/SCSES_test//rds/"
list.files(rds.path)
#> [1] "count_norm.rds" "count.rds" "event.rds" "psi.rds"
#> [5] "rc.rds"
tpm = readRDS(paste0(rds.path,'/count_norm.rds'))
tpm[1:5,1:5]
#> SRR11826368.bam SRR11826371.bam SRR11826409.bam SRR11826436.bam
#> DDX11L1 0.000000 0.000000 0.000000 0
#> WASH7P 3.402929 3.815272 5.394269 0
#> MIR1302-11 0.000000 0.000000 0.000000 0
#> FAM138A 0.000000 0.000000 0.000000 0
#> OR4G4P 0.000000 0.000000 0.000000 0
#> SRR11826437.bam
#> DDX11L1 0
#> WASH7P 0
#> MIR1302-11 0
#> FAM138A 0
#> OR4G4P 0After this step, directories expr and rds will be created.
For UMI-based dataset, SCSES provides get10XEXPmatrix function to generate Normalized UMI count matrix
from 10X CellRanger hdf5 file, which will save normalized UMI count to
work_path/rds/.
# Seurat package is required
rds.path <- get10XEXPmatrix(paras, expr_path)To define a global set of all splicing events, SCSES firstly merges all bam files from every single cell to construct a pseudo-bulk bam file, and identifies all types of splicing events by conventional algorithms.
# Create pseudobulk for event detection
# An example output in our environment based on "cell_line.json" file in downloaded demo dataset
pseudobulk.path = createPseudobulk(paras)
#> [1] "Input: /disk/share/lvxuan/SCSES_test/bam/"
#> [1] "Output: /disk/share/lvxuan/SCSES_test//data/"
#> [1] "[2025-06-21 10:33:18.522907] Creating Pseudobulk directory..."
#> Warning in dir.create(path = pseudobulk.path, recursive = T):
#> '/disk/share/lvxuan/SCSES_test//data' already exists
#> [1] "Pseudobulk bam file exists."
#> [1] "[2025-06-21 10:33:18.526561] Merge Bam Files: /disk/software/samtools/bin/samtools merge -f -@ 20 /disk/share/lvxuan/SCSES_test//data//all.bam /disk/share/lvxuan/SCSES_test/bam//*.bam --no-PG"
#> [1] "[2025-06-21 10:33:22.380382] Merge Bam Files Finish."
#> [1] "[2025-06-21 10:33:22.381074] Bam File Index: /disk/software/samtools/bin/samtools index -@ 20 /disk/share/lvxuan/SCSES_test//data//all.bam"
#> [1] "[2025-06-21 10:33:23.04892] Bam File Index Finish."
print(pseudobulk.path)
#> [1] "/disk/share/lvxuan/SCSES_test//data/"
list.files(pseudobulk.path)
#> [1] "all.bam" "all.bam.bai"
# Detect all AS events
#if you meet:
#irfinder: error while loading shared libraries: libboost_iostreams.so.1.71.0: cannot open shared object file: No such file or directory
#libboost_iostreams.so.1.71.0 exists in /disk/lvxuan/lib
old.ld=Sys.getenv("LD_LIBRARY_PATH")
Sys.setenv(LD_LIBRARY_PATH = paste0("/disk/lvxuan/lib:", old.ld))
event.path = detectEvents(paras)
#> [1] "Checking cells..."
#> [1] "Checking events..."
#> [1] "event_type=SE;RI;A3SS;A5SS;MXE checked"
#> [1] "Output: /disk/share/lvxuan/SCSES_test//events/"
#> [1] "[2025-06-21 10:33:23.060435] Creating events directory..."
#> [1] "[2025-06-21 10:33:23.063627] MAJIQ..."
#> [1] "[2025-06-21 10:33:23.065012] Run MAJIQ: bash /tmp/RtmpysvzUv/temp_libpath31782430f4c0d0/SCSES/shell/run_majiq.sh /disk/share/lvxuan/SCSES_test//events//majiq/ /disk/share/lvxuan/SCSES_test//data/ /disk/share/lvxuan/SCSES_test/refgenome/test.gff3 20 hg19 25 MAJIQ /disk/lvxuan/software/miniconda/bin /home/Liulab/wenx/majiq_license_academic_official.lic >> /disk/share/lvxuan/SCSES_test//events//detectEvents.log 2>&1"
#> [1] "[2025-06-21 10:34:22.754712] Run MAJIQ Finish."
#> [1] "[2025-06-21 10:34:22.756686] IRFinder..."
#> [1] "[2025-06-21 10:34:22.761278] Run IRFinder: bash /tmp/RtmpysvzUv/temp_libpath31782430f4c0d0/SCSES/shell/run_irfinder.sh /disk/share/lvxuan/SCSES_test//events//IRFinder/ /disk/share/lvxuan/SCSES_test/refgenome/test.fa /disk/share/lvxuan/SCSES_test/refgenome/test.gtf /disk/share/lvxuan/SCSES_test//data/ 20 100 /disk/lvxuan/software/IRFinder-2.0.1/bin/IRFinder /disk/software/samtools/bin/samtools /usr/local/bin/STAR >> /disk/share/lvxuan/SCSES_test//events//detectEvents.log 2>&1"
#> [1] "/disk/share/lvxuan/SCSES_test//events//IRFinder//all/IRFinder-IR-nondir.txt"
#> [1] "[2025-06-21 10:42:43.185548] Run IRFinder Finish."
#> [1] "[2025-06-21 10:42:43.186025] rMats..."
#> [1] "[2025-06-21 10:42:43.188481] Run rMats: bash /tmp/RtmpysvzUv/temp_libpath31782430f4c0d0/SCSES/shell/run_rmats.sh /disk/share/lvxuan/SCSES_test//events//rMats/ /disk/share/lvxuan/SCSES_test//data/ /disk/share/lvxuan/SCSES_test/refgenome/test.gtf paired 100 20 /disk/lvxuan/software/SCSES_test/rmats_turbo_v4_3_0/rmats.py /disk/lvxuan/software/miniconda/envs/SCSES_test/bin/python3.11 >> /disk/share/lvxuan/SCSES_test//events//detectEvents.log 2>&1"
#> [1] "[2025-06-21 10:43:07.965503] Run rMats Finish."
#> [1] "[2025-06-21 10:43:07.967128] Generating SE event id"
#> arguments 'show.output.on.console', 'minimized' and 'invisible' are for Windows only
#> [1] "[2025-06-21 10:43:08.183085] Generating SE event id Finish."
#> [1] "[2025-06-21 10:43:08.184156] Generating RI event id"
#> arguments 'show.output.on.console', 'minimized' and 'invisible' are for Windows only
#> [1] "[2025-06-21 10:43:14.956387] Generating RI event id Finish."
#> [1] "[2025-06-21 10:43:14.957422] Generating A3SS event id"
#> [1] "[2025-06-21 10:43:15.095208] Generating A3SS event id Finish."
#> [1] "[2025-06-21 10:43:15.096094] Generating A5SS event id"
#> [1] "[2025-06-21 10:43:15.20304] Generating A5SS event id Finish."
#> [1] "[2025-06-21 10:43:15.203696] Generating MXE event id"
#> [1] "[2025-06-21 10:43:15.224238] Generating MXE event id Finish."
#> [1] "Total splicing event: SE=1685 RI=187 A3SS=88 A5SS=69 MXE=104"
# Summary of detected events
print(event.path)
#> [1] "/disk/share/lvxuan/SCSES_test//events/"
event_files <- list.files(event.path, pattern = "*.txt")
for(file in event_files) {
events <- readLines(file.path(event.path, file))
cat(gsub(".txt", "", file), "events:", length(events), "\n")
}
#> A3SS events: 85
#> A5SS events: 69
#> MXE events: 104
#> RI events: 187
#> SE events: 1685Different types of splicing events will be saved to work_path/events/,
separately.
According to splicing events detected in the previous step, SCSES then quantify raw reads associated with these splicing events in each cell, and construct the raw read count matrix and calculate the raw PSI matrix.
The definition of PSI of different AS events: 
# An example output in our environment based on "cell_line.json" file in downloaded demo dataset
# Raw read counts and PSI calculation
rawrc.path = getRawRC(paras)
#> [1] "Checking events..."
#> [1] "event_type=A3SS;A5SS;MXE;RI;SE checked"
#> [1] "Checking cells..."
#> [1] "15 cells are considered."
#> [1] "Output: /disk/share/lvxuan/SCSES_test//splicing_value/"
#> [1] "Splicing event types: A3SS A5SS MXE RI SE"
#> [1] "[2025-06-21 10:43:15.26883] Counting reads of A3SS events..."
#> Warning in dir.create(outpath_per_cell):
#> '/disk/share/lvxuan/SCSES_test//splicing_value//A3SS_rjm' already exists
#> [1] "Reading RJC File Progress: 0%"
#> [1] "Reading RJC File Progress: 10%"
#> [1] "Reading RJC File Progress: 20%"
#> [1] "Reading RJC File Progress: 30%"
#> [1] "Reading RJC File Progress: 40%"
#> [1] "Reading RJC File Progress: 50%"
#> [1] "Reading RJC File Progress: 60%"
#> [1] "Reading RJC File Progress: 70%"
#> [1] "Reading RJC File Progress: 80%"
#> [1] "Reading RJC File Progress: 90%"
#> [1] "Reading RJC File Progress: 100%"
#> [1] "[2025-06-21 10:43:55.124573] Counting reads of A3SS events Finish."
#> [1] "[2025-06-21 10:43:55.124884] Counting reads of A5SS events..."
#> Warning in dir.create(outpath_per_cell):
#> '/disk/share/lvxuan/SCSES_test//splicing_value//A5SS_rjm' already exists
#> [1] "Reading RJC File Progress: 0%"
#> [1] "Reading RJC File Progress: 10%"
#> [1] "Reading RJC File Progress: 20%"
#> [1] "Reading RJC File Progress: 30%"
#> [1] "Reading RJC File Progress: 40%"
#> [1] "Reading RJC File Progress: 50%"
#> [1] "Reading RJC File Progress: 60%"
#> [1] "Reading RJC File Progress: 70%"
#> [1] "Reading RJC File Progress: 80%"
#> [1] "Reading RJC File Progress: 90%"
#> [1] "Reading RJC File Progress: 100%"
#> [1] "[2025-06-21 10:44:19.834326] Counting reads of A5SS events Finish."
#> [1] "[2025-06-21 10:44:19.834663] Counting reads of MXE events..."
#> Warning in dir.create(outpath_per_cell):
#> '/disk/share/lvxuan/SCSES_test//splicing_value//MXE_rjm' already exists
#> [1] "Reading RJC File Progress: 0%"
#> [1] "Reading RJC File Progress: 10%"
#> [1] "Reading RJC File Progress: 20%"
#> [1] "Reading RJC File Progress: 30%"
#> [1] "Reading RJC File Progress: 40%"
#> [1] "Reading RJC File Progress: 50%"
#> [1] "Reading RJC File Progress: 60%"
#> [1] "Reading RJC File Progress: 70%"
#> [1] "Reading RJC File Progress: 80%"
#> [1] "Reading RJC File Progress: 90%"
#> [1] "Reading RJC File Progress: 100%"
#> [1] "[2025-06-21 10:44:52.104853] Counting reads of MXE events Finish."
#> [1] "[2025-06-21 10:44:52.105268] Counting reads of RI events..."
#> Warning in dir.create(outpath_per_cell):
#> '/disk/share/lvxuan/SCSES_test//splicing_value//RI_rjm' already exists
#> [1] "Reading RJC File Progress: 0%"
#> [1] "Reading RJC File Progress: 10%"
#> [1] "Reading RJC File Progress: 10%"
#> [1] "Reading RJC File Progress: 20%"
#> [1] "Reading RJC File Progress: 30%"
#> [1] "Reading RJC File Progress: 30%"
#> [1] "Reading RJC File Progress: 40%"
#> [1] "Reading RJC File Progress: 50%"
#> [1] "Reading RJC File Progress: 50%"
#> [1] "Reading RJC File Progress: 60%"
#> [1] "Reading RJC File Progress: 70%"
#> [1] "Reading RJC File Progress: 70%"
#> [1] "Reading RJC File Progress: 80%"
#> [1] "Reading RJC File Progress: 90%"
#> [1] "Reading RJC File Progress: 90%"
#> [1] "Reading RJC File Progress:100%"
#> [1] "[2025-06-21 10:45:26.836598] Counting reads of RI events Finish."
#> [1] "[2025-06-21 10:45:26.837156] Counting reads of SE events..."
#> Warning in dir.create(outpath_per_cell):
#> '/disk/share/lvxuan/SCSES_test//splicing_value//SE_rjm' already exists
#> [1] "Reading RJC File Progress: 0%"
#> [1] "Reading RJC File Progress: 10%"
#> [1] "Reading RJC File Progress: 20%"
#> [1] "Reading RJC File Progress: 30%"
#> [1] "Reading RJC File Progress: 40%"
#> [1] "Reading RJC File Progress: 50%"
#> [1] "Reading RJC File Progress: 60%"
#> [1] "Reading RJC File Progress: 70%"
#> [1] "Reading RJC File Progress: 80%"
#> [1] "Reading RJC File Progress: 90%"
#> [1] "Reading RJC File Progress: 100%"
#> [1] "[2025-06-21 10:46:12.432916] Counting reads of SE events Finish."
rawpsi.path = getRawPSI(paras)
#> [1] "Checking raw reads..."
#> [1] "Checking events..."
#> [1] "event_type=A3SS;A5SS;MXE;RI;SE checked"
#> [1] "[2025-06-21 10:46:12.4353] Calculating PSI value of A3SS events..."
#> [1] "[2025-06-21 10:46:12.439237] Calculating PSI value of A3SS events Finish."
#> [1] "[2025-06-21 10:46:12.439812] Calculating PSI value of A5SS events..."
#> [1] "[2025-06-21 10:46:12.442417] Calculating PSI value of A5SS events Finish."
#> [1] "[2025-06-21 10:46:12.44266] Calculating PSI value of MXE events..."
#> [1] "[2025-06-21 10:46:12.448437] Calculating PSI value of MXE events Finish."
#> [1] "[2025-06-21 10:46:12.448707] Calculating PSI value of RI events..."
#> [1] "[2025-06-21 10:46:12.453605] Calculating PSI value of RI events Finish."
#> [1] "[2025-06-21 10:46:12.453859] Calculating PSI value of SE events..."
#> [1] "[2025-06-21 10:46:12.500773] Calculating PSI value of SE events Finish."
# Merge and preprocess data
rawrds.path = mergeSplicingValue(paras)
processed.data.path = preprocessEvent(paras)
#> [1] "[2025-06-21 10:46:12.681954] Processing raw data..."
#> [1] "Input: /disk/share/lvxuan/SCSES_test//rds/"
#> [1] "Output: /disk/share/lvxuan/SCSES_test//rds_processed/"
#> [1] "Before filtering"
#> [1] "expr: 5152*15"
#> [1] "psi: 2130*15"
#> [1] "rc: 4859*15"
#> [1] "After filtering"
#> [1] "expr: 1884*15"
#> [1] "psi: 786*15"
#> [1] "rc: 1895*15"
#> [1] "[2025-06-21 10:46:13.195329] Successfully processed data."
print(rawrds.path)
#> [1] "/disk/share/lvxuan/SCSES_test//rds/"
print(processed.data.path)
#> [1] "/disk/share/lvxuan/SCSES_test//rds_processed/"
# Examine processed data
psi_processed <- readRDS(file.path(processed.data.path, "psi.rds"))
rc_processed <- readRDS(file.path(processed.data.path, "rc.rds"))
cat("Processed PSI matrix:", dim(psi_processed), "\n")
#> Processed PSI matrix: 786 15
cat("Processed RC matrix:", dim(rc_processed), "\n")
#> Processed RC matrix: 1895 15Raw read count matrix, PSI matrix, and event annotation will be saved to
work_path/rds/.
Then, data after quality control process will be saved to
work_path/rds_processed/, which will be used for subsequent
calculations.
To overcome the high dropout rate and limited read coverage of scRNA-seq techniques, SCSES constructs cell similarity and event similarity networks by K-nearest neighbor algorithm (KNN) to learn information from similar cells and events.
# An example output in our environment based on "cell_line.json" file in downloaded demo dataset
# Cell similarity networks (An example output by cell_line.json file in our environment)
cellnet.path = getCellSimilarity(paras)
#> [1] "[2025-06-21 10:46:13.247895] Calculate cell similarity..."
#> [1] "Input: /disk/share/lvxuan/SCSES_test//rds_processed/"
#> [1] "Output: /disk/share/lvxuan/SCSES_test//imputation/cell_similarity/"
#> [1] "feature_num=1000 checked"
#> [1] "cell_similarity_data=EXP_RBP checked"
#> [1] "distance_method=euclidean checked"
#> [1] "alpha_cell=0.8 checked"
#> [1] "kcell_max=8 checked"
#> [1] "kcell_min=3 checked"
#> [1] "decay_cell=0.05 checked"
#> [1] "Checking data: EXP_RBP"
#> [1] "134 rbps are used to calculate cell similarity"
#> [1] "[2025-06-21 10:46:13.301823] Computing cell similarity based on EXP_RBP"
#> [1] "Calculate similarity among 15 cells."
#> [1] "The number of features is greater than the number of rows in the input data."
#> [1] "Total 134 features will be used"
#> Delta: [0.19279533 0.03687512]
#> [1] "[2025-06-21 10:46:43.595692] Calculate cell similarity Finish."
# Event similarity networks
eventnet.path = getEventSimilarity(paras)
#> [1] "[2025-06-21 10:46:43.602822] Calculate events KNN..."
#> [1] "Input: /disk/share/lvxuan/SCSES_test//rds_processed/"
#> [1] "Output: /disk/share/lvxuan/SCSES_test//imputation/event_similarity/"
#> [1] "alpha_event=0.8 checked"
#> [1] "kevent=5 checked"
#> [1] "decay_event=0.05 checked"
#> [1] "Checking data..."
#> [1] "Checking events..."
#> [1] "event_type=A3SS;A5SS;MXE;RI;SE checked"
#> [1] "[2025-06-21 10:47:14.606468] Calculate events feature..."
#> [1] "[2025-06-21 10:47:14.607387] step1 Creating BSgenome for hg19======="
#> Warning: previous export ''hg19'' is being replaced
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#> match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
#> table, tapply, union, unique, unsplit, which.max, which.min
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> expand.grid, I, unname
#>
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#>
#> strsplit
#>
#> Attaching package: 'rtracklayer'
#> The following object is masked from 'package:BiocIO':
#>
#> FileForFormat
#> [1] "[2025-06-21 10:47:15.159689] step2 Extracting features ======="
#> [1] "[2025-06-21 10:47:15.160183] Extracting A3SS features..."
#> [1] "[2025-06-21 10:47:15.160415] Loading events..."
#> [1] "[2025-06-21 10:47:40.906637] Parsing events region..."
#> [1] "[2025-06-21 10:48:33.698315] Extracting length features."
#> [1] "[2025-06-21 10:48:33.81657] Extracting motif features."
#> [1] "[2025-06-21 10:48:33.831861] Extracting conservation features."
#> [1] "Checking chromosome prefix..."
#> [1] "[2025-06-21 10:48:35.657475] Extracting kmer features."
#> [1] "[2025-06-21 10:48:35.700645] Extracting A Ratio features."
#> [1] "[2025-06-21 10:49:32.600886] Saving Result"
#> [1] "[2025-06-21 10:49:32.658264] Extracting A3SS features Finished"
#> [1] "[2025-06-21 10:49:32.659152] Extracting A5SS features..."
#> [1] "[2025-06-21 10:49:32.659365] Loading events..."
#> [1] "[2025-06-21 10:49:59.557972] Parsing events region..."
#> [1] "[2025-06-21 10:51:01.632601] Extracting length features."
#> [1] "[2025-06-21 10:51:01.709626] Extracting motif features."
#> [1] "[2025-06-21 10:51:01.727007] Extracting conservation features."
#> [1] "Checking chromosome prefix..."
#> [1] "[2025-06-21 10:51:03.063393] Extracting kmer features."
#> [1] "[2025-06-21 10:51:03.078054] Extracting A Ratio features."
#> [1] "[2025-06-21 10:51:59.009568] Saving Result"
#> [1] "[2025-06-21 10:51:59.055757] Extracting A5SS features Finished"
#> [1] "[2025-06-21 10:51:59.056795] Extracting MXE features..."
#> [1] "[2025-06-21 10:51:59.057226] Loading events..."
#> [1] "[2025-06-21 10:52:27.289582] Parsing events region..."
#> [1] "[2025-06-21 10:53:25.824466] Extracting length features."
#> [1] "[2025-06-21 10:53:25.886165] Extracting motif features."
#> [1] "[2025-06-21 10:53:25.919215] Extracting conservation features."
#> [1] "Checking chromosome prefix..."
#> [1] "[2025-06-21 10:53:27.103045] Extracting kmer features."
#> [1] "[2025-06-21 10:53:27.119756] Extracting A Ratio features."
#> [1] "[2025-06-21 10:53:27.119961] Saving Result"
#> [1] "[2025-06-21 10:53:27.125623] Extracting MXE features Finished"
#> [1] "[2025-06-21 10:53:27.125858] Extracting RI features..."
#> [1] "[2025-06-21 10:53:27.126015] Loading events..."
#> [1] "[2025-06-21 10:53:45.861901] Parsing events region..."
#> [1] "[2025-06-21 10:54:46.933083] Extracting length features."
#> [1] "[2025-06-21 10:54:47.164056] Extracting motif features."
#> [1] "[2025-06-21 10:54:47.174732] Extracting conservation features."
#> [1] "Checking chromosome prefix..."
#> [1] "[2025-06-21 10:54:50.04962] Extracting kmer features."
#> [1] "[2025-06-21 10:54:50.075939] Extracting A Ratio features."
#> [1] "[2025-06-21 10:55:16.944149] Saving Result"
#> [1] "[2025-06-21 10:55:17.063064] Extracting RI features Finished"
#> [1] "[2025-06-21 10:55:17.063729] Extracting SE features..."
#> [1] "[2025-06-21 10:55:17.063964] Loading events..."
#> [1] "[2025-06-21 10:55:45.027923] Parsing events region..."
#> [1] "[2025-06-21 10:57:02.202212] Extracting length features."
#> [1] "[2025-06-21 10:57:03.43174] Extracting motif features."
#> [1] "[2025-06-21 10:57:03.460047] Extracting conservation features."
#> [1] "Checking chromosome prefix..."
#> [1] "[2025-06-21 10:57:13.119067] Extracting kmer features."
#> [1] "[2025-06-21 10:57:13.157136] Extracting A Ratio features."
#> [1] "[2025-06-21 10:57:13.157481] Saving Result"
#> [1] "[2025-06-21 10:57:13.430198] Extracting SE features Finished"
#> [1] "[2025-06-21 10:57:13.430578] step3 Combining events feature ======="
#> [1] "[2025-06-21 10:57:13.430785] Parsing A3SS features..."
#> [1] "[2025-06-21 10:57:13.541589] Parsing A3SS features Finished"
#> [1] "[2025-06-21 10:57:13.542378] Parsing A5SS features..."
#> [1] "[2025-06-21 10:57:13.613811] Parsing A5SS features Finished"
#> [1] "[2025-06-21 10:57:13.614898] Parsing MXE features..."
#> [1] "[2025-06-21 10:57:13.65279] Parsing MXE features Finished"
#> [1] "[2025-06-21 10:57:13.653684] Parsing RI features..."
#> [1] "[2025-06-21 10:57:13.860466] Parsing RI features Finished"
#> [1] "[2025-06-21 10:57:13.860842] Parsing SE features..."
#> [1] "[2025-06-21 10:57:14.452354] Parsing SE features Finished"
#> [1] "[2025-06-21 10:57:14.454955] step4 Encoding events feature ======="
#> [1] "[2025-06-21 10:57:14.455861] A3SS event encoding..."
#> 1/2 [==============>...............] - ETA: 0s����������������������������������������������2/2 [==============================] - 0s 2ms/step
#> [1] "[2025-06-21 10:57:25.517522] A3SS event encoding Finish."
#> [1] "[2025-06-21 10:57:25.518329] A5SS event encoding..."
#> 1/1 [==============================] - ETA: 0s����������������������������������������������1/1 [==============================] - 0s 82ms/step
#> [1] "[2025-06-21 10:57:33.311689] A5SS event encoding Finish."
#> [1] "[2025-06-21 10:57:33.312219] MXE event encoding..."
#> 1/1 [==============================] - ETA: 0s����������������������������������������������1/1 [==============================] - 0s 75ms/step
#> [1] "[2025-06-21 10:57:41.15544] MXE event encoding Finish."
#> [1] "[2025-06-21 10:57:41.156186] RI event encoding..."
#> 1/4 [======>.......................] - ETA: 0s����������������������������������������������4/4 [==============================] - 0s 2ms/step
#> [1] "[2025-06-21 10:57:52.898465] RI event encoding Finish."
#> [1] "[2025-06-21 10:57:52.899267] SE event encoding..."
#> 1/19 [>.............................] - ETA: 1s������������������������������������������������19/19 [==============================] - 0s 2ms/step
#> [1] "[2025-06-21 10:58:03.528219] SE event encoding Finish."
#> [1] "[2025-06-21 10:58:03.529057] step5 Calculate splicing regulation distance and Combine distance ======="
#> [1] "134 rbps are used to calculate splicing regulation information"
#> [1] "Save data"
#> [1] "Save data Finished"
#> [1] "[2025-06-21 10:59:26.820986] step6 Calculate combined event similarity ======="
#> [1] "[2025-06-21 10:59:29.603231] Calculate A3SS event Similarity"
#>
#> Attaching package: 'Matrix'
#> The following object is masked from 'package:S4Vectors':
#>
#> expand
#> [1] "[2025-06-21 10:59:43.621615] Calculate A3SS event Similarity Finished"
#> [1] "[2025-06-21 10:59:45.52404] Calculate A5SS event Similarity"
#> [1] "[2025-06-21 10:59:58.664551] Calculate A5SS event Similarity Finished"
#> [1] "[2025-06-21 11:00:00.582568] Calculate MXE event Similarity"
#> [1] "[2025-06-21 11:00:13.137459] Calculate MXE event Similarity Finished"
#> [1] "[2025-06-21 11:00:15.122637] Calculate RI event Similarity"
#> [1] "[2025-06-21 11:00:28.183296] Calculate RI event Similarity Finished"
#> [1] "[2025-06-21 11:00:29.952301] Calculate SE event Similarity"
#> [1] "[2025-06-21 11:00:42.973985] Calculate SE event Similarity Finished"
#> [1] "[2025-06-21 11:00:42.974376] Calculate event similarity Finished."
# Examine network structure
print(cellnet.path)
#> [1] "/disk/share/lvxuan/SCSES_test//imputation/cell_similarity/"
cell_networks <- readRDS(file.path(cellnet.path, "cell.similars.rds"))
event_networks <- readRDS(file.path(eventnet.path, "event.similars.rds"))
## three different features can be chosen to quantify cell similarity, including raw event PSI(PSI), and raw junction read counts(RC), and RBP expression(EXP_RBP)
cat("Cell similarity types:", names(cell_networks), "\n")
#> Cell similarity types: EXP_RBP
## different splicing event types
cat("Event types:", names(event_networks), "\n")
#> Event types: A3SS A5SS MXE RI SEA list of event similarity for different types of splicing events will
be saved to work_path/imputation/event_similarity/event.similars.rds;
A list of different types of cell similarity and a list of the number of
neighbors will be saved to rds file
towork_path/imputation/cell_similarity/cell.similars.rds and
work_path/imputation/cell_similarity/dyk.cell.rds
In this step, some parameters can be adjusted in config file or the function parameters directly:
For cell similarity networks, SCSES can use RBP expressions, Raw read count or Raw PSI to measure cell similarities.
| Parameter | Default | Description |
|---|---|---|
feature_num |
1000 | Number of high-variable features for PCA |
cell_similarity_data |
“EXP_RBP” | Data types for cell similarity |
distance_method |
“euclidean” | Distance metric |
alpha_cell |
0.8 | 1 - Random walk restart probability |
kcell_min |
5 | Minimum cell neighbors |
kcell_max |
50 | Maximum cell neighbors |
decay_cell |
0.05 | Convergence threshold |
For event similarity networks, Event similarities are defined by the RBP regulatory correlations and an embedding representation by integrating event sequence similarities.
| Parameter | Default | Description |
|---|---|---|
kevent |
10 | Number of event neighbors |
alpha_event |
0.8 | 1 - Random walk restart probability |
decay_event |
0.05 | Convergence threshold |
Based on these weighted similarity networks, SCSES next will use three imputation strategies to aggregate the information across similar cells or events to impute read count or PSI value
# An example output in our environment based on "cell_line.json" file in downloaded demo dataset
# Three-strategy imputation (An example output by cell_line.json file in our environment)
Imputed.data.path = ImputationAll(paras)
#> [1] "[2025-06-21 11:00:44.883927] Get imputed result using cell similarity and event similarity."
#> [1] "Checking data..."
#> [1] "rc checked"
#> [1] "psi checked"
#> [1] "event.info checked"
#> [1] "cell similarity checked"
#> [1] "event similarity checked"
#> [1] "decay=0.05 checked"
#> [1] "Checking event type"
#> [1] "event_type=A3SS;A5SS;MXE;RI;SE checked"
#> [1] "Checking cell similarity type"
#> [1] "cell_similarity_data=EXP_RBP checked"
#> Warning in dir.create(output_path): '/disk/share/lvxuan/SCSES_test//imputation'
#> already exists
#> [1] "Output: /disk/share/lvxuan/SCSES_test//imputation/"
#> [1] "[2025-06-21 11:00:46.852427] Running Event_type=A3SS;cell_similarity_feature=EXP_RBP"
#> [1] "[2025-06-21 11:00:46.857765] Save data"
#> [1] "[2025-06-21 11:00:46.875521] Save data Finished"
#> [1] "bash /tmp/RtmpysvzUv/temp_libpath31782430f4c0d0/SCSES/matlab/scses/run_scses.sh /disk/software/matlab2022 /disk/share/lvxuan/SCSES_test//imputation//imputation_data_3243192-500000288.h5 /disk/share/lvxuan/SCSES_test//imputation//imputation_result_3243192-500000288.mat >> /disk/share/lvxuan/SCSES_test//imputation//mat_Imputation.log 2>&1"
#> [1] "[2025-06-21 11:01:10.399819] Running Event_type=A5SS;cell_similarity_feature=EXP_RBP"
#> [1] "[2025-06-21 11:01:10.402676] Save data"
#> [1] "[2025-06-21 11:01:10.421468] Save data Finished"
#> [1] "bash /tmp/RtmpysvzUv/temp_libpath31782430f4c0d0/SCSES/matlab/scses/run_scses.sh /disk/software/matlab2022 /disk/share/lvxuan/SCSES_test//imputation//imputation_data_3243192-500004466.h5 /disk/share/lvxuan/SCSES_test//imputation//imputation_result_3243192-500004466.mat >> /disk/share/lvxuan/SCSES_test//imputation//mat_Imputation.log 2>&1"
#> [1] "[2025-06-21 11:01:27.845692] Running Event_type=MXE;cell_similarity_feature=EXP_RBP"
#> [1] "[2025-06-21 11:01:27.846703] Save data"
#> [1] "[2025-06-21 11:01:27.860995] Save data Finished"
#> [1] "bash /tmp/RtmpysvzUv/temp_libpath31782430f4c0d0/SCSES/matlab/scses/run_scses.sh /disk/software/matlab2022 /disk/share/lvxuan/SCSES_test//imputation//imputation_data_3243192-499996103.h5 /disk/share/lvxuan/SCSES_test//imputation//imputation_result_3243192-499996103.mat >> /disk/share/lvxuan/SCSES_test//imputation//mat_Imputation.log 2>&1"
#> [1] "[2025-06-21 11:01:49.569937] Running Event_type=RI;cell_similarity_feature=EXP_RBP"
#> [1] "[2025-06-21 11:01:49.572466] Save data"
#> [1] "[2025-06-21 11:01:49.591659] Save data Finished"
#> [1] "bash /tmp/RtmpysvzUv/temp_libpath31782430f4c0d0/SCSES/matlab/scses/run_scses.sh /disk/software/matlab2022 /disk/share/lvxuan/SCSES_test//imputation//imputation_data_3243192-500002224.h5 /disk/share/lvxuan/SCSES_test//imputation//imputation_result_3243192-500002224.mat >> /disk/share/lvxuan/SCSES_test//imputation//mat_Imputation.log 2>&1"
#> [1] "[2025-06-21 11:02:12.385537] Running Event_type=SE;cell_similarity_feature=EXP_RBP"
#> [1] "[2025-06-21 11:02:12.393032] Save data"
#> [1] "[2025-06-21 11:02:12.420549] Save data Finished"
#> [1] "bash /tmp/RtmpysvzUv/temp_libpath31782430f4c0d0/SCSES/matlab/scses/run_scses.sh /disk/software/matlab2022 /disk/share/lvxuan/SCSES_test//imputation//imputation_data_3243192-499958234.h5 /disk/share/lvxuan/SCSES_test//imputation//imputation_result_3243192-499958234.mat >> /disk/share/lvxuan/SCSES_test//imputation//mat_Imputation.log 2>&1"
#> [1] "[2025-06-21 11:02:31.272822] Get imputed result using cell similarity and event similarity Finish."
# Examine imputation results
Imputed_seperated = readRDS(Imputed.data.path)
str(Imputed_seperated,max.level=3)
#> List of 2
#> $ cell :List of 2
#> ..$ EXP_RBP_PSI: num [1:786, 1:15] 0.915 0.471 0.574 0.876 0.473 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> ..$ EXP_RBP_RC : num [1:786, 1:15] 1 0.535 0.817 0.71 1 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> $ cell_event:List of 1
#> ..$ EXP_RBP_PSI: num [1:786, 1:15] 0.544 0.6 0.543 0.53 0.551 ...
#> .. ..- attr(*, "dimnames")=List of 2
# explain:
# For example:
# cell/PSI_PSI: PSI value is used to quantify cell-cell splicing similarity. Impute raw PSI with cell similarities.
# cell/PSI_PSI: PSI value is used to quantify cell-cell splicing similarity. Impute raw inclusion and exclusion read counts with cell similarities, and then calculate the imputed PSI.
# cell_event/PSI_PSI: PSI value is used to quantify cell-cell splicing similarity. Impute raw inclusion and exclusion read counts with cell similarities, and then calculate the imputed PSI. Further impute the results using event similarities.Results of each imputation strategy will be saved
intowork_path/imputation.
We recommend different imputation strategies for four scenarios defined by the abundance of reads counts in the target cell and neighbor cells (ND, BD, TD+Info, and TD-Info). SCSES pre-trains models to predict the probability of specific scenario for each cell-event pair. Finally, SCSES calculates the PSI value using a linear combination of predictions from the four strategies, weighted by these probabilities.
# An example output in our environment based on "cell_line.json" file in downloaded demo dataset
# Final estimation with scenario classification
# rds_imputed_file: path to the list of three imputation strategies results generated in the previous step
Imputed.data.final.path = Estimation(paras,rds_imputed_file = Imputed.data.path)
#> [1] "[2025-06-21 11:02:31.352066] Combine imputed psi."
#> [1] "[2025-06-21 11:02:31.352485] Loading data..."
#> [1] "Input: /disk/share/lvxuan/SCSES_test//imputation//Imputed_seperated_499996563.rds"
#> [1] "Checking data..."
#> [1] "rc checked"
#> [1] "psi checked"
#> [1] "expr checked"
#> [1] "event checked"
#> [1] "cell similarity checked"
#> [1] "dynamic cell knn checked"
#> [1] "Pre-trained model will be used."
#> [1] "classifer checked"
#> [1] "Output: /disk/share/lvxuan/SCSES_test//imputation/"
#> [1] "Checking cell similarity type"
#> [1] "cell_similarity_data=EXP_RBP checked"
#> [1] "[2025-06-21 11:02:33.508113] Combine imputed psi Finish."
# Examine final results
Imputed_combined = readRDS(Imputed.data.final.path)
str(Imputed_combined,max.level=2)
#> List of 1
#> $ EXP_RBP: num [1:786, 1:15] 1 0.471 0.574 0.876 1 ...
#> ..- attr(*, "dimnames")=List of 2
# The finally imputed PSI were named by cell-cell splicing similarity features, including raw event PSI(PSI), and raw junction read counts(RC), and RBP expression(EXP_RBP).A list of final imputation of PSI values will be saved by serialized R
object format (.rds) in work_path/imputation/Imputed_combined*, where
* is a random number representing different execution. The .rds
file can be loaded in R environment by readRDS function.
To improve the fitness of models for a new dataset, we also provide a procedure to fine-tune the model. For this analysis, we first build a reference using a set of splicing events with conserved splicing levels in different human tissues (https://zenodo.org/records/6408906). Then we compare the splicing level in a new dataset with the reference records, and give the scenarios definition to each event-cell pair, which is used to fine-tune the pre-trained model.
The commands to perform these analyses:
# An example output in our environment based on "cell_line.json" file in downloaded demo dataset
# Final estimation with scenario classification
ftrc.path = getFtRawRC(paras)
#> [1] "Loading splicing events for classifer fine tune..."
#> [1] "Checking cells..."
#> [1] "15 cells are considered."
#> [1] "Output: /disk/share/lvxuan/SCSES_test//splicing_value_ft/"
#> [1] "[2025-06-21 11:02:33.541655] Counting reads of A3SS events..."
#> Warning in dir.create(outpath_per_cell):
#> '/disk/share/lvxuan/SCSES_test//splicing_value_ft//A3SS_rjm' already exists
#> [1] "Reading RJC File Progress: 0%"
#> [1] "Reading RJC File Progress: 10%"
#> [1] "Reading RJC File Progress: 20%"
#> [1] "Reading RJC File Progress: 30%"
#> [1] "Reading RJC File Progress: 40%"
#> [1] "Reading RJC File Progress: 50%"
#> [1] "Reading RJC File Progress: 60%"
#> [1] "Reading RJC File Progress: 70%"
#> [1] "Reading RJC File Progress: 80%"
#> [1] "Reading RJC File Progress: 90%"
#> [1] "Reading RJC File Progress: 100%"
#> [1] "[2025-06-21 11:03:03.526749] Counting reads of A3SS events Finish."
#> [1] "[2025-06-21 11:03:03.527112] Counting reads of A5SS events..."
#> Warning in dir.create(outpath_per_cell):
#> '/disk/share/lvxuan/SCSES_test//splicing_value_ft//A5SS_rjm' already exists
#> [1] "Reading RJC File Progress: 0%"
#> [1] "Reading RJC File Progress: 10%"
#> [1] "Reading RJC File Progress: 20%"
#> [1] "Reading RJC File Progress: 30%"
#> [1] "Reading RJC File Progress: 40%"
#> [1] "Reading RJC File Progress: 50%"
#> [1] "Reading RJC File Progress: 60%"
#> [1] "Reading RJC File Progress: 70%"
#> [1] "Reading RJC File Progress: 80%"
#> [1] "Reading RJC File Progress: 90%"
#> [1] "Reading RJC File Progress: 100%"
#> [1] "[2025-06-21 11:03:33.589083] Counting reads of A5SS events Finish."
#> [1] "[2025-06-21 11:03:33.589473] Counting reads of MXE events..."
#> Warning in dir.create(outpath_per_cell):
#> '/disk/share/lvxuan/SCSES_test//splicing_value_ft//MXE_rjm' already exists
#> [1] "Reading RJC File Progress: 0%"
#> [1] "Reading RJC File Progress: 10%"
#> [1] "Reading RJC File Progress: 20%"
#> [1] "Reading RJC File Progress: 30%"
#> [1] "Reading RJC File Progress: 40%"
#> [1] "Reading RJC File Progress: 50%"
#> [1] "Reading RJC File Progress: 60%"
#> [1] "Reading RJC File Progress: 70%"
#> [1] "Reading RJC File Progress: 80%"
#> [1] "Reading RJC File Progress: 90%"
#> [1] "Reading RJC File Progress: 100%"
#> [1] "[2025-06-21 11:04:01.860919] Counting reads of MXE events Finish."
#> [1] "[2025-06-21 11:04:01.86135] Counting reads of SE events..."
#> Warning in dir.create(outpath_per_cell):
#> '/disk/share/lvxuan/SCSES_test//splicing_value_ft//SE_rjm' already exists
#> [1] "Reading RJC File Progress: 0%"
#> [1] "Reading RJC File Progress: 10%"
#> [1] "Reading RJC File Progress: 20%"
#> [1] "Reading RJC File Progress: 30%"
#> [1] "Reading RJC File Progress: 40%"
#> [1] "Reading RJC File Progress: 50%"
#> [1] "Reading RJC File Progress: 60%"
#> [1] "Reading RJC File Progress: 70%"
#> [1] "Reading RJC File Progress: 80%"
#> [1] "Reading RJC File Progress: 90%"
#> [1] "Reading RJC File Progress: 100%"
#> [1] "[2025-06-21 11:04:30.597954] Counting reads of SE events Finish."
ftpsi.path = getFtRawPSI(paras)
#> [1] "Checking raw reads..."
#> [1] "Loading splicing events for classifer fine tune..."
#> [1] "[2025-06-21 11:04:30.609568] Calculating PSI value of A3SS events..."
#> [1] "[2025-06-21 11:04:30.612852] Calculating PSI value of A3SS events Finish."
#> [1] "[2025-06-21 11:04:30.613102] Calculating PSI value of A5SS events..."
#> [1] "[2025-06-21 11:04:30.629161] Calculating PSI value of A5SS events Finish."
#> [1] "[2025-06-21 11:04:30.629678] Calculating PSI value of MXE events..."
#> [1] "[2025-06-21 11:04:30.631177] Calculating PSI value of MXE events Finish."
#> [1] "[2025-06-21 11:04:30.631451] Calculating PSI value of SE events..."
#> [1] "[2025-06-21 11:04:30.635654] Calculating PSI value of SE events Finish."
ftrds.path = mergeFtSplicingValue(paras)
ftmodel.path = FtClassifier(paras)
#> [1] "Reading true Ft PSI..."
#> [1] "Loading Pre-training classifer..."
#> [1] "[2025-06-21 11:04:30.723374] Classifer fine tune"
#> [1] "[2025-06-21 11:04:30.723963] Processing raw Ft data..."
#> [1] "Checking data..."
#> [1] "Checking cell similarity type"
#> [1] "cell_similarity_data=EXP_RBP checked"
#> [1] "Calculating Classifier Features..."
#> [1] "[2025-06-21 11:04:32.993363] Save data"
#> [1] "[2025-06-21 11:04:33.009583] Save data Finished"
#> [1] "bash /tmp/RtmpysvzUv/temp_libpath31782430f4c0d0/SCSES/matlab/imputation1/run_imputation1.sh /disk/software/matlab2022 /disk/share/lvxuan/SCSES_test//classifer//imputation1_data_3243192-500012479.h5 /disk/share/lvxuan/SCSES_test//classifer//imputation1_result_3243192-500012479.mat >> /disk/share/lvxuan/SCSES_test//classifer//mat_calculateFtFeature.log 2>&1"
#> [1] "[2025-06-21 11:05:10.870352] Save data"
#> [1] "[2025-06-21 11:05:10.887563] Save data Finished"
#> [1] "bash /tmp/RtmpysvzUv/temp_libpath31782430f4c0d0/SCSES/matlab/imputation1/run_imputation1.sh /disk/software/matlab2022 /disk/share/lvxuan/SCSES_test//classifer//imputation1_data_3243192-500021749.h5 /disk/share/lvxuan/SCSES_test//classifer//imputation1_result_3243192-500021749.mat >> /disk/share/lvxuan/SCSES_test//classifer//mat_calculateFtFeature.log 2>&1"
#> [1] "[2025-06-21 11:05:48.430875] Save data"
#> [1] "[2025-06-21 11:05:48.448767] Save data Finished"
#> [1] "bash /tmp/RtmpysvzUv/temp_libpath31782430f4c0d0/SCSES/matlab/imputation1/run_imputation1.sh /disk/software/matlab2022 /disk/share/lvxuan/SCSES_test//classifer//imputation1_data_3243192-500024016.h5 /disk/share/lvxuan/SCSES_test//classifer//imputation1_result_3243192-500024016.mat >> /disk/share/lvxuan/SCSES_test//classifer//mat_calculateFtFeature.log 2>&1"
#> [1] "[2025-06-21 11:06:24.36141] Save data"
#> [1] "[2025-06-21 11:06:24.375075] Save data Finished"
#> [1] "bash /tmp/RtmpysvzUv/temp_libpath31782430f4c0d0/SCSES/matlab/imputation1/run_imputation1.sh /disk/software/matlab2022 /disk/share/lvxuan/SCSES_test//classifer//imputation1_data_3243192-499998536.h5 /disk/share/lvxuan/SCSES_test//classifer//imputation1_result_3243192-499998536.mat >> /disk/share/lvxuan/SCSES_test//classifer//mat_calculateFtFeature.log 2>&1"
#> [1] "[2025-06-21 11:07:01.843674] Model training;similarity_type=EXP_RBP"
#> [1] "[2025-06-21 11:07:03.448754] Classifer fine tune Finish."
#rds_imputed_file: path to the list of three imputation strategies results generated in the previous step
ImputedFt.data.final.path = Estimation(paras,rds_imputed_file = Imputed.data.path)
#> [1] "[2025-06-21 11:07:03.450292] Combine imputed psi."
#> [1] "[2025-06-21 11:07:03.450469] Loading data..."
#> [1] "Input: /disk/share/lvxuan/SCSES_test//imputation//Imputed_seperated_499996563.rds"
#> [1] "Checking data..."
#> [1] "rc checked"
#> [1] "psi checked"
#> [1] "expr checked"
#> [1] "event checked"
#> [1] "cell similarity checked"
#> [1] "dynamic cell knn checked"
#> [1] "Fine tune model will be used."
#> [1] "classifer checked"
#> [1] "Output: /disk/share/lvxuan/SCSES_test//imputation/"
#> [1] "Checking cell similarity type"
#> [1] "cell_similarity_data=EXP_RBP checked"
#> [1] "[2025-06-21 11:07:05.461887] Combine imputed psi Finish."
print(ImputedFt.data.final.path)
#> [1] "/disk/share/lvxuan/SCSES_test//imputation//Imputed_combined_500023390.rds"
# Examine final results
Imputed_combined = readRDS(ImputedFt.data.final.path)
str(Imputed_combined,max.level=2)
#> List of 1
#> $ EXP_RBP: num [1:786, 1:15] 1 0.471 0.574 0.876 1 ...
#> ..- attr(*, "dimnames")=List of 2
Imputed_combined[["EXP_RBP"]][1:3,1:3]
#> SRR11826368.bam
#> isoform1=exon:chr1:151139802-151139890:+@junction:chr1:151139891-151140624:+@exon:chr1:151140625-151140814:+|isoform2=exon:chr1:151139802-151139890:+@junction:chr1:151139891-151140619:+@exon:chr1:151140620-151140814:+|SCNM1|A3SS 1.0000000
#> isoform1=exon:chr1:153610771-153610924:+@junction:chr1:153610925-153614721:+@exon:chr1:153614722-153614905:+|isoform2=exon:chr1:153610771-153610924:+@junction:chr1:153610925-153614718:+@exon:chr1:153614719-153614905:+|CHTOP|A3SS 0.4709326
#> isoform1=exon:chr1:169772310-169772450:+@junction:chr1:169772451-169773252:+@exon:chr1:169773253-169773381:+|isoform2=exon:chr1:169772310-169772450:+@junction:chr1:169772451-169773215:+@exon:chr1:169773216-169773381:+|C1orf112|A3SS 0.5737308
#> SRR11826371.bam
#> isoform1=exon:chr1:151139802-151139890:+@junction:chr1:151139891-151140624:+@exon:chr1:151140625-151140814:+|isoform2=exon:chr1:151139802-151139890:+@junction:chr1:151139891-151140619:+@exon:chr1:151140620-151140814:+|SCNM1|A3SS 1.0000000
#> isoform1=exon:chr1:153610771-153610924:+@junction:chr1:153610925-153614721:+@exon:chr1:153614722-153614905:+|isoform2=exon:chr1:153610771-153610924:+@junction:chr1:153610925-153614718:+@exon:chr1:153614719-153614905:+|CHTOP|A3SS 0.4473823
#> isoform1=exon:chr1:169772310-169772450:+@junction:chr1:169772451-169773252:+@exon:chr1:169773253-169773381:+|isoform2=exon:chr1:169772310-169772450:+@junction:chr1:169772451-169773215:+@exon:chr1:169773216-169773381:+|C1orf112|A3SS 1.0000000
#> SRR11826409.bam
#> isoform1=exon:chr1:151139802-151139890:+@junction:chr1:151139891-151140624:+@exon:chr1:151140625-151140814:+|isoform2=exon:chr1:151139802-151139890:+@junction:chr1:151139891-151140619:+@exon:chr1:151140620-151140814:+|SCNM1|A3SS 1.0000000
#> isoform1=exon:chr1:153610771-153610924:+@junction:chr1:153610925-153614721:+@exon:chr1:153614722-153614905:+|isoform2=exon:chr1:153610771-153610924:+@junction:chr1:153610925-153614718:+@exon:chr1:153614719-153614905:+|CHTOP|A3SS 0.4941710
#> isoform1=exon:chr1:169772310-169772450:+@junction:chr1:169772451-169773252:+@exon:chr1:169773253-169773381:+|isoform2=exon:chr1:169772310-169772450:+@junction:chr1:169772451-169773215:+@exon:chr1:169773216-169773381:+|C1orf112|A3SS 0.8639692Here is an example of UMAP visualization based on test data:
install.packages("umap")library(umap)
library(ggplot2)
calcu_umap<-function(data,n_neighbors){
set.seed(12345)
data = data[which(apply(data, 1, mean) != 0), ]
data = data[which(apply(data, 1, var) != 0), ]
D_Reduct_res <- prcomp(t(data), center = T, scale. = T)
sdev <- D_Reduct_res$sdev
var_prop <- sdev^2 / sum(sdev^2)
cumulative_variance <- cumsum(var_prop)
n_components <- which(cumulative_variance >= 0.7)[1]
D_Reduct <- D_Reduct_res$x[, 1:n_components]
umap_res <- umap::umap(D_Reduct,n_neighbors = n_neighbors)
input <- umap_res$layout
df<-as.data.frame(input)
row.names(df)<-colnames(data)
return(df)
}
mycol=c("#fbb45d","#699ed4","#ef8183")
# Load data and annotations
if(!file.exists(paste0(system.file("analysis", package = "SCSES"),'/annotation.txt'))) {
stop("Annotation file not found")
}
if(!file.exists(ImputedFt.data.final.path)) {
stop("Imputed data file not found")
}
annotation=read.table(paste0(system.file("analysis", package = "SCSES"),'/annotation.txt'),sep="\t")
# Imputed_combined=readRDS(paste0(system.file("analysis", package = "SCSES"),'/Imputed_combined_499997048.rds'))
Imputed_combined=readRDS(ImputedFt.data.final.path)
data_umap=calcu_umap(Imputed_combined[[1]],n_neighbors = 3)
row.names(data_umap)=gsub(".bam","",row.names(data_umap))
data_umap$group=annotation$V2[match(row.names(data_umap),annotation$V1)]
p=ggplot(data = data_umap,aes(x =V1 ,y =V2))+
geom_point(aes(fill=group),shape=21,size=1.5,stroke=0.05)+
scale_fill_manual(values = mycol)+
xlab("UMAP1")+
ylab("UMAP2")
print(p)

