This repository contains the analysis pipelines and scripts for the results presented in: Aging-associated alternative splicing programs conserved between human and mouse tissues (bioRxiv, 2025).
The core methodology utilizes the LeafletFA model.
- Source Code: LeafletFA Repository
- Utility Functions: LeafletFA-utils
Our study's methods include the following steps:
- Junction Calling: Raw single-cell BAM files to Splice Junctions (Snakemake).
- Junction Processing: Filtering and merging across batches.
- ATSE Mapping: Mapping Alternative Transcript Structure Events (ATSE).
- Anndata Integration: Aligning splicing data with pre-processed total gene expression.
- Mouse Foundation Training: Training LeafletFA on the mouse dataset.
- Cross-Species Comparison: Identifying conserved splice junctions between mouse and human.
- Transfer Learning: Applying the trained mouse LeafletFA model to human data.
We utilized Snakemake pipelines to process raw data from several major single-cell atlases.
- Allen Brain:
LeafletFA_Submission2025/AllenInst/mouse_brain_dev_2021/dataprocessing/snakemake/run_snakemake.sh
- Tabula Muris Senis (processed separately by age: 3m, 18m, 24m):
LeafletFA_Submission2025/TabulaSenis/raw_data_processing/SS2_Snakemake
- Allen Brain (Lein Cortex):
LeafletFA_Submission2025/AllenInst/lein-cortex-gru/snakemake/run_snakemake.sh
- Tabula Sapiens:
LeafletFA_Submission2025/tabula_sapien/snakemake/snakemake/run_snakemake.sh
After generating junction calls, we combine single-cell data to map Alternative Transcript Structure Events (ATSE) across data sources for each species.
- Batching: Junction files are split into 100 batches for parallel processing.
- Junction Filtering: Junctions in each batch are filtered with the JunctionReader module from LeafletFA-utils, using the following thresholds:
  - min_intron: 50 bp
  - max_intron: 500,000 bp
  - min_cells: 2
  - min_reads: 10
- Final ATSE Mapping: Merged junctions must meet a global threshold of 100 reads and 10 cells.
- Annotation: Each junction is validated against a long-read based GTF, FASTA, and a GenomeDB to ensure valid splice site motifs and annotate exon-exon boundaries.
- Mouse:
LeafletFA_Submission2025/Mouse_Splicing_Foundation/ATSE_mapper
- Human:
LeafletFA_Submission2025/Human_Splicing_Foundation/ATSE_mapper
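
The batching and thresholding steps described above can be illustrated with a minimal, self-contained sketch. This is not the JunctionReader API from LeafletFA-utils: the function names, the `(chrom, start, end, strand)` junction key, and the per-cell count dictionaries are hypothetical, chosen only for illustration. The sketch also assumes that the thresholds are inclusive, that intron length is `end - start`, and that cell IDs are unique across batches.

```python
from collections import defaultdict

# Thresholds quoted in this README.
MIN_INTRON = 50          # bp, per batch
MAX_INTRON = 500_000     # bp, per batch
MIN_CELLS = 2            # per batch
MIN_READS = 10           # per batch
GLOBAL_MIN_READS = 100   # after merging batches
GLOBAL_MIN_CELLS = 10    # after merging batches


def split_into_batches(items, n_batches=100):
    """Deal items round-robin into at most n_batches non-empty lists."""
    batches = [items[i::n_batches] for i in range(n_batches)]
    return [batch for batch in batches if batch]


def filter_batch(counts):
    """Keep junctions in one batch that pass the length, cell, and read filters.

    counts maps (chrom, start, end, strand) -> {cell_id: read_count}.
    """
    kept = {}
    for junction, per_cell in counts.items():
        _, start, end, _ = junction
        length = end - start  # assumption: coordinates make this the intron length
        n_cells = sum(1 for n in per_cell.values() if n > 0)
        n_reads = sum(per_cell.values())
        if (MIN_INTRON <= length <= MAX_INTRON
                and n_cells >= MIN_CELLS
                and n_reads >= MIN_READS):
            kept[junction] = per_cell
    return kept


def merge_and_threshold(filtered_batches):
    """Merge per-batch results and apply the global read and cell thresholds."""
    merged = defaultdict(dict)
    for batch in filtered_batches:
        for junction, per_cell in batch.items():
            merged[junction].update(per_cell)  # assumption: cell IDs never repeat
    return {
        junction: cells
        for junction, cells in merged.items()
        if sum(cells.values()) >= GLOBAL_MIN_READS
        and sum(1 for n in cells.values() if n > 0) >= GLOBAL_MIN_CELLS
    }
```

A junction that passes the per-batch filter can still be dropped after merging, because the global thresholds (100 reads, 10 cells) are stricter than the per-batch ones (10 reads, 2 cells).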
Mapped ATSEs and junction counts are integrated into Anndata objects and aligned with gene expression count objects, including scVI normalized expression values and latent space representations.
Note: lncRNAs were excluded from this study.
| Species | Alignment Scripts |
|---|---|
| Mouse | LeafletFA_Submission2025/Mouse_Splicing_Foundation/GeneExpression/01_prepare_expression_data.py to 06_marker_gene_sanity_check.ipynb |
| Human | LeafletFA_Submission2025/Human_Splicing_Foundation/GeneExpression/01_load_raw_datasets.ipynb to 09_marker_gene_sanity_check.ipynb |
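
Aligning splicing data with gene expression comes down to restricting both objects to the same cells in the same order. The sketch below shows that idea with plain Python lists rather than Anndata objects; the function names and example barcodes are hypothetical and are not taken from the alignment scripts listed above. With Anndata objects, the same logic would apply to the `obs_names` of the two objects.

```python
def align_cells(splicing_cells, expression_cells):
    """Return the shared cell IDs, in the order they appear in splicing_cells."""
    expression_set = set(expression_cells)
    return [cell for cell in splicing_cells if cell in expression_set]


def subset_rows(matrix, row_ids, keep_ids):
    """Select and reorder rows of a list-of-lists matrix to follow keep_ids."""
    index = {cell: i for i, cell in enumerate(row_ids)}
    return [matrix[index[cell]] for cell in keep_ids]


# Hypothetical example: three cells with junction counts, three with expression.
splicing_cells = ["cell_A", "cell_B", "cell_C"]
expression_cells = ["cell_C", "cell_A", "cell_D"]
junction_counts = [[3, 0], [1, 2], [0, 5]]   # rows follow splicing_cells
expression = [[7.0], [2.5], [9.1]]           # rows follow expression_cells

shared = align_cells(splicing_cells, expression_cells)
aligned_junctions = subset_rows(junction_counts, splicing_cells, shared)
aligned_expression = subset_rows(expression, expression_cells, shared)
```

After this step, row `i` of `aligned_junctions` and row `i` of `aligned_expression` describe the same cell.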
The LeafletFA model is initially trained on the Mouse Splicing Foundation dataset.
- Training Workflow:
LeafletFA_Submission2025/Mouse_Splicing_Foundation/model_train/MOUSE_FOUNDATION/full_workflow
After establishing the mouse foundation, we compare splicing events across species and perform transfer learning.
To obtain the list of conserved splice junctions between mouse and human:
- Script:
LeafletFA_Submission2025/Multi_Species_Splicing_Foundation/joint_model_train/00_compare_ATSEs.ipynb
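
As a rough illustration of the comparison, the sketch below pairs human and mouse junctions through a coordinate lookup. It assumes that a mapping from each human junction to its equivalent mouse coordinates is already available; how that mapping is built, and the exact conservation criteria used in the notebook, are not described here, so `human_to_mouse` and the function name are hypothetical.

```python
def conserved_junctions(mouse_junctions, human_junctions, human_to_mouse):
    """Pair each human junction with its mouse counterpart when both were observed.

    human_to_mouse is a hypothetical lookup from a human junction key to the
    equivalent mouse junction key; building it is outside this sketch.
    """
    mouse_set = set(mouse_junctions)
    pairs = []
    for human_junction in human_junctions:
        mouse_junction = human_to_mouse.get(human_junction)
        if mouse_junction is not None and mouse_junction in mouse_set:
            pairs.append((human_junction, mouse_junction))
    return pairs
```

Human junctions with no entry in the lookup, or whose mapped coordinates were never observed in mouse, are left out of the result.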
We apply the learned features from the mouse model onto the human dataset using transfer learning:
- Script:
LeafletFA_Submission2025/Multi_Species_Splicing_Foundation/joint_model_train/02_run_LeafletFA_transfer.ipynb
Processed data files, including ATSE mapping outputs and Anndata objects used in this analysis, are available on Zenodo:
If you use this code or these results in your research, please cite:
Isaev, K. and Knowles, D. A. (2025). Aging-associated alternative splicing programs conserved between human and mouse tissues. bioRxiv.
If you have any questions, please direct them to karin.isaev[@]gmail.com or submit an issue.