Table of Contents
Gene regulatory networks (GRNs) play a role in understanding gene interactions and their regulatory effects. They help decode biological systems by identifying how genes interact and regulate cellular processes. However, conventional GRN inference methods operate at the gene-level, overlooking transcript-level variability introduced by Alternative Splicing.
The goal of this pipeline is to infer gene regulatory networks that incorporate transcript-level information. This enables the discovery of regulatory mechanisms affected by alternative splicing, which are not captured in traditional gene-level analyses. The inference and annotation pipeline is written in python.
The main pipeline script executes the following steps:
-
Load Resources
Reads Biomart annotations and a curated list of human transcription factors (TFs). -
Transcript Annotation Database
Builds a transcript-level annotation database using APPRIS and DIGGER to support plausibility filtering. -
Data Preparation
Processes expression data for both canonical (gene-level) and AS-aware (transcript-level) GRN inference.transcript_tfs,gene_tfs, andtargetsmust be Pandas DataFrames with:- columns = samples
- rows = Ensembl gene or transcript IDs
-
GRN Inference
Infers GRNs using isoform-based and gene-based inputs. Multiple runs are aggregated for robustness. -
Isoform Categorization
Classifies isoforms (e.g., dominant, balanced, non-dominant) based on their expression compared to other isoforms of the same gene. -
Aggregation and Filtering
Aggregates GRNs from multiple runs and filters edges based on:- Feature importance
- Edge frequency
-
Edge Categorization
Compares canonical and AS-aware networks to classify edges as:- Common
- Gene-exclusive
- Isoform-exclusive
-
Plausibility Filtering
Filters isoform-exclusive edges based on structural and functional annotations (APPRIS, DIGGER).
git clone https://github.com/HoffmannJuliane/SpliceAwareGRN.git
cd SpliceAwareGRN- APPRIS
wget -P data/ https://apprisws.bioinfo.cnio.es/pub/current_release/datafiles/homo_sapiens/e110v48/appris_data.appris.txt- DIGGER
wget -O data/digger_data.csv https://zenodo.org/records/3886642/files/domain_mapped_to_exons.csv
-
If not already, follow installation guide to install pixi: https://pixi.sh/latest/installation/
-
Install environment:
pixi install
- Activate in shell:
pixi shell
The config file saves all paths for the data needed for the transcript and annotation pipeline. A template of the config file is saved under /configs. With the script create_config.pyyou can create your own config based on the data you want to use.
To create the config run in shell:
python create_config.py \
--count_data path-to-gene-counts \
--sample_attributes path-to-sample-attributes \
--transcript_data path-to-transcript-counts \
--parent_directory path-to-parent-directory \
--results_dir /data/bionets/mi34qaba/SpliceAwareGRN/results/ \
--nruns int
--tissue tissue_name
Load Config File
import yaml
with open(path-to-config-file, 'r') as f:
config = yaml.safe_load(f)The function for the inference and annotation pipeline requires three dataframes: transcript_tfs, gene_tfs, and targets.
transcript_tfs, gene_tfs, and targets must be Pandas DataFrames with:
- columns = samples
- rows = Ensembl gene or transcript IDs
from total_pipeline import *
plausibility_filtered_df = inference_and_annotation_pipeline(config, transcript_tfs, gene_tfs, targets)
All networks are saved under the results_dir specified in the config file by default:
- aggregated AS-Aware GRN: SpliceAwareGRN/results/tissue-name/grn/tissue-name_as-aware.network_aggregated.tsv
- aggregated gene-level GRN: SpliceAwareGRN/results/tissue-name/grn/tissue-name_canonical.network_aggregated.tsv
- plausibility filtered isoform-unique GRN: SpliceAwareGRN/results/tissue-name/grn/tissue-name_as-aware.network_plausibility_filtered_iso_unique.tsv
If you are using GTex Data you can use the python script total_pipeline.py
For example with Liver samples run in shell:
python total_pipeline.py -f ../configs/Liver.yaml
The file magnet.pyconducts the annotation and inference pipeline for MAGNet data and the file downstream_analysis.py shows an exemplary downstream analysis with a Gene Set Enrichment Analysis via gseapy.
-
/src/arboreto_added/: Adjusted GRNBoost for transcript-level GRN Inference -
create_config.py: creates config yaml file needed as input for pipeline -
total_pipeline.py: shows total inference and annotation pipeline -
inference.py: isoform-level and gene-level GRN inference -
aggregate_grnboost.py: functions to aggregate repeated GRNs to one consenus network -
downstream_analysis.py: Gene set enrichment analysis for plausibility filtered isoform-unique GRN based on gseapy -
load_data.py: functions to load GTEx expression data -
utils_database.py: helper functions for transcript annotation pipeline -
utils_network.py: helper functions for inference pipeline and GRN processing
APPRIS Logo source: APPRIS Github, accessed 22/06/25.
DIGGER Logo source: DIGGER Website, accessed 22/06/25.
TRIFID Logo source: TRIFID GitHub, accessed 22/06/25.
