HoffmannJuliane/SpliceAwareGRN

Table of Contents
  1. About The Project
  2. Installation
  3. Usage
  4. Example Pipelines
  5. Important Files and Folders

About The Project

Gene regulatory networks (GRNs) are central to understanding gene interactions and their regulatory effects. They help decode biological systems by identifying how genes interact and regulate cellular processes. However, conventional GRN inference methods operate at the gene level and overlook the transcript-level variability introduced by alternative splicing (AS).

The goal of this pipeline is to infer gene regulatory networks that incorporate transcript-level information. This enables the discovery of regulatory mechanisms affected by alternative splicing, which are not captured by traditional gene-level analyses. The inference and annotation pipeline is written in Python.

Pipeline Workflow

The main pipeline script executes the following steps:

  1. Load Resources
    Reads BioMart annotations and a curated list of human transcription factors (TFs).

  2. Transcript Annotation Database
    Builds a transcript-level annotation database using APPRIS and DIGGER to support plausibility filtering.

  3. Data Preparation
    Processes expression data for both canonical (gene-level) and AS-aware (transcript-level) GRN inference.

    • transcript_tfs, gene_tfs, and targets must be pandas DataFrames with:
      • columns = samples
      • rows = Ensembl gene or transcript IDs
  4. GRN Inference
    Infers GRNs using isoform-based and gene-based inputs. Multiple runs are aggregated for robustness.

  5. Isoform Categorization
    Classifies isoforms (e.g., dominant, balanced, non-dominant) based on their expression compared to other isoforms of the same gene.

  6. Aggregation and Filtering
    Aggregates GRNs from multiple runs and filters edges based on:

    • Feature importance
    • Edge frequency
  7. Edge Categorization
    Compares canonical and AS-aware networks to classify edges as:

    • Common
    • Gene-exclusive
    • Isoform-exclusive
  8. Plausibility Filtering
    Filters isoform-exclusive edges based on structural and functional annotations (APPRIS, DIGGER).
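Steps 5 and 7 can be illustrated with a small pandas sketch. This is not the pipeline's own implementation: the share-of-gene-expression rule and its thresholds (dominant=0.6, balanced=0.3) are hypothetical, and the sketch assumes GRNBoost-style edge tables with TF and target columns, isoform-level TFs given as transcript IDs, and a hypothetical tx2gene dict mapping transcript IDs to gene IDs. Isoform edges are collapsed to their parent TF gene so they can be compared with gene-level edges.

```python
import pandas as pd


def categorize_isoforms(tx_expr, tx2gene, dominant=0.6, balanced=0.3):
    """Hypothetical rule: label each transcript by its share of the parent gene's mean expression."""
    mean_expr = tx_expr.mean(axis=1)  # mean over samples (columns)
    genes = mean_expr.index.map(tx2gene)  # parent gene ID per transcript
    share = mean_expr / mean_expr.groupby(genes).transform("sum")

    def label(s):
        if s >= dominant:
            return "dominant"
        if s >= balanced:
            return "balanced"
        return "non-dominant"

    return share.map(label)


def categorize_edges(gene_edges, iso_edges, tx2gene):
    """Split edges into common, gene-exclusive and isoform-exclusive (TF, target) pairs."""
    gene_pairs = set(zip(gene_edges["TF"], gene_edges["target"]))
    # Collapse isoform TFs to their parent gene before comparing with the canonical network
    iso_pairs = set(zip(iso_edges["TF"].map(tx2gene), iso_edges["target"]))
    return {
        "common": gene_pairs & iso_pairs,
        "gene-exclusive": gene_pairs - iso_pairs,
        "isoform-exclusive": iso_pairs - gene_pairs,
    }
```

Note that in this sketch an isoform-exclusive edge is only identified at the TF-gene level; the original transcript ID can be recovered by joining back to iso_edges.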


Installation

Step 1: Clone the repo

git clone https://github.com/HoffmannJuliane/SpliceAwareGRN.git
cd SpliceAwareGRN

Step 2: Download Annotation files from APPRIS and DIGGER

  • APPRIS
wget -P data/ https://apprisws.bioinfo.cnio.es/pub/current_release/datafiles/homo_sapiens/e110v48/appris_data.appris.txt
  • DIGGER
wget -O data/digger_data.csv https://zenodo.org/records/3886642/files/domain_mapped_to_exons.csv

Step 3: Install Dependencies via Pixi Environment

  1. If pixi is not already installed, follow the installation guide: https://pixi.sh/latest/installation/

  2. Install the environment:

pixi install

  3. Activate the environment in a shell:

pixi shell


Usage

Step 1: Create Config File

The config file stores all data paths needed by the inference and annotation pipeline. A template is provided under /configs. With the script create_config.py you can create your own config for the data you want to use.

To create the config, run in a shell:

python create_config.py \
  --count_data path-to-gene-counts \
  --sample_attributes path-to-sample-attributes \
  --transcript_data path-to-transcript-counts \
  --parent_directory path-to-parent-directory \
  --results_dir path-to-results-directory \
  --nruns int \
  --tissue tissue_name

Step 2: Load the Config File in a Python Script

import yaml

config_path = "path-to-config-file"  # e.g. configs/Liver.yaml
with open(config_path, "r") as f:
    config = yaml.safe_load(f)

Step 3: Load Data

The inference and annotation pipeline function requires three inputs, transcript_tfs, gene_tfs, and targets, each a pandas DataFrame with:

  • columns = samples
  • rows = Ensembl gene or transcript IDs
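A minimal sketch of preparing these three inputs from full expression matrices is shown below. The helper prepare_inputs, the tf_genes list, and the tx2gene dict are hypothetical, and the sketch assumes that targets are gene-level and that every gene is a candidate target; the pipeline's own loaders (see load_data.py) may differ. If your matrices are stored as samples × IDs, transpose them with .T first.

```python
import pandas as pd


def prepare_inputs(gene_expr, tx_expr, tf_genes, tx2gene):
    """Split matrices (rows = Ensembl IDs, columns = samples) into transcript_tfs, gene_tfs, targets."""
    # Keep only samples present in both matrices, in the gene-level column order
    shared = set(tx_expr.columns)
    samples = [s for s in gene_expr.columns if s in shared]
    gene_expr = gene_expr[samples]
    tx_expr = tx_expr[samples]

    # Gene-level TFs: rows whose Ensembl gene ID is in the TF list
    gene_tfs = gene_expr.loc[gene_expr.index.isin(tf_genes)]
    # Transcript-level TFs: transcripts whose parent gene is a TF (unmapped IDs are dropped)
    transcript_tfs = tx_expr.loc[tx_expr.index.map(tx2gene).isin(tf_genes)]
    # Assumption: all genes are candidate targets
    targets = gene_expr
    return transcript_tfs, gene_tfs, targets
```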

Step 4: Run Inference and Annotation Pipeline

    from total_pipeline import inference_and_annotation_pipeline

    plausibility_filtered_df = inference_and_annotation_pipeline(config, transcript_tfs, gene_tfs, targets)

By default, all networks are saved under the results_dir specified in the config file:

  • aggregated AS-Aware GRN: SpliceAwareGRN/results/tissue-name/grn/tissue-name_as-aware.network_aggregated.tsv
  • aggregated gene-level GRN: SpliceAwareGRN/results/tissue-name/grn/tissue-name_canonical.network_aggregated.tsv
  • plausibility filtered isoform-unique GRN: SpliceAwareGRN/results/tissue-name/grn/tissue-name_as-aware.network_plausibility_filtered_iso_unique.tsv
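To reload these networks later, the paths can be rebuilt from results_dir and the tissue name. The helpers output_paths and load_networks below are hypothetical and only mirror the file layout listed above; they assume the files are tab-separated with a header row.

```python
from pathlib import Path

import pandas as pd


def output_paths(results_dir, tissue):
    """Rebuild the default output file paths for one tissue."""
    grn_dir = Path(results_dir) / tissue / "grn"
    return {
        "as_aware": grn_dir / f"{tissue}_as-aware.network_aggregated.tsv",
        "canonical": grn_dir / f"{tissue}_canonical.network_aggregated.tsv",
        "iso_unique": grn_dir / f"{tissue}_as-aware.network_plausibility_filtered_iso_unique.tsv",
    }


def load_networks(results_dir, tissue):
    """Load every network file that exists; missing files are skipped."""
    return {
        name: pd.read_csv(path, sep="\t")
        for name, path in output_paths(results_dir, tissue).items()
        if path.exists()
    }
```

For example, load_networks(config["results_dir"], "Liver") would return the Liver networks, assuming the config stores the results directory under that key.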


Example Pipelines

Inference with GTEx Data

If you are using GTEx data, you can run the Python script total_pipeline.py directly.

For example, with liver samples, run in a shell:

python total_pipeline.py -f ../configs/Liver.yaml

Inference for MAGNet Data

The file magnet.py runs the inference and annotation pipeline for MAGNet data, and the file downstream_analysis.py shows an example downstream analysis using Gene Set Enrichment Analysis via gseapy.
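Enrichment analyses need a gene list as input. The sketch below extracts target genes from a network, ranked by edge importance; target_gene_list is a hypothetical helper, not the one used in downstream_analysis.py, and it assumes GRNBoost-style target and importance columns.

```python
import pandas as pd


def target_gene_list(network, top_n=None):
    """Return unique target genes ordered by their highest edge importance."""
    edges = network.sort_values("importance", ascending=False)
    if top_n is not None:
        edges = edges.head(top_n)  # keep only the strongest edges
    # dict.fromkeys removes duplicates while preserving importance order
    return list(dict.fromkeys(edges["target"]))
```

The resulting list could then be passed to gseapy, e.g. gseapy.enrichr(gene_list=genes, gene_sets="KEGG_2021_Human", organism="human", outdir=None). Enrichr libraries expect gene symbols, so Ensembl IDs may first need to be mapped, for example with the BioMart annotations.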

Important Files and Folders

  • /src/arboreto_added/: Adjusted GRNBoost for transcript-level GRN Inference

  • create_config.py: creates the YAML config file required as pipeline input

  • total_pipeline.py: runs the complete inference and annotation pipeline

  • inference.py: isoform-level and gene-level GRN inference

  • aggregate_grnboost.py: functions to aggregate repeated GRNs into one consensus network

  • downstream_analysis.py: Gene set enrichment analysis for plausibility filtered isoform-unique GRN based on gseapy

  • load_data.py: functions to load GTEx expression data

  • utils_database.py: helper functions for transcript annotation pipeline

  • utils_network.py: helper functions for inference pipeline and GRN processing
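The idea behind aggregating repeated GRNBoost runs into a consensus network (step 6, aggregate_grnboost.py) can be sketched as follows. This is a hypothetical illustration, not the code in aggregate_grnboost.py: it assumes each run is a DataFrame with TF, target, and importance columns, and the default thresholds are placeholders.

```python
import pandas as pd


def aggregate_runs(runs, min_frequency=0.5, min_importance=0.0):
    """Merge repeated GRN runs and keep edges that are frequent and important enough."""
    n_runs = len(runs)
    all_edges = pd.concat(runs, ignore_index=True)
    agg = all_edges.groupby(["TF", "target"], as_index=False).agg(
        importance=("importance", "mean"),  # mean over runs in which the edge appears
        n_present=("importance", "count"),  # number of runs containing the edge
    )
    agg["frequency"] = agg["n_present"] / n_runs
    keep = (agg["frequency"] >= min_frequency) & (agg["importance"] >= min_importance)
    return (
        agg.loc[keep, ["TF", "target", "importance", "frequency"]]
        .sort_values("importance", ascending=False)
        .reset_index(drop=True)
    )
```

Averaging only over runs in which an edge appears keeps importance and frequency as separate filters; treating missing edges as importance 0 would instead fold frequency into the importance score.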


Outline

(Figure: pipeline workflow overview)

APPRIS Logo source: APPRIS Github, accessed 22/06/25.
DIGGER Logo source: DIGGER Website, accessed 22/06/25.
TRIFID Logo source: TRIFID GitHub, accessed 22/06/25.
