landsboy/neoADARgen

neoADARgen

Python Shell Conda NetMHCpan Genome RNAEditing TCGA

A computational pipeline for generating neo-antigens through RNA editing.

neoADARgen is a bioinformatics tool designed to identify and engineer personalized tumor-specific neoantigens (editopes) by simulating A-to-I RNA editing events on somatic mutations. The tool integrates mutation annotation, sequence extraction, RNA editing simulation, peptide generation, and MHC binding prediction via NetMHCpan 4.1.

This demo version of the tool was run on three TCGA projects (BRCA, GBM, SKCM). To run it on other projects, download their data from TCGA and place it in the testdata folder.

An accompanying repo to the paper:

Employing RNA editing to engineer personalized tumor-specific neoantigens (editopes)

Overview

The pipeline performs the following steps:

  • Mutation parsing: Reads patient-specific mutation annotations (MAF format).

  • Sequence extraction: Retrieves reference DNA sequences (±20 bp around each mutation).

  • RNA editing simulation: Applies A-to-I (read as G) edits in single or double combinations.

  • Peptide generation: Translates mutated and edited sequences into peptides of defined lengths (default: 9-mer).

  • MHC binding prediction: Uses NetMHCpan 4.1 to predict peptide–HLA binding affinities.

  • Result summarization: Outputs ranked neoantigen candidates per mutation.
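The editing and peptide steps above can be illustrated with a minimal sketch. It is not the pipeline's own code: the function names (`edited_variants`, `translate`, `kmers`) are hypothetical, and it assumes an uppercase ACGT sequence that is already on the coding strand and in reading frame. A real implementation also has to handle strand and frame.

```python
from itertools import combinations

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def edited_variants(seq, n_edits):
    """Yield (edited_positions, sequence) for every choice of n_edits A->G edits."""
    a_sites = [i for i, base in enumerate(seq) if base == "A"]
    for sites in combinations(a_sites, n_edits):
        edited = list(seq)
        for i in sites:
            edited[i] = "G"  # inosine is read as guanosine during translation
        yield sites, "".join(edited)


def translate(seq):
    """Translate from position 0, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def kmers(peptide, k=9):
    """Return every length-k window of the peptide (empty if it is shorter than k)."""
    return [peptide[i:i + k] for i in range(len(peptide) - k + 1)]


if __name__ == "__main__":
    toy = "ATGAAAGCT"  # toy in-frame sequence, not real data
    for mode in (0, 1, 2):  # 0 = no editing, 1 = single edit, 2 = double edit
        for sites, edited in edited_variants(toy, mode):
            print(mode, sites, edited, translate(edited))
```

With `n_edits=0` the generator yields the unedited sequence once, so the three edit modes can share one code path.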

Setup

neoADARgen requires a locally installed copy of NetMHCpan 4.1.

It also requires a local copy of the human reference genome hg38 (FASTA file).

  1. Clone the repository
git clone https://github.com/landsboy/neo-ADARtigen.git neoADARgen
cd neoADARgen
  2. Create the conda environment
conda env create -f neoADARgen_environment.yml
conda activate neoADARgen_env
  3. Run the pipeline
The easiest way to run neoADARgen is to provide a configuration file (.yml) that defines all required paths and runtime parameters. For example:
paths:
  project_dir: "testdata"          # Directory containing raw patient mutation data
  results_dir: "results"           # Directory where all pipeline outputs will be saved
  sup_dir: "sup"                   # Directory with supplementary annotation files (e.g. HLA, TPM)
  netmhc_path: <path/to/your/netMHCpan4.1>
  hg38_fa: <path/to/your/hg38.fa>
  
runtime:
  edit_modes: [0, 1, 2]            # 0 = no RNA editing, 1 = single A→G editing, 2 = double editing
  mer_length: 9                    # Peptide length for NetMHCpan prediction (9-mer is the default)
  num_nuc_around_mut: 20           # Number of nucleotides to extract on each side of the mutation
  verbose: false                   # If true, enable detailed DEBUG logging
  log_file: "logs/TCGA_patients.log"   # Name of the log file saved in the results directory
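
Before launching a long run, it can help to sanity-check the configuration. The sketch below is an illustration only, not part of neoADARgen: `validate_config` is a hypothetical helper that takes the configuration as an already-parsed dictionary (for example, the result of loading the YAML file). Treating all five path keys as required is an assumption.

```python
# Keys taken from the example configuration; treating all as required is an assumption.
REQUIRED_PATHS = ("project_dir", "results_dir", "sup_dir", "netmhc_path", "hg38_fa")
VALID_EDIT_MODES = {0, 1, 2}  # 0 = no editing, 1 = single edit, 2 = double edit


def validate_config(cfg):
    """Return a list of problems found in the config; an empty list means none."""
    problems = []
    paths = cfg.get("paths") or {}
    runtime = cfg.get("runtime") or {}

    for key in REQUIRED_PATHS:
        if not paths.get(key):
            problems.append(f"paths.{key} is missing or empty")

    modes = runtime.get("edit_modes") or []
    if not modes or not set(modes) <= VALID_EDIT_MODES:
        problems.append("runtime.edit_modes must be a non-empty subset of [0, 1, 2]")

    mer_length = runtime.get("mer_length", 9)
    if not isinstance(mer_length, int) or mer_length < 1:
        problems.append("runtime.mer_length must be a positive integer")

    flank = runtime.get("num_nuc_around_mut", 20)
    if not isinstance(flank, int) or flank < 0:
        problems.append("runtime.num_nuc_around_mut must be a non-negative integer")

    return problems


if __name__ == "__main__":
    example = {
        "paths": {"project_dir": "testdata", "results_dir": "results", "sup_dir": "sup"},
        "runtime": {"edit_modes": [0, 1, 2], "mer_length": 9, "num_nuc_around_mut": 20},
    }
    for problem in validate_config(example):
        print(problem)
```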

Once the configuration file is ready (e.g. TCGA_config.yml), you can run the full pipeline with a single command:

python -m src.TCGA_patients.cli -c TCGA_config.yml

If you prefer, you can skip the config file and pass the parameters directly via the command line:

python -m src.TCGA_patients.cli \
  --project_dir testdata \
  --results_dir results \
  --sup_dir sup \
  --netmhc_path /path/to/netMHCpan-4.1 \
  --hg38_fa /path/to/hg38.fa \
  --verbose
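
To launch the same command from a Python script (for example, to loop over several project folders), the argument list can be assembled programmatically. This is a minimal sketch: `build_command` is a hypothetical helper, and it uses only the flags shown above.

```python
import sys


def build_command(project_dir, results_dir, sup_dir, netmhc_path, hg38_fa, verbose=False):
    """Assemble the argument list for the command-line invocation shown above."""
    cmd = [
        sys.executable, "-m", "src.TCGA_patients.cli",
        "--project_dir", str(project_dir),
        "--results_dir", str(results_dir),
        "--sup_dir", str(sup_dir),
        "--netmhc_path", str(netmhc_path),
        "--hg38_fa", str(hg38_fa),
    ]
    if verbose:
        cmd.append("--verbose")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; the paths are placeholders.
    print(" ".join(build_command("testdata", "results", "sup",
                                 "/path/to/netMHCpan-4.1", "/path/to/hg38.fa",
                                 verbose=True)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, so that the `src` package is importable.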

Example Output

For each patient, the pipeline generates an individual results file named after the patient ID (e.g. results/BRCA/TCGA-AC-A2FK.tsv).

Within each file, all somatic mutations located in coding regions (CDS) are analyzed under three distinct conditions:

  1. Without RNA editing (original tumor mutation)

  2. With a single A→G editing event (simulating ADAR activity at one site)

  3. With double A→G editing events (simulating ADAR activity at two positions)

Each combination is processed through the NetMHCpan predictor to evaluate its HLA-binding affinity and neoantigen potential.

This makes it possible to quantify, for every patient, how RNA editing may increase the likelihood of generating strong-binding neoantigens, revealing novel tumor-specific “editopes”.
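
As an illustration of this comparison, the sketch below counts strong binders per editing condition. It makes several assumptions: predictions are available as `(edit_mode, peptide, percent_rank)` tuples, which is a hypothetical layout and not the pipeline's actual output columns; the cut-offs are the commonly used NetMHCpan defaults of 0.5 (strong) and 2.0 (weak) %Rank, treated here as inclusive; and the peptides and ranks are made-up toy values.

```python
from collections import Counter

STRONG_RANK = 0.5  # %Rank cut-off for strong binders (assumed default)
WEAK_RANK = 2.0    # %Rank cut-off for weak binders (assumed default)


def classify(percent_rank):
    """Label a prediction by its %Rank; lower rank means stronger predicted binding."""
    if percent_rank <= STRONG_RANK:
        return "strong"
    if percent_rank <= WEAK_RANK:
        return "weak"
    return "non-binder"


def strong_binders_per_mode(records):
    """Count strong binders for each edit mode (0, 1 or 2)."""
    counts = Counter()
    for edit_mode, _peptide, percent_rank in records:
        if classify(percent_rank) == "strong":
            counts[edit_mode] += 1
    return counts


if __name__ == "__main__":
    toy_records = [  # made-up values for illustration only
        (0, "PEPTIDEAA", 3.10),
        (1, "PEPTIDEGA", 0.40),
        (2, "PEPTIDEGG", 0.25),
        (2, "PEPTIDGGA", 1.50),
    ]
    print(dict(strong_binders_per_mode(toy_records)))
```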

Single-Mutation Mode

neoADARgen can be executed in a simplified single-mutation mode, where you provide:

  • One or more mutations (format: 'chr7:g.100958233G>A', comma-separated)

  • One or more HLA alleles (comma-separated)

  • (Optional) gene-counts PATH for TPM annotation

For example:

python -m src.TCGA_patients.cli \
  --results_dir results \
  --netmhc_path /path/to/netMHCpan-4.1 \
  --hg38_fa /path/to/hg38.fa \
  --mutation "chr7:g.100958233G>A" \
  --hla "HLA-C1203,HLA-B3501" \
  --sup_dir sup
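
The mutation string format used above can be parsed with a short regular expression. The sketch below is an assumption about how such input might be handled, not the tool's actual parser: `parse_mutations` and `window_bounds` are hypothetical names, and only single-nucleotide substitutions are covered. `window_bounds` converts the 1-based genomic position into a 0-based, half-open slice with a flank of 20 nucleotides on each side, matching the default `num_nuc_around_mut`.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    chrom: str
    pos: int  # 1-based genomic position
    ref: str
    alt: str


# Matches strings such as 'chr7:g.100958233G>A' (single-nucleotide substitutions only).
MUTATION_RE = re.compile(r"^(chr[0-9A-Za-z_]+):g\.(\d+)([ACGT])>([ACGT])$")


def parse_mutations(arg):
    """Parse a comma-separated list of mutation strings into Mutation tuples."""
    mutations = []
    for token in arg.split(","):
        match = MUTATION_RE.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized mutation format: {token!r}")
        chrom, pos, ref, alt = match.groups()
        mutations.append(Mutation(chrom, int(pos), ref, alt))
    return mutations


def window_bounds(pos, flank=20):
    """Return a 0-based half-open (start, end) slice around a 1-based position."""
    start = max(0, pos - 1 - flank)
    end = pos + flank
    return start, end


if __name__ == "__main__":
    for mutation in parse_mutations("chr7:g.100958233G>A"):
        print(mutation, window_bounds(mutation.pos))
```

Away from the start of a chromosome, the slice covers `2 * flank + 1` nucleotides, with the mutated base at index `flank`.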

Getting help

If you need help of any kind, feel free to open a new issue.
