neoADARgen

A computational pipeline for generating neo-antigens through RNA editing.

neoADARgen is a bioinformatics tool designed to identify and engineer personalized tumor-specific neoantigens (editopes) by simulating A-to-I RNA editing events on somatic mutations. The tool integrates mutation annotation, sequence extraction, RNA editing simulation, peptide generation, and MHC binding prediction via NetMHCpan 4.1.

This demo version of the tool was run on three TCGA projects (BRCA, GBM, SKCM). To run it on other projects, download their data from TCGA and place it in the testdata folder.

This repository accompanies the paper:

Employing RNA editing to engineer personalized tumor-specific neoantigens (editopes)

Overview

The pipeline performs the following steps:

Mutation parsing: Reads patient-specific mutation annotations (MAF format).
Sequence extraction: Retrieves reference DNA sequences (±20 bp around each mutation by default).
RNA editing simulation: Applies A-to-I edits (inosine is read as G) as single or double edits.
Peptide generation: Translates mutated and edited sequences into peptides of defined lengths (default: 9-mer).
MHC binding prediction: Uses NetMHCpan 4.1 to predict peptide–HLA binding affinities.
Result summarization: Outputs ranked neoantigen candidates per mutation.
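The extraction and peptide-generation steps can be sketched in plain Python. This is an illustration under stated assumptions, not neoADARgen's own code: the chromosome sequence is assumed to be already loaded as a string, positions are 1-based as in MAF files, only single-nucleotide substitutions are handled, and translation assumes the sequence is on the coding strand and starts at a codon boundary. The names `extract_window`, `translate` and `peptides_covering` are hypothetical.

```python
# Hypothetical helpers illustrating extraction and peptide generation.
# Standard genetic code, built from the TCAG-ordered codon table.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def extract_window(chrom_seq: str, pos: int, ref: str, alt: str, flank: int = 20):
    """Return (ref_window, mut_window, mut_offset) for +/- `flank` bp around a 1-based SNV.

    Assumes a single-nucleotide substitution.
    """
    idx = pos - 1  # 1-based genomic coordinate -> 0-based string index
    if chrom_seq[idx].upper() != ref.upper():
        raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {chrom_seq[idx]}")
    start, end = max(0, idx - flank), idx + flank + 1
    ref_window = chrom_seq[start:end].upper()
    mut_offset = idx - start  # position of the mutation inside the window
    mut_window = ref_window[:mut_offset] + alt.upper() + ref_window[mut_offset + 1:]
    return ref_window, mut_window, mut_offset


def translate(seq: str) -> str:
    """Translate in-frame DNA; unknown codons become 'X', a trailing partial codon is dropped."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def peptides_covering(cds_seq: str, nt_offset: int, k: int = 9):
    """Return every k-mer of the translated sequence that contains the residue at nt_offset.

    Assumes cds_seq starts at a codon boundary (frame 0).
    """
    protein = translate(cds_seq)
    aa = nt_offset // 3  # index of the affected amino acid
    return [
        protein[s:s + k]
        for s in range(max(0, aa - k + 1), min(aa, len(protein) - k) + 1)
        if "*" not in protein[s:s + k]  # drop peptides that span a stop codon
    ]
```

A full implementation would also need the reading frame and strand from the transcript annotation, which this sketch takes as given.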

Setup

Requires a local installation of NetMHCpan 4.1.

Also requires a local copy of the human reference genome (GRCh38/hg38) as a FASTA file.

Clone the repository

git clone https://github.com/landsboy/neoADARgen.git
cd neoADARgen

Create conda environment

conda env create -f neoADARgen_environment.yml
conda activate neoADARgen_env

Running the Pipeline

The easiest way to run neoADARgen is to provide a configuration file (.yml) that defines all required paths and runtime parameters. For example:

paths:
  project_dir: "testdata"          # Directory containing raw patient mutation data
  results_dir: "results"           # Directory where all pipeline outputs will be saved
  sup_dir: "sup"                   # Directory with supplementary annotation files (e.g. HLA, TPM)
  netmhc_path: <path/to/your/netMHCpan4.1>
  hg38_fa: <path/to/your/hg38.fa>
  
runtime:
  edit_modes: [0, 1, 2]            # 0 = no RNA editing, 1 = single A→G editing, 2 = double editing
  mer_length: 9                    # Peptide length for NetMHCpan prediction (9-mer is the default)
  num_nuc_around_mut: 20           # Number of nucleotides to extract on each side of the mutation
  verbose: false                   # If true, enable detailed DEBUG logging
  log_file: "logs/TCGA_patients.log"   # Name of the log file saved in the results directory
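Because a full run can take a long time, it may be worth checking the configuration before launching it. The following hypothetical pre-flight check is not part of neoADARgen; it assumes the YAML file has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`) and checks only the constraints stated in the comments of the example config.

```python
from pathlib import Path

VALID_EDIT_MODES = {0, 1, 2}  # 0 = no editing, 1 = single A->G, 2 = double A->G


def validate_config(cfg: dict, check_files: bool = True) -> list[str]:
    """Return a list of problems in a parsed neoADARgen-style config (empty if none).

    Hypothetical helper; it mirrors the keys shown in the example config only.
    """
    problems = []
    paths = cfg.get("paths", {})
    runtime = cfg.get("runtime", {})

    for key in ("project_dir", "results_dir", "sup_dir", "netmhc_path", "hg38_fa"):
        value = paths.get(key)
        if not value:
            problems.append(f"paths.{key} is missing")
        elif str(value).startswith("<"):
            problems.append(f"paths.{key} still contains a placeholder: {value}")
        elif check_files and key in ("netmhc_path", "hg38_fa") and not Path(value).exists():
            problems.append(f"paths.{key} does not exist: {value}")

    modes = runtime.get("edit_modes", [])
    if not modes or any(m not in VALID_EDIT_MODES for m in modes):
        problems.append(
            f"runtime.edit_modes must be a non-empty subset of {sorted(VALID_EDIT_MODES)}, got {modes}"
        )

    for key in ("mer_length", "num_nuc_around_mut"):
        value = runtime.get(key)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"runtime.{key} must be a positive integer, got {value!r}")

    return problems
```

Run against the unedited template, this check would report the two `<path/to/...>` placeholders for netmhc_path and hg38_fa.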

Once the configuration file is ready (e.g. TCGA_config.yml), you can run the full pipeline with a single command:

python -m src.TCGA_patients.cli -c TCGA_config.yml

If you prefer, you can skip the config file and pass the parameters directly via the command line:

python -m src.TCGA_patients.cli \
  --project_dir testdata \
  --results_dir results \
  --sup_dir sup \
  --netmhc_path /path/to/netMHCpan-4.1 \
  --hg38_fa /path/to/hg38.fa \
  --verbose

Example Output

For each patient, the pipeline generates an individual results file named after the patient ID (e.g. results/BRCA/TCGA-AC-A2FK.tsv).

Within each file, all somatic mutations located in coding regions (CDS) are analyzed under three distinct conditions:

No RNA editing (the original tumor mutation)
A single A→G editing event (simulating ADAR activity at one site)
Two A→G editing events (simulating ADAR activity at two positions)
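The three conditions can be generated by choosing zero, one or two adenosines in the mutated window and reading each as G. The sketch below is illustrative rather than the tool's implementation: it assumes the window is in transcript (sense) orientation, so A-to-I editing appears as A>G, and it never edits the mutated base itself. The `edit_modes` argument mirrors the config option of the same name; `editing_variants` is a hypothetical name.

```python
from itertools import combinations


def editing_variants(window: str, mut_offset: int, edit_modes=(0, 1, 2)):
    """Yield (edited_positions, sequence) for every allowed number of A->G edits.

    Assumptions: `window` is in transcript (sense) orientation, and the mutated
    base at `mut_offset` is never itself edited.
    """
    eligible = [i for i, base in enumerate(window) if base == "A" and i != mut_offset]
    for n_edits in edit_modes:  # 0 = unedited, 1 = single edit, 2 = double edit
        for positions in combinations(eligible, n_edits):
            seq = list(window)
            for i in positions:
                seq[i] = "G"  # inosine is decoded as guanosine during translation
            yield positions, "".join(seq)
```

For n eligible adenosines the three modes together yield 1 + n + n(n−1)/2 sequences, so a window with ten eligible adenosines produces 56 variants before translation.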

The peptides from each combination are passed to NetMHCpan, which predicts their HLA-binding affinity and, by extension, their neoantigen potential.
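One way to hand the peptides to NetMHCpan is to write them one per line and run the executable in peptide-input mode. This is a sketch, not the tool's own wrapper: the flags (`-p` for peptide-list input, `-f` for the input file, `-a` for a comma-separated allele list, `-BA` to add binding-affinity predictions) and the executable location inside the install directory are assumptions based on the NetMHCpan 4.1 command-line options, so check them against `netMHCpan -h` on your installation. Output parsing is left out.

```python
import subprocess
from pathlib import Path


def write_peptides(peptides, out_file: str) -> None:
    """Write unique peptides, one per line (NetMHCpan peptide-list input)."""
    unique = sorted(set(peptides))
    Path(out_file).write_text("\n".join(unique) + "\n")


def netmhcpan_command(netmhc_dir: str, peptide_file: str, alleles) -> list[str]:
    """Build a NetMHCpan 4.1 command line (assumed flags: -p, -f, -a, -BA)."""
    executable = str(Path(netmhc_dir) / "netMHCpan")  # assumed executable location
    return [executable, "-p", "-f", peptide_file, "-a", ",".join(alleles), "-BA"]


def run_netmhcpan(cmd: list[str]) -> str:
    """Run NetMHCpan and return its text output; raises CalledProcessError on failure."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Keeping command construction separate from execution makes it easy to log or dry-run the exact call before spending time on predictions.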

This makes it possible to quantify, for each patient, how RNA editing may increase the likelihood of generating strong-binding neoantigens, revealing novel tumor-specific “editopes”.
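To compare conditions, each prediction can be labeled as a strong binder (SB), weak binder (WB) or non-binder (NB) using %Rank cutoffs of 0.5 and 2, which match NetMHCpan 4.1's default `-rth` and `-rlt` thresholds. The sketch assumes predictions have already been parsed into records with the hypothetical fields shown below; it is not the pipeline's summarization code. It reports, for each number of edits, which mutations yield at least one strong binder.

```python
from collections import defaultdict
from dataclasses import dataclass

STRONG_RANK = 0.5  # matches NetMHCpan 4.1's default strong-binder %Rank threshold
WEAK_RANK = 2.0    # matches NetMHCpan 4.1's default weak-binder %Rank threshold


@dataclass
class Prediction:
    """Hypothetical record layout for one parsed NetMHCpan prediction."""
    mutation: str  # e.g. "chr7:g.100958233G>A"
    edits: int     # number of A->G edits applied (0, 1 or 2)
    peptide: str
    allele: str
    rank: float    # %Rank reported by NetMHCpan


def binder_level(rank: float) -> str:
    """Classify a %Rank value as strong (SB), weak (WB) or non-binder (NB)."""
    if rank <= STRONG_RANK:
        return "SB"
    if rank <= WEAK_RANK:
        return "WB"
    return "NB"


def mutations_with_strong_binders(preds) -> dict[int, set[str]]:
    """Return {edits: set of mutations with at least one strong binder under that condition}."""
    hits = defaultdict(set)
    for p in preds:
        if binder_level(p.rank) == "SB":
            hits[p.edits].add(p.mutation)
    return dict(hits)
```

Comparing the sets for 0 edits against those for 1 or 2 edits shows which mutations gain a strong binder only after editing.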

Single-Mutation Mode

neoADARgen can be executed in a simplified single-mutation mode, where you provide:

One or more mutations (format: 'chr7:g.100958233G>A', comma-separated)
One or more HLA alleles (comma-separated)
(Optional) a path to a gene-counts file for TPM annotation
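The mutation strings use HGVS-style genomic notation (chromosome, `g.`, 1-based position, reference and alternate base). A minimal parser for the comma-separated `--mutation` and `--hla` values could look like this; it accepts single-nucleotide substitutions only, which is an assumption about the inputs single-mutation mode expects, and `parse_mutations`/`parse_alleles` are hypothetical names.

```python
import re
from typing import NamedTuple

# chromosome ":g." 1-based position, reference base ">" alternate base
SNV_PATTERN = re.compile(r"^(chr[0-9XYM]+):g\.(\d+)([ACGT])>([ACGT])$", re.IGNORECASE)


class SNV(NamedTuple):
    chrom: str
    pos: int  # 1-based genomic coordinate
    ref: str
    alt: str


def parse_mutations(arg: str) -> list[SNV]:
    """Parse a comma-separated list such as 'chr7:g.100958233G>A,chr1:g.12345C>T'."""
    snvs = []
    for item in filter(None, (s.strip() for s in arg.split(","))):
        match = SNV_PATTERN.match(item)
        if not match:
            raise ValueError(f"unsupported mutation format: {item!r}")
        chrom, pos, ref, alt = match.groups()
        snvs.append(SNV(chrom, int(pos), ref.upper(), alt.upper()))
    return snvs


def parse_alleles(arg: str) -> list[str]:
    """Split a comma-separated HLA list such as 'HLA-C1203,HLA-B3501'."""
    return [a.strip() for a in arg.split(",") if a.strip()]
```

Rejecting malformed strings up front gives a clear error before any genome lookup or NetMHCpan call is attempted.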

For example:

python -m src.TCGA_patients.cli \
  --results_dir results \
  --netmhc_path /path/to/netMHCpan-4.1 \
  --hg38_fa /path/to/hg38.fa \
  --mutation "chr7:g.100958233G>A" \
  --hla "HLA-C1203,HLA-B3501" \
  --sup_dir sup

Getting help

If you need help of any kind, feel free to open a new issue.
