A computational pipeline for generating neo-antigens through RNA editing.
neoADARgen is a bioinformatics tool designed to identify and engineer personalized tumor-specific neoantigens (editopes) by simulating A-to-I RNA editing events on somatic mutations. The tool integrates mutation annotation, sequence extraction, RNA editing simulation, peptide generation, and MHC binding prediction via NetMHCpan 4.1.
This demo version of the tool was run on three TCGA projects (BRCA, GBM, SKCM). To run it on other projects, download them from TCGA and place them in the testdata folder.
An accompanying repo to the paper:
Employing RNA editing to engineer personalized tumor-specific neoantigens (editopes)
The pipeline performs the following steps:
- Mutation parsing: Reads patient-specific mutation annotations (MAF format).
- Sequence extraction: Retrieves reference DNA sequences (±20 bp around each mutation).
- RNA editing simulation: Applies A-to-I (read as G) edits in single or double combinations.
- Peptide generation: Translates mutated and edited sequences into peptides of defined lengths (default: 9-mer).
- MHC binding prediction: Uses NetMHCpan 4.1 to predict peptide–HLA binding affinities.
- Result summarization: Outputs ranked neoantigen candidates per mutation.
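The editing and peptide-generation steps can be illustrated with a minimal sketch. This is not the code used by neoADARgen: the function names (`edited_variants`, `translate`, `kmers`) are hypothetical, and the sketch assumes the input is an uppercase coding-strand sequence that is already in frame, which the real pipeline has to work out from the annotation.

```python
from itertools import combinations

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def edited_variants(seq, max_edits=2):
    """Yield (edited_positions, sequence) for 0..max_edits A->G substitutions."""
    a_sites = [i for i, base in enumerate(seq) if base == "A"]
    for n in range(max_edits + 1):
        for sites in combinations(a_sites, n):
            edited = list(seq)
            for i in sites:
                edited[i] = "G"  # inosine is read as guanosine
            yield sites, "".join(edited)


def translate(seq):
    """Translate complete codons from position 0, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def kmers(peptide, k=9):
    """Return every window of length k (empty list if the peptide is shorter)."""
    return [peptide[i:i + k] for i in range(len(peptide) - k + 1)]


if __name__ == "__main__":
    for sites, variant in edited_variants("ATGAAAGCA"):
        print(sites, variant, translate(variant))
```

The number of variants grows quadratically with the number of adenosines in the window, which is one reason to keep the flanking region short.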
Requires a locally installed copy of NetMHCpan 4.1 and a local copy of the human reference genome (hg38).
- Clone the repository
git clone https://github.com/landsboy/neo-ADARtigen.git
cd neo-ADARtigen
- Create conda environment
conda env create -f neoADARgen_environment.yml
conda activate neoADARgen_env
- Running the Pipeline
The easiest way to run neoADARgen is by providing a configuration file (.yml) that defines all required paths and runtime parameters. For example:
paths:
  project_dir: "testdata"   # Directory containing raw patient mutation data
  results_dir: "results"    # Directory where all pipeline outputs will be saved
  sup_dir: "sup"            # Directory with supplementary annotation files (e.g. HLA, TPM)
  netmhc_path: <path/to/your/netMHCpan4.1>
  hg38_fa: <path/to/your/hg38.fa>
runtime:
  edit_modes: [0, 1, 2]     # 0 = no RNA editing, 1 = single A→G editing, 2 = double editing
  mer_length: 9             # Peptide length for NetMHCpan prediction (9-mer is the default)
  num_nuc_around_mut: 20    # Number of nucleotides to extract on each side of the mutation
  verbose: false            # If true, enable detailed DEBUG logging
  log_file: "logs/TCGA_patients.log"  # Name of the log file saved in the results directory
Once the configuration file is ready (e.g. TCGA_config.yml), you can run the full pipeline with a single command:
python -m src.TCGA_patients.cli -c TCGA_config.yml
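Because a run can take a long time, it may help to sanity-check the configuration before launching. The sketch below is a hypothetical helper, not part of neoADARgen: it assumes the file has already been loaded into a dictionary with `paths` and `runtime` sections holding the keys from the example above, and the validation rules themselves are assumptions.

```python
import os

REQUIRED_PATHS = ("project_dir", "results_dir", "sup_dir", "netmhc_path", "hg38_fa")
INPUT_PATHS = ("project_dir", "sup_dir", "netmhc_path", "hg38_fa")  # must already exist


def check_config(config, check_files=True):
    """Return a list of human-readable problems found in a config dictionary."""
    problems = []
    paths = config.get("paths") or {}
    runtime = config.get("runtime") or {}

    for key in REQUIRED_PATHS:
        value = paths.get(key)
        if not value:
            problems.append(f"paths.{key} is missing")
        elif check_files and key in INPUT_PATHS and not os.path.exists(value):
            problems.append(f"paths.{key} does not exist: {value}")

    # Assumed rule: only the three documented editing modes are accepted.
    modes = runtime.get("edit_modes", [0, 1, 2])
    if not modes or not set(modes) <= {0, 1, 2}:
        problems.append("runtime.edit_modes must be a non-empty subset of [0, 1, 2]")

    # Assumed rule: both lengths are positive integers.
    for key, default in (("mer_length", 9), ("num_nuc_around_mut", 20)):
        value = runtime.get(key, default)
        if not isinstance(value, int) or value < 1:
            problems.append(f"runtime.{key} must be a positive integer")

    return problems
```

An empty list means nothing obviously wrong was found. Passing `check_files=False` skips the filesystem checks, which is useful when validating a config on a machine other than the one that will run the pipeline.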
If you prefer, you can skip the config file and pass the parameters directly via the command line:
python -m src.TCGA_patients.cli \
--project_dir testdata \
--results_dir results \
--sup_dir sup \
--netmhc_path /path/to/netMHCpan-4.1 \
--hg38_fa /path/to/hg38.fa \
--verbose
For each patient, the pipeline generates an individual results file named after the patient ID (e.g. results/BRCA/TCGA-AC-A2FK.tsv).
Within each file, all somatic mutations located in coding regions (CDS) are analyzed under three distinct conditions:
- Without RNA editing (original tumor mutation)
- With a single A→G editing event (simulating ADAR activity at one site)
- With double A→G editing events (simulating ADAR activity at two positions)
Each combination is processed through the NetMHCpan predictor to evaluate its HLA-binding affinity and neoantigen potential.
This makes it possible to quantify, for each patient, how RNA editing may increase the likelihood of generating strong-binding neoantigens, revealing novel tumor-specific “editopes”.
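One way to express that comparison is sketched below. It is an illustration only: the input is a hypothetical list of `(edit_mode, percent_rank)` pairs for one mutation rather than the actual columns of the results file, and the 0.5 cutoff is NetMHCpan's default %Rank threshold for strong binders.

```python
STRONG_BINDER_RANK = 0.5  # NetMHCpan default %Rank cutoff for strong binders


def best_rank_by_mode(rows):
    """Map each edit mode to the lowest (best) %Rank among its peptides."""
    best = {}
    for mode, rank in rows:
        best[mode] = min(rank, best.get(mode, float("inf")))
    return best


def editing_gain(rows, threshold=STRONG_BINDER_RANK):
    """True if some edited peptide is a strong binder while no unedited one is."""
    best = best_rank_by_mode(rows)
    unedited = best.get(0, float("inf"))
    edited = min(
        (rank for mode, rank in best.items() if mode != 0),
        default=float("inf"),
    )
    return edited < threshold <= unedited


if __name__ == "__main__":
    # Hypothetical predictions for one mutation: (edit_mode, %Rank).
    rows = [(0, 3.2), (1, 0.8), (2, 0.3)]
    print(best_rank_by_mode(rows), editing_gain(rows))
```

Counting the mutations for which `editing_gain` is true gives a per-patient estimate of how often editing creates a strong binder that the unedited mutation does not.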
neoADARgen can be executed in a simplified single-mutation mode, where you provide:
- One or more mutations (format: 'chr7:g.100958233G>A', comma-separated)
- One or more HLA alleles (comma-separated)
- (Optional) gene-counts PATH for TPM annotation
For example:
python -m src.TCGA_patients.cli \
--results_dir results \
--netmhc_path /path/to/netMHCpan-4.1 \
--hg38_fa /path/to/hg38.fa \
--mutation "chr7:g.100958233G>A" \
--hla "HLA-C1203,HLA-B3501" \
--sup_dir sup
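The mutation format above can be split into its components with a small parser. The sketch below is an assumption about how such a string might be handled, not the tool's own parser; `parse_mutations` is a hypothetical name, and only single-nucleotide substitutions are covered.

```python
import re

# Matches strings such as "chr7:g.100958233G>A" (single-nucleotide substitutions only).
MUTATION_RE = re.compile(r"^(chr[0-9A-Za-z]+):g\.(\d+)([ACGT])>([ACGT])$")


def parse_mutations(arg):
    """Parse a comma-separated mutation string into (chrom, pos, ref, alt) tuples."""
    mutations = []
    for item in arg.split(","):
        match = MUTATION_RE.match(item.strip())
        if match is None:
            raise ValueError(f"Unrecognized mutation format: {item!r}")
        chrom, pos, ref, alt = match.groups()
        mutations.append((chrom, int(pos), ref, alt))
    return mutations


if __name__ == "__main__":
    print(parse_mutations("chr7:g.100958233G>A"))
```

Rejecting malformed input early gives a clearer error than a failure later during sequence extraction.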
If you need help of any kind, feel free to open a new issue.
