This repository provides instructions for extracting episignatures from Nanopore bedMethyl files and using SVMs for sample classification. It includes scripts for training SVMs on methylation data and classifying samples based on disease-specific episignatures associated with developmental disorders. The pipeline facilitates automated episignature detection, supporting both research and clinical diagnostics.
Publication: Clinical evaluation of long-read sequencing-based episignature detection in developmental disorders
To test the SVM classifier with the provided data/supplementary, run the `SVM_read_excel.ipynb` notebook in Jupyter Notebook, VS Code, or Google Colab.
To process external data through the NSBEpi pipeline, follow these steps. Ensure all necessary dependencies are installed before proceeding.
- Bedtools: Bedtools is required for the extraction of episignature loci.
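Before starting, it can help to confirm that Bedtools is reachable from your shell. The following is a minimal sketch, not part of the repository; the function name `bedtools_available` is hypothetical, and it assumes the executable is installed under the name `bedtools`.

```python
import shutil


def bedtools_available(executable: str = "bedtools") -> bool:
    """Return True if the given executable can be found on PATH."""
    # shutil.which returns the full path of the executable, or None if absent.
    return shutil.which(executable) is not None


if bedtools_available():
    print("bedtools found")
else:
    print("bedtools not found on PATH; install it before running the pipeline")
```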
- In the folder containing your bedmethyl (non-strand-specific) files, first run the `remove_extra_col.sh` script to remove unnecessary columns. This will retain only the first 11 columns. Navigate to the folder containing the `remove_extra_col.sh` script:

      cd NSBEpi/bed_processing_episignature_extraction/bedmethyl_processing/

- Run the script:

      ./remove_extra_col.sh <path_to_bedmethyl_files>
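For readers who want to see what the column trimming amounts to, here is a rough Python equivalent. It is a sketch under assumptions, not the contents of `remove_extra_col.sh`: it assumes the bedmethyl files are tab-delimited, and the names `keep_first_columns` and `trim_bedmethyl` are hypothetical.

```python
from pathlib import Path


def keep_first_columns(line: str, n_columns: int = 11, sep: str = "\t") -> str:
    """Return the line truncated to its first n_columns fields."""
    fields = line.rstrip("\n").split(sep)
    return sep.join(fields[:n_columns])


def trim_bedmethyl(src: Path, dst: Path, n_columns: int = 11) -> None:
    """Copy src to dst, keeping only the first n_columns of each record."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.strip():  # skip blank lines
                fout.write(keep_first_columns(line, n_columns) + "\n")
```

Lines that already have 11 or fewer fields pass through unchanged, so running the sketch twice on the same file is harmless.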
- (Optional) If your bedmethyl files have chromosome annotations with the `chr` prefix (e.g., `chr1`, `chr2`, etc.), you need to run the `remove_chr.sh` script to remove this prefix. In the `remove_chr.sh` script, set the `input_folder` path to the directory containing your bedmethyl files:

      # path to folder with bedmethyl files
      input_folder="/path/to/your/bedmethyl/files"

- Run the script:

      ./remove_chr.sh

  The output files will have the same basename as the input files, but with `_noChr` added before the `.bed` extension. Before proceeding to the next step, move the new files into a separate directory, because only these files will be used from now on.

  Example: If your input file is `sample1.bed`, the output file will be `sample1_noChr.bed`.
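The prefix removal and the output naming rule can be illustrated with a short Python sketch. This is not the code in `remove_chr.sh`; the helper names `strip_chr_prefix` and `no_chr_name` are hypothetical, and the sketch assumes the chromosome name is the first field of each line.

```python
from pathlib import Path


def strip_chr_prefix(line: str) -> str:
    """Drop a leading 'chr' from the chromosome field, if present."""
    return line[3:] if line.startswith("chr") else line


def no_chr_name(filename: str) -> str:
    """Insert '_noChr' before the file extension, e.g. sample1.bed -> sample1_noChr.bed."""
    path = Path(filename)
    return str(path.with_name(f"{path.stem}_noChr{path.suffix}"))


print(no_chr_name("sample1.bed"))            # sample1_noChr.bed
print(strip_chr_prefix("chr1\t100\t101"))    # chromosome field becomes "1"
```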
After preprocessing the bedmethyl files, you are ready to proceed to the next step.
Once your bedmethyl files are preprocessed, use the extract_episignatures.sh script to extract episignature loci.
- Navigate to the directory containing the `extract_episignatures.sh` script:

      cd NSBEpi/bed_processing_episignature_extraction/episignature_extraction/

- Modify the `extract_episignatures.sh` script to set your file paths:
  - `group1_path`: Set this to the path of the folder containing the episignature loci files from `hg38_episignature_cordinates`.
  - `group2_path`: Set this to the path of the folder containing your preprocessed bedmethyl files.
  - `output_folder`: Set the name of the output folder where the results will be saved.

  Example:

      # path to bed files containing the episignature loci (hg38_episignature_cordinates)
      group1_path="../hg38_episignature_cordinates"
      # path to nanopore bedmethyl files
      group2_path="/path/to/your/bedmethyl/files"
      # name of the output directory
      output_folder="episignature_output"
- Run the script:

      ./extract_episignatures.sh

  This script will extract episignature-specific loci from your bedmethyl files and save the results in the specified `output_folder`.
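Conceptually, the extraction keeps only those bedmethyl records that fall on an episignature locus. The script relies on Bedtools for this; how it invokes Bedtools is not shown here, so the pure-Python sketch below only illustrates the idea. It assumes tab-delimited input with BED-style 0-based, half-open coordinates, and all function names are hypothetical.

```python
from collections import defaultdict


def load_loci(lines):
    """Group (start, end) intervals by chromosome from BED-like lines."""
    loci = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        loci[chrom].append((int(start), int(end)))
    return loci


def overlaps_any(chrom, start, end, loci):
    """True if [start, end) overlaps any locus on the same chromosome."""
    return any(start < l_end and l_start < end
               for l_start, l_end in loci.get(chrom, ()))


def extract_records(bedmethyl_lines, loci):
    """Return the bedmethyl records that overlap at least one locus."""
    kept = []
    for line in bedmethyl_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if overlaps_any(fields[0], int(fields[1]), int(fields[2]), loci):
            kept.append(line.rstrip("\n"))
    return kept
```

The linear scan per record is fine for illustration but slow on whole-genome files, which is one reason a dedicated tool such as Bedtools is the practical choice.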
- After extracting the episignatures, run the Python notebook `SVM_read_from_bed.ipynb`.
- The notebook first loads Illumina-derived episignatures from the file `no_strand_all_points_dict.pickle`, which are used to train the SVM classifier.
- Then set the path to the bedmethyl files containing the extracted episignatures from Step 2. These will be processed further in the notebook.
- Finish executing the notebook to perform SVM training and get the classification results for each sample.
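Before a classifier can score a sample, the extracted bedmethyl records have to be turned into a numeric vector with one value per locus. The sketch below shows one way this could look; it is not taken from the notebook. It assumes that column 11 of the trimmed bedmethyl file holds the percent-modified value (as in the bedMethyl layout written by modkit), that values are rescaled to the 0 to 1 range, and that missing loci are marked with `None`. The function names are hypothetical.

```python
def read_methylation(lines, value_column=10):
    """Map (chrom, start) -> percent modified from extracted bedmethyl lines.

    value_column is the 0-based index of the percent-modified field
    (assumed here to be the 11th column).
    """
    values = {}
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        values[(fields[0], int(fields[1]))] = float(fields[value_column])
    return values


def feature_vector(values, ordered_loci, scale=100.0):
    """Return methylation fractions in a fixed locus order.

    Loci without coverage in this sample are reported as None so the
    caller can decide how to handle them (for example, by imputation).
    """
    return [values[locus] / scale if locus in values else None
            for locus in ordered_loci]
```

Keeping the locus order fixed across samples matters, because a classifier treats each position in the vector as the same feature for every sample.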