Fragle

Fragle is a fast and versatile method for detecting and quantifying circulating tumor DNA (ctDNA) in blood plasma samples. Fragle is cancer-agnostic (tested across 8+ cancer types) and works with both plasma low-pass whole-genome sequencing (lpWGS) data (coverage as low as ~0.05X) and targeted sequencing data.

Installation

Download and unzip the Fragle source code (https://github.com/skandlab/FRAGLE/archive/refs/heads/main.zip).

Installing software dependencies:

You need to have conda available on your system.
Download the Fragle.tar.gz file from Zenodo.
Commands for installing from the Fragle.tar.gz file (see Conda Pack for more details):
- mkdir -p ./Fragle (creating directory for unpacking)
- tar -xzf Fragle.tar.gz -C ./Fragle (unpack the archive)
- source ./Fragle/bin/activate (activate the environment)
  - Now, Fragle/bin is added to your path.
  - You can now run Fragle software.
- source ./Fragle/bin/deactivate (deactivate environment)

Installing software dependencies (alternative method):

If installation from the Fragle.tar.gz file fails, you can install the following software libraries manually instead:
- Python 3.7+, scikit-learn 1.2.2, Pandas, PyTorch (CPU version), NumPy, Pysam, SAMtools
- Use pip or conda to install these libraries
- You can also use the PyTorch GPU version; make sure to install the appropriate GPU driver and CUDA version (see the PyTorch documentation)
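
After a manual installation, it can help to confirm that the libraries listed above are visible before running Fragle. The following is a minimal sketch and not part of Fragle itself: the helper name `check_dependencies` is hypothetical, and the module names (`sklearn`, `pandas`, `torch`, `numpy`, `pysam`) are the usual import names for those packages. It only checks presence, not versions.

```python
import importlib.util
import shutil

# Usual import names for the listed libraries
# (sklearn = scikit-learn, torch = PyTorch).
PYTHON_MODULES = ["sklearn", "pandas", "torch", "numpy", "pysam"]


def check_dependencies(modules=PYTHON_MODULES, executables=("samtools",)):
    """Return a dict mapping each dependency name to True if it was found."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        status[name] = importlib.util.find_spec(name) is not None
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH
        status[exe] = shutil.which(exe) is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'OK' if found else 'MISSING'}")
```

Any entry reported as MISSING can then be installed with pip or conda as described above.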

Fragle Input and Output Overview

Input:
- Path to a directory of plasma sequencing BAM files, a single BAM file, or a single fragment length histogram feature file:
  - All BAM files inside the input directory must be of one type (WGS or targeted sequencing).
  - All BAM files must be mapped to one of the supported reference genomes (hg19 / GRCh37 / hg38).
  - An index file (.bai) must be provided with each BAM file.
  - To run Fragle directly on a histogram feature file, provide the file in ".pkl" format.
- If the input consists of one or more targeted sequencing BAM files, you must also provide a BED file containing the on-target sequenced regions.

Output:
- Fragle predicted ctDNA fractions for all input BAM files.
- Histogram feature file (.pkl) used by the Fragle models.
- Off-target BAM files (if targeted sequencing BAM files are provided).

Example:
- Example BAM files are provided inside the Input/ folder.
- Outputs for the example BAM files can be found inside the Output/ folder.
- You can run the software on the example BAM files in the Input/ folder and verify that the outputs match those in the Output/ folder.
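
Because every BAM file needs an accompanying index, a quick check of the input directory before a run can save time. The sketch below is an illustration rather than part of Fragle: the function name `find_bams_missing_index` is hypothetical, and it assumes the index sits next to the BAM file under one of the two common naming conventions (`sample.bam.bai` or `sample.bai`). Which of these Fragle itself looks for is not stated here.

```python
from pathlib import Path


def find_bams_missing_index(input_dir):
    """Return the BAM files in input_dir that have no accompanying .bai file."""
    missing = []
    for bam in sorted(Path(input_dir).glob("*.bam")):
        # Two common index naming conventions: sample.bam.bai and sample.bai
        candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
        if not any(candidate.exists() for candidate in candidates):
            missing.append(bam)
    return missing


if __name__ == "__main__":
    for bam in find_bams_missing_index("Input/"):
        print(f"No index found for {bam}")
```

A missing index can be created with `samtools index`, which is already among the listed dependencies.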

Running Fragle

Run the following command (shown here as run from inside the Fragle software folder):

Command:

python main.py --input <INPUT> --output <OUTPUT_FOLDER> --mode <MODE> --genome_build <GENOME_BUILD> --target_bed <TARGET_BED> --cpu <CPU> --threads <THREADS>

Command Line Argument Description:
- INPUT: Path to a folder of BAM files (with their BAI index files), a single BAM file (with its BAI index file), or a histogram feature file in .pkl format (see the example file "Output/data.pkl") [required string argument]
- OUTPUT_FOLDER: Path to the output folder where the Fragle predictions, histogram features, and off-target BAM files (for targeted sequencing input) will be written [required string argument]
- MODE: Type of input to process [required string argument]
  - 'R': run Fragle on raw WGS BAM files (or off-target BAM files).
  - 'T': run Fragle on targeted sequencing BAM files (containing both on- and off-target reads); this mode also requires the TARGET_BED option.
  - 'F': run Fragle directly on histogram features obtained from raw BAM files (e.g., Output/data.pkl).
- GENOME_BUILD: Reference genome version [optional string argument]
  - 'hg19': (default) use if your BAM files are mapped to the hg19 or GRCh37 reference genome.
  - 'hg38': use if your BAM files are mapped to the hg38 reference genome.
- TARGET_BED: Path to the BED file for targeted sequencing BAM files [string argument; required in 'T' mode, not used otherwise]
  - The BED file is used to derive the off-target BAM files from the targeted sequencing data (the off-target BAM files are generated inside OUTPUT_FOLDER).
- CPU: Number of CPU cores to use for parallel processing [optional integer argument, default: 32]
  - If your environment has multiple processors/cores, setting CPU to the number of available cores (e.g., 16 or 32) is recommended.
  - Using more CPU cores (if available) significantly speeds up execution.
- THREADS: Number of threads to use for off-target BAM file extraction [optional integer argument, default: 32]
  - This argument is only used in 'T' mode, that is, when running Fragle on targeted sequencing data.
  - A higher THREADS value (if available) makes the off-target BAM extraction significantly faster.

Note that main.py can be run from any location by specifying its path; you do not need to be inside the Fragle software folder.
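
When Fragle is called from a pipeline, the argument rules above can be checked before the command is launched. The following is a minimal sketch under stated assumptions: `build_fragle_command` is a hypothetical helper (not part of Fragle) that only assembles the argument list from the documented options and rejects combinations the descriptions above rule out, such as 'T' mode without a BED file.

```python
import sys


def build_fragle_command(input_path, output_dir, mode, genome_build="hg19",
                         target_bed=None, cpu=32, threads=32,
                         main_script="main.py"):
    """Assemble the argument list for one Fragle run (nothing is executed here)."""
    if mode not in ("R", "T", "F"):
        raise ValueError("mode must be 'R', 'T', or 'F'")
    if genome_build not in ("hg19", "hg38"):
        raise ValueError("genome_build must be 'hg19' or 'hg38'")
    if mode == "T" and target_bed is None:
        raise ValueError("mode 'T' requires target_bed")

    cmd = [sys.executable, main_script,
           "--input", str(input_path),
           "--output", str(output_dir),
           "--mode", mode,
           "--genome_build", genome_build,
           "--cpu", str(cpu)]
    if mode == "T":
        # --target_bed and --threads are only used for targeted sequencing input
        cmd += ["--target_bed", str(target_bed), "--threads", str(threads)]
    return cmd


if __name__ == "__main__":
    command = build_fragle_command("Input/", "Output/", "T", genome_build="hg38",
                                   target_bed="my_target.bed", cpu=16, threads=16)
    print(" ".join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)`; set `main_script` to the full path of main.py when running from outside the Fragle software folder.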

Example Running Commands for Fragle

Running Fragle on hg19 mapped WGS BAM files:

python main.py --input Input/ --output Output/ --mode R

Running Fragle on a single hg19 mapped WGS BAM file (file-centric run):
```
python main.py --input Input/input.bam --output Output/ --mode R
```

Running Fragle on hg38 mapped WGS BAM files:

python main.py --input Input/ --output Output/ --mode R --genome_build hg38

Running Fragle on GRCh37 mapped targeted sequencing BAM files:

python main.py --input Input/ --output Output/ --mode T --target_bed my_target.bed

Running Fragle on hg38 mapped BAM files containing only off-target reads:

python main.py --input Input/ --output Output/ --mode R --genome_build hg38

Running Fragle on hg38 mapped targeted sequencing BAM files utilizing 16 CPU cores and 16 threads:

python main.py --input Input/ --output Output/ --mode T --genome_build hg38 --target_bed my_target.bed --cpu 16 --threads 16

Running Fragle on a single hg38 mapped targeted sequencing BAM file (file-centric run):

python main.py --input Input/target_sample.bam --output Output/ --mode T --genome_build hg38 --target_bed my_target.bed

Running Fragle on a histogram feature file (histograms.pkl) located inside the Input/ folder:
```
python main.py --input Input/histograms.pkl --output Output/ --mode F
```

Additional Information

You will see the following files created in the specified output folder after running Fragle:
- data.pkl: Histogram feature file created in 'R' and in 'T' mode → used by the models for ctDNA fraction prediction
- HT.csv: ctDNA predictions obtained from Fragle high ctDNA burden model
- LT.csv: ctDNA predictions obtained from Fragle low ctDNA burden model
- Fragle.csv: Final ctDNA predictions generated by the Fragle software
- off_target_bams/:
  - This folder is created only when 'T' mode is used
  - It contains off-target BAM files and corresponding index files extracted from the targeted BAM files

If the coverage of a BAM file is too low (less than 0.025X), the software generates a warning message for that file in addition to reporting its ctDNA fraction.
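
The prediction files listed above can be loaded for downstream analysis with the standard library. The sketch below is illustrative only: `read_predictions` is a hypothetical helper, and it assumes each CSV file starts with a header row. The column names are not documented here, so none are hard-coded.

```python
import csv
from pathlib import Path


def read_predictions(output_dir, filename="Fragle.csv"):
    """Load a Fragle prediction CSV as a list of row dictionaries."""
    path = Path(output_dir) / filename
    with path.open(newline="") as handle:
        # DictReader takes column names from the header row (assumed present)
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    output_dir = Path("Output")
    for name in ("Fragle.csv", "HT.csv", "LT.csv"):
        if (output_dir / name).exists():
            rows = read_predictions(output_dir, name)
            print(f"{name}: {len(rows)} rows")
```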

Runtime and Memory Requirements:

20 MB of memory is sufficient to run Fragle. This requirement remains constant regardless of:
- The number of BAM files in your input directory
- The sequencing depth of the BAM files in your input directory

No GPU is required.

Fragle takes around 50 seconds to predict the ctDNA fraction from a 1X-coverage WGS BAM file using a single CPU core. The runtime drops to ~3 seconds with the default of 32 CPU cores, and it increases approximately linearly with sequencing depth.
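
The timings above can be turned into a rough per-sample estimate. This is a back-of-the-envelope sketch, not a benchmark: `estimate_runtime_seconds` is a hypothetical helper that scales the two quoted 1X figures (50 seconds on 1 core, ~3 seconds on 32 cores) linearly with sequencing depth. It makes no attempt to interpolate for other core counts, since no scaling rule for those is given.

```python
# Approximate timings quoted above for a 1X-coverage WGS BAM file.
SECONDS_PER_1X = {1: 50.0, 32: 3.0}  # CPU cores -> approximate seconds


def estimate_runtime_seconds(coverage, cpu=32):
    """Scale the quoted 1X timing linearly with sequencing depth."""
    if cpu not in SECONDS_PER_1X:
        raise ValueError("only the quoted core counts (1 and 32) are supported")
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return SECONDS_PER_1X[cpu] * coverage


if __name__ == "__main__":
    for depth in (1, 5, 30):
        print(f"{depth}X: ~{estimate_runtime_seconds(depth, cpu=1):.0f} s on 1 core, "
              f"~{estimate_runtime_seconds(depth, cpu=32):.0f} s on 32 cores")
```

Actual runtimes depend on hardware and storage, so treat these numbers as order-of-magnitude guidance only.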

Contacts

If you have any questions or feedback, please contact us at:
rafeed.rahman015@gmail.com
skanderupamj@a-star.edu.sg
