LiquidCNA

This repository contains R functions implementing liquidCNA, a method for tracking emergent subclone dynamics in copy number alteration (CNA) data from longitudinal (liquid) biopsies. LiquidCNA leverages copy number data from many genomic segments tracked across multiple cell-free DNA biopsies over time, obtainable via low-cost shallow whole genome sequencing. The algorithm analyses the ensemble of segments and samples to estimate (i) the fraction of tumour-originating DNA (termed purity) in each sample; and (ii) the fraction of tumour DNA originating from an emerging and potentially resistant subclone (termed subclonal-ratio) at each sampling time-point.
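To make the two quantities concrete, the sketch below shows how purity and subclonal-ratio could jointly shape the measured copy number of a single segment. It is an illustration under simplifying assumptions, not the liquidCNA estimation code: the helper name `mixed_cn`, the linear mixing of populations, and the diploid normal background (copy number 2) are all assumptions introduced here.

```r
# Hypothetical helper (not part of liquidCNA): expected copy number of a segment
# in a bulk sample, assuming a simple linear mixture of normal cells, the
# ancestral tumour population and an emerging subclone.
mixed_cn <- function(cn_ancestral, cn_subclone, purity, ratio, cn_normal = 2) {
  stopifnot(purity >= 0, purity <= 1, ratio >= 0, ratio <= 1)
  # Tumour-derived signal: subclone and ancestral population weighted by the ratio
  cn_tumour <- ratio * cn_subclone + (1 - ratio) * cn_ancestral
  # Bulk signal: tumour and normal contributions weighted by purity
  purity * cn_tumour + (1 - purity) * cn_normal
}

# A segment with 3 copies in the ancestral tumour and 4 in the subclone,
# observed at 20% purity while the subclonal-ratio rises from 0 to 0.5.
cn_over_time <- mixed_cn(cn_ancestral = 3, cn_subclone = 4,
                         purity = 0.2, ratio = c(0, 0.25, 0.5))
print(cn_over_time)
```

Under these assumptions the segment's signal shifts only slightly as the subclone expands, which is why many segments and samples need to be analysed together.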

For further details on the method, please see our manuscript (Lakatos et al., 2021) in iScience.

Contents and requirements

All code is written in R (developed under version 4.0.3; other versions are also compatible), with an example provided in Jupyter notebook format.

The following R packages are required for the estimation or visualisation steps: pracma, ggplot2, ggpubr, reshape2, mixtools, fitdistrplus, dplyr, QDNAseq, gtools, gridExtra. Note that some of these packages are used only in pre-processing (e.g. QDNAseq) and may not be needed, depending on the dataset.
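A quick way to see which of these packages are still missing is sketched below. The helper `missing_packages` is a hypothetical convenience function, not part of the repository. The split into CRAN and Bioconductor reflects where the packages are distributed (QDNAseq is a Bioconductor package).

```r
# Packages named in this README, grouped by where they are distributed.
cran_pkgs <- c("pracma", "ggplot2", "ggpubr", "reshape2", "mixtools",
               "fitdistrplus", "dplyr", "gtools", "gridExtra")
bioc_pkgs <- c("QDNAseq")

# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_cran <- missing_packages(cran_pkgs)
missing_bioc <- missing_packages(bioc_pkgs)
print(c(missing_cran, missing_bioc))

# Installation is left commented out so that nothing is installed automatically.
# if (length(missing_cran) > 0) install.packages(missing_cran)
# if (length(missing_bioc) > 0) {
#   if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
#   BiocManager::install(missing_bioc)
# }
```

If only pre-processed copy number data are analysed, a missing QDNAseq may be acceptable, as noted above.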

The files in the repository are organised as follows:

mixture_estimation_functions contains all functions of the estimation algorithm.
LiquidCNA_Example contains a detailed working example, using a synthetically generated dataset found in the Example sub-directory.
purity_estimate_synthetic and ratio_estimate_synthetic contain scripts that automatically generate synthetic longitudinal CNA datasets (5 sampling time-points and 80 genomic segments) with varying levels of measurement noise. For each dataset, the scripts run liquidCNA to estimate the purity and the subclonal-ratio of the samples, respectively, and record the results together with the true values.
ratio_estimate_synthetic_samplenumber extends the ratio_estimate_synthetic script by randomly sampling the number of longitudinal samples, then generating and analysing the resulting dataset. It also runs a bootstrapping procedure that handles the n=2 case (only one non-baseline time-point sample), which has higher uncertainty.
purity_estimate_insilico contains the script that automatically derives purity estimates for a set of 120 in silico cell line mixtures (obtained by sampling and mixing sequencing reads from two high-grade serous ovarian cancer cell lines and normal blood).
all_estimate_insilico automatically generates in silico datasets by randomly drawing samples from the 120 in silico cell line mixtures, selecting samples according to a minimum required read count (30 samples with >50M reads, 60 samples with >20M reads, etc.). The subclonal-ratio estimate is then computed for each sample, using the purity values estimated by purity_estimate_insilico. Sample purity (tumour fraction) and relative and absolute subclonal-ratio are recorded together with the true theoretical mixing proportions.
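For orientation, a synthetic longitudinal dataset of the size described above (80 segments, 5 time-points, Gaussian measurement noise) could be generated roughly as follows. This is a minimal sketch and not the code of the repository scripts: the purity and ratio values, the noise level, the number of subclonal segments and the linear mixing model with a diploid normal background are all illustrative assumptions.

```r
set.seed(1)  # reproducible illustration

n_segments <- 80   # genomic segments
n_samples  <- 5    # sampling time-points, the first treated as baseline

# Assumed ground truth (illustrative values, not taken from the repository)
purity   <- c(0.30, 0.25, 0.20, 0.35, 0.40)
ratio    <- c(0.00, 0.10, 0.30, 0.50, 0.70)  # subclonal-ratio, zero at baseline
noise_sd <- 0.02                             # measurement noise level

# Copy number of each segment in the ancestral tumour population
cn_ancestral <- sample(1:4, n_segments, replace = TRUE)
# Assume the first 20 segments are subclonal: one copy gained or lost in the subclone
delta <- c(sample(c(-1, 1), 20, replace = TRUE), rep(0, n_segments - 20))
cn_subclone <- pmax(cn_ancestral + delta, 0)

# Expected copy number per segment (rows) and sample (columns)
cn_tumour <- outer(cn_subclone, ratio) + outer(cn_ancestral, 1 - ratio)
cn_normal <- matrix(2 * (1 - purity), n_segments, n_samples, byrow = TRUE)
cn_expected <- sweep(cn_tumour, 2, purity, `*`) + cn_normal

# Observed values: expectation plus Gaussian measurement noise
cn_observed <- cn_expected + rnorm(n_segments * n_samples, sd = noise_sd)
colnames(cn_observed) <- paste0("t", seq_len(n_samples))
print(dim(cn_observed))
```

Because the true purity and ratio vectors are known here, estimates obtained on such a matrix can be compared directly against them, which is the idea behind recording results together with true values.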

Updates

May 2021

Special cases and functions added for the n=2 case (a single non-baseline sample). This includes (1) updated estimation functions so that dataframe operations do not throw errors and (2) a dedicated bootstrapping function that allows random subsampling of subclonal samples prior to subclonal-ratio estimation, to address the fact that unstable and subclonal segments are indistinguishable when n=2. Prediction is therefore now possible for a single non-baseline sample, but simulations on synthetic data show that the accuracy is lower, and we advise against relying on it.
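The general idea of estimating on repeated random subsamples can be sketched as follows. This is a generic illustration, not the bootstrapping function shipped in the repository: the function name `subsample_estimates`, its parameters, the simulated per-segment signal and the use of the median as a stand-in estimator are all assumptions.

```r
# Hypothetical sketch: repeatedly draw a random subset of candidate values and
# apply an estimator to each subset. `estimate_fun` stands in for the actual
# subclonal-ratio estimator, which is not reproduced here.
subsample_estimates <- function(values, n_boot = 200, frac = 0.8,
                                estimate_fun = median) {
  stopifnot(length(values) >= 2, frac > 0, frac <= 1)
  n_keep <- max(1, floor(frac * length(values)))
  vapply(seq_len(n_boot), function(i) {
    estimate_fun(values[sample.int(length(values), n_keep)])
  }, numeric(1))
}

set.seed(42)
# Simulated signal: 15 values from truly subclonal segments, 5 from unstable ones
segment_signal <- c(rnorm(15, mean = 0.30, sd = 0.03),
                    rnorm(5,  mean = 0.05, sd = 0.03))
boot <- subsample_estimates(segment_signal)

# Spread of the estimate across subsamples
print(quantile(boot, c(0.025, 0.5, 0.975)))
```

A wide interval across subsamples signals that the estimate depends strongly on which values happen to be included, consistent with the lower accuracy reported for the n=2 case.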

