Reference-based deconvolution of methylation patterns
MetDecode is written in Python 3.11. Dependencies can be installed by running:
pip3 install -r requirements.txt
To install MetDecode:
python3 setup.py install --user
To run the tool from the command line, execute run.py with the following 3 positional arguments:
atlas-filepath: TSV file containing the reference atlas. For an example of the input file format, see data/atlas.tsv. Each row corresponds to a marker region. The first 3 columns contain, respectively, the chromosome, start position and end position of the marker region. Each tissue / cell type then has two dedicated columns: the number of methylated CpG sites spanned in the marker region, and the total number of CpG sites (both methylated and unmethylated). The file must contain a header of the form: CHROM START END TISSUE1_METH TISSUE1_DEPTH TISSUE2_METH ...
cfdna-filepath: TSV file containing the cfDNA samples. The input file format is similar to that of atlas-filepath; see data/insilico-cfdna.tsv for an example.
out-filepath: Output CSV file. It will contain the estimated cell type contributions. The number of rows (excluding the header) equals the number of cfDNA samples, and the number of columns equals the number of tissues / cell types in the reference atlas.
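The atlas layout described above can be illustrated with a minimal sketch. The two-tissue table below is made up for illustration, and read_methylation_ratios is a hypothetical helper that is not part of MetDecode; it only assumes that the columns after the first three come in (METH, DEPTH) pairs, as in the header format given above.

```python
import csv
import io

# Hypothetical atlas with two tissues and two marker regions (made-up counts).
ATLAS_TSV = (
    "CHROM\tSTART\tEND\tTISSUE1_METH\tTISSUE1_DEPTH\tTISSUE2_METH\tTISSUE2_DEPTH\n"
    "chr1\t1000\t1250\t45\t50\t5\t40\n"
    "chr2\t2000\t2250\t10\t100\t60\t80\n"
)


def read_methylation_ratios(handle):
    """Parse an atlas-style TSV and return tissues, regions and METH/DEPTH ratios."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Every second column after CHROM/START/END is a *_METH column.
    tissues = [name[: -len("_METH")] for name in header[3::2]]
    regions, ratios = [], []
    for row in reader:
        regions.append((row[0], int(row[1]), int(row[2])))
        counts = [int(value) for value in row[3:]]
        meth, depth = counts[0::2], counts[1::2]
        # A region with zero depth has no defined ratio; mark it as NaN.
        ratios.append(
            [m / d if d > 0 else float("nan") for m, d in zip(meth, depth)]
        )
    return tissues, regions, ratios


tissues, regions, ratios = read_methylation_ratios(io.StringIO(ATLAS_TSV))
for region, row in zip(regions, ratios):
    print(region, dict(zip(tissues, row)))
```

The same helper should also work on a cfDNA file if, as stated above, that file follows a similar layout with one (METH, DEPTH) column pair per sample.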
The following command runs MetDecode on in-silico-generated data with default hyper-parameters.
python3 run.py example-data/atlas.tsv example-data/cfdna.tsv output.csv
If an unknown contributor (a tissue / cell type suspected to be present in the cfDNA mixtures but not present in the reference atlas) needs to be modelled, this can be specified with the -n-unknown-tissues optional argument:
python3 run.py example-data/atlas.tsv example-data/cfdna.tsv output.csv -n-unknown-tissues 1
By default, the cell type proportions are not constrained to sum to 1. This constraint can be added using an optional argument:
python3 run.py example-data/atlas.tsv example-data/cfdna.tsv output.csv --sum1
Because MetDecode has been designed for sequencing data and takes counts as input, the coverage can be used as extra information for more accurate deconvolution. Indeed, in the absence of biases, a higher coverage makes the estimate of the corresponding methylation ratio more reliable. However, in the presence of biological or technical biases, this assumption no longer holds. To disable the modelling of coverage:
python3 run.py data/insilico-atlas.tsv insilico-cfdna.tsv output.csv --no-coverage
Another key feature of MetDecode is its ability to refine the input atlas by unsupervised deconvolution. The more input cfDNA samples are provided, the more accurate the inferred atlas should be. If only a few samples are available, consider disabling unsupervised deconvolution:
python3 run.py data/insilico-atlas.tsv insilico-cfdna.tsv output.csv --supervised
First, uncompress the largest data files:
python3 scripts/uncompress-data.py
All deconvolution results, except those from simulations, are stored in results/.
Simulation results are absent from the repository due to their size.
To reproduce the figures from our paper, you need to run the scripts in scripts/ as described below.
Figures will be saved in figures/
python3 scripts/analyze-cfdna-results.py
To perform deconvolution from scratch, you can use this command for example:
python3 scripts/benchmark.py metdecode cfdna 30_250bp significant
'metdecode' can be replaced by any other method ('nnls', 'qp', 'celfie' or 'cancerlocator'). DMR size '30_250bp' can be replaced by '30_50bp' or '30_100bp'. Atlas DMR filter 'significant' can be replaced by 'all-markers' or 'balanced'.
To be able to run 'celfie', you need to include the CelFiE script available at:
https://github.com/christacaggiano/celfie/blob/master/scripts/celfie.py
inside the metdecode/ folder.
To be able to run 'cancerlocator', you need to include the CancerLocator.jar available at:
https://github.com/jasminezhoulab/CancerLocator
inside the root MetDecode/ folder.
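As a rough sketch, the method, DMR size and filter options listed above can be enumerated to build the corresponding benchmark commands. The helper benchmark_commands is hypothetical and not part of the repository, and it assumes that every combination of options is accepted by scripts/benchmark.py, which is not confirmed here. It only builds the command lines and does not execute anything.

```python
import itertools
import shlex

# Option values taken from the text above.
METHODS = ["metdecode", "nnls", "qp", "celfie", "cancerlocator"]
DMR_SIZES = ["30_50bp", "30_100bp", "30_250bp"]
FILTERS = ["all-markers", "balanced", "significant"]


def benchmark_commands(experiment="cfdna"):
    """Build one argument list per (method, DMR size, filter) combination."""
    return [
        ["python3", "scripts/benchmark.py", method, experiment, size, dmr_filter]
        for method, size, dmr_filter in itertools.product(
            METHODS, DMR_SIZES, FILTERS
        )
    ]


commands = benchmark_commands()
# Print the first few commands for inspection.
for command in commands[:3]:
    print(shlex.join(command))
```

Each argument list could then be passed to subprocess.run from the repository root. The 'celfie' and 'cancerlocator' entries would only work once the external files mentioned above are in place.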
python3 analyze-insilico-results.py
To perform deconvolution from scratch, you can use this command for example:
python3 scripts/benchmark.py metdecode all-insilico 30_250bp significant
python3 scripts/analyze-cbc-results.py
To perform deconvolution from scratch, you can use this command for example:
python3 scripts/benchmark.py metdecode too-cbc 30_250bp significant
python3 scripts/loo-simulation.py
python3 scripts/simulations.py
python3 scripts/make-sim-figures.py