This repository contains the simulation datasets, real-world data (TRACERx), and benchmark results used to evaluate Sapling.
fastppm-data/
├── data/
│ ├── TRACERx/ # TRACERx non-small cell lung cancer data
│ └── sims/ # Simulated datasets
├── results/
│ ├── sims_infer_full_trees_orchard/ # Full tree benchmarking: Orchard results
│ ├── sims_infer_full_trees_fastbe/ # Full tree benchmarking: fastBE results
│ ├── sims_infer_full_trees_fastppm_small_expand/ # Full tree benchmarking: Sapling (Small Expand heuristic)
│ ├── sims_tau_fastppm_small_expand/ # Backbone inference: fastppm + Small Expand
│ ├── sims_tau_fastppm_big_expand/ # Backbone inference: fastppm + Big Expand
│ ├── sims_tau_cvxopt_small_expand/ # Backbone inference: CVXOPT + Small Expand
│ ├── sims_tau_n8_sapling/ # Backbone inference: Sapling for n=8 simulations
│ ├── sims_tau_n8_ground_truth/ # Backbone inference: Ground truth for n=8 simulations
│ └── real_TRACERx_sapling/ # Sapling results on TRACERx dataset
└── scripts/ # Analysis scripts
- sims/: Two sets of synthetic data. Small instances have $n=8$ mutations and $m=2$ samples. Large instances have varying numbers of mutations ($n \in \{20, 50, 100\}$) and $m=10$ samples.
- TRACERx/: Real-world multi-region sequencing data from the TRACERx non-small cell lung cancer study.
These directories contain the results of comparing Sapling against state-of-the-art methods for inferring complete phylogenetic trees.
- _orchard: Results from the Orchard method, generated by jobs.sh.
- _fastbe: Results from the fastBE method, generated by scripts/run_all_fastbe.sh.
- _fastppm_small_expand: Sapling results using the fastppm regressor and the "Small Expand" heuristic (inserting leaves or splitting edges), generated by jobs.sh.
- _fastppm_big_expand: Sapling results using the fastppm regressor and the "Big Expand" heuristic (inserting nodes anywhere in the topology), generated by jobs.sh.
Jupyter notebook with plotting commands: results/analysis_full_trees.ipynb.
These directories contain experiments focusing on the Backbone Tree Inference problem.
- Solvers:
  - fastppm: The proposed tree-structured dual dynamic programming approach.
  - cvxopt: Baseline using a general convex optimization solver.
- Validation (n=8):
  - sims_tau_n8_*: Smaller-scale simulations ($n=8$) used to validate exact solvers and ground truth recovery.
Jupyter notebook with plotting commands: results/analysis_tau.ipynb.
- results/real_TRACERx_sapling: Inferred phylogenies and backbones for the TRACERx cohort using Sapling.
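
The result directories follow a prefix convention (sims_infer_full_trees_*, sims_tau_*, real_*), so they can be grouped by experiment type before loading anything. The following is a minimal sketch of that grouping, not a script shipped with this repository: the function names and the category labels (`full_trees`, `backbone`, `real_data`) are hypothetical and chosen only for illustration, and it makes no assumption about the files inside each directory.

```python
from collections import defaultdict
from pathlib import Path

# Prefixes are taken from the directory tree in this README.
# The category labels on the right are illustrative names, not repository terms.
PREFIXES = {
    "sims_infer_full_trees_": "full_trees",
    "sims_tau_": "backbone",
    "real_": "real_data",
}


def classify(dirname: str) -> tuple[str, str]:
    """Return (category, variant) for a results directory name."""
    for prefix, category in PREFIXES.items():
        if dirname.startswith(prefix):
            # The variant is whatever follows the prefix, e.g. "orchard".
            return category, dirname[len(prefix):]
    return "other", dirname


def group_result_dirs(results_root: str) -> dict[str, list[str]]:
    """Group the immediate subdirectories of results_root by category."""
    groups = defaultdict(list)
    for path in sorted(Path(results_root).iterdir()):
        if path.is_dir():  # skip notebooks and other plain files
            category, variant = classify(path.name)
            groups[category].append(variant)
    return dict(groups)
```

Calling `group_result_dirs("results")` from the repository root would return a dictionary keyed by category, with each value listing the method or configuration suffixes found on disk. Matching on prefixes rather than a hard-coded list means newly added result directories are picked up without changing the code.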