elkebir-group/Sapling-data

Sapling-data

This repository contains the simulation datasets, real-world data (TRACERx), and benchmark results used to evaluate Sapling.

Directory Structure

fastppm-data/
├── data/
│   ├── TRACERx/                            # TRACERx non-small cell lung cancer data
│   └── sims/                               # Simulated datasets
├── results/
│   ├── sims_infer_full_trees_orchard/      # Full tree benchmarking: Orchard results
│   ├── sims_infer_full_trees_fastbe/       # Full tree benchmarking: fastBE results
│   ├── sims_infer_full_trees_fastppm_small_expand/ # Full tree benchmarking: Sapling (Small Expand heuristic)
│   ├── sims_tau_fastppm_small_expand/      # Backbone inference: fastppm + Small Expand
│   ├── sims_tau_fastppm_big_expand/        # Backbone inference: fastppm + Big Expand
│   ├── sims_tau_cvxopt_small_expand/       # Backbone inference: CVXOPT + Small Expand
│   ├── sims_tau_n8_sapling/                # Backbone inference: Sapling for n=8 simulations
│   ├── sims_tau_n8_ground_truth/           # Backbone inference: Ground truth for n=8 simulations
│   └── real_TRACERx_sapling/               # Sapling results on TRACERx dataset
└── scripts/                                # Analysis scripts

Dataset Descriptions

1. Input Data (data/)

  • sims/: Two sets of synthetic data. Small instances have $n=8$ mutations and $m=2$ samples. Large instances have $n \in \{20, 50, 100\}$ mutations and $m=10$ samples.
  • TRACERx/: Real-world multi-region sequencing data from the TRACERx non-small cell lung cancer study.

2. Full Tree Benchmarking Results (results/sims_infer_full_trees_*)

These directories contain the results of comparing Sapling against state-of-the-art methods for inferring complete phylogenetic trees.

  • _orchard: Orchard results, generated by jobs.sh.
  • _fastbe: fastBE results, generated by scripts/run_all_fastbe.sh.
  • _fastppm_small_expand: Sapling results with the fastppm regressor and the "Small Expand" heuristic (inserting leaves or splitting edges), generated by jobs.sh.
  • _fastppm_big_expand: Sapling results with the fastppm regressor and the "Big Expand" heuristic (inserting nodes anywhere in the topology), generated by jobs.sh.

Jupyter notebook with plotting commands: results/analysis_full_trees.ipynb.

3. Backbone Summarization Results (results/sims_tau_*)

These directories contain the results of experiments on the Backbone Tree Inference problem.

  • Solvers:
    • fastppm: The proposed tree-structured dual dynamic programming approach.
    • cvxopt: Baseline using a general convex optimization solver.
  • Validation (n=8):
    • sims_tau_n8_*: Smaller-scale simulations ($n=8$) used to validate exact solvers and ground-truth recovery.
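
The backbone result directories appear to follow the pattern sims_tau_<solver>_<heuristic>_expand. The following is a minimal sketch of how they could be enumerated and labeled before analysis. The pattern is inferred from the directory tree, and the helper names parse_tau_dir and list_tau_runs are hypothetical and not part of this repository.

```python
from pathlib import Path

# Assumption: solver and heuristic names are taken from the directory tree
# in this README; they are not read from any configuration file.
SOLVERS = {"fastppm", "cvxopt"}
HEURISTICS = {"small", "big"}


def parse_tau_dir(name):
    """Split a name like 'sims_tau_fastppm_small_expand' into
    (solver, heuristic). Return None if the name does not match."""
    prefix, suffix = "sims_tau_", "_expand"
    if not (name.startswith(prefix) and name.endswith(suffix)):
        return None
    core = name[len(prefix):-len(suffix)]
    solver, _, heuristic = core.partition("_")
    if solver not in SOLVERS or heuristic not in HEURISTICS:
        return None
    return solver, heuristic


def list_tau_runs(results_root):
    """Map each matching directory under results_root to (solver, heuristic).
    Directories such as sims_tau_n8_sapling do not match and are skipped."""
    runs = {}
    for path in sorted(Path(results_root).glob("sims_tau_*")):
        if path.is_dir():
            parsed = parse_tau_dir(path.name)
            if parsed is not None:
                runs[path.name] = parsed
    return runs
```

Calling list_tau_runs("results") from the repository root would return one entry per solver and heuristic combination present on disk. The n=8 validation directories are left out on purpose, since their names encode the instance size rather than a solver.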

Jupyter notebook with plotting commands: results/analysis_tau.ipynb.

4. Real World Results

  • results/real_TRACERx_sapling: Inferred phylogenies and backbones for the TRACERx cohort using Sapling.
