
Phylorust is a Rust-based command-line tool to generate phylogenetically informative SNP site sets and a FastTree from VCF file(s) and a FASTA reference genome.


rhysf/Phylorust


Phylorust

Phylorust is a Rust-based command-line tool that generates phylogenetically informative SNP site sets (as FASTA files) and associated trees from one or more VCF files and a FASTA reference genome.


Features

  • Reads a reference FASTA and one or more single-sample or multi-sample VCFs.
  • Generates phylogenetically informative SNP site sets at configurable coverage thresholds.
  • Produces per-sample FASTA alignments.
  • Runs FastTree automatically (if installed) to generate trees.
  • ASCII tree rendering directly in the terminal.
  • Simple tab-delimited input file (Name_Type_Location.tab) for managing multiple samples.

Prerequisites

To build and run Phylorust, you’ll need:

  • Rust (stable, installed via rustup).
    • Verify install with:
      rustc --version
      cargo --version
  • R (for plotting histograms).
    • Packages: ggplot2, readr, dplyr (install inside R with):
      install.packages(c("ggplot2", "readr", "dplyr"))
  • FastTree (optional, for tree generation).
    • Must be in your system PATH.
    • Verify with:
      FastTree -help
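
The individual checks above can be bundled into one helper. This is a minimal sketch, not part of Phylorust: the function name check_prereqs is hypothetical, and it only tests whether each command is on PATH, not which version is installed.

```shell
# Report whether each named tool is on PATH.
# Returns non-zero if at least one tool is missing.
# check_prereqs is a hypothetical helper, not shipped with Phylorust.
check_prereqs() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found    %s\n' "$tool"
    else
      printf 'MISSING  %s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Run check_prereqs rustc cargo Rscript for the required tools and check_prereqs FastTree for the optional one. A missing FastTree only matters if you want trees.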

Installation

Clone the repo and install with Cargo:

git clone https://github.com/rhysf/Phylorust.git
cd Phylorust
cargo install --path .

If you installed Rust with rustup, ~/.cargo/bin is normally already in your $PATH. If not, you can add it:

echo 'export PATH="$HOME/.cargo/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

You can now run:

phylorust --help

Alternative (manual install): if you prefer to place the binary in ~/.local/bin, build it and copy it there (make sure ~/.local/bin is in your $PATH):

cargo build --release
cp target/release/phylorust ~/.local/bin/

Input file formats

Phylorust uses a tab-delimited file (Name_Type_Location.tab) with three columns:

SampleName VCF /path/to/sample.vcf

  • SampleName = Your preferred sample label (used in output FASTAs/trees).
  • VCF = File type (must be VCF or vcf).
  • /path/to/sample.vcf = Path to the sample’s VCF file.
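
For many samples, the table can be generated rather than typed by hand. The sketch below is an illustration, not part of Phylorust: the function name make_sample_table is hypothetical, and taking the sample name from the file name without its .vcf extension is an assumption you may want to change.

```shell
# Print one "name<TAB>VCF<TAB>path" row per *.vcf file in a directory.
# Assumption: the sample name is the file name without the .vcf extension.
# make_sample_table is a hypothetical helper, not shipped with Phylorust.
make_sample_table() {
  vcf_dir=$1
  for vcf in "$vcf_dir"/*.vcf; do
    [ -e "$vcf" ] || continue   # skip the unexpanded glob when no VCFs exist
    name=$(basename "$vcf" .vcf)
    printf '%s\tVCF\t%s\n' "$name" "$vcf"
  done
}
```

Redirect the output to a file to use it, for example make_sample_table /path/to/vcfs > Name_Type_Location.tab.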

Example pipeline

phylorust \
  --fasta ./examples/Cryp_gatt_R265.genome.fa-scaffold3.14.fasta \
  --name_type_location ./examples/Name_Type_location.tab

This will:

  1. Parse the reference FASTA.
  2. Parse VCFs listed in Name_Type_location.tab.
  3. Generate SNP site sets and coverage histograms.
  4. Produce FASTA alignments for each coverage threshold.
  5. Run FastTree (if available) and print ASCII trees in the terminal.
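
Step 2 depends on every row of the sample table being well formed. A pre-flight check can be sketched as follows. This is an illustration under stated assumptions, not Phylorust's own validation: the function name validate_sample_table is hypothetical, and the rules it enforces (exactly three tab-separated columns, a type of VCF or vcf, an existing file) are taken from the input format described above.

```shell
# Check each row of a sample table before running the pipeline.
# Prints one message per problem and returns non-zero if any were found.
# validate_sample_table is a hypothetical helper, not shipped with Phylorust.
validate_sample_table() {
  table=$1
  tab=$(printf '\t')
  line_no=0
  status=0
  # Split on tabs only; "extra" catches any unexpected fourth column.
  while IFS=$tab read -r name type path extra || [ -n "$name" ]; do
    line_no=$((line_no + 1))
    if [ -z "$path" ] || [ -n "$extra" ]; then
      echo "line $line_no: expected exactly 3 tab-separated columns"
      status=1
    elif [ "$type" != "VCF" ] && [ "$type" != "vcf" ]; then
      echo "line $line_no: file type must be VCF or vcf, found '$type'"
      status=1
    elif [ ! -f "$path" ]; then
      echo "line $line_no: VCF not found: $path"
      status=1
    fi
  done < "$table"
  return "$status"
}
```

A typical use is validate_sample_table ./examples/Name_Type_location.tab && phylorust ..., so the pipeline only starts when the table passes.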

Building with Docker

git clone https://github.com/rhysf/Phylorust.git
cd Phylorust
docker build -t phylorust .

docker run --rm -v "$(pwd)/examples:/examples" phylorust \
  --fasta /examples/Cryp_gatt_R265.genome.fa-scaffold3.14.fasta \
  --name_type_location /examples/Name_Type_location_Docker.tab
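
Inside the container, files are visible under the mount point (/examples here), so the VCF paths in the sample table must use container paths rather than host paths. That is presumably why the example uses a separate Name_Type_location_Docker.tab. For your own data, a table with host paths could be converted with something like the sketch below. The function name to_container_paths is hypothetical, and it assumes the VCF path is the third tab-separated column.

```shell
# Rewrite the path column so that a host directory prefix becomes the
# container mount point. Rows whose path does not start with the host
# prefix are printed unchanged.
# to_container_paths is a hypothetical helper, not shipped with Phylorust.
to_container_paths() {
  host_dir=$1
  mount_point=$2
  table=$3
  awk -F '\t' -v OFS='\t' -v host="$host_dir" -v mnt="$mount_point" '
    index($3, host) == 1 { $3 = mnt substr($3, length(host) + 1) }
    { print }
  ' "$table"
}
```

For example, to_container_paths "$(pwd)/examples" /examples my_samples.tab > my_samples_docker.tab writes a table suitable for the docker run command above.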

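On HPC systems without Docker, you can use a Singularity (Apptainer) image instead. Build it from the Docker image on a machine where Docker is available (the docker-daemon source reads from the local Docker daemon), then copy the resulting .sif file to the cluster: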

apptainer build phylorust.sif docker-daemon://phylorust:latest

apptainer run phylorust.sif \
  --fasta examples/Cryp_gatt_R265.genome.fa-scaffold3.14.fasta \
  --name_type_location examples/Name_Type_location.tab

Command-line arguments

Key options (full list available with --help):

  • --fasta → Reference FASTA file.
  • --name_type_location → Tab-delimited file of sample names, file type, and VCF paths.
  • --output_dir → Directory for results (default: Phylorust_output).
  • --generate_fastas → FASTA generation mode (all or specific thresholds).
  • --skip-fasttree → Skip tree generation.
  • --fasttree-bin → Path to FastTree binary (if not in PATH).

Plotting

Histograms are generated with R and saved as both .png and .pdf files. You can also run the plotting script directly:

Rscript plot_histogram.R site_coverage_histogram.tsv 90

License

This project is licensed under the MIT License.
