NICHD-BSPC/bacteria-dash

Overview

This tool identifies depletion guide candidates (sgRNAs) targeted to specific genes.

This is a generalization of https://github.com/gprezza/DASH_rRNA_depletion that allows any species and any target location(s). DASH is typically used for species-specific rRNA depletion, but it can be used to target any gene(s).

Provided a FASTA file, a GTF file of region(s) to deplete, a PAM sequence (e.g. "NGG" for wildtype Cas9) and a guide length, CandidateFinder will:

  • identify all possible PAM sequences
  • search for all PAM sites on both strands of the regions to deplete
  • return candidate sgRNAs of the specified length if their PAM site is far enough from the end to leave room for a full guide (the sgRNA is immediately 5' of the PAM site)
  • filter candidates using the thresholds for GC content, heterodimer formation with the primer, and high end stability with the primer
  • discard off-target candidates. A candidate is off-target if it maps outside the target regions and is adjacent to a PAM site at its 3' end
  • compose the full oligo sequence of the candidates
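The PAM search and guide extraction steps listed above can be illustrated with a minimal, self-contained sketch. This is not the CandidateFinder implementation: the function names `find_guides` and `reverse_complement` are hypothetical, the PAM is assumed to use IUPAC codes, and the search is limited to a single region string rather than a genome with annotations.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_guides(region, pam, guide_length):
    """Return (strand, guide) pairs for PAM sites with room for a full guide.

    Hypothetical helper for illustration only; coordinates and filtering
    are deliberately left out.
    """
    # A lookahead lets overlapping PAM sites be reported separately.
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    pattern = re.compile("(?=(" + pam_regex + "))")

    region = region.upper()
    guides = []
    for strand, seq in (("+", region), ("-", reverse_complement(region))):
        for match in pattern.finditer(seq):
            pam_start = match.start()
            # The guide lies immediately 5' of the PAM, so at least
            # guide_length bases must precede the PAM on this strand.
            if pam_start >= guide_length:
                guides.append((strand, seq[pam_start - guide_length:pam_start]))
    return guides
```

Searching the reverse complement of the region is one simple way to cover the minus strand; an implementation that reports genomic coordinates would also need to map minus-strand positions back to the forward strand.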

Requirements

  • conda must be installed
  • this code repository must be downloaded and unpacked. Below, we refer to the unpacked directory as $WORKDIR. Unless otherwise specified, paths provided are assumed to be relative to $WORKDIR.

Create conda environment

Create the conda environment in the top level of $WORKDIR:

conda create -y -p env --file requirements.txt

Run tests

In the top level of $WORKDIR:

conda activate ./env

# install in editable mode
pip install -e .

Then run tests:

pytest -vv test/test.py

To run just a single test, use the -k flag, e.g.,

pytest -vv test/test.py -k test_orig_small

Usage

CandidateFinder(fasta, annotations, pam, guide_length)

Required input parameters:

  - fasta : str
    Path to genome fasta file (required)

  - annotations : str or pybedtools.BedTool
    If str, should be the path to a BED or GTF file, which will be converted to
    a pybedtools.BedTool object. This conversion automatically handles
    1-based GTF and 0-based BED annotations to avoid off-by-one errors (required)

  - pam : str
    PAM sequence, e.g. "NGG" for wildtype Cas9 (required)

  - guide_length : int
    Length of the guide to select (required)
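The off-by-one concern mentioned for the annotations parameter comes from the two coordinate conventions: GTF intervals are 1-based with an inclusive end, while BED intervals are 0-based with an exclusive end. The sketch below only illustrates that convention. The conversion in this tool is handled through pybedtools, and the helper names here are hypothetical.

```python
def gtf_to_bed_interval(start, end):
    """Convert a 1-based, end-inclusive GTF interval to 0-based, half-open BED."""
    if start < 1 or end < start:
        raise ValueError("GTF coordinates must satisfy 1 <= start <= end")
    # Only the start shifts; the inclusive 1-based end equals the
    # exclusive 0-based end.
    return start - 1, end


def bed_interval_length(bed_start, bed_end):
    """Length in bases of a half-open BED interval."""
    return bed_end - bed_start
```

For example, a GTF feature spanning bases 1 to 100 becomes the BED interval (0, 100), and both describe 100 bases.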

Optional parameters:

  - min_gc: int
    Lower threshold of acceptable GC percentage (inclusive) (optional; default=0)

  - max_gc : int
    Upper threshold of acceptable GC percentage (inclusive) (optional; default=100)

  - het_tm : int
    Degrees C threshold for Primer3's heterodimer routine (optional; default=40)

  - end_tm : int
    Degrees C threshold for Primer3's end stability routine (optional; default=30)

  - t7 : str
    T7 promoter (minus the first G) (optional; default="TTCTAATACGACTCACTATA")

  - scaffold : str
    Cas9 sgRNA scaffold (optional;
    default="GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTT")

  - primer : str
    Reverse complement of this Cas9 primer will be added to template sgRNA oligos (optional;
    default="GGCATACTCTGCGACATCGT")

  - bowtie_index_prefix : str
    Path to directory containing existing bowtie2 index. If missing,
    index will be created (optional; default='tmp/index')

  - outfn : str
    Output file path and basename (optional; default='candidates')

  - plots : bool
    Generate coverage plots (optional; default=True)

  - legacy : bool
    Match output to original DASH tool (optional; default=False)
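To make the inclusive min_gc and max_gc thresholds concrete, here is a minimal sketch of a GC-percentage filter. It assumes GC content is computed as the percentage of G and C bases over the full guide. The function names `gc_percent` and `passes_gc` are hypothetical and are not taken from this repository.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def passes_gc(seq, min_gc=0, max_gc=100):
    """True if the GC percentage lies within [min_gc, max_gc], bounds included."""
    return min_gc <= gc_percent(seq) <= max_gc
```

With the defaults (0 and 100) every guide passes, so the filter only has an effect when at least one threshold is tightened.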

Outputs

The output of CandidateFinder is a set of files named after the outfn parameter, which we refer to as $OUTFN. This can be a file prefix (e.g. candidates) or a path followed by a file prefix (e.g. output_dir/candidates).

  • $OUTFN_oligos.csv: comma-separated file of unique full oligo sequences. They are built from the T7 promoter sequence, the candidate guide, the Cas9 sgRNA scaffold and the reverse complement of the primer.

  • $OUTFN_grnas.bed: BED file of where the sgRNAs mapped to the reference genome. An sgRNA mapping to several positions in the genome will be listed at each of its locations.

  • $OUTFN_grnas.fa: FASTA sequence of the sgRNAs only.

  • $OUTFN_coverage.pdf: map of the sgRNAs along the reference genome.
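The composition of the full oligo sequences in $OUTFN_oligos.csv can be sketched as a simple concatenation. The sequences below are the documented defaults for t7, scaffold and primer. The concatenation order is an assumption that follows the order in which the parts are listed above, and the sketch inserts no extra bases between the T7 promoter and the guide, since the source does not say whether the tool does. The function names are hypothetical.

```python
# Documented defaults for the t7, scaffold and primer parameters.
T7 = "TTCTAATACGACTCACTATA"
SCAFFOLD = (
    "GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTT"
)
PRIMER = "GGCATACTCTGCGACATCGT"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def compose_oligo(guide, t7=T7, scaffold=SCAFFOLD, primer=PRIMER):
    """Assumed layout: T7 promoter, guide, scaffold, reverse-complemented primer."""
    return t7 + guide.upper() + scaffold + reverse_complement(primer)
```

Calling `compose_oligo("ACGTACGTACGTACGTACGT")` yields a sequence that starts with the T7 promoter followed by the guide and ends with the reverse complement of the primer.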
