FISH Probe Design Pipelines + SLURM parallelization

Introduction:

This pipeline designs one FISH probeset for each provided input. Three input types are accepted:

Gene annotations in gtf / gtf.gz format
Genomic regions in bed / bed.gz format
Nucleotide sequences in fasta / fasta.gz format

The GTF-based workflow takes a GTF annotation file and retrieves the coordinates and nucleotide sequences of each gene, transcript and exon. All exons belonging to the same transcript isoform are merged (intronic regions are dropped) into one concatenated sequence that contains the exon-exon junctions, and k-mer oligos for RNA FISH experiments are designed against this sequence. The BED-based workflow designs probes against entire ungapped regions defined by their coordinates. The FASTA-based workflow works directly on nucleotide sequences, which is useful when coordinates or identifiers are not available.
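The exon-merging step of the GTF-based workflow can be pictured with the minimal sketch below. It is not the pipeline's own code: the three-column input (transcript ID, exon rank, exon sequence) and the helper name `concat_exons` are assumptions made for illustration, and strand handling is left out.

```shell
# Hypothetical helper: join the exons of each transcript into one spliced sequence.
# Input (stdin): tab-separated transcript_id, exon_rank, exon_sequence (assumed layout).
concat_exons() {
  tab=$(printf '\t')
  # Order exons by transcript, then by numeric rank within the transcript.
  sort -t "$tab" -k1,1 -k2,2n |
    awk -F'\t' '
      $1 != prev { if (NR > 1) print prev "\t" seq; prev = $1; seq = "" }
      { seq = seq $3 }   # introns are absent, so consecutive exons simply abut
      END { if (NR > 0) print prev "\t" seq }
    '
}

printf 'tx1\t2\tGGCC\ntx1\t1\tAATT\ntx2\t1\tCCCC\n' | concat_exons
```

Each output line holds a transcript ID and its concatenated exon sequence, so every boundary between two exons in that string is an exon-exon junction.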

Installation:

A Singularity image can be provided on request. Otherwise, use the provided Dockerfile to build a Docker image and convert it to a Singularity image, following the steps below. Both Singularity and SLURM are required to run the pipeline.

Download Files:
git clone https://github.com/BiCroLab/fish_probe_design.git
cd fish_probe_design/installation/
unzip prbdocker-master.zip && cd prbdocker-master
Build the Docker image:
docker build -t prbdocker .
Convert the Docker image to a Singularity image:
docker run -v /var/run/docker.sock:/var/run/docker.sock -v "$(pwd)":/output \
--privileged -t --rm singularityware/docker2singularity:v2.6 prbdocker

Usage:

First, git clone https://github.com/BiCroLab/fish_probe_design.git
Adjust all user-specific variables in prb.config
Launch the whole pipeline with bash main.sh

Inputs / Parameters Tuning

The pipeline consists of a main.sh script that manages a series of modules.
All variables can be controlled and edited from a prb.config text file:

${INPUT_GTF} annotation file in .gtf / .gtf.gz format.
${INPUT_FASTA} sequence file in .fasta / .fasta.gz format.
${INPUT_BED} region file in .bed / .bed.gz format.
${GENOME} path to genome .fa / .fa.gz having .fai / .gzi index.
All chromosome names should start with the prefix chr and have no additional spaces.
Required index files can be produced with samtools faidx.
${BASEDIR} / ${WORKDIR} base path and output directory name.
${OLIGO_LENGTH} length of probe oligos (default is 40).
${OLIGO_SUBLENGTH} sublength of probe oligos (default is 21).
${SPACER} value affecting average oligo density (default is 10bp).
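As an illustration of the variables listed above, a prb.config could look like the sketch below. It assumes the file uses plain shell variable assignments. Every path is a placeholder, and leaving unused inputs empty is an assumption, not documented behaviour. Only the three numeric defaults come from the list above.

```shell
# Hypothetical prb.config sketch: all paths are placeholders, not pipeline defaults.
INPUT_GTF="/path/to/annotation.gtf.gz"   # input for the GTF-based workflow
INPUT_FASTA=""                           # assumption: left empty when unused
INPUT_BED=""                             # assumption: left empty when unused
GENOME="/path/to/genome.fa.gz"           # needs the .fai / .gzi index from samtools faidx
BASEDIR="/path/to/project"               # base path
WORKDIR="probe_run_01"                   # output directory name
OLIGO_LENGTH=40                          # documented default
OLIGO_SUBLENGTH=21                       # documented default
SPACER=10                                # documented default, in bp
```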

For each input, N is the maximum number of oligos to be found: N = ${WIDTH} / (${OLIGO_LENGTH} + ${SPACER}), where ${WIDTH} is the length of the input in bp. If N suitable candidates are not found, the pipeline reduces N and retries. For example, a 5000 bp region with 40 bp oligos and a 10 bp spacer yields at most 5000 / (40 + 10) = 100 oligos.
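The upper bound N can be checked quickly before launching a run. The helper below is a sketch, not part of the pipeline: the name `max_oligos` is hypothetical, and rounding down through integer division is an assumption about how fractional results are handled.

```shell
# Hypothetical helper: upper bound on oligos for one input.
# N = WIDTH / (OLIGO_LENGTH + SPACER), rounded down (assumed).
max_oligos() {
  width=$1
  oligo_length=$2
  spacer=$3
  echo $(( width / (oligo_length + spacer) ))
}

max_oligos 5000 40 10   # prints 100
```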

Outputs / Results Selection:

Selecting Best Results

All results are saved in a single ${WORKDIR}/prb_results directory, with one .tsv file for each provided input. Output filenames also include the final number of oligos found, which is less than or equal to the maximum N. At this stage, users may want to check whether the number of oligos found dropped significantly from the original N, and whether the oligos are evenly distributed along the sequence. The pw score indicates the overall quality of the entire probeset; where possible, very low values should be avoided. Very short input regions, however, may consistently give low values. In most situations, as long as there are enough oligos to produce a detectable fluorescence signal, this score can safely be ignored. General suggestions: (1) fit as many oligos as possible into a region to get a stronger signal; (2) avoid excessively gapped probesets, as they could appear as separate dots.
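One way to judge how evenly the oligos are distributed is to look at the largest gap between consecutive oligos. The sketch below assumes a two-column, tab-separated list of oligo start and end coordinates. This is an illustrative layout, not necessarily that of the pipeline's .tsv files, and `largest_gap` is a hypothetical helper name.

```shell
# Hypothetical helper: report the largest gap (bp) between consecutive oligos.
# Input (stdin): start and end coordinate of each oligo, tab-separated (assumed layout).
largest_gap() {
  sort -k1,1n |
    awk '
      NR > 1 { gap = $1 - prev_end; if (gap > max) max = gap }
      { prev_end = $2 }
      END { print max + 0 }   # prints 0 when there are fewer than two oligos
    '
}

printf '0\t40\n50\t90\n400\t440\n' | largest_gap   # prints 310
```

A largest gap far above ${SPACER} points to a stretch without oligos, which is the kind of gapped probeset that suggestion (2) warns about.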

Oligo-pools / Selecting Probes Amplification

Since most oligo synthesis companies offer large discounts when many sequences are ordered at once, it is recommended to group multiple probesets into one or a few oligo-pools. All oligos of a given probeset are extended with two flanking sequences, called flaps (see figure), which can be used to bind fluorophores and to selectively amplify the whole probeset from an oligo-pool mixture. Flap sequences should not hybridize with the target genome, to prevent off-target binding and interference. We provide a series of scripts that calculate orthogonal k-mers for flaps; these sequences need to be computed only once per reference genome. Combining different left and right flap sequences allows, in theory, a large number of combinations from a relatively small number of orthogonal sequences. Some pre-computed 20-mers are available for the human genome and can be provided on request.
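The combinatorial argument can be made concrete with a small sketch: a set of left flaps and a set of right flaps give as many pairings as the product of the two set sizes. The flap names and the helper `flap_pairs` below are hypothetical and are not part of the provided scripts.

```shell
# Hypothetical helper: list every left/right flap pairing from two name lists.
# Arguments: space-separated left flap names, space-separated right flap names.
flap_pairs() {
  for left in $1; do
    for right in $2; do
      printf '%s\t%s\n' "$left" "$right"
    done
  done
}

# 3 left flaps x 4 right flaps give 12 distinct pairings from only 7 sequences.
flap_pairs "L1 L2 L3" "R1 R2 R3 R4" | wc -l
```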

Preparing Final Results

This section explains how to combine the information above into a final table that can be sent to companies for oligo synthesis. A semi-manual approach is recommended. Users who want to visualize several probes in the same experiment with a limited number of channels should consider which fluorescent color will be assigned to each probeset. For probes of the same group or condition, it is advisable to use a common sequence for one of the two flaps. Although not essential, this simplifies and speeds up pipetting for amplification, and reduces the chance that the wrong fluorophore is attached to some oligos. This strategy can be ignored if only a few regions are targeted and enough orthogonal sequences are available to create unique combinations. We provide an example script that combines flap and oligo sequences and writes an Excel file that can be used for ordering probes.
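The core of the integration step is string concatenation of a left flap, an oligo and a right flap. The sketch below is not the provided example script: `add_flaps` is a hypothetical helper, the sequences are made-up placeholders, the left-oligo-right order is an assumption, and the output is plain text, not an Excel file.

```shell
# Hypothetical helper: flank every oligo with a left and a right flap.
# Usage: add_flaps LEFT_FLAP RIGHT_FLAP < oligos.txt   (one oligo sequence per line)
add_flaps() {
  # NF skips empty lines; the order left + oligo + right is assumed.
  awk -v left="$1" -v right="$2" 'NF { print left $1 right }'
}

# Placeholder sequences only, far shorter than real oligos or flaps.
printf 'ACGTACGT\nTTGGCCAA\n' | add_flaps "GGGG" "CCCC"
```

Calling the helper once per probeset with that probeset's flap pair gives one full-length sequence per line, ready to be collected into the ordering table.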

Advanced settings and information:

Check here for further information.
