This repository was archived by the owner on Jun 10, 2024. It is now read-only.

Commit 2d4a40f

updates to readme
1 parent 9e79087 commit 2d4a40f

1 file changed, +29 -29 lines changed

readme.md

Lines changed: 29 additions & 29 deletions
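
Four subsections also move from `###` to `####`. Apart from that, every change in this commit inserts a space after a heading's leading `#` characters. CommonMark requires the opening run of 1 to 6 `#` characters to be followed by a space, a tab or the end of the line. Without that space, a line like `###Install` renders as plain text rather than as a heading.

The Python sketch below applies the same space-insertion fix. `fix_atx_headings` and its regex are illustrative assumptions, not necessarily how this commit was produced. The sketch does not skip `<pre>` or code sections, so a line such as a `#!` shebang would also be changed, and review of the result is advisable.

```python
import re

# A run of 1-6 '#' at the start of a line, immediately followed by a
# character that is neither '#' nor whitespace (e.g. "###Install").
ATX_NO_SPACE = re.compile(r"^(#{1,6})(?=[^#\s])", re.MULTILINE)


def fix_atx_headings(markdown: str) -> str:
    """Insert one space after the leading '#' run of space-less ATX headings."""
    return ATX_NO_SPACE.sub(r"\1 ", markdown)


if __name__ == "__main__":
    before = "#TASR\n##TASR v1.6.2 Rene Warren, 2010-2016\n###Install\n### License\n"
    print(fix_atx_headings(before), end="")
```

For example, `fix_atx_headings("###Install\n")` returns `"### Install\n"`. An already valid heading such as `"### License"` is left unchanged.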
@@ -1,19 +1,19 @@
-#TASR
+# TASR
 
-##Targeted Assembly of Sequence Reads (TASR)
-##TASR v1.6.2 Rene Warren, 2010-2016
-##email: rwarren [at] bcgsc [dot] ca
-##Visit www.bcgsc.ca/bioinfo/software/tasr for more information
+## Targeted Assembly of Sequence Reads (TASR)
+## TASR v1.6.2 Rene Warren, 2010-2016
+## email: rwarren [at] bcgsc [dot] ca
+## Visit www.bcgsc.ca/bioinfo/software/tasr for more information
 
 
-###What's new in version 1.6.2?
+### What's new in version 1.6.2?
 ----------------------------
 
 Fixed a minor bug that prevented TASR from running in reference-guided mode (-i 0). Thanks to Matthew Hobbs for reporting it.
 Changed the test to explicitly set -i to 1 or 0 for testing purposes (the default is -i 1 as of v1.6.1)
 
 
-###What's new in version 1.6.1?
+### What's new in version 1.6.1?
 ----------------------------
 
 1. Bloom filter functionality to exclude k-mers from your sequence target space (TASR-Bloom)
@@ -25,27 +25,27 @@ TASR and TASR-Bloom:
 6. The de novo assembly mode (-i 1) is now the default mode
 
 
-###What's new in version 1.5.1?
+### What's new in version 1.5.1?
 ----------------------------
 
 Fixed TASR for Perl >= 5.16.0, from which the deprecated getopts.pl has been removed. Thanks to Nicola Soranzo for sending the fix.
 
 
-###What's new in version 1.5?
+### What's new in version 1.5?
 --------------------------
 
 TASR v1.5 no longer restricts the words derived from a target sequence for interrogating candidate reads to 15 characters. A user-defined target word length is now passed to the algorithm with the -k option. Larger -k values should help speed up the search with long sequence reads, since they restrict the sequence space accordingly. Note: while specificity, speed and RAM usage may increase with k, larger k values may also yield sparser, more fragmented assemblies, so experimenting with various -k values is warranted.
 
 
-###What's new in version 1.4?
+### What's new in version 1.4?
 --------------------------
 
 Ability to interrogate reads in bam files
 
 The -a option specifies the location of samtools on your system. If .bam/.BAM files are listed in the file of filenames (FOF) supplied with the -f option, the executable specified with -a will be used to interrogate any reads in those .bam files that passed QC.
 
 
-###What's new in version 1.3?
+### What's new in version 1.3?
 --------------------------
 
 Support for sequence target-independent de novo assemblies
@@ -85,28 +85,28 @@ XXXOXXXXXX
 Where "O" represents a variant base
 </pre>
 
-###What's new in version 1.2?
+### What's new in version 1.2?
 --------------------------
 
 The -f option inputs reads via a file of filenames (fof), which lists the fasta/fastq sequence files you wish to input.
 Specify one file per line; full paths to your file(s) are recommended.
 
 
-###Description
+### Description
 -----------
 
 Targeted Assembly of Sequence Reads (TASR) using the SSAKE assembly engine.
 TASR is a genomics application that allows hypothesis-based interrogation of genomic regions (sequence targets) of interest.
 *It only considers for assembly those reads that have overlap potential with the input target sequences.
 
 
-###Implementation and requirements
+### Implementation and requirements
 -------------------------------
 
 TASR is implemented in Perl and runs on any platform where Perl is installed.
 
 
-###Install
+### Install
 -------
 
 Download the .tar.gz, gunzip and extract the files on your system using:
@@ -136,7 +136,7 @@ PREFIX=./bloom5-10-0
 Change the shebang line of TASR to point to the version of Perl installed on your system and you're good to go.
 
 
-###Documentation
+### Documentation
 -------------
 
 Refer to the TASR.readme file on how to run TASR, and to the TASR web site for information about the software and its performance
@@ -145,7 +145,7 @@ www.bcgsc.ca/bioinfo/software/tasr
 Questions or comments? We would love to hear from you!
 
 
-###Citing TASR
+### Citing TASR
 -----------
 
 Thank you for using, developing and promoting this free software.
@@ -156,7 +156,7 @@ Warren RL, Holt RA, 2011 Targeted Assembly of Short Sequence Reads. PLoS ONE 6(5
 Warren RL, Sutton GG, Jones SJM, Holt RA. 2007. Assembling millions of short DNA sequences using SSAKE. Bioinformatics. 23(4):500-501
 </pre>
 
-###Running TASR
+### Running TASR
 ------------
 <pre>
 e.g. ../TASR -s targets.fa -f foobar.fof -m 15 -c 1
@@ -182,7 +182,7 @@ Usage: ./TASR [v1.6.2]
 
 </pre>
 
-###Test data
+### Test data
 ---------
 
 Execute "runme.sh"
@@ -199,7 +199,7 @@ E. Run TASR, quality-clip mode:
 >../TASR -s targets.fa -f foobar.fof -m 15 -c 1 -u 1
 </pre>
 
-###How it works
+### How it works
 ------------
 
 If the -s option is set and points to a valid fasta file, the DNA sequences in that file will populate the hash table and be used exclusively as seeds to nucleate contig extensions (they will not be used to build the prefix tree). In that scheme, every unique sequence target is used in turn to nucleate an extension, using short reads found in the tree (supplied with -f). This feature may be useful if you already have characterized sequences and want to increase their length using short reads. Note, however, that because the short reads are not used as seeds when -s is set, they will not cluster to one another WITHOUT a target sequence to extend.
@@ -209,15 +209,15 @@ The .singlets will ONLY list sequence targets for which there are no overlapping
 DNA sequence reads in fastq or fasta format are fed into the algorithm via a file of filenames using the -f option. DNA sequence targets, used to interrogate all reads, are supplied as a multi-fasta file using the -s option. Sequence targets are read first. From each target, every possible 15-character word (or user-defined -k length) from the plus and minus strands is extracted and stored in a hash table. As the bulk of the NGS sequences is read, quality trimming is possible at run time, provided that a fastq file is supplied together with the -c 1 option. In SSAKE, the first 15 bp of each read and of its reverse complement are unconditionally used as an index to fill the prefix tree. In TASR, only reads with a matching 15-mer (or -k-mer) in the target sequence set are considered, thus limiting the sequence space to that of the target sequences. Low-complexity and large DNA sequence targets will draw in more reads, which will impact the performance of TASR.
 
 
-###TASR-Bloom
+### TASR-Bloom
 ----------
 
 TASR-Bloom uses a Bloom filter, supplied with the -l option, to exclude target k-mers from read recruitment.
 This could be useful for removing low-complexity or repeat k-mers from the supplied -s target sequences, for instance.
 The Bloom filter must be built with the ./writeBloom.pl utility in the ./tools folder, and its k-mer length must match the value supplied with -k.
 
 
-###Input sequences
+### Input sequences
 ---------------
 
 -f file of filenames corresponding to fasta or fastq files
@@ -244,7 +244,7 @@ AGTGAGGAAAACACGGAGTTGATGCAgAAGCCCCAACATCCAACCTCGACTC
 -Spaces in fasta files are NOT permitted and will either not be considered or result in execution failure
 
 
-###Tips for choosing target sequences
+### Tips for choosing target sequences
 ----------------------------------
 
 The length and sequence complexity of a target have a tremendous influence on the outcome of the assembly. Of course, depending on your application (SNV search, confirming SNPs, detecting fusion transcripts), the length and complexity may or may not matter.
@@ -262,7 +262,7 @@ Given a target sequence length (T) and read length (R), then:
 The same principles apply to detecting a translocation or a fusion transcript, although they are usually less critical there, especially for fusion transcripts, where depth of coverage is usually not limiting.
 
 
-###Output files
+### Output files
 ------------
 
 Output file|Description
@@ -275,7 +275,7 @@ Output file|Description
 .pileup | produces a modified pileup output (see below)
 
 
-###Understanding the .contigs fasta header
+#### Understanding the .contigs fasta header
 ---------------------------------------
 <pre>
 e.g.
@@ -292,7 +292,7 @@ the coverage (C) is calculated using the total number (T) of consensus bases [su
 C = T / G
 </pre>
 
-###Understanding the .coverage.csv file
+#### Understanding the .coverage.csv file
 ------------------------------------
 <pre>
 e.g.
@@ -302,7 +302,7 @@ e.g.
 Each number represents the number of reads covering that base at that position.
 
 
-###Understanding the .readposition file
+#### Understanding the .readposition file
 ------------------------------------
 <pre>
 e.g.
@@ -330,7 +330,7 @@ In this order: read name, start coordinate, end coordinate, read sequence, ascii
 * end < start indicates that the read is on the minus strand
 
 
-###Understanding the modified .pileup file
+#### Understanding the modified .pileup file
 ---------------------------------------
 
 Refer to http://samtools.sourceforge.net/pileup.shtml
@@ -374,7 +374,7 @@ NOTES:
 -If the target sequences supplied (-s) are identical, both the .pileup and .readposition files will reflect this. That is, although TASR does not assemble targets together, identical sequences provided as input will be listed as one, with base coverage consistent with the input.
 
 
-###License
+### License
 -------
 
 TASR Copyright (c) 2010-2016 Canada's Michael Smith Genome Science Centre. All rights reserved.
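
The "How it works" paragraph in the readme explains how TASR restricts the read space. Every k-mer (15 by default, or the -k value) from both strands of each target is stored in a hash, and only reads with a matching k-mer are considered.

The Python sketch below mirrors that description under a few assumptions:

- Reads and targets are plain strings.
- Case is ignored.
- A read is recruited if its first k bases, or the first k bases of its reverse complement, are in the target k-mer set.

The names `revcomp`, `target_kmers` and `recruit` are hypothetical. This is not TASR's Perl implementation, and quality trimming (-c) is not modeled.

```python
from typing import Iterable, List, Set

# Translation table for the reverse complement; other characters (e.g. N) are kept.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def target_kmers(targets: Iterable[str], k: int = 15) -> Set[str]:
    """Every k-mer from the plus and minus strands of every target (upper-cased)."""
    kmers: Set[str] = set()
    for target in targets:
        plus = target.upper()
        for strand in (plus, revcomp(plus)):
            for i in range(len(strand) - k + 1):
                kmers.add(strand[i:i + k])
    return kmers


def recruit(reads: Iterable[str], kmers: Set[str], k: int = 15) -> List[str]:
    """Keep reads whose first k bases, or the first k bases of their reverse
    complement, are found in the target k-mer set."""
    kept: List[str] = []
    for read in reads:
        seq = read.upper()
        if len(seq) < k:
            continue  # too short to yield a k-mer
        if seq[:k] in kmers or revcomp(seq)[:k] in kmers:
            kept.append(read)
    return kept


if __name__ == "__main__":
    index = target_kmers(["AACCGGTTAC"], k=5)
    print(recruit(["CCGGTAAAA", "GGGGGGGG", "TTTTTAACCG", "ACG"], index, k=5))
```

With k=5 the example keeps two reads:

- `CCGGTAAAA`, because its first 5-mer, `CCGGT`, occurs on the target's plus strand.
- `TTTTTAACCG`, because its reverse complement starts with `CGGTT`.

`GGGGGGGG` has no matching 5-mer on either strand, and `ACG` is shorter than k.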
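
The TASR-Bloom section describes removing target k-mers found in a Bloom filter (-l) so that they cannot recruit reads. The sketch below illustrates that filtering step with a small hand-rolled Bloom filter.

The class, its SHA-256 double hashing and the sizes used are assumptions for illustration. They do not reproduce the file written by ./tools/writeBloom.pl. As the readme requires, the k-mers stored in the filter must have the same length as the target k-mers (-k); the sketch does not check this.

```python
import hashlib
from typing import Iterable, Iterator, Set


class BloomFilter:
    """Minimal Bloom filter with m bits and h probe positions per item.

    Positions come from double hashing two 64-bit slices of a SHA-256 digest.
    Illustrative only; this is NOT the format produced by writeBloom.pl.
    """

    def __init__(self, m_bits: int, n_hashes: int) -> None:
        self.m = m_bits
        self.h = n_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.h):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def exclude_kmers(kmers: Iterable[str], bloom: BloomFilter) -> Set[str]:
    """Drop target k-mers that the filter reports as present.

    Bloom filters have no false negatives, so every k-mer added to the filter
    is removed; a false positive may also remove a k-mer that was never added.
    """
    return {kmer for kmer in kmers if kmer not in bloom}


if __name__ == "__main__":
    repeats = BloomFilter(m_bits=1 << 16, n_hashes=3)
    for kmer in ("AAAAA", "TTTTT"):  # e.g. low-complexity k-mers to exclude
        repeats.add(kmer)
    print(sorted(exclude_kmers({"AAAAA", "TTTTT", "ACGTA"}, repeats)))
```

The trade-off is the usual one for Bloom filters: little memory for a large exclusion set, at the cost of occasionally discarding a k-mer that was not meant to be excluded.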
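
The .readposition notes say that end < start marks a minus-strand read, and the .coverage.csv file reports the number of reads covering each base. The sketch below derives per-base depth from such start/end pairs with a difference array.

It assumes 1-based, inclusive coordinates on a single contig, which the readme text shown here does not specify. It is not a description of how TASR itself computes .coverage.csv. `strand` and `per_base_coverage` are hypothetical names.

```python
from typing import Iterable, List, Tuple


def strand(start: int, end: int) -> str:
    """'-' when end < start (minus-strand read, per the readme), else '+'."""
    return "-" if end < start else "+"


def per_base_coverage(placements: Iterable[Tuple[int, int]], contig_len: int) -> List[int]:
    """Count the reads covering each position of a contig.

    ASSUMPTION: coordinates are 1-based and inclusive; a minus-strand read is
    given as (start, end) with end < start, so its span is normalised first.
    """
    diff = [0] * (contig_len + 2)  # difference array, indices 1..contig_len+1 used
    for start, end in placements:
        lo, hi = min(start, end), max(start, end)
        lo, hi = max(lo, 1), min(hi, contig_len)  # clip to the contig
        if lo > hi:
            continue  # read lies entirely outside the contig
        diff[lo] += 1
        diff[hi + 1] -= 1
    coverage: List[int] = []
    running = 0
    for pos in range(1, contig_len + 1):
        running += diff[pos]
        coverage.append(running)
    return coverage


if __name__ == "__main__":
    # Three hypothetical reads on an 8-bp contig; (6, 3) is a minus-strand read.
    reads = [(1, 4), (6, 3), (5, 8)]
    print([strand(s, e) for s, e in reads])
    print(",".join(map(str, per_base_coverage(reads, contig_len=8))))
```

For the three example reads this prints `['+', '-', '+']` and then `1,1,2,2,2,2,1,1`. The difference array keeps the cost linear in the number of reads plus the contig length, rather than touching every covered base of every read.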
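
The -f option takes a file of filenames (fof) listing one fasta/fastq file per line; since v1.4, .bam files can also be listed. The readme recommends full paths. The sketch below builds such a list with absolute paths. `fof_text` and the read file names are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def fof_text(read_files: Iterable[str]) -> str:
    """Return file-of-filenames content: one absolute path per line."""
    # Path.resolve() does not require the file to exist (non-strict mode).
    return "".join(f"{Path(f).resolve()}\n" for f in read_files)


if __name__ == "__main__":
    # Hypothetical read files, resolved against the current directory.
    print(fof_text(["reads_1.fastq", "reads_2.fa"]), end="")
```

Writing the result with `Path("foobar.fof").write_text(...)` gives a list that can be passed as `-f foobar.fof`, as in the readme's example command.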
