- About the Project
- Getting Started
- Usage
- Examples
- Contributing
- License
- Acknowledgements and References
- Contact
Based on an analysis of 180 Pattern-Matching (PM) algorithms, the main idea of this project is to create a simple and fast tool that removes adapter fragments from FASTQ files. After testing all 180 algorithms with the SMART tool, we analysed the results and ended up with 5 algorithms that performed well at the approximate pattern length of an adapter (between 8 and 16 nitrogenous bases). QF43 and Sbndmq-4 had the best results, but Sbndmq-4 was slightly better with patterns of 8 nitrogenous bases, so it became our choice for this project. More information about FAIR and the 180 Pattern-Matching Algorithms Analysis can be found at:
The project was built mainly with C++, but some functionalities are based on Python scripts, including the 180 Pattern-Matching Algorithms Analysis present in this repository.
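Sbndmq-4 belongs to the BNDM (Backward Nondeterministic DAWG Matching) family of bit-parallel algorithms introduced by Navarro and Raffinot. The sketch below shows plain BNDM only, to illustrate the idea. It is not FAIR's code, and Sbndmq-4 additionally reads q-grams of 4 characters per step, which is omitted here. The function name `bndm_search` is hypothetical, and patterns are assumed to be at most 64 bases long so the automaton state fits in one 64-bit word, which covers the 8 to 16 base adapters mentioned above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: plain BNDM exact search (NOT FAIR's Sbndmq-4, which
// additionally reads 4-character q-grams). Returns the start offset of every
// occurrence of `pattern` in `text`, overlapping occurrences included.
std::vector<std::size_t> bndm_search(const std::string& pattern, const std::string& text) {
    const std::size_t m = pattern.size();
    const std::size_t n = text.size();
    assert(m <= 64 && "assumption: pattern fits in one 64-bit word");
    std::vector<std::size_t> hits;
    if (m == 0 || m > 64 || m > n) return hits;

    // B[c] has bit (m-1-i) set wherever pattern[i] == c (pattern stored reversed).
    std::array<std::uint64_t, 256> B{};
    for (std::size_t i = 0; i < m; ++i)
        B[static_cast<unsigned char>(pattern[i])] |= std::uint64_t{1} << (m - 1 - i);
    const std::uint64_t high = std::uint64_t{1} << (m - 1);

    std::size_t pos = 0;  // left end of the current window
    while (pos + m <= n) {
        std::size_t j = m;     // window characters not yet read
        std::size_t last = m;  // shift to apply after this window
        std::uint64_t D = ~std::uint64_t{0};  // every pattern factor still possible
        // Scan the window right to left while some pattern factor still matches.
        while (j > 0 && D != 0) {
            D &= B[static_cast<unsigned char>(text[pos + j - 1])];
            --j;
            if (D & high) {                // the suffix read so far is a pattern prefix
                if (j > 0) last = j;       // longest prefix seen -> smallest safe shift
                else hits.push_back(pos);  // the whole window matched
            }
            D <<= 1;
        }
        pos += last;
    }
    return hits;
}
```

For instance, `bndm_search("GTA", "ACGTACGT")` returns `{2}`. Because each window is read from right to left, the search can jump ahead by the distance to the longest pattern prefix it recognized instead of advancing one base at a time.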
FAIR works with single, paired (forward/reverse) and interlaced FASTQ files to identify, trim and remove adapters as well as low-quality and N bases from sequences. You can choose the number of threads used during processing, and request Phred-offset detection and/or adapter identification. At the end of the execution, a new FASTQ file with the adapter segments removed is created in the directory chosen by the user, together with an additional file containing the deleted bases. FAIR does not yet work with tar.gz files.
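Because FASTQ stores one quality character per base, removing an adapter segment means cutting the sequence and quality lines at the same positions. The sketch below illustrates that bookkeeping under stated assumptions: `FastqRecord` and `remove_adapter` are hypothetical names, `std::string::find` stands in for the Sbndmq-4 search that FAIR uses, and erasing every non-overlapping occurrence is an assumed policy that may differ from FAIR's. The removed bases are collected so they could be written to a separate file, mirroring the additional output file described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical in-memory FASTQ record (the four lines of one read).
struct FastqRecord {
    std::string header;    // line 1, starts with '@'
    std::string sequence;  // line 2, the bases
    std::string plus;      // line 3, starts with '+'
    std::string quality;   // line 4, one quality character per base
};

// Hypothetical helper, not FAIR's API: erases every non-overlapping occurrence
// of `adapter` (scanning left to right) from the read, deleting the matching
// quality characters too so both lines keep the same length. Removed bases are
// appended to `removed`; the scan resumes at the deletion point.
// std::string::find stands in for the Sbndmq-4 search used by FAIR.
std::size_t remove_adapter(FastqRecord& rec, const std::string& adapter, std::string& removed) {
    assert(rec.sequence.size() == rec.quality.size());
    if (adapter.empty()) return 0;
    std::size_t count = 0;
    std::size_t pos = rec.sequence.find(adapter);
    while (pos != std::string::npos) {
        removed += rec.sequence.substr(pos, adapter.size());
        rec.sequence.erase(pos, adapter.size());
        rec.quality.erase(pos, adapter.size());  // keep qualities aligned with bases
        ++count;
        pos = rec.sequence.find(adapter, pos);
    }
    return count;
}
```

With the adapter `CCCCCCC` used in the examples, the read `ACGTCCCCCCCGGA` becomes `ACGTGGA`, and the seven quality characters at the same positions are dropped with it.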
This repository can be built with any C++ compiler. During the conception of the project we used gcc without any major problems. Additionally, Python is necessary for some extra functionalities.
- gcc
sudo apt-get install gcc g++
- python
sudo apt-get install python
If you want to run the algorithm evaluation located in utils, some extra Python frameworks are required, namely pandas, matplotlib and numpy. You can install them all at once using pip.
pip install -r requirements.txt --user
- Clone the repo
git clone https://github.com/jvcanavarro/FAIR-Fast-Adapter-Identification-and-Removal.git
- Build with compiler
cd FAIR-Fast-Adapter-Identification-and-Removal
g++ source/main.cpp -o FAIR
All available FAIR parameters are listed below.
Usage: ./FAIR [options] -o <output_dir>
Basic options:
-o/--output <output_dir> directory to store all the resulting files (required)
-h/--help prints this usage message
-v/--version prints version
Input data:
-s/--single <filename> file with unpaired reads
-f/--forward <filename> file with forward paired-end reads
-r/--reverse <filename> file with reverse paired-end reads
-i/--interlaced <filename> file with interlaced forward and reverse paired-end reads
Pipeline options:
--only-identify runs only adapter identification (without removal)
--only-remove runs only adapter removal (without identification)
need to set adapter(s) if this option is set
--trim trim ambiguous bases (N) at 5'/3' termini
--trim-quality trim bases at 5'/3' termini with quality scores <= the
--min-quality value
--min-quality <int> minimal quality value to trim
Advanced options:
--adapter <adapter> adapter sequence that will be removed (unpaired reads)
required with --only-remove
--forward-adapter <adapter> adapter sequence that will be removed
in the forward paired-end reads (required with --only-remove)
--reverse-adapter <adapter> adapter sequence that will be removed
in the reverse paired-end reads (required with --only-remove)
-t/--threads <int> number of threads
[default: 4]
--phred-offset <33 or 64> PHRED quality offset in the input reads (33 or 64)
[default: auto-detect]
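The `--trim`, `--trim-quality`, `--min-quality` and `--phred-offset` options above can be pictured with the following sketch. It is an assumption about their behavior based only on this usage text, not FAIR's implementation: `detect_phred_offset` and `trim_termini` are hypothetical names, and the auto-detection rule is a common heuristic that FAIR may not use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical heuristic for "--phred-offset auto-detect" (FAIR's rule may differ):
// characters below ';' (ASCII 59) only appear in Phred+33 data, while characters
// above 'J' (ASCII 74) are typical of Illumina Phred+64 data.
int detect_phred_offset(const std::string& qualities) {
    bool above_j = false;
    for (char ch : qualities) {
        const int c = static_cast<unsigned char>(ch);
        if (c < 59) return 33;
        if (c > 74) above_j = true;
    }
    return above_j ? 64 : 33;  // ambiguous input: assume the modern Phred+33
}

// Hypothetical helper mirroring --trim / --trim-quality / --min-quality:
// removes bases from the 5' and 3' termini while they are ambiguous ('N', if
// trim_n) or have a Phred score <= min_quality (if trim_quality). Sequence and
// quality strings are trimmed together so they stay the same length.
void trim_termini(std::string& seq, std::string& qual, bool trim_n, bool trim_quality,
                  int min_quality, int offset) {
    assert(seq.size() == qual.size());
    auto is_trimmable = [&](std::size_t i) {
        const int score = static_cast<unsigned char>(qual[i]) - offset;
        return (trim_n && (seq[i] == 'N' || seq[i] == 'n')) ||
               (trim_quality && score <= min_quality);
    };
    std::size_t begin = 0, end = seq.size();
    while (begin < end && is_trimmable(begin)) ++begin;  // 5' terminus
    while (end > begin && is_trimmable(end - 1)) --end;  // 3' terminus
    seq = seq.substr(begin, end - begin);
    qual = qual.substr(begin, end - begin);
}
```

For example, with offset 33, a minimum quality of 10 and N trimming enabled, the read `NACGTAGC` with qualities `IIIIII#(` is trimmed to `ACGTA`: the leading `N` is dropped, and so are the last two bases, whose scores are 2 and 7.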
For more examples, please refer to the Documentation.
You can test the program using the samples sample1.fastq and sample2.fastq located in data. The new files are stored in results. Some common usages are listed below.
- Remove Adapters from Single FASTQ File with Adapter and Quality Identification
./FAIR --output results/ --single data/sample1.fastq
- Remove Adapters from Forward and Reverse FASTQ Files with Adapter and Quality Identification
./FAIR --output results/ --forward data/sample1.fastq --reverse data/sample2.fastq
- Remove Adapters from Forward and Reverse FASTQ Files without Adapter Identification
./FAIR --output results/ --forward data/sample1.fastq --reverse data/sample2.fastq --only-remove --forward-adapter CCCCCCC --reverse-adapter CCCATCC
- Remove Adapters from Single FASTQ File with Trim, Trim-Quality, Min-Quality, Number of Threads and Phred-Offset
./FAIR --output results/ --single data/sample1.fastq --trim --trim-quality --min-quality 90 --threads 8 --phred-offset 33
Distributed under the MIT License. See LICENSE for more information.
- R. S. Boyer and J. S. Moore. A fast string searching algorithm. Commun. ACM, 20(10):762–772, 1977.
- Koloud Al-Khamaiseh and Shadi ALShagarin. A Survey of String Matching Algorithms. Int. Journal of Engineering Research and Applications (IJERA), ISSN 2248-9622, Vol. 4, Issue 7 (Version 2), pp. 144–156, July 2014.
- B. Durian, H. Peltola, L. Salmela, and J. Tarhio. Bit-parallel search algorithms for long patterns. In P. Festa, editor, Symposium on Experimental Algorithms, LNCS 6049, pp. 129–140, Springer-Verlag, Berlin, 2010.
- G. Navarro and M. Raffinot. A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching. In Proc. of the 9th Annual Symposium on Combinatorial Pattern Matching, LNCS 1448, 1998.
- SMART (String Matching Algorithm Research Tool)
- bio-playground by Brent Pedersen
João V. Canavarro - jvcanavarro@gmail.com
Project Link: https://github.com/jvcanavarro/FAIR-Fast-Adapter-Identification-and-Removal
