GitHub - LarmanLab/PhIP-Seq-Analyzer

######################### PhIP-seq Analyzer #########################

Demo version. Date: 20170524

Apply for a server node:

#interact -p parallel -n 24 -t 24:0:0

################ Data preparation ################

This walkthrough assumes that the home folder is ./ and that the sequencing files produced by the sequencer were put in the folder ./raw/:
fastq file: ./raw/PhIPseq_R1.fastq.gz
index file: ./raw/PhIPseq_I1.fastq.gz
barcode file: ./raw/Sample-Barcode.txt
The split (demultiplexed) fastq files are stored in the directory ~/phip/PhIP-seq_Analyzer/test_rawdata/
The running mode is phipseq analysis mode, so the output folders are ./test_human and ./test_virus. The last parameter should therefore be -y ./test
Before running the commands, enter the PhIP-seq_Analyzer folder.

######################### 1: Demultiplex FASTQ file #########################

Example 1: The most commonly used function demultiplexes the reads and generates sample_info.csv and variables.txt. Run the command line as below:

$ python3 ./bin/bioTreatFASTQ.py -i ./raw/PhIPseq_I1.fastq.gz -f ./raw/PhIPseq_R1.fastq.gz -b ./raw/Sample-Barcode.txt -o ./test_rawdata/ -y ./test

Once the procedure is done, several folders and files are created:
There are many fastq files in ./test_rawdata/, and their file names are determined by the barcode file.
The folders ./test_human and ./test_virus are created, each of which contains sample_info.csv and variables.txt
Note: the pipeline only accepts the absolute path of a directory or file.
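The demultiplexing idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code in bin/myGenome.py: the function name demultiplex and its arguments are hypothetical, barcodes are matched exactly, and the reads file and index file are assumed to list the same records in the same order.

```python
import itertools
from collections import defaultdict


def demultiplex(read_lines, index_lines, barcode_to_sample):
    """Group 4-line FASTQ records by the barcode in the matching index read.

    read_lines / index_lines: iterables of text lines from the two files.
    barcode_to_sample: dict mapping barcode sequence -> sample name.
    Returns a dict mapping sample name -> list of 4-line records.
    """
    per_sample = defaultdict(list)
    read_iter, index_iter = iter(read_lines), iter(index_lines)
    # Repeating the same two iterators four times makes zip_longest
    # yield one full record from each file per step.
    for r1, i1, r2, i2, r3, i3, r4, i4 in itertools.zip_longest(
        *[read_iter, index_iter] * 4
    ):
        if r4 is None or i4 is None:
            break  # truncated record or files of unequal length
        sample = barcode_to_sample.get(i2.strip())
        if sample is None:
            continue  # barcode not listed in the barcode file
        record = tuple(line.rstrip("\n") for line in (r1, r2, r3, r4))
        per_sample[sample].append(record)
    return dict(per_sample)


# Made-up example data: two reads and their index reads.
reads = ["@r1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
         "@r2\n", "TTTTGGGG\n", "+\n", "IIIIIIII\n"]
index = ["@r1\n", "AAAC\n", "+\n", "IIII\n",
         "@r2\n", "GGGT\n", "+\n", "IIII\n"]
barcodes = {"AAAC": "sample_A", "GGGT": "sample_B"}
result = demultiplex(reads, index, barcodes)
```

In this sketch each sample name from the barcode file ends up with its own list of records, which corresponds to one fastq file per sample in ./test_rawdata/.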

Example 2: Trim nucleotides when demultiplexing. Let's say the sequencing run has 50 cycles (50nt reads):
trim 10nt from the 3-end, i.e. keep the first 40nt: -r 40
remove 10nt from the 5-end and keep the rest: -t 10
trim 5nt from the 5-end, keep 40nt and discard the rest: -t 5 -r 35
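A minimal sketch of the two trimming directions on a single read, assuming plain string slicing. The names trim_read, trim5 and keep are hypothetical and not part of the pipeline. Only the single-option cases (-r 40 and -t 10) are mirrored here; when both arguments are given, this sketch trims the 5-end first and then truncates, which is an assumption and may differ from how the pipeline combines -t with -r.

```python
def trim_read(seq, qual, trim5=0, keep=None):
    """Remove trim5 nt from the 5-end, then keep at most `keep` nt.

    The quality string is sliced the same way so that both stay aligned.
    """
    seq, qual = seq[trim5:], qual[trim5:]
    if keep is not None:
        seq, qual = seq[:keep], qual[:keep]
    return seq, qual


# A made-up 50nt read: 10 A's followed by 40 C's.
read = "A" * 10 + "C" * 40
quality = "I" * 50

first_40, _ = trim_read(read, quality, keep=40)    # like -r 40
last_40, _ = trim_read(read, quality, trim5=10)    # like -t 10
```

Here first_40 keeps the 5-end of the read and drops the last 10nt, while last_40 drops the first 10nt and keeps the rest.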

Example 3: The reads exported to FASTQ by the sequencer should all have the same length. In some cases they do not, because no quality filtering was applied. In that case we can specify -l 100nt. That option discards all reads shorter than 100nt when demultiplexing.
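The length filter can be illustrated as follows. This is a sketch only: filter_by_length is a hypothetical helper, and records are assumed to be 4-tuples of (header, sequence, separator, quality).

```python
def filter_by_length(records, min_length):
    """Yield only the FASTQ records whose sequence has at least min_length nt."""
    for header, seq, plus, qual in records:
        if len(seq) >= min_length:
            yield header, seq, plus, qual


# Made-up records: one full-length read and one short read.
records = [
    ("@full", "A" * 100, "+", "I" * 100),
    ("@short", "A" * 80, "+", "I" * 80),
]
kept = list(filter_by_length(records, 100))
```

Only the record named @full remains in kept, because the other read is shorter than 100nt.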

Example 4: The demultiplexing step can be skipped to generate sample_info.csv and variables.txt directly. Here the *.fastq files are already stored in ./test_rawdata/

$ python3 ./bin/bioTreatFASTQ.py -o ./test_rawdata/ -y ./test

Example 5: Skip the demultiplexing step and only trim the fastq reads. Here all fastq files are in ./test_rawdata/, and the trimmed fastq files (40nt removed from the 3-end of each 100nt read) are saved into ./trim_rawdata/

$ python3 ./bin/bioTreatFASTQ.py -r 60 -x ./test_rawdata/ -o ./trim_rawdata/ -y ./test

Example 6: The default reference peptide libraries are human and virus. Two additional peptide libraries are available: allergome (allergic peptides) and PE (public epitopes). Select the libraries with the -c option, for example:
-c human,virus
-c virus,allergome,PE

################### 2: phipseq analysis ###################

Run the command line as below:

#human library
$ python3 ./bin/bioPHIPseq.py ./test_human/variables.txt
#virus library
$ python3 ./bin/bioPHIPseq.py ./test_virus/variables.txt

################### Requirements ###################

The barcode file:

The barcode file should be a *.txt file separated by tabs. The first and second columns should be barcode sequences and sample names, respectively.
In sample names, avoid the following characters: slash (/ or \), asterisk (*), at sign (@), any brackets, and white space. The characters dash (-), underscore (_), and dot (.) are acceptable.
No blank line is allowed.
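The rules above can be checked before running the pipeline. The following is a sketch, not part of the pipeline: check_barcode_lines is a hypothetical helper, and treating parentheses, square brackets, braces and angle brackets all as "brackets" is an assumption.

```python
import re

# Characters to avoid in sample names: slashes, asterisk, at sign,
# brackets (assumed to mean () [] {} <>) and white space.
FORBIDDEN = re.compile(r"[/\\*@()\[\]{}<>\s]")


def check_barcode_lines(lines):
    """Return a list of (line_number, message) problems; empty if none."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            problems.append((number, "expected barcode<TAB>sample name"))
            continue
        barcode, sample = fields[0], fields[1]
        if not barcode:
            problems.append((number, "empty barcode"))
        if not sample or FORBIDDEN.search(sample):
            problems.append((number, "invalid sample name: %r" % sample))
    return problems


# Made-up barcode lines: the first two are valid, the rest are not.
good = ["AAAC\tsample_A\n", "GGGT\tsample-B.1\n"]
bad = ["AAAC\tbad name\n", "\n", "GGGT\n", "TTTA\tx/y\n"]
good_problems = check_barcode_lines(good)
bad_problems = check_barcode_lines(bad)
```

To check a real file, the lines could be read with open("./raw/Sample-Barcode.txt") and passed to the same function.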

The reads file and index file

FASTQ format
The two files must match each other (same reads in the same order) and must be supplied together.
Compressed files (*.gz) are supported.

Running environments

Linux
Python 3.4.0 or above
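A small guard like the one below could be used to confirm the interpreter version before starting a run. It is a sketch only; meets_minimum is a hypothetical helper and is not part of the pipeline.

```python
import sys


def meets_minimum(version_info, minimum=(3, 4, 0)):
    """Return True if the given version tuple is at least `minimum`."""
    return tuple(version_info[:3]) >= minimum


if not meets_minimum(sys.version_info):
    sys.exit("Python 3.4.0 or above is required")
```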

###################### ERROR Handling ######################

ERROR 1: Mac OS X: ValueError: unknown locale: UTF-8 in Python

Resolution: If you hit this error on Mac OS X, a quick fix is to add these lines to your ~/.bash_profile:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
and then reload bash_profile:
# source ~/.bash_profile

ERROR 2:
Traceback (most recent call last):
  File "./bin/bioTreatFASTQ.py", line 141, in <module>
    myGenome.genome(par['fq_file']).demultiplex_fq(par)
  File "/home/yuan/phip/PhIP-Seq_Analyzer/bin/myGenome.py", line 222, in demultiplex_fq
    for L1,La, L2,Lb, L3,Lc, L4,Ld in itertools.zip_longest(*[F1,F2]*4):
AttributeError: 'module' object has no attribute 'zip_longest'

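Resolution: run the script with python3 instead of python2. itertools.zip_longest exists only in Python 3; Python 2 names the same function izip_longest.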
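The snippet below illustrates why the attribute is missing under Python 2 and what the zip_longest idiom in the traceback does. It is an illustration with made-up data; the pipeline itself targets Python 3, so the fallback import is shown only for explanation.

```python
try:
    from itertools import zip_longest  # Python 3 name
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2 name

lines = iter(["@r1", "ACGT", "+", "IIII", "@r2", "TTGG", "+", "IIII"])
# Repeating one iterator four times groups consecutive lines into
# 4-line FASTQ records.
records = list(zip_longest(*[lines] * 4))
```

Each element of records is one complete FASTQ record: header, sequence, separator and quality line.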

#end

