Hello, this pipeline is well documented and simple to use, unlike others I have tried so far. However, after running it with the "overlap=0" parameter, the list of pseudogenes in the "test_prot25_tblastn6_parsed.4col.true.RMfilt.hiConf.cdnm" file in the results folder is inflated: the number of entries equals the number of gene pairs read in step 4 of the pipeline. For example, see the "Read gene pairs" count in this step 4 log (#### Step 4. Get Smith-Waterman alignments with fasta
Find stop and framshifts
here
Program : tfasty36
Pair list: Arabidopsis1_lyrata_tblastn6_parsed_G456.PE_I456.PS1.pairs
Fasta1 : Arabidopsis1_lyrata_pep.fasta
Fasta2 : Arabidopsis1_lyrata_tblastn6_parsed_G456.PE_I456.PS1.subj_coord.fa
Fasta dir: /home/hesbon/fasta36/bin
Working d: tmp_1747293931.697676/
Flags : -m 3 -q
E thres : 1.0
Read gene pairs...
36501 pairs
Read fasta files...
Do sw...
). The pseudogene IDs in the output file run up to the same number (Ps036501).
I find this odd, and I would like to know the actual number of pseudogenes in my sample.
When I change the parameter to "overlap=1", the number of pseudogenes in the "test_prot25_tblastn6_parsed.4col.true.RMfilt.hiConf.cdnm" file drops significantly but is still inflated. For instance, when running this pipeline on different Arabidopsis ecotypes, some yielded 55000 pseudogenes with "overlap=0" and 12000 with "overlap=1". That is still high, considering that estimates from other tools in the literature are around 3000-4000 pseudogenes in Arabidopsis thaliana. What am I missing?
Kindly reach out at your earliest convenience.
Regards,
Opypyy (opiyohhesbon@gmail.com)