The number of Pseudogenes generated #20

Hello, this pipeline is well documented and simple to use, unlike others I have tried so far. However, after running it with the "overlap=0" parameter, the "test_prot25_tblastn6_parsed.4col.true.RMfilt.hiConf.cdnm" file in the results folder, which is the list of pseudogenes, appears inflated: its number of entries equals the number of "Read gene pairs" reported in step 4 of the pipeline. For example (#### Step 4. Get Smith-Waterman alignments with fasta

Find stop and framshifts
here
Program  : tfasty36
Pair list: Arabidopsis1_lyrata_tblastn6_parsed_G456.PE_I456.PS1.pairs
Fasta1   : Arabidopsis1_lyrata_pep.fasta
Fasta2   : Arabidopsis1_lyrata_tblastn6_parsed_G456.PE_I456.PS1.subj_coord.fa
Fasta dir: /home/hesbon/fasta36/bin
Working d: tmp_1747293931.697676/
Flags    : -m 3 -q
E thres  : 1.0
Read gene pairs...
 36501 pairs
Read fasta files...
Do sw...
). The pseudogene IDs run up to the same number (Ps036501).
I find this odd, since I would like to know the actual number of pseudogenes in my sample.
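
The comparison described above can be reproduced with a small script along these lines. This is only a sketch: it assumes both the pair list and the pseudogene list hold one record per line and that lines starting with "#" are comments, which is not confirmed for this pipeline's file formats. The helper names count_records and compare_counts are hypothetical and are not part of the pipeline.

```python
def count_records(path):
    """Count non-empty lines that do not start with '#'.

    Assumption: one record per line; '#' marks a comment line.
    """
    with open(path) as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.lstrip().startswith("#")
        )


def compare_counts(pairs_path, pseudogene_path):
    """Return (pair count, pseudogene count, whether they are equal)."""
    n_pairs = count_records(pairs_path)
    n_pseudo = count_records(pseudogene_path)
    return n_pairs, n_pseudo, n_pairs == n_pseudo
```

Calling compare_counts with the paths to the ".pairs" file and the ".hiConf.cdnm" file from the same run returns both counts and whether they match. Equal counts would suggest that every aligned pair is being reported as a separate pseudogene.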

When I change the parameter to "overlap=1", I get a significantly reduced number of pseudogenes in the "test_prot25_tblastn6_parsed.4col.true.RMfilt.hiConf.cdnm" file, but the number is still inflated. For instance, when I ran this pipeline on different Arabidopsis ecotypes, some yielded 55000 pseudogenes with "overlap=0", reduced to 12000 with "overlap=1". That is still high, considering that other tools reported in the literature estimate around 3000-4000 pseudogenes in Arabidopsis thaliana. What am I missing?

Kindly reach out at your earliest convenience.

Regards, 
Opypyy (opiyohhesbon@gmail.com)
