
The number of Pseudogenes generated #20

@Opypyy


Hello, this pipeline is well documented and simple to work with, unlike others I have used so far. However, after running it with the "overlap=0" parameter, the "test_prot25_tblastn6_parsed.4col.true.RMfilt.hiConf.cdnm" file in the results folder, which is the list of pseudogenes, appears inflated: the number of pseudogenes equals the number of "Read gene pairs" reported in step 4 of the pipeline. For example:

#### Step 4. Get Smith-Waterman alignments with fasta

Find stop and framshifts
here
Program : tfasty36
Pair list: Arabidopsis1_lyrata_tblastn6_parsed_G456.PE_I456.PS1.pairs
Fasta1 : Arabidopsis1_lyrata_pep.fasta
Fasta2 : Arabidopsis1_lyrata_tblastn6_parsed_G456.PE_I456.PS1.subj_coord.fa
Fasta dir: /home/hesbon/fasta36/bin
Working d: tmp_1747293931.697676/
Flags : -m 3 -q
E thres : 1.0
Read gene pairs...
36501 pairs
Read fasta files...
Do sw...

The pseudogene numbering then goes up to Ps036501, matching the 36501 pairs read.
I find this odd, since I would like to know the actual number of pseudogenes in my sample.
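
For reference, the comparison described above could be reproduced with a minimal sketch like the one below. It assumes that both the pair list and the ".cdnm" file hold one record per non-empty line, which is my assumption and not something I have confirmed from the pipeline documentation. The function names `count_records` and `compare_counts` are hypothetical helpers, not part of the pipeline.

```python
import sys


def count_records(path):
    """Count non-empty lines in a file.

    Assumption: each non-empty line is one record (one gene pair or one
    pseudogene entry). Adjust if the real file format differs.
    """
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def compare_counts(pairs_file, pseudogene_file):
    """Return (number of pairs, number of pseudogene entries, whether equal)."""
    n_pairs = count_records(pairs_file)
    n_pseudo = count_records(pseudogene_file)
    return n_pairs, n_pseudo, n_pairs == n_pseudo


if __name__ == "__main__":
    # Usage: python compare_counts.py <pairs file> <hiConf.cdnm file>
    if len(sys.argv) == 3:
        pairs, pseudo, same = compare_counts(sys.argv[1], sys.argv[2])
        print(f"pairs: {pairs}  pseudogene entries: {pseudo}  equal: {same}")
```

If the two counts are equal, every candidate pair from step 4 appears to have been carried through to the final list without being merged or filtered.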

When I change the parameter to "overlap=1", I get a significantly smaller number of pseudogenes in the "test_prot25_tblastn6_parsed.4col.true.RMfilt.hiConf.cdnm" file, but it still looks inflated. For instance, when running this pipeline on different Arabidopsis ecotypes, some yielded 55000 pseudogenes with "overlap=0" and 12000 with "overlap=1". That is high, considering that estimates from other tools in the literature are around 3000-4000 pseudogenes in Arabidopsis thaliana. What am I missing?

Kindly reach out at your earliest convenience.

Regards,
Opypyy (opiyohhesbon@gmail.com)
