
The data processing was unsuccessful #2

@Ljs1849351839

Hello!
First of all, thank you for developing NeuralTE.
I attempted to train a model on my own TE dataset with NeuralTE, but I encountered the same problem in both the data preprocessing stage and the training stage:
NeuralTE only generated a .ref file containing a single category, "Unknown".
My custom categories (such as LTR/Copia) were not recognized at all.
For preprocessing, I ran:
python /root/model/NT/NeuralTE/utils/preprocess_repbase.py \
  --repbase_dir /root/model/NT/data/train.fa \
  --out_dir /root/model
This is the output:
non_TE_count: 0
pre-processed Repbase database sequence size: 1000, total species num: 1

When training directly instead, I get:
Warning: The input TE library contains unknown superfamily labels,
total size = 216375, which saved at /root/model/NT/re/unconverted_TE.fa
(0, 305, 1) (0,)
Running time of DataProcessor: 2.107806 s
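To see which labels in a library would trigger the "unknown superfamily labels" warning, a minimal sketch like the following tallies the label field of each header. It assumes Repbase-style headers of the form `>name<TAB>label<TAB>species`; `ALLOWED_LABELS` is a hypothetical placeholder, not NeuralTE's actual label list, and this is not NeuralTE's own filtering code.

```python
from collections import Counter

# Hypothetical set of accepted labels; replace with the label list
# your NeuralTE setup actually expects (not taken from NeuralTE).
ALLOWED_LABELS = {"LTR/Copia", "LTR/Gypsy", "LINE/L1", "DNA/hAT"}


def header_labels(fasta_text):
    """Yield the label field of each FASTA header.

    Assumes Repbase-style headers: '>name<TAB>label<TAB>species'.
    A header with no tab yields None (no label could be found).
    """
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split("\t")
            yield fields[1] if len(fields) > 1 else None


def report_unknown(fasta_text, allowed=ALLOWED_LABELS):
    """Return (all label counts, counts of labels outside `allowed`)."""
    counts = Counter(header_labels(fasta_text))
    unknown = {label: n for label, n in counts.items() if label not in allowed}
    return counts, unknown


if __name__ == "__main__":
    example = ">test-0\tLTR/Copia\tUnknown\nATGC\n>test-1 LTR/Copia Unknown\nATGC\n"
    counts, unknown = report_unknown(example)
    print("label counts:", dict(counts))
    print("not in allowed set:", unknown)
```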
The two input formats I tried

I tried two FASTA header formats, but neither was recognized correctly:

test-0 LTR/Copia Unknown
ATGCGTACGTTAGC...
test-0#LTR/Copia
ATGCGTACGTTAGC...
With either format, the program outputs Unknown.
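The sketch below illustrates why the header delimiter matters. It assumes two conventions: Repbase-style tab-separated fields (`name<TAB>label<TAB>species`) and the `name#Class/Family` form used in RepeatModeler libraries. `parse_header` is a hypothetical helper, not part of NeuralTE, and which convention NeuralTE actually expects is not confirmed here; the sketch only shows that a header whose fields are separated by spaces (as `test-0 LTR/Copia Unknown` appears above) yields no label under either convention.

```python
def parse_header(header):
    """Split a FASTA header into (name, label) under two assumed conventions.

    - Tab-separated: 'name<TAB>label<TAB>species' (Repbase-style, assumed)
    - Hash-separated: 'name#Class/Family' (RepeatModeler-style, assumed)
    Returns (text, None) when neither delimiter is present.
    """
    text = header[1:] if header.startswith(">") else header
    if "\t" in text:
        fields = text.split("\t")
        return fields[0], fields[1]
    if "#" in text:
        name, label = text.split("#", 1)
        return name, label
    return text, None


if __name__ == "__main__":
    for h in (">test-0\tLTR/Copia\tUnknown", ">test-0#LTR/Copia", ">test-0 LTR/Copia Unknown"):
        print(repr(h), "->", parse_header(h))
```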

How can I make NeuralTE train a model with custom category names?
What do I need to modify?
Thank you!
