Hello!
First of all, thank you for developing NeuralTE.
I tried to train NeuralTE on my own TE dataset, but ran into the same problem in both the data preprocessing stage and the training stage:
NeuralTE only generated a .ref file containing a single category, "Unknown".
My custom categories (such as LTR/Copia) were not recognized at all.
python /root/model/NT/NeuralTE/utils/preprocess_repbase.py \
  --repbase_dir /root/model/NT/data/train.fa \
  --out_dir /root/model
This is the output of the preprocessing step:
non_TE_count: 0
pre-processed Repbase database sequence size: 1000, total species num: 1
And this is the output when training directly:
Warning: The input TE library contains unknown superfamily labels,
total size = 216375, which saved at /root/model/NT/re/unconverted_TE.fa
(0, 305, 1) (0,)
Running time of DataProcessor: 2.107806 s
The two input formats I tried
I tried two FASTA header formats, but neither was recognized correctly:
test-0 LTR/Copia Unknown
ATGCGTACGTTAGC...
test-0#LTR/Copia
ATGCGTACGTTAGC...
Regardless of which format I use, the program labels every sequence as Unknown.
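To rule out a problem in the input file itself, the labels in the two header layouts above can be tallied independently of NeuralTE. This is only a minimal sketch and not NeuralTE's own parser: `extract_label` and `count_labels` are hypothetical helper names, and it assumes that header lines start with `>` (as in standard FASTA) and that the fields in the first layout are separated by whitespace.

```python
from collections import Counter


def extract_label(header: str) -> str:
    """Return the category field of one FASTA header line (hypothetical helper).

    Handles the two layouts from this post:
      1. name <whitespace> label <whitespace> species
      2. name#label
    """
    text = header.lstrip(">").strip()
    if not text:
        return "Unknown"
    first, *rest = text.split()
    if "#" in first:
        # Layout 2: everything after the first '#' is the label.
        return first.split("#", 1)[1] or "Unknown"
    if rest:
        # Layout 1: the second whitespace-separated field is the label.
        return rest[0]
    return "Unknown"


def count_labels(lines):
    """Count labels over all header lines (assumed to start with '>')."""
    return Counter(extract_label(line) for line in lines if line.startswith(">"))


if __name__ == "__main__":
    sample = [
        ">test-0\tLTR/Copia\tUnknown",
        "ATGCGTACGTTAGC",
        ">test-0#LTR/Copia",
        "ATGCGTACGTTAGC",
    ]
    print(count_labels(sample))
```

If a tally like this shows the expected categories for the real file (for example by passing `open("/root/model/NT/data/train.fa")` to `count_labels`), then the labels are present in the input and are being dropped somewhere inside NeuralTE.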
How can I make NeuralTE train a model with custom category names?
Which part of the code or configuration would I need to modify?
Thank you!