Status: Open
Labels: enhancement (New feature or request)
Description
I suggest we lower the default --target_mono_exonic_pct from 20% to 5%.
For species with smaller gene sets, finding mono-exonic models for 20% of the 1,200 train and test genes (--train_max 1000 plus --test_max 200) won't be possible. This was the case for a recent fungal genome:
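The arithmetic behind the failure is easy to check. A minimal sketch, assuming (as the "Requested minimum number of mono-exonic models: 240" line in the log suggests) that the percentage is applied to train_max + test_max; `required_mono_exonic` is a hypothetical helper, not a function in REAT, and rounding up is an assumption about how the script handles fractions:

```python
def required_mono_exonic(target_pct: int, train_max: int, test_max: int) -> int:
    """Hypothetical helper (not part of REAT): number of mono-exonic models
    requested when target_pct is applied to the combined train + test size."""
    total = train_max + test_max
    # Integer ceiling division; avoids floating-point rounding surprises
    return -(-target_pct * total // 100)


# Values from the failing run: --train_max 1000 --test_max 200
print(required_mono_exonic(20, 1000, 200))  # 240
print(required_mono_exonic(5, 1000, 200))   # 60
```

Even at 5%, the 60 models requested would still exceed the 6 mono-exonic models available for this genome, so lowering the default reduces how often the problem occurs but does not remove it.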
```text
REAT Failed, the following file might contain information with the reasons behind the failure
/ei/.project-scratch/e/e701c73c-45b1-4784-9385-6c69cf3272cf/CB-GENANNO-508_ERGA_Spongipellis_delectans/Analysis/reat-dev-issue25/Prediction/cromwell-executions/ei_prediction/d18b476e-faa4-4c2f-98a7-b5797c30ddde/call-SelectAugustusTestAndTrain/execution/stderr
+ generate_augustus_test_and_train /ei/.project-scratch/e/e701c73c-45b1-4784-9385-6c69cf3272cf/CB-GENANNO-508_ERGA_Spongipellis_delectans/Analysis/reat-dev-issue25/Prediction/cromwell-executions/ei_prediction/d18b476e-faa4-4c2f-98a7-b5797c30ddde/call-SelectAugustusTestAndTrain/inputs/-1046222641/with_utr.extra.gff --train_min 400 --train_max 1000 --test_max 200 --target_mono_exonic_pct 20
+ gff2gbSmallDNA.pl test.gff /ei/.project-scratch/e/e701c73c-45b1-4784-9385-6c69cf3272cf/CB-GENANNO-508_ERGA_Spongipellis_delectans/Analysis/reat-dev-issue25/Prediction/cromwell-executions/ei_prediction/d18b476e-faa4-4c2f-98a7-b5797c30ddde/call-SelectAugustusTestAndTrain/inputs/1001504700/gfSpoDele1_1.curated_primary.softmasked.fa 200 test.gb
Couldn't open test.gff.
```
On inspection, we simply don't have 240 single-exon genes. The generate_augustus_test_and_train script produces no output and writes nothing to an error log, so the next step (gff2gbSmallDNA.pl) fails on the missing test.gff and it is not clear to the user what actually caused the error.
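One way to make this failure transparent is a pre-flight check that stops with an explicit reason. A minimal sketch; `check_mono_exonic_supply` and its message are hypothetical and not part of generate_augustus_test_and_train:

```python
import sys


def check_mono_exonic_supply(requested: int, available: int) -> None:
    """Hypothetical pre-flight check (not in REAT): exit with an explicit
    reason when the requested number of mono-exonic models cannot be met."""
    if available < requested:
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit(
            f"ERROR: requested {requested} mono-exonic models but only "
            f"{available} are available; lower --target_mono_exonic_pct "
            "or supply more gene models."
        )
```

With the numbers from this run (240 requested, 6 available), a check like this would stop the task with a readable message instead of leaving gff2gbSmallDNA.pl to report "Couldn't open test.gff."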
Note that the -f (force) option does not override the 20% target_mono_exonic_pct requirement, although it does at least produce an error:
```text
generate_augustus_test_and_train /ei/.project-scratch/e/e701c73c-45b1-4784-9385-6c69cf3272cf/CB-GENANNO-508_ERGA_Spongipellis_delectans/Analysis/reat-dev-issue25/Prediction/cromwell-executions/ei_prediction/2482d9fe-d7e9-42dc-bbaa-8259e9e25fb8/call-SelectAugustusTestAndTrain/inputs/-578101069/with_utr.extra.gff --train_min 400 --train_max 1000 --test_max 200 --target_mono_exonic_pct 20 -f
Requested minimum number of mono-exonic models: 240
Real possible minimum number of mono-exonic models: 6
Number of train models: 32
Number of mono-exonic models in train set: 6
Traceback (most recent call last):
  File "/ei/software/cb/reat/dev-issue32/x86_64/bin/generate_augustus_test_and_train", line 138, in <module>
    main()
  File "/ei/software/cb/reat/dev-issue32/x86_64/bin/generate_augustus_test_and_train", line 101, in main
    test_models = random.sample(train_models, args.test_max)
  File "/ei/software/cb/reat/dev-issue32/x86_64/lib/python3.9/random.py", line 449, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative
```
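The traceback comes from asking random.sample for 200 items (--test_max) from a population of only 32 train models; random.sample raises ValueError whenever k exceeds the population size. A minimal sketch of a guard that clamps the sample size and warns instead; `sample_test_models` is a hypothetical name, and it uses typing.Optional so it stays compatible with the Python 3.9 shown in the traceback:

```python
import random
import sys
from typing import List, Optional


def sample_test_models(candidates: List, test_max: int,
                       seed: Optional[int] = None) -> List:
    """Hypothetical guard (not in REAT): never ask random.sample for more
    items than exist, which is what raises the ValueError above."""
    rng = random.Random(seed)
    k = min(test_max, len(candidates))
    if k < test_max:
        # Make the shrinkage visible rather than silent
        print(f"WARNING: only {len(candidates)} models available; "
              f"test set reduced from {test_max} to {k}", file=sys.stderr)
    return rng.sample(candidates, k)
```

Whether clamping is preferable to failing with a clear message is a design choice; a test set of a few dozen models may be too small to be useful, in which case an explicit error is the safer default.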
The intent was for target_mono_exonic_pct to set a maximum percentage of single-exon genes, but as coded it works as a target (the log above reports it as a "requested minimum"). Given that, I would simply lower the default to 5%.
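For comparison, the originally intended semantics (a ceiling rather than a target) could look like the sketch below. This is an illustration under assumptions, not REAT's selection logic: models are represented hypothetically as (id, exon count) tuples, `select_with_mono_exonic_cap` is a hypothetical name, and the train/test split is left out.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical representation: (model id, number of exons)
Model = Tuple[str, int]


def select_with_mono_exonic_cap(models: List[Model], n_total: int,
                                max_mono_pct: int,
                                seed: Optional[int] = None) -> List[Model]:
    """Pick up to n_total models with at most max_mono_pct % mono-exonic.
    Mono-exonic models are capped, not required: if few exist, the
    shortfall is filled with multi-exonic models instead of failing."""
    rng = random.Random(seed)
    mono = [m for m in models if m[1] == 1]
    multi = [m for m in models if m[1] > 1]
    mono_cap = max_mono_pct * n_total // 100
    chosen_mono = rng.sample(mono, min(mono_cap, len(mono)))
    remaining = n_total - len(chosen_mono)
    chosen_multi = rng.sample(multi, min(remaining, len(multi)))
    return chosen_mono + chosen_multi
```

Under a cap, a genome with only 6 mono-exonic models simply contributes those 6, and the percentage only matters when mono-exonic models are plentiful.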