Description:
I encountered multiple errors while running the Mikado serialization process using the following command:
singularity exec --cleanenv ../../mikado_2.3.2.sandbox mikado serialise --json-conf mikado_2.3.2_custom.conf --xml mikado_prepared.blast.tsv --orfs mikado_prepared.fasta.transdecoder.bed --junctions portcullis_filtered.pass.junctions.bed
The errors captured in the SLURM log file are as follows:
Process Preparer-44:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.10/site-packages/Mikado/serializers/blast_serializer/tabular_utils.py", line 387, in run
    curr_hit, curr_hsps = prep_hit(key, rows)
  File "/usr/local/lib/python3.10/site-packages/Mikado/serializers/blast_serializer/tabular_utils.py", line 247, in prepare_tab_hit
    hit_dict["target_start"] = int(t_aligned.min())
  File "/usr/local/lib/python3.10/site-packages/numpy/core/_methods.py", line 44, in _amin
    return umr_minimum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation minimum which has no identity
...
Process Preparer-45:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.10/site-packages/Mikado/serializers/blast_serializer/tabular_utils.py", line 387, in run
    curr_hit, curr_hsps = prep_hit(key, rows)
  File "/usr/local/lib/python3.10/site-packages/Mikado/serializers/blast_serializer/tabular_utils.py", line 250, in prepare_tab_hit
    raise ValueError("Invalid target end point: {}, {}".format(hit_dict["target_end"], sends))
ValueError: Invalid target end point: 202, (449,)
...
Scoring File Used (YAML):
requirements:
  expression:
    - cdna_length and ((exon_num.multi and verified_introns_num and min_intron_length and max_intron_length) or (exon_num.mono and combined_cds_length))
  parameters:
    cdna_length: {operator: ge, value: 300}
    exon_num.multi: {operator: ge, value: 2}
    verified_introns_num: {operator: gt, value: 0}
    min_intron_length: {operator: ge, value: 5}
    max_intron_length: {operator: le, value: 2000}
    exon_num.mono: {operator: eq, value: 1}
    combined_cds_length: {operator: gt, value: 0}
scoring:
  snowy_blast_score: {rescaling: max}
  is_complete: {rescaling: target, value: true}
  has_start_codon: {rescaling: target, value: true}
  has_stop_codon: {rescaling: target, value: true}
  number_internal_orfs: {rescaling: target, value: 1}
  cds_not_maximal: {rescaling: min}
  cds_not_maximal_fraction: {rescaling: min}
  selected_cds_fraction: {rescaling: target, value: 0.7}
  selected_cds_length: {rescaling: max}
  selected_cds_intron_fraction: {rescaling: max}
  selected_cds_intron_fraction: {rescaling: max}
  cdna_length: {rescaling: max}
  exon_num: {rescaling: max, filter: {operator: ge, value: 3}}
  five_utr_num: {rescaling: target, value: 2, filter: {operator: lt, value: 4}}
  five_utr_length: {rescaling: target, value: 100, filter: {operator: le, value: 2500}}
  three_utr_num: {rescaling: target, value: 1, filter: {operator: lt, value: 3}}
  three_utr_length: {rescaling: target, value: 200, filter: {operator: lt, value: 2500}}
  proportion_verified_introns_inlocus: {rescaling: max}
  non_verified_introns_num: {rescaling: min}
  end_distance_from_junction: {rescaling: min, filter: {operator: lt, value: 55}}
as_requirements:
  expression: [cdna_length and three_utr_length and five_utr_length and utr_length and suspicious_splicing]
  parameters:
    cdna_length: {operator: ge, value: 200}
    utr_length: {operator: le, value: 2500}
    five_utr_length: {operator: le, value: 2500}
    three_utr_length: {operator: le, value: 2500}
    suspicious_splicing: {operator: ne, value: true}
not_fragmentary:
  expression: [((exon_num.multi and (cdna_length.multi or selected_cds_length.multi)), or, (exon_num.mono and ((snowy_blast_score and selected_cds_length.zero) or selected_cds_length.mono)))]
  parameters:
    selected_cds_length.zero: {operator: gt, value: 300} # 600
    exon_num.multi: {operator: gt, value: 2}
    cdna_length.multi: {operator: ge, value: 300}
    selected_cds_length.multi: {operator: gt, value: 250}
    exon_num.mono: {operator: eq, value: 1}
    snowy_blast_score: {operator: gt, value: 0} # 0.3
    selected_cds_length.mono: {operator: gt, value: 600} # 900
    exon_num.mono: {operator: le, value: 2}
Error Explanation:
ValueError: zero-size array to reduction operation minimum which has no identity:
This error occurs when attempting to find the minimum value of an empty array. It suggests that the input array t_aligned is empty during the execution of int(t_aligned.min()) in prepare_tab_hit.
ValueError: Invalid target end point:
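This error is raised explicitly by prepare_tab_hit. The message prints hit_dict["target_end"] (202) followed by sends ((449,)), so the target end computed for the hit does not agree with the end coordinate stored in sends, which presumably comes from the BLAST rows for that hit.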
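To illustrate the first error outside Mikado, here is a minimal sketch using plain NumPy. It is not Mikado's code: `safe_min` is a hypothetical helper, shown only to make clear that calling `.min()` on an empty array raises exactly this message and that checking the array size first avoids it.

```python
import numpy as np


def safe_min(values):
    """Return the minimum as an int, or None when there is nothing to reduce.

    Hypothetical helper for illustration; not part of Mikado.
    """
    arr = np.asarray(values)
    if arr.size == 0:
        # An empty array has no minimum, so report that instead of raising.
        return None
    return int(arr.min())


# Reproduce the error from the traceback with an empty array.
try:
    np.array([], dtype=int).min()
except ValueError as exc:
    message = str(exc)
    print(message)

print(safe_min([]))         # nothing to reduce
print(safe_min([5, 3, 9]))  # ordinary case
```

If the empty array reflects a hit with no aligned target positions, the interesting question is which input rows produced it, rather than how to silence the exception.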
Questions:
What could be the underlying cause for these arrays being empty or target end points being invalid?
Are there any specific input checks or preprocessing steps that I might be missing to prevent these errors?
Is it safe to ignore these errors, or do they indicate a critical problem that needs addressing?
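On the question of input checks, one thing I could do is scan the tabular BLAST file for malformed rows before running serialise. The sketch below assumes the default BLAST+ `-outfmt 6` column order, which is an assumption and may not match the column list Mikado expects for tabular input; `COLUMNS` and `check_blast_rows` are hypothetical names, not Mikado functions.

```python
# Assumed column order: default BLAST+ "-outfmt 6". Adjust if your file differs.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
COORD_FIELDS = ("qstart", "qend", "sstart", "send")


def check_blast_rows(lines, columns=COLUMNS):
    """Return a list of (line_number, reason) for rows that look malformed."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        row = line.split("\t")
        if len(row) < len(columns):
            problems.append(
                (number, f"expected at least {len(columns)} columns, found {len(row)}")
            )
            continue
        record = dict(zip(columns, row))
        for field in COORD_FIELDS:
            try:
                value = int(record[field])
            except ValueError:
                problems.append((number, f"{field} is not an integer: {record[field]!r}"))
                break
            # BLAST coordinates are 1-based, so anything below 1 is suspect.
            if value < 1:
                problems.append((number, f"{field} is below 1: {value}"))
                break
    return problems


example = [
    "q1\ts1\t99.0\t100\t0\t0\t1\t100\t5\t104\t1e-50\t200",
    "q2\ts1\t99.0\t100",
    "q3\ts2\t90.0\t50\t1\t0\t1\t50\t0\t49\t1e-5\t80",
    "# comment line",
]
for number, reason in check_blast_rows(example):
    print(f"line {number}: {reason}")
```

The same function accepts an open file handle, so it could be run directly over mikado_prepared.blast.tsv. It only catches structural problems such as truncated rows or bad coordinates; it does not replicate whatever consistency checks prepare_tab_hit performs.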
Additional Information:
The serialized.log file contains no errors; they appear only in the SLURM log file when the job is submitted with sbatch.
I used docker pull quay.io/biocontainers/mikado:2.3.2--py37h9c5868f_0 to build the Singularity image.
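Regarding the errors appearing only in the SLURM log: a plausible explanation, which I have not confirmed against Mikado's source, is that the exceptions escape inside `multiprocessing` worker processes (the tracebacks pass through `_bootstrap`). Python writes such uncaught exceptions to the worker's stderr, which SLURM captures, and they never pass through a logging handler. The sketch below demonstrates that behaviour with a hypothetical worker; it uses the `fork` start method, so it assumes a POSIX system.

```python
import multiprocessing


def failing_worker():
    # Hypothetical stand-in for a worker that hits a bad record.
    raise ValueError("simulated failure inside a worker process")


def run_failing_worker():
    """Start a worker that raises, wait for it, and return its exit code."""
    ctx = multiprocessing.get_context("fork")  # POSIX only
    proc = ctx.Process(target=failing_worker, name="Preparer-demo")
    proc.start()
    proc.join()
    # The traceback has already been printed to stderr by the child;
    # the parent only sees a non-zero exit code.
    return proc.exitcode


if __name__ == "__main__":
    print("worker exit code:", run_failing_worker())
```

The parent process keeps running after the worker dies, which would also be consistent with the main log showing nothing unusual.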
Any guidance on how to resolve or debug these issues would be greatly appreciated.
Thank you for your assistance.