
ValueError and Invalid Target End Point Errors during Mikado Serialization #458

@joseph144155

Description:

I encountered multiple errors while running the Mikado serialization process using the following command:

singularity exec --cleanenv ../../mikado_2.3.2.sandbox mikado serialise --json-conf mikado_2.3.2_custom.conf --xml mikado_prepared.blast.tsv --orfs mikado_prepared.fasta.transdecoder.bed --junctions portcullis_filtered.pass.junctions.bed

The errors captured in the SLURM log file are as follows:

Process Preparer-44:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.10/site-packages/Mikado/serializers/blast_serializer/tabular_utils.py", line 387, in run
    curr_hit, curr_hsps = prep_hit(key, rows)
  File "/usr/local/lib/python3.10/site-packages/Mikado/serializers/blast_serializer/tabular_utils.py", line 247, in prepare_tab_hit
    hit_dict["target_start"] = int(t_aligned.min())
  File "/usr/local/lib/python3.10/site-packages/numpy/core/_methods.py", line 44, in _amin
    return umr_minimum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation minimum which has no identity

...

Process Preparer-45:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.10/site-packages/Mikado/serializers/blast_serializer/tabular_utils.py", line 387, in run
    curr_hit, curr_hsps = prep_hit(key, rows)
  File "/usr/local/lib/python3.10/site-packages/Mikado/serializers/blast_serializer/tabular_utils.py", line 250, in prepare_tab_hit
    raise ValueError("Invalid target end point: {}, {}".format(hit_dict["target_end"], sends))
ValueError: Invalid target end point: 202, (449,)

...
Scoring file used (YAML):

requirements:
  expression:
    - cdna_length and ((exon_num.multi and verified_introns_num and min_intron_length and max_intron_length) or (exon_num.mono and combined_cds_length))
  parameters:
    cdna_length: {operator: ge, value: 300}
    exon_num.multi: {operator: ge, value: 2}
    verified_introns_num: {operator: gt, value: 0}
    min_intron_length: {operator: ge, value: 5}
    max_intron_length: {operator: le, value: 2000}
    exon_num.mono: {operator: eq, value: 1}
    combined_cds_length: {operator: gt, value: 0}
scoring:
  snowy_blast_score: {rescaling: max}
  is_complete: {rescaling: target, value: true}
  has_start_codon: {rescaling: target, value: true}
  has_stop_codon: {rescaling: target, value: true}
  number_internal_orfs: {rescaling: target, value: 1}
  cds_not_maximal: {rescaling: min}
  cds_not_maximal_fraction: {rescaling: min}
  selected_cds_fraction: {rescaling: target, value: 0.7}
  selected_cds_length: {rescaling: max}
  selected_cds_intron_fraction: {rescaling: max}
  selected_cds_intron_fraction: {rescaling: max}
  cdna_length: {rescaling: max}
  exon_num: {rescaling: max, filter: {operator: ge, value: 3}}
  five_utr_num: {rescaling: target, value: 2, filter: {operator: lt, value: 4}}
  five_utr_length: {rescaling: target, value: 100, filter: {operator: le, value: 2500}}
  three_utr_num: {rescaling: target, value: 1, filter: {operator: lt, value: 3}}
  three_utr_length: {rescaling: target, value: 200, filter: {operator: lt, value: 2500}}
  proportion_verified_introns_inlocus: {rescaling: max}
  non_verified_introns_num: {rescaling: min}
  end_distance_from_junction: {rescaling: min, filter: {operator: lt, value: 55}}
as_requirements:
  expression: [cdna_length and three_utr_length and five_utr_length and utr_length and suspicious_splicing]
  parameters:
    cdna_length: {operator: ge, value: 200}
    utr_length: {operator: le, value: 2500}
    five_utr_length: {operator: le, value: 2500}
    three_utr_length: {operator: le, value: 2500}
    suspicious_splicing: {operator: ne, value: true}
not_fragmentary:
  expression: [((exon_num.multi and (cdna_length.multi or selected_cds_length.multi)), or, (exon_num.mono and ((snowy_blast_score and selected_cds_length.zero) or selected_cds_length.mono)))]
  parameters:
    selected_cds_length.zero: {operator: gt, value: 300} # 600
    exon_num.multi: {operator: gt, value: 2}
    cdna_length.multi: {operator: ge, value: 300}
    selected_cds_length.multi: {operator: gt, 250}
    exon_num.mono: {operator: eq, value: 1}
    snowy_blast_score: {operator: gt, value: 0} # 0.3
    selected_cds_length.mono: {operator: gt, value: 600} # 900
    exon_num.mono: {operator: le, value: 2}

Error Explanation:

ValueError: zero-size array to reduction operation minimum which has no identity:

NumPy raises this error when min() is called on an empty array, because the minimum of zero elements is undefined. Here it means that t_aligned (presumably the aligned target positions for the hit) was empty when int(t_aligned.min()) ran in prepare_tab_hit.
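The first error can be reproduced directly in NumPy. The helper below is hypothetical (not Mikado code) and only sketches the kind of guard that avoids the crash, returning None for a hit with no aligned target positions; passing initial= to min() is another way to get a defined result for empty input.

```python
import numpy as np


def safe_target_start(t_aligned):
    """Smallest aligned target coordinate, or None when nothing was aligned.

    Hypothetical guard: `t_aligned` mirrors the array name in the traceback and
    is assumed to hold the target positions covered by the alignment.
    """
    positions = np.asarray(t_aligned, dtype=int)
    if positions.size == 0:
        return None  # avoids "zero-size array to reduction operation minimum"
    return int(positions.min())


if __name__ == "__main__":
    try:
        np.array([], dtype=int).min()  # same reduction as in the traceback
    except ValueError as err:
        print("ValueError:", err)
    print(safe_target_start([]))           # None
    print(safe_target_start([12, 5, 40]))  # 5
```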
ValueError: Invalid target end point:

Here the target end computed in prepare_tab_hit (202) does not match the end coordinate(s) passed in as sends (449), which appear to be the send (subject end) values from the tabular BLAST file, so the function raises an error.
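To illustrate how a computed target end can disagree with a reported send value, here is a hypothetical consistency check (not Mikado's code). It assumes the tabular file includes BLAST's btop column and that the target end is derived by walking the alignment; both are assumptions about the serialiser's internals. In BTOP, a number is a run of identical positions and each two-character pair is (query residue, subject residue), with "-" marking a gap, so counting the subject positions consumed from sstart gives the end the alignment implies.

```python
import re

# A BTOP string is a sequence of tokens: a run of digits (identical positions)
# or a two-character pair (query residue, subject residue) where "-" is a gap.
BTOP_TOKEN = re.compile(r"(\d+)|(..)")


def subject_positions_consumed(btop: str) -> int:
    """Count subject (target) positions covered by a BTOP alignment string."""
    consumed = 0
    for digits, pair in BTOP_TOKEN.findall(btop):
        if digits:
            consumed += int(digits)  # identical positions advance query and subject
        elif pair[1] != "-":
            consumed += 1  # mismatch or gap in query: a subject residue is present
    return consumed


def check_target_end(sstart: int, send: int, btop: str) -> tuple[bool, int]:
    """Return (is_consistent, implied_end) for one tabular BLAST row.

    Hypothetical check: the end implied by walking BTOP from sstart should equal
    the send column. Handles reverse-strand subjects where sstart > send.
    """
    consumed = subject_positions_consumed(btop)
    if sstart <= send:
        implied_end = sstart + consumed - 1
    else:
        implied_end = sstart - consumed + 1
    return implied_end == send, implied_end
```

For example, check_target_end(1, 10, "4AG5") returns (True, 10), whereas check_target_end(1, 10, "4A-5") returns (False, 9), because the gap in the subject means only nine subject positions are covered.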

Questions:

What could be the underlying cause for these arrays being empty or target end points being invalid?
Are there any specific input checks or preprocessing steps that I might be missing to prevent these errors?
Is it safe to ignore these errors, or do they indicate a critical problem that needs addressing?
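On the input-check question, one cheap pre-flight step is to confirm that every row of mikado_prepared.blast.tsv has the expected number of columns and integer coordinates before running serialise. The sketch below assumes the 14-column layout produced by -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore ppos btop", which is my understanding of what the Mikado documentation recommends for tabular input; check the docs for 2.3.2 and adjust EXPECTED_COLUMNS if the search used a different format. scan_tabular is a hypothetical helper, not a Mikado command.

```python
import csv
import sys

# Assumed layout: BLAST -outfmt "6 qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore ppos btop". Edit if yours differs.
EXPECTED_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore", "ppos", "btop",
]
INTEGER_COLUMNS = ("length", "qstart", "qend", "sstart", "send")


def scan_tabular(lines):
    """Yield (line_number, problem) for rows that do not fit the assumed layout."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for lineno, row in enumerate(reader, start=1):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != len(EXPECTED_COLUMNS):
            yield lineno, f"expected {len(EXPECTED_COLUMNS)} columns, found {len(row)}"
            continue
        fields = dict(zip(EXPECTED_COLUMNS, row))
        for name in INTEGER_COLUMNS:
            if not fields[name].isdigit():
                yield lineno, f"{name} should be a whole number, got {fields[name]!r}"
        if not fields["btop"]:
            yield lineno, "empty btop field"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], newline="") as handle:
        for lineno, problem in scan_tabular(handle):
            print(f"{sys.argv[1]}:{lineno}: {problem}")
```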
Additional Information:

The serialized.log file does not contain any of these errors; they appear only in the SLURM log when the job is submitted with sbatch.
I used docker pull quay.io/biocontainers/mikado:2.3.2--py37h9c5868f_0 to build the Singularity image.
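To gauge how widespread the failures are, the SLURM log can be summarised. Python's multiprocessing writes a "Process <name>:" header followed by the traceback when a worker's run() raises, so each such header marks a Preparer worker that exited with an exception; whether the hits it had not yet processed were lost cannot be told from the log alone. The sketch below is a hypothetical helper (the log file name is passed on the command line) that lists those workers and counts error messages with numbers collapsed, so all "Invalid target end point" lines group together.

```python
import re
import sys
from collections import Counter

# multiprocessing writes "Process <name>:" before the traceback of a crashed worker.
PROCESS_HEADER = re.compile(r"^Process (\S+):$")
# The last traceback line has the form "<ExceptionType>: <message>".
ERROR_LINE = re.compile(r"^(\w+(?:Error|Exception)): (.*)$")


def summarise_slurm_log(lines):
    """Return (crashed worker names, Counter of (exception type, message pattern))."""
    crashed = []
    signatures = Counter()
    for raw in lines:
        line = raw.strip()
        header = PROCESS_HEADER.match(line)
        if header:
            crashed.append(header.group(1))
            continue
        error = ERROR_LINE.match(line)
        if error:
            kind, message = error.groups()
            # Replace numbers so messages differing only in coordinates are grouped.
            signatures[(kind, re.sub(r"\d+", "N", message))] += 1
    return crashed, signatures


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:
        workers, counts = summarise_slurm_log(log)
    print(f"{len(workers)} worker(s) exited with an exception")
    for (kind, pattern), n in counts.most_common():
        print(f"{n:6d}  {kind}: {pattern}")
```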
Any guidance on how to resolve or debug these issues would be greatly appreciated.
Thank you for your assistance.
