MPI Errors when running maker 3.01.04 #22

@hans-vg

Description

Recently, I updated MAKER from v2 to v3 for a new annotation project. I compiled MAKER v3 against the same MPICH module I had previously used for MAKER v2:

```
module load mpich/ge/gcc/64/3.3.2
```

However, when I now run MAKER in MPI mode, it crashes after 3–20 hours of processing.
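For context, the job is launched roughly as follows. This is a sketch reconstructed from the srun/mpiexec messages in the logs below; the node count, task count, paths, and control-file names are assumptions, not my exact script:

```shell
#!/bin/bash
# Hypothetical SLURM submission sketch — the logs reference srun and
# three proxy hosts (cpu-53, cpu-54, cpu-55), so 3 nodes are assumed.
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=32

module load mpich/ge/gcc/64/3.3.2

# MAKER is started under mpiexec; ~96 ranks assumed from the
# "rank=94" line in the error output.
mpiexec -n 96 maker -base my_annotation \
    maker_opts.ctl maker_bopts.ctl maker_exe.ctl
```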

Any suggestions on how to troubleshoot this, or how to get MPI running reliably, would be greatly appreciated.
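One isolation step that might help narrow this down is to verify the MPICH installation itself, independent of MAKER, with a minimal hello-world across the same nodes. This is a generic sketch (compile with `mpicc`, run under the same `mpiexec`/`srun` setup), not anything MAKER-specific:

```c
/* mpi_check.c — minimal MPICH sanity check, independent of MAKER.
 * If long multi-node runs of this also die with "Connection refused",
 * the problem is in the MPI/network layer rather than in MAKER. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    /* Each rank reports where it is running. */
    printf("rank %d of %d on %s\n", rank, size, host);

    /* A barrier forces cross-node communication to actually happen. */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```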

Thank you,
-Hans

Below are some example errors:

```
FATAL: Thread terminated, causing all processes to fail
--> rank=69, hostname=cpu-54
[proxy:0:2@cpu-55] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:2@cpu-55] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:2@cpu-55] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
[proxy:0:0@cpu-53] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:0@cpu-53] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@cpu-53] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
srun: error: cpu-53: task 0: Exited with exit code 7
srun: error: cpu-55: task 2: Exited with exit code 7
[mpiexec@cpu-53] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:75): one of the processes terminated badly; aborting
[mpiexec@cpu-53] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:22): launcher returned error waiting for completion
[mpiexec@cpu-53] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:215): launcher returned error waiting for completion
[mpiexec@cpu-53] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion
FATAL: Thread terminated, causing all processes to fail
--> rank=94, hostname=cpu-54
[proxy:0:2@cpu-55] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:2@cpu-55] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:2@cpu-55] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
[proxy:0:0@cpu-53] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:0@cpu-53] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@cpu-53] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
srun: error: cpu-55: task 2: Exited with exit code 7
srun: error: cpu-53: task 0: Exited with exit code 7
[mpiexec@cpu-53] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:75): one of the processes terminated badly; aborting
[mpiexec@cpu-53] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:22): launcher returned error waiting for completion
[mpiexec@cpu-53] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:215): launcher returned error waiting for completion
[mpiexec@cpu-53] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion
deleted:1 hits
Calling FastaDB::new at /data/gpfs/assoc/inbre/projects/software_installs/maker-Version_3.01.04/bin/../lib/FastaSeq.pm line 139.
Calling out to BioPerl get_PrimarySeq_stream at /data/gpfs/assoc/inbre/projects/software_installs/maker-Version_3.01.04/bin/../lib/GI.pm line 2287.
collecting tblastx reports
flattening altEST clusters
Fatal error in PMPI_Send: Unknown error class, error stack:
PMPI_Send(159).............: MPI_Send(buf=0x555559942d30, count=4, MPI_CHAR, dest=71, tag=1111, MPI_COMM_WORLD) failed
MPID_nem_tcp_connpoll(1845): Communication error with rank 71: Connection refused
```
