I recently upgraded maker from v2 to v3 for a new annotation project. I compiled maker v3 using the same MPICH module I had previously used for maker v2.
module load mpich/ge/gcc/64/3.3.2
However, when I now run maker in MPI mode, it crashes after 3-20 hours of processing.
Any suggestions on how to troubleshoot or get MPI to run would be greatly appreciated.
Thank you,
-Hans
Below are some example errors:
FATAL: Thread terminated, causing all processes to fail
--> rank=69, hostname=cpu-54
[proxy:0:2@cpu-55] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:2@cpu-55] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:2@cpu-55] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
[proxy:0:0@cpu-53] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:0@cpu-53] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@cpu-53] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
srun: error: cpu-53: task 0: Exited with exit code 7
srun: error: cpu-55: task 2: Exited with exit code 7
[mpiexec@cpu-53] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:75): one of the processes terminated badly; aborting
[mpiexec@cpu-53] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:22): launcher returned error waiting for completion
[mpiexec@cpu-53] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:215): launcher returned error waiting for completion
[mpiexec@cpu-53] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion
FATAL: Thread terminated, causing all processes to fail
--> rank=94, hostname=cpu-54
[proxy:0:2@cpu-55] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:2@cpu-55] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:2@cpu-55] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
[proxy:0:0@cpu-53] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:0@cpu-53] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@cpu-53] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
srun: error: cpu-55: task 2: Exited with exit code 7
srun: error: cpu-53: task 0: Exited with exit code 7
[mpiexec@cpu-53] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:75): one of the processes terminated badly; aborting
[mpiexec@cpu-53] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:22): launcher returned error waiting for completion
[mpiexec@cpu-53] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:215): launcher returned error waiting for completion
[mpiexec@cpu-53] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion
deleted:1 hits
Calling FastaDB::new at /data/gpfs/assoc/inbre/projects/software_installs/maker-Version_3.01.04/bin/../lib/FastaSeq.pm line 139.
Calling out to BioPerl get_PrimarySeq_stream at /data/gpfs/assoc/inbre/projects/software_installs/maker-Version_3.01.04/bin/../lib/GI.pm line 2287.
collecting tblastx reports
flattening altEST clusters
Fatal error in PMPI_Send: Unknown error class, error stack:
PMPI_Send(159).............: MPI_Send(buf=0x555559942d30, count=4, MPI_CHAR, dest=71, tag=1111, MPI_COMM_WORLD) failed
MPID_nem_tcp_connpoll(1845): Communication error with rank 71: Connection refused