Hello
We have installed PartitionFinder on our cluster, and we see strange behavior when we increase the number of threads (not MPI) together with the --raxml option to process a very large dataset:
The whole PartitionFinder process stays frozen, waiting for raxml.linux subprocesses that are often marked as zombies.
We saw the same behavior with the example nucleotide dataset, even with -p 8.
Using the debugging option and --save-phylofiles, we checked whether something was wrong with RAxML itself. Launched sequentially on their own, outside PartitionFinder, on the same data, all RAxML processes run without any problem.
We suspected a buffer size problem (the buffer size is not set in the subprocess.Popen call). Setting a comfortable one let us get further in the data processing, but the main process still ends up blocked in the same way.
I've changed run_program in partfinder/util.py to the following, replacing subprocess.Popen with plain old os.system, and now everything works:
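    import os    # needed at the top of util.py if not already imported
    import uuid

    def run_program(binary, command):
        # stdout/stderr go to uniquely named files rather than pipes
        unique_filename = uuid.uuid4()
        command = '"%s" %s 2> %s.err > %s.out' % (binary, command, unique_filename, unique_filename)
        log.debug("Running '%s'", command)
        # on POSIX, os.system returns the wait status, not the plain exit code
        returncode = os.system(command)
        if returncode != 0:
            raise ExternalProgramError("Exit %s: %s" % (returncode, command),
                "see %s.err %s.out files in project folder" % (unique_filename, unique_filename))
        else:
            # remove the output files after a successful run
            os.remove("%s.err" % unique_filename)
            os.remove("%s.out" % unique_filename)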
I have not tested the other, older workaround described at https://bugs.python.org/issue12739
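An untested sketch of a pipe-based alternative, for comparison with the os.system version. It assumes the freeze comes either from an output pipe filling up while the parent waits, or from pipe file descriptors being inherited by RAxML children started concurrently. Neither cause has been confirmed for PartitionFinder. The name run_program_piped and its list-style args parameter are hypothetical and not part of partfinder/util.py.

```python
import subprocess
import sys


def run_program_piped(args):
    """Hypothetical helper: run args (a list), return (returncode, stdout, stderr)."""
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        # Assumption (POSIX): stop this child from inheriting pipe ends
        # that were opened for other children started at the same time.
        close_fds=True,
    )
    # communicate() reads both pipes until EOF, then waits for the child,
    # so the child cannot block on a full pipe and is reaped afterwards.
    stdout, stderr = proc.communicate()
    return proc.returncode, stdout, stderr


if __name__ == "__main__":
    # Stand-in for RAxML: a child that writes more than a pipe buffer usually holds.
    code, out, err = run_program_piped(
        [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200000)"]
    )
    print(code, len(out))
```

Unlike the os.system version, this passes the arguments as a list and so avoids the shell and the temporary .err/.out files, at the cost of holding all RAxML output in memory.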
The tests were performed with:
partitionfinder-2.1.1
python 2.7.13
RAxML 8.2.9 compiled with gcc 4.4.7 (tests also made with gcc 6.1.0)
CentOS release 6.5 (Final)
on Dell PowerEdge C6220 (2 x Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz 10 cores with Hyper-Threading, 256G RAM)
The initial command line was: python PartitionFinder.py -p 20 --raxml --no-ml-tree examples/nucleotide/
We also noticed that -p 2 gave nearly the same processing time as -p 20.
Yours faithfully
Patrice Déhais