This README describes how to prepare a scalability study workflow.
In the examples that follow, the shell variable SCRATCH holds the path of the user's scratch directory.
To prepare a scalability study, first select the XML files to include in the
study. Use the linkXmlFiles.bash script to create links to the files that will
be preprocessed into pickle files for more efficient execution of the pubmed
script. In this example the 64 largest files are selected:
# The path to all of the MEDLINE XML files
ALL_RAW_XML_FILES=$SCRATCH/medline/raw
mkdir $SCRATCH/medline/raw/64files
./linkXmlFiles.bash $ALL_RAW_XML_FILES $SCRATCH/medline/raw/64files
The XML files selected for the scalability study can then be converted to pickle
format. Before the conversion can take place, the medline/config/default.cfg
file in the medline Python package must be updated to specify the destination
directory for the pickle files. Set the temp.data.directory value in the
configuration file to $SCRATCH/medline/pickled/64files, where $SCRATCH should
be replaced with the full path of the user's scratch location. The conversion
of the selected XML files to pickle format can then be performed with the
following command:
./runFileConversion.bash $SCRATCH/medline/raw/64files
Read the comments in the runFileConversion.bash script for more details.
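For reference, the configuration change described above might look like the
following, assuming an INI-style key/value layout; only the
temp.data.directory key comes from this README, and the path shown is a
placeholder to be replaced with the user's actual scratch location:

```
# In medline/config/default.cfg -- substitute your own scratch path.
temp.data.directory = /full/path/to/scratch/medline/pickled/64files
```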
The configurations and the batch submission scripts for conducting the
scalability studies can then be generated with the prepareScalingTests.bash
script. The script generates a separate configuration directory for each
combination of number of nodes, number of threads, and number of clusters to be
tested in the scalability study. A separate pubmed configuration file will be
generated for each test based on these three parameters. The first argument to
the script is the base directory containing the subdirectories for each set of
pickled data files. In our examples, this directory is
$SCRATCH/medline/pickled. The second argument is the name of the test data
set. In our examples, this is the string 64files. The third argument is the
default port number H2O will use. The H2O server will attempt to acquire this
port number and the next one higher (for example, with a default port of 54321,
ports 54321 and 54322). The default port number is often overridden by the
scaleH2OTest.bash script. The tests to be generated by the script are
determined by the num_nodes, num_threads, and num_clusters shell script
arguments. An example execution of the script is:
num_nodes="01 02 03 04 08 16"
num_threads="01 02 04 08 16"
num_clusters="01000 02000 04000 08000 15000"
./prepareScalingTests.bash $SCRATCH/medline/pickled 64files 54321 \
${num_nodes} ${num_threads} ${num_clusters}
Read the comments in the prepareScalingTests.bash script for more details.
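To get a sense of the size of the study, the three parameter lists form a
Cartesian product. The following shell sketch (an illustration only, not part
of the repository) counts the test configurations the example above would
generate:

```shell
# Illustration only: enumerate the (nodes, threads, clusters) combinations
# that prepareScalingTests.bash generates a configuration directory for.
num_nodes="01 02 03 04 08 16"
num_threads="01 02 04 08 16"
num_clusters="01000 02000 04000 08000 15000"

count=0
for n in ${num_nodes}; do
  for t in ${num_threads}; do
    for c in ${num_clusters}; do
      count=$((count + 1))    # one test configuration per combination
    done
  done
done
echo "${count} test configurations"   # 6 x 5 x 5 = 150
```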
Once the configuration directories have been generated for the scalability
study, the batch job for each test can be launched with the scaleH2OTest.bash
script. The script must first be customized to the user's environment before
any batch jobs are launched. Follow the instructions in the script file for
setting the PORT_RANGE_START, PORT_RANGE_END, H2O_JAR, SOURCE_DIR, PATH, and
PYTHONPATH shell variables. The FEATURE_EXTRACTION_PATH variable should only
need to be changed in rare circumstances.
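The customized settings might look like the following; every value here is
illustrative (the paths and port numbers are assumptions, not values from the
repository), so follow the instructions in scaleH2OTest.bash rather than
copying these verbatim:

```shell
# Illustrative values only -- substitute values for your own environment.
PORT_RANGE_START=54321             # first port H2O may try (assumed value)
PORT_RANGE_END=54421               # last port H2O may try (assumed value)
H2O_JAR=$HOME/h2o/h2o.jar          # location of the H2O jar (assumed path)
SOURCE_DIR=$HOME/src/medline       # checkout of the medline package (assumed path)
PATH=$SOURCE_DIR/bin:$PATH
PYTHONPATH=$SOURCE_DIR:$PYTHONPATH
```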
The runScalingTests.bash script is executed to launch the batch job for each
performance test. It loops over a subset of the test directories generated by
the prepareScalingTests.bash script. The script takes the name of the test
data set the scalability study is being performed with, the wall clock time in
qsub format, a list of node counts, a list of thread counts, and a list of the
numbers of clusters to be discovered. An example execution of the script to
run the tests created in the previous example is:
num_nodes="01 02 03 04 08 16"
num_threads="01 02 04 08 16"
num_clusters="01000 02000 04000 08000 15000"
./runScalingTests.bash 64files 48:00:00 \
${num_nodes} ${num_threads} ${num_clusters}