Using signac workflows provides the following benefits:

- The `signac` workflows provide contained and fully reproducible results, since all the project steps and calculations are contained within a single `signac` project. However, to ensure total reproducibility, the project should be run from a container. Note: This involves building a container (Docker, Apptainer, Podman, etc.), using it to run the original calculations, and providing it to any future parties trying to reproduce the exact results.
- The `signac` workflows can simply track the progress of any project locally or on the HPC, providing as much or as little detail about the project status as the user programs into the `project.py` file.
- These `signac` workflows are designed to track the progress of all the project's parts or stages, only resubmitting the jobs locally or to the HPC if they are not completed and not already in the queue.
- These `signac` workflows also allow colleagues to quickly transfer their workflows to each other, and to easily add new state points to a project, without the fear of rerunning the original state points.
- Please also see the signac website, which outlines some of the other major features.
This is a signac workflow example/tutorial for a simple NumPy calculation, which utilizes the following workflow steps:

- Part 1: For each individual job (set of state points), this code generates the `signac_job_document.json` file from the `signac_statepoint.json` data. The `signac_statepoint.json` file only stores the set of state points, or required variables, for the given job. The `signac_job_document.json` file can be used to store any other variables that the user wants to keep for later use or searching.
- Part 2: This writes the input values into a file that NumPy will use to do a calculation in Part 3. Four (4) random numbers are generated, using the initial `value_0_int` value and the `replicate_number_int` value to seed the random number generator (see the sketch after this list).
- Part 3: Calculate the dot product of the four (4) random numbers generated in Part 2 (4 numbers dot [1, 2, 3, 4]). Also, run a bash command, `echo "Running the echo command or any other bash command here"`, which is an example of how to run a bash command to run a software package inside the commands for each state point.
- Part 4: Obtain the average and standard deviation for each input `value_0_int` value across all the replicates, and print the output data file (`analysis/output_avg_std_of_replicates_txt_filename.txt`). Signac is set up to automatically loop through all the JSON files (`signac_statepoint.json`), calculating the average and standard deviation for the jobs whose state points differ only in their `replicate_number_int` values.
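For orientation, the following is a rough, minimal sketch (not the tutorial's actual code) of how parts like these can be expressed as signac-flow operations in `project.py`. The file names, seeding scheme, and job-document key below are assumptions for illustration only, and the required decorator order can vary between signac-flow versions.

```python
# Minimal sketch of two workflow parts, assuming signac-flow's FlowProject API.
# File names, the seeding scheme, and the "dot_product" document key are
# illustrative assumptions, not the tutorial's exact implementation.
import numpy as np
from flow import FlowProject


class Project(FlowProject):
    pass


@Project.operation  # Decorator order may differ between signac-flow versions.
@Project.post.isfile("numpy_input.txt")  # Part 2 is "done" once this file exists.
def part_2_write_numpy_input_command(job):
    # Seed the random number generator from the state point so every replicate
    # is reproducible (hypothetical seeding scheme).
    rng = np.random.default_rng(job.sp.value_0_int + job.sp.replicate_number_int)
    np.savetxt(job.fn("numpy_input.txt"), rng.random(4))


@Project.operation
@Project.pre.after(part_2_write_numpy_input_command)  # Run only after Part 2.
@Project.post(lambda job: "dot_product" in job.doc)
def part_3_numpy_calcs_command(job):
    # Part 3: dot the four random numbers with [1, 2, 3, 4] and store the
    # result in the job document for later averaging across replicates.
    values = np.loadtxt(job.fn("numpy_input.txt"))
    job.doc["dot_product"] = float(np.dot(values, [1, 2, 3, 4]))


if __name__ == "__main__":
    Project().main()
```

A Part 4-style aggregation can then loop over the project's jobs, grouping them by their `value_0_int` state point, to average the stored results across replicates.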
- src directory: This directory can be used to store any custom functions that are required for this workflow. This includes any developed Python functions or any template files used for the custom workflow (Example: a base template file that is used with a find-and-replace function, changing the variables to the differing state point inputs; a sketch of such a helper function follows this list).
- templates directory: This directory is used to store the custom HPC submission scripts and any template files used for the custom workflow (Example: a base template file that is used with a find-and-replace function, changing the variables to the differing state point inputs). These find-and-replace template files could also be put in the `src` directory, but the HPC submission scripts must remain in the `templates` directory. All the standard or custom module load commands, conda activate commands, and any other custom items that need to be in the HPC submission scripts should be included here for every project (Example: specific queues, CPU/GPU models, etc.).
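As a rough illustration, a simple find-and-replace helper of the kind described above, which could live in the `src` directory, might look like the sketch below; the file names and placeholder tokens are hypothetical.

```python
# Hypothetical src-directory helper: fill a base template file by replacing
# placeholder tokens with a job's state point values.
def fill_template(template_path, output_path, replacements):
    """Read a template, replace each placeholder token, and write the result."""
    with open(template_path, "r") as template_file:
        text = template_file.read()
    for token, value in replacements.items():
        text = text.replace(token, str(value))
    with open(output_path, "w") as output_file:
        output_file.write(text)


# Example usage inside a workflow operation (names are illustrative only):
# fill_template(
#     "src/base_input.template",
#     job.fn("input_file.txt"),
#     {"<VALUE_0_INT>": job.sp.value_0_int},
# )
```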
- The signac documentation and the signac GitHub can be used for reference.
Please cite this GitHub repository.
- This repository: Add repository here
The conda environment for these signac workflows ("this project") can be built as follows:
```bash
cd signac_numpy_tutorial
conda env create -f environment.yml
conda activate signac_numpy_tutorial
```

All commands in this section are run from the `<local_path>/signac_numpy_tutorial/signac_numpy_tutorial/project` directory.
This can be done at the start of a new project, but it is not always required. If you moved the directory after starting a project, or signac cannot find the path correctly, you will need to run the following command (`signac init`) from the project directory:
```bash
signac init
```

Initialize all the state points for the jobs (generate all the separate folders with the different state point variables).
- Note: This command generates the `workspace` folder, which includes a sub-folder for each state point (different variable combinations). These sub-folders are named uniquely based on the state point values. The user can add more state points via the `init.py` file at any time, running the below command to create the new state point files and sub-folders that are defined in the `init.py` file (a minimal sketch of such an `init.py` file follows this note).
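A minimal `init.py` might look something like the sketch below (assuming a recent signac version where `signac.init_project()` needs no arguments); the state point names match the tutorial's variables, but the value ranges are placeholders.

```python
# Minimal sketch of an init.py file: create one signac job per state point
# combination. The value ranges below are placeholders, not the tutorial's
# actual inputs.
import signac

# Initialize (or reopen) the signac project in the current directory.
project = signac.init_project()

for value_0_int in [1, 2, 3]:
    for replicate_number_int in [0, 1, 2]:
        statepoint = {
            "value_0_int": value_0_int,
            "replicate_number_int": replicate_number_int,
        }
        # Creates the workspace sub-folder and its signac_statepoint.json
        # file for this state point if it does not already exist.
        project.open_job(statepoint).init()
```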
```bash
python init.py
```

Check the status of your project (i.e., what parts are completed and what parts are available to be run).
```bash
python project.py status
```

Run all available jobs for the whole project locally with the run command. Note: Using the run command like this will run all parts of the project until completion. Note: This feature is not available when submitting to HPCs.

```bash
python project.py run
```

Run all available part 1 sections of the project locally with the run command.

```bash
python project.py run -o part_1_initial_parameters_command
```

Run all available part 2 sections of the project locally with the run command.

```bash
python project.py run -o part_2_write_numpy_input_command
```

Run all available part 3 sections of the project locally with the run command.

```bash
python project.py run -o part_3_numpy_calcs_command
```

Run all available part 4 sections of the project locally with the run command.

```bash
python project.py run -o part_4_analysis_replicate_averages_command
```

Additionally, you can use the following flags with the run command, controlling how the jobs are executed on the local machine (this does not produce HPC job submission scripts):
- `--parallel 2`: This only works this way when using `run`. This runs several jobs in parallel (2 in this case) at a time on the local machine (Example: `python project.py run --parallel 2`).
- See the signac documentation for more information, features, and the Project Command Line Interface.
All commands in this section are run from the `<local_path>/signac_numpy_tutorial/signac_numpy_tutorial/project` directory.
First, make sure the `templates/phoenix.sh` file, or whichever HPC template file is being used, is correct for the given HPC. Additionally, make sure the matching HPC environment class in the `project.py` file is set up properly for the given HPC; specifically, this example is set up for the `DefaultSlurmEnvironment` (i.e., only for a Slurm environment), and its class is set accordingly (Example: `class Phoenix(DefaultSlurmEnvironment):`).
Second, in general, the signac labels (Example: `@Project.label` in the `project.py` file) that check the status of each workflow part should not be written in a way that is computationally expensive; keeping them cheap removes the need to run an interactive job on the HPC when using the signac status command. Otherwise, if the label checks are computationally expensive, you will need to run the signac status command from an interactive job on the HPC.
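For example, an inexpensive label that only checks whether a file exists (rather than opening and parsing it) might look like the following sketch in `project.py`; the output file name is hypothetical.

```python
# Sketch of an inexpensive status label, assuming signac-flow's FlowProject API.
# Checking only for file existence keeps "python project.py status" fast on HPCs.
from flow import FlowProject


class Project(FlowProject):
    pass


@Project.label
def part_3_numpy_calcs_completed(job):
    # Cheap check: does the (hypothetical) output file exist for this job?
    return job.isfile("numpy_output.txt")
```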
Initialize all the state points for the jobs (generate all the separate folders with the different state points).
- Note: This command generates the `workspace` folder, which includes a sub-folder for each state point (different variable combinations). These sub-folders are named uniquely based on the state point values. The user can add more state points via the `init.py` file at any time, running the below command to create the new state point files and sub-folders that are defined in the `init.py` file.
```bash
python init.py
```

Check the status of your project (i.e., what parts are completed and what parts are available to be run).

```bash
python project.py status
```

Submit all the currently available jobs to the HPC with the submit command.

```bash
python project.py submit
```

Submit all available part 1 sections of the project to the HPC with the submit command.

```bash
python project.py submit -o part_1_initial_parameters_command
```

Submit all available part 2 sections of the project to the HPC with the submit command.

```bash
python project.py submit -o part_2_write_numpy_input_command
```

Submit all available part 3 sections of the project to the HPC with the submit command.

```bash
python project.py submit -o part_3_numpy_calcs_command
```

Submit all available part 4 sections of the project to the HPC with the submit command.

```bash
python project.py submit -o part_4_analysis_replicate_averages_command
```

Additionally, you can use the following flags with the submit command, controlling how the jobs are submitted to the HPC:
- `--bundle 2`: Only available when using `submit`. This bundles multiple jobs (2 in this case) into a single run or HPC submission script, auto-adjusting the time, CPU cores, etc., based on the total command selections (Example: `python project.py submit --bundle 2`).
- `--parallel`: This only works this way when using `submit`. The `N` value in `--parallel N` is not read; therefore, it simply runs all the jobs in an HPC submission script at the same time (in parallel), auto-adjusting some variables.
- See the signac documentation for more information, features, and the Project Command Line Interface.
Warning: the user should always confirm that job submission to the HPC is working properly before actually submitting jobs, by first using the `--pretend` flag (Example: `python project.py submit --pretend`), especially when using `--parallel` and `--bundle`. This may involve programming the correct items into the custom HPC submission script (i.e., the files in the `templates` folder) as needed to make it work for their unique setup.