Changes from all commits (88 commits)
2cb2a2d
merged kais last minute config into prep_release
Nov 25, 2020
289273d
forgot to export last_minute
Nov 25, 2020
ce0f4a9
possible bug fixed
Nov 25, 2020
cf41b55
fixes to branch more_dyn_conf_merged
Dec 2, 2020
3abcfe5
Merge branch 'fixes/more_dyn_conf_merged' into develop
Dec 3, 2020
d6b35e4
Merge branch 'hotfix/venv_install_plugins' into develop
pgierz Dec 8, 2020
8eb42c2
Bump version: 5.0.2 → 5.0.3
pgierz Dec 8, 2020
ad6277e
Merge pull request #49 from esm-tools/hotfix/model_environment_cases
Dec 10, 2020
1305060
merged release into develop
Dec 10, 2020
20d191e
colorful diffs
pgierz Dec 11, 2020
5565af4
allows to exit right away from venv question
pgierz Dec 11, 2020
0d201d6
removes merge markers
pgierz Dec 11, 2020
4fe0bf2
Merge pull request #51 from esm-tools/feature/color_diff
pgierz Dec 11, 2020
5e4eec6
Merge pull request #54 from esm-tools/release
mandresm Dec 11, 2020
b1f0ae2
Merge pull request #57 from esm-tools/hotfix/trace_log_into_log_dir
mandresm Dec 17, 2020
bf821bb
Merge pull request #58 from esm-tools/hotfix/copy_namelists
mandresm Dec 18, 2020
3a7e538
feat(virtual_env_builder): recycles the virtual env if one already ex…
pgierz Dec 31, 2020
379c030
Bump version: 5.0.7 → 5.0.8
denizural Jan 5, 2021
814a430
Merge remote-tracking branch 'origin/fix_ulimit' into develop
denizural Jan 5, 2021
c5d1a19
merged bugfix coupler_yac
Jan 11, 2021
faae85b
Merge pull request #60 from esm-tools/feature/no_duplicate_venv
Jan 11, 2021
fc831c7
Merge pull request #52 from esm-tools/feature/exit_venv
Jan 11, 2021
dd19ed8
Merge pull request #65 from esm-tools/release
denizural Jan 12, 2021
85f93f8
Merge branch 'release' into develop
Jan 13, 2021
3c4df0f
Merge pull request #68 from esm-tools/hotfix/coupling_fields_differen…
mandresm Jan 15, 2021
080d79d
Merge pull request #70 from esm-tools/hotfix/subdirs_on_targets
mandresm Jan 26, 2021
7ec354c
Merge pull request #72 from esm-tools/hotfix/subdirs_on_targets_followup
mandresm Jan 28, 2021
a3da441
allows to define the 'reusable_filetypes' variable inside the general…
mandresm Jan 29, 2021
40190d5
feat: allows multiple srun commands to be placed in the sad file
pgierz Feb 3, 2021
d12178a
Merge pull request #75 from esm-tools/feature/awicm3
mandresm Feb 11, 2021
15dcbf8
fixes the issues with dependencies destroying the editable/branched o…
mandresm Feb 12, 2021
dfd92d9
a syntax fix
mandresm Feb 12, 2021
fb6e237
more fixing, testing finally successful
mandresm Feb 12, 2021
769185f
Merge pull request #76 from esm-tools/fix/venv_with_editable_branched…
mandresm Feb 12, 2021
c5acb25
Merge branch 'feat/multi_srun' into develop
Feb 15, 2021
d269d9d
implementation of EsmToolsDir class
mandresm Feb 15, 2021
0d8a59d
multi_srun issue affecting files without general.multi_srun fixed
mandresm Feb 15, 2021
814fc56
Merge pull request #77 from esm-tools/fix/multi_srun
mandresm Feb 15, 2021
f94edfa
further fixes
mandresm Feb 15, 2021
84383d8
Merge pull request #78 from esm-tools/fix/multi_srun
mandresm Feb 15, 2021
645589b
Merge pull request #79 from esm-tools/fix/multi_srun
mandresm Feb 15, 2021
60d5cfe
Merge branch 'feature/EsmToolsDir_class' into develop
Feb 16, 2021
8e4b874
added the possibility to control the reusable_filetypes both from eac…
mandresm Feb 23, 2021
31a55d3
debugging lines
Feb 23, 2021
d5f0152
fixed it
Feb 23, 2021
af6b102
Bump version: 5.0.14 → 5.0.15
Feb 23, 2021
02bc9f6
Merge branch 'release' into develop
Feb 23, 2021
d544113
fix: date from environmental variable
pgierz Feb 25, 2021
bd1aa11
fix: adds exit to last-minute date parse
pgierz Feb 25, 2021
12b35dc
Merge branch 'fix/date_in_envvar' into develop
pgierz Feb 26, 2021
6da8d4e
Bump version: 5.0.15 → 5.0.16
pgierz Feb 26, 2021
989fc92
Merge pull request #83 from esm-tools/release
mandresm Feb 26, 2021
0eabc86
typo in comments fixed
mandresm Feb 26, 2021
988d284
suggestions by Deniz
mandresm Mar 2, 2021
b9578c8
Merge branch 'feature/model_specific_reusable_filetypes' of https://g…
mandresm Mar 2, 2021
08189dd
Merge pull request #84 from esm-tools/feature/model_specific_reusable…
mandresm Mar 2, 2021
df8e88b
first test for MPI + MPI/OMP parallelization
JanStreffing Mar 10, 2021
474d8a9
second pass at implementing tasksets
JanStreffing Mar 11, 2021
eb968d0
move hostfile creation into the loop to allow for the creaion of more…
JanStreffing Mar 12, 2021
3c0b18a
fixes from call with Deniz
JanStreffing Mar 12, 2021
f71db37
small fixes to make prog and script file match the ksh version
JanStreffing Mar 12, 2021
ee21891
small fix for IFS line
JanStreffing Mar 12, 2021
75011b3
status at the end of the day
JanStreffing Mar 13, 2021
c088b19
corrected switches
JanStreffing Mar 15, 2021
595f5d0
better switches again
JanStreffing Mar 15, 2021
a3fc629
removed unused artibute comment
JanStreffing Mar 15, 2021
a7a2765
is this what you meant, Paul?
JanStreffing Mar 15, 2021
c00adde
doh
JanStreffing Mar 15, 2021
49adcda
solved bug of the content duplication of the hostfile_srun after seve…
mandresm Mar 17, 2021
1b597f4
Merge pull request #86 from esm-tools/slurm_hostlist_dist_arbitrary
JanStreffing Mar 17, 2021
28c1934
some improvements for better readability
denizural Mar 19, 2021
75ec750
allows for writing environment files that can be sourced from pre- an…
mandresm Mar 21, 2021
46406fc
Bump version: 5.0.16 → 5.0.17
denizural Mar 21, 2021
bcdff80
Merge pull request #90 from esm-tools/feature/better_runscript_output
denizural Mar 21, 2021
0962432
overcommit feature: possibility to use less CPUs than the number of p…
denizural Mar 22, 2021
794443e
fix on the SLURM_HOSTFILE variable, now an absolute path
mandresm Mar 23, 2021
d98151c
Merge pull request #89 from esm-tools/refac/esm_environment
mandresm Mar 24, 2021
284178c
Merge branch 'develop' into fix/slurm_hostlist_dis_arbitrary
mandresm Mar 24, 2021
029ff1d
Merge pull request #92 from esm-tools/fix/slurm_hostlist_dis_arbitrary
mandresm Mar 24, 2021
ec809a2
fixes to the venv editable installs
mandresm Mar 31, 2021
0c4b028
fixes to the wheel calls for the virtual environment
mandresm Apr 2, 2021
07fb5d5
Merge pull request #91 from esm-tools/feature/overcommit_option
denizural Apr 8, 2021
c709321
Merge pull request #96 from esm-tools/fix/venv_editable_install
mandresm Apr 8, 2021
cb36307
HIST scenario is working at least for T63
denizural Apr 12, 2021
cda27a3
bugfix for the non-dict variables containing @YEAR@
denizural Jul 14, 2021
be54dbd
added support for need_2years_before & after variables
denizural Jul 19, 2021
5389c6d
feat(namelist): allows user to override streams by checking what is d…
pgierz Jul 28, 2021
51d71e6
Revert "feat(namelist): allows user to override streams by checking w…
Oct 12, 2021
3 changes: 2 additions & 1 deletion esm_runscripts/__init__.py
@@ -2,7 +2,7 @@

__author__ = """Dirk Barbi"""
__email__ = 'dirk.barbi@awi.de'
__version__ = "5.0.14"
__version__ = "5.0.17"

from .sim_objects import *
from .batch_system import *
@@ -11,6 +11,7 @@
from .compute import *
from .tidy import *
from .prepare import *
from .last_minute import *
from .postprocess import *
from .filelists import *
from .tidy import *
236 changes: 227 additions & 9 deletions esm_runscripts/batch_system.py
@@ -1,4 +1,5 @@
import os
import textwrap
import sys

import esm_environment
@@ -60,20 +61,31 @@ def get_batch_header(config):
this_batch_system = config["computer"]
if "sh_interpreter" in this_batch_system:
header.append("#!" + this_batch_system["sh_interpreter"])
tasks = batch_system.calculate_requirements(config)
tasks, nodes = batch_system.calculate_requirements(config)
replacement_tags = [("@tasks@", tasks)]
all_flags = [
"partition_flag",
"time_flag",
"tasks_flag",
"output_flags",
"name_flag",
]
if config["general"].get("taskset", False):
replacement_tags = [("@nodes@", nodes)]
all_flags = [
"partition_flag",
"time_flag",
"nodes_flag",
"output_flags",
"name_flag",
]
else:
all_flags = [
"partition_flag",
"time_flag",
"tasks_flag",
"output_flags",
"name_flag",
]
conditional_flags = [
"accounting_flag",
"notification_flag",
"hyperthreading_flag",
"additional_flags",
"overcommit_flag"
]
if config["general"]["jobtype"] in ["compute", "tidy_and_resume"]:
conditional_flags.append("exclusive_flag")
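
For illustration, a minimal sketch (not part of this patch) of how the taskset switch changes the header rendering: when general.taskset is set, the @nodes@ tag and nodes_flag are used instead of @tasks@/tasks_flag. The flag templates below are hypothetical; the real ones come from the machine configuration files.

taskset = True
tasks, nodes = 480, 8
flag_template = "--nodes=@nodes@" if taskset else "--ntasks=@tasks@"   # hypothetical templates
replacement_tags = [("@nodes@", nodes)] if taskset else [("@tasks@", tasks)]
for tag, value in replacement_tags:
    flag_template = flag_template.replace(tag, str(value))
print("#SBATCH " + flag_template)   # -> #SBATCH --nodes=8
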
@@ -93,10 +105,13 @@ def get_batch_header(config):
@staticmethod
def calculate_requirements(config):
tasks = 0
nodes = 0
if config["general"]["jobtype"] == "compute":
for model in config["general"]["valid_model_names"]:
if "nproc" in config[model]:
tasks += config[model]["nproc"]
if config["general"].get("taskset", False):
nodes +=int((config[model]["nproc"]*config[model]["omp_num_threads"])/config['computer']['cores_per_node'])
elif "nproca" in config[model] and "nprocb" in config[model]:
tasks += config[model]["nproca"] * config[model]["nprocb"]

@@ -111,14 +126,47 @@ def calculate_requirements(config):

elif config["general"]["jobtype"] == "post":
tasks = 1
return tasks
return tasks, nodes
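
A worked example (illustrative only, with made-up numbers) of the arithmetic above: in taskset mode each MPI rank reserves omp_num_threads cores, so the node count is the per-model sum of nproc * omp_num_threads / cores_per_node.

config = {
    "general": {"valid_model_names": ["echam", "fesom"], "jobtype": "compute", "taskset": True},
    "computer": {"cores_per_node": 96},
    "echam": {"nproc": 288, "omp_num_threads": 2},
    "fesom": {"nproc": 192, "omp_num_threads": 1},
}
tasks = nodes = 0
for model in config["general"]["valid_model_names"]:
    tasks += config[model]["nproc"]
    nodes += int(config[model]["nproc"] * config[model]["omp_num_threads"]
                 / config["computer"]["cores_per_node"])
print(tasks, nodes)   # 480 tasks, (6 + 2) = 8 nodes
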

@staticmethod
def get_environment(config):
environment = []
env = esm_environment.environment_infos("runtime", config)
return env.commands

@staticmethod
def determine_nodelist(config):
setup_name = config['general']['setup_name']
if config['general'].get('multi_srun'):
for run_type in config['general']['multi_srun']:
print(run_type)
total_tasks = 0
for model in config['general']['multi_srun'][run_type]['models']:
print(total_tasks)
# determine how many nodes that component needs
if "nproc" in config[model]:
print("Adding to total_tasks")
total_tasks += int(config[model]["nproc"])
print(total_tasks)
elif "nproca" in config[model] and "nprocb" in config[model]:
print("Adding to total_tasks")
total_tasks += int(config[model]["nproca"])*int(config[model]["nprocb"])
print(total_tasks)

# KH 30.04.20: nprocrad is replaced by more flexible
# partitioning using nprocar and nprocbr
if "nprocar" in config[model] and "nprocbr" in config[model]:
if config[model]["nprocar"] != "remove_from_namelist" and config[model]["nprocbr"] != "remove_from_namelist":
print("Adding to total_tasks")
total_tasks += config[model]["nprocar"] * config[model]["nprocbr"]
print(total_tasks)

else:
continue
config['general']['multi_srun'][run_type]['total_tasks'] = total_tasks
print(config['general']['multi_srun'])
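
A sketch (config shape assumed, values hypothetical) of what determine_nodelist stores: for every multi_srun run type it sums the tasks of the models in that group and records the result as total_tasks.

config = {
    "general": {
        "multi_srun": {
            "awicm": {"models": ["echam", "fesom"], "hostfile": "hostfile_awicm"},
            "pism": {"models": ["pism"], "hostfile": "hostfile_pism"},
        },
    },
    "echam": {"nproca": 24, "nprocb": 18},
    "fesom": {"nproc": 288},
    "pism": {"nproc": 96},
}
for run_type, info in config["general"]["multi_srun"].items():
    total_tasks = 0
    for model in info["models"]:
        if "nproc" in config[model]:
            total_tasks += int(config[model]["nproc"])
        elif "nproca" in config[model] and "nprocb" in config[model]:
            total_tasks += int(config[model]["nproca"]) * int(config[model]["nprocb"])
    info["total_tasks"] = total_tasks
print({k: v["total_tasks"] for k, v in config["general"]["multi_srun"].items()})
# {'awicm': 720, 'pism': 96}
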


@staticmethod
def get_extra(config):
extras = []
@@ -151,11 +199,16 @@ def get_run_commands(config): # here or in compute.py?
commands.append(
"echo " + line + " >> " + config["general"]["experiment_log_file"]
)
if config['general'].get('multi_srun'):
return get_run_commands_multisrun(config, commands)
commands.append("time " + batch_system["execution_command"] + " &")
return commands



@staticmethod
def get_submit_command(config, sadfilename):
# FIXME(PG): Here we need to include a multi-srun thing
commands = []
batch_system = config["computer"]
if "submit" in batch_system:
@@ -175,6 +228,8 @@ def write_simple_runscript(config):
sadfilename = batch_system.get_sad_filename(config)
header = batch_system.get_batch_header(config)
environment = batch_system.get_environment(config)
# NOTE(PG): This next line allows for multi-srun simulations:
batch_system.determine_nodelist(config)
extra = batch_system.get_extra(config)

if config["general"]["verbose"]:
@@ -200,6 +255,11 @@ def write_simple_runscript(config):
print("ERROR -- Not sure if you were in a contained or open run!")
print("ERROR -- See write_simple_runscript for the code causing this.")
sys.exit(1)

if "modify_config_file_abspath" in config["general"]:
if config["general"]["modify_config_file_abspath"]:
tidy_call += " -m " + config["general"]["modify_config_file_abspath"]

elif config["general"]["jobtype"] == "post":
tidy_call = ""
commands = config["general"]["post_task_list"]
@@ -214,12 +274,40 @@ def write_simple_runscript(config):
sadfile.write(line + "\n")
sadfile.write("\n")
sadfile.write("cd " + config["general"]["thisrun_work_dir"] + "\n")
if config["general"].get("taskset", False):
sadfile.write("\n"+"#Creating hostlist for MPI + MPI&OMP heterogeneous parallel job" + "\n")
sadfile.write("rm -f ./hostlist" + "\n")
sadfile.write(f"export SLURM_HOSTFILE={config['general']['thisrun_work_dir']}/hostlist\n")
sadfile.write("IFS=$'\\n'; set -f" + "\n")
sadfile.write("listnodes=($(< <( scontrol show hostnames $SLURM_JOB_NODELIST )))"+"\n")
sadfile.write("unset IFS; set +f" + "\n")
sadfile.write("rank=0" + "\n")
sadfile.write("current_core=0" + "\n")
sadfile.write("current_core_mpi=0" + "\n")
for model in config["general"]["valid_model_names"]:
if model != "oasis3mct":
sadfile.write("mpi_tasks_"+model+"="+str(config[model]["nproc"])+ "\n")
sadfile.write("omp_threads_"+model+"="+str(config[model]["omp_num_threads"])+ "\n")
import pdb
#pdb.set_trace()
sadfile.write("for model in " + str(config["general"]["valid_model_names"])[1:-1].replace(',', '').replace('\'', '') +" ;do"+ "\n")
sadfile.write(" eval nb_of_cores=\${mpi_tasks_${model}}" + "\n")
sadfile.write(" eval nb_of_cores=$((${nb_of_cores}-1))" + "\n")
sadfile.write(" for nb_proc_mpi in `seq 0 ${nb_of_cores}`; do" + "\n")
sadfile.write(" (( index_host = current_core / " + str(config["computer"]["cores_per_node"]) +" ))" + "\n")
sadfile.write(" host_value=${listnodes[${index_host}]}" + "\n")
sadfile.write(" (( slot = current_core % " + str(config["computer"]["cores_per_node"]) +" ))" + "\n")
sadfile.write(" echo $host_value >> hostlist" + "\n")
sadfile.write(" (( current_core = current_core + omp_threads_${model} ))" + "\n")
sadfile.write(" done" + "\n")
sadfile.write("done" + "\n\n")
for line in commands:
sadfile.write(line + "\n")
sadfile.write("process=$! \n")
sadfile.write("cd " + config["general"]["experiment_scripts_dir"] + "\n")
sadfile.write(tidy_call + "\n")


config["general"]["submit_command"] = batch_system.get_submit_command(
config, sadfilename
)
@@ -234,8 +322,29 @@ def write_simple_runscript(config):
six.print_("Contents of ", self.bs.filename, ":")
with open(self.bs.filename, "r") as fin:
print(fin.read())

# Write the environment in a file that can be sourced from preprocessing and
# postprocessing scripts
batch_system.write_env(config, environment, sadfilename)

return config

@staticmethod
def write_env(config, environment, sadfilename):
folder = config["general"]["thisrun_scripts_dir"]
this_batch_system = config["computer"]
sadfilename_short = sadfilename.split("/")[-1]
envfilename = folder + "/env.sh"

with open(envfilename, "w") as envfile:
if "sh_interpreter" in this_batch_system:
envfile.write("#!" + this_batch_system["sh_interpreter"] + "\n")
envfile.write(f"# ENVIRONMENT used in {sadfilename_short}\n")
envfile.write("# Use this file to source the environment in your\n")
envfile.write("# preprocessing or postprocessing scripts\n\n")
for line in environment:
envfile.write(line + "\n")
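
The env.sh written here is meant to be sourced by user pre- and post-processing scripts. A hypothetical usage sketch (the path and the cdo call are placeholders):

import subprocess
env_file = "/path/to/expid/run_xxx/scripts/env.sh"   # thisrun_scripts_dir (placeholder)
# Source the saved environment, then run a post-processing command in that shell.
subprocess.run(["bash", "-c", f". {env_file} && cdo --version"], check=True)
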

@staticmethod
def submit(config):
if not config["general"]["check"]:
@@ -256,3 +365,112 @@ def submit(config):
)
print()
return config


def get_run_commands_multisrun(config, commands):
default_exec_command = config['computer']["execution_command"]
print("---> This is a multi-srun job.")
print("The default command:")
print(default_exec_command)
print("Will be replaced")
# Since I am already confused, I need to write comments.
#
# The next part is actually a shell script fragment, which will be injected
# into the "sad" file. sad = Sys Admin Dump. It's sad :-(
#
# In this part, we figure out what compute nodes we are using so we can
# specify nodes for each srun command. That means, ECHAM+FESOM will use one
# pre-defined set of nodes, PISM another, and so on. That should be general
# enough to also work for other model combos...
#
# Not sure if this is specific to Mistral as a HPC, Slurm as a batch
# system, or whatever else might pop up...
# @Dirk, please move this where you see it best (I guess slurm.py)
job_node_extraction = r"""
# Job nodes extraction
nodeslurm=$SLURM_JOB_NODELIST
echo "nodeslurm = ${nodeslurm}"
# Get rid of the hostname and surrounding brackets:
tmp=${nodeslurm#"*["}
nodes=${tmp%]*}
# Turn it into an array separated by newlines:
myarray=(`echo ${nodes} | sed 's/,/\n/g'`)
#
idx=0
for element in "${myarray[@]}"; do
if [[ "$element" == *"-"* ]]; then
array=(`echo $element | sed 's/-/\n/g'`)
for node in $(seq ${array[0]} ${array[1]}); do
nodelist[$idx]=${node}
idx=${idx}+1
done
else
nodelist[$idx]=${element}
idx=${idx}+1
fi
done

for element in "${nodelist[@]}"; do
echo "${element}"
done
"""

def assign_nodes(run_type, need_length=False, start_node=0, num_nodes_first_model=0):
template = f"""
# Assign nodes for {run_type}
{run_type}=""
%%NEED_LENGTH%%
for idx in $srbseq {start_node} $srbsrb???-1erberberb; do
if ssbssb $idx == $srbsrb???-1erberb esbesb; then
{run_type}="$scb{run_type}ecb$scbnodelist[$idx]ecb"
else
{run_type}="$scb{run_type}ecb$scbnodelistssb$idxesbecb,"
fi
done
echo "{run_type} nodes: $scb{run_type}ecb"
"""
# Since Python f-strings and other braces don't play nicely together,
# we replace some stuff:
#
# For the confused:
# scb = start curly brace {
# ecb = end curly brace }
# ssb = start square brace [
# esb = end square brace ]
# srb = start round brace (
# erb = end round brace )
template = template.replace("scb", "{")
template = template.replace("ecb", "}")
template = template.replace("ssb", "[")
template = template.replace("esb", "]")
template = template.replace("srb", "(")
template = template.replace("erb", ")")
# Get rid of the starting spaces (they come from Python as the string
# is defined inside of this function which is indented (facepalm))
template = textwrap.dedent(template)
# TODO: Some replacements
if need_length:
length_stuff = r"length=${#nodelist[@]}"
template = template.replace("%%NEED_LENGTH%%", length_stuff)
template = template.replace("???", "length")
else:
template = template.replace("%%NEED_LENGTH%%", "")
template = template.replace("???", str(num_nodes_first_model))
return template
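
A miniature demonstration (illustrative only) of the placeholder trick used in assign_nodes: the f-string substitutes {run_type}, and the subsequent replace() calls restore the literal braces and brackets that the generated bash needs.

run_type = "ocean"
template = f'{run_type}="$scb{run_type}ecb$scbnodelistssb$idxesbecb,"'
for placeholder, char in [("scb", "{"), ("ecb", "}"), ("ssb", "["), ("esb", "]")]:
    template = template.replace(placeholder, char)
print(template)   # ocean="${ocean}${nodelist[$idx]},"
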


commands.append(textwrap.dedent(job_node_extraction))
for idx, run_type in enumerate(config['general']['multi_srun']):
if idx == 0:
start_node = run_type
num_nodes_first_model = config['general']['multi_srun'][run_type]['total_tasks'] / config['computer']['cores_per_node']
num_nodes_first_model = int(num_nodes_first_model)
nodes = assign_nodes(run_type, need_length=False, num_nodes_first_model=num_nodes_first_model)
else:
nodes = assign_nodes(run_type, need_length=True, start_node=start_node)
commands.append(nodes)
for run_type in config['general']['multi_srun']:
new_exec_command = default_exec_command.replace("hostfile_srun", config['general']['multi_srun'][run_type]['hostfile'])
new_exec_command += f" --nodelist ${run_type}"
commands.append("time " + new_exec_command + " &")
return commands
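
For reference, a sketch (command shape and run types are hypothetical) of what the final loop produces: each run type gets its own srun line with a dedicated hostfile and a --nodelist pointing at the shell variable filled in by assign_nodes.

default_exec_command = "srun -l --hostfile hostfile_srun ./execution_wrapper"   # assumed shape
multi_srun = {
    "awicm": {"hostfile": "hostfile_awicm"},
    "pism": {"hostfile": "hostfile_pism"},
}
for run_type, info in multi_srun.items():
    cmd = default_exec_command.replace("hostfile_srun", info["hostfile"])
    cmd += f" --nodelist ${run_type}"
    print("time " + cmd + " &")
# time srun -l --hostfile hostfile_awicm ./execution_wrapper --nodelist $awicm &
# time srun -l --hostfile hostfile_pism ./execution_wrapper --nodelist $pism &
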
17 changes: 13 additions & 4 deletions esm_runscripts/cli.py
@@ -62,6 +62,14 @@ def parse_shargs():
action="store_true",
)

parser.add_argument(
"--modify-config",
"-m",
dest="modify",
help="[m]odify configuration",
default="", # kh 15.07.20 "usermods.yaml"
)
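
In isolation, the new flag behaves like this (minimal sketch; the real parser defines many more options):

import argparse
p = argparse.ArgumentParser()
p.add_argument("--modify-config", "-m", dest="modify", help="[m]odify configuration", default="")
args = p.parse_args(["-m", "usermods.yaml"])
print(args.modify)   # usermods.yaml
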

parser.add_argument(
"-j",
"--last_jobtype",
@@ -122,6 +130,7 @@ def main():
verbose = False
inspect = None
use_venv = None
modify_config_file = None

parsed_args = vars(ARGS)

@@ -153,10 +162,8 @@ def main():
use_venv = parsed_args["contained_run"]
if parsed_args["open_run"] is not None:
use_venv = not parsed_args["open_run"]




if "modify" in parsed_args:
modify_config_file = parsed_args["modify"]

command_line_config = {}
command_line_config["check"] = check
Expand All @@ -170,6 +177,8 @@ def main():
command_line_config["verbose"] = verbose
command_line_config["inspect"] = inspect
command_line_config["use_venv"] = use_venv
if modify_config_file:
command_line_config["modify_config_file"] = modify_config_file

command_line_config["original_command"] = original_command.strip()
command_line_config["started_from"] = os.getcwd()