Description
I'd like to put the following up for discussion (it has bugged me ever since I got to know the tools):
The file and directory management should be simplified. At the moment tons of files are copied back and forth, which makes it very tricky to track down errors for those who didn't code the core parts (compute.py, jobclass.py, ...). In addition, copying large (restart, forcing) files several times may significantly slow down job throughput. Having worked with mkexp, the MPI-ESM runtime manager (Python with Jinja2-style .config files), I find its file and directory management simpler and more efficient (while other things are horrible in mkexp). So here comes my suggestion:
1. Upon start, `esm_runscripts` creates the directory structure `expid/restart/`, `expid/outdata/`, `expid/forcing` ... like it is done at the moment.
2. Copy/link the required forcing files for the current run into `expid/forcing`. On cold start, optionally create a copy of `esm_tools` there as well.
3. Create a work folder `expid/work/run_XXXX-YYYY/`.
4. Copy/link all files (forcing, restart, namelists) required for the current run into `expid/work/run_XXXX-YYYY/`.
5. cd `expid/work/run_XXXX-YYYY/`, `sbatch ....`.
6. Once done, copy only the restart files into `expid/restart/`.
7. Trigger a subjob (like the post jobs at the moment) that does the cleanup of `expid/work/run_XXXX-YYYY/` (i.e. copying outdata, logs etc. in place), following the bullet-proof method used in mkexp (details later).
8. Increment the date, go to 2., and continue until the run is done.
And last: have all logs (model logs, esm_runscript logs, filelist, *finished.yaml, ...) in one place.
I know this goes against the current philosophy that everything related to the current run shall be in expid/run_XXXX-YYYY/, but it would certainly simplify the complete config dict and hence make error tracking easier.
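
To make the discussion a bit more concrete, here is a rough Python sketch of steps 1 to 4 and 6 of the proposed layout. It is only an illustration under assumptions: the function names (`create_experiment_tree`, `stage`, `make_work_dir`, `save_restarts`) are made up and do not exist in esm_tools, the `log` directory and the `restart*` file pattern are my guesses, and job submission (step 5) and the cleanup subjob (step 7) are left out.

```python
import shutil
from pathlib import Path

# All names below are hypothetical; none of them exist in esm_tools.
# "log" is an assumed name for the single place that collects all logs.
TOP_LEVEL_DIRS = ("restart", "outdata", "forcing", "work", "log")


def create_experiment_tree(expid_dir):
    """Step 1: create the persistent experiment directories (idempotent)."""
    expid_dir = Path(expid_dir)
    for name in TOP_LEVEL_DIRS:
        (expid_dir / name).mkdir(parents=True, exist_ok=True)
    return expid_dir


def stage(sources, dest_dir, link=True):
    """Steps 2 and 4: symlink (default) or copy each source file into dest_dir.

    Linking avoids moving large forcing/restart files more than once.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in map(Path, sources):
        target = dest_dir / src.name
        if target.is_symlink() or target.exists():
            target.unlink()  # replace a stale entry from an earlier attempt
        if link:
            target.symlink_to(src.resolve())
        else:
            shutil.copy2(src, target)  # follows symlinks, copies the content
        staged.append(target)
    return staged


def make_work_dir(expid_dir, start, end):
    """Step 3: create expid/work/run_<start>-<end>/ for the current run."""
    work = Path(expid_dir) / "work" / f"run_{start}-{end}"
    work.mkdir(parents=True, exist_ok=True)
    return work


def save_restarts(work_dir, expid_dir, pattern="restart*"):
    """Step 6: copy only the restart files from the work dir to expid/restart/.

    The glob pattern is an assumption; real models name restarts differently.
    """
    restart_dir = Path(expid_dir) / "restart"
    restart_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for src in sorted(Path(work_dir).glob(pattern)):
        if src.is_file():
            saved.append(Path(shutil.copy2(src, restart_dir / src.name)))
    return saved
```

With this layout the only per-run state lives in `expid/work/run_XXXX-YYYY/`, so the config dict would mainly need the experiment root and the current run label rather than a separate path per file type and run.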