Lightweight setup to submit CMSSW jobs with Crab for large-scale production
In the following, you will find a general description of the package. For explicit production examples, please have a look at the `production_examples` folder.
A valid CMSSW checkout and environment can be obtained via:
```bash
source /cvmfs/cms.cern.ch/cmsset_default.sh;
cmsrel CMSSW_${RELEASE_NUMBER}
cd CMSSW_${RELEASE_NUMBER}/src
cmsenv
```

For ${RELEASE_NUMBER}, the latest release as of now would be 14_2_0_pre2 (has a memory leak...), while 13_0_21 is a stable choice.
Feel free to choose the CMSSW version which suits you best.
Furthermore, you need to source the crab3 software and create a VOMS proxy certificate via:
```bash
source /cvmfs/cms.cern.ch/common/crab-setup.sh
export CRAB3_CACHE_FILE=/dev/null
voms-proxy-init --valid 192:00:00 --voms cms:/cms/dcms --rfc
```

Check out this setup:

```bash
git clone git@github.com:KIT-CMS/CMSSWWithCrab.git
```

In the following, the example of nanoAOD production is considered.
The script `create_configs.py` creates cmsRun configs via `cmsDriver.py`, which are processed further
to create crab3 configs for each dataset to be processed. The script needs the following information in `.yaml` format:
- Conditions for processing CMS data. Example: `configuration/conditions.yaml`
- CMSSW specific settings to be passed to `cmsDriver` commands. Example: `configuration/cmsdriver_nanoaod_specifics.yaml`
- Datasets to process. Example: `configuration/datasets_miniaod_boostedhtt_2018UL.yaml`
In addition, general crab3 configuration is provided via `crab_configuration/crab_template.py`.
Please get familiar with these files and their structure to be able to construct corresponding files for your use-case.
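To get a first impression of how these inputs are organized, a quick way is to load the example files and look at their top-level structure. The snippet below is only a small helper sketch (not part of the repository) and assumes PyYAML is available in your Python environment:

```python
# Small helper sketch (not part of the repository): print the top-level structure of the
# example configuration files to get familiar with the inputs expected by create_configs.py.
import yaml  # assumes PyYAML is available, e.g. from the CMSSW python environment

EXAMPLE_FILES = [
    "configuration/conditions.yaml",
    "configuration/cmsdriver_nanoaod_specifics.yaml",
    "configuration/datasets_miniaod_boostedhtt_2018UL.yaml",
]

for path in EXAMPLE_FILES:
    with open(path) as f:
        content = yaml.safe_load(f)
    # dictionaries are the usual case; fall back to the plain type name otherwise
    keys = list(content.keys()) if isinstance(content, dict) else type(content).__name__
    print(f"{path}: {keys}")
```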
Often it might make sense to run `cmsRun` multi-threaded (and even with a different number of streams). To set this up properly, please use the `--nThreads` and `--nStreams`
options of `create_configs.py`, which are passed on to `cmsRun`. If doing so for grid jobs, you need to take into account
how many threads/streams are usually used within the CMS grid environment. Suggestion: please use 8 cores, since these fit better into CMSSW job containers,
which are designed for jobs with up to 8 threads.
Furthermore, depending on these settings, the memory consumption of the job may change significantly.
Feel free to set these numbers differently, e.g. `--nThreads 4`, but `--nStreams 2`, or even `--nStreams 1`.
You are strongly advised to test the `cmsRun` config locally on a representative input file, check both the runtime and the memory consumption, and project
the outcome onto proper requirements for the crab3 config.
It might well be that there are differences in performance and resource consumption between processing recorded data samples and simulated samples. Consequently, please test these two main use-cases separately from each other.
One example of an issue would be a memory leak appearing only for simulated datasets. In such cases, please check whether a more suitable/stable CMSSW release can be used without any memory leak.
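As an illustration of such a local test, the following sketch wraps a `cmsRun` call and reports wall time and peak memory of the child process. The config name `nanoaod_test_cfg.py` is a placeholder for whatever `create_configs.py` produced for your use-case; the snippet is not part of the repository:

```python
# Sketch for a local test (not part of the repository): run a cmsRun config on a
# representative input file and report wall time and peak memory of the child process.
import resource
import subprocess
import time

CONFIG = "nanoaod_test_cfg.py"  # placeholder: use the cmsRun config created for your use-case

start = time.time()
subprocess.run(["cmsRun", CONFIG], check=True)
wall_minutes = (time.time() - start) / 60.0

# on Linux, ru_maxrss of the children is reported in kB
peak_mb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss / 1024.0
print(f"wall time: {wall_minutes:.1f} min, peak memory: {peak_mb:.0f} MB")
```

Scaling these numbers to the full input and to the chosen number of threads/streams gives a reasonable starting point for the memory and runtime requests in the crab3 config.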
After creating the cmsRun configs, you should further consider the options available for
the crab3 configuration.
For `create_configs.py`, the following options were found to be important and are exposed explicitly by the script to be passed on to the configuration files:
- `--numCores` regulates the number of requested CPUs. Please choose it consistently with the `cmsRun` settings for `--nThreads` and `--nStreams` to avoid overuse of requested resources.
- `--maxMemoryMBperCore` is the amount of memory per CPU you would like to request for your job. This number is then multiplied by what was configured via `--numCores`, and is passed to `maxMemoryMB` of crab.
- `--splitting` and `--unitsPerJob` define how your input data is distributed among the jobs. Usually, `Automatic` splitting is a good choice, and `--unitsPerJob` represents in that case the desired job runtime; try to target something below 24 hours, e.g. 1250 minutes. Please get familiar with this setting. However, since testing a `cmsRun` config locally works easiest with a specific number of files or events, you might want to consider `FileBased` or `EventAwareLumiBased` splitting to have more control over the runtime and memory consumption of your jobs. This might prevent too frequent resubmissions of jobs.
- `maxJobRuntimeMin` defines the maximum expected runtime of your jobs and is passed to the crab configuration. Jobs with longer runtimes will potentially be aborted and fail, so try to estimate this number properly.
- `--publication` is a flag which allows you to publish your output data in CMS DBS under the `phys03` instance. This makes it much simpler to collect lists of output files. Feel free to use that for your actual production campaigns. In case you choose to publish your dataset, it obtains a custom `outputDatasetTag` used in the dataset name, constructed from `requestName` and the integer timestamp at the time you are running `./create_configs.py`. This ensures that datasets have (more or less) unique names. In that context, please avoid submitting a crab3 task, then deleting the task directory after some time, and then resubmitting the same task with the same configuration again. That might lead to duplication of the content of published datasets. At each task submission attempt, please ensure that you use a new configuration (e.g. by re-running `./create_configs.py`).
- `--siteWhitelist` and `--siteBlacklist` can be used to have (some) control over which sites are used to run your jobs. Under ideal circumstances, it would be good to use all available sites where the input files are placed, since restricting to only a few sites via a whitelist might lead to bad task throughput, such that you would need to wait quite long for a production campaign to finish. On the other hand, there might be bad sites with problems, or even with corrupt files which look OK from the outside (yes, that might happen, too...). In such cases, a blacklist of sites can help.
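To illustrate where these options end up, the following sketch shows how they typically map onto standard CRAB3 configuration parameters. It is not the repository's `crab_configuration/crab_template.py`; request name, pset name, dataset, output tag, and storage site are placeholders, and the numbers mirror the example call further below:

```python
# Illustration only (not the repository's crab_template.py): how the options above typically
# map onto standard CRAB3 configuration parameters.
from CRABClient.UserUtilities import config

config = config()

config.General.requestName = "data_2018UL_singlemuon_SingleMuon_Run2018A"  # placeholder
config.General.transferOutputs = True

config.JobType.pluginName = "Analysis"
config.JobType.psetName = "nanoaod_cfg.py"   # cmsRun config created via cmsDriver.py (placeholder)
config.JobType.numCores = 4                  # --numCores, consistent with --nThreads/--nStreams
config.JobType.maxMemoryMB = 4 * 1250        # --numCores * --maxMemoryMBperCore = 5000 MB in total
config.JobType.maxJobRuntimeMin = 1300       # maximum expected job runtime in minutes

config.Data.inputDataset = "/SingleMuon/Run2018A-SomeCampaign-v1/MINIAOD"  # placeholder
config.Data.splitting = "FileBased"          # --splitting; with "Automatic", unitsPerJob is the
config.Data.unitsPerJob = 10                 # target runtime in minutes instead of files per job
config.Data.publication = True               # --publication: publish in DBS under phys03
config.Data.outputDatasetTag = "data_2018UL_singlemuon_SingleMuon_Run2018A_1700000000"  # requestName + timestamp (illustrative)

config.Site.storageSite = "T2_DE_DESY"       # placeholder storage site
# config.Site.whitelist = ["T2_DE_DESY", "T2_DE_RWTH"]   # --siteWhitelist
# config.Site.blacklist = ["T2_XY_Problematic"]          # --siteBlacklist
```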
Furthermore, you have to consider a few server-side restrictions/constraints for jobs running on the grid:
- Usually, jobs are allowed to use up to 2.5 GB of memory per core. Submission of jobs with higher requirements than that might be refused.
- Increasing the runtime significantly beyond 24h might lead to a lack of available and matching resources. So try to keep the runtimes below 24h, e.g. at 12h or 15h.
- Per crab3 task, you are only allowed to have a maximum of 10k jobs. So try to keep the work per job at a large enough amount, e.g. the number of events to process should be kept high enough.
These restrictions have to be taken into account especially when you encounter memory leaks. These usually lead to memory overuse and, consequently, to failing jobs, or to a trend towards too little work per job. The latter is detrimental to the processing of large datasets. A way out might be to use older CMSSW releases, which have proven to be fine for official large-scale productions.
An example call of `./create_configs.py` could look as follows:

```bash
./create_configs.py --work-directory ~/crab_work_dir \
--datasets configuration/datasets_miniaod_boostedhtt_2018UL.yaml \
--conditions configuration/conditions.yaml \
--cmsdriver configuration/cmsdriver_nanoaod_specifics.yaml \
--numCores 4 --nThreads 4 --nStreams 2 --maxMemoryMBperCore 1250 --publication
```

After all required crab3 configuration files have been created as expected, you can proceed with managing their submission.
For that purpose, the script `crab_manager.py` was designed. It processes a list of crab3 configs, with a number of workers running asynchronously in parallel, specified by `--nworkers`. The parallelization is based on a queue concept. The workflow is as follows:
- After getting a crab3 config from the submission queue, and in case the task directory assigned to this crab3 config does not exist, the submission of this task is executed. After that, the task directory is available and properly set up. To make sure that there are no problems related to parallel processing, the submission is done sequentially for the configs from the submission queue. After the submission has been performed, the configuration is passed on to the status queue to be processed with the remaining workflow, using multiple workers.
- For a properly created task directory, the status is queried regularly, checking the number of all, intermediate (idle and running), finished, failed, and published jobs.
- In case failed jobs are encountered and there aren't any intermediate jobs left, a resubmission attempt can be initiated, provided that at least one of the options `--maxmemory` and `--maxjobruntime` was specified when starting `./crab_manager.py`. Please be careful when resubmitting to avoid too many attempts, by paying attention to specifying good values for these two options. Furthermore, `--siteblacklist` and `--sitewhitelist` can be used to have better control over which sites the jobs are resubmitted to.
- If everything was processed successfully, the numbers of all, finished, and published jobs are expected to be equal. In that case, the name of the published dataset is printed, and the crab3 task is considered done, provided that the check of the sample statistics is successful. This test involves a comparison of the number of events between the input and the output datasets with `dasgoclient`, and a check of the actual number of events in the output files with `ROOT`. Please note that the last check can yield wrong numbers if the connection to the files is interrupted for some reason.
- If the queue is not empty, the next crab3 config is taken by the worker to be processed.
Information on the progress, including statistics of finished tasks, is included in the logs `logs/worker_*.txt`.
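For illustration of the queue concept described above, here is a minimal, self-contained sketch. Function names, the worker count, and the file pattern are placeholders and not taken from `crab_manager.py`; it only shows the idea of sequential submission feeding a status queue that is processed by several worker threads:

```python
# Conceptual sketch of the queue-based workflow (placeholders only, not crab_manager.py itself).
import glob
import queue
import threading

def submit(cfg):
    # placeholder for "crab submit -c <cfg>", executed sequentially
    print(f"submitting {cfg}")

def track_until_done(cfg):
    # placeholder for the repeated "crab status" checks, resubmissions and final statistics check
    print(f"tracking {cfg}")

status_queue = queue.Queue()

def worker():
    while True:
        cfg = status_queue.get()
        if cfg is None:            # sentinel: no more configs to track
            status_queue.task_done()
            break
        track_until_done(cfg)
        status_queue.task_done()

n_workers = 4                      # corresponds to --nworkers
threads = [threading.Thread(target=worker) for _ in range(n_workers)]
for t in threads:
    t.start()

# sequential submission, then hand-over to the asynchronous status workers
for cfg in sorted(glob.glob("crabconfigs/*.py")):   # placeholder pattern
    submit(cfg)
    status_queue.put(cfg)

for _ in threads:
    status_queue.put(None)         # stop signal for each worker
for t in threads:
    t.join()
```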
An example call of `./crab_manager.py` could look as follows:

```bash
./crab_manager.py --crab-config-pattern ~/crab_work_dir/crabconfigs/data_2018UL_singlemuon_SingleMuon_Run2018*.py \
--maxjobruntime 1300 \
--maxmemory 8000
```

- A very good monitoring page for crab tasks: https://monit-grafana.cern.ch/d/15468761344/personal-tasks-monitoring-globalview?orgId=11&from=now-1h&to=now. Try to adjust the settings according to your CERN account name and the time period of interest.