Private MC Production Guide
This guide explains how to use the resources provided in the GeneratorTools repository to produce private Monte Carlo samples: how to create a gridpack using Madgraph and how to generate LHE, GENSIM and MiniAOD level files. Be aware that if you have trouble with any part of this guide you may find help at hypernews.cern.ch. You will need:
- Grid Certificate
- CERN computing account
- Access to LXPLUS
- Clone this repository by typing `git clone https://github.com/CMSAachen3B/GeneratorTools.git` in the directory you would like to clone it into. When I tested these scripts I had the repository inside CMSSW_7_1_20_patch2/src/, so it may be advisable to do the same if anything does not work. Note: it may be more correct to clone either into CMSSW_7_4_7 or outside any CMSSW directory, as cloning here caused problems when generating LHE files (which also use CMSSW_7_1_20_patch2).
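  For example, to place the repository inside an existing CMSSW area (a sketch; pick the release per the note above):

  ```bash
  cd CMSSW_7_1_20_patch2/src/
  git clone https://github.com/CMSAachen3B/GeneratorTools.git
  ```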
- Run `checkout_madgraph.sh`. This should install the latest version of Madgraph. Note: if this does not work, a newer version may have been released; you can probably fix this by changing the version written in the script.
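  A minimal sketch of this step, assuming the script sits in the current directory; the exact name of the hard-coded version string inside the script is an assumption, so `grep` for it:

  ```bash
  ./checkout_madgraph.sh
  # If the download fails, find and bump the hard-coded Madgraph version:
  grep -n "MG5" checkout_madgraph.sh
  ```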
- The following resources can be used to help get to grips with Madgraph:
  - https://cp3.irmp.ucl.ac.be/projects/madgraph/wiki/ManualAndHelp
  - https://twiki.cern.ch/twiki/bin/view/CMSPublic/MadgraphTutorial
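  A minimal interactive Madgraph session looks like this (the process and output name are illustrative only, not part of this workflow):

  ```bash
  cd MG5_aMC_v2_5_3
  ./bin/mg5_aMC
  # then, at the MG5 prompt:
  #   generate p p > t t~
  #   output pp_ttbar
  #   launch
  ```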
- If you plan to use a model that is not already included in Madgraph or available online, you may need to create your own by editing the files inside the directory `MG5_aMC_v2_5_3/models/usrmod_v4`. The `README.txt` there is fairly self-explanatory. You can test that the model works by using it in event production, but you should not use the model when generating the events that you intend to make a gridpack for.
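  A sketch of the layout, assuming you copy the template to a new name (`MyModel` is a placeholder):

  ```bash
  cd MG5_aMC_v2_5_3/models
  cp -r usrmod_v4 MyModel   # placeholder name; then edit per README.txt
  # Inside an MG5 session the model is then loaded with:
  #   import model MyModel
  ```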
- Once you are comfortable with Madgraph, produce a gridpack using the guide at https://twiki.cern.ch/twiki/bin/viewauth/CMS/QuickGuideMadGraph5aMCatNLO and the repository https://github.com/mharrend/privateMCproduction. If you would like to make a gridpack for an event you have already generated, you can find the cards you need in the event folder. Note: you may need to change a card from an LO card into an NLO card. As you will work on lxplus for this step, it is useful to know that you can copy files between lxplus and your machine using `scp`; for example, `scp -r <username>@lxplus.cern.ch:/<file path>/<file> <local directory>` will copy a file from the lxplus machine to your local machine.
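  As a rough sketch of the gridpack workflow described in the TWiki above (the process and card names are placeholders):

  ```bash
  git clone https://github.com/cms-sw/genproductions.git
  cd genproductions/bin/MadGraph5_aMCatNLO
  # <process_name> must match the prefix of <process_name>_proc_card.dat
  ./gridpack_generation.sh <process_name> cards/<path_to_card_folder>
  ```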
- If you want to use your own model, unzip your gridpack using `tar xfJ <gridpackname>.tar.xz -C <gridpackname>`, place your model folder in `<gridpack folder>/mgbasedir/models`, edit `process/madevent/Cards/proc_card_mg5.dat` to include the line `import model <model name>` and to add any processes that may not have been allowed without the model, and then zip it back up using `tar -cJpsf <gridpackname>.tar.xz mgbasedir process runcmsgrid.sh gridpack_generation.log`.
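  The full sequence, with `<gridpackname>` and `<model name>` as placeholders (the `mkdir` is added here because `tar -C` expects an existing directory):

  ```bash
  mkdir -p <gridpackname>
  tar xfJ <gridpackname>.tar.xz -C <gridpackname>
  cp -r <model name> <gridpackname>/mgbasedir/models/
  # edit <gridpackname>/process/madevent/Cards/proc_card_mg5.dat and add:
  #   import model <model name>
  cd <gridpackname>
  tar -cJpsf ../<gridpackname>.tar.xz mgbasedir process runcmsgrid.sh gridpack_generation.log
  ```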
- Place your gridpack into the data folder `CMSAachen3B/GeneratorTools/data/` and edit the appropriate parts of `lfv_LHElevelProduction.sh` to have it use your gridpack.
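  For example (the `grep` only locates the relevant setting, whose exact name may differ):

  ```bash
  cp <gridpackname>.tar.xz GeneratorTools/data/
  grep -n -i "gridpack" GeneratorTools/lfv_LHElevelProduction.sh
  ```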
- You may either edit the lfv private MC production files directly or copy them and adapt the scripts to your own analysis; for the sake of this guide I will assume you are editing the lfv files. Edit `lfv_LHElevelProduction.sh` to change the number of events, the work directory, the gridpack location and whether or not to use CRAB. These parameters are all located at the very top of the script and should be easy to find. The number of events is self-explanatory; the gridpack location is the relative path of the gridpack; the work directory is the relative path in which you want to install the various CMSSW versions required. It is recommended to run the script once with USECRAB set to false and only then set USECRAB to true, to make sure the events are produced correctly locally before submitting to the grid and potentially publishing bad data. Note: all directories specified in the script should be relative paths.
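  A hypothetical sketch of the parameter block at the top of the script (only `USECRAB` is named in this guide; the other variable names are illustrative):

  ```bash
  NEVENTS=1000                          # number of events to generate
  WORKDIR=work                          # relative dir for the CMSSW areas
  GRIDPACK=data/<gridpackname>.tar.xz   # relative path to the gridpack
  USECRAB=false                         # test locally first, then set to true
  ```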
- If you are submitting to CRAB you should change `config.Data.outputPrimaryDataset` in `python/lfv/lfv_LHEcrabConfig.py`. Primary dataset names should be fairly descriptive; for example, I use LFV_ZToL1L2_13TeV_madgraph_pythia8. As a rule you can follow this convention, but you can also look at other datasets on DAS to get an idea. It is important to change both `config.Data.outputDatasetTag` and `config.Data.outputPrimaryDataset`, because if these are the same as a dataset already on DAS the old dataset will be overwritten. If you intend to overwrite old files which were incorrectly produced you can keep these values the same, but if the name of the output files differs (e.g. you change MiniAOD.root to miniaod.root) or if the dataset you intend to overwrite contains more output files, you should invalidate the remaining old files using this guide.
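  To check whether a name is already taken before submitting, you can query DAS from the command line (a sketch using the example name above; `dasgoclient` ships with CMSSW environments):

  ```bash
  # User-published datasets live in the prod/phys03 DBS instance.
  dasgoclient --query="dataset=/LFV_ZToL1L2_13TeV_madgraph_pythia8/*/USER instance=prod/phys03"
  ```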
- Be sure that your grid certificate is initialised (check with `voms-proxy-info`; if it is not, use `voms-proxy-init` or `myvomsproxyinit`) and then type `./lfv_LHElevelProduction.sh` to produce LHE level events.
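  For example (the validity period is a common choice, not a requirement):

  ```bash
  voms-proxy-info                            # check for a valid proxy
  voms-proxy-init --voms cms --valid 192:00  # initialise one for the CMS VO
  ./lfv_LHElevelProduction.sh
  ```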
- The CRAB jobs will tend to fail for various reasons; one is that a task sometimes requires more memory than expected. After the jobs are submitted and running you can use this command to resubmit the ones which fail: `while true; do crab resubmit <directory of crab.log> --maxmemory 6000; sleep 300; done`. The crab.log file can usually be found in the CMSSW base used for the production step, under `src/crab_projects/crab_privateMCProduction...`. This command resubmits failed jobs with more memory available, waits 5 minutes and then tries again. Depending on how many events you are generating it may take some time for these jobs to finish; it is recommended that you run them overnight or over the weekend.
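  The same loop, expanded for readability (stop it with Ctrl-C once `crab status` reports the task as finished):

  ```bash
  while true; do
    crab resubmit <directory of crab.log> --maxmemory 6000  # retry failed jobs with more memory
    sleep 300                                               # wait 5 minutes between passes
  done
  ```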
- After the LHE events have been generated, paste the DAS URL or directory of the LHE files into `lfv_GENSIMlevelProduction.sh` and also edit the crab config file as appropriate. You can find the DAS URL by searching for your dataset on DAS or by using `crab status <directory of crab.log>`. Run this script in the same way as the LHE level production script and once again run the resubmission loop.
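  For example:

  ```bash
  crab status <directory of crab.log>  # shows the published output dataset name
  ./lfv_GENSIMlevelProduction.sh
  # ...then restart the resubmission loop from the previous step.
  ```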
- Repeat the previous step for the `lfv_MiniAODlevelProduction.sh` script.
- After this step the MiniAOD files should be finished and you can begin skimming and Artus analysis.
These scripts, when given the correct inputs, should submit CRAB jobs that produce LHE, GENSIM or MiniAOD root files. They edit the various other files to contain the information required to submit the CRAB jobs and to perform the necessary tasks on the node, and they also set up the required CMSSW versions.

The crab config files are normally used with the command `crab submit <crabconfig.py>`, which in our case is included in the run files. This file tells CRAB exactly what to do and how to submit the jobs to the grid.

The node script contains the tasks to be carried out by the node computers. In most cases it only tells the node to use the PSet file, but lines can be added here for debugging purposes.

The PSet files tell the `cmsRun` command what to do on the node machines. They give the input and output file names and tell the node computers how exactly to manipulate the input and output root files.
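In effect, each grid node ends up running something like this (simplified; CRAB names the generated configuration `PSet.py` on the worker node):

```bash
# The node script wraps roughly this call:
cmsRun PSet.py
```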