Private MC Production Guide
This guide explains how to use the resources provided in the GeneratorTools repository to produce private Monte Carlo samples: how to create a gridpack using Madgraph and how to generate LHE, GENSIM and MiniAOD level files. Be aware that if you have trouble with any part of this guide you may find help at hypernews.cern.ch. You will need:
- Grid Certificate
- CERN computing account
- Access to LXPLUS
- Clone this repository by typing `git clone https://github.com/CMSAachen3B/GeneratorTools.git` in the directory you would like to clone it into. When I tested these scripts I had the repository inside CMSSW_7_1_20_patch2/src/, so it may be advisable to do the same if anything does not work. Note: it may be more correct to clone either into CMSSW_7_4_7 or outside any CMSSW directory, as cloning here caused problems when generating LHE files (which also use CMSSW_7_1_20_patch2).
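  For example, to place the repository inside an existing CMSSW area (a sketch; pick the release per the note above):

  ```bash
  cd CMSSW_7_1_20_patch2/src/
  git clone https://github.com/CMSAachen3B/GeneratorTools.git
  ```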
- Run `checkout_madgraph.sh`. This should install the latest version of Madgraph. Note: if this does not work, a newer version may have been released; you can probably fix this by changing the version written in the script.
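  A minimal sketch of this step, assuming the script sits in the current directory; the exact name of the hard-coded version string inside the script is an assumption, so `grep` for it:

  ```bash
  ./checkout_madgraph.sh
  # If the download fails, find and bump the hard-coded Madgraph version:
  grep -n "MG5" checkout_madgraph.sh
  ```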
- The following resources can be used to help get to grips with Madgraph:
  - https://cp3.irmp.ucl.ac.be/projects/madgraph/wiki/ManualAndHelp
  - https://twiki.cern.ch/twiki/bin/view/CMSPublic/MadgraphTutorial
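  A minimal interactive Madgraph session looks like this (the process and output name are illustrative only, not part of this workflow):

  ```bash
  cd MG5_aMC_v2_5_3
  ./bin/mg5_aMC
  # then, at the MG5 prompt:
  #   generate p p > t t~
  #   output pp_ttbar
  #   launch
  ```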
- If you plan to use a model that is not already included in Madgraph or available online, you may need to create your own by editing the files inside the directory `MG5_aMC_v2_5_3/models/usrmod_v4`. The `README.txt` there is fairly self-explanatory. You can test that the model works by using it in event production, but you should not use the model when generating the events that you intend to make a gridpack for.
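  A sketch of the layout, assuming you copy the template to a new name (`MyModel` is a placeholder):

  ```bash
  cd MG5_aMC_v2_5_3/models
  cp -r usrmod_v4 MyModel   # placeholder name; then edit per README.txt
  # Inside an MG5 session the model is then loaded with:
  #   import model MyModel
  ```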
- Once you are comfortable with Madgraph, produce a gridpack using the guide at https://twiki.cern.ch/twiki/bin/viewauth/CMS/QuickGuideMadGraph5aMCatNLO and the repository https://github.com/mharrend/privateMCproduction. If you would like to make a gridpack for an event you have already generated, you can find the cards you need in the event folder. Note: you may need to change a card from an LO card into an NLO card. As you will work on lxplus for this step, it is useful to know that you can copy files between lxplus and your machine using `scp`; for example, `scp -r <username>@lxplus.cern.ch:/<file path>/<file> <local directory>` will copy a file from the lxplus machine to your local machine.
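  As a rough sketch of the gridpack workflow described in the TWiki above (the process and card names are placeholders):

  ```bash
  git clone https://github.com/cms-sw/genproductions.git
  cd genproductions/bin/MadGraph5_aMCatNLO
  # <process_name> must match the prefix of <process_name>_proc_card.dat
  ./gridpack_generation.sh <process_name> cards/<path_to_card_folder>
  ```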
- If you want to use your own model, unzip your gridpack using `tar xfJ <gridpackname>.tar.xz -C <gridpackname>`, place your model folder in `<gridpack folder>/mgbasedir/models`, edit `process/madevent/Cards/proc_card_mg5.dat` to include the line `import model <model name>` and to add any processes that may not have been allowed without the model, and then zip it back up using `tar -cJpsf <gridpackname>.tar.xz mgbasedir process runcmsgrid.sh gridpack_generation.log`.
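  The full sequence, with `<gridpackname>` and `<model name>` as placeholders (the `mkdir` is added here because `tar -C` expects an existing directory):

  ```bash
  mkdir -p <gridpackname>
  tar xfJ <gridpackname>.tar.xz -C <gridpackname>
  cp -r <model name> <gridpackname>/mgbasedir/models/
  # edit <gridpackname>/process/madevent/Cards/proc_card_mg5.dat and add:
  #   import model <model name>
  cd <gridpackname>
  tar -cJpsf ../<gridpackname>.tar.xz mgbasedir process runcmsgrid.sh gridpack_generation.log
  ```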
- Place your gridpack into the data folder `CMSAachen3B/GeneratorTools/data/` and edit the appropriate parts of `lfv_LHElevelProduction.sh` to have it use your gridpack.
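  For example (the `grep` only locates the relevant setting, whose exact name may differ):

  ```bash
  cp <gridpackname>.tar.xz GeneratorTools/data/
  grep -n -i "gridpack" GeneratorTools/lfv_LHElevelProduction.sh
  ```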
- You may either edit the lfv private MC production files directly or copy them and adapt the scripts to your own analysis; for the sake of this guide I will assume you are editing the lfv files. Edit `lfv_LHElevelProduction.sh` to change the number of events, the work directory, the gridpack location and whether or not to use CRAB. These parameters are all located at the very top of the script and should be easy to find. The number of events is self-explanatory; the gridpack location is the relative path of the gridpack; the work directory is the relative path in which you want to install the various CMSSW versions required. It is recommended to run the script once with USECRAB set to false and only then set USECRAB to true, to make sure the events are produced correctly locally before submitting to the grid and potentially publishing bad data. Note: all directories specified in the script should be relative paths.
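  A hypothetical sketch of the parameter block at the top of the script (only `USECRAB` is named in this guide; the other variable names are illustrative):

  ```bash
  NEVENTS=1000                          # number of events to generate
  WORKDIR=work                          # relative dir for the CMSSW areas
  GRIDPACK=data/<gridpackname>.tar.xz   # relative path to the gridpack
  USECRAB=false                         # test locally first, then set to true
  ```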
- If you are submitting to CRAB you should change `config.Data.outputPrimaryDataset` in `python/lfv/lfv_LHEcrabConfig.py`. Primary dataset names should be fairly descriptive; for example, I use LFV_ZToL1L2_13TeV_madgraph_pythia8. As a rule you can follow this convention, but you can also look at other datasets on DAS to get an idea. It is important to change both `config.Data.outputDatasetTag` and `config.Data.outputPrimaryDataset`, because if these are the same as a dataset already on DAS the old dataset will be overwritten. If you intend to overwrite old files which were incorrectly produced you can keep these values the same, but if the name of the output files differs (e.g. you change MiniAOD.root to miniaod.root) or if the dataset you intend to overwrite contains more output files, you should invalidate the remaining old files using this guide.
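  To check whether a name is already taken before submitting, you can query DAS from the command line (a sketch using the example name above; `dasgoclient` ships with CMSSW environments):

  ```bash
  # User-published datasets live in the prod/phys03 DBS instance.
  dasgoclient --query="dataset=/LFV_ZToL1L2_13TeV_madgraph_pythia8/*/USER instance=prod/phys03"
  ```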
- Be sure that your grid certificate is initialised (check with `voms-proxy-info`; if it is not, use `voms-proxy-init` or `myvomsproxyinit`) and then type `./lfv_LHElevelProduction.sh` to produce LHE level events.
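  For example (the validity period is a common choice, not a requirement):

  ```bash
  voms-proxy-info                            # check for a valid proxy
  voms-proxy-init --voms cms --valid 192:00  # initialise one for the CMS VO
  ./lfv_LHElevelProduction.sh
  ```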
- The CRAB jobs will tend to fail for various reasons; one is that a task sometimes requires more memory than expected. After the jobs are submitted and running you can use this command to resubmit the ones which fail: `while true; do crab resubmit <directory of crab.log> --maxmemory 6000; sleep 300; done`. The crab.log file can usually be found in the CMSSW base used for the production step, under `src/crab_projects/crab_privateMCProduction...`. This command resubmits failed jobs with more memory available, waits 5 minutes and then tries again. Depending on how many events you are generating it may take some time for these jobs to finish; it is recommended that you run them overnight or over the weekend.
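  The same loop, expanded for readability (stop it with Ctrl-C once `crab status` reports the task as finished):

  ```bash
  while true; do
    crab resubmit <directory of crab.log> --maxmemory 6000  # retry failed jobs with more memory
    sleep 300                                               # wait 5 minutes between passes
  done
  ```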
- After the LHE events have been generated, paste the DAS URL or directory of the LHE files into `lfv_GENSIMlevelProduction.sh` and also edit the crab config file as appropriate. You can find the DAS URL by searching for your dataset on DAS or by using `crab status <directory of crab.log>`. Run this script in the same way as the LHE level production script and once again run the resubmission loop.
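  For example:

  ```bash
  crab status <directory of crab.log>  # shows the published output dataset name
  ./lfv_GENSIMlevelProduction.sh
  # ...then restart the resubmission loop from the previous step.
  ```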
- Repeat the previous step for the `lfv_MiniAODlevelProduction.sh` script.
- After this step the MiniAOD files should be finished and you can begin skimming and Artus analysis.
These scripts, when given the correct inputs, should submit CRAB jobs that produce LHE, GENSIM or MiniAOD root files. They edit the various other files to contain the information required to submit the CRAB jobs and to perform the necessary tasks on the node, and they also set up the required CMSSW versions.

The crab config files are normally used with the command `crab submit <crabconfig.py>`, which in our case is included in the run files. This file tells CRAB exactly what to do and how to submit the jobs to the grid.

The node script contains the tasks to be carried out by the node computers. In most cases it only tells the node to use the PSet file, but lines can be added here for debugging purposes.

The PSet files tell the `cmsRun` command what to do on the node machines. They give the input and output file names and tell the node computers how exactly to manipulate the input and output root files.
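In effect, each grid node ends up running something like this (simplified; CRAB names the generated configuration `PSet.py` on the worker node):

```bash
# The node script wraps roughly this call:
cmsRun PSet.py
```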