-
Notifications
You must be signed in to change notification settings - Fork 4.6k
EXOnanoAODv1 producers and setup #48502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
cms-bot internal usage |
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48502/45439
|
|
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48502/45445
Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48502/45446
|
|
Pull request #48502 was updated. |
|
enable nano |
|
please test |
|
-1 Failed Tests: Build ClangBuild BuildI found compilation warning when building: See details on the summary page. Clang BuildI found compilation warning while trying to compile with clang. Command used: See details on the summary page. |
|
please test |
|
-1 Failed Tests: RelVals-NANO Failed RelVals-NANO
Comparison SummarySummary:
|
|
The tests failed because the displaced tau training data is not yet merged in cms-data. Should we merge that PR first or can we test this PR together with cms-data/RecoTauTag-TrainingFiles#15? |
|
please test with cms-data/RecoTauTag-TrainingFiles#15 |
|
@kerstinlovisa I've sent the test with cms-data/RecoTauTag-TrainingFiles#15, but note that the tests there were failing. So there may be something you need to act on. |
|
-1 Failed Tests: RelVals-INPUT The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here: Failed RelVals-INPUT
Comparison SummarySummary:
|
| if (!boost::filesystem::is_regular_file(graphPath_)) { | ||
| throw std::runtime_error("DisTauTag model not found: " + graphPath_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to use edm::FileInPath (already as the type of the graphPath configuration parameter), see
https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEdmFileInPath
https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideAboutPythonConfigFile
Then the file would be found also with patch releases or when running cmsRun in other directory than $CMSSW_BASE/src.
| } | ||
|
|
||
| void DisTauTag::initializeTensorFlow() const { | ||
| std::lock_guard<std::mutex> lock(session_mutex_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mutex must be avoided.
makortel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mutex must be avoided. These comments give a direction how that could be done.
| std::vector<float> v_score1(jets_size, -9); | ||
|
|
||
| if (!session_) { | ||
| initializeTensorFlow(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can initializeTensorFlow() be called from the constructor? Then the mutex would not be needed for that case.
| std::lock_guard<std::mutex> lock(session_mutex_); | ||
| tensorflow::run(session_, {{"input_1", input_1}, {"input_2", input_2}}, {"final_out"}, &outputs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tensorflow::run() is thread safe, so the lock is not needed.
| } | ||
|
|
||
| void DisTauTag::cleanupTensorFlow() { | ||
| std::lock_guard<std::mutex> lock(session_mutex_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lock is not needed here. That would be more clear if the content of this function would be moved to the destructor (there is no way a destructor could be called concurrently).
makortel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some further suggestions for some simplification opportunities that caught my eye (for your consideration).
| @@ -0,0 +1,191 @@ | |||
| namespace Scaling { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Namespaces should start with lower case letter (see 2.7 in https://cms-sw.github.io/cms_coding_rules.html)
Also the include guard is missing (4.1 in https://cms-sw.github.io/cms_coding_rules.html)
| }; | ||
| }; // namespace Scaling | ||
|
|
||
| namespace Setup { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setup is a very common name (that should start with lower case), so there is a risk for name collisions.
| edm::Handle<std::vector<reco::Track>> dsaMuonHandle; | ||
| iEvent.getByToken(dsaMuonTag_, dsaMuonHandle); | ||
| edm::Handle<std::vector<pat::Muon>> muonHandle; | ||
| iEvent.getByToken(muonTag_, muonHandle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like these could be
| edm::Handle<std::vector<reco::Track>> dsaMuonHandle; | |
| iEvent.getByToken(dsaMuonTag_, dsaMuonHandle); | |
| edm::Handle<std::vector<pat::Muon>> muonHandle; | |
| iEvent.getByToken(muonTag_, muonHandle); | |
| const std::vector<reco::Track>& dsaMuon = iEvent.get(dsaMuonTag_); | |
| const std::vector<pat::Muon>& muonHandle = iEvent.get(muonTag_); |
same for the other Handle+getByToken() pairs (while they work, this suggestion would be less code and tiny bit faster by avoiding the checks in Handle de-references).
| GlobalPoint beamSpot(bs.x(), bs.y(), bs.z()); | ||
| reco::Vertex beamSpotVertex(beamSpotHandle->position(), beamSpotHandle->covariance3D()); | ||
|
|
||
| edm::ESHandle<TransientTrackBuilder> builder = iSetup.getHandle(transientTrackBuilderToken_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be
| edm::ESHandle<TransientTrackBuilder> builder = iSetup.getHandle(transientTrackBuilderToken_); | |
| const TransientTrackBuilder& builder = iSetup.getData(transientTrackBuilderToken_); |
| } | ||
|
|
||
| void DisTauTag::produce(edm::StreamID, edm::Event& event, const edm::EventSetup& setup) const { | ||
| auto jets = getHandle(event, jets_token); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the Handle property is used, so this could be
| auto jets = getHandle(event, jets_token); | |
| auto jets = event.getHandle(jets_token); |
after which the local getHandle() function is not needed anymore.
| edm::EDGetTokenT<reco::VertexCollection> vtxTag_; | ||
| edm::EDGetTokenT<reco::VertexCollection> secVtxTag_; | ||
|
|
||
| const TransientTrackBuilder* theB; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
theB seems to be used only inside the produce() function, so this data member does not seem to be needed.
| ~DispJetTableProducer() override {} | ||
|
|
||
| void produce(edm::Event& iEvent, const edm::EventSetup& iSetup) override { | ||
| theB = &iSetup.getData(ttbToken_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e.
| theB = &iSetup.getData(ttbToken_); | |
| const TransientTrackBuilder* theB = &iSetup.getData(ttbToken_); |
or
| theB = &iSetup.getData(ttbToken_); | |
| const TransientTrackBuilder& theB = iSetup.getData(ttbToken_); |
| edm::Handle<std::vector<pat::Electron>> electronHandle; | ||
| iEvent.getByToken(electronTag_, electronHandle); | ||
|
|
||
| edm::Handle<std::vector<pat::Muon>> muonHandle; | ||
| iEvent.getByToken(muonTag_, muonHandle); | ||
|
|
||
| edm::Handle<reco::VertexCollection> primaryVertexHandle; | ||
| iEvent.getByToken(vtxTag_, primaryVertexHandle); | ||
|
|
||
| edm::Handle<reco::VertexCollection> secondaryVertexHandle; | ||
| iEvent.getByToken(secVtxTag_, secondaryVertexHandle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like these could be
| edm::Handle<std::vector<pat::Electron>> electronHandle; | |
| iEvent.getByToken(electronTag_, electronHandle); | |
| edm::Handle<std::vector<pat::Muon>> muonHandle; | |
| iEvent.getByToken(muonTag_, muonHandle); | |
| edm::Handle<reco::VertexCollection> primaryVertexHandle; | |
| iEvent.getByToken(vtxTag_, primaryVertexHandle); | |
| edm::Handle<reco::VertexCollection> secondaryVertexHandle; | |
| iEvent.getByToken(secVtxTag_, secondaryVertexHandle); | |
| const std::vector<pat::Electron>& electrons = iEvent.get(electronTag_); | |
| const std::vector<pat::Muon>& muons = iEvent.get(muonTag_); | |
| const reco::VertexCollection& primaryVertex = iEvent.get(vtxTag_); | |
| const reco::VertexCollection& secondaryVertex = iEvent.get(secVtxTag_); |
| std::vector<int> el_idx; | ||
| std::vector<bool> el_lIVF_match; | ||
| std::vector<int> el_IVF_df, el_IVF_ntracks, el_IVF_elid; | ||
| std::vector<float> el_IVF_x, el_IVF_y, el_IVF_z, el_IVF_cx, el_IVF_cy, el_IVF_cz, el_IVF_chi2, el_IVF_pt, el_IVF_eta, | ||
| el_IVF_phi, el_IVF_E, el_IVF_mass; | ||
| std::vector<int> el_IVF_trackcharge, el_IVF_trackelid, el_IVF_trackvtxid; | ||
| std::vector<float> el_IVF_trackpt, el_IVF_tracketa, el_IVF_trackphi, el_IVF_trackE, el_IVF_trackdxy, el_IVF_trackdz; | ||
| std::vector<float> el_IVF_tracksignedIP2D, el_IVF_tracksignedIP3D, el_IVF_tracksignedIP2Dsig, | ||
| el_IVF_tracksignedIP3Dsig; | ||
| std::vector<float> el_IVF_signedIP2D, el_IVF_signedIP3D, el_IVF_signedIP2Dsig, el_IVF_signedIP3Dsig; | ||
|
|
||
| std::vector<int> mu_idx; | ||
| std::vector<bool> mu_lIVF_match; | ||
| std::vector<int> mu_IVF_df, mu_IVF_ntracks, mu_IVF_muid; | ||
| std::vector<float> mu_IVF_x, mu_IVF_y, mu_IVF_z, mu_IVF_cx, mu_IVF_cy, mu_IVF_cz, mu_IVF_chi2, mu_IVF_pt, mu_IVF_eta, | ||
| mu_IVF_phi, mu_IVF_E, mu_IVF_mass; | ||
| std::vector<int> mu_IVF_trackcharge, mu_IVF_trackmuid, mu_IVF_trackvtxid; | ||
| std::vector<float> mu_IVF_trackpt, mu_IVF_tracketa, mu_IVF_trackphi, mu_IVF_trackE, mu_IVF_trackdxy, mu_IVF_trackdz; | ||
| std::vector<float> mu_IVF_tracksignedIP2D, mu_IVF_tracksignedIP3D, mu_IVF_tracksignedIP2Dsig, | ||
| mu_IVF_tracksignedIP3Dsig; | ||
| std::vector<float> mu_IVF_signedIP2D, mu_IVF_signedIP3D, mu_IVF_signedIP2Dsig, mu_IVF_signedIP3Dsig; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could these be local variables in the produce() function? As data members they take memory throughout the job.
| edm::Handle<double> rhoHandle; | ||
| iEvent.getByToken(rhoTag_, rhoHandle); | ||
| edm::Handle<std::vector<pat::Electron>> electrons; | ||
| iEvent.getByToken(electronTag_, electrons); | ||
| edm::Handle<reco::VertexCollection> primaryVertices; | ||
| iEvent.getByToken(vtxTag_, primaryVertices); | ||
| edm::Handle<std::vector<pat::Jet>> jetHandle; | ||
| iEvent.getByToken(jetTag_, jetHandle); | ||
| edm::Handle<std::vector<pat::Jet>> jetFatHandle; | ||
| iEvent.getByToken(jetFatTag_, jetFatHandle); | ||
| edm::Handle<std::vector<pat::Jet>> jetSubHandle; | ||
| iEvent.getByToken(jetSubTag_, jetSubHandle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this could be
| edm::Handle<double> rhoHandle; | |
| iEvent.getByToken(rhoTag_, rhoHandle); | |
| edm::Handle<std::vector<pat::Electron>> electrons; | |
| iEvent.getByToken(electronTag_, electrons); | |
| edm::Handle<reco::VertexCollection> primaryVertices; | |
| iEvent.getByToken(vtxTag_, primaryVertices); | |
| edm::Handle<std::vector<pat::Jet>> jetHandle; | |
| iEvent.getByToken(jetTag_, jetHandle); | |
| edm::Handle<std::vector<pat::Jet>> jetFatHandle; | |
| iEvent.getByToken(jetFatTag_, jetFatHandle); | |
| edm::Handle<std::vector<pat::Jet>> jetSubHandle; | |
| iEvent.getByToken(jetSubTag_, jetSubHandle); | |
| const double rho = iEvent.get(rhoTag_); | |
| const std::vector<pat::Electron>& electrons = iEvent.get(electronTag_); | |
| const reco::VertexCollection& primaryVertices = iEvent.get(vtxTag_); | |
| edm::Handle<std::vector<pat::Jet>> jetHandle = iEvent.getHandle(jetTag_); | |
| edm::Handle<std::vector<pat::Jet>> jetFatHandle = iEvent.getHandle(jetFatTag_); | |
| edm::Handle<std::vector<pat::Jet>> jetSubHandle = iEvent.getHandle(jetSubTag_); |
(also similarly elsewhere, but I stop commenting here)
|
Are the changes of this PR such that 100-200 MB memory increase would be expected? The max memory report in #48502 (comment) and cms-data/RecoTauTag-TrainingFiles#16 (comment) seem to suggest that, but the two tests report such increases in different workflows, that makes me wonder if the effect is real or if the magnitude of the "usual variations" has grown (#46966). |
|
Milestone for this pull request has been moved to CMSSW_16_1_X. Please open a backport if it should also go in to CMSSW_16_0_X. |
battibass
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As promised, some review of the muon part from my side comes now, apologies for the belated action.
I will also have a look at the rest of the code in the coming days.
| int getMatches(const pat::Muon& muon, const reco::Track& dsaMuon, float minPositionDiff = 1e-6) const; | ||
| int getDTMatches(const pat::Muon& muon, const reco::Track& dsaMuon, float minPositionDiff = 1e-6) const; | ||
| int getCSCMatches(const pat::Muon& muon, const reco::Track& dsaMuon, float minPositionDiff = 1e-6) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These three functions are essentially performing the same logic modulo minor differences. Such approach implies repeating almost-identical loops multiple times.
I would suggest writing a single function that accumulates on a struct containing three int reporting about all matches, DT matches and CSC matches, and returning the struct.
Also you could even store in the struct just two int if all matches is fully redundant.
| pt.push_back(dsaMuon.pt()); | ||
| ptErr.push_back(dsaMuon.ptError()); | ||
| eta.push_back(dsaMuon.eta()); | ||
| etaErr.push_back(dsaMuon.etaError()); | ||
| phi.push_back(dsaMuon.phi()); | ||
| phiErr.push_back(dsaMuon.phiError()); | ||
| charge.push_back(dsaMuon.charge()); | ||
| dxy.push_back(dsaMuon.dxy()); | ||
| dz.push_back(dsaMuon.dz()); | ||
| vx.push_back(dsaMuon.vx()); | ||
| vy.push_back(dsaMuon.vy()); | ||
| vz.push_back(dsaMuon.vz()); | ||
| chi2.push_back(dsaMuon.chi2()); | ||
| ndof.push_back(dsaMuon.ndof()); | ||
|
|
||
| trkNumPlanes.push_back(dsaMuon.hitPattern().muonStationsWithValidHits()); | ||
| trkNumHits.push_back(dsaMuon.hitPattern().numberOfValidMuonHits()); | ||
| trkNumDTHits.push_back(dsaMuon.hitPattern().numberOfValidMuonDTHits()); | ||
| trkNumCSCHits.push_back(dsaMuon.hitPattern().numberOfValidMuonCSCHits()); | ||
| normChi2.push_back(dsaMuon.normalizedChi2()); | ||
|
|
||
| outerEta.push_back(dsaMuon.extra().isNonnull() && dsaMuon.extra().isAvailable() ? dsaMuon.outerEta() : -999); | ||
| outerPhi.push_back(dsaMuon.extra().isNonnull() && dsaMuon.extra().isAvailable() ? dsaMuon.outerPhi() : -999); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't variables such as those ones be ntuplised using a SimpleTrackFlatTableProducer? If so, I would recommend using existing modules and configure them to extract the variables you need.
| innerTrackValidFraction.push_back((!muon.innerTrack().isNull()) ? muon.innerTrack()->validFraction() : -1); | ||
| globalTrackNormalizedChi2.push_back((!muon.globalTrack().isNull()) ? muon.globalTrack()->normalizedChi2() : -1); | ||
| CQChi2Position.push_back(muon.combinedQuality().chi2LocalPosition); | ||
| CQTrackKink.push_back(muon.combinedQuality().trkKink); | ||
| numberOfMatchedStation.push_back(muon.numberOfMatchedStations()); | ||
| numberOfValidPixelHits.push_back( | ||
| (!muon.innerTrack().isNull()) ? muon.innerTrack()->hitPattern().numberOfValidPixelHits() : 0); | ||
| numberOfValidTrackerHits.push_back( | ||
| (!muon.innerTrack().isNull()) ? muon.innerTrack()->hitPattern().numberOfValidTrackerHits() : 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also in this module, one can delegate the extraction of variables such as these ones to a simplePATMuonFlatTableProducer.
| DEFINE_FWK_MODULE(SimpleTriggerTrackFlatTableProducer); | ||
| DEFINE_FWK_MODULE(SimpleGsfTrackFlatTableProducer); | ||
| DEFINE_FWK_MODULE(SimpleCompositeCandidateFlatTableProducer); | ||
| DEFINE_FWK_MODULE(SimpleMuonRecHitClusterFlatTableProducer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this ever used?
| int getTotSegments(const reco::Track& dsaMuon) const; | ||
| int getDTSegments(const reco::Track& dsaMuon) const; | ||
| int getCSCSegments(const reco::Track& dsaMuon) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same strategy proposed above could apply here as well
PR description:
This PR includes producers and setup for a first version of EXO PAG customizations of nanoAOD, with customization combined from multiple analysis teams working with long-lived particles, including:
Previously discussed at the nanoAOD deep dive, within XPOG April 2 2025 and July 9 2025, and within EXO at the EXO workshop and general meeting.
The PR depends on cms-data/RecoTauTag-TrainingFiles#15 PR with training data for the displaced tau tag.
PR validation:
Collections with number of variables and sizes for different datasets has been evaluated:

Timing study done for running AOD -> EXOnanoAODv1 in one step tested on one file from TTto4Q MC dataset, accounting for differences on different machines. The study shows the relative time difference between producing standard nanoAOD and EXOnanoAODv1 as a function of number of events, with less than a factor 2 relative time increase.

The test script PhysicsTools/NanoAOD/test/test-exoNano.sh runs CMS driver command for AOD->EXOnanoAODv1 in one step (with --step PAT,NANO:(at)EXO) and works for this setup.
@enibigir @jniedzie