Skip to content

Conversation

@PonIlya
Copy link
Contributor

@PonIlya PonIlya commented Nov 23, 2023

PR description:

The purpose of this PR is to improve the CondDBESSource.cc dump method.
Records and their consumption will be output to a JSON file rather than a log file or console due to the fact that they are usually too long for visual analysis.
This is more convenient for further use, for example, generating сonsumption tables a log parsing script was previously used to fill it out

Previously, this issue was raised in the following PR but was not approved

Taking into account the comment, I left the old dump method but added the option to upload JSON specifying the file name via the command:
process.GlobalTag.JsonDumpFileName =cms.untracked.string("CondDBESSource.json")
The command above starts the creation of a JSON dump file if the file name or path to it is specified.

PR validation:

The generated .json file should have the same contents as the previous version of dumpstat after running.

cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi --conditions auto:phase1_2023_realistic_postBPix -n 5 --era Run3_2023 --geometry DB:Extended -s GEN --fileout output_step1_GEN.root --beamspot Realistic25ns13p6TeVEarly2023Collision --customise_commands='process.GlobalTag.DumpStat =cms.untracked.bool(True) \n process.GlobalTag.JsonDumpFileName =cms.untracked.string("CondDBESSource.json")'|& tee output_step1_GEN.log
OR (without dump into .log)
cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi --conditions auto:phase1_2023_realistic_postBPix -n 5 --era Run3_2023 --geometry DB:Extended -s GEN --fileout output_step1_GEN.root --beamspot Realistic25ns13p6TeVEarly2023Collision --customise_commands='process.GlobalTag.JsonDumpFileName =cms.untracked.string("CondDBESSource.json")'|& tee output_step1_GEN.log
Not a backport

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43374/37852

  • This PR adds an extra 24KB to repository

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 23, 2023

A new Pull Request was created by @PonIlya for master.

It involves the following packages:

  • CondCore/ESSources (db, alca)

@consuegs, @cmsbuild, @saumyaphor4252, @francescobrivio, @perrotta can you please review it and eventually sign? Thanks.
@mmusich, @PonIlya, @yuanchao, @tocheng, @rsreds this is something you requested to watch as well.
@sextonkennedy, @rappoccio, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

@perrotta
Copy link
Contributor

@PonIlya this is going to disrupt the possibility for dumping the twiki format from the text logs. Nothing that cannot get fixed by the scripts in AlCaTools later on, but I would let the possibility to dump either in the text or in the json file ruled by a configuration parameter.

Moreover, the json files in output are numbered incrementally, and their name depends on what's already in the repository where you run them. I think that also those names should be made configurable, so that scripts in AlCaTools/ConditionsConsumed (for example runCMSDrivers_data2023D.sh) can be used to give them the same name as of the python config which is run.

@vlimant
Copy link
Contributor

vlimant commented Nov 24, 2023

this is an extremely useful feature! many thanks

@PonIlya
Copy link
Contributor Author

PonIlya commented Nov 27, 2023

@perrotta
I created a script for processing json files. So this shouldn't be a blocker.
Although, it's a good idea to give the user a choice

I propose it this way:
Leave the old output method as default, and activate the new one instead of the old one when passing the argument in the command line --j --json.
Example:
cmsDriver.py step1 --conditions auto:run3_hlt_relval -n 5 --era Run3_2023 -s L1REPACK:Full --data --scenario pp --datatier FEVTDEBUGHLT --eventcontent FEVTDEBUGHLT --filein /store/data/Run2023D/JetMET0/RAW/v1/000/369/978/00000/00b9eba7-c847-465b-a6de-98bceae93613.root --fileout output_step1_L1.root --customise_commands="process.load('Configuration.StandardSequences.Digi_cff') \n process.GlobalTag.DumpStat =cms.untracked.bool(True)" --json --outputCommands "keep *" |& tee output_step1_L1.log

As for the names of the files, I will change the naming method so that it is taken from the arguments and corresponds to the name .log (output_step1_L1.json and etc.)

@malbouis
Copy link
Contributor

@perrotta , @PonIlya , is there anything preventing this PR from moving forward?

@perrotta
Copy link
Contributor

@perrotta , @PonIlya , is there anything preventing this PR from moving forward?

Yes, #43374 (comment) must be addressed before we can proceed (in particular the second part, that canno be cured by an additional script in AlCaTools)

@vlimant
Copy link
Contributor

vlimant commented Dec 14, 2023

how about dumping the info both in the log file (the old way) and in a json (new way) on the side, systematically, when process.GlobalTag.DumpStat =cms.untracked.bool(True)

@vlimant
Copy link
Contributor

vlimant commented Dec 14, 2023

IMO the name of the json file should be fixed and unambiguous like "CondDBESSource_stats.json" or "CondDBESSource.json"

@PonIlya
Copy link
Contributor Author

PonIlya commented Dec 14, 2023

IMO the name of the json file should be fixed and unambiguous like "CondDBESSource_stats.json" or "CondDBESSource.json"

I going to add the configuration parameter which can take the name of the JSON file. Like --j [filename].json and by default
CondDBESSource.json.

@PonIlya
Copy link
Contributor Author

PonIlya commented Dec 14, 2023

But, I have a problem with the way to pass the JSON file name and start flag to the CondDBESSource.cc from cmsDriver.py
We need some acceptable way.
Maybe with os.environ
os.environ["JSON_FILENAME"] = json_filename
and
std::getenv("JSON_FILENAME")
Or add a config file like config.ini
instead of "Auto generated configuration file" because this kind of config file is named by the user.

For now, i chose this way, with customise_commands:
--customise_commands='process.GlobalTag.JsonDumpFileName =cms.untracked.string("Testfilename") \n process.GlobalTag.DumpStat=cms.untracked.bool(True)
If I don't come up with anything better.

@malbouis
Copy link
Contributor

@PonIlya , I would go with #43374 (comment) and just give the file a static (non-configurable) name.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43374/38218

  • This PR adds an extra 24KB to repository

@cmsbuild
Copy link
Contributor

Pull request #43374 was updated. @francescobrivio, @consuegs, @saumyaphor4252, @perrotta, @cmsbuild can you please check and sign again.

@PonIlya PonIlya marked this pull request as draft December 15, 2023 11:46
@PonIlya PonIlya closed this Dec 15, 2023
@PonIlya
Copy link
Contributor Author

PonIlya commented Dec 18, 2023

@perrotta I changed the code so that you can select the dump method and file name through custom commands.
process.GlobalTag.JsonDumpFileName =cms.untracked.string("CondDBESSource.json")
I hope the current implementation will suit everyone.

@PonIlya
Copy link
Contributor Author

PonIlya commented Dec 18, 2023

@malbouis My opinion is that one file name is not convenient because... if you run several dumps in a row, it will be overwritten. It will not be possible to collect information in JSON format immediately about GEN, SIM, etc. with one .sh script.

I have already added a new approach, I think it has become more convenient

@perrotta
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c35264/36550/summary.html
COMMIT: 2114bdf
CMSSW: CMSSW_14_0_X_2023-12-18-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43374/36550/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

@vlimant
Copy link
Contributor

vlimant commented Jan 10, 2024

is this good to go ?

@perrotta
Copy link
Contributor

please test
(just to refresh them, no surprise expected, though)

@perrotta
Copy link
Contributor

+1

  • Thank you @PonIlya , and sorry for the long delay after the restart at the beginning of the year: the PR as such is good to go, the scripts in AlCaTools/ConditionsConsumed can now get easily adapted to output also the json file, and there is no risk any more of ovewriting those outputs.
  • Since I do not expect suprises from the tests, I'm signing this now, so that the release managers can merge it as soon as tests certify that it is still compatible with the release

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs after it passes the integration tests. This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @sextonkennedy, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c35264/36802/summary.html
COMMIT: 2114bdf
CMSSW: CMSSW_14_0_X_2024-01-10-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/43374/36802/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

@rappoccio
Copy link
Contributor

+1

recordData["timeLookupPayloadIds"].push_back(payloadIdData);
}

jsonData[recName].push_back(recordData);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, I was wondering, does it make sense to limit the json output to the payloads that are actually consumed in the job (i.e. if !pids.empty() or is there any particular reason to print all the tags that would be consumed but are actually not ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants