
Conversation

@peterkrack (Contributor) commented Dec 11, 2024

Overall, this is close to completion except for DYE605_Z0_38P8GEV_DW_PXSEC, in which the legacy implementation contains one extra source of systematic uncertainties (cc @enocera).

| Dataset | Status | Check CovMat | Check $t_0$ CovMat | Comments |
| --- | --- | --- | --- | --- |
| DYE605_Z0_38P8GEV_DW_PXSEC | | | | The old implementation contains one extra source of uncertainties |
| DYE866_Z0_800GEV_PXSEC | | | | Slight numerical differences due to the rawdata source (maxdiff of ~2%) |
| DYE866_Z0_800GEV_DW_RATIO_PDXSECRATIO | | | | - |
| DYE906_Z0_120GEV_DW_PDXSECRATIO | | | | - |

That being said, none of the remaining differences are really visible in the data vs. theory comparison report.
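For reference, here is a minimal sketch of the kind of legacy-vs-reimplementation comparison behind these checks; the function name and inputs are illustrative, not the actual validphys machinery:

```python
import numpy as np

def max_rel_diff(legacy, reimplemented):
    # Largest relative deviation between the legacy and reimplemented
    # arrays (central values or flattened covariance matrices); a
    # return value of ~0.02 matches the ~2% maxdiff quoted above.
    legacy = np.asarray(legacy, dtype=float)
    reimplemented = np.asarray(reimplemented, dtype=float)
    return np.max(np.abs(reimplemented - legacy) / np.abs(legacy))
```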

peterkrack requested a review from enocera December 11, 2024 15:01
Radonirinaunimi self-requested a review December 17, 2024 06:53
@Radonirinaunimi (Member) left a comment:

Hi @peterkrack! Happy new year and thanks for this!

Here are some preliminary comments:

  • Could you please fix the metadata regarding the arXiv, INSPIRE, and HEPData URLs? These will be crucial when moving on to #2228.
  • Could you make sure that the filters run properly? Currently this is not the case, as the filters contain calls to non-existent functions.
  • Add the following to the filters to properly format the floats (see the sketch after this list):

    ```python
    import yaml

    from nnpdf_data.filter_utils.utils import prettify_float

    yaml.add_representer(float, prettify_float)
    ```
  • Use pre-commit hooks https://pre-commit.com/ to make sure that all the files are properly formatted before each commit.
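For concreteness, a minimal sketch of how a filter might use the snippet above; the helper name, file name, and dict layout are hypothetical, and only `prettify_float` and the representer call come from the comment above:

```python
import yaml

from nnpdf_data.filter_utils.utils import prettify_float

# Register the representer once at module level so that every
# yaml.dump call in the filter writes floats in the compact format.
yaml.add_representer(float, prettify_float)

def dump_central_data(central_values, path="data_PXSEC.yaml"):
    # Hypothetical helper illustrating how the representer affects
    # the output file written by a filter.
    with open(path, "w") as stream:
        yaml.dump({"data_central": [float(x) for x in central_values]}, stream)
```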

Could you please take care of these first, before I move further into the details of the implementation?

@Radonirinaunimi (Member):

Hi @peterkrack, please let me know if there is anything I can help with in the meantime.

@Radonirinaunimi (Member):

I was wondering if there are any updates regarding the pending issues above and whether any help is needed?

@peterkrack (Contributor, Author):

> I was wondering if there are any updates regarding the pending issues above and whether any help is needed?

For DYE605_Z0_38P8GEV_DW I suspect it is a bug in the old buildmaster. For the other two, I still have to figure out what exactly is causing the discrepancy between the legacy implementation and the reimplementation.

@Radonirinaunimi (Member):

> I was wondering if there are any updates regarding the pending issues above and whether any help is needed?

> For DYE605_Z0_38P8GEV_DW I suspect it is a bug in the old buildmaster. For the other two, I still have to figure out what exactly is causing the discrepancy between the legacy implementation and the reimplementation.

I can also have a look at the details, but would you be able to take care of #2248 (review) above?

Radonirinaunimi dismissed their stale review January 21, 2025 07:59

Already resolved!

Radonirinaunimi marked this pull request as ready for review January 22, 2025 08:13
@peterkrack (Contributor, Author):

> Overall, this is close to completion except for DYE605_Z0_38P8GEV_DW_PXSEC, in which the legacy implementation contains one extra source of systematic uncertainties (cc @enocera).

Concerning the extra uncertainty: the rawdata file contains replicas rep00001 to rep00999, i.e. 999 replicas:

https://raw.githubusercontent.com/NNPDF/nnpdf/refs/tags/4.0.6/buildmaster/rawdata/DYE605/nuclear/output/tables/group_result_table.csv

Then, in the old buildmaster, nrep is set to 1000:

```cpp
int nrep=1000;
```

and later on the loop runs from irep=0 to irep=999, reading one uncertainty too many:

```cpp
for(int irep=0; irep<nrep; irep++)
```
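A quick, delimiter-agnostic way to confirm the replica count in that file, as a sketch (only the rep00001 ... rep00999 column naming is taken from the rawdata header quoted above):

```python
# Count the replica columns in the rawdata header and compare against
# the hard-coded nrep = 1000 in the old buildmaster.
with open("group_result_table.csv") as stream:
    header = stream.readline()

n_rep = header.count("rep0")
print(n_rep)  # 999 for this file: one replica fewer than nrep
```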

@enocera (Contributor) commented Jan 22, 2025

Dear @peterkrack @Radonirinaunimi, let me try to clarify the uncertainties for the DY E605 data set.

  • In the "old" commondata implementation, there were one statistical (uncorrelated) uncertainty and 1002 sources of systematic uncertainties. Of these 1002 sources, the first was a 10% uncorrelated (additive) uncertainty and the second a 15% correlated (multiplicative) normalisation uncertainty. The other 1000 uncertainties were "nuclear uncertainties", estimated as the difference between predictions obtained with proton and nuclear PDFs, taking the proton PDF fixed to the NNPDF4.0 central value and varying the nuclear predictions for each of the 1000 replicas in nNNPDF3.0.
  • It seems to me that this implementation is correctly propagated into the legacy data set. Indeed, if I look at uncertainties_legacy_PXSEC.yaml, there are one statistical uncertainty, the two aforementioned systematic uncertainties, and 1000 nuclear uncertainties. In uncertainties_reimplemented_PXSEC.yaml, I'd say that one nuclear uncertainty is missing. This is consistent with what @peterkrack noticed above.
  • Now, you're right in saying that there are only 999 nuclear uncertainties in the input file rawdata/nuclear/output/tables/group_result_table.csv. A replica must have gone missed in the generation process.
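As an illustration of the construction described in the first bullet, a sketch (the function and argument names are illustrative, not the buildmaster code):

```python
import numpy as np

def nuclear_uncertainties(proton_central, nuclear_replicas):
    # One correlated systematic per nuclear replica: the shift of that
    # replica's prediction from the prediction obtained with the fixed
    # proton PDF (the NNPDF4.0 central value).
    proton_central = np.asarray(proton_central)      # shape (ndata,)
    nuclear_replicas = np.asarray(nuclear_replicas)  # shape (nrep, ndata)
    return nuclear_replicas - proton_central         # shape (nrep, ndata)
```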

I see two ways of proceeding.

  1. I re-generate `rawdata/nuclear/output/tables/group_result_table.csv` with the missing nuclear uncertainty.
  2. We live with 999 nuclear uncertainties for the time being and forget about the 1000th. Nuclear uncertainties will have to be revisited anyway for NNPDF4.1.

@enocera (Contributor) commented Jan 22, 2025

I have a preference for option 2.

@Radonirinaunimi (Member) commented Jan 22, 2025

> I see two ways of proceeding.
>
> 1. I re-generate `rawdata/nuclear/output/tables/group_result_table.csv` with the missing nuclear uncertainty.
> 2. We live with 999 nuclear uncertainties for the time being and forget about the 1000th. Nuclear uncertainties will have to be revisited anyway for NNPDF4.1.

Thanks for your reply @enocera! I agree with all of your points, and with the conclusions. As you can see in the report, this difference is negligible, so I tend to lean towards the second option (on top of the reason you gave).

@scarlehoff (Member) left a comment:

The `kinematics_override` needs to be set to the identity. The `result_transform` we can live with for the time being.

Probably the process type needs to be changed as well; we shouldn't have a process type per dataset...

@Radonirinaunimi (Member):

> Probably the process type needs to be changed as well; we shouldn't have a process type per dataset...

This is probably very minor, but what would you call the process types then?

And are you happy with how the variants are called?

@scarlehoff (Member) commented Jan 23, 2025

The variants are ok. For the process type... use DYP; otherwise you will need to dive into validphys to change a few things...

although FTDY would be better imho but 🤷‍♂️

@Radonirinaunimi (Member) commented Jan 23, 2025

> The variants are ok. For the process type... use DYP; otherwise you will need to dive into validphys to change a few things...

Yep, this I know. But I was wondering if you want something specific after the `_`, i.e. `DYP_XX`.

> although FTDY would be better imho but 🤷‍♂️

Accounting for this, I went for `DYP_FT`.

@scarlehoff (Member):

Yes, that sounds ok; as long as the first three letters are DYP it will go through vp ok. It needs to be added to the process options though.

@Radonirinaunimi (Member):

> Yes, that sounds ok; as long as the first three letters are DYP it will go through vp ok. It needs to be added to the process options though.

When modifying the process options, I went for the easiest solution, which is simply to add the variable `M2` to the `_dyboson_xq2map` instead of modifying the variables here to be `m_Z2` (although that would be the proper variable name). Doing the latter turns out to be very messy, as it also involves modifying the filter rules.
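For context, here is a sketch of the leading-order (x, Q²) map for fixed-target Drell-Yan that such a process option encodes; the function signature and key names are illustrative, and only the use of `M2` reflects the change described above:

```python
import numpy as np

def dyp_ft_xq2map(kin):
    # Leading-order fixed-target DY kinematics: Q2 = M2 and
    # x_{1,2} = (M / sqrt(s)) * exp(+-y), with "y", "M2" and "sqrts"
    # as illustrative names for the dataset kinematic variables.
    y, m2, sqrts = kin["y"], kin["M2"], kin["sqrts"]
    q2 = m2
    x1 = np.sqrt(m2) / sqrts * np.exp(y)
    x2 = np.sqrt(m2) / sqrts * np.exp(-y)
    return (x1, x2), q2
```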

Final Report

```yaml
reimplemented:
  data_uncertainties:
  - uncertainties_reimplemented_PXSEC.yaml
  data_central: data_reimplemented_PXSEC.yaml
```
@scarlehoff (Member) commented Jan 24, 2025

Some final comments. This should not be `reimplemented`; this should just be the normal `data_uncertainties`, not a variant.

Also, if we have `data_reimplemented` and it is the same as the old one, the old one should be removed. And I think for one of the 866 datasets it needs to be kept, because there were small differences in the data (in the rawdata).

Same for all the others.

Reply (Member):

For one out of the four datasets, the new implementation has slightly different central values (numerical fluctuations due to the rawdata source), so I am not sure if we want to keep `reimplemented` for that one (?). But for the rest, I'll do it asap.

Reply (Member):

We want to keep the old data under `legacy` for that dataset, and the new implementation should be the default for all (so `reimplemented` is not kept for any).

Reply (Member):

Sorry for the delay in this, but now I think everything is cleaned up.

Reply (Member):

Thanks. You kept the `reimplemented` variant though (although you removed it from the names). Maybe you didn't commit that part?

@scarlehoff (Member) left a comment:

Looks good. I think we should change `M2` to `M` (left a note in #2264).

Another thing to note is that for E866 one extra point is now cut by the internal cuts (89 points pass the cuts in master vs 88 here). This seems to be due to a small (%-level) difference in y, but it looks like it is coming from HEPData, so there is not much to do there. The chi2 just changes from 1.59 to 1.57, so I don't think we have to worry about it.

scarlehoff merged commit 8327d16 into master Jan 27, 2025 (9 checks passed)

scarlehoff deleted the reimplement-DY branch January 27, 2025 18:24