
Conversation

@peterkrack (Contributor) commented Dec 11, 2024

Overall, this is close to completion except for DYE605_Z0_38P8GEV_DW_PXSEC, in which the legacy implementation contains one extra source of systematic uncertainties (cc @enocera).

| Dataset | Status | Check CovMat | Check $t_0$ CovMat | Comments |
| --- | --- | --- | --- | --- |
| DYE605_Z0_38P8GEV_DW_PXSEC | | | | The old implementation contains one extra source of uncertainties |
| DYE866_Z0_800GEV_PXSEC | | | | Slight numerical differences due to the rawdata source (maxdiff of ~2%) |
| DYE866_Z0_800GEV_DW_RATIO_PDXSECRATIO | | | | - |
| DYE906_Z0_120GEV_DW_PDXSECRATIO | | | | - |

That being said, none of the remaining differences are really visible in the data vs. theory comparison report.
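For reference, here is a minimal sketch of the kind of legacy-vs-reimplementation comparison behind these checks; the function name and inputs are illustrative, not the actual validphys machinery:

```python
import numpy as np

def max_rel_diff(legacy, reimplemented):
    # Largest relative deviation between the legacy and reimplemented
    # arrays (central values or flattened covariance matrices); a
    # return value of ~0.02 matches the ~2% maxdiff quoted above.
    legacy = np.asarray(legacy, dtype=float)
    reimplemented = np.asarray(reimplemented, dtype=float)
    return np.max(np.abs(reimplemented - legacy) / np.abs(legacy))
```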

peterkrack requested a review from enocera December 11, 2024 15:01
Radonirinaunimi self-requested a review December 17, 2024 06:53
@Radonirinaunimi (Member) left a comment:

Hi @peterkrack! Happy new year and thanks for this!

Here are some preliminary comments:

  • Could you please fix the metadata regarding the arXiv, INSPIRE, and HEPData URLs? These will be crucial when moving on to #2228.
  • Could you make sure that the filters run properly? Currently this is not the case, as the filters contain calls to non-existent functions.
  • Add the following to the filters to properly format the floats (see the sketch after this list):

    ```python
    import yaml

    from nnpdf_data.filter_utils.utils import prettify_float

    yaml.add_representer(float, prettify_float)
    ```
  • Use pre-commit hooks https://pre-commit.com/ to make sure that all the files are properly formatted before each commit.
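For concreteness, a minimal sketch of how a filter might use the snippet above; the helper name, file name, and dict layout are hypothetical, and only `prettify_float` and the representer call come from the comment above:

```python
import yaml

from nnpdf_data.filter_utils.utils import prettify_float

# Register the representer once at module level so that every
# yaml.dump call in the filter writes floats in the compact format.
yaml.add_representer(float, prettify_float)

def dump_central_data(central_values, path="data_PXSEC.yaml"):
    # Hypothetical helper illustrating how the representer affects
    # the output file written by a filter.
    with open(path, "w") as stream:
        yaml.dump({"data_central": [float(x) for x in central_values]}, stream)
```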

Could you please take care of these first, before I move further into the details of the implementation?

@Radonirinaunimi (Member):

Hi @peterkrack, please let me know if there is anything I can help with in the meantime.

@Radonirinaunimi (Member):

I was wondering if there are any updates regarding the pending issues above and whether any help is needed?

@peterkrack (Contributor, Author):

> I was wondering if there are any updates regarding the pending issues above and whether any help is needed?

For DYE605_Z0_38P8GEV_DW I suspect it is a bug in the old buildmaster. For the other two, I still have to figure out what exactly is causing the discrepancy between the legacy implementation and the reimplementation.

@Radonirinaunimi (Member):

> I was wondering if there are any updates regarding the pending issues above and whether any help is needed?

> For DYE605_Z0_38P8GEV_DW I suspect it is a bug in the old buildmaster. For the other two, I still have to figure out what exactly is causing the discrepancy between the legacy implementation and the reimplementation.

I can also have a look at the details, but would you be able to take care of #2248 (review) above?

Radonirinaunimi dismissed their stale review January 21, 2025 07:59

Already resolved!

Radonirinaunimi marked this pull request as ready for review January 22, 2025 08:13
@peterkrack (Contributor, Author):

> Overall, this is close to completion except for DYE605_Z0_38P8GEV_DW_PXSEC, in which the legacy implementation contains one extra source of systematic uncertainties (cc @enocera).

Concerning the extra uncertainty: the rawdata file contains replicas rep00001 to rep00999, i.e. 999 replicas:

https://raw.githubusercontent.com/NNPDF/nnpdf/refs/tags/4.0.6/buildmaster/rawdata/DYE605/nuclear/output/tables/group_result_table.csv

Then, in the old buildmaster, nrep is set to 1000:

```cpp
int nrep=1000;
```

and later on the loop runs from irep=0 to irep=999, reading one uncertainty too many:

```cpp
for(int irep=0; irep<nrep; irep++)
```
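A quick, delimiter-agnostic way to confirm the replica count in that file, as a sketch (only the rep00001 ... rep00999 column naming is taken from the rawdata header quoted above):

```python
# Count the replica columns in the rawdata header and compare against
# the hard-coded nrep = 1000 in the old buildmaster.
with open("group_result_table.csv") as stream:
    header = stream.readline()

n_rep = header.count("rep0")
print(n_rep)  # 999 for this file: one replica fewer than nrep
```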

@enocera (Contributor) commented Jan 22, 2025

Dear @peterkrack @Radonirinaunimi, let me try to clarify the uncertainties for the DY E605 data set.

  • In the "old" commondata implementation, there were one statistical (uncorrelated) uncertainty and 1002 sources of systematic uncertainties. Of these 1002 sources, the first was a 10% uncorrelated (additive) uncertainty and the second a 15% correlated (multiplicative) normalisation uncertainty. The other 1000 uncertainties were "nuclear uncertainties", estimated as the difference between predictions obtained with proton and nuclear PDFs, taking the proton PDF fixed to the NNPDF4.0 central value and varying the nuclear predictions for each of the 1000 replicas in nNNPDF3.0.
  • It seems to me that this implementation is correctly propagated into the legacy data set. Indeed, if I look at uncertainties_legacy_PXSEC.yaml, there are one statistical uncertainty, the two aforementioned systematic uncertainties, and 1000 nuclear uncertainties. In uncertainties_reimplemented_PXSEC.yaml, I'd say that one nuclear uncertainty is missing. This is consistent with what @peterkrack noticed above.
  • Now, you're right in saying that there are only 999 nuclear uncertainties in the input file rawdata/nuclear/output/tables/group_result_table.csv. A replica must have gone missed in the generation process.
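As an illustration of the construction described in the first bullet, a sketch (the function and argument names are illustrative, not the buildmaster code):

```python
import numpy as np

def nuclear_uncertainties(proton_central, nuclear_replicas):
    # One correlated systematic per nuclear replica: the shift of that
    # replica's prediction from the prediction obtained with the fixed
    # proton PDF (the NNPDF4.0 central value).
    proton_central = np.asarray(proton_central)      # shape (ndata,)
    nuclear_replicas = np.asarray(nuclear_replicas)  # shape (nrep, ndata)
    return nuclear_replicas - proton_central         # shape (nrep, ndata)
```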

I see two ways of proceeding.

  1. I re-generate `rawdata/nuclear/output/tables/group_result_table.csv` with the missing nuclear uncertainty.
  2. We live with 999 nuclear uncertainties for the time being and forget about the 1000th. Nuclear uncertainties will have to be revisited anyway for NNPDF4.1.

@enocera (Contributor) commented Jan 22, 2025

I have a preference for option 2.

@Radonirinaunimi (Member) commented Jan 22, 2025

> I see two ways of proceeding.
>
> 1. I re-generate `rawdata/nuclear/output/tables/group_result_table.csv` with the missing nuclear uncertainty.
> 2. We live with 999 nuclear uncertainties for the time being and forget about the 1000th. Nuclear uncertainties will have to be revisited anyway for NNPDF4.1.

Thanks for your reply @enocera! I agree with all of your points, and with the conclusions. As you can see in the report, this difference is negligible, so I tend to lean towards the second option (on top of the reason you gave).

@scarlehoff (Member) left a comment:

The `kinematics_override` needs to be set to the identity. The `result_transform` we can live with for the time being.

Probably the process type needs to be changed as well; we shouldn't have a process type per dataset...

@Radonirinaunimi (Member):

> Probably the process type needs to be changed as well; we shouldn't have a process type per dataset...

This is probably very minor, but what would you call the process types then?

And are you happy with how the variants are called?

@scarlehoff (Member) commented Jan 23, 2025

The variants are ok. For the process type... use DYP; otherwise you will need to dive into validphys to change a few things...

although FTDY would be better imho but 🤷‍♂️

@Radonirinaunimi (Member) commented Jan 23, 2025

> The variants are ok. For the process type... use DYP; otherwise you will need to dive into validphys to change a few things...

Yep, this I know. But I was wondering if you want something specific after the `_`, i.e. `DYP_XX`.

> although FTDY would be better imho but 🤷‍♂️

Accounting for this, I went for `DYP_FT`.

@scarlehoff (Member):

Yes, that sounds ok; as long as the first three letters are DYP it will go through vp ok. It needs to be added to the process options though.

@Radonirinaunimi (Member):

> Yes, that sounds ok; as long as the first three letters are DYP it will go through vp ok. It needs to be added to the process options though.

When modifying the process options, I went for the easiest solution, which is simply to add the variable `M2` to the `_dyboson_xq2map` instead of modifying the variables here to be `m_Z2` (although that would be the proper variable name). Doing the latter turns out to be very messy, as it also involves modifying the filter rules.
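For context, here is a sketch of the leading-order (x, Q²) map for fixed-target Drell-Yan that such a process option encodes; the function signature and key names are illustrative, and only the use of `M2` reflects the change described above:

```python
import numpy as np

def dyp_ft_xq2map(kin):
    # Leading-order fixed-target DY kinematics: Q2 = M2 and
    # x_{1,2} = (M / sqrt(s)) * exp(+-y), with "y", "M2" and "sqrts"
    # as illustrative names for the dataset kinematic variables.
    y, m2, sqrts = kin["y"], kin["M2"], kin["sqrts"]
    q2 = m2
    x1 = np.sqrt(m2) / sqrts * np.exp(y)
    x2 = np.sqrt(m2) / sqrts * np.exp(-y)
    return (x1, x2), q2
```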

Final Report

```yaml
reimplemented:
  data_uncertainties:
  - uncertainties_reimplemented_PXSEC.yaml
  data_central: data_reimplemented_PXSEC.yaml
```
@scarlehoff (Member) commented Jan 24, 2025

Some final comments. This should not be `reimplemented`; this should just be the normal `data_uncertainties`, not a variant.

Also, if we have `data_reimplemented` and it is the same as the old one, the old one should be removed. And I think for one of the 866 datasets it needs to be kept, because there were small differences in the data (in the rawdata).

Same for all the others.

Reply (Member):

For one out of the four datasets, the new implementation has slightly different central values (numerical fluctuations due to the rawdata source), so I am not sure if we want to keep `reimplemented` for that one (?). But for the rest, I'll do it asap.

Reply (Member):

We want to keep the old data under `legacy` for that dataset, and the new implementation should be the default for all (so `reimplemented` is not kept for any).

Reply (Member):

Sorry for the delay in this, but now I think everything is cleaned up.

Reply (Member):

Thanks. You kept the `reimplemented` variant though (although you removed it from the names). Maybe you didn't commit that part?

@scarlehoff (Member) left a comment:

Looks good. I think we should change `M2` to `M` (left a note in #2264).

Another thing to note is that for E866 one extra point is now cut by the internal cuts (89 points pass the cuts in master vs 88 here). This seems to be due to a small (%-level) difference in y, but it looks like it is coming from HEPData, so there is not much to do there. The chi2 just changes from 1.59 to 1.57, so I don't think we have to worry about it.

scarlehoff merged commit 8327d16 into master Jan 27, 2025 (9 checks passed)

scarlehoff deleted the reimplement-DY branch January 27, 2025 18:24