
Conversation

@ecole41 (Collaborator) commented Oct 6, 2025

This branch includes an implementation of the ATLAS 13 TeV WPWM differential measurements for future test data. Another version of this implementation has been added in PR #2380.

Still To Do:

  • Check uncertainty definitions
  • Complete the metadata.yaml file
  • Cross check with #2380

@ecole41 (Collaborator Author) commented Oct 6, 2025

@enocera I am not certain about the treatment of the uncertainties here. Do you know whether any more should be added to the treatment and correlations dictionaries so that they are not treated as additive correlated uncertainties?

@ecole41 (Collaborator Author) commented Oct 23, 2025

@enocera:
Jelle and I have discussed the two implementations of this dataset. We have each followed a different structure when constructing the observables, so we are unsure which is the preferred one.
Jelle has implemented these as:

  • $W^+$ single differential
  • $W^-$ single differential
  • $W^+$ double differential
  • $W^-$ double differential

Whereas I have implemented these as:

  • 1D combined $W^+$ and $W^-$ in sequence (Tabs. 38-39);
  • 1D muons $W^+$ and $W^-$ in sequence (Tabs. 20-21);
  • 2D combined $W^+$ and $W^-$ in sequence (Tabs. 44-53);
  • 2D muons $W^+$ and $W^-$ in sequence (Tabs. 23-32).

I have also made some changes to the uncertainty treatments after discussing this with an ATLAS experimentalist. She suggested that all unfolding systematics should be treated as uncorrelated and all normalisation systematics as multiplicative and uncorrelated; I have set this in this branch.
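Schematically, the treatment and correlations dictionaries then encode something like the following (a sketch only: the label strings and dictionary names below are illustrative placeholders, not the exact ones in this branch; anything not listed keeps the default additive correlated treatment):

    # Illustrative placeholders, not the actual labels in filter.py
    treatment = {
        "Norm. syst.": "MULT",  # normalisation systematics: multiplicative
    }
    correlations = {
        "Unfolding syst.": "UNCORR",  # unfolding systematics: uncorrelated across bins
        "Norm. syst.": "UNCORR",      # normalisation systematics: uncorrelated
    }
    # Any uncertainty absent from these dictionaries falls back to ADD CORR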

@juanrojochacon

@ecole41 is this (de)correlation prescription approved by ATLAS in some manner? They always get quite nervous if we start to play with their correlation model, so having some kind of official endorsement always helps

@juanrojochacon

Hi @ecole41 @jekoorn thanks for the work. Maybe @enocera has other ideas but my two cents are the following:

  • We would never want to fit separately W+ and W- data. So it is clear to me that a "W production" dataset should always consist of W+ and W- cross-sections.
  • We don't want to fit separately the muon data but always the combined datasets (electron + muons). So I would forget about the muon-only data and implement only the combined measurements.
  • One cannot fit at the same time 1D and 2D distributions (same underlying dataset), so I would keep them separate.

So to me the preferred structure would be what @ecole41 has done, but with the muon datasets removed, if this is clear.

In any case it should be easy for @jekoorn to adopt Ella's implementation, and then you can cross-check each other concerning the implementation of systematic errors.

@juanrojochacon

In any case as I mentioned above it is important to document our choice of correlation model, and make sure we can back it up with some official ATLAS recommendation

@juanrojochacon

Once @enocera signs off the dataset implementation, we will move to the generation of NNLO grids using NNLOJET, which will also be a non-trivial amount of work specially the first time that it is done

@jekoorn (Contributor) commented Oct 23, 2025

hi @juanrojochacon thanks a lot for the comments!

I agree with everything, also in terms of fool-proofing, so that there does not need to be any confusion about which dataset is to be fitted and which is not. With your proposed structure one enters one dataset into the runcard at a time.

Just to be sure:

never want to fit separately W+ and W- data

So that means we should put them sequentially in the same file, as Ella already did?

In any case I will make these changes to my implementation. Clear, thanks!

@juanrojochacon

yes indeed, we put one after the other. It is the exact same analysis, so there will never be a reason why we choose to fit W+ but not W-. This is the same as what is done for similar datasets.

So yes, follow Ella's implementation and then you can compare the two and check that they are the same

@enocera (Contributor) commented Oct 23, 2025

For your reference, I paste here what I recommended to @ecole41 in a private conversation.

I would implement the cross section single differential in $m_T^W$ separately for positive and negative leptons (so only Tabs. 38 and 39). I would also implement the cross section double differential in $m_T^W$ and $\eta$, again separately for positive and negative leptons (Tabs. 44-53).

Two remarks.

  1. The combination of the electron and muon channels leads to smaller uncertainties (and, as I said many times, this is something we like). However, because of the way in which the combination is performed, the correspondence between each systematic uncertainty and its physical source may get lost. I understand that, in an attempt to recover this correspondence, the combination is performed by rotating to the orthogonal basis (which makes the combination easier) and then rotating back to the physical basis (a schematic is given after these remarks). This procedure may be inaccurate, and it may alter the correspondence between a given systematic uncertainty and its physical source. This means that there is an additional ambiguity in interpreting each of these uncertainties as corr or uncorr, add or mult.

  2. Because of what is said above, it may be good to also implement the pure muon channel, again separately for positive and negative muons, for the 1D and the 2D distributions. In that case, although the measurement is less precise than the combined one, we do not break the correspondence between systematic uncertainties and their physical sources, a fact that makes the interpretation of whether a systematic uncertainty is corr/uncorr cleaner.
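To make remark 1 concrete, schematically (my notation, not the paper's): collect the physical-basis systematic shifts into vectors $\sigma_k$ and build the total covariance $C = \sum_k \sigma_k \sigma_k^T$. The combination diagonalises $C = U \Lambda U^T$, operates on the uncorrelated components $\tilde{\sigma}_j = \sqrt{\lambda_j}\, u_j$ (where combining channels is easy), and then rotates the result back to a set $\sigma'_k$ in the physical basis. The $\sigma'_k$ reproduce the same total covariance, but an individual $\sigma'_k$ need not coincide with the original $\sigma_k$, which is why the corr/uncorr and add/mult interpretation of a single source becomes ambiguous.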

I assume these include the correct tables to be implemented. Do you have a preference for how the dataset should be formed? E.g. WPWM with l+ and l- in sequence, or WP and WM as separate observables? Also, should the l+/- channel be considered?

I would implement four different observables in the same data set, as follows (a schematic metadata sketch is given after the list):

  • 1D combined W+ and W- in sequence (Tabs. 38-39);
  • 1D muons W+ and W- in sequence (Tabs. 20-21);
  • 2D combined W+ and W- in sequence (Tabs. 44-53);
  • 2D muons W+ and W- in sequence (Tabs. 23-32).
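In metadata.yaml this clustering might look schematically as follows (a sketch under assumptions: the observable names are placeholders, and only the observable_name field of the usual layout is shown):

implemented_observables:
  - observable_name: MTW               # 1D combined, Tabs. 38-39
  - observable_name: MTW-MUON          # 1D muons, Tabs. 20-21
  - observable_name: ABSETA-MTW        # 2D combined, Tabs. 44-53
  - observable_name: ABSETA-MTW-MUON   # 2D muons, Tabs. 23-32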

@enocera (Contributor) commented Oct 23, 2025

That being said, I think that @jekoorn and @ecole41 would like some input on their choice of treatment of the various sources of uncertainties, which I will give them asap.

@juanrojochacon

Good point @enocera I agree. We can check that results based on the muon dataset are consistent with those of the combined dataset. In any case for this measurement I expect that we are limited by systematics, so actually it may be better to stick to the muon dataset to have a better grasp of the systematics.

So we have a plan

@enocera (Contributor) commented Oct 23, 2025

* We don't want to fit separately the muon data but always the combined datasets (electron + muons). So I would forget about the muon-only data and implement only the combined measurements.

This is one point on which I don't agree completely, for reasons related to the interpretation of systematic uncertainties, which can become more ambiguous (especially w.r.t. correlations) in the combined case, as I explained above. Theoretical predictions remain the same either way, therefore I recommend implementing both the muon cross sections and the combined cross sections in the commondata framework.

* One cannot fit at the same time 1D and 2D distributions (same underlying dataset), so I would keep them separate.

This is another point on which I (partly) disagree. Our commondata implementation is flexible enough to host multiple observables for the same data set. In other words: there is one data set, which incorporates both the 1D and the 2D distributions. But they are two mutually exclusive observables (within the same data set), of course, because we don't know the correlations between them. We can elegantly implement them in a single data set and call only a subset of observables (1D or 2D) in our fit runcard. I have listed the preferred clustering above.
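For illustration, selecting one of the mutually exclusive observables in a fit runcard would then look schematically like this (the dataset names are hypothetical placeholders built from the clustering above, not final names):

dataset_inputs:
  # fit the 1D combined observable; the 2D one would replace it, never join it
  - {dataset: ATLAS_WJ_13TEV_MTW}          # hypothetical name
# - {dataset: ATLAS_WJ_13TEV_ABSETA-MTW}   # mutually exclusive 2D observable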

@juanrojochacon

sure @enocera I meant separated as in a different file, but we can keep them as subsets of the same dataset, as we do for many other datasets. So I agree with your remarks

@enocera (Contributor) commented Oct 23, 2025

sure @enocera I meant separated as in a different file, but we can keep them as subsets of the same dataset, as we do for many other datasets. So I agree with your remarks

OK, we are on the same page, then.

@enocera (Contributor) commented Oct 28, 2025

Dear @ecole41, I (finally!) had the chance to look at your implementation of the data set. I would say that most of it is very nicely done. I suggest using your implementation as the baseline w.r.t. that of @jekoorn. I have some suggestions about the treatment of uncertainties, though.

  • muon channel (1D and 2D distributions). The label of the luminosity uncertainty should be changed from ATLASLUMI15 to ATLASLUMIRUNII (sorry, my bad). I think that the uncertainties with labels Data stat. unc., Sig. stat. unc., Bkg. stat. unc., Alternative MC unf. unc. have to be treated as ADD UNCORR. I say this because, reading Sects. 7.2-7.3 of the paper, I understand that unc. in these labels stands for "uncorrelated" (and not for "uncertainty"). They indeed say that there are statistical uncorrelated components in the systematic uncertainties related to the muon trigger, identification, vertex association and isolation efficiency, and that the MC uncertainties are uncorrelated. I would treat all the other uncertainties as MULT CORR, including the normalisation uncertainties that are currently defined as UNCORR (why did you choose this? shouldn't a normalisation uncertainty be correlated across bins by definition?).
  • combined lepton channel (1D and 2D distributions). The label of the luminosity uncertainty should be changed from ATLASLUMI15 to ATLASLUMIRUNII (sorry, my bad). I would treat the Alternative MC unf. unc. and the Basic unf. unc. as ADD UNCORR. I would treat all the other uncertainties as ADD CORR (ADD because they are obtained by rotating back to the physical basis a set of uncertainties determined in the orthogonal basis). I see that you treat some of them as UNCORR, but I'm not able to understand whether this is correct just by reading the paper. Do you have any other source of information? Can you please clarify your choice? Thanks! (A sketch of how these treatments might be encoded is given after this list.)
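For concreteness, the suggestions for the muon channel might be encoded as follows (a sketch assuming the usual commondata uncertainties.yaml layout; the entry keys are illustrative and only a few sources are shown):

definitions:
  stat:
    description: Data stat. unc.
    treatment: ADD
    type: UNCORR
  alt_mc_unf:
    description: Alternative MC unf. unc.
    treatment: ADD
    type: UNCORR
  muon_id:
    description: muon identification efficiency syst.
    treatment: MULT
    type: CORR
  lumi:
    description: luminosity uncertainty
    treatment: MULT
    type: ATLASLUMIRUNII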

@jekoorn (Contributor) commented Oct 31, 2025

Dear @ecole41, @enocera, and @juanrojochacon, I have updated my implementation following Emanuele's request in #2380, and cross-checked with Ella's numbers, which should all match perfectly.
I suppose we can now move to FK-table generation.

@juanrojochacon

Great! I understand that your numbers and those from Ella are identical?

@juanrojochacon

If so, then yes: while @enocera completes his review I would start with the grid generation.

@juanrojochacon

For the NNLO grid implementation, as we agreed I would suggest that @ecole41 and @jekoorn proceed in parallel with the implementation of the PineFarm cards etc., produce a low-stats grid with NNLOJET, and check that they get consistent numbers. Then, for the final high-stats grids, we only need to do it once.

@juanrojochacon

at least this is the plan we made with @enocera and @scarlehoff at Morimondo, and I still think it is a good idea which saves time in the long run.

@jekoorn (Contributor) commented Oct 31, 2025

Great! I understand that your numbers and those from Ella are identical?

Whereas I initially thought yes, it seems there is some deviation in the numbers for the muon-only double-differential set. Interestingly, the other double-differential set, which is generated using the same function, does seem to be correct.
I will investigate whether something funny is going on in my code, and compare with the HEPData tables.

@juanrojochacon

ok, this is precisely why benchmarks are useful ;)

With the help of the benchmark, it should be possible to understand where the problem is.

Then we move to the NNLO grid generation

@jekoorn (Contributor) commented Nov 4, 2025

Hi all, I have looked a bit closer at the difference between my numbers and Ella's (why some were swapped around).

To be more precise, it seems that in both my implementation and your implementation of the DDIF sets (I checked the data and kinematics tables), the data file has the following structure in terms of the HEPData tables:

lep_physical_plus_absetamtw_mtw0
lep_physical_plus_absetamtw_mtw1
lep_physical_plus_absetamtw_mtw2
lep_physical_plus_absetamtw_mtw3
lep_physical_plus_absetamtw_mtw4
lep_physical_minus_absetamtw_mtw0
lep_physical_minus_absetamtw_mtw1
lep_physical_minus_absetamtw_mtw2
lep_physical_minus_absetamtw_mtw3
lep_physical_minus_absetamtw_mtw4

But for the muon data, from what I understand, your filter swaps them around in the following way:

muo_plus_absetamtw_mtw0
muo_minus_absetamtw_mtw0
muo_plus_absetamtw_mtw1
muo_minus_absetamtw_mtw1
muo_plus_absetamtw_mtw2
muo_minus_absetamtw_mtw2
muo_plus_absetamtw_mtw3
muo_minus_absetamtw_mtw3
muo_plus_absetamtw_mtw4
muo_minus_absetamtw_mtw4

which makes sense given this line in your code

    elif observable == "WPWM_DDIF_LEP":
        tables = []
        for i in range(5):
            tables.append(f"lep_physical_plus_absetamtw_mtw{i}")
        for i in range(5):
            tables.append(f"lep_physical_minus_absetamtw_mtw{i}")
    elif observable == "WPWM_DDIF_MUON":
        tables = []
        for i in range(5):
            tables.append(f"muo_plus_absetamtw_mtw{i}")
            tables.append(f"muo_minus_absetamtw_mtw{i}")

where you append the data to your tables either "first all plus then all minus" or "alternating plus/minus".

But in the kinematics file for DDIF MUON they are not swapped around and instead follow the structure of "first all plus, then all minus". I assume that we would like to do the former and have first all plus tables, and then all minus; a sketch of the fix is below.
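For reference, a minimal sketch of the fix (my guess at the change, assuming the surrounding filter logic stays as quoted above):

    elif observable == "WPWM_DDIF_MUON":
        tables = []
        # match the LEP ordering: first all plus tables, then all minus
        for i in range(5):
            tables.append(f"muo_plus_absetamtw_mtw{i}")
        for i in range(5):
            tables.append(f"muo_minus_absetamtw_mtw{i}")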

So the numbers are correct in the end, just misaligned. I guess this is an easy fix ;-)

Nice! Then we are set.

@ecole41 (Collaborator Author) commented Nov 6, 2025

@jekoorn Thanks for figuring that out. I have adjusted my filter.py script so that the structure should now match the one in your PR. Let me know if there are still any inconsistencies.

I have also changed the uncertainty treatments to match @enocera's suggestions. I just wanted to check that Stat. unc. and Uncor. syst. unc. should also be treated as ADD UNCORR?

@enocera (Contributor) commented Nov 6, 2025

I have also changed the uncertainty treatments to match @enocera's suggestions. I just wanted to check that Stat. unc. and Uncor. syst. unc. should also be treated as ADD UNCORR?

Yes, thanks.

@juanrojochacon

Thanks for the work @ecole41 @jekoorn great that we are converging here.

Question for @enocera: the next step is to generate NNLO grids and compare them with the implemented data. Will work related to the NNLO grid calculation be discussed in this PR, or should Ella and Jelle open a separate one?

@enocera (Contributor) commented Nov 6, 2025

Question for @enocera: the next step is to generate NNLO grids and compare them with the implemented data. Will work related to the NNLO grid calculation be discussed in this PR, or should Ella and Jelle open a separate one?

@juanrojochacon Grids will be generated with NNLOjet. I understand that production has been automatised as much as possible, relying on the information contained in the commondata. According to our established workflow I expect:

Discussion can occur in either PR, though ideally it should be cross-referenced.

@juanrojochacon

ok clear @enocera. Yes, indeed, grid production should be automated with pinefarm, but as usual the proof is in the pudding.

I suggest that @ecole41 and @jekoorn independently try to generate the NNLO grid and then cross-check each other's results. Once the low-stats grid is produced, we can proceed to the high-stats grid generation and then produce the FK tables etc.

@scarlehoff (Member):

@enocera looking closely at this dataset, this is a W+J at Leading Order (which is a few orders of magnitude more expensive to compute than just W).

So perhaps we want to do a first NLO check before moving to NNLO?

description: Combined Born-level single-differential cross-section in the $l^+$ and $l^-$ channels in sequence
label: ATLAS $W$ 13 TeV Differential
units: 'pb/GeV'
process_type: DY_CC_PT
@ecole41 (Collaborator Author):

I am not sure whether this process type is correct; I have chosen DY_CC_PT because the description given in process_options.py matches DY W + j.

@scarlehoff (Member):

It is correct, but in process_options.py you might need to add a condition in order to compute the x-Q kinematic coverage from the transverse mass instead of from pT.
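Just to illustrate the kind of condition meant here (a sketch only: the function name and interface are assumptions, not the actual helpers in process_options.py):

import numpy as np

def _mtw_xq2map(m_tw, sqrts):
    """Rough (x, Q2) coverage from the W transverse mass instead of pT.

    Assumes Q ~ m_T^W and central production, i.e. x ~ Q / sqrts;
    both choices are illustrative, not taken from process_options.py.
    """
    q2 = m_tw**2
    x = np.sqrt(q2) / sqrts
    return x, q2

# e.g. a bin midpoint at m_T^W = 80 GeV at 13 TeV gives x ~ 6e-3, Q2 = 6400 GeV^2
x, q2 = _mtw_xq2map(np.array([80.0]), 13000.0)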

Btw, please use WJ instead of WPWMJ, see https://docs.nnpdf.science/data/dataset-naming-convention.html#nnpdf-s-dataset-naming-convention

@ecole41 (Collaborator Author):

Ok, the reason I used WPWMJ was that this process would be handled correctly when generating pinecards, as it splits the observable into WP and WM. But I will just alter the pinefarm interface to also treat WJ like this.

@scarlehoff (Member):

Make sure to use this branch https://github.com/NNPDF/pinefarm/pull/107/files#diff-22d68a6023c028591ce66c2a6240ac2df65408ccd1736fb8f41c4e2bb038b389 and push the changes there directly.

It's what I was using for NNPDF/pinecards#187

@enocera (Contributor) commented Nov 12, 2025

So perhaps we want to do a first NLO check before moving to NNLO?

Of course. The idea is to first make a cheap run (perhaps you can even limit statistics a little?)
