Remove libNNPDF::LHAPDFSet for a vp-based class #1433
Conversation
The difference in speed is not relevant for most cases; I thought "well, 1 second vs 2 seconds, who cares", only this difference together with the difference in convolution adds up enough to be a pain. I'll see what I can do. For starters, doing the convolution and then the central value separately every time is probably adding a factor of two, so I'll change it so the convolution is done just once (no more computing the central value separately, see the sketch below), play a bit more with the code, and probably modify it substantially.
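A minimal sketch of the single-convolution idea, assuming an fktable of shape ndata × nx and PDF members stored as columns; all names and shapes here are illustrative, not the actual vp code:

```python
import numpy as np

# Stack the central member together with the replicas so that one fktable
# product covers both, instead of convolving the 100 replicas and then the
# central value in a second pass (the factor-of-two mentioned above).
rng = np.random.default_rng(0)
fktable = rng.random((20, 50))         # (ndata, nx)
central = rng.random((50, 1))          # central member
replicas = rng.random((50, 100))       # 100 MC replicas

grid = np.hstack([central, replicas])  # (nx, 101)
predictions = fktable @ grid           # one pass: column 0 is the CV
```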
After a few benchmarks I'm actually happy with the speed of the new implementation.
I wonder whether we want to keep this C++ compatibility layer here, or instead remove the one remaining use case, which is the t0 predictions.
I haven't looked at the t0 code, so I don't know how many other things would need to change as well for that to be "extracted" from C++.
I don't think there is that much to do in regards to t0: we already have the t0covmat in pure Python (https://docs.nnpdf.science/vp/pydataobjs.html?highlight=dataset_inputs_covmat_from_systematics#loading-covariance-matrices), so it would be a matter of putting it in the right place. That said, I am coming to think that doing it in two steps might be good, on the grounds that one part is already done and it would result in easier-to-review PRs.
For the missing test, I think I will need to re-do the "next exponent runcard" with this branch because of the move from float32 to float64.
Zaharid left a comment:
Would like to comment on this.
```python
# At this point only the result of the ThPredictions is a vp object
# which includes a special CV column which needs to be pop'd
try:
    central_value = dataobj.pop("CV")
```
Maybe mutating the object like this will be dangerous/confusing
This is probably due to my own limitations with pandas, so I'll write here "what I want" and let's see whether there's a way of doing it (/cc @Zaharid).

Right now, when the predictions come from libNNPDF, they are an object that contains a "central value" and "data", and these are two different quantities.

In vp instead the predictions are just a pandas dataframe, from which one cannot simply get the cv, because the central value for MC PDFs is not the same as the central value for Hessian PDFs; so my solution has been to add a new CV column with the "true" central value.

The problem is that I don't want that extra column to be part of the "data" of the prediction.

Is that clear / is there a solution to this issue?
An important constraint I need to include here is that the computation of the central value and the envelope has to happen at the same time (it is faster to do fktable x 101 than fktable x 100 and later fktable x 1), so at some point the CV will have to be included in the dataframe.

The way I'm doing that is more or less visible in the example in the PDF class.

Maybe (probably) the solution is to make a copy of the dataframe at this point, so that it doesn't get mutated but is still separated into the CV and the envelope of replicas.
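A minimal sketch of the non-mutating split suggested above: `pop` removes the CV column from the shared dataframe in place, which is the mutation being warned about, while `drop` returns a copy and leaves the original predictions untouched. Column names are illustrative:

```python
import numpy as np
import pandas as pd

# Predictions with the "true" central value as an extra CV column,
# as described in the comments above.
preds = pd.DataFrame(
    np.random.default_rng(0).random((20, 101)),
    columns=["CV"] + [f"rep{i}" for i in range(1, 101)],
)

central_value = preds["CV"]
members = preds.drop(columns="CV")  # copy; `preds` keeps its CV column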
Sorry, I have been slow here. In general I believe we should be doing something simpler. I would note that all that this code is supposed to replace, from the POV of vp, is the libNNPDF LHAPDFSet class. For that, adding an interface with a stateful context manager looks a bit much; for one, an explicit parameter to the relevant functions would seem simpler. Another consideration is that...
I think these are different problems, however:

Sadly, for now it needs to be done like that, since the libNNPDF and vp versions have to coexist.

I agree that the whole context manager thing is horrible (not because of the context manager, but because of the mutability). But as I said, I'm not entirely sure how to do better without changing many different things. Most importantly, at the point where the central value is needed (nnpdf/validphys2/src/validphys/convolution.py, line 293 in cd64fbc), either I pass some variation of include_central across many different levels, or I use a context manager that activates the central value for any function calling the PDF inside it.

To me it was the simplest solution, but I will be happy with any other ideas, as long as they are more or less concrete. I know this is not the best, but it was the simplest. We can discuss during the code meeting if this was not clear enough.
This I didn't understand. Personally, I'm not unhappy with the class itself.
So the final proposal is: I will close this, because it is probably easier to do a new one. Also, I would want to first do t0 in Python and then the LHAPDF set.
Wait, isn't the Python t0 code here? nnpdf/validphys2/src/validphys/covmats.py, line 243 in 3519070
But libNNPDF is still used, so I'm not sure how many steps are missing on the Python side (as said before, I hadn't looked at the C++ code for t0; that is still true :P).
What part uses libNNPDF? AFAICT this is using the Python CommonData objects as well as the covmat generation code in Python. The only difference is that it multiplies the MULT uncertainties by the t0 central prediction, which is also computed using the Python convolution.
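A hedged sketch of the t0 treatment described above (function and parameter names are illustrative): the multiplicative (MULT) uncertainties, given as percentages, are rescaled by the t0 central predictions instead of the experimental central values before the covariance matrix is assembled.

```python
import numpy as np

def t0_mult_uncertainties(mult_percent, t0_predictions):
    # mult_percent: (ndata, nsys) relative systematics in percent
    # t0_predictions: (ndata,) central predictions from the t0 PDF set
    return mult_percent / 100 * t0_predictions[:, np.newaxis]
```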
Here, for instance, libNNPDF is used: nnpdf/validphys2/src/validphys/covmats.py, line 488 in 3519070
But these are the old functions, right? In principle you don't need them.
I don't know whether they are used somewhere, but they do exist. I can start by removing them; if nothing breaks, that's the easiest solution for me, of course.
@siranipour What do you mean by that? They are used and tested for. Maybe they are all "zombies", but I haven't used the internals of the vp covmat module enough to feel comfortable removing big chunks of code... especially if they make all tests fail. The "offending" functions are:

nnpdf/validphys2/src/validphys/covmats.py, line 476 in d68954e
nnpdf/validphys2/src/validphys/covmats.py, line 510 in d68954e

(the last one, given the TODO on top, I guess can be removed)
Turns out it wasn't such a low-hanging fruit, but closes #1420

Some caveats (the reasons why this is still a draft):

- Compatibility with the remaining libNNPDF consumers goes through an as_libNNPDF() call.
- grid_values in Python is not as slow as I initially thought it would be. I just tried the most naive loop-nesting (see the sketch below) and it was actually reasonable, but I need to actually benchmark it. If it happens to be really slow for most use cases I have a few solutions in mind that will work for us, but maybe it is not even necessary.

RE the current failure in the tests: it is due to using float64; when using vp-nextfitruncard there is a difference in the 4th digit. Reasonable. Maybe more failures will occur as I move forward, though.
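A hedged sketch of the "most naive loop-nesting" mentioned above, using LHAPDF's xfxQ(pid, x, q) member lookup; the function name and the output shape (members, flavours, x, q) are illustrative, not the actual vp code:

```python
import numpy as np

def naive_grid_values(members, flavours, xgrid, qgrid):
    # Evaluate every PDF member at every (flavour, x, Q) point, one call
    # at a time; a vectorised version would collapse the inner loops.
    res = np.empty((len(members), len(flavours), len(xgrid), len(qgrid)))
    for irep, member in enumerate(members):
        for ifl, fl in enumerate(flavours):
            for ix, x in enumerate(xgrid):
                for iq, q in enumerate(qgrid):
                    res[irep, ifl, ix, iq] = member.xfxQ(fl, x, q)
    return res
```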
TODO:

- pineappl

Despite being marked for review, it cannot be merged until the example resources are updated.