-
Notifications
You must be signed in to change notification settings - Fork 9
MAINT: get orsopy working in 3.12. Closes #115 #116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
HI @andyfaff , What do we gain from using dataclass at all, if we need a custom init function for bascially every class? Anyway, I'll have a look at it and will also check if it influences functionality anywhere. |
|
Unfortunately the stdlib dataclasses (from
However, in cp312 the stdlib
This means that our It may be possible to make a shim for this particular use case, but it's a fragile solution. I don't think we should be depending on our knowledge of the internals of dataclasses, or on private functions/attributes, because it's inherently subject to change outside our control. Every time a new Python version comes out we have to hope that nothing has changed (e.g. look at So, how is a solution reached? The first solution could be to vendor the This PR takes a different route. Instead of using an e.g. for In this way we can add extra kwds without any issues. The
It's basically the same |
| """ | ||
| ** NOTE ** | ||
| This script is extremely sensitive to the version of pydantic that is installed. | ||
| It works with pydantic-1.10.13 and lower, but not with pydantic >= 2.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the bump to pydantic 2.0 is basically a complete rewrite of pydantic. They broke a bunch of things that used to make dataclasses similar in functionality to pydantic BaseModel. I think pydantic.dataclasses are not the top priority for them so they won't be brought back up to feature-parity with 1.x for a little while.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spoke too soon... looks like they have made everything work really well! I hadn't tried it for a while. All we have to do is
from pydantic import TypeAdapter
...
schema.update(TypeAdapter(Orso).json_schema())|
What about this as the init function for Header, inherited by all the subclasses? It seems to work. Then you don't have to define an def __init__(self, *args, **kwargs):
self._orso_optionals = []
own_fields = fields(self)
own_field_names = [f.name for f in own_fields]
field_index = 0
for a in args:
# field order is guaranteed
setattr(self, own_field_names[field_index], a)
field_index += 1
for (k, v) in kwargs.items():
setattr(self, k, v)
if not hasattr(self, 'comment'):
self.comment = kwargs.get('comment', None)
self.__post_init__()(I'm not sure if the special handling for comment is required) |
|
Update: this class Header:
"""
The super class for all of the items in the orso module.
"""
_orso_optionals: List[str] = []
def __init__(self, *args, **kwargs):
self._orso_optionals = []
own_fields = list(fields(self))
for a in args:
# field order is guaranteed
field = own_fields.pop(0)
setattr(self, field.name, a)
# process the rest of own_fields, in order:
for field in own_fields:
if field.name in kwargs:
field_value = kwargs.pop(field.name)
elif field.default_factory is not MISSING:
field_value = field.default_factory()
elif field.default is not MISSING:
field_value = field.default
else:
raise ValueError(f"missing positional argument \"{field.name}\" for class {self.__class__.__name__}")
setattr(self, field.name, field_value)
# process any remaining kwargs:
for (k, v) in kwargs.items():
setattr(self, k, v)
self.__post_init__()I am curious why extra kwargs are added as class attributes for every class except Orso? Why do we use the specially-created attribute Attached is a diff, that if applied to this PR (in its current state) makes it pass all the tests. |
I tried the patch. There are pros and cons. On the one hand it makes all the classes derived from There would seem to be a tradeoff here. Personally I would prefer to see the correct signature than remove the boilerplate. What are your thoughts?
I can't remember why it's called |
e.g. would give something like: which would be better than nothing |
|
I agree, the lack of meaningful signatures is not great. This problem we're trying to solve, creating classes where core attributes are strongly typed and documented, but where extra attributes can be added (and serialized, and deserialized!), is hard. |
|
The advantage of being able to cope with extra attributes can be important in versioning orsopy. For example, we could add new attributes in a later orsopy. Files created by that newer version should still be loadable by an older version that copes with extra attributes. I propose:
|
|
I would be against merging this PR, we should have an extended discussion about this as it is the basis for all orsopy classes. (And it might even break genx or other software the builds on orsopy by extending classes.) I have a few main objections with this solution:
For me there are two alternative routs that might be more practicle:
I'm sorry to slow you down at this point. I'm aware that this is just based on my opinion and if everyone disagrees you can obviously go ahead in this direction. |
|
I hope you have a good holiday. Given that you'll be away before I can reply in full, would the genx test suite pick up any issues if we were to go down this path? |
|
I think I agree with Artur that we should rethink our approach. It probably merits a longer discussion, when we're all back from the holidays. I think dataclasses provide some helpful features for defining a schema in python, but they are intentionally designed to not support arbitrary init kwargs, since it's kind of the point of a dataclass to define all your fields. I also don't understand why downstream users couldn't inherit from Header subclasses and add their own fields, instead of tacking attrs to existing classes. For the migration problem (supporting future changes to the schema), I think we'd be much better served by developing migration functions, which is a fairly standard way to handle schema changes. The likelihood that future changes to the ORSO standard structure would be purely additive or subtractive of fields is pretty small, and I think that is the only type of schema change we could try to accomodate with arbitrary kwargs. I know I am likely wrong about some of my concerns above, but my vote is to hold off on any refactor for now. There's not even e.g. a numba wheel for python 3.12 yet so I don't feel like there's a rush to have complete support for it at the moment. |
|
What do you think of a solution like this: (found on stackoverflow)... instead of trying to force extra kwargs into dataclass init functions (which is hard and fragile), how about adding a new method "from_kwargs" for instantiating one of our classes with extra attributes? https://stackoverflow.com/a/55101438 We could use that function in the deserializer when reading |
|
This could be a solution that could also be handled with inheritance for most cases without having to write a custom init for every single case. We will have to discuss how to treat the "comments" case. I'll have a look at the code a gain and will try some prototyping. I think after diving back into the code I can better weigh the various pros and cons, too. @andyfaff @bmaranville @arm61 |
|
I've made a test implementation, have a look at #120 |
|
Hi Artur, |
|
Resolved by #120 |
Adds a whole bunch of boilerplate in init functions, but would get orsopy working on 3.12.