Skip to content

Conversation

@ischoegl
Copy link
Member

@ischoegl ischoegl commented Aug 13, 2019

Pickle support for Solution objects would facilitate multiprocessing applications, where each thread/process is working with a separate copy of a Solution object. The approach is comparable to constructing the object from an input file plus setting phase and state vector.

Closes Cantera/enhancements#10

The PR illustrates a rudimentary implementation (XML/CTI support only), which should not be considered for adoption (at least not in its present form).now replaced by a permanent solution using YAML serialization

It allows for the following scenario (obviously, multiprocessing applications are more relevant):

import cantera as ct
import pickle

gas = ct.Solution('h2o2.cti', 'ohmech-multi')
gas.TPX = 500, 500000, 'H2:.75, O2:.25'
pickle.dump(gas, open('gas.pkl', 'wb'))

Recreating the object in a separate session via

import pickle
gas = pickle.load(open('gas.pkl','rb'))
gas()

yields:

 ohmech-multi:

       temperature             500  K
          pressure          500000  Pa
           density         1.14397  kg/m^3
  mean mol. weight          9.5115  amu

                          1 kg            1 kmol
                       -----------      ------------
          enthalpy      6.2484e+05        5.943e+06     J
   internal energy      1.8776e+05        1.786e+06     J
           entropy           16391        1.559e+05     J/K
    Gibbs function     -7.5708e+06       -7.201e+07     J
 heat capacity c_p          3127.2        2.974e+04     J/K
 heat capacity c_v            2253        2.143e+04     J/K

                           X                 Y          Chem. Pot. / RT
                     -------------     ------------     ------------
                H2           0.75         0.158965         -14.8055
                O2           0.25         0.841035           -24.87
     [   +7 minor]              0                0

The current implementation for pickling/unpickling makes use of the static XML tree that is retained after creation of a Solution from CTI/XML files. A more useful (but significantly more involved) implementation would generate YAML strings dynamically.

Changes proposed in this pull request:

  • N/A: at least at this point, the PR serves for illustration purposes only

Known issues:

  • The current implementation will fail if the Solution object is changed after instantiation (e.g. species are added, etc.)
  • Unpickling requires that an empty constructor, i.e. _SolutionBase() is allowed. While this works fine for pickle.load, an attempt for gas = Solution() results in a segmentation fault. ... fixed

@speth
Copy link
Member

speth commented Sep 28, 2019

As I mentioned in the thread on #570, implementation of YAML-based serialization is planned, which will work for objects regardless of how they were created and how they have been manipulated.

With the introduction of the YAML format, deprecation and removal of the CTI and XML formats is also planned, so I wouldn't recommend putting any significant effort into features based around these formats. Beyond that, I'm not sure whether there's a reason to keep this PR open.

@ischoegl
Copy link
Member Author

ischoegl commented Sep 28, 2019

Agreed. I simply wanted to supply an example for pickling (which was surprisingly simple as the XML tree was available - 40ish lines of code!). As I wrote

The PR illustrates a rudimentary implementation (XML/CTI support only), which should not be considered for adoption (at least not in its present form).

I’d still like to keep this PR open until the core of the code can be switched over to YAML. Ultimately, I hope that this feature can be implemented. See also #696, which i believe is a necessary step in that direction.

@ischoegl ischoegl changed the title WIP: investigate pickle serialization of Solution objects [stub] investigate pickle serialization of Solution objects Sep 28, 2019
@ischoegl ischoegl force-pushed the investigate-pickle-support branch from 6e94366 to ab1ee88 Compare October 24, 2019 03:14
@ischoegl ischoegl force-pushed the investigate-pickle-support branch from ab1ee88 to 12df33a Compare December 7, 2019 01:28
@ischoegl ischoegl force-pushed the investigate-pickle-support branch from 12df33a to e7116fa Compare January 10, 2020 02:30
@ischoegl ischoegl force-pushed the investigate-pickle-support branch from e7116fa to b3cebac Compare January 28, 2020 15:11
@ischoegl ischoegl marked this pull request as draft May 29, 2020 23:19
@speth speth changed the base branch from master to main June 30, 2020 23:13
@ischoegl ischoegl force-pushed the investigate-pickle-support branch 2 times, most recently from 45f1a81 to 49a99af Compare September 3, 2020 02:21
@ischoegl ischoegl force-pushed the investigate-pickle-support branch from 49a99af to 2647667 Compare February 9, 2021 01:21
@ischoegl
Copy link
Member Author

ischoegl commented Mar 9, 2021

Linking #984, which will create the basis for a permanent implementation.

@ischoegl ischoegl force-pushed the investigate-pickle-support branch from 2647667 to 013e03c Compare April 15, 2021 15:57
@ischoegl ischoegl force-pushed the investigate-pickle-support branch from 3b56eb4 to 147f49a Compare April 16, 2021 17:40
@ischoegl ischoegl changed the title [stub] investigate pickle serialization of Solution objects Pickle serialization of Solution objects Apr 16, 2021
@ischoegl
Copy link
Member Author

ischoegl commented Apr 16, 2021

🎉 ... unsurprisingly, thanks to @speth's recent addition of #984, this became even simpler.

PS: Enabling pickling of SolutionArray looks quite feasible, but I'd rather wait for comments at the moment.

@ischoegl ischoegl marked this pull request as ready for review April 16, 2021 17:54
@ischoegl ischoegl requested a review from speth April 16, 2021 17:54
@tsikes
Copy link
Contributor

tsikes commented Apr 16, 2021

This would be a wonderful addition. I ran into the problem of not being able to do this when I first wanted to use multiprocessing to speed up calculations for optimization. I know that others have had the same issue based on the Cantera google group. The current method of using Cantera in a multiprocessing application is rather unintuitive and finicky. This PR would greatly simplify this task for other users.

@ischoegl ischoegl marked this pull request as draft June 10, 2021 15:31
@ischoegl
Copy link
Member Author

ischoegl commented Jun 10, 2021

Kicking this back to draft - the implementation depends on the ability to create a Solution that is unconfigured, so it can be overwritten by __setstate__. What is implemented thus far works fine for the pickle case, but deactivates errors for calls of Solution() without arguments. Having the _SolutionBase in a safe state (i.e. no Segfaults) will take some more work.

PS: I probably won't work on this unless I hear back from one of the @Cantera/committers, i.e. comments are explicitly welcome.

@speth
Copy link
Member

speth commented Jun 14, 2021

I think it should be possible to instantiate an object of the ThermoPhase base class if no other information is available to instantiate the phase, like we do with the Kinetics and Transport objects. In that case, you should just get a phase with no species where most of the methods raise NotImplementedError, which would probably be fine.

I just tried using this in a modified version of the multiprocessing example, and it seems to work better than I had initially expected. If you use itertools.repeat to pass the Solution object as part of the iterable used by Pool.starmap for example, the default chunking behavior results in the Solution being pickled and unpickled 4 times for each worker process, which is not as bad as I had feared (I thought it might end up doing it once for each iteration). So given the easier user experience, I'll withdraw my earlier objection (Cantera/enhancements#10) to implementing this feature at all.

I'm glad the implementation needed here for standalone Solution objects is pretty straightforward. I'm not sure what's going to happen when you try to serialize a an Interface phase where you have to correctly link the neighboring phases.

@ischoegl
Copy link
Member Author

ischoegl commented Jun 14, 2021

I think it should be possible to instantiate an object of the ThermoPhase base class if no other information is available to instantiate the phase, like we do with the Kinetics and Transport objects.

@speth Could you clarify? None of these are currently instantiable without arguments. E.g. (using main)

In [3]: kin = ct.Kinetics()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-6f9f3674fb0c> in <module>()
----> 1 kin = ct.Kinetics()

/src/interfaces/cython/cantera/base.pyx in cantera._cantera._SolutionBase.__cinit__()
     64             self._init_parts(thermo, species, kinetics, adjacent, reactions)
     65         else:
---> 66             raise ValueError("Arguments are insufficient to define a phase")
     67 
     68         # Initialization of transport is deferred to Transport.__init__

ValueError: Arguments are insufficient to define a phase

As far as I can tell, all of these are limitations of the implementation of the Python interface, and not necessarily about the underlying C++ structure. The crux here is that half of the time, the required Cxx* references remain uninitialized, which leads to segfaults left and right.

@speth
Copy link
Member

speth commented Jun 14, 2021

Sorry for the confusion. What I meant was setting the thermo attribute of a _SolutionBase object to point to an instance of the base class that has been instantiated, e.g. the equivalent of new ThermoPhase(). This would be similar to how we never leave the _SolutionBase.kinetics or _SolutionBase.transport members as null pointers after initialization (at least for objects that derive from the corresponding Python classes), but have them point to objects of these base classes.

Probably the easiest way to do this would be to add this as a case for ThermoFactory, the same way that using KineticsFactory to create an object with the model none just instantiates the base class, at which point the existing _init_parts method would just about work (although I think we'd still need to require species input except when the thermo model is none).

@speth speth removed their request for review June 16, 2021 03:07
@ischoegl
Copy link
Member Author

ischoegl commented Dec 8, 2021

@speth ... interesting indeed. Could you post the code you used? (Edit: I may have overanalyzed the 'serialized' here. If it's just accessing the restored object I can probably take it from here)

@speth
Copy link
Member

speth commented Dec 8, 2021

Yes, that's all I did:

gas0 = ct.Solution('gri30.yaml')
with open('test.pkl', 'wb') as out:
    pickle.dump(gas0, out)

with open('test.pkl', 'rb') as infile:
    gas1 = pickle.load(infile)

print(gas1.thermo_model)
print(gas1.kinetics_model)
print(gas1.transport_model)  # segfaults here

@ischoegl ischoegl force-pushed the investigate-pickle-support branch from ddcef88 to 2be4f40 Compare December 9, 2021 01:03
@ischoegl
Copy link
Member Author

ischoegl commented Dec 9, 2021

@speth ... fixed. The bug was hiding in plain sight ... one of the derived class constructor that Python calls (Transport.__init__) usually takes care of setting the C++ transport object. That this doesn't happen here is obvious in hindsight ... it's probably a good idea to consolidate all those calls in base.pyx once CTI/XML are gone.

I also updated the unit test to ensure that the model is properly set (including "None", which is not the default).

@ischoegl ischoegl requested a review from speth December 9, 2021 01:45
Copy link
Member

@speth speth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @ischoegl. I'm glad to see this working based on YAML serialization. I had just a few comments on the implementation.

@ischoegl ischoegl force-pushed the investigate-pickle-support branch 2 times, most recently from c91d0d2 to 7e3a516 Compare December 14, 2021 16:00
Also, catch edge cases that provide insufficient information for
phase instantiation.
@ischoegl ischoegl force-pushed the investigate-pickle-support branch from 7e3a516 to 4758541 Compare December 14, 2021 16:23
@ischoegl ischoegl force-pushed the investigate-pickle-support branch from 4758541 to e3a7e28 Compare December 14, 2021 16:43
@ischoegl
Copy link
Member Author

@speth … thank you for the careful review! I believe I took care of everything. I ended up disabling Kinetics() and Transport() as I don’t see a current use case (it was surprisingly hard to catch these due to the __slots__). I also clarified some of the edge cases you pointed out.

@ischoegl ischoegl requested a review from speth December 15, 2021 00:52
Copy link
Member

@speth speth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. This looks good to me.

@speth speth merged commit 50144ab into Cantera:main Dec 15, 2021
@ischoegl ischoegl deleted the investigate-pickle-support branch December 15, 2021 03:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support of pickle serialization of Cython objects in python API

3 participants