-
Notifications
You must be signed in to change notification settings - Fork 205
Description
Describe the bug
After instantiating a HugeCTR model and attempting to use cudf after, encounter a segmentation fault.
The scenario we've encountered this is in two separate pytest test functions. The first is one that creates a HugeCTR model (which runs ok). The second is something else that uses cudf. This test passes in isolation. However, if the test involving HugeCTR is run first, the second test fails with a segmenataion fault.
Presumably there is some global state that needs to be cleaned up that is not automatically happening as a result of the model reference going out-of-scope.
Steps To Reproduce
- Instantiate a HugeCTR model and fit it on a dataset
- delete the model object (pytest cleanup)
- (in the same python session) use
cudf.DataFrame.from_pandas(x) - Segmentation fault encoutered in a
from_arrowfunction in cudf
import hugectr
import cudf
def test_hugectr_model():
model = hugectr.Model(...)
...
model.compile(...)
model.fit(...)
def test_something_else():
....
cudf.DataFrame.from_pandas(x)Expected behavior
No segmentation fault calling cudf in a context where the HugeCTR model should be cleaned up.
Additional context
I don't currently have an environment where I can reproduce this locally (#337 ), so the reproduce steps are potentially not the most minimal.
Examples of this segmentation fault can be found in the nvidia-merlin-bot comments on this PR NVIDIA-Merlin/systems#129