Add support and unit test for PyOD models#34709
Conversation
Additionally, it includes: - A minor fix for error messages in the `specifiable` module. - Support for scoring offline detectors on a subset of features.
0c1f2fc to
b838cee
Compare
| class PyODModelHandler(ModelHandler[beam.Row, | ||
| PredictionResult, | ||
| PyODBaseDetector]): | ||
| """Implementation of the ModelHandler interface for PyOD [#]_ Models. |
There was a problem hiding this comment.
Oh does it create a footnote?
There was a problem hiding this comment.
Yes. That's for footnote.
https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#footnotes
There was a problem hiding this comment.
| raise unittest.SkipTest('PyOD dependencies are not installed') | ||
|
|
||
|
|
||
| class PyODIForestTest(unittest.TestCase): |
There was a problem hiding this comment.
I think these tests are skipped because you need to add the dependencies to
Line 485 in 238233d
There was a problem hiding this comment.
Good catch! Let me add pyod there.
| from apache_beam.ml.inference.base import KeyedModelHandler | ||
| from apache_beam.ml.inference.base import ModelHandler | ||
| from apache_beam.ml.inference.base import PredictionResult | ||
| from apache_beam.ml.inference.base import _PostProcessingModelHandler |
There was a problem hiding this comment.
Should we remove underscores from these (e.g. _PostProcessingModelHandler -> PostProcessingModelHandler) since they are being imported?
There was a problem hiding this comment.
That one cannot be changed, because in the following code the class will have to be turned into specifiable so that we can create a spec (see https://github.com/apache/beam/blob/release-2.64/sdks/python/apache_beam/ml/anomaly/transforms_test.py#L336) for the whole model handler object without pickling it.
| seed = 1234 | ||
| model = IForest(random_state=seed) | ||
| model.fit(self.get_train_data()) | ||
| self.tmp_fn = os.path.join(self.tmp_dir, 'iforest_pickled') |
There was a problem hiding this comment.
Can we rename tmp_fn pickled_model_uri?
| assert self._offline_detector._features is not None | ||
| k, v = elem | ||
| row_dict = v._asdict() | ||
| return k, beam.Row(**{k: row_dict[k] for k in self._offline_detector._features}) # pylint: disable=line-too-long |
There was a problem hiding this comment.
Any reason why we cant split this to two lines? Also here it looks a bit strange to access _features (with underscore) on _offline_detector.
There was a problem hiding this comment.
Somehow the formatter does not split the original line. Let me see if this can be fixed.
There was a problem hiding this comment.
Adding a pair of brackets seems to make the formatter render it differently.
There was a problem hiding this comment.
Also here it looks a bit strange to access _features (with underscore) on _offline_detector.
Ack. I think it is fine as it is referred in the same package.
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
047dac4 to
8f1f965
Compare
|
@claudevdm: Tests are all good now. Could you please take another look? Thanks! |
|
Assigning reviewers. If you would like to opt out of this review, comment R: @tvalentyn for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
Additionally, it includes:
specifiablemodule.