Refactor DataPreprocessor to steps-based pipeline, add CounterDiffTransformer and quickstart config generator #15
…ough the configuration file, by providing a steps specification instead of a long list of arguments. Extend DataPreprocessor tests with new steps parameter set up. Also ensures the default values of the DataTransformer classes are the same as the defaults in the DataPreprocessor.
…and update tests and test_data (test configuration files) accordingly
… to complement `features_to_exclude`. Update the basic configuration. Fix Docstring in fault_detection_model.py and move initialization of models in fault_detection_result.py here. Fix Docstring in arcana.py.
(cherry picked from commit a4061f8)
…aClipper features_to_exclude and features_to_clip)
…lear up the Hyperopt notebook.
… generate a configuration file.
afe3981 to 8679818
        return x_

    def inverse_transform(self, x: pd.DataFrame) -> pd.DataFrame:
        """No-op inverse transformation. Not defined for this class, returns the input as is.
I think it would be possible to define an inverse transformation by adding up the increments. Would inverse transform not be necessary for visualizing reconstructions and ARCANA-importances?
Counter resets and data gap masking lose information, which makes the simple sum of diffs inaccurate (but close?)
I have added a plotting function to plot those counter diffs/rates if needed: https://github.com/AEFDI/EnergyFaultDetector/blob/feature-counter-transformer/energy_fault_detector/utils/visualisation.py#L98
At some point we should probably restructure the visualisation module to make it easier to use... :)
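To make the point in this thread concrete, here is a minimal pure-Python sketch of why summing the increments only approximately inverts a counter-to-diff conversion. It is not the `CounterDiffTransformer` implementation: the helper names `counter_to_diffs` and `naive_inverse`, and the choice to mask a reset as `None`, are assumptions made for illustration.

```python
from itertools import accumulate


def counter_to_diffs(values):
    """Convert counter readings to increments (hypothetical helper).

    A negative raw difference is treated as a counter reset and masked
    as None, because the true increment across the reset is unknown.
    """
    diffs = [None]  # the first sample has no predecessor
    for prev, curr in zip(values, values[1:]):
        step = curr - prev
        diffs.append(step if step >= 0 else None)
    return diffs


def naive_inverse(diffs, start):
    """Rebuild a counter by summing increments; masked steps count as 0."""
    return list(accumulate((d or 0 for d in diffs[1:]), initial=start))


readings = [100, 105, 112, 2, 5, 9]  # the counter resets after 112
diffs = counter_to_diffs(readings)
rebuilt = naive_inverse(diffs, start=readings[0])

print(diffs)    # [None, 5, 7, None, 3, 4]
print(rebuilt)  # [100, 105, 112, 112, 115, 119]
```

The rebuilt series keeps the correct increments but not the original level after the reset (`2, 5, 9` becomes `112, 115, 119`), which is the information loss mentioned above. Masked gaps would distort the sum in the same way.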
…counter diffs/rates if they are present in the reconstruction
Summary
- Refactor `DataPreprocessor` to a flexible, steps-based pipeline. This makes it easier to add new steps and data transformers to the framework.
- Add `CounterDiffTransformer` for robust counter-to-diff/rate conversion (resets, rollovers, gap masking).
- Add `generate_quickstart_config` to create a ready-to-train config (and optional YAML dump).
- Add `features_to_select` to the `ColumnSelector` as an alternative to `features_to_exclude`.
- Add `features_to_clip` to the `DataClipper` as an alternative to `features_to_exclude`.

Changes
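As a rough illustration of the steps-based idea, the sketch below runs a list of step specifications in order over a small table. It is not the project's `DataPreprocessor`: the step names (`column_selector`, `clipper`), the `lower`/`upper` parameters, the `STEP_REGISTRY` lookup, and the rule that only one of `features_to_select` / `features_to_exclude` may be given are all assumptions for illustration. Only the parameter names `features_to_select`, `features_to_exclude` and `features_to_clip` come from this PR.

```python
from typing import Dict, List

Table = Dict[str, List[float]]  # column name -> values (stand-in for a DataFrame)


def column_selector(data, features_to_select=None, features_to_exclude=None):
    """Keep only the selected columns, or drop the excluded ones."""
    if features_to_select is not None and features_to_exclude is not None:
        # Assumed rule for this sketch: the two options are alternatives.
        raise ValueError("Specify only one of features_to_select / features_to_exclude")
    if features_to_select is not None:
        return {c: v for c, v in data.items() if c in features_to_select}
    excluded = set(features_to_exclude or [])
    return {c: v for c, v in data.items() if c not in excluded}


def clipper(data, lower, upper, features_to_clip=None):
    """Clip values to [lower, upper]; all columns if features_to_clip is None."""
    targets = set(data) if features_to_clip is None else set(features_to_clip)
    return {
        c: [min(max(x, lower), upper) for x in v] if c in targets else list(v)
        for c, v in data.items()
    }


# Hypothetical registry mapping step names to callables.
STEP_REGISTRY = {"column_selector": column_selector, "clipper": clipper}


def run_pipeline(data: Table, steps: List[dict]) -> Table:
    """Apply each step in order, passing its params as keyword arguments."""
    for step in steps:
        func = STEP_REGISTRY[step["name"]]
        data = func(data, **step.get("params", {}))
    return data


steps = [
    {"name": "column_selector", "params": {"features_to_select": ["power", "wind_speed"]}},
    {"name": "clipper", "params": {"lower": 0.0, "upper": 100.0, "features_to_clip": ["power"]}},
]
raw = {
    "power": [-5.0, 50.0, 120.0],
    "wind_speed": [3.0, 150.0, 8.0],
    "status": [1.0, 1.0, 0.0],
}
result = run_pipeline(raw, steps)
print(result)  # {'power': [0.0, 50.0, 100.0], 'wind_speed': [3.0, 150.0, 8.0]}
```

Adding a new transformer in this sketch means registering one more callable and listing it in `steps`, rather than threading extra arguments through a single constructor.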
Example config:
CounterDiffTransformer (new)
Quickstart config generator (new)
Documentation