fix(workflow-engine): Handle non-numeric DataCondition comparisons in combined-rules serializer#111743
fix(workflow-engine): Handle non-numeric DataCondition comparisons in combined-rules serializer#111743
Conversation
… combined-rules serializer
| # We need: for a given detector's DCGs + priority level → matching DCG IDs | ||
| # NOTE: Assumes DataConditions are limited to what would be dual written. | ||
| dcg_comparison_pairs: dict[int, set[int]] = defaultdict(set) | ||
| dcg_comparison_pairs: dict[int, set[int | float]] = defaultdict(set) |
There was a problem hiding this comment.
Unrelated to this PR - should we try to simplify the logic here and split out 3 different DataCondition validators? (Detector Triggers, Workflow Triggers, and Workflow Actions). Since we invoke them separately for each case already, might be easy to adopt as well.
That would allow us to have more specific typing / validation, for example detector triggers could always have a DetectorPriorityLevel as condition_result.
There was a problem hiding this comment.
We really need some simplification here, though I'm not sure splitting out different serializers helps us. We use this in exact one place to generate triggers, but since the relevant data is not really found on the DataCondition, it ends up doing lots of weird stuff and post-annotation to get the proper data in place.
I'm gonna look at a simplification pass next week, because I very much dislike trying to wade through this.
| # Only collect numeric comparison values; non-numeric values (e.g. dicts | ||
| # from anomaly detection conditions) don't match condition_result levels. |
There was a problem hiding this comment.
Won't this create a problem where we're silently skipping single written anomaly detection data conditions? Maybe that's a problem for later if we're only concerned with dual written ones for now, but if it is we should make a ticket to address later.
There was a problem hiding this comment.
This is a general problem here, though not entirely related to this change.
This is trying to use comparison value to later join the workflow actions with the detector trigger thresholds, but new data doesn't need to be structured that way.
By adding our check here, we don't lose any relevant DCGs, but this general join approach (and he layout of our target encoding) don't really work for a variety of situations, being incomplete at best.
Created ISWF-2329 to track.
The WorkflowEngineDataConditionSerializer assumed all DataCondition comparison values in workflow DCGs were numeric, which holds for dual-written rules but not for single-written rules where anomaly detection conditions can store dict comparisons. Filter to numeric values only when building the comparison lookup.
Fixes SENTRY-5MHW.