-
Notifications
You must be signed in to change notification settings - Fork 74
Ingest rules optimization #3075
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Ingest rules optimization #3075
Conversation
| defp stringify(v), do: inspect(v) | ||
|
|
||
| # Groups all the rules associated with source by path in a tree | ||
| def build(rules) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be quite expensive, should benchmark this as well to see what kind of ips we are looking at.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also will need some docs + illustrative code examples on how the rules tree structure.
something to consider is also that Iteraptor has a to_flatmap function, which might be an easier to work with without having much nesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something to consider is also that Iteraptor has a to_flatmap function, which might be an easier to work with without having much nesting
The nesting is intended as an optimization, grouping all the paths in rules to ensure each key is accessed at most once. I've tried using iteraptor to generate this data structure via from_flatmap, but it tries to convert keys to existing atoms without an option to disable that behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok let me know if forking the repo to get that change in is needed, we can do that if it is simpler
| defp match_cond(true, rule_ids, acc), do: accumulate(rule_ids, acc) | ||
| defp match_cond(false, _rule_ids, acc), do: acc | ||
|
|
||
| defp accumulate([], acc), do: acc | ||
| defp accumulate([h | tail], acc), do: accumulate(tail, accumulate(h, acc)) | ||
|
|
||
| defp accumulate({rule_id, _filters_num}, acc) when is_map_key(acc, rule_id), | ||
| do: %{acc | rule_id => acc[rule_id] - 1} | ||
|
|
||
| defp accumulate({rule_id, filters_num}, acc), do: Map.put(acc, rule_id, filters_num - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not the easiest to read and understand, this module would need some docs for clarity and posterity.
| |> Enum.flat_map(fn | ||
| {id, 0} -> [id] | ||
| {_id, _matches_left} -> [] | ||
| end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of flatmapping to get the keys, matching_rule_ids/2 can return a tuple instead. might not even need to return the map, maybe just list of IDs would do, then we can avoid the iteration entirely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I had initially. The problem is that to match a rule, all FilterRules must match. I started by returning a list and calculating the number of matches, but it was slower for a large number of matches and roughly the same for a few matches.
A problematic case is the nested objects with a list, as one FilterRule may match multiple times, breaking my approach 😢 I will need to fix that next week, I'm thinking of using bit flags
| %{{:get, k} => acc} | ||
| end) | ||
| end | ||
| |> Enum.reduce(&deep_merge/2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: conversion to lists instead of maps cuts down memory usage and slightly increases performance
3952919 to
40d41d6
Compare
af9e3be to
08f82df
Compare
No description provided.