Fixing uniqueness performance #264
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
List.Extra.uniqueByapparently is not very performant with very long lists. The issue here is that the way it works is by iterating over every item in the listO(n)and then for each one, it adds it to aListof "known" values. If that value already exists in the known values (List.member) then it will NOT carry it over to the final list.List.member, however, is aO(n)operation itself, so that leads to this essentially beingO(n^2). This seems to tip over around 10-20k rows where it becomes noticeably slow. I was observing upwards of 40 seconds on my mac for lists of 100k.My alternative to this implementation is pretty simple, in fact it's the exact same algorithm, just swapping out the
Listof known values andList.membercheck with aSetof known values and aSet.membercheck.Without going into the Javascript implementation of core Elm, I'm guessing that Set membership is probably
O(n log n).For comparison, here's the old implementation:
List.Extra.uniqueBy
Note that they essentially follow the same algorithm aside from the presence check being a
List.member