
@tayjohno commented Oct 6, 2025

List.Extra.uniqueBy turns out not to be very performant with very long lists. It iterates over every item in the list (O(n)), computes that item's key, and checks whether the key is already in a List of "known" keys using List.member. If it is, the item is NOT carried over to the final list; otherwise the key is added to the known list and the item is kept. List.member, however, is an O(n) operation itself, so the whole thing is essentially O(n^2). This seems to tip over around 10-20k rows, where it becomes noticeably slow. I was observing upwards of 40 seconds on my Mac for lists of 100k.
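To put a rough number on the quadratic cost: in the worst case, where every key is distinct, the k-th item's List.member check scans all k-1 keys seen so far. Counting only key comparisons (and ignoring the cost of the key function and of Elm's structural equality), the total is:

$$
\sum_{k=1}^{n} (k-1) = \frac{n(n-1)}{2}, \qquad n = 100{,}000 \;\Rightarrow\; \frac{100{,}000 \times 99{,}999}{2} = 4{,}999{,}950{,}000 \approx 5 \times 10^{9}
$$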

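My alternative to this implementation is pretty simple. In fact it's the exact same algorithm, just swapping out the List of known values and the List.member check for a Set of known values and a Set.member check. (This means the key function has to return a comparable, since Elm's Set can only hold comparable values.)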
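The PR diff itself isn't reproduced in this thread, so here is a minimal sketch of what the Set-based version could look like, assuming the key function returns a `comparable` (a requirement of elm/core's `Set`). It mirrors the old `uniqueHelp` line for line, with `List.member` and `::` on the known keys swapped for `Set.member` and `Set.insert`. The module name `UniqueBySet` is hypothetical and this is not necessarily the PR's actual code.

```elm
-- Hypothetical module name, for illustration only.
module UniqueBySet exposing (uniqueBy)

import Set exposing (Set)


{-| Keep the first element for each distinct key, preserving the original order.

Same loop as List.Extra.uniqueBy, but the keys seen so far live in a Set
instead of a List, so each membership check is O(log n) rather than O(n).
The key function must return a `comparable` because Set requires one.
-}
uniqueBy : (a -> comparable) -> List a -> List a
uniqueBy f list =
    uniqueHelp f Set.empty list []


uniqueHelp : (a -> comparable) -> Set comparable -> List a -> List a -> List a
uniqueHelp f existing remaining accumulator =
    case remaining of
        [] ->
            -- Items were prepended as we went, so restore the original order.
            List.reverse accumulator

        first :: rest ->
            let
                computedFirst =
                    f first
            in
            if Set.member computedFirst existing then
                -- Key already seen: drop this item.
                uniqueHelp f existing rest accumulator

            else
                -- New key: remember it and keep the item.
                uniqueHelp f (Set.insert computedFirst existing) rest (first :: accumulator)
```

For example, `uniqueBy String.length [ "a", "bb", "c", "dd", "eee" ]` keeps the first string of each length and returns `[ "a", "bb", "eee" ]`. Both recursive calls are in tail position, so, like the original helper, this should not grow the stack on long lists.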

Without going into the JavaScript implementation of core Elm, I'm guessing that Set membership is probably O(n log n).
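For reference, the elm/core documentation states that Set insert, remove, and query operations take O(log n) time (Set is built on Dict, which is a balanced red-black tree). So each individual lookup is logarithmic, and it is the n lookups together that cost O(n log n). A rough worst-case count for the same n = 100,000 distinct keys, treating each lookup as about log2 n comparisons (a simplifying assumption; the real tree depth differs by a constant factor):

$$
n \log_2 n \approx 100{,}000 \times 16.6 \approx 1.7 \times 10^{6} \qquad \text{vs.} \qquad \frac{n(n-1)}{2} \approx 5 \times 10^{9}
$$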

For comparison, here's the old implementation:

List.Extra.uniqueBy

```elm
-- https://github.com/elm-community/list-extra/blob/8.7.0/src/List/Extra.elm#L448
uniqueBy : (a -> b) -> List a -> List a
uniqueBy f list =
    uniqueHelp f [] list []

-- https://github.com/elm-community/list-extra/blob/8.7.0/src/List/Extra.elm#L474
uniqueHelp : (a -> b) -> List b -> List a -> List a -> List a
uniqueHelp f existing remaining accumulator =
    case remaining of
        [] ->
            List.reverse accumulator

        first :: rest ->
            let
                computedFirst =
                    f first
            in
            if List.member computedFirst existing then
                uniqueHelp f existing rest accumulator

            else
                uniqueHelp f (computedFirst :: existing) rest (first :: accumulator)
```

Note that both versions follow essentially the same algorithm; the only difference is the presence check, which here is a List.member on a List of known keys rather than a Set.member on a Set.

@tayjohno requested review from a team October 6, 2025 22:25

@wstrinz left a comment


Got a bit nerd-sniped by this, now that ChatGPT can help me think through asymptotics 😅 https://chatgpt.com/share/e/68e53e47-d1c0-8007-9e8c-8f50945ceed3

Turns out list-extra actually had a similarly performant version, but it was removed (elm-community/list-extra#151) because it required a comparable input and the new version benched OK... on small lists 🤦

Also Chat thinks the new version is actually O(n log n), which of course is an improvement and fine for our purposes, and apparently the best you can do without importing other libraries. But if we wanted to get it to O(n), we could potentially use a hash set with something like:

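```elm
-- Sketch only: assumes a third-party package exposing a `HashSet` module with
-- `empty`, `member`, and `insert`; no such module ships with elm/core.
import HashSet as HS


uniqueByHash : (a -> k) -> (k -> comparable) -> List a -> List a
uniqueByHash toKey hash =
    let
        -- Fold step: keep x only if its key hasn't been seen yet.
        step x ( seen, acc ) =
            let
                k =
                    toKey x
            in
            if HS.member k seen then
                ( seen, acc )

            else
                ( HS.insert k seen, x :: acc )
    in
    \xs -> xs |> List.foldl step ( HS.empty hash, [] ) |> Tuple.second |> List.reverse
```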

@wstrinz requested a review from a team October 7, 2025 16:29
@tayjohno (author) commented Oct 7, 2025

> Turns out list-extra actually had a similarly performant version, but it was removed (elm-community/list-extra#151) because it required a comparable input and the new version benched OK... on small lists 🤦

Oh no! 😭

> Also Chat thinks the new version is actually O(n log n), [...]

You're totally right @wstrinz / ChatGPT. I thought that was what I put, but I guess not. 😅

@stevedrip

Alright. Let's do it. 40% better performance on the range of 0-100 sized lists and being able to get uniqueness on any kind of element is worth the 30% drop in performance on >200 sized lists.

🤦 Elm... it drives me crazy when languages/libraries automatically opt you into the asymptotically worse, but usually faster, implementation.

I was hoping to find something like SortedSet for this use case, but I don't see anything compelling in Elm:
https://docs.oracle.com/javase/8/docs/api/java/util/SortedSet.html

@tayjohno merged commit 8cfb702 into master Oct 7, 2025
1 check passed
@tayjohno deleted the fix-much-select-performance branch October 7, 2025 19:52

4 participants