Fixing uniqueness performance #264

tayjohno · 2025-10-06T22:16:33Z

List.Extra.uniqueBy apparently isn't very performant on very long lists. It iterates over every item in the list (O(n)) and, for each one, checks whether the computed value is already in a List of "known" values (List.member). If it is, the item is NOT carried over to the final list; if it isn't, the value is added to the known values and the item is kept. List.member, however, is an O(n) operation itself, so the whole thing ends up essentially O(n^2). This seems to tip over around 10-20k rows, where it becomes noticeably slow; I was observing upwards of 40 seconds on my Mac for lists of 100k.
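A rough count of the key comparisons (ignoring constant factors) makes the quadratic blow-up concrete. In the worst case every key is distinct, so the i-th item is compared against all i keys seen before it:

$$
C_{\text{list}} = \sum_{i=0}^{n-1} i = \frac{n(n-1)}{2}, \qquad n = 10^5 \;\Rightarrow\; C_{\text{list}} = 4\,999\,950\,000 \approx 5 \times 10^9
$$

More generally the cost is bounded by n·u, where u (a symbol introduced here) is the number of distinct keys, so long lists with only a handful of distinct keys stay fast.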

My alternative to this implementation is pretty simple; in fact, it's the exact same algorithm, just swapping out the List of known values and the List.member check for a Set of known values and a Set.member check.

Without digging into elm/core's implementation, I'm guessing that Set membership is probably O(log n).
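A minimal sketch of that swap, assuming the key function returns a `comparable` value (elm/core's Set can only hold comparable values). The names `uniqueBySet` and `uniqueBySetHelp` are hypothetical, and this is not necessarily the exact code merged in this PR. elm/core documents Dict (which backs Set) as O(log n) for insert and query, so each step here is O(log n):

```elm
module UniqueBySet exposing (uniqueBySet)

import Set exposing (Set)


{-| Hypothetical Set-based uniqueBy: keeps the first item for each key,
preserving input order. Requires the key to be comparable so it can live
in a Set.
-}
uniqueBySet : (a -> comparable) -> List a -> List a
uniqueBySet toKey list =
    uniqueBySetHelp toKey Set.empty list []


{-| Same loop as List.Extra's uniqueHelp, but `seen` is a Set, so the
presence check is Set.member (O(log n)) instead of List.member (O(n)).
-}
uniqueBySetHelp : (a -> comparable) -> Set comparable -> List a -> List a -> List a
uniqueBySetHelp toKey seen remaining acc =
    case remaining of
        [] ->
            -- Items were consed onto acc, so flip them back to input order.
            List.reverse acc

        first :: rest ->
            let
                key =
                    toKey first
            in
            if Set.member key seen then
                -- Duplicate key: drop this item.
                uniqueBySetHelp toKey seen rest acc

            else
                -- New key: remember it and keep the item.
                uniqueBySetHelp toKey (Set.insert key seen) rest (first :: acc)
```

The trade-off is the `comparable` constraint: the List-based version accepts any key type `b`, while this one needs a key that Set can order.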

For comparison, here's the old implementation:

List.Extra.uniqueBy

```elm
-- https://github.com/elm-community/list-extra/blob/8.7.0/src/List/Extra.elm#L448
uniqueBy : (a -> b) -> List a -> List a
uniqueBy f list =
    uniqueHelp f [] list []

-- https://github.com/elm-community/list-extra/blob/8.7.0/src/List/Extra.elm#L474
uniqueHelp : (a -> b) -> List b -> List a -> List a -> List a
uniqueHelp f existing remaining accumulator =
    case remaining of
        [] ->
            List.reverse accumulator

        first :: rest ->
            let
                computedFirst =
                    f first
            in
            if List.member computedFirst existing then
                uniqueHelp f existing rest accumulator

            else
                uniqueHelp f (computedFirst :: existing) rest (first :: accumulator)
```

Note that both follow essentially the same algorithm; the only difference is that here the presence check is a List.member.

wstrinz

Got a bit nerd sniped by this now that ChatGPT can help me think through asymptotics 😅 https://chatgpt.com/share/e/68e53e47-d1c0-8007-9e8c-8f50945ceed3

Turns out List.Extra actually had a similarly performant version, but it was removed (elm-community/list-extra#151) because it required a comparable input and the new version benched ok... on small lists 🤦

Also Chat thinks the new version is actually O(n log n), which of course is an improvement and fine for our purposes, and apparently the best you can do without importing other libraries. But if we wanted to get it to (expected) O(n), we could potentially use HashSets with something like:

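```elm
-- NOTE: HashSet is not part of elm/core. The module name and its API
-- (HS.empty taking a hash function, HS.member, HS.insert) are assumed here
-- for illustration; substitute whichever hash-set package you actually use.
import HashSet as HS


-- Keep the first element for each key, preserving input order.
uniqueByHash : (a -> k) -> (k -> comparable) -> List a -> List a
uniqueByHash toKey hash =
    let
        step x ( seen, acc ) =
            let
                k =
                    toKey x
            in
            if HS.member k seen then
                ( seen, acc )

            else
                ( HS.insert k seen, x :: acc )
    in
    \xs -> xs |> List.foldl step ( HS.empty hash, [] ) |> Tuple.second |> List.reverse
```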
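The same fold-shaped loop can be written today with only elm/core by swapping the hypothetical HashSet for Set. This is a sketch with a hypothetical name (`uniqueByProjection`). Because Set.member and Set.insert are O(log n), it stays O(n log n) rather than O(n), but the elements and the intermediate key `k` can still be any type, as long as the final projection is comparable:

```elm
module UniqueByProjection exposing (uniqueByProjection)

import Set


{-| Hypothetical helper with the same shape as the HashSet sketch, built on
elm/core's Set instead. `toKey` may return any type; `project` must turn
that key into something comparable so it can be stored in the Set.
-}
uniqueByProjection : (a -> k) -> (k -> comparable) -> List a -> List a
uniqueByProjection toKey project xs =
    let
        -- Fold state: keys seen so far, and kept items in reverse order.
        step x ( seen, acc ) =
            let
                key =
                    project (toKey x)
            in
            if Set.member key seen then
                ( seen, acc )

            else
                ( Set.insert key seen, x :: acc )
    in
    xs
        |> List.foldl step ( Set.empty, [] )
        |> Tuple.second
        |> List.reverse
```

For example, records (which are not comparable in Elm) can be deduplicated by a nested field with `uniqueByProjection .team .id`.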

tayjohno · 2025-10-07T16:41:47Z

> Turns out List.Extra actually had a similarly performant version, but it was removed (elm-community/list-extra#151) because it required a comparable input and the new version benched ok... on small lists 🤦

Oh no! 😭

> Also Chat thinks the new version is actually O(n log n), [...]

You're totally right @wstrinz / ChatGPT. I thought that was what I put, but I guess not. 😅

stevedrip · 2025-10-07T17:55:02Z

> Alright. Let's do it. 40% better performance on the range of 0-100 sized Lists and being able to get uniqueness on any kind of element is worth the 30% drop in performance on >200 sized Lists.

🤦 Elm... it drives me crazy when languages/libraries automatically opt you into the asymptotically worse but usually faster implementation.

I was hoping to find something like Java's SortedSet (https://docs.oracle.com/javase/8/docs/api/java/util/SortedSet.html) for this use case, but don't see anything compelling in Elm.
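For what it's worth, elm/core's Set is already an ordered set (Set.toList returns elements from lowest to highest), which is close to what SortedSet offers here. The catch for uniqueness is ordering: a plain round trip through Set removes duplicates in O(n log n) but returns them sorted, losing the input order that uniqueBy preserves. A sketch with a hypothetical name:

```elm
module DedupeSorted exposing (dedupeSorted)

import Set


{-| Hypothetical helper: removes duplicates by building a Set and reading it
back. The result comes out sorted ascending, NOT in original input order,
which is why uniqueBy instead walks the list with a separate "seen" set.
-}
dedupeSorted : List comparable -> List comparable
dedupeSorted =
    Set.fromList >> Set.toList
```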

Fixing uniqueness performance

1d688a9

tayjohno requested review from a team October 6, 2025 22:25

wstrinz approved these changes Oct 7, 2025

wstrinz requested a review from a team October 7, 2025 16:29

tayjohno merged commit 8cfb702 into master Oct 7, 2025
1 check passed

tayjohno deleted the fix-much-select-performance branch October 7, 2025 19:52

tayjohno mentioned this pull request Oct 7, 2025

Updating version to 0.18.4 #265

Merged

tayjohno mentioned this pull request Dec 12, 2025

Adding a demo dropdown with 100k items for load testing #261

Closed


4 participants
