Fuzzy text search, part 1: algorithm & searching shortcuts #32634

Draft
juli27 wants to merge 1 commit into musescore:master from juli27:addFuzzySearchPart1

@juli27 juli27 commented Mar 15, 2026

Part of: #15983

This PR implements a reusable fuzzy search proxy model and adds fuzzy search support to the shortcuts list in preferences.

Overview of the Algorithm

  1. Split the user input into words (words are assumed to be separated by whitespace)
  2. Try to find a fuzzy match of each individual word within every candidate item. A word has an error budget of 1 when it is 2 characters or longer. The budget grows by one additional error for every 8 characters of the word.
  3. Sort the resulting items by match similarity (the number of errors between the user input and the item). Items with the same number of errors are sorted further by whether the match spans a whole word or starts at the beginning of a word in the item.

An error is a missing character, an extra character, a substituted character, or a pair of swapped adjacent characters.
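Steps 1 and 2 can be sketched as follows. This is only an illustration, not the code in this PR: `splitWords` and `errorBudget` are hypothetical names, and the budget formula is one possible reading of the rule above (one error from 2 characters on, plus one more per 8 full characters).

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Step 1: split the user input on whitespace. Repeated spaces are skipped.
std::vector<std::string> splitWords(const std::string& query)
{
    std::istringstream stream(query);
    std::vector<std::string> words;
    std::string word;
    while (stream >> word) {
        words.push_back(word);
    }
    return words;
}

// Step 2 (assumed reading): no errors for a single character,
// one error from 2 characters on, plus one per 8 full characters.
std::size_t errorBudget(std::size_t wordLength)
{
    const std::size_t base = wordLength >= 2 ? 1 : 0;
    return base + wordLength / 8;
}
```

Under this reading a 3-letter word such as "tie" may contain one error and an 8-letter word two.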

The implemented fuzzy search algorithm is an extended version of Sellers' variant of the Wagner-Fischer algorithm, which adapts it to approximate substring search. (Wagner-Fischer is the algorithm used in our implementation of muse::string::levenshteinDistance.) The extension also allows transposition errors, as in the Damerau-Levenshtein distance.
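A minimal sketch of the idea, not the code in this PR: Sellers' variant differs from plain Wagner-Fischer in that the first row of the table is all zeros, so a match may start anywhere in the item, and the result is the minimum of the last row. The transposition case below uses the restricted (optimal string alignment) form of Damerau-Levenshtein. Whether the PR uses that exact form is an assumption. `bestMatchErrors` is a hypothetical name, and the comparison is byte-wise and case-sensitive for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Smallest number of errors between `pattern` and any substring of `text`.
std::size_t bestMatchErrors(const std::string& pattern, const std::string& text)
{
    const std::size_t m = pattern.size();
    const std::size_t n = text.size();

    // d[i][j]: errors for pattern[0..i) against the best substring of text ending at j.
    // Row 0 stays 0 (Sellers): a match may start at any position in the text.
    std::vector<std::vector<std::size_t>> d(m + 1, std::vector<std::size_t>(n + 1, 0));
    for (std::size_t i = 1; i <= m; ++i) {
        d[i][0] = i;
    }

    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            const std::size_t cost = pattern[i - 1] == text[j - 1] ? 0 : 1;
            std::size_t best = std::min({ d[i - 1][j] + 1,          // pattern character has no counterpart
                                          d[i][j - 1] + 1,          // text has an extra character
                                          d[i - 1][j - 1] + cost }); // same or substituted character
            // Two swapped adjacent characters count as a single error.
            if (i > 1 && j > 1
                && pattern[i - 1] == text[j - 2]
                && pattern[i - 2] == text[j - 1]) {
                best = std::min(best, d[i - 2][j - 2] + 1);
            }
            d[i][j] = best;
        }
    }

    // The match may end anywhere in the text, so take the minimum of the last row.
    return *std::min_element(d[m].begin(), d[m].end());
}
```

A word would then count as matched when `bestMatchErrors(word, item)` does not exceed the word's error budget. The full table is kept here for readability; only the last three rows are needed at any time.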

TO-DO before this PR is ready for review:

  • Improve scoring function (prioritize full-word and start-of-word matches, etc.)
  • Cache scores in model
  • Fix findBestMatch returning wrong startPos
  • Move algorithm to the global module and add unit tests
  • Revise search key of shortcut model
  • Optional: highlight matching substrings in search results

  • I signed the CLA
  • The title of the PR describes the problem it addresses
  • Each commit's message describes its purpose and effects, and references the issue it resolves
  • If changes are extensive, there is a sequence of easily reviewable commits
  • The code in the PR follows the coding rules
  • There are no unnecessary changes
  • The code compiles and runs on my machine, preferably after each commit individually
  • I created a unit test or vtest to verify the changes I made (if applicable)

avvvvve commented Mar 18, 2026

This is awesome to see getting some attention! Noting a couple things I noticed below that maybe are already on your radar:

Multi-word search

In a two-word search term, results are returned matching either word, not both. This isn't necessarily wrong but felt a little unexpected. For example...

Searching for 'lock' returns 4 results:
image

Searching for 'lock system' returns way more results, including many that include 'system' but not 'lock':
image

Single word returns too many matches

Search 'tie'. Full matches appear at the top (good), followed by some results containing 'ti' (good), and then further toward the bottom of the list there are many matches that don't appear to even have two of the same consecutive characters from 'tie'.

Screen.Recording.2026-03-18.at.1.26.28.PM.mov

Similar search for 'color'. There are just a lot more results than I would expect. So perhaps it's a bit too fuzzy at the moment.

Screen.Recording.2026-03-18.at.1.28.58.PM.mov

juli27 force-pushed the addFuzzySearchPart1 branch 2 times, most recently from b70402b to beccb94 (March 24, 2026 18:13)
juli27 force-pushed the addFuzzySearchPart1 branch from beccb94 to d3a3a5f (March 24, 2026 18:13)

juli27 commented Mar 24, 2026

Pushed an update with the following changes:

  • Reduced the number of allowed errors from 50% of the user input length to a more sensible limit
  • All words in a search query now need to be successfully matched instead of any single one of them
  • Improved scoring function (prioritize full-word and start-of-word matches)
  • Count transposition errors (swapped characters) as one error instead of two
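The ordering described for the scoring function could look roughly like this. It is a sketch under assumptions, not the PR's scoring code: `RankedItem`, `ranksBefore` and `rankItems` are hypothetical names, and ranking whole-word matches ahead of start-of-word matches is one plausible reading of "prioritize full-word and start-of-word matches".

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical per-item result of the fuzzy match.
struct RankedItem {
    std::string title;
    int errors;        // number of errors in the best match
    bool wholeWord;    // the match spans a whole word of the item
    bool startsAtWord; // the match begins at the start of a word
};

// Fewer errors first; ties are broken by whole-word, then start-of-word matches.
bool ranksBefore(const RankedItem& a, const RankedItem& b)
{
    // The flags are negated so that `true` sorts first in ascending tuple order.
    return std::make_tuple(a.errors, !a.wholeWord, !a.startsAtWord)
           < std::make_tuple(b.errors, !b.wholeWord, !b.startsAtWord);
}

void rankItems(std::vector<RankedItem>& items)
{
    // stable_sort keeps the model's original order for items that rank equally.
    std::stable_sort(items.begin(), items.end(), ranksBefore);
}
```

With this comparator an exact whole-word hit is listed before a hit at the start of a longer word, which in turn is listed before a hit in the middle of a word, and every item with one error comes after all items with none.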

Multi-word search

[...]

This is fixed now.
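For illustration, the difference between the old "any word" and the new "all words" behaviour can be sketched like this. The names `WordMatcher`, `matchesAllWords` and `matchesAnyWord` are hypothetical, and the per-word check is passed in as a callback so the sketch does not depend on the actual fuzzy matcher.

```cpp
#include <algorithm>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-word check, e.g. a fuzzy match within the error budget.
using WordMatcher = std::function<bool(const std::string& word, const std::string& item)>;

static std::vector<std::string> queryWords(const std::string& query)
{
    std::istringstream stream(query);
    std::vector<std::string> words;
    for (std::string word; stream >> word;) {
        words.push_back(word);
    }
    return words;
}

// New behaviour: every word of the query has to match the item.
bool matchesAllWords(const std::string& query, const std::string& item, const WordMatcher& wordMatches)
{
    const std::vector<std::string> words = queryWords(query);
    return std::all_of(words.begin(), words.end(),
                       [&](const std::string& word) { return wordMatches(word, item); });
}

// Old behaviour: a single matching word is enough.
bool matchesAnyWord(const std::string& query, const std::string& item, const WordMatcher& wordMatches)
{
    const std::vector<std::string> words = queryWords(query);
    return std::any_of(words.begin(), words.end(),
                       [&](const std::string& word) { return wordMatches(word, item); });
}
```

Note that with `std::all_of` an empty query matches every item, which is the usual behaviour for an empty search field.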

Single word returns too many matches

[...]

This should be much better now, except for "tie" and other, similarly short words. It is quite difficult to narrow down the number of results without inhibiting the algorithm's ability to compensate for typing errors. In this specific case, "tie" contains three of the most frequent letters in English, which explains the number of matches. (The algorithm allows one error, so it matches against "tie", "ti", "te", "ie", "ite", "tei", or any word that has one additional letter at any position of the input.)

One thing we could do is use strict matching for words shorter than 4 characters. But this will cause more results to be shown once the user types a fourth letter.

Since the results are sorted by relevance anyway, should something like that be done to reduce the number of results?
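As a rough sketch of that proposal (not implemented in this PR), the budget could be cut to zero below 4 characters. `errorBudgetStrictShort` is a hypothetical name, and the formula for longer words is an assumed reading of the rule in the PR description.

```cpp
#include <cstddef>

// Proposed variant: words shorter than 4 characters must match exactly.
std::size_t errorBudgetStrictShort(std::size_t wordLength)
{
    if (wordLength < 4) {
        return 0; // strict matching for short words
    }
    // Assumed reading of the existing rule: one error, plus one per 8 full characters.
    return 1 + wordLength / 8;
}
```

With this variant "tie" only matches items that contain "tie" literally, while "ties" is allowed one error again, so the result list can grow when the fourth letter is typed.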
