Fuzzy text search, part 1: algorithm & searching shortcuts #32634
juli27 wants to merge 1 commit into musescore:master
Pushed an update with the following changes:
This is fixed now.
This should be much better now, except for "tie" and other, similarly short words. It is quite difficult to narrow down the number of results without inhibiting the algorithm's ability to compensate for typing errors. In this specific case, "tie" contains three of the most frequent letters in English, which explains the number of matches. (The algorithm allows one error, so it matches against "tie", "ti", "te", "ie", "ite", "tei", or any word that has one additional letter at any position of the input.) One thing we could do is use strict matching for words shorter than 4 characters. But this would cause more results to be shown once the user types a fourth letter. Since the results are sorted by relevance anyway, should something like that be done to reduce the number of results?
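To make the trade-off concrete, here is a minimal sketch of the "strict matching for short words" idea. It is not the code in this PR: it compares whole strings rather than substrings, and `editDistance`, `allowedErrors`, `accepts`, and the `strictBelow` parameter are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Whole-string edit distance that counts a missing, excess, or different
// character, or a swap of two adjacent characters, as one error each
// (restricted Damerau-Levenshtein). Illustrative only; not the PR's code.
std::size_t editDistance(std::string_view a, std::string_view b)
{
    const std::size_t n = a.size();
    const std::size_t m = b.size();
    std::vector<std::vector<std::size_t>> d(n + 1, std::vector<std::size_t>(m + 1, 0));
    for (std::size_t i = 0; i <= n; ++i) {
        d[i][0] = i;
    }
    for (std::size_t j = 0; j <= m; ++j) {
        d[0][j] = j;
    }
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = std::min({ d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost });
            // Two adjacent characters swapped.
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);
            }
        }
    }
    return d[n][m];
}

// Hypothetical error budget: no errors allowed for queries shorter than
// `strictBelow` characters, one error otherwise.
std::size_t allowedErrors(std::size_t queryLength, std::size_t strictBelow = 4)
{
    return queryLength < strictBelow ? 0 : 1;
}

// A candidate is accepted if it is within the given error budget of the query.
bool accepts(std::string_view query, std::string_view candidate, std::size_t budget)
{
    return editDistance(query, candidate) <= budget;
}
```

With a budget of one error, `accepts("tie", "ti", 1)` and `accepts("tie", "tei", 1)` both hold. With `allowedErrors(3)` as the budget, only the exact word passes. The jump described above shows up at the fourth letter: `accepts("tied", "tie", allowedErrors(4))` holds again.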


Part of: #15983
This PR implements a reusable fuzzy search proxy model and adds fuzzy search support to the shortcuts list in preferences.
Overview of the Algorithm
An error is one of the following: a missing character, an excess character, a different (substituted) character, or two swapped adjacent characters.
The implemented fuzzy search algorithm is an extended version of Sellers' variant of the Wagner-Fischer algorithm for approximate string search (Wagner-Fischer is the algorithm used in our implementation of muse::string::levenshteinDistance). It is extended to also allow transposition errors (Damerau-Levenshtein distance).
TO-DO before this PR is ready for review:
- findBestMatch returning wrong startPos
- optional: Option to highlight matching substrings in search result
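As a rough illustration of the algorithm overview above, the following sketch shows the textbook form of Sellers' approximate substring search with an added transposition case. It is an assumption about the general technique, not the code in this PR: `FuzzyMatch` and `findBestFuzzyMatch` are hypothetical names, matching is case-sensitive, and only the end position of the best match is reported (recovering the start position needs an extra traceback, which is left out here).

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical result type for this sketch.
struct FuzzyMatch {
    std::size_t distance; // fewest errors over all substrings of the text
    std::size_t endPos;   // index one past the last text character of that match
};

FuzzyMatch findBestFuzzyMatch(std::string_view pattern, std::string_view text)
{
    const std::size_t m = pattern.size();
    const std::size_t n = text.size();

    // d[i][j]: fewest errors needed to align pattern[0..i) with some
    // substring of the text that ends at position j.
    std::vector<std::vector<std::size_t>> d(m + 1, std::vector<std::size_t>(n + 1, 0));

    // Sellers' change to Wagner-Fischer: row 0 stays all zeros, so a match
    // may start anywhere in the text at no cost. Column 0 is unchanged.
    for (std::size_t i = 0; i <= m; ++i) {
        d[i][0] = i;
    }

    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            const std::size_t cost = (pattern[i - 1] == text[j - 1]) ? 0 : 1;
            d[i][j] = std::min({ d[i - 1][j] + 1,          // pattern character missing in text
                                 d[i][j - 1] + 1,          // excess character in text
                                 d[i - 1][j - 1] + cost }); // same or different character
            // Transposition extension: two adjacent characters swapped.
            if (i > 1 && j > 1 && pattern[i - 1] == text[j - 2] && pattern[i - 2] == text[j - 1]) {
                d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);
            }
        }
    }

    // The last row holds the best distance for every possible end position;
    // keep the leftmost end position with the smallest distance.
    FuzzyMatch best{ d[m][0], 0 };
    for (std::size_t j = 1; j <= n; ++j) {
        if (d[m][j] < best.distance) {
            best = FuzzyMatch{ d[m][j], j };
        }
    }
    return best;
}
```

For example, `findBestFuzzyMatch("tie", "Add tie")` finds an exact match ending at position 7, while `findBestFuzzyMatch("abcd", "acbd")` reports a distance of 1 only because of the transposition case. A caller could then keep results whose distance is within the allowed error count and sort them by distance.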