The following returns an exact match:
irb(main):164:0> Text::WhiteSimilarity.new.similarity("John F Kennedy", "John A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Kennedy")
1.0
I would not expect it to match exactly.
This returns NaN:
irb(main):165:0> Text::WhiteSimilarity.new.similarity("C J", "C J")
NaN
I am expecting this to return a 1.0.
The issue is in the private method word_letter_pairs of Text::WhiteSimilarity, which assumes that every word parsed from the input string is at least two characters long. For a one-character word the range 0 ... (word.length - 1) is empty, so the word contributes no letter pairs and is silently dropped from the comparison.
One possible fix is to check for single-character words and treat the word itself as its only pair:
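To make the failure mode concrete, here is a minimal standalone sketch of the pair extraction without the length check. The method name letter_pairs_without_guard is hypothetical, and the body is inferred from the refactor proposed in this issue rather than copied from the gem.

```ruby
# Hypothetical standalone helper: mirrors the pair extraction described in
# this issue, minus the single-character check. Not the gem's own code.
def letter_pairs_without_guard(str)
  str.upcase.split(/\s+/).flat_map do |word|
    # For a one-character word this range is 0...0, which is empty,
    # so the word produces no pairs at all.
    (0...(word.length - 1)).map { |i| word[i, 2] }
  end
end

p letter_pairs_without_guard("John F Kennedy") # the "F" contributes nothing
p letter_pairs_without_guard("C J")            # no pairs at all
```

If the score is computed as a ratio over the number of pairs (an assumption about the rest of the class), an input with no pairs leads to 0.0 / 0, which is NaN in Ruby. That would explain the second result above, and the dropped single letters would explain the first.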
def word_letter_pairs(str)
  @word_letter_pairs[str] ||=
    str.upcase.split(/\s+/).map { |word|
      if word.length == 1
        [word]
      else
        (0 ... (word.length - 1)).map { |i| word[i, 2] }
      end
    }.flatten.freeze
end
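The change can be tried without the gem using a small stand-in class. This is only a sketch: PatchedWhiteSimilarity is a hypothetical name, and its similarity method assumes a Dice-style score (twice the number of shared pairs divided by the total number of pairs), which matches the results reported above but is not taken from the gem's source. Only word_letter_pairs follows the refactor proposed in this issue.

```ruby
# Hypothetical stand-in for the patched class; not the gem's implementation.
class PatchedWhiteSimilarity
  def initialize
    @word_letter_pairs = {}
  end

  # Assumed scoring: 2 * shared pairs / total pairs in both strings.
  def similarity(str1, str2)
    pairs1 = word_letter_pairs(str1)
    pairs2 = word_letter_pairs(str2).dup # unfrozen copy we can consume
    total = pairs1.length + pairs2.length

    shared = 0
    pairs1.each do |pair|
      index = pairs2.index(pair)
      next unless index

      shared += 1
      pairs2.delete_at(index) # each pair in str2 can be matched only once
    end

    (2.0 * shared) / total
  end

  private

  # Proposed refactor: a one-character word counts as its own "pair".
  def word_letter_pairs(str)
    @word_letter_pairs[str] ||=
      str.upcase.split(/\s+/).flat_map { |word|
        word.length == 1 ? [word] : (0...(word.length - 1)).map { |i| word[i, 2] }
      }.freeze
  end
end

sim = PatchedWhiteSimilarity.new
puts sim.similarity("C J", "C J") # 1.0
puts sim.similarity(
  "John F Kennedy",
  "John A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Kennedy"
) # 20.0 / 45, roughly 0.44
```

Under these assumptions, "C J" compared with itself scores 1.0, and the Kennedy example no longer counts as an exact match. Input that contains no words at all (for example, an empty string) would still produce NaN in this sketch, since the total number of pairs is zero.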
I am using version 1.3.1.