Module: String::Similarity
- Defined in: lib/string/similarity.rb, lib/string/similarity/version.rb
Overview
String::Similarity provides various methods for calculating string distances.
Constant Summary
- VERSION = '2.1.0'
  Gem version
Class Method Summary
- .cosine(str1, str2, ngram: 1) ⇒ Float
  Calculate the Cosine similarity of two strings.
- .levenshtein(str1, str2) ⇒ Float
  Calculate the Levenshtein similarity for two strings.
- .levenshtein_distance(str1, str2) ⇒ Fixnum
  Calculate the Levenshtein distance of two strings.
Class Method Details
.cosine(str1, str2, ngram: 1) ⇒ Float
Calculate the Cosine similarity of two strings.
For an explanation of the Cosine similarity of two strings, read this excellent Stack Overflow answer.
# File 'lib/string/similarity.rb', line 20
def self.cosine(str1, str2, ngram: 1)
  raise ArgumentError.new('ngram should be >= 1') if ngram < 1
  return 1.0 if str1 == str2
  return 0.0 if str1.empty? || str2.empty?

  # convert both texts to vectors
  v1 = vector(str1, ngram)
  v2 = vector(str2, ngram)

  # calculate the dot product
  dot_product = dot(v1, v2)

  # calculate the magnitude
  magnitude = mag(v1.values) * mag(v2.values)
  dot_product / magnitude
end
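The listing above relies on the helpers `vector`, `dot` and `mag`, whose source is not shown on this page. The following is a minimal standalone sketch of the same idea, assuming the vectors are counts of character n-grams. The names `ngram_vector`, `dot_product`, `magnitude` and `cosine_sketch` are hypothetical stand-ins, not the gem's API, and the zero-denominator guard is an addition of this sketch.

```ruby
# Hypothetical stand-in for the gem's `vector` helper (assumption:
# it counts character n-grams). Returns a Hash of n-gram => count.
def ngram_vector(str, ngram)
  counts = Hash.new(0)
  # each_cons yields every run of `ngram` consecutive characters
  str.each_char.each_cons(ngram) { |gram| counts[gram.join] += 1 }
  counts
end

# Sum of count products over the n-grams of v1 (absent ones count as 0 in v2).
def dot_product(v1, v2)
  v1.sum { |gram, count| count * v2.fetch(gram, 0) }
end

# Euclidean length of a list of counts.
def magnitude(values)
  Math.sqrt(values.sum { |x| x * x })
end

def cosine_sketch(str1, str2, ngram: 1)
  raise ArgumentError, 'ngram should be >= 1' if ngram < 1
  return 1.0 if str1 == str2
  return 0.0 if str1.empty? || str2.empty?

  v1 = ngram_vector(str1, ngram)
  v2 = ngram_vector(str2, ngram)

  denominator = magnitude(v1.values) * magnitude(v2.values)
  # Guard added for this sketch: a string shorter than `ngram`
  # yields an empty vector and therefore a zero magnitude.
  return 0.0 if denominator.zero?

  dot_product(v1, v2) / denominator
end

puts cosine_sketch('abc', 'abc')            # identical strings short-circuit to 1.0
puts cosine_sketch('abc', 'xyz')            # no shared characters: 0.0
puts cosine_sketch('ab', 'ba', ngram: 2)    # no shared bigrams: 0.0
```

With `ngram: 1` the measure ignores character order, so anagrams such as 'ab' and 'ba' score approximately 1.0; larger `ngram` values make the comparison order-sensitive.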
.levenshtein(str1, str2) ⇒ Float
Calculate the Levenshtein similarity for two strings.
This is the reciprocal of levenshtein_distance, i.e.
1 / levenshtein_distance(str1, str2)
Identical strings return 1.0, and 0.0 is returned if either string is empty.
# File 'lib/string/similarity.rb', line 49
def self.levenshtein(str1, str2)
  return 1.0 if str1.eql?(str2)
  return 0.0 if str1.empty? || str2.empty?
  1.0 / levenshtein_distance(str1, str2)
end
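To make the reciprocal mapping concrete, the sketch below applies it to a few already-computed edit distances. `similarity_from_distance` is a hypothetical helper written for this illustration; it is not part of the gem and does not cover the empty-string case, which the listing above handles separately by returning 0.0.

```ruby
# Hypothetical helper: maps an edit distance to the reciprocal score
# described above. A distance of 0 stands for identical strings.
def similarity_from_distance(distance)
  return 1.0 if distance.zero?

  1.0 / distance
end

[0, 1, 2, 4].each do |distance|
  puts "distance #{distance} -> similarity #{similarity_from_distance(distance)}"
end
```

Note that, under this formula, a distance of 1 also maps to 1.0, so strings that differ by a single edit receive the same score as identical strings.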
.levenshtein_distance(str1, str2) ⇒ Fixnum
Calculate the Levenshtein distance of two strings.
# File 'lib/string/similarity.rb', line 62
def self.levenshtein_distance(str1, str2)
  # base cases
  result = base_case?(str1, str2)
  return result if result

  # Initialize cost-matrix rows
  previous = (0..str2.length).to_a
  current = []

  (0...str1.length).each do |i|
    # first element is always the edit distance from an empty string.
    current[0] = i + 1
    (0...str2.length).each do |j|
      current[j + 1] = [
        # insertion
        current[j] + 1,
        # deletion
        previous[j + 1] + 1,
        # substitution or no operation
        previous[j] + (str1[i].eql?(str2[j]) ? 0 : 1)
      ].min
    end
    previous = current.dup
  end

  current[str2.length]
end
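The two-row technique in the listing above can be tried out with the standalone sketch below. `edit_distance` is a hypothetical name, and the explicit base cases are an assumption, since the gem's `base_case?` helper is not shown on this page.

```ruby
# Standalone sketch of the two-row Levenshtein computation.
def edit_distance(str1, str2)
  # Assumed base cases (the gem's base_case? helper is not shown here).
  return 0 if str1 == str2
  return str2.length if str1.empty?
  return str1.length if str2.empty?

  # Row for the empty prefix of str1: j insertions produce str2[0...j].
  previous = (0..str2.length).to_a

  str1.each_char.with_index do |char1, i|
    # First cell: deleting i + 1 characters yields the empty string.
    current = [i + 1]
    str2.each_char.with_index do |char2, j|
      current << [
        current[j] + 1,                         # insertion
        previous[j + 1] + 1,                    # deletion
        previous[j] + (char1 == char2 ? 0 : 1)  # substitution or match
      ].min
    end
    previous = current
  end

  # The last cell of the final row is the distance between the full strings.
  previous.last
end

puts edit_distance('kitten', 'sitting') # => 3
puts edit_distance('flaw', 'lawn')      # => 2
```

Only two rows of the cost matrix are kept at a time, so memory use grows with the length of `str2` rather than with the product of both lengths.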