Class: FuzzySet

Inherits:
Object
Defined in:
lib/fuzzy_set.rb,
lib/fuzzy_set/version.rb

Overview

FuzzySet implements a fuzzy-searchable set of strings.

As a set, it cannot contain duplicate elements.

Constant Summary

DEFAULT_OPTS =

Default options for creating new instances.

{
  all_matches: false,
  ngram_size_max: 3,
  ngram_size_min: 2
}
VERSION =
'1.1.0'


Constructor Details

#initialize(*items, **opts) ⇒ FuzzySet

Returns a new instance of FuzzySet.

Parameters:

  • items (#each, #to_s)

    item(s) to add

  • opts (Hash)

    options, see DEFAULT_OPTS

Options Hash (**opts):

  • :all_matches (Boolean)

    return all matches, even if an exact match is found

  • :ngram_size_max (Integer)

    upper limit for n-gram sizes

  • :ngram_size_min (Integer)

    lower limit for n-gram sizes
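The two ngram_size options bound the lengths of the character n-grams used for indexing. The gem's private gram calculation is not shown on this page, so the following is only a minimal sketch, assuming plain sliding windows over the string with no padding or character filtering; the helper name ngrams is hypothetical.

```ruby
# Hypothetical helper: every character n-gram of length min..max.
# Assumes plain sliding windows; the gem's own gram calculation may differ
# (for example in padding or in which characters it keeps).
def ngrams(str, min, max)
  (min..max).flat_map do |n|
    next [] if str.length < n # string too short for this size

    (0..str.length - n).map { |i| str[i, n] }
  end
end

p ngrams("ruby", 2, 3) # → ["ru", "ub", "by", "rub", "uby"]
```

With the defaults (2 and 3), a word contributes both its bigrams and its trigrams to the index.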



# File 'lib/fuzzy_set.rb', line 22

def initialize(*items, **opts)
  opts = DEFAULT_OPTS.merge(opts)

  @items = []
  @denormalize = {}
  @index = {}
  @all_matches = opts[:all_matches]
  @ngram_size_max = opts[:ngram_size_max]
  @ngram_size_min = opts[:ngram_size_min]

  add(items)
end
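#initialize resolves its options with DEFAULT_OPTS.merge(opts), so any key the caller passes overrides the default and every omitted key keeps its default value. A minimal sketch of that merge, using a local copy of the documented defaults and a hypothetical resolve_opts helper:

```ruby
# Local copy of the documented defaults, for illustration only.
DEFAULT_OPTS = {
  all_matches: false,
  ngram_size_max: 3,
  ngram_size_min: 2
}

# Hypothetical helper mirroring the first line of #initialize:
# caller-supplied keys win, omitted keys fall back to the defaults.
def resolve_opts(**opts)
  DEFAULT_OPTS.merge(opts)
end

p resolve_opts(ngram_size_max: 4)
```

Here the result has all_matches false, ngram_size_max 4 and ngram_size_min 2. Because Hash#merge does not validate keys, a misspelled option such as ngram_size: 4 is carried along but never read by #initialize, so it is silently ignored.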

Instance Method Details

#<<(item) ⇒ FuzzySet

Adds a single item to the set; equivalent to #add(item).

See Also:

  • #add



# File 'lib/fuzzy_set.rb', line 62

def <<(item)
  add(item)
end

#add(*items) ⇒ FuzzySet

Add one or more items to the set.

Each item is converted to a string with #to_s and indexed as it is added; items already in the set are skipped.

Parameters:

  • items (#each, #to_s)

    item(s) to add

Returns:

  • (FuzzySet)

    self, so calls can be chained.



# File 'lib/fuzzy_set.rb', line 49

def add(*items)
  items.flatten.each do |item|
    item = item.to_s
    # skip this duplicate only; the remaining items are still added
    next if @items.include?(item)

    id = _add(item)
    calculate_grams_for(normalize(item), id)
  end
  self
end
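#add converts every item with to_s, accepts nested arrays, skips strings that are already present, and returns self, which is also what #<< returns. The hypothetical TinySet class below is a minimal sketch of only that bookkeeping (no normalization or n-gram index). It uses next inside the block so that a duplicate skips just itself; a return there would leave #add at the first duplicate and drop every item after it.

```ruby
# Hypothetical stand-in for FuzzySet: only the bookkeeping of #add and #<<,
# without normalization or n-gram indexing.
class TinySet
  attr_reader :items

  def initialize
    @items = []
  end

  def add(*items)
    items.flatten.each do |item|
      item = item.to_s
      next if @items.include?(item) # skip just this duplicate

      @items << item
    end
    self # returning self allows chaining
  end

  def <<(item)
    add(item)
  end
end

set = TinySet.new
set.add("apple", ["pear", :plum]) << "apple" << 42
p set.items # → ["apple", "pear", "plum", "42"]
```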

#empty? ⇒ Boolean

Returns true if the set contains no items.

Returns:

  • (Boolean)

    true if the set contains no items.



# File 'lib/fuzzy_set.rb', line 100

def empty?
  @items.empty?
end

#exact_match(query) ⇒ String

Normalizes query and looks up an entry by its normalized value.

Parameters:

  • query (String)

    search query

Returns:

  • (String, nil)

    the matched (denormalized) value, or nil if there is no exact match



# File 'lib/fuzzy_set.rb', line 39

def exact_match(query)
  @denormalize[normalize(query)]
end
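#exact_match relies on a hash from normalized strings back to the strings as they were added. This page does not show #normalize, so the sketch below assumes, purely for illustration, that normalizing means stripping whitespace and downcasing; normalize is a hypothetical stand-in, and the local denormalize hash plays the role of @denormalize.

```ruby
# Hypothetical normalization for illustration only; the gem's private
# #normalize is not shown on this page and may differ.
def normalize(str)
  str.to_s.strip.downcase
end

# Plays the role of @denormalize: normalized key => string as added.
denormalize = {}
["New York", "Berlin"].each { |item| denormalize[normalize(item)] = item }

p denormalize[normalize("  new YORK ")] # → "New York"
p denormalize[normalize("Paris")]       # → nil
```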

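#get(query) ⇒ Array<String>

Fuzzy-finds the strings in the set that resemble query:

  1. Normalize query.

  2. Unless :all_matches is set, check for an exact match and, if one exists, return it as a one-element array.

  3. Find candidate matches based on n-grams.

  4. Sort the candidates by descending cosine similarity to query.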
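Steps 1 to 3 can be sketched with an inverted index from n-grams to item ids. This is a simplified, self-contained illustration rather than the gem's code: it assumes bigrams only, a downcase-only normalization, and hypothetical helpers bigrams and candidates. Ranking (step 4) is left out, so candidates come back unranked.

```ruby
# Hypothetical helper: overlapping two-character windows of a string.
def bigrams(str)
  (0..str.length - 2).map { |i| str[i, 2] }
end

# Hypothetical lookup covering steps 1-3 of #get (no ranking).
def candidates(query, items, denormalize, index, all_matches: false)
  query = query.downcase                      # 1. normalize (assumed)
  exact = denormalize[query]
  return [exact] if !all_matches && exact     # 2. exact-match short-circuit

  ids = bigrams(query).flat_map { |g| index.fetch(g, []) } # 3. n-gram lookup
  ids.uniq.map { |id| items[id] }
end

items = ["apple", "apply", "maple", "pear"]
denormalize = {}                          # normalized string => original
index = Hash.new { |h, k| h[k] = [] }     # bigram => ids of items containing it

items.each_with_index do |item, id|
  key = item.downcase
  denormalize[key] = item
  bigrams(key).uniq.each { |gram| index[gram] << id }
end

p candidates("aple", items, denormalize, index) # → ["apple", "apply", "maple"]
p candidates("Pear", items, denormalize, index) # → ["pear"]
```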

Parameters:

  • query (String)

    search query



# File 'lib/fuzzy_set.rb', line 74

def get(query)
  query = normalize(query)

  # check for exact match
  return [@denormalize[query]] if !@all_matches && @denormalize[query]

  match_ids = matches_for(query)
  match_ids = match_ids.flatten.compact.uniq
  matches = match_ids.map { |id| @items[id] }

  # rank by ascending cosine distance (1 - similarity), i.e. most similar first
  matches.sort_by { |match| 1.0 - String::Similarity.cosine(query, match) }
end
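#get orders candidates with sort_by { 1.0 - similarity }, i.e. by ascending cosine distance, which is the same as descending cosine similarity. The sketch below uses an illustrative cosine similarity over character-bigram count vectors as a stand-in; it does not reimplement String::Similarity.cosine, which may tokenize strings differently.

```ruby
# Illustrative stand-in for String::Similarity.cosine: cosine similarity
# between character-bigram count vectors.
def bigram_counts(str)
  counts = Hash.new(0)
  (0..str.length - 2).each { |i| counts[str[i, 2]] += 1 }
  counts
end

def cosine(a, b)
  va = bigram_counts(a)
  vb = bigram_counts(b)
  dot = va.sum { |gram, n| n * vb[gram] }
  norm = Math.sqrt(va.values.sum { |n| n * n }) *
         Math.sqrt(vb.values.sum { |n| n * n })
  norm.zero? ? 0.0 : dot / norm # strings shorter than 2 chars score 0.0
end

query = "apple"
matches = ["pineapple", "maple", "apple"]
# Ascending (1.0 - similarity) equals descending similarity: best match first.
ranked = matches.sort_by { |match| 1.0 - cosine(query, match) }
p ranked # → ["apple", "maple", "pineapple"]
```

Here the scores are 1.0, 0.75 and about 0.71. Ruby's sort_by is not stable, so candidates with equal scores may come back in either order.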

#include?(item) ⇒ Boolean

Returns true if the given item is present in the set.

Returns:

  • (Boolean)

    true if the given item is present in the set.



# File 'lib/fuzzy_set.rb', line 89

def include?(item)
  # #add stores item.to_s, so compare against the string form
  @items.include?(item.to_s)
end

#length ⇒ Integer (also known as: size)

Returns the number of elements in the set.

Returns:

  • (Integer)

    the number of elements in the set.



# File 'lib/fuzzy_set.rb', line 94

def length
  @items.length
end