Class: FuzzySet
- Inherits: Object
  - Object
    - FuzzySet
- Defined in: lib/fuzzy_set.rb, lib/fuzzy_set/version.rb
Overview
FuzzySet implements a fuzzy-searchable set of strings.
As a set, it cannot contain duplicate elements.
Constant Summary
- DEFAULT_OPTS = { all_matches: false, ngram_size_max: 3, ngram_size_min: 2 }
  Default options for creating new instances.
- VERSION = '1.1.0'
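The ngram_size_min and ngram_size_max options presumably bound the lengths of the character n-grams used to index each item. The following is a minimal sketch of that idea, not the library's own code: the helper name `ngrams` and its keyword arguments are hypothetical, and the real indexing routine may pad or normalize strings differently.

```ruby
# Hypothetical helper (not part of FuzzySet): collect every character
# n-gram of `str` whose length lies between `min` and `max`, inclusive.
def ngrams(str, min: 2, max: 3)
  chars = str.chars
  (min..max).flat_map do |n|
    # each_cons yields nothing when the string is shorter than n
    chars.each_cons(n).map(&:join)
  end
end

p ngrams("ruby") # => ["ru", "ub", "by", "rub", "uby"]
```

With the default bounds of 2 and 3, a four-character string yields three bigrams and two trigrams; strings shorter than ngram_size_min yield no n-grams at all in this sketch.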
Instance Method Summary
- #<<(item) ⇒ Object
  Adds item to the set by delegating to #add.
- #add(*items) ⇒ FuzzySet
  Add one or more items to the set.
- #empty? ⇒ Boolean
  true if there are no items yet.
- #exact_match(query) ⇒ String
  Normalizes query and looks up an entry by its normalized value.
- #get(query) ⇒ Object
  Fuzzy-find a string based on query.
- #include?(item) ⇒ Boolean
  true if the given item is present in the set.
- #initialize(*items, **opts) ⇒ FuzzySet (constructor)
  A new instance of FuzzySet.
- #length ⇒ Fixnum (also: #size)
  Number of elements in the set.
Constructor Details
#initialize(*items, **opts) ⇒ FuzzySet
Returns a new instance of FuzzySet.
    # File 'lib/fuzzy_set.rb', line 22
    def initialize(*items, **opts)
      opts = DEFAULT_OPTS.merge(opts)

      @items = []
      @denormalize = {}
      @index = {}
      @all_matches = opts[:all_matches]
      @ngram_size_max = opts[:ngram_size_max]
      @ngram_size_min = opts[:ngram_size_min]

      add(items)
    end
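The constructor resolves its options by merging the caller's keyword arguments over DEFAULT_OPTS. A small standalone sketch of that merge follows; the method names `default_opts` and `resolve_opts` are hypothetical stand-ins used only for illustration.

```ruby
# Stand-in for the documented DEFAULT_OPTS constant.
def default_opts
  { all_matches: false, ngram_size_max: 3, ngram_size_min: 2 }
end

# Hypothetical helper: caller-supplied keys win, missing keys fall back
# to the defaults, exactly as Hash#merge behaves.
def resolve_opts(**opts)
  default_opts.merge(opts)
end

opts = resolve_opts(ngram_size_max: 4)
puts opts[:ngram_size_max] # overridden by the caller
puts opts[:ngram_size_min] # falls back to the default
```

Note that Hash#merge does not validate keys, so an unrecognized option passed this way is carried along silently rather than rejected.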
Instance Method Details
#<<(item) ⇒ Object
    # File 'lib/fuzzy_set.rb', line 62
    def <<(item)
      add(item)
    end
#add(*items) ⇒ FuzzySet
Add one or more items to the set.
Each item is converted into a string and indexed when it is added. Items that are already present are skipped.

    # File 'lib/fuzzy_set.rb', line 49
    def add(*items)
      items = [items].flatten
      items.each do |item|
        item = item.to_s
        # skip a duplicate without abandoning the remaining items
        next if @items.include?(item)
        id = _add(item)
        calculate_grams_for(normalize(item), id)
      end

      self
    end
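To illustrate the flattening, string conversion, and duplicate handling described for #add, here is a minimal standalone sketch. `TinySet` is a hypothetical class written for this example; it omits n-gram indexing entirely and skips each duplicate individually.

```ruby
# Hypothetical miniature of the add/<< behavior; not the FuzzySet class.
class TinySet
  attr_reader :items

  def initialize
    @items = []
  end

  # Accepts single values, several values, or nested arrays.
  def add(*items)
    items.flatten.each do |item|
      item = item.to_s               # every item is stored as a String
      next if @items.include?(item)  # duplicates are skipped one by one
      @items << item
    end
    self                             # returning self allows chaining
  end

  alias_method :<<, :add
end

set = TinySet.new
set.add("a", ["b", :c]) << "a"
p set.items # => ["a", "b", "c"]
```

Because `add` returns the set itself, calls can be chained, and passing an array (as the constructor does) behaves the same as passing the elements one by one.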
#empty? ⇒ Boolean
Returns true if there are no items yet.

    # File 'lib/fuzzy_set.rb', line 100
    def empty?
      @items.empty?
    end
#exact_match(query) ⇒ String
Normalizes query and looks up an entry by its normalized value. Returns nil when no entry matches.

    # File 'lib/fuzzy_set.rb', line 39
    def exact_match(query)
      @denormalize[normalize(query)]
    end
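The lookup above relies on a hash that maps each normalized string back to the original item. The sketch below shows that pattern in isolation. The normalization used here (trim plus downcase) is an assumption, since the library's own `normalize` is not shown on this page, and `build_lookup` and `exact_lookup` are hypothetical names.

```ruby
# Assumed normalization: trim whitespace and downcase.
# FuzzySet's real normalize method may do more or less than this.
def normalize(str)
  str.to_s.strip.downcase
end

# Hypothetical helper: map normalized form => original item.
def build_lookup(items)
  items.each_with_object({}) { |item, h| h[normalize(item)] = item }
end

# Hypothetical helper mirroring the shape of exact_match.
def exact_lookup(lookup, query)
  lookup[normalize(query)]
end

lookup = build_lookup(["Ruby", "  Crystal "])
p exact_lookup(lookup, "RUBY")   # => "Ruby"
p exact_lookup(lookup, "python") # => nil
```

The stored value is the item as originally added, so a match returns the caller's own spelling rather than the normalized key.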
#get(query) ⇒ Object
Fuzzy-find a string based on query. The result is an array of matching items.
- normalize query
- check for an exact match and return it, if present (skipped when the all_matches option is true)
- find matches based on n-grams
- sort matches by their cosine similarity to query, most similar first

    # File 'lib/fuzzy_set.rb', line 74
    def get(query)
      query = normalize(query)

      # check for exact match
      return [@denormalize[query]] if !@all_matches && @denormalize[query]

      match_ids = matches_for(query)
      match_ids = match_ids.flatten.compact.uniq
      matches = match_ids.map { |id| @items[id] }

      # sort matches by their cosine distance to query
      matches.sort_by { |match| 1.0 - String::Similarity.cosine(query, match) }
    end
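The final ranking step sorts candidates by cosine distance (1.0 minus cosine similarity), so the closest match comes first. The sketch below reproduces that ordering without any gem. It assumes a cosine over character-frequency vectors, which may differ in detail from what String::Similarity.cosine computes; `char_counts`, `cosine`, and `rank` are hypothetical names.

```ruby
# Count how often each character occurs; missing characters read as 0.
def char_counts(str)
  str.each_char.each_with_object(Hash.new(0)) { |ch, h| h[ch] += 1 }
end

# Cosine similarity of two strings over character-frequency vectors
# (an assumption made for this sketch).
def cosine(a, b)
  va = char_counts(a)
  vb = char_counts(b)
  dot  = va.sum { |ch, n| n * vb[ch] }
  norm = Math.sqrt(va.values.sum { |n| n * n }) *
         Math.sqrt(vb.values.sum { |n| n * n })
  norm.zero? ? 0.0 : dot / norm
end

# Order candidates by ascending cosine distance to the query.
def rank(query, candidates)
  candidates.sort_by { |c| 1.0 - cosine(query, c) }
end

p rank("rubby", ["python", "ruby", "rust"]) # => ["ruby", "rust", "python"]
```

Sorting on the distance rather than the similarity keeps the sort ascending, which is why the code subtracts from 1.0 instead of reversing the result.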
#include?(item) ⇒ Boolean
Returns true if the given item is present in the set.

    # File 'lib/fuzzy_set.rb', line 89
    def include?(item)
      @items.include?(item)
    end
#length ⇒ Fixnum (also: #size)
Returns the number of elements in the set.

    # File 'lib/fuzzy_set.rb', line 94
    def length
      @items.length
    end