
# FuzzyCompare

## Getting started

To compare two strings, call `FuzzyCompare.similarity/2`:

    iex> FuzzyCompare.similarity("Oscar-Claude Monet", "monet, claude")
    0.95

## Inner workings

Imagine you had to [match some names](https://en.wikipedia.org/wiki/Record_linkage).

Try to match the following list of painters:

  * `"Oscar-Claude Monet"`
  * `"Edouard Manet"`
  * `"Monet, Claude"`

A human can easily see that some of the names have just been flipped
and that others are different but similar-sounding.

A first approach could be to compare the strings with a string similarity
function such as the Jaro similarity, the basis of
[Jaro-Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance),
which Elixir ships as `String.jaro_distance/2`.

    iex> String.jaro_distance("Oscar-Claude Monet", "Monet, Claude")
    0.6032763532763533

    iex> String.jaro_distance("Oscar-Claude Monet", "Edouard Manet")
    0.6749287749287749

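This does not help: the flipped name, which refers to the same painter,
scores lower (about 0.60) than the name of a different painter (about 0.67).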

To improve the results, this library uses two different approaches,
`FuzzyCompare.ChunkSet` and `FuzzyCompare.SortedChunks`.

### Sorted chunks

This approach yields good results when the words within a string have been
shuffled around. The strategy sorts the words of each string and then
compares the sorted strings.

    iex> FuzzyCompare.SortedChunks.substring_similarity("Oscar-Claude Monet", "Monet, Claude")
    1.0

    iex> FuzzyCompare.SortedChunks.substring_similarity("Oscar-Claude Monet", "Monet, Claude")
    0.6944444444444443
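
The sorting idea described above can be sketched roughly as follows. This is
an illustration only, not the library's implementation: the module
`SortedChunksSketch`, its functions, and the normalization rules (lowercasing
and treating punctuation as whitespace) are assumptions made for the example.

```elixir
defmodule SortedChunksSketch do
  # Hypothetical module for illustration; not part of FuzzyCompare.

  # Lowercase the string, turn punctuation into spaces, split on
  # whitespace, sort the words and join them back together.
  def normalize(string) do
    string
    |> String.downcase()
    |> String.replace(~r/[^\p{L}\p{N}\s]/u, " ")
    |> String.split()
    |> Enum.sort()
    |> Enum.join(" ")
  end

  # Compare the two normalized strings with the standard library's
  # Jaro similarity.
  def similarity(left, right) do
    String.jaro_distance(normalize(left), normalize(right))
  end
end

SortedChunksSketch.normalize("Monet, Claude")
# => "claude monet"
```

Under these assumptions `"Monet, Claude"` and `"Claude Monet"` normalize to
the same string and compare as identical. The extra word in
`"Oscar-Claude Monet"` still keeps this sketch below `1.0`, so sorting alone
does not account for the library result shown above.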

### Chunkset

The chunkset approach works best when the strings contain additional
substrings that are irrelevant to what is being searched for.

    iex> FuzzyCompare.ChunkSet.standard_similarity("Claude Monet", "Alice Hoschedé was the wife of Claude Monet")
    1.0
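
One way to get this behaviour is the token set idea from the SeatGeek post
credited below: split both strings into sets of words, then compare the shared
words against each string's full word set and keep the best score. The sketch
below assumes that approach. `ChunkSetSketch` and its helpers are hypothetical
names, and the library's actual implementation may differ.

```elixir
defmodule ChunkSetSketch do
  # Hypothetical module for illustration; not part of FuzzyCompare.

  def similarity(left, right) do
    left_tokens = tokens(left)
    right_tokens = tokens(right)

    # Words shared by both strings, and each string's leftover words.
    common = sorted_join(MapSet.intersection(left_tokens, right_tokens))
    left_rest = sorted_join(MapSet.difference(left_tokens, right_tokens))
    right_rest = sorted_join(MapSet.difference(right_tokens, left_tokens))

    left_full = String.trim(common <> " " <> left_rest)
    right_full = String.trim(common <> " " <> right_rest)

    # Score every pairing and keep the best one.
    [{common, left_full}, {common, right_full}, {left_full, right_full}]
    |> Enum.map(fn {a, b} -> String.jaro_distance(a, b) end)
    |> Enum.max()
  end

  # Lowercase, treat punctuation as whitespace, and collect unique words.
  defp tokens(string) do
    string
    |> String.downcase()
    |> String.replace(~r/[^\p{L}\p{N}\s]/u, " ")
    |> String.split()
    |> MapSet.new()
  end

  defp sorted_join(set), do: set |> Enum.sort() |> Enum.join(" ")
end
```

In this sketch, when every word of one string also appears in the other, the
shared words equal that string's full word set, so the score is `1.0` no
matter how much unrelated text surrounds the match.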

### Substring comparison

If one of the strings is much longer than the other, the library attempts
to compare only the matching substrings.
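
A common way to do this kind of comparison is to slide a window the length of
the shorter string over the longer one and keep the best score. The sketch
below assumes that technique. `SubstringSketch.best_window/2` is a
hypothetical helper, not a FuzzyCompare function, and the library may choose
its substrings differently.

```elixir
defmodule SubstringSketch do
  # Hypothetical module for illustration; not part of FuzzyCompare.

  def best_window(left, right) do
    {short, long} =
      if String.length(left) <= String.length(right),
        do: {left, right},
        else: {right, left}

    window = String.length(short)

    if window == 0 do
      # Nothing to slide over the longer string.
      0.0
    else
      long
      |> String.graphemes()
      # Every run of `window` consecutive graphemes in the longer string.
      |> Enum.chunk_every(window, 1, :discard)
      |> Enum.map(&Enum.join/1)
      |> Enum.map(&String.jaro_distance(short, &1))
      |> Enum.max()
    end
  end
end

SubstringSketch.best_window(
  "Claude Monet",
  "Alice Hoschedé was the wife of Claude Monet"
)
# => 1.0
```

The last window of the longer string is exactly `"Claude Monet"`, which is why
the sketch returns `1.0` here. Scanning every window is simple, but its cost
grows with the length of the longer string.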

## Credits

This library is inspired by a [SeatGeek blog post from 2011](https://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/).