``````# FuzzyCompare

## Getting started

In order to compare two strings with each other do the following:

iex> FuzzyCompare.similarity("Oscar-Claude Monet", "monet, claude")
0.95

## Inner workings

Try to match the following list of painters:

* `"Oscar-Claude Monet"`
* `"Edouard Manet"`
* `"Monet, Claude"`

For a human it is easy to see that some of the names have just been flipped
and that others are different but similar sounding.

A first approrach could be to compare the strings with a string similarity
function like the
[Jaro-Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance)
function.

iex> String.jaro_distance("Oscar-Claude Monet", "Monet, Claude")
0.6032763532763533

iex> String.jaro_distance("Oscar-Claude Monet", "Edouard Manet")
0.6749287749287749

This is not an improvement over exact equality.

In order to improve the results this library uses two different approaches,
`FuzzyCompare.ChunkSet` and `FuzzyCompare.SortedChunks`.

### Sorted chunks

This approach yields good results when words within a string have been
shuffled around. The strategy will sort all substrings by words and compare
the sorted strings.

iex> FuzzyCompare.SortedChunks.substring_similarity("Oscar-Claude Monet", "Monet, Claude")
1.0

iex(4)> FuzzyCompare.SortedChunks.substring_similarity("Oscar-Claude Monet", "Edouard Manet")
0.6944444444444443

### Chunkset

The chunkset approach is best in scenarios when the strings contain other
substrings that are not relevant to what is being searched for.

iex> FuzzyCompare.ChunkSet.standard_similarity("Claude Monet", "Alice Hoschedé was the wife of Claude Monet")
1.0

### Substring comparison

Should one of the strings be much longer than the other the library will
attempt to compare matching substrings only.

## Credits

This library is inspired by a [seatgeek blogpost from 2011](https://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/).
``````