[![Build Status](](

Calculates a locality sensitive hash for text.

[Locality-sensitive hashing]( is a
technique for dimensionality reduction. Its properties guarantee similar
output vectors for similar inputs. It can be used for clustering and
[near-duplicate detection]( This implementation is targeted for natural language as input. It takes a `String` of arbitrary length and outputs a vector encoded as `:binary`.

## Installation

Add `ex_lsh` to your list of dependencies in `mix.exs`:

def deps do
    {:ex_lsh, version: "~> 0.4"}

## Usage

"Lorem ipsum dolor sit amet"
|> ExLSH.lsh()
|> Base.encode64()

## Docs
see [](

## Contributions
Please fork the project and submit a PR.

## Credits

- [SimHash]( is a very similar, but less versatile implementation that is focused on short strings only.
- [Resemblance]( explores simhash and sketching in Ruby. The author has documented his findings in a series of articles. You may want to make yourself familiar with [Part 3: The SimHash Algorithm](
- [Near-duplicate detection]( is a very helpful article by Moz. It explains core concepts such as tokinization, shingling, MinHash, SimHash, etc.