# BloomFilter
Bloom Filter implementation in Elixir. Bloom filters are probabilistic data structures designed
to efficiently tell you whether an element is present in a set.
[](https://travis-ci.org/yosriady/bloom_filter)
[](https://coveralls.io/github/yosriady/bloom_filter?branch=master)
[](https://hex.pm/packages/bloom_filter)
### [Hex](http://hex.pm/packages/bloom_filter)
### [API Documentation](https://hexdocs.pm/bloom_filter/)
## Installation
Add bloom_filter to your list of dependencies in `mix.exs`:
def deps do
[{:bloom_filter, "~> 1.0.0"}]
end
## Usage
iex> f = BloomFilter.new 100, 0.001 # Create a bloom filter with an expected capacity 100 and desired false positive rate < 0.001
iex> f = BloomFilter.add(f, 42)
iex> BloomFilter.has?(f, 42)
true
## Running Tests
```
mix test
```
## Background
A [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) is a space-efficient data structure designed to efficiently tell you whether an element is present in a set. Both insertion and membership operations theoretically cost a constant time `O(k)`, where `k` is the number of hash functions used in the filter.
The price paid for this efficiency is that a Bloom filter is a probabilistic data structure: it tells us that the element either is *definitely NOT in* the set or *MAYBE in* the set.

The filter is essentially a vector of bits (`0` or `1`) of some size `m`. When we `add` a new `item` to the filter we map `item` to `k` number of hash functions, which gives us `k` indices. We then set the bits on these indices to `1`.
When we want to check if our filter `has?` a particular `item`, we feed it to our hash functions again, and check if `any?` of the bits are `0` or `all?` the bits are `1`. If there are bits that are not set, `item` is **definitely** not in the set. If all the bits are set, `item` is **probably** in the set, since false positives can happen due to [collision](https://en.wikipedia.org/wiki/Collision_(computer_science)).
Bloom filters are best suited for applications where the amount of source data would require an impractically large amount of memory if "conventional" error-free hashing techniques were applied.
## Implementation Details
`bloom_filter` uses two hash functions [`:erlang.phash2`](http://erlang.org/doc/man/erlang.html#phash2-2), [`Fowler–Noll–Vo`](https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function), and the [Double Hashing](https://en.wikipedia.org/wiki/Double_hashing) technique to generate an arbitrary number of independent hash functions.
`bloom_filter` also automatically optimizes the optimal size of the bit vector and the number of hash functions required to attain the user's desired error rate.
## Running Type Checker
> You need to have [dialyxir](https://github.com/jeremyjh/dialyxir) installed.
```
mix dialyzer
```
## Contributing
1. Fork it ( http://github.com/Leventhan/bloom_filter/fork )
2. Create your feature branch (`git checkout -b feature/my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin feature/my-new-feature`)
5. Create new Pull Request (Remember to squash your commits!)
> Report any found bugs or errors using [the issue tracker](https://github.com/Leventhan/bloom_filter/issues).
## Thanks
**bloom_filter** © 2016+, Yos Riady. Released under the [MIT] License.<br>
Authored and maintained by Yos Riady with help from contributors ([list][contributors]).
> [yos.io](http://yos.io) ·
> GitHub [@yosriady](https://github.com/yosriady)
[MIT]: http://mit-license.org/
[contributors]: http://github.com/yosriady/bloom_filter/contributors