# Stripper


`Stripper` is an Elixir package for normalizing input from unpredictable sources (such as web scraping). It is useful as a pre-processing step in ETL pipelines for machine learning or data analysis. It is parser-based rather than regular-expression based, so it does all its work in a single pass over the input, which should keep it fast.

Why the name? Because it describes the purpose and it's memorable -- get over it ;)

## Examples

Normalizing whitespace:

    iex> Stripper.Whitespace.normalize!("   random\tstuff\fI   scraped\t\t\tfrom\nthe web\n\n")
    "random stuff I scraped from the web"

This reduces all Unicode whitespace and separator characters to the humble space. Runs of multiple spaces are collapsed into one and, as the example shows, leading and trailing whitespace is removed.
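
To illustrate what "one pass, no regular expressions" can look like, here is a minimal sketch of a whitespace collapser built on binary pattern matching. It is not Stripper's actual implementation: the module name `WhitespaceSketch`, the function `normalize/1`, and the short `@whitespace` list (only a subset of Unicode whitespace) are assumptions made for illustration, and the input is assumed to be valid UTF-8.

```elixir
defmodule WhitespaceSketch do
  # Hypothetical module, not part of Stripper.
  # Assumption: a small subset of Unicode whitespace/separator code points.
  @whitespace [0x20, ?\t, ?\n, ?\r, ?\f, ?\v, 0x00A0, 0x2028, 0x2029]

  # Walks the binary once, tracking whether a space is pending.
  def normalize(input) when is_binary(input), do: walk(input, [], :start)

  # End of input: a pending space is dropped, so trailing whitespace is trimmed.
  defp walk(<<>>, acc, _state), do: acc |> Enum.reverse() |> IO.iodata_to_binary()

  # Whitespace: emit nothing yet; remember it only if text has already been seen.
  defp walk(<<c::utf8, rest::binary>>, acc, state) when c in @whitespace do
    next = if state == :start, do: :start, else: :pending_space
    walk(rest, acc, next)
  end

  # First non-whitespace character after a run: emit one space, then the character.
  defp walk(<<c::utf8, rest::binary>>, acc, :pending_space),
    do: walk(rest, [<<c::utf8>>, " " | acc], :text)

  # Any other character is copied through unchanged.
  defp walk(<<c::utf8, rest::binary>>, acc, _state),
    do: walk(rest, [<<c::utf8>> | acc], :text)
end
```

With the same input as the example above, `WhitespaceSketch.normalize("   random\tstuff\fI   scraped\t\t\tfrom\nthe web\n\n")` returns `"random stuff I scraped from the web"`. Each character is inspected exactly once, which is the property the parser-based approach relies on.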

Simplifying quotes:

    iex> Stripper.Quotes.normalize!(~S|‘make’ «it» „stop“|)
    "'make' \"it\" \"stop\""

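The quote simplification can be pictured as a per-character lookup applied in a single traversal. The sketch below is an assumption-laden illustration, not Stripper's code: `QuotesSketch`, `normalize/1`, and the two code point lists are hypothetical, and they cover only a handful of quote characters. Mapping the single angle quotes `‹ ›` to `'` is a guess; the mapping of `« »` and `„ “` to `"` follows the example above.

```elixir
defmodule QuotesSketch do
  # Hypothetical module, not part of Stripper.
  # Assumption: a small, incomplete set of "fancy" quote code points.
  @single [0x2018, 0x2019, 0x201A, 0x2039, 0x203A]
  @double [0x201C, 0x201D, 0x201E, 0x00AB, 0x00BB]

  # One traversal of the (assumed valid UTF-8) input, rebuilding the binary
  # character by character.
  def normalize(input) when is_binary(input) do
    for <<c::utf8 <- input>>, into: "", do: <<replace(c)::utf8>>
  end

  # 0x27 is the ASCII apostrophe, 0x22 the ASCII double quote.
  defp replace(c) when c in @single, do: 0x27
  defp replace(c) when c in @double, do: 0x22
  defp replace(c), do: c
end
```

Calling `QuotesSketch.normalize(~S|‘make’ «it» „stop“|)` returns `"'make' \"it\" \"stop\""`, matching the output shown above for the covered characters.
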
See the online documentation for more information.

## Installation

If available in Hex, the package can be installed
by adding `stripper` to your list of dependencies in `mix.exs`:

    def deps do
      [{:stripper, "~> 1.4.0"}]
    end

## Contributing

See the Contributing Guidelines for more information.

## Image Attribution

The logo image is "wire strippers" by Designs by MB from the Noun Project.