README.md
# Stripper
[![Module Version](https://img.shields.io/hexpm/v/dotenvy.svg)](https://hex.pm/packages/dotenvy)
[![Hex Docs](https://img.shields.io/badge/hex-docs-lightgreen.svg)](https://hexdocs.pm/dotenvy/)
[![Total Download](https://img.shields.io/hexpm/dt/dotenvy.svg)](https://hex.pm/packages/dotenvy)
[![License](https://img.shields.io/hexpm/l/dotenvy.svg)](https://hex.pm/packages/dotenvy)
[![Last Updated](https://img.shields.io/github/last-commit/fireproofsocks/dotenvy.svg)](https://github.com/fireproofsocks/dotenvy/commits/master)
`Stripper` is an [Elixir](https://elixir-lang.org/) package for normalizing input from unpredictable sources (such as web scraping), useful as a pre-processing step in ETL pipelines for machine learning or data analysis. It is parser-based (not regular expression based), so it does all its work in one pass and should be performant.
Why the name? Because it describes the purpose and it's memorable -- get over it ;)
## Examples
Normalizing whitespace:
```elixir
iex> Stripper.Whitespace.normalize!(" random\tstuff\fI scraped\t\t\tfrom\nthe web\n\n")
"random stuff I scraped from the web"
```
This will reduce all unicode whitespace and separator characters to the humble space -- multiple spaces will be collapsed into one.
Simplifying quotes:
```elixir
iex> Stripper.Quotes.normalize!(~S|‘make’ «it» „stop“|)
"'make' \"it\" \"stop\""
```
See the [online documentation](https://hex.pm/packages/stripper) for more information.
## Installation
If [available in Hex](https://hex.pm/docs/publish), the package can be installed
by adding `stripper` to your list of dependencies in `mix.exs`:
```elixir
def deps do
[
{:stripper, "~> 1.4.0"}
]
end
```
## Contributing
See the [Contributing Guidelines](CONTRIBUTING.md) for more information.
## Image Attribution
The logo image is "wire strippers" by Designs by MB from the [the Noun Project](https://thenounproject.com/)