README.md

<!--
DO NOT EDIT THIS FILE
It has been generated from the template `README.md.eex` by Extractly (https://github.com/RobertDober/extractly.git)
and any changes you make in this file will most likely be lost
-->

# Minipeg

**TODO: Add description**

## Installation

If [available in Hex](https://hex.pm/docs/publish), the package can be installed
by adding `minipeg` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:minipeg, "~> 0.1.0"}
  ]
end
```

## Table Of Contents

- [Installation](#installation)
- [Table Of Contents](#table-of-contents)
- [Documentation](#documentation)
  - [Minipeg](#minipeg)
- [Basic Usage](#basic-usage)
  - [Parsing Single Characters](#parsing-single-characters)
    - [`char_parser`](#char_parser)
    - [Parsing POSIX character classes](#parsing-posix-character-classes)
    - [`escaped_char_parser`](#escaped_char_parser)
  - [Parsing sequences of characters](#parsing-sequences-of-characters)
    - [Keywords: `keywords_parser`](#keywords-keywords_parser)
    - [Identifiers: `ident_parser`](#identifiers-ident_parser)
  - [Regex Parsers](#regex-parsers)
    - [`rgx_parser`](#rgx_parser)
    - [`rgx_match_parser`](#rgx_match_parser)
    - [`rgx_capture_parser`](#rgx_capture_parser)
  - [Token Parser, defining Tokens with Regular Expressions](#token-parser-defining-tokens-with-regular-expressions)
    - [Postprocessing](#postprocessing)
    - [Flatten the AST](#flatten-the-ast)
    - [Using the `skip` option](#using-the-skip-option)
    - [Mixing Regular Expressions and Parsers](#mixing-regular-expressions-and-parsers)
- [Essential Combinators](#essential-combinators)
  - [`many`](#many)
  - [`satisfy`](#satisfy)
  - [`sequence`](#sequence)
  - [`map`](#map)
    - [`mapp`](#mapp)
    - [`with_pos`](#with_pos)
  - [`select`](#select)
  - [`many_sel`](#many_sel)
  - [`many_seq`](#many_seq)
- [Convenience Combinators](#convenience-combinators)
  - [What about whitespace](#what-about-whitespace)
  - [`upto_parser_parser`](#upto_parser_parser)
  - [Error Handling...](#error-handling)
    - [`map_error`](#map_error)
- [Definitions](#definitions)
  - [Parser](#parser)
  - [Combinator](#combinator)
- [LICENSE](#license)

Documentation can be generated with [ExDoc](https://github.com/elixir-lang/ex_doc)
and published on [HexDocs](https://hexdocs.pm). Once published, the docs can
be found at <https://hexdocs.pm/minipeg>.

It is also included in the next chapter.

## Documentation

### Minipeg

**Minipeg is a minimal _Parsing Expression Grammars_ (PEG) library**

Here is a first taste of how to use it:

```elixir
    iex(1)> an_a_parser = char_parser("a")
    ...(1)> parse_string(an_a_parser, "a")
    {:ok, "a"}
```

```elixir
    iex(2)> an_a_parser = char_parser("a")
    ...(2)> parse_string(an_a_parser, "b")
    {:error, "b not member of \"a\" in char_parser(\"a\") (in char_parser(\"a\")) in <binary>:1,1"}
```

The first thing to note here is that in these doctests we have imported functions from
Minipeg as follows; see the moduledocs of the corresponding modules for details:

```elixir
import Minipeg.Parser, only: [parse_string: 2]
import Minipeg.{Combinators, Parsers}
```

## Basic Usage

Quite a small subset of the predefined [parsers](#module-definitions) and [combinators](#module-definitions)
would suffice to parse any [context-free language](https://en.wikipedia.org/wiki/Context-free_language). However, many of the patterns used to parse
programming languages, little languages, or small languages that exceed the practicality of [regular expressions](https://en.wikipedia.org/wiki/Regular_expression)
are quite verbose.

Therefore Minipeg predefines parsers that can be easily parametrized. These parsers will be described in the [utility parsers](#moduledoc-utility-parsers) section.


### Parsing Single Characters

By a _character_ we mean a UTF-8 code point.

The most basic parser is the...

#### `char_parser`

... which parses _any_ character

```elixir
    iex(3)> parse_string(char_parser(), "h")
    {:ok, "h"}
```


```elixir
    iex(4)> parse_string(char_parser(), "é")
    {:ok, "é"}
```


```elixir
    iex(5)> parse_string(char_parser(), "✓")
    {:ok, "✓"}
```

```elixir
    iex(6)> parse_string(char_parser(), "")
    {:error, "encountered end of input (in char_parser()) in <binary>:1,1"}
```

It can, however, be parametrized to parse only characters of a given set; this set can be provided as
a `String` or an `Enumerable`:


```elixir
    iex(7)> parse_string(char_parser("ab"), "b")
    {:ok, "b"}
```

```elixir
    iex(8)> parse_string(char_parser(["b", "c"]), "b")
    {:ok, "b"}
```


```elixir
    iex(9)> parse_string(char_parser(["b", "c"]), "a")
    {:error, "a not member of \"bc\" in char_parser([\"b\", \"c\"]) (in char_parser([\"b\", \"c\"])) in <binary>:1,1"}
```

Often-used charsets might be extremely large to define, and therefore some more specialised parsers have been provided:

#### Parsing POSIX character classes

If instead of a string or list we pass an atom into `char_parser`, it only parses a character that matches the corresponding character class as defined in POSIX regular expressions,
which is also described in the docs of the [`Regex`](https://hexdocs.pm/elixir/Regex.html#module-character-classes) module. Here are the currently supported values:

```
  :alnum | :alpha | :blank | :cntrl | :digit | :graph | :lower | :print | :punct | :space | :upper | :word | :xdigit
```

```elixir
    iex(10)> parser = char_parser(:alnum)
    ...(10)> "aD7_%"
    ...(10)> |> String.graphemes
    ...(10)> |> Enum.map(&parse_string(parser, &1))
    [
    ok: "a",
    ok: "D",
    ok: "7",
    error: "~r{\\A[[:alnum:]]}u does not match at 1,1 in char_parser(:alnum) (in char_parser(:alnum)) in <binary>:1,1",
    error: "~r{\\A[[:alnum:]]}u does not match at 1,1 in char_parser(:alnum) (in char_parser(:alnum)) in <binary>:1,1"
    ]
```


#### `escaped_char_parser`

This parser helps to parse escaped characters. While one could do this quite easily, as in the following example, one notices
that in order to get just an escaped character, two combinators, `map` and `sequence`, are needed:

```elixir
    iex(11)> escaped_quote_parser = sequence([
    ...(11)> char_parser("\\"), char_parser()])
    ...(11)> |> map(&Enum.at(&1, 1))
    ...(11)> { parse_string(escaped_quote_parser, "\\\""), parse_string(escaped_quote_parser, "\\'") }
    { {:ok, "\""}, {:ok, "'"} }
```

Compare this to the provided `escaped_char_parser`:

```elixir
    iex(12)> parse_string(escaped_char_parser(), "\\a")
    {:ok, "a"}
```

We can also change the escape character:

```elixir
    iex(13)> parse_string(escaped_char_parser("%"), "%a")
    {:ok, "a"}
```

```elixir
    iex(14)> parse_string(escaped_char_parser("%"), "\\a")
    {:error, "\\ not member of \"%\" in char_parser(\"%\") (in escaped_char_parser) in <binary>:1,1"}
```

Furthermore, we can restrict the set of characters that are allowed to be escaped:

```elixir
    iex(15)> parser = escaped_char_parser("\\", "escape only \\", "\\")
    ...(15)> { parse_string(parser, "\\\\"), parse_string(parser, "\\a") }
    { {:ok, "\\"}, {:error, "a not member of \"\\\\\" in char_parser(\"\\\\\") (in escaped_char_parser) in <binary>:1,1"} }
```

### Parsing sequences of characters

#### Keywords: `keywords_parser`

Does pretty much what is expected ;)

```elixir
    iex(16)> kwd_parser = keywords_parser(["do", "else", "if"])
    ...(16)> ["do", "if", "for"]
    ...(16)> |> Enum.map(&parse_string(kwd_parser, &1))
    [
    ok: "do",
    ok: "if",
    error: "no alternative could be parsed in keywords_parser([\"do\", \"else\", \"if\"]) (in keywords_parser([\"do\", \"else\", \"if\"])) in <binary>:1,1"
    ]
```

#### Identifiers: `ident_parser`

An identifier is defined by a character class for its first character and a character class for its subsequent characters, so
one could define it roughly as

        sequence([
          first_char_parser,
          many(rest_char_parser)])

And that is how the `ident_parser` is actually defined

```elixir
    iex(17)> parse_string(ident_parser(), "hello_42")
    {:ok, "hello_42"}
```

```elixir
    iex(18)> parse_string(ident_parser(), "42hello_world")
    {:error, "~r{\\A[[:alpha:]]}u does not match at 1,1 in char_parser(:alpha) (in char_parser(:alpha)) in <binary>:1,1"}
```


In Lisp we prefer `-` to `_`, no problem

```elixir
    iex(19)> parse_string(ident_parser("Lisp Style", additional_chars: "-"), "hello-42")
    {:ok, "hello-42"}
```

But even the parsers for the first character and the following characters can be specified:

```elixir
    iex(20)> register_parser = ident_parser(
    ...(20)>   "Uppercase and digit",
    ...(20)>   first_char_parser: char_parser(:upper),
    ...(20)>   rest_char_parser: char_parser(:digit),
    ...(20)>   additional_chars: nil,
    ...(20)>   max_len: 2,
    ...(20)>   min_len: 2)
    ...(20)> [
    ...(20)>   parse_string(register_parser, "R2"),
    ...(20)>   parse_string(register_parser, "X_"),
    ...(20)>   parse_string(register_parser, "R12"),
    ...(20)>   parse_string(register_parser, "a2"),
    ...(20)>   parse_string(register_parser, "ab")
    ...(20)> ]
    [
    ok: "R2",
    error: "string \"X\" length 1 under required minimum 2 (in Uppercase and digit) in <binary>:1,1",
    error: "string \"R12\" length 3 exceeds allowed 2 (in Uppercase and digit) in <binary>:1,1",
    error: "~r{\\A[[:upper:]]}u does not match at 1,1 in char_parser(:upper) (in char_parser(:upper)) in <binary>:1,1",
    error: "~r{\\A[[:upper:]]}u does not match at 1,1 in char_parser(:upper) (in char_parser(:upper)) in <binary>:1,1"
    ]
```

In some environments we would like to restrict the length of an identifier

```elixir
    iex(21)> dos_name_parser = ident_parser("dos name parser", max_len: 8)
    ...(21)> [
    ...(21)>   parse_string(dos_name_parser, "dosok"),
    ...(21)>   parse_string(dos_name_parser, "way_too_long")
    ...(21)> ]
    [
    ok: "dosok",
    error: "string \"way_too_long\" length 12 exceeds allowed 8 (in dos name parser) in <binary>:1,1"
    ]
```

### Regex Parsers

Sometimes it is cumbersome to specify a parser that can be expressed simply with a regular expression.
A good example of this would be the `ident_parser` from above.

A _Regex Parser_ will always create an anchored regex which is matched against the start of the input.
As long as you avoid backtracking or unbounded lookahead, the performance should be on the same level as
writing a parser "by hand".

The `Regex` that will be used in the parser will be compiled as follows from the string parameter specifying
it:

        Regex.compile!("\\A" <> param, [:unicode])

#### `rgx_parser`

This is the basic parser that creates a regular expression as described above and, if it parses, puts the result of
`Regex.run(compiled_rgx, input.input)` into the `ast` field of the `Success` struct; the matching string is, of course,
removed from the returned input.


```elixir
    iex(22)> atom_parser = rgx_parser(":[[:alpha:]][[:word:]]*", "rgx based atom parser")
    ...(22)> Parser.parse(atom_parser, Input.new(":atom_42"), %Cache{})
    %Success{ast: [":atom_42"], cache: %Cache{}, parsed_by: "rgx based atom parser", rest: %Input{col: 9, context: %{}, input: "", lnb: 1}}
```

```elixir
    iex(23)> atom_parser = rgx_parser(":[[:alpha:]][[:word:]]*", "rgx based atom parser")
    ...(23)> parse_string(atom_parser, "hello")
    {:error, "~r{\\A:[[:alpha:]][[:word:]]*}u does not match at 1,1 in rgx based atom parser (in rgx based atom parser) in <binary>:1,1"}
```

#### `rgx_match_parser`

Oftentimes we will only want the whole match and do not care about captures; enter `rgx_match_parser`:

```elixir
    iex(24)> atom_parser = rgx_match_parser(":[[:alpha:]][[:word:]]*", "rgx based atom parser")
    ...(24)> Parser.parse(atom_parser, Input.new(":atom_42"), %Cache{})
    %Success{ast: ":atom_42", cache: %Cache{}, parsed_by: "rgx based atom parser", rest: %Input{col: 9, context: %{}, input: "", lnb: 1}}
```

The `:unicode` option will **always** be used in the compiled regex; however, one can add other options.
A nice addition is the `:extended` option:

```elixir
    iex(25)> a_list_parser = rgx_match_parser(" a (?: , a)+ ", nil, [:extended])
    ...(25)> parse_string(a_list_parser, "a,a,abc")
    {:ok, "a,a,a"}
```


#### `rgx_capture_parser`

Also, quite often we are only interested in one capture:

```elixir
    iex(26)> number_parser = rgx_capture_parser("\\s*(\\d+)") |> map(&String.to_integer/1)
    ...(26)> parse_string(number_parser, "  42")
    {:ok, 42}
```

### Token Parser, defining Tokens with Regular Expressions

Let us start with a simple example that demonstrates the concept

```elixir
    iex(27)> tokens = [
    ...(27)>   {:number, "\\d+"},
    ...(27)>   {:number, "\\+(\\d+)"} ]
    ...(27)> parser = token_parser(tokens)
    ...(27)> assert parse_string(parser, "42") == {:ok, {:number, ["42"]}}
    ...(27)> assert parse_string(parser, "+42") == {:ok, {:number,  ["+42", "42"]}}
```

#### Postprocessing

We have all captures in the ast because, of course, they might be needed. In our case
this is not desired, and as we want to convert the values anyway, both can be achieved with
post-processing, as follows:

```elixir
    iex(28)> tokens = [
    ...(28)>   {:number, "\\d+", fn [n] -> {:number, String.to_integer(n)} end},
    ...(28)>   {:number, "\\+(\\d+)", fn [_, n] -> {:number, String.to_integer(n)} end} ]
    ...(28)> parser = token_parser(tokens)
    ...(28)> assert parse_string(parser, "42") == {:ok, {:number, 42}}
    ...(28)> assert parse_string(parser, "+42") == {:ok, {:number, 42}}
```

There are many use cases in which this allows implementing simple grammars in a concise way, especially if they are not recursive.
Here is a real-world example, used by [colorize](https://codeberg.org/lab419/ex_aequo).

Note, however, that defining a dedicated parser for colors might be a better alternative if more strictness is desired, e.g. restricting the input to certain colors:

```elixir
    iex(29)> tokens = [
    ...(29)>     {:verb, "\\$\\$", fn _ -> {:verb, "$"} end},
    ...(29)>     {:reset, "\\$"},
    ...(29)>     {:reset, "<reset>"},
    ...(29)>     {:verb, "<<", fn _ -> {:verb, "<"} end},
    ...(29)>     {:color, "<([^,]+),([^,>]+)>", fn [_, col, style] ->  {:color, col, style } end},
    ...(29)>     {:color, "<([^,>]+)>"},
    ...(29)>     {:verb, "[^<\\$]+"} ]
    ...(29)>
    ...(29)> color_parser = many(token_parser(tokens, flatten_ast: true))
    ...(29)> {
    ...(29)>  parse_string(color_parser, "$$"),
    ...(29)>  parse_string(color_parser, "<red>$$$")
    ...(29)> }
    {
      {:ok, [verb: "$"]},
      {:ok, [color: "red", verb: "$", reset: "$"]}
    }
```

#### Flatten the AST

If however all we need in the AST is the first capture or the whole match, the `flatten_ast` option
can be used:

```elixir
    iex(30)> tokens = [
    ...(30)>   {:number, "\\s*(\\d+)"},
    ...(30)>   {:name, "\\s*(\\w+)"},
    ...(30)>   {:any, ".+"} ]
    ...(30)> parser = many(token_parser(tokens, flatten_ast: true))
    ...(30)> parse_string(parser, " 42 hello ,x")
    {:ok, [number: "42", name: "hello", any: " ,x"]}
```

#### Using the `skip` option

In many cases the above pattern repeats in such a way that we ignore whitespace before tokens; we can express this simply
by passing a regular expression or string to the `skip:` option.

```elixir
    iex(31)> tokens = [
    ...(31)>   {:number, "\\d+"},
    ...(31)>   {:name, "\\w+"},
    ...(31)>   {:any, ".+"} ]
    ...(31)> parser = many(token_parser(tokens, flatten_ast: true, skip: "\\s+"))
    ...(31)> # "\\s*"  would work here but zero width matches are just so dangerous in parsing
    ...(31)> parse_string(parser, " 42 hello ,x")
    {:ok, [number: "42", name: "hello", any: ",x"]}
```

**N.B.** Now the whitespace is also removed from the `:any` token.

#### Mixing Regular Expressions and Parsers

If we want to structure a parser around `token_parser` **even if** everything cannot be expressed (or
is not desired to be expressed) in a regular expression, we can simply replace the regular expression
with another parser...

```elixir
    iex(32)> ab_parser = rgx_parser("(a+)(b+)") |> satisfy(fn [_, as, bs] -> String.length(as) == String.length(bs) end)
    ...(32)> tokens = [
    ...(32)>    abs: ab_parser,
    ...(32)>    other: ".*" ]
    ...(32)> parser = token_parser(tokens)
    ...(32)> assert parse_string(parser, "aabb") == {:ok, {:abs, ["aabb", "aa", "bb"]}}
    ...(32)> assert parse_string(parser, "aaabb") == {:ok, {:other, ["aaabb"]}}
    ...(32)> assert parse_string(parser, "aabbb") == {:ok, {:other, ["aabbb"]}}
```

**Note on Performance and Style**:
For long inputs, `ab_parser` should maybe have been written as shown below, but I do not think
that this recursion, which I would have needed to express via the [Y-Combinator](https://en.wikipedia.org/wiki/Fixed-point_combinator#Y_combinator) inside a doctest,
would have made for a readable doctest. Also, for many practical purposes a regular expression
with a `satisfy` clause might be at least as performant as a complicated grammar.

```elixir
    def ab_parser, do: select([sequence([char_parser("a"), lazy(fn -> ab_parser() end), char_parser("b")]), empty_parser()])
```

## Essential Combinators

### `many`

Parses an input if a parser can be applied _many_ times to it. This means that `many` can parse
an _empty_ input unless a minimum count is specified:

```elixir
    iex(33)> a_parser = many(char_parser("a"))
    ...(33)> assert parse_string(a_parser, "") == {:ok, []}
    ...(33)> assert parse_string(a_parser, "aaa") == {:ok, ~W[a a a]}
```

But if we specify a `min_count`...

```elixir
    iex(34)> a_parser = many(char_parser("a"), "at least one", 1)
    ...(34)> assert parse_string(a_parser, "") == {:error, "Missing 1 parses in many (in at least one) in <binary>:1,1"}
    ...(34)> assert parse_string(a_parser, "aaa") == {:ok, ~W[a a a]}
```

Oftentimes `many` will be mapped to a string, like this:

```elixir
  many(some_parser) |> map(&IO.chardata_to_string/1)
```

This can just be abbreviated to `many_as_string(some_parser)`

```elixir
    iex(35)> a_parser = many_as_string(char_parser("a"))
    ...(35)> parse_string(a_parser, "aaa")
    {:ok, "aaa"}
```

But as `IO.chardata` is used, we can convert deeper structures too:

```elixir
    iex(36)> ab_parser = many_as_string(sequence([
    ...(36)>   many(char_parser("a"), nil, 1),
    ...(36)>   many(char_parser("b"), nil, 1)
    ...(36)> ]))
    ...(36)> parse_string(ab_parser, "aabbb")
    {:ok, "aabbb"}
```

Be careful with `nil` values in your ast, as `IO.chardata_to_string` does not support them; use `maybe_as_empty` for these cases:

```elixir
    iex(37)> one_or_two = many_as_string(sequence([
    ...(37)> char_parser("a"), maybe_as_empty(char_parser("a"))]))
    ...(37)> assert parse_string(one_or_two, "aa") == {:ok, "aa"}
    ...(37)> assert parse_string(one_or_two, "a") == {:ok, "a"}
```

### `satisfy`

This creates parsers with constraints by applying a _validation function_ to the ast of a successful parser invocation (failures are passed through, of course).

The _validation function_ can either return a tuple `{:ok, new_ast} | {:error, reason}`, or simply a truthy value, in which case the
original ast will be maintained and a generic error message will be generated for the failure case (when the _validation function_ returns `false` or `nil`).

As a consequence, in production code the _tuple form_ will very often be the preferred result of the _validation function_:

```elixir
    iex(38)> vowel_parser = char_parser()
    ...(38)> |> satisfy(&Enum.member?(~W[a], &1)) # The famous Restricted Vowel Set ;)
    ...(38)> assert parse_string(vowel_parser, "a") == {:ok, "a"}
    ...(38)> assert parse_string(vowel_parser, "b") == {:error, "satisfier char_parser() returned false (in char_parser()) in <binary>:1,1"}
```

It might be preferable to be clearer

```elixir
    iex(39)> vowel_parser = char_parser()
    ...(39)> |> satisfy(
    ...(39)> fn letter -> if Enum.member?(~W[a], letter), do: {:ok, :a}, else: {:error, "Not an A"} end,
    ...(39)> "restricted vowel parser")
    ...(39)> assert parse_string(vowel_parser, "a") == {:ok, :a}
    ...(39)> assert parse_string(vowel_parser, "b") == {:error, "Not an A (in restricted vowel parser) in <binary>:1,1"}
```

### `sequence`

Takes a list of `Parsers`; the result only parses if **all** of them parse successively on the given
input, and returns the list of the results of each parser.

Let us _reimplement_ the keywords parser:

```elixir
    iex(40)> if_parser = sequence([char_parser("i"), char_parser("f")])
    ...(40)> |> map(&Enum.join/1)
    ...(40)> ~w[if else]
    ...(40)> |> Enum.map(&parse_string(if_parser, &1))
    [
    ok: "if",
    error: "e not member of \"i\" in char_parser(\"i\") (in char_parser(\"i\")) in <binary>:1,1"
    ]
```

This leads us directly to

### `map`

`map` takes a parser and a _mapping function_. It returns a new parser that fails with exactly
the same error message as its input parser, but succeeds with the result mapped by the _mapping function_.

```elixir
    iex(41)> list_parser = many(char_parser()) |> map(&Enum.join(&1, ", "))
    ...(41)> parse_string(list_parser, "abc")
    {:ok, "a, b, c"}
```

Oftentimes we will want the position of a parsed string to be included in the ast,
an _obvious_ use case being to identify where in the source a semantic error has occurred.

Enter ...

#### `mapp`

This example also demonstrates the `ignore` combinator, whose result will be omitted from the ast of the `sequence`:

```elixir
    iex(42)> a_parser = sequence([ws_parser() |> ignore(), char_parser("a") |> mapp(&{&1, &2})])
    ...(42)> [parse_string(a_parser, "  a"), parse_string(a_parser, "")]
    [
    ok: [{"a", {3, 1}}],
    error: "encountered end of input (in char_parser(\"a\")) in <binary>:1,1"
    ]
```

`ignore` does, of course, not mean that the input does not need to parse:

```elixir
    iex(43)> a_parser = sequence([char_parser("b") |> ignore(), char_parser("a") |> mapp(&{&1, &2})])
    ...(43)> parse_string(a_parser, "a")
    {
    :error, "a not member of \"b\" in char_parser(\"b\") (in char_parser(\"b\")) in <binary>:1,1"
    }
```


Furthermore, `ignore` can be used inside a `select` and will propagate up to `sequence` or `many`:

```elixir
    iex(44)> ignore_bs_parser = many(select([ # bs just means the plural of "b" \o/
    ...(44)> literal_parser("a"),
    ...(44)> literal_parser("b") |> ignore()
    ...(44)>   ]))
    ...(44)> parse_string(ignore_bs_parser, "abbab")
    {:ok, ["a", "a"]}
```

It is also quite normal to just append the position to the ast, so the above can be written more simply
with ...

#### `with_pos`

```elixir
    iex(45)> a_parser = sequence([ws_parser() |> ignore(), char_parser("a") |> with_pos()])
    ...(45)> [parse_string(a_parser, "  a"), parse_string(a_parser, "")]
    [
    ok: [{"a", {3, 1}}],
    error: "encountered end of input (in char_parser(\"a\")) in <binary>:1,1"
    ]
```

### `select`

Oftentimes `select` is (maybe better) named `choice`; we have therefore defined `choice` as an alias:

```elixir
    iex(46)> vowel_parser = select(~W[a e i o u y] |> Enum.map(&char_parser/1), "vowel_parser")
    ...(46)> ~W[a u y x] |> Enum.map(&parse_string(vowel_parser, &1))
    [
    ok: "a",
    ok: "u",
    ok: "y",
    error: "no alternative could be parsed in vowel_parser (in vowel_parser) in <binary>:1,1"
    ]
```

And there is also the alias `option`:

```elixir
    iex(47)> parser = option([char_parser("a"), char_parser("b")], "option_parser")
    ...(47)> ~W[b x] |> Enum.map(&parse_string(parser, &1))
    [
    ok: "b",
    error: "no alternative could be parsed in option_parser (in option_parser) in <binary>:1,1"
    ]
```


### `many_sel`

Just a shortcut for `many(select(...`

```elixir
    iex(48)> parser = many_sel([char_parser("a"), char_parser("b")])
    ...(48)> parse_string(parser, "abba")
    {:ok, ~W[a b b a]}
```


### `many_seq`

Just a shortcut for `many(sequence(...`

```elixir
    iex(49)> parser = many_seq([char_parser("a"), char_parser("b")])
    ...(49)> parse_string(parser, "abab")
    {:ok, [~W[a b], ~W[a b]]}
```

From the result we can see that it is important to take into consideration that these
are indeed two nested parsers, and one will often do something like the following:

```elixir
    iex(50)> parser = many_seq([char_parser("a"), char_parser("b")]) |> map(&IO.chardata_to_string/1)
    ...(50)> parse_string(parser, "abab")
    {:ok, "abab"}
```


## Convenience Combinators

### What about whitespace

Oftentimes whitespace shall be ignored in the resulting ast, and sometimes in the input too. To be more precise:
when whitespace stops the parsing of, for example, a keyword, then the subsequent parser is often not interested in
the leftover whitespace preceding its new input.

Enter `ignore_ws`

Here is the form that does not ignore newlines, which is the default:

```elixir
    iex(51)> next_char_parser = ignore_ws(char_parser())
    ...(51)> parse_string(next_char_parser, " \ta")
    {:ok, "a"}
```

```elixir
    iex(52)> next_a_parser = ignore_ws(char_parser("a"))
    ...(52)> parse_string(next_a_parser, " \na")
    {:error, "\n not member of \"a\" in char_parser(\"a\") (in char_parser(\"a\")) in <binary>:1,2"}
```

But we can also use the newline-allowing version:

```elixir
    iex(53)> next_a_parser = ignore_ws(char_parser("a"), "skip newlines", true)
    ...(53)> parse_string(next_a_parser, " \na")
    {:ok, "a"}
```


### `upto_parser_parser`

Oftentimes parsing algorithms become more readable and maintainable when we reparse a part of the
input stream with a different parser. In order to be able to do this, we can simply parse up to
a part of the input stream defined by a parser and return the input stream up to that point
as a `String`.

**N.B.** This convenience comes at a price: the `parser` will be tried at every position
in the input stream until it succeeds or fails on an empty input. Hence, use with care.

```elixir
    iex(54)> upto_end_parser = upto_parser_parser(keywords_parser(~W[end]))
    ...(54)> parse_string(upto_end_parser, "up to end")
    {:ok, "up to "}
```

If, however, `parser` never succeeds, the `upto_parser_parser` fails.

```elixir
    iex(55)> upto_end_parser = upto_parser_parser(keywords_parser(~W[end]))
    ...(55)> parse_string(upto_end_parser, "up to en")
    {:error, "encountered end of input (in upto_parser_parser(keywords_parser([\"end\"]), keep)) in <binary>:1,9"}
```

We can also ask for the ast from the `parser` to be included in the result:

```elixir
    iex(56)> upto_end_parser = upto_parser_parser(keywords_parser(~W[end]), "my parser", :include)
    ...(56)> parse_string(upto_end_parser, "up to end")
    {:ok, {"up to ", "end"}}
```

Or to discard it, which is not the default (the default is `:keep`):

```elixir
    iex(57)> keep_parser = upto_parser_parser(keywords_parser(~W[end]), "my parser", :keep)
    ...(57)> discard_parser = upto_parser_parser(keywords_parser(~W[end]), "my other parser", :discard)
    ...(57)> [ parse(keep_parser, "up to end"), parse(discard_parser, "up to end")]
    [
    %Minipeg.Success{ast: "up to ", cache: %Minipeg.Cache{cache: %{}}, parsed_at: {7, 1}, parsed_by: "my parser", rest: %Minipeg.Input{input: "end", col: 7, lnb: 1}},
    %Minipeg.Success{ast: "up to ", cache: %Minipeg.Cache{cache: %{}}, parsed_at: {7, 1}, parsed_by: "my other parser", rest: %Minipeg.Input{input: "", col: 10, lnb: 1}}
    ]
```


### Error Handling...

has been enhanced a little bit in version 0.6.0; we have two new combinators that allow for better error messages. Much can still be done, I guess.
However, as we will demonstrate now, with `map_error` one can operate on the data in `%Failure{}`, although the `Failure` struct
might provide a richer interface some day.

#### `map_error`

...
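As a purely hypothetical sketch (the actual `map_error` signature in Minipeg may differ; see the moduledocs), one could imagine using it to replace a technical error message with a friendlier one:

```elixir
# Hypothetical sketch only: assumes map_error(parser, fun), where fun receives
# the error data carried by %Failure{} and returns a replacement message.
digit_parser =
  char_parser(:digit)
  |> map_error(fn _original_message -> "expected a digit here" end)
```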


## Definitions

### Parser

A `Parser` is a struct that parses an `Input` struct (with the `parse` function) and returns either a `Success` or a `Failure` struct.

In order to abstract over the internal representations of input and results, the `parse_string` function is provided, as shown in the examples above.

The `Success` struct contains the resulting Abstract Syntax Tree and the rest of the input as an `Input` struct.

The `Failure` struct contains the original `Input` struct and an error message.

Internally, a `Cache` is also returned (and passed into subsequent `parse` calls of the `Parser` module), but unless
you are extending `Minipeg` itself by defining parsers by hand instead of using `Combinators`, you can ignore this.
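The two levels can be compared side by side; the snippet below is a sketch reusing the structs from the `rgx_parser` example above (assuming the same aliases for `Parser`, `Input`, `Cache`, and `Success` are in scope):

```elixir
# High level: plain tuples, no structs to handle.
{:ok, "a"} = parse_string(char_parser("a"), "a")

# Low level: explicit Input and Cache; returns a %Success{} (or %Failure{})
# carrying the ast and the remaining input.
%Success{ast: "a", rest: %Input{input: ""}} =
  Parser.parse(char_parser("a"), Input.new("a"), %Cache{})
```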

### Combinator

A `Combinator` is a function that takes a `Parser` and, optionally, some arguments, and returns a new `Parser`.
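As an illustration, the `many_as_string` shorthand from the `many` section is exactly such a function; here is a sketch of how it could be defined (the actual definition in Minipeg may differ):

```elixir
# A combinator: takes a parser (plus an optional name) and returns a new parser.
def many_as_string(parser, name \\ nil) do
  many(parser, name)
  |> map(&IO.chardata_to_string/1)
end
```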


## LICENSE

Apache-2.0, see [LICENSE](LICENSE) for details.

<!--SPDX-License-Identifier: Apache-2.0-->