# erlarg - v1.0.0
[CI](https://github.com/Eptwalabha/erlarg/actions/workflows/ci.yml)
An Erlang library that parses a list of arguments into structured data.
Useful for handling the options and parameters of an escript.
## Installation
Add `erlarg` to the `deps` of your `rebar.config`:
```erlang
{deps, [{erlarg, "1.0.0"}]}.
% or
{deps, [{erlarg, {git, "https://github.com/Eptwalabha/erlarg.git", {tag, "v1.0.0"}}}]}.
```
If you're building an `escript`, add `erlarg` to the list of apps to include in the binary:
```erlang
{escript_incl_apps, [erlarg, …]}.
```
Fetch and compile the dependencies of your project:
```bash
rebar3 compile --deps_only
```
That's it, you're good to go.
## How does it work?
Imagine this command:
```bash
./my-script --limit=20 -m 0.25 --format "%s%t" -o output.tsv -
```
The `main/1` function of `my-script` will receive this list of arguments:
```erlang
["--limit=20", "-m", "0.25", "--format", "%s%t", "-o", "output.tsv", "-"]
```
The function `erlarg:parse` helps you convert them into structured data:
```erlang
main(Args) ->
    Syntax = {any, [erlarg:opt({"-l", "--limit"}, limit, int),
                    erlarg:opt({"-f", "--format"}, format, binary),
                    erlarg:opt("-o", file, string),
                    erlarg:opt("-", stdin),
                    erlarg:opt({"-m", "--max"}, max, float)
                   ]},
    {ok, {Result, RemainingArgs}} = erlarg:parse(Args, Syntax),
    ...
```
For this example, `parse` will return this proplist:
```erlang
% Result
[{limit, 20},
{max, 0.25},
{format, <<"%s%t">>},
{file, "output.tsv"},
stdin].
```
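
Because `Result` is a plain proplist, the standard `proplists` module is enough to read it. The sketch below assumes the `Result` shown above; the module name `cli_config`, the function `summary/1` and every default value are hypothetical. A bare atom such as `stdin` is treated by `proplists` as `{stdin, true}`, which is why `proplists:get_bool/2` works for flags.

```erlang
-module(cli_config).
-export([summary/1]).

%% Hypothetical helper (not part of erlarg): turn the proplist returned by
%% erlarg:parse/2 into a map. Missing options fall back to made-up defaults.
summary(Result) ->
    #{limit => proplists:get_value(limit, Result, 10),
      max => proplists:get_value(max, Result, 1.0),
      format => proplists:get_value(format, Result, <<"%s">>),
      file => proplists:get_value(file, Result, undefined),
      %% a bare atom like `stdin` counts as {stdin, true}
      use_stdin => proplists:get_bool(stdin, Result)}.
```
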
The functions `erlarg:parse/2` and `erlarg:parse/3` transform a list of arguments into structured data. They take the following parameters:
- `Args`: A list of arguments (generally what's given to `main/1`)
- `Syntax`: The syntax (or the specification) that describes how the arguments should be parsed
- `Aliases`: [optional] the list of options, types or sub-syntax that are referenced in `Syntax`
## Syntax
The syntax describes to the parser how to handle each argument in `Args`.
The parser consumes the arguments one by one while building the structured data.
A syntax can be any of the following:
- a type
- a named type
- a custom type
- an option
- an alias
- a sub-syntax (which is a syntax itself)
- a syntax operator
- a list of all the above
It can get pretty complex, so let's start simple.
Imagine a fictional script `print_n_time` that takes a string and an integer as arguments:
```bash
# this will print the string "hello" 3 times
$ print_n_time hello 3
```
Here's the simplest spec needed to handle the arguments:
```erlang
Syntax = [string, int].
erlarg:parse(Args, Syntax). % here Args = ["hello", "3"]
{ok, {["hello", 3], []}} % erlarg:parse/2 result
```
We explicitly asked the parser to handle two arguments: the first <u>must</u> be a `string` and the second <u>must</u> be an `int`.
If the parsing is successful, it returns the following tuple:
```erlang
{ok, {Data, RemainingArgs}}.
```
Where `Data` is the structured data generated by the parser (`["hello", 3]`) and `RemainingArgs` is the list of arguments not consumed by the parser (`[]`).
### Parsing failure
If the parser encounters a problem with an argument, it fails and returns the nature of the problem:
```erlang
> erlarg:parse(["world"], [int]).
{error, {not_int, "world"}} % it failed to convert the word "world" into an int
```
or
```erlang
> erlarg:parse(["one"], [string, string]). % expects two strings but only got one
{error, {missing, arg}}
```
> [!TIP]
> These errors can be used to explain to the user what's wrong with the command they typed.
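
For instance, a script could map the two errors shown above to human-readable messages. This is only a sketch: `cli_errors` and `explain/1` are hypothetical names, only the `{not_int, Arg}` and `{missing, arg}` shapes come from the examples above, and any other reason is printed as-is.

```erlang
-module(cli_errors).
-export([explain/1]).

%% Hypothetical helper (not part of erlarg): turn an error returned by
%% erlarg:parse/2 into a message for the user.
explain({error, {not_int, Arg}}) ->
    lists:flatten(io_lib:format("expected an integer, got ~s", [Arg]));
explain({error, {missing, arg}}) ->
    "missing argument";
explain({error, Reason}) ->
    %% fallback for any failure reason not handled above
    lists:flatten(io_lib:format("invalid arguments: ~p", [Reason])).
```

In an escript's `main/1`, the message could then be written with `io:format(standard_error, "~s~n", [Msg])` before calling `halt(1)`.
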
### Remaining Args
Remaining args are the arguments left unconsumed when the parser terminates successfully.
If we add some extra arguments at the end of our command:
```bash
$ print_n_time hello 3 some extra arguments
```
This time, calling `erlarg:parse/2` with the same syntax as before gives this result:
```erlang
Syntax = [string, int].
{ok, {_, RemainingArgs}} = erlarg:parse(Args, Syntax).
["some", "extra", "arguments"] % RemainingArgs
```
The parser consumes the first two arguments; the remaining ones are returned in `RemainingArgs`.
> [!NOTE]
> Having unconsumed arguments does not generate an error
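
If your script should reject unexpected trailing arguments, you can check `RemainingArgs` yourself. A minimal sketch; the `strict_args` module and the `{unexpected_args, Extra}` error are hypothetical and not part of `erlarg`:

```erlang
-module(strict_args).
-export([ensure_all_consumed/1]).

%% Hypothetical wrapper around the result of erlarg:parse/2,3:
%% leftover arguments are turned into an error.
ensure_all_consumed({ok, {Data, []}}) ->
    {ok, Data};
ensure_all_consumed({ok, {_Data, Extra}}) ->
    {error, {unexpected_args, Extra}};
ensure_all_consumed({error, _Reason} = Error) ->
    %% parsing errors are passed through unchanged
    Error.
```
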
## Types
The parser can convert the argument to more types than just `string` and `int`.
Here are all the types currently available:
- `int`: cast the argument into an int
- `float`: cast the argument into a float (an int is cast into a float)
- `number`: cast the argument into an int; if that fails, cast it into a float
- `string`: return the given argument
- `binary`: cast the argument into a binary
- `atom`: cast the argument into an atom
- `bool`: return the boolean value of the argument
| syntax | arg | result | note |
|---|---|---|---|
| int | "1" | 1 |-|
| int | "1.2" | error | not an int |
| float | "1.2" | 1.2 |-|
| float | "1" | 1.0 | cast int into float |
| float | "1.234e2" | 123.4 |-|
| number | "1" | 1 |-|
| number | "1.2" | 1.2 |-|
| string | "abc" | "abc" |-|
| binary | "äbc" | <<"äbc"/utf8>> | use `unicode:characters_to_binary`|
| atom | "super-top" | 'super-top' |-|
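
The `float` and `number` rows can be reproduced with the standard `list_to_integer/1` and `list_to_float/1` BIFs. This is only a sketch of the behaviour described in the table, not `erlarg`'s own code; the module name and the `{error, ...}` tuples are hypothetical.

```erlang
-module(number_sketch).
-export([to_float/1, to_number/1]).

%% `float`: accept "1.2" as well as "1" (an int is cast into a float).
%% list_to_float/1 alone rejects "1": Erlang floats need a decimal point.
to_float(Arg) ->
    try {ok, list_to_float(Arg)}
    catch
        error:badarg ->
            try {ok, float(list_to_integer(Arg))}
            catch
                error:badarg -> {error, {not_float, Arg}}
            end
    end.

%% `number`: try an int first, then fall back to a float.
to_number(Arg) ->
    try {ok, list_to_integer(Arg)}
    catch
        error:badarg ->
            try {ok, list_to_float(Arg)}
            catch
                error:badarg -> {error, {not_number, Arg}}
            end
    end.
```
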
The `bool` conversion:
| arg | bool | note |
|---|---|---|
| "true" | true | case insensitive |
| "yes" | true | |
| "abcd" | true | any non-empty string |
| "1" | true | |
| "0.00001" | true | |
| "false" | false | case insensitive |
| "no" | false | |
| "" | false | empty string |
| "0" | false | |
| "0.0" | false | |
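
The rules in this table fit in a few lines. The sketch below is a hypothetical reimplementation that reproduces the rows above (it is not taken from `erlarg`): `"false"` and `"no"` are compared case-insensitively, the empty string is false, numeric strings are false only when they equal zero, and every other string is true.

```erlang
-module(bool_sketch).
-export([to_bool/1]).

%% Hypothetical reimplementation of the table above:
%% "false"/"no" (any case), "" and numeric zeros are false;
%% anything else, including "true" and "yes", is true.
to_bool(Arg) ->
    case string:lowercase(Arg) of
        "false" -> false;
        "no" -> false;
        "" -> false;
        Lower -> not is_zero(Lower)
    end.

%% True only if the whole string is a number equal to zero.
is_zero(Str) ->
    case string:to_float(Str) of
        {F, ""} -> F == 0;
        _ ->
            case string:to_integer(Str) of
                {I, ""} -> I == 0;
                _ -> false
            end
    end.
```
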
> [!TIP]
> Converting an argument into a `string`, `binary`, `bool` or `atom` always succeeds.
If you need a more complex type, see the section on [Custom types](#custom-types).
## Naming parameters
Converting an argument into a specific type is important, but it doesn't really help us understand what these values are for:
```erlang
> Syntax = [string, int].
> {ok, {Result, _}} = erlarg:parse(["hello", "3"], Syntax).
["hello", 3]. % Result
```
To avoid this issue, you can give a name to the parsed parameters with the following syntax:
```erlang
{Name :: atom(), Type :: base_type()}
```
If we rewrite the syntax as such:
```erlang
Syntax = [{text, string}, {nbr, int}].
{ok, {Result, _}} = erlarg:parse(["hello", "3"], Syntax).
[{text, "hello"}, {nbr, 3}] % Result
```
You can even name a list of parameters:
```erlang
Syntax = [{a, [string, {a2, float}]}, {b, binary}],
{ok, {Result, _}} = erlarg:parse(["abc", "2.3", "bin"], Syntax).
[{a, ["abc", {a2, 2.3}]}, {b, <<"bin">>}] % Result
```
## Options
Naming and casting parameters into types is neat, but most programs use options. An option is an argument that usually (but not always…) starts with a dash and takes zero or more parameters.
```bash
$ date -d --utc --date=STRING
```
An option can have a short form (a dash followed by a letter, e.g. `-v`) and/or a long form (a double dash followed by a word, e.g. `--version`).
This table summarizes the formats handled/recognized by the parser:
| format | note |
|---|---|
| -s | |
| -s <u>VALUE</u> | |
| -s<u>VALUE</u> | same as `-s VALUE` |
| -abc <u>VALUE</u> | same as `-a -b -c VALUE` |
| -abc<u>VALUE</u> | same as `-a -b -c VALUE` |
| --long | |
| --long <u>VALUE</u> | |
| --long=<u>VALUE</u> | |
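
As an illustration of the `--long=VALUE` row, here is a hypothetical helper that splits that form into the equivalent two-argument form with `string:split/2`. It only sketches the idea; `erlarg` handles this for you and its internals may differ.

```erlang
-module(long_opt_sketch).
-export([normalize/1]).

%% Hypothetical helper (not part of erlarg):
%% "--long=VALUE" -> ["--long", "VALUE"]; other arguments are left untouched.
normalize([$-, $- | _] = Arg) ->
    %% string:split/2 splits on the first "=" only, so "a=b" values survive
    string:split(Arg, "=");
normalize(Arg) ->
    [Arg].
```
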
In this chapter, we'll see how to make the parser recognise three kinds of options:
- option without parameter
- option with parameters
- option with sub options
### Option without parameter
```bash
$ grep -v "bad"
$ grep --invert-match "bad"
```
We can define this option with `erlarg:opt` like so:
```erlang
> Syntax = [erlarg:opt({"-v", "--invert-match"}, invert_match)].
> {ok, {Result, _}} = erlarg:parse(["-v"], Syntax).
[invert_match] % Result
```
The first parameter of `erlarg:opt` is the option:
```erlang
{"-s", "--long"} % short and long options
"-s" % only short option
{"-s", undefined} % same as above
{undefined, "--long"} % only long option
```
The second parameter is the name of the option, in this case `invert_match`
### Option with parameter(s)
Options can have parameters:
```bash
$ date --date 'now -3 days'
$ date --date='now -3 days'
$ date -d'now -3 days'
```
```erlang
> Syntax = [erlarg:opt({"-d", "--date"}, date, string)].
> {ok, {Result, _}} = erlarg:parse(["--date", "now -3 days"], Syntax).
[{date, "now -3 days"}] % Result
```
The third parameter is the syntax of the parameters expected by the option. In this case, after matching the argument `--date`, the option expects a string (`"now -3 days"`).
Maybe one of your program's options expects two parameters?
No problem:
```erlang
erlarg:opt({"-d", "--dimension"}, dimension, [int, string]).
[{dimension, [3, "inch"]}] % Result for "-d 3 inch"
```
You can even name the parameters:
```erlang
erlarg:opt({"-d", "--dimension"}, dimension, [{size, int}, {unit, string}]).
[{dimension, [{size, 3}, {unit, "inch"}]}] % Result for "-d 3 inch"
```
### Option with sub-option(s)
Because the third parameter is a syntax, and an option is itself a syntax, you can put options inside an option:
```bash
$ my-script --opt1 -a "param of a" -b "param of opt1" --opt2 …
```
In this fictional program, the option `--opt1` has two sub-options (`-a`, which expects a parameter, and `-b`, which doesn't) and takes a parameter of its own. We can define `opt1` this way:
```erlang
Opt1 = erlarg:opt({"-o", "--opt1"},                  % option
                  opt1,                              % option's name
                  [erlarg:opt("-a", a, string),      % sub-option 1
                   erlarg:opt("-b", b),              % sub-option 2
                   {value, string}                   % the param under the name 'value'
                  ]).
{ok, {Result, _}} = erlarg:parse(["--opt1", "-a", "abc", "-b", "def"], Opt1).
[{opt1, [{a, "abc"}, b, {value, "def"}]}] % Result
```
Well… that's quite hard to read. Fortunately, you can use `Aliases` to avoid this mess.
### Aliases
Aliases let you define all your options, sub-syntaxes and custom types in a map, which keeps the `Syntax` clear and readable.
```erlang
Aliases = #{
    option1 => erlarg:opt({"-o", "--opt1"}, opt1, [opt_a, opt_b, {value, string}]),
    option2 => erlarg:opt({undefined, "--opt2"}, opt2),
    opt_a => erlarg:opt("-a", a, string),
    opt_b => erlarg:opt("-b", b)
},
Syntax = [option1, option2],
{ok, {Result, _}} = erlarg:parse(["--opt1", "-a", "abc", "-b", "def", "--opt2"],
                                 Syntax, Aliases).
[{opt1, [{a, "abc"}, b, {value, "def"}]}, opt2] % Result
```
Here `Syntax` is a list of two aliases, `option1` and `option2`
## Syntax operators
Operators tell the parser how to handle a list of syntaxes.
### `sequence` operator
Take the following syntax:
```erlang
[erlarg:opt({"-d", "--date"}, date, string), erlarg:opt({"-u", "--utc"}, utc)]
```
It would parse this command without problem:
```bash
$ date -d "now -3 days" --utc # yay!
```
But it fails on this one:
```bash
$ date --utc --date="now -3 days" # boom !
```
Why? Aren't these two commands equivalent?
That's because the parser treats a list of syntaxes as a `sequence` operator:
```
[syntax1, syntax2, …]
```
A `sequence` expects the arguments to match in the same order as the elements of the list: the first argument must match `syntax1`, the second `syntax2`, and so on.
All elements of the list must succeed for the operator to succeed; if any of them fails, the whole sequence fails.
| syntax | args | result | note |
|---|---|---|---|
| [int, string] | ["1", "a"] | [1, "a"] | |
| [int] | ["1", "a"] | [1] | remaining: ["a"] |
| [int, int] | ["1", "a"] | error | "a" isn't an int |
| [int, string, int] | ["1", "a"] | error | missing a third argument |
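
To make the rule concrete, here is a small model of `sequence` built on functions shaped like the custom types described in [Custom types](#custom-types) (`fun(Args) -> {ok, Value, Rest} | Failure`). It is a hypothetical sketch, not `erlarg`'s implementation; the module name and the simplified failure values are made up, and it reproduces the rows of the table above.

```erlang
-module(seq_sketch).
-export([int/1, string/1, sequence/2]).

%% Minimal hypothetical "types": consume one argument or report a failure.
int([Arg | Rest]) ->
    try {ok, list_to_integer(Arg), Rest}
    catch error:badarg -> {not_int, Arg}
    end;
int([]) -> {missing, arg}.

string([Arg | Rest]) -> {ok, Arg, Rest};
string([]) -> {missing, arg}.

%% Run each parser in order on what the previous one left;
%% the first failure aborts the whole sequence.
sequence([], Args) ->
    {ok, [], Args};
sequence([Parser | Parsers], Args) ->
    case Parser(Args) of
        {ok, Value, Rest} ->
            case sequence(Parsers, Rest) of
                {ok, Values, Remaining} -> {ok, [Value | Values], Remaining};
                Failure -> Failure
            end;
        Failure ->
            Failure
    end.
```
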
So how do we parse arguments when we can't be sure of their order? Moreover, some options are… optional! What do we do?
That's where the `any` operator comes into play.
### `any` operator
format:
```
{any, [syntax1, syntax2, …]}
```
The parser tries to consume arguments as long as one of the syntaxes matches. If an element of the syntax fails, the operator fails.
| syntax | args | result | note |
|---|---|---|---|
| {any, [int]} | ["1", "2", "abc"] | [1, 2] | remaining: ["abc"] |
| {any, [{key, int}]} | ["1", "2"] | [{key, 1}, {key, 2}] | |
| {any, [int, {s, string}]} | ["1", "2", "abc", "3"] | [1, 2, {s, "abc"}, 3] | |
| {any, [string]} | ["1", "-o", "abc", "3"] | ["1", "-o", "abc", "3"] | even if "-o" is an option |
No matter how many elements match, `any` always succeeds. If nothing matches, no arguments are consumed.
> [!NOTE]
> Keep in mind that if the list given to `any` contains types like `string` or `binary`, it will consume all the remaining arguments.
> For example, in `{any, [string, custom_type]}`, `custom_type` is never executed because the type `string` always consumes the argument first.
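
Here is a similar hypothetical model of `any`, again using functions shaped like custom types. It only covers the behaviour shown in the table: it keeps consuming while one of the parsers accepts the next arguments, stops at the first argument none of them accepts, and never fails. The last test line also shows the pitfall from the note: with `string` first, `int` is never reached.

```erlang
-module(any_sketch).
-export([int/1, string/1, any/2]).

%% Minimal hypothetical "types": consume one argument or report a failure.
int([Arg | Rest]) ->
    try {ok, list_to_integer(Arg), Rest}
    catch error:badarg -> {not_int, Arg}
    end;
int([]) -> {missing, arg}.

string([Arg | Rest]) -> {ok, Arg, Rest};
string([]) -> {missing, arg}.

%% Keep consuming while one of the parsers matches; never fails.
any(Parsers, Args) ->
    any(Parsers, Args, []).

any(Parsers, Args, Acc) ->
    case try_each(Parsers, Args) of
        %% also stop if a parser succeeds without consuming anything
        {ok, Value, Rest} when Rest =/= Args ->
            any(Parsers, Rest, [Value | Acc]);
        _ ->
            {ok, lists:reverse(Acc), Args}
    end.

%% Return the first successful parse, trying the parsers in order.
try_each([], _Args) ->
    nomatch;
try_each([Parser | Parsers], Args) ->
    case Parser(Args) of
        {ok, _, _} = Ok -> Ok;
        _Failure -> try_each(Parsers, Args)
    end.
```
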
### `first` operator
format:
```
{first, [syntax1, syntax2, …]}
```
The parser returns the result of the first element of the syntax that succeeds.
It fails if no element matches.
The following table uses `Args = ["1", "2", "a", "3", "b"]`:
| syntax | result | remaining |
|---|---|---|
| {first, [int]} | [1] | ["2", "a", "3", "b"] |
| {first, [{opt, int}]} | [{opt, 1}] | ["2", "a", "3", "b"] |
| {any, [int, {b, binary}]} | [1, 2, {b, <<"a">>}, 3, {b, <<"b">>}] | [] |
| {any, [string]} | ["1", "2", "a", "3", "b"] | [] |
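
A hypothetical model of `first` in the same style: every parser is tried on the same arguments and only the first success is kept. The result is wrapped in a list to mirror the table above; the module name and the `{error, no_match}` failure are made up for this sketch.

```erlang
-module(first_sketch).
-export([int/1, string/1, first/2]).

%% Minimal hypothetical "types": consume one argument or report a failure.
int([Arg | Rest]) ->
    try {ok, list_to_integer(Arg), Rest}
    catch error:badarg -> {not_int, Arg}
    end;
int([]) -> {missing, arg}.

string([Arg | Rest]) -> {ok, Arg, Rest};
string([]) -> {missing, arg}.

%% Try each parser on the same arguments; keep the first success.
first([], _Args) ->
    {error, no_match};
first([Parser | Parsers], Args) ->
    case Parser(Args) of
        {ok, Value, Rest} -> {ok, [Value], Rest};
        _Failure -> first(Parsers, Args)
    end.
```
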
## Custom types
Sometimes you need to perform some operations on an argument or run more complex checks. This is what custom types are for.
A custom type is a function that takes a list of arguments and returns the formatted/checked value to the parser:
```erlang
%% `custom_type` is a placeholder for the name of your own function
-spec custom_type(Args) -> {ok, Value, RemainingArgs} | Failure when
      Args :: args(),
      Value :: any(),
      RemainingArgs :: args(),
      Failure :: any().
```
- `Args`: The list of arguments not yet consumed by the parser
- `Value`: The Value you want to return to the parser
- `RemainingArgs`: The list of arguments your function didn't consume
- `Failure`: An explanation of why the function didn't accept the argument
**Example 1**:
Let's say your script has an option `-f FILE` where `FILE` must be an existing file. In this case, the type `string` won't be enough. You could write your own function to perform this check:
```erlang
existing_file([File | RemainingArgs]) ->
    case filelib:is_regular(File) of
        true -> {ok, File, RemainingArgs};
        _ -> {not_a_file, File}
    end.
```
To use your custom type:
```erlang
Syntax = {any, [file]},
Aliases = #{
    file => erlarg:opt({"-f", "--file"}, file, existing_file),
    existing_file => fun existing_file/1
},
erlarg:parse(Args, Syntax, Aliases).
```
Or directly in the syntax:
```erlang
Syntax = {any, [erlarg:opt({"-f", "--file"}, file, fun existing_file/1)]}.
```
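
Custom types can also convert values, not just check them. As another sketch, the hypothetical `tcp_port/1` below accepts only an integer between 1 and 65535 and returns it as an integer; the module name `port_type` and the `{not_a_port, Arg}` and `{missing, port}` failures are made up for this example.

```erlang
-module(port_type).
-export([tcp_port/1]).

%% Hypothetical custom type: consume one argument and return it
%% as a TCP port number, or a failure explaining why it was rejected.
tcp_port([Arg | RemainingArgs]) ->
    case string:to_integer(Arg) of
        {Port, ""} when is_integer(Port), Port >= 1, Port =< 65535 ->
            {ok, Port, RemainingArgs};
        _ ->
            {not_a_port, Arg}
    end;
tcp_port([]) ->
    {missing, port}.
```

Assuming, as in the example above, that a fun can be passed as an option's syntax, it could be used as `erlarg:opt({"-p", "--port"}, port, fun port_type:tcp_port/1)`.
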
**Example 2**:
In this case, your script needs to fetch the information of a particular user from a config file with the option `--consult USERS_FILE USER_ID`, where `USERS_FILE` is the file containing the users' data and `USER_ID` is the id of the user:
```erlang
get_user_config([DatabaseFile, UserID | RemainingArgs]) ->
    case file:consult(DatabaseFile) of
        {ok, Users} ->
            case proplists:get_value(UserID, Users, not_found) of
                not_found -> {user_not_found, UserID};
                UserData -> {ok, UserData, RemainingArgs}
            end;
        Error -> {cannot_consult, DatabaseFile, Error}
    end;
get_user_config(_) ->
    {badarg, missing_arguments}.
```
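
`get_user_config/1` reads the file with `file:consult/1` and looks the user up with `proplists:get_value/3`, so the file must hold one `{UserID, UserData}` term per entry, each ending with a period. Since `UserID` comes straight from the command line it is a string, so the keys must be strings too. A hypothetical `users.config` could look like this (names and fields are made up):

```erlang
%% users.config (hypothetical): read with file:consult/1,
%% one {UserID, UserData} term per entry, each ending with a period.
{"alice", #{name => "Alice", role => admin}}.
{"bob", #{name => "Bob", role => viewer}}.
```

With this file, `--consult users.config alice` would make `get_user_config/1` return `{ok, #{name => "Alice", role => admin}, RemainingArgs}`.
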