defmodule Elexer do
@moduledoc """
Elexer: a lexing library for the Elixir programming language.
Elexer is designed to be simple and lightweight, and easy to learn.
GitHub: https://github.com/VideoCarp/Elexer
Example: https://github.com/VideoCarp/Elexer/blob/main/example.ex
Learn by-hand lexing: https://gist.github.com/VideoCarp/d7cec2195a7de370d850aead62fa09cd
Learn Elexer: not yet available. These docs should be quite helpful, though.
"""
@doc """
Does nothing. Used when a function is to be called, but no effect is desired.
"""
def nothing do
end
@doc """
The actual lexing function. Not for your use, but for your contribution. See the `lex` function instead if you plan to use it.
"""
def uglex(current, tokenstream, len, input_str, singlecharh, multicharh, otherwise, tmp) do
char = String.at(input_str, current)
{boolch, tag} = singlecharh.(char)
{boolmh, _} = multicharh.(char) # We don't need tagm here. We'll get it later.
unless current >= len do
cond do
boolmh ->
uglex(current + 1, tokenstream, len, input_str, singlecharh, multicharh, otherwise, tmp <> char)
boolch ->
if tmp == "" do
uglex(current + 1, [{char, tag} | tokenstream], len, input_str, singlecharh, multicharh, otherwise, "")
else
{_, tagm1} = multicharh.(String.at(input_str, current - 1))
uglex(current, [{tmp, tagm1} | tokenstream], len, input_str, singlecharh, multicharh, otherwise, "")
end
true ->
otherwise.()
uglex(current + 1, tokenstream, len, input_str, singlecharh, multicharh, otherwise, "")
end
else
# Prepending then reversing is allegedly faster than concatenation
Enum.reverse(tokenstream)
end
end
# To save you from uglex.
# String, function, function, [function] -> [{string, atom}]
@doc """
'tag' describes the token.
Lex takes in the following:
input_str: String,
singlecharh: Function/1,
multicharh: Function/1,
otherwise: Function/0, # optional
`input_str` the program to lex
`singlecharh` the function that returns `{bool, atom}` where 'bool' is given when a match is made,
and 'atom' is given as the argument for the tag to the match.
For example, a function that matches "(" to return {true, :oparen}
while also matching ")" to return `{true, :cparen}`
'singlecharh' should contain what tokens to match and their tag.
Code example (of a function taking 'character' as its sole argument)
```elixir
cond do
char == "(" ->
{true, :oparen}
char == ")"->
{true, :cparen}
char == "!" ->
{true, :not}
true ->
{false, :pass}
end
```
Example on repository.
This will allow the lexer to lex these characters. Given a string "()" the lexer will
be able to lex that into:
```elixir
[{"(", :oparen}, {")", :cparen}]
```
`multichar`: the function that returns `{bool, atom}` where 'bool' is given as 'true' when the character satisfies the required
condition, and 'atom' is the tag.
For example, a function that matches alphanumeric characters or underscore
while also matching another multi-character pattern.
Effectively used the same way as 'singlecharh'. Example in repository.
`otherwise`: what to execute if elexer encounters a foreigh character.
For example, if 'otherwise' is a function that prints an error, it will print the error when a foreigh character is found.
To give `lex` these arguments, you should use the `&function/arity` syntax, where 'arity' is the number of arguments.
"""
def lex(input_str, singlecharh, multicharh, otherwise \\ ¬hing/0) do
uglex(0, [], String.length(input_str), input_str, singlecharh, multicharh, otherwise, "")
end
end