# Myhtmlex

Bindings for lexborisov's [myhtml](https://github.com/lexborisov/myhtml).

* Available as a hex package: `{:myhtmlex, "~> 0.1.0"}`
* [Documentation](https://hexdocs.pm/myhtmlex/Myhtmlex.html)

## Example

    iex> Myhtmlex.decode("<h1>Hello world</h1>")
    {"html", [], [{"head", [], []}, {"body", [], [{"h1", [], ["Hello world"]}]}]}
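
The decoded document is a nested `{tag, attributes, children}` tuple, with text nodes as plain strings. As a minimal sketch of walking that shape, the hypothetical `TreeText` module below (not part of Myhtmlex) collects all text in a tree. It assumes the tree contains only element tuples and text strings, as in the example above; other node types would need their own clauses.

```elixir
defmodule TreeText do
  # Hypothetical helper, not part of Myhtmlex: collects the text of a decoded tree.

  # A text node is a plain binary; return it unchanged.
  def text(node) when is_binary(node), do: node

  # An element is {tag, attributes, children}; concatenate the text of its children.
  def text({_tag, _attrs, children}) when is_list(children) do
    Enum.map_join(children, "", &text/1)
  end
end

# The tree literal from the example above, so the sketch runs without the NIF.
tree = {"html", [], [{"head", [], []}, {"body", [], [{"h1", [], ["Hello world"]}]}]}
TreeText.text(tree)
# => "Hello world"
```

Because the result is plain Elixir data, ordinary pattern matching and `Enum` functions are enough to traverse it.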

## Thoughts

I need a fast HTML-parsing library in Erlang/Elixir,
so falling back to C, and to myhtml in particular, is a natural move.

But interoperability with Erlang is a minefield,
and the increase in parsing speed does not come for free.

The current implementation can be considered a proof of concept.
The myhtml code is called as a dirty NIF and executed **inside the Erlang VM**,
which completely gives up the safety of the Erlang VM. I am not saying that myhtml is unsafe, but
a single segfault brings down the whole Erlang VM.
So I consider this mode of operation unsafe and **not recommended for production use**.

The other option, which I have on my roadmap, is to call into a C-Node:
a separate OS process that receives calls from Erlang and returns the result to the calling process.

Another option is to call into a Port driver:
a separate OS process that communicates with the VM via stdin/stdout.
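
To illustrate the stdin/stdout style of communication, here is a minimal sketch of a round trip through an external OS process using Elixir's `Port` module. This is not how Myhtmlex works today: `PortSketch` is a hypothetical name, and the Unix `cat` program stands in for a real myhtml worker, so the "reply" is simply the input echoed back. The sketch assumes a Unix-like system where `cat` is on the `PATH`.

```elixir
defmodule PortSketch do
  # Hypothetical sketch: send a binary to an external OS process and wait for its reply.
  def roundtrip(payload) when is_binary(payload) do
    # Stand-in worker (assumption): `cat` echoes its stdin, so each framed message
    # comes back unchanged. A real worker would parse the HTML and reply with a tree.
    executable = System.find_executable("cat")

    # {:packet, 4} prefixes every message with a 4-byte length in both directions.
    port = Port.open({:spawn_executable, executable}, [:binary, {:packet, 4}])

    # Write the payload to the worker's stdin.
    Port.command(port, payload)

    # Wait for one message read from the worker's stdout.
    receive do
      {^port, {:data, reply}} ->
        Port.close(port)
        {:ok, reply}
    after
      5_000 ->
        Port.close(port)
        {:error, :timeout}
    end
  end
end

PortSketch.roundtrip("<h1>Hello world</h1>")
```

With `cat` as the worker, the call should return the input wrapped in `{:ok, ...}`. The point of the design is isolation: if the external process crashes, only the port closes, and the Erlang VM keeps running. The price is copying the data between the two processes on every call.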

So to recap, I want a **fast** and **safe** html-parsing library for Erlang/Elixir.

Not quite there yet.

## Development

* Please make sure you run `git submodule update` after every checkout or pull

## Status

Currently under development.

* [x] Parse an HTML document into a tree
* [ ] Expose node-retrieval functions
* [ ] Investigate safety and calling options
  * [x] Call as dirty-nif
  * [x] Call as C-Node (check branch `c-node`)
  * [ ] Call as Port driver