# Meeseeks

Meeseeks is an Elixir library for extracting data from HTML.

```elixir
iex> import Meeseeks.CSS
iex> html = Tesla.get("").body
iex> for story <- Meeseeks.all(html, css("tr.athing")) do
       title = Meeseeks.one(story, css(".title a"))
       %{title: Meeseeks.text(title),
         url: Meeseeks.attr(title, "href")}
     end
[%{title: "...", url: "..."}, %{title: "...", url: "..."}, ...]
```

API documentation is available.

## Installation

Add Meeseeks to your `mix.exs`:

```elixir
defp deps do
  [
    {:meeseeks, "~> 0.4.0"}
  ]
end
```

Then run `mix deps.get`.

## Dependencies

Meeseeks depends on html5ever via meeseeks_html5ever.

Because html5ever is a Rust library, you will need to have the Rust compiler installed.

This dependency is necessary because there are no HTML5 spec-compliant parsers written in Elixir or Erlang.

## Getting Started

### Parse

Start by parsing a source (an HTML string or a `Meeseeks.TupleTree`) into a `Meeseeks.Document` so that it can be queried.

```elixir
iex> document = Meeseeks.parse("<div id=main><p>1</p><p>2</p><p>3</p></div>")
```

The selection functions accept an unparsed source, but parsing is expensive, so parse ahead of time when running multiple selections on the same document.
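As a sketch of that advice, assuming Meeseeks is installed as described above, the snippet below contrasts passing a raw string to a selection function with parsing once and reusing the resulting `Meeseeks.Document`. The variable names are illustrative only.

```elixir
import Meeseeks.CSS

html = "<div id=main><p>1</p><p>2</p><p>3</p></div>"

# A raw string is accepted, but it has to be parsed again on every call.
Meeseeks.one(html, css("p"))

# Parse once, then reuse the Meeseeks.Document for several selections.
document = Meeseeks.parse(html)
paragraphs = Meeseeks.all(document, css("#main p"))
main = Meeseeks.one(document, css("#main"))
```

Only the first call pays for parsing inside the selection function; the last two share the already-parsed document.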

### Select

Next, use one of Meeseeks's two selection functions, `all` or `one`, to search for nodes. Both functions accept a queryable (a source, a document, or a `Meeseeks.Result`) and one or more `Meeseeks.Selector`s.

`all` returns a list of results representing every node matching one of the provided selectors, while `one` returns a result representing the first node to match a selector (depth-first).
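The following sketch, assuming Meeseeks is installed and using made-up markup, shows `all` and `one` together. Because a `Meeseeks.Result` is itself queryable, `one` can be run inside each result returned by `all`, and since `one` stops at the first match, only the first link of each list item is kept.

```elixir
import Meeseeks.CSS

document = Meeseeks.parse("<ul><li><a>A</a><a>A2</a></li><li><a>B</a></li></ul>")

# `all` returns a result for every node matching the selector.
items = Meeseeks.all(document, css("li"))

# Each result is queryable; `one` returns only the first matching link inside it.
first_links = Enum.map(items, fn item -> Meeseeks.one(item, css("a")) end)
```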

Use the `css` macro provided by `Meeseeks.CSS` to generate selectors.

```elixir
iex> import Meeseeks.CSS
iex> result = Meeseeks.one(document, css("#main p"))
%Meeseeks.Result{ "<p>1</p>" }
```

### Extract

Retrieve information from the result with an extraction function.

The `Meeseeks.Result` extraction functions are `attr`, `attrs`, `data`, `dataset`, `html`, `own_text`, `tag`, `text`, and `tree`.

```elixir
iex> Meeseeks.tag(result)
"p"
iex> Meeseeks.text(result)
"1"
iex> Meeseeks.tree(result)
{"p", [], ["1"]}
```
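The example above covers `tag`, `text`, and `tree`. As a rough sketch of some of the other extraction functions, assuming Meeseeks is installed and using hypothetical markup, the snippet below pulls attributes, own text, and serialized HTML out of a single result. Exact output formatting (whitespace handling, attribute quoting) is not shown here, so check it against your own input.

```elixir
import Meeseeks.CSS

# Hypothetical markup chosen to exercise attribute and text extraction.
html = "<div id=main><a href=\"/docs\" class=link>Read <b>the</b> docs</a></div>"
link = Meeseeks.one(html, css("a.link"))

Meeseeks.attr(link, "href")  # the value of a single attribute
Meeseeks.attrs(link)         # all attributes as a list of {name, value} tuples
Meeseeks.own_text(link)      # text from the node's own text children only
Meeseeks.text(link)          # text from the node and all of its descendants
Meeseeks.html(link)          # the node serialized back to an HTML string
```

The difference between `own_text` and `text` matters here: only `text` includes "the", which sits inside the nested `<b>` element.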

## Custom Selectors

Meeseeks is designed to have extremely extensible selectors, and creating a custom selector is as easy as defining a struct that implements the `Meeseeks.Selector` behaviour.

```elixir
iex> defmodule CommentContainsSelector do
       use Meeseeks.Selector

       alias Meeseeks.Document

       defstruct value: ""

       # Match comment nodes whose content contains the selector's value.
       def match?(selector, %Document.Comment{} = node, _document) do
         String.contains?(node.content, selector.value)
       end

       # Any other kind of node never matches.
       def match?(_selector, _node, _document) do
         false
       end
     end
{:module, ...}
iex> selector = %CommentContainsSelector{value: "TODO"}
%CommentContainsSelector{value: "TODO"}
iex> Meeseeks.one("<!-- TODO: Close vuln! -->", selector)
%Meeseeks.Result{ "<!-- TODO: Close vuln! -->" }
```

To learn more, check the documentation for `Meeseeks.Selector` and `Meeseeks.Selector.Combinator`.

## Contribute

Contributions are very welcome, especially bug reports.

If submitting a bug report, please search open and closed issues first.

To make a pull request, fork the project, create a topic branch off of `master`, push your topic branch to your fork, and open a pull request.

If you're submitting a bug fix, please include a test or tests that would have caught the problem.

If you're submitting new features, please test and document as appropriate.

By submitting a patch, you agree to license your work under the license of this project.

### Running Tests

```sh
$ git clone
$ cd meeseeks
$ mix deps.get
$ mix test
```

### Building Docs

```sh
$ MIX_ENV=docs mix docs
```

## License

Meeseeks is licensed under the [MIT License](LICENSE)