# Meeseeks

Meeseeks is an Elixir library for parsing and extracting data from HTML.

```elixir
import Meeseeks.CSS

html = HTTPoison.get!("").body

for story <- Meeseeks.all(html, css("tr.athing")) do
  title = Meeseeks.one(story, css(".title a"))
  %{title: Meeseeks.text(title),
    url: Meeseeks.attr(title, "href")}
end
#=> [%{title: "...", url: "..."}, %{title: "...", url: "..."}, ...]
```

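The example above fetches a live page, so its output depends on the network. The following is a minimal offline sketch of the same pattern. It assumes a hand-written HTML fragment that only imitates a `tr.athing` row structure; the markup and the `example.com` URLs are illustrative and do not come from any real page.

```elixir
import Meeseeks.CSS

# Hypothetical markup that mimics a table of stories; not real page content.
html = """
<table>
  <tr class="athing"><td class="title"><a href="https://example.com/a">First story</a></td></tr>
  <tr class="athing"><td class="title"><a href="https://example.com/b">Second story</a></td></tr>
</table>
"""

stories =
  for story <- Meeseeks.all(html, css("tr.athing")) do
    # Search only inside the current row for its title link,
    # then extract the link's text and href attribute.
    title = Meeseeks.one(story, css(".title a"))
    %{title: Meeseeks.text(title), url: Meeseeks.attr(title, "href")}
  end
```

Because `Meeseeks.one/2` is called with `story` rather than the whole document, each lookup is scoped to a single row.
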
API documentation is available.

## Installation

Add Meeseeks to your `mix.exs`:

```elixir
defp deps do
  [
    {:meeseeks, "~> 0.7.0"}
  ]
end
```

Then run `mix deps.get`.

## Dependencies

Meeseeks depends on html5ever via meeseeks_html5ever.

Because html5ever is a Rust library, you will need to have the Rust compiler installed.

This dependency is necessary because there are no HTML5 spec-compliant parsers written in Elixir/Erlang.

## Getting Started

### Parse

Start by parsing a source (an HTML/XML string or a `Meeseeks.TupleTree`) into a `Meeseeks.Document` so that it can be queried.

`Meeseeks.parse/1` parses the source as HTML, but `Meeseeks.parse/2` accepts a second argument of either `:html` or `:xml` that specifies how the source is parsed.

```elixir
document = Meeseeks.parse("<div id=main><p>1</p><p>2</p><p>3</p></div>")
#=> #Meeseeks.Document<{...}>
```
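To parse XML instead, pass `:xml` as the second argument to `Meeseeks.parse/2`. A small sketch, assuming an illustrative catalog document (the element names `catalog`, `book`, and `title` are made up for the example):

```elixir
import Meeseeks.CSS

xml = "<catalog><book><title>Dune</title></book></catalog>"

# Parse as XML rather than HTML, so the parser does not add the
# <html>/<head>/<body> wrapper that HTML parsing would insert.
document = Meeseeks.parse(xml, :xml)

# Selectors work the same way on XML documents.
title = Meeseeks.one(document, css("book title"))
Meeseeks.text(title)
#=> "Dune"
```
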

The selection functions also accept an unparsed source, which they parse as HTML. Parsing is expensive, though, so when running multiple selections on the same document, parse it once ahead of time.
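A sketch of the parse-once approach, assuming the same `#main` fragment used in the parse example. The source is parsed a single time, and the resulting document is reused for each query instead of handing the raw string to every call.

```elixir
import Meeseeks.CSS

html = "<div id=main><p>1</p><p>2</p><p>3</p></div>"

# Parse once...
document = Meeseeks.parse(html)

# ...then reuse the parsed document for every selection.
main = Meeseeks.one(document, css("#main"))
paragraphs = Meeseeks.all(document, css("p"))

# Passing `html` to each call would also work, but the string
# would be parsed again on every call.
```
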

### Select

Next, use one of Meeseeks's two main selection functions, `all` or `one`, to search for nodes. Both functions accept a queryable (a source, a document, or a `Meeseeks.Result`), one or more `Meeseeks.Selector`s, and optionally an initial context.

`all` returns a list of results representing every node matching one of the provided selectors, while `one` returns a result representing the first node to match a selector (depth-first).

Use the `css` macro provided by `Meeseeks.CSS` or the `xpath` macro provided by `Meeseeks.XPath` to generate selectors.

```elixir
import Meeseeks.CSS
result = Meeseeks.one(document, css("#main p"))
#=> #Meeseeks.Result<{ <p>1</p> }>
```

```elixir
import Meeseeks.XPath
result = Meeseeks.one(document, xpath("//*[@id='main']//p"))
#=> #Meeseeks.Result<{ <p>1</p> }>
```
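The difference between `all` and `one` is easiest to see side by side. A minimal sketch, re-creating the `#main` document from the parse example so the snippet stands alone:

```elixir
import Meeseeks.CSS

document = Meeseeks.parse("<div id=main><p>1</p><p>2</p><p>3</p></div>")

# `all` returns a list with one result per matching node.
texts =
  document
  |> Meeseeks.all(css("#main p"))
  |> Enum.map(&Meeseeks.text/1)
#=> ["1", "2", "3"]

# `one` returns only the first matching node (depth-first).
first_text = document |> Meeseeks.one(css("#main p")) |> Meeseeks.text()
#=> "1"
```
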

### Extract

Retrieve information from the result with an extraction function.

The `Meeseeks.Result` extraction functions are `attr`, `attrs`, `data`, `dataset`, `html`, `own_text`, `tag`, `text`, and `tree`.

```elixir
Meeseeks.tag(result)
#=> "p"
Meeseeks.text(result)
#=> "1"
Meeseeks.tree(result)
#=> {"p", [], ["1"]}
```
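The other extraction functions are called the same way. A sketch on an illustrative link element (the markup and the `data-section` attribute are made up for the example). Expected outputs are left out because exact whitespace and attribute formatting are not spelled out here.

```elixir
import Meeseeks.CSS

document = Meeseeks.parse(~s(<a href="/home" data-section="nav">Home <b>page</b></a>))
link = Meeseeks.one(document, css("a"))

# A single attribute value, looked up by name.
href = Meeseeks.attr(link, "href")

# All of the node's attributes as a list of {name, value} tuples.
attributes = Meeseeks.attrs(link)

# Text of the node and all of its descendants...
full_text = Meeseeks.text(link)
# ...versus only the node's own (direct child) text.
own_text = Meeseeks.own_text(link)

# The node's data-* attributes collected into a map.
dataset = Meeseeks.dataset(link)

# The node serialized back to an HTML string.
markup = Meeseeks.html(link)
```
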

## Custom Selectors

Meeseeks is designed to have extremely extensible selectors, and creating a custom selector is as easy as defining a struct that implements the `Meeseeks.Selector` behaviour.

```elixir
defmodule CommentContainsSelector do
  use Meeseeks.Selector

  alias Meeseeks.Document

  defstruct value: ""

  # Match comment nodes whose content includes the selector's value
  def match(selector, %Document.Comment{} = node, _document, _context) do
    String.contains?(node.content, selector.value)
  end

  # Any other kind of node never matches
  def match(_selector, _node, _document, _context) do
    false
  end
end

selector = %CommentContainsSelector{value: "TODO"}

Meeseeks.one("<!-- TODO: Close vuln! -->", selector)
#=> #Meeseeks.Result<{ <!-- TODO: Close vuln! --> }>
```
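The same pattern works for element nodes. A sketch of a second custom selector, assuming `Meeseeks.Document.Element` structs expose their attributes as a list in an `attributes` field. `MinAttributesSelector` is a hypothetical name used only for this illustration; it matches elements that carry at least `min` attributes.

```elixir
defmodule MinAttributesSelector do
  # Hypothetical selector: matches elements with at least `min` attributes.
  use Meeseeks.Selector

  alias Meeseeks.Document

  defstruct min: 1

  # Element nodes match when their attribute list is long enough.
  def match(selector, %Document.Element{} = node, _document, _context) do
    length(node.attributes) >= selector.min
  end

  # Text, comment, and other non-element nodes never match.
  def match(_selector, _node, _document, _context) do
    false
  end
end

html = ~s(<div><a href="/a" title="A">A</a><a href="/b">B</a></div>)

results = Meeseeks.all(html, %MinAttributesSelector{min: 2})
texts = Enum.map(results, &Meeseeks.text/1)
#=> ["A"]
```
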

To learn more, check the documentation for `Meeseeks.Selector` and `Meeseeks.Selector.Combinator`.

## Contribute

Contributions are very welcome, especially bug reports.

If submitting a bug report, please search open and closed issues first.

To make a pull request, fork the project, create a topic branch off of `master`, push your topic branch to your fork, and open a pull request.

If you're submitting a bug fix, please include a test or tests that would have caught the problem.

If you're submitting new features, please test and document as appropriate.

By submitting a patch, you agree to license your work under the license of this project.

### Running Tests

```sh
$ git clone
$ cd meeseeks
$ mix deps.get
$ mix test
```

### Building Docs

```sh
$ MIX_ENV=docs mix docs
```

## License

Meeseeks is licensed under the [MIT License](LICENSE).