# HtmlQuery

A concise API, honed over multiple years, for querying HTML. There are just 5 main functions:
`all/2`, `find/2` and `find!/2` for finding things, plus `attr/2` and `text/1` for extracting
information. There are also a handful of other useful functions, referenced below and described in detail in
the [module docs]( HTML parsing is handled by Floki.

The input can be:

* A string of HTML.
* An [IO Data]( of HTML.
* A [Floki]( [html_node](
  or [html_tree]( data structure. HtmlQuery uses Floki internally
  and can accept its data structure as input, and some HtmlQuery functions return its data structure as output.
* Anything that implements the `String.Chars` protocol. See [Implementing String.Chars](#implementing-string-chars).

We created a related library called [XmlQuery]( which has the same API but
is used for querying XML.

This library is MIT licensed and is part of a growing number of Elixir open source libraries published at

## Installation

def deps do
  [{:html_query, "~> 1.2"}]
end

## Usage

Detailed docs are in the [HtmlQuery module docs](; a quick usage
overview follows.

We typically alias `HtmlQuery` to `Hq`:

alias HtmlQuery, as: Hq

The rest of these examples use the following HTML:

html = """
  <h1>Please update your profile</h1>
  <form id="profile" test-role="profile">
    <label>Name <input name="name" type="text" value="Fido"> </label>
    <label>Age <input name="age" type="text" value="10"> </label>
    <label>Bio <textarea name="bio">Fido likes long walks and playing fetch.</textarea> </label>
  </form>
"""

### Querying

Query functions use CSS selector strings for finding nodes. The
[MDN CSS Selectors guide](
is a helpful CSS reference.

Hq.find(html, "form#profile label textarea")

We’ve found that reserving CSS classes and IDs for styling instead of using them for testing reduces the chance of
styling changes breaking tests, and so we often add attributes that start with `test-` into our HTML; a query for
`test-role` would look like:

Hq.find(html, "form[test-role=profile]")

For simple queries, [HtmlQuery.Css]( provides a shorthand
using keyword lists. For complicated queries, it’s usually clearer to use a CSS string.

Hq.find(html, test_role: "profile")
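As a rough sketch of how the two query styles relate, the script below runs the keyword-list shorthand and the equivalent CSS string against the same markup and compares the results. The `Mix.install` line is only there so the snippet can run as a standalone script, and the call to `HtmlQuery.Css.selector/1` assumes that function is the public entry point for the shorthand; check the module docs before relying on it.

```elixir
# Only needed when running this as a standalone script.
Mix.install([{:html_query, "~> 1.2"}])

alias HtmlQuery, as: Hq

html = """
<form id="profile" test-role="profile">
  <input name="age" type="text" value="10">
</form>
"""

# Keyword-list shorthand and the equivalent CSS string.
by_keyword = html |> Hq.find(test_role: "profile") |> Hq.attr(:id)
by_string = html |> Hq.find("form[test-role=profile]") |> Hq.attr(:id)

IO.inspect({by_keyword, by_string})

# Assumption: HtmlQuery.Css.selector/1 exposes the CSS string built from the keyword list.
IO.inspect(HtmlQuery.Css.selector(test_role: "profile"))
```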

### Finding

`all/2` finds all elements matching the query, `find/2` returns the first element that matches the selector or `nil` if
none was found, and `find!/2` is like `find/2` but raises unless exactly one element is found.

Hq.all(html, "input") # returns a list of all the <input> elements
Hq.find(html, "input[name=age]") # returns the <input> with `name=age`
Hq.find!(html, "input[name=foo]") # raises because no such element exists

See the [module docs]( for more details.
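The difference between the three finders is easiest to see when a query matches nothing. The following is a minimal sketch using the profile form from above; it rescues any exception from `find!/2` rather than naming a specific exception module, since the exception type is not stated here.

```elixir
# Only needed when running this as a standalone script.
Mix.install([{:html_query, "~> 1.2"}])

alias HtmlQuery, as: Hq

html = """
<h1>Please update your profile</h1>
<form id="profile" test-role="profile">
  <label>Name <input name="name" type="text" value="Fido"> </label>
  <label>Age <input name="age" type="text" value="10"> </label>
  <label>Bio <textarea name="bio">Fido likes long walks and playing fetch.</textarea> </label>
</form>
"""

# all/2 always returns a list, which may be empty.
inputs = Hq.all(html, "input")

# find/2 returns nil when nothing matches.
missing = Hq.find(html, "input[name=foo]")

# find!/2 raises when nothing matches; capture the message instead of crashing.
outcome =
  try do
    Hq.find!(html, "input[name=foo]")
  rescue
    error -> {:raised, Exception.message(error)}
  end

IO.inspect(length(inputs), label: "input count")
IO.inspect(missing, label: "find/2")
IO.inspect(outcome, label: "find!/2")
```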

### Extracting

`text/1` is the simplest extraction function:

html |> Hq.find(:h1) |> Hq.text() # returns "Please update your profile"

`attr/2` returns the value of an attribute:

html |> Hq.find("input[name=age]") |> Hq.attr(:value) # returns "10"

To extract data from multiple HTML nodes, we found that it is clearer to compose multiple functions rather than to
have a more complicated API:

html |> Hq.all(:label) |> Enum.map(&Hq.text/1) # returns the text of each <label>
html |> Hq.all("input[type=text]") |> Enum.map(&Hq.attr(&1, "value")) # returns ["Fido", "10"]
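The same composition approach extends to building richer structures. As an illustrative sketch (not a helper provided by the library), the script below pairs each input's `name` with its `value` using only `all/2`, `attr/2` and `Map.new/2`.

```elixir
# Only needed when running this as a standalone script.
Mix.install([{:html_query, "~> 1.2"}])

alias HtmlQuery, as: Hq

html = """
<form id="profile" test-role="profile">
  <label>Name <input name="name" type="text" value="Fido"> </label>
  <label>Age <input name="age" type="text" value="10"> </label>
</form>
"""

# Build a map of input name => input value from every <input> element.
values_by_name =
  html
  |> Hq.all("input")
  |> Map.new(fn input -> {Hq.attr(input, :name), Hq.attr(input, :value)} end)

IO.inspect(values_by_name)
```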

There are also functions for extracting form fields as a map, meta tags as a list, and table contents as a list of
lists or a list of maps. See the [module docs]( for more details.
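For form extraction, a sketch might look like the following. The function name `form_fields/1` and the shape of the returned map are not stated above, so treat both as assumptions to confirm against the module docs; the script only inspects the result rather than promising a particular output.

```elixir
# Only needed when running this as a standalone script.
Mix.install([{:html_query, "~> 1.2"}])

alias HtmlQuery, as: Hq

html = """
<form id="profile" test-role="profile">
  <label>Name <input name="name" type="text" value="Fido"> </label>
  <label>Age <input name="age" type="text" value="10"> </label>
</form>
"""

# Assumption: form_fields/1 accepts the form node and returns its fields as a map.
fields = html |> Hq.find!("#profile") |> Hq.form_fields()

IO.inspect(fields, label: "form fields")
```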

### Parsing

`parse/1` and `parse_doc/1` delegate to Floki’s `parse_fragment/1` and `parse_document!/1` functions. These functions
are rarely needed since all the HtmlQuery functions will parse HTML if needed. See the
[module docs]( for more details.
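One case where explicit parsing can be handy is running many queries against the same document, so the parsed structure can be reused. A minimal sketch, assuming nothing beyond `parse/1`, `find/2` and `text/1`:

```elixir
# Only needed when running this as a standalone script.
Mix.install([{:html_query, "~> 1.2"}])

alias HtmlQuery, as: Hq

html = """
<h1>Please update your profile</h1>
<form id="profile" test-role="profile">
  <label>Age <input name="age" type="text" value="10"> </label>
</form>
"""

# Parse once, then query the parsed structure as often as needed.
parsed = Hq.parse(html)

from_string = html |> Hq.find("h1") |> Hq.text()
from_parsed = parsed |> Hq.find("h1") |> Hq.text()

IO.inspect({from_string, from_parsed})
```

Both queries give the same result; the second one skips parsing the string again.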

### Utilities

`inspect_html/2` prints prettified HTML with a label, `normalize/1` parses and re-stringifies HTML which can be handy
when trying to compare two strings of HTML, `pretty/1` formats HTML in a human-friendly format, and `reject/2` removes
nodes that match the given selector. See the [module docs]( for more details.
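To illustrate two of these utilities, the sketch below uses `normalize/1` to compare two strings that differ only in attribute quoting, and `reject/2` to drop the textarea before querying. The argument order for `reject/2` (HTML first, selector second) is assumed to follow the other functions.

```elixir
# Only needed when running this as a standalone script.
Mix.install([{:html_query, "~> 1.2"}])

alias HtmlQuery, as: Hq

# Two spellings of the same element: single vs. double quoted attribute.
single_quoted = "<p class='note'>Saved</p>"
double_quoted = "<p class=\"note\">Saved</p>"

# The raw strings differ, but their normalized forms should match.
equal_when_normalized = Hq.normalize(single_quoted) == Hq.normalize(double_quoted)

html = """
<form id="profile">
  <input name="age" type="text" value="10">
  <textarea name="bio">Fido likes long walks and playing fetch.</textarea>
</form>
"""

# Assumption: reject/2 takes the HTML first and the selector second.
remaining_textareas = html |> Hq.reject("textarea") |> Hq.all("textarea")

IO.inspect(equal_when_normalized, label: "equal when normalized")
IO.inspect(remaining_textareas, label: "textareas after reject")
```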

## Implementing String.Chars

HtmlQuery functions that accept HTML will convert any value whose type implements `String.Chars`. For example, our
[Pages]( testing library implements `String.Chars` for controller output like
this:

defimpl String.Chars, for: Pages.Driver.Conn do
  def to_string(%Pages.Driver.Conn{conn: %Plug.Conn{status: 200} = conn}),
    do: Phoenix.ConnTest.html_response(conn, 200)
end

and implements `String.Chars` for LiveView output like this:

defimpl String.Chars, for: Pages.Driver.LiveView do
  def to_string(%Pages.Driver.LiveView{rendered: rendered}) when not is_nil(rendered),
    do: rendered

  def to_string(%Pages.Driver.LiveView{live: live}) when not is_nil(live),
    do: live |> Phoenix.LiveViewTest.render()
end
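The same pattern works for your own types. The following is a small self-contained sketch with a hypothetical `Example.Page` struct (not part of HtmlQuery or Pages) that wraps rendered HTML. When run as a script, protocol consolidation is turned off in `Mix.install/2` so that the implementation defined in the script takes effect.

```elixir
# Only needed when running this as a standalone script. Consolidation is
# disabled so the String.Chars implementation defined below is picked up.
Mix.install([{:html_query, "~> 1.2"}], consolidate_protocols: false)

defmodule Example.Page do
  # Hypothetical struct that wraps already-rendered HTML.
  defstruct [:body]
end

defimpl String.Chars, for: Example.Page do
  # Hand the wrapped HTML string to anything that calls to_string/1.
  def to_string(%Example.Page{body: body}), do: body
end

alias HtmlQuery, as: Hq

page = %Example.Page{body: "<h1>Welcome</h1>"}

# The struct is passed straight to the query functions; no manual conversion.
heading = page |> Hq.find("h1") |> Hq.text()

IO.inspect(heading)
```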

## Development

brew bundle