README.md

# Capuli

[![Build Status](https://travis-ci.org/merongivian/capuli.svg?branch=master)](https://travis-ci.org/merongivian/capuli/)
[![Deps Status](https://beta.hexfaktor.org/badge/all/github/merongivian/capuli.svg)](https://beta.hexfaktor.org/github/merongivian/capuli)

Declarative XML parsing library inspired by Ruby's [Sax Machine](https://github.com/pauldix/sax-machine), backed by Floki

This is used by [Feedraptor](https://github.com/merongivian/feedraptor) for parsing feeds (RSS, Atom, etc.)

## Examples

Add `use Capuli` in any module and define properties to parse:

```elixir
defmodule AtomEntry do
  use Capuli
  element :title
  # The :as argument makes this available through entry.author instead of entry.name
  element :name, as: :author
  # Element name is case insensitive, so it's not necessary to add the element name as feedburner:origLink
  element :"feedburner:origlink", as: :url
  element :published
end

defmodule Atom do
  use Capuli
  # The :with argument means that you only match a link tag
  # that has an attribute of type: "text/html"
  element :link, value: :href, as: :url, with: [
    type: "text/html"
  ]
  # The :value argument means that instead of setting the value
  # to the text between the tag, it sets it to the attribute value of :href
  element :link, value: :href, as: :feed_url, with: [
    type: "application/atom+xml"
  ]
  elements :entry, as: :entries, module: AtomEntry
end
```

Then parse any XML with your module:

```elixir
feed = Atom.parse(xml_text)

feed.title # Whatever the title of the blog is
feed.url # The main URL of the blog
feed.feed_url # The URL of the blog feed

List.first(feed.entries).title # Title of the first entry
List.first(feed.entries).author # The author of the first entry
List.first(feed.entries).url # Permalink on the blog for this entry
```

Multiple elements can be mapped to the same alias:

```elixir
defmodule RSSEntry do
  use Capuli
  # ...
  element :pubdate, as: :published
  element :"dc:date", as: :published
  element :"dcterms:created", as: :published
end
```

If more than one of these elements exists in the source, the value from the *first one* is used. The order of
the `element` declarations in the code is unimportant. The order they are encountered while parsing the
document determines the value assigned to the alias.

If an element is defined in the source but is blank (e.g., `<pubDate></pubDate>`), it is ignored, and non-empty one is picked.

## Installation

The package can be installed by adding `capuli` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:capuli, "~> 0.2.0"}
  ]
end
```

Documentation can be generated with [ExDoc](https://github.com/elixir-lang/ex_doc)
and published on [HexDocs](https://hexdocs.pm). Once published, the docs can
be found at [https://hexdocs.pm/capuli](https://hexdocs.pm/capuli).

## Tasks

- [x] Basic DSL
- [ ] `:default` option
- [ ] Auto parsing (without specifying a module)
- [ ] **Maybe**: Type coercion
- [ ] **Maybe**: Streaming for big XML files

## License

Capuli is under MIT license. Check the `LICENSE` file for more details.