# Funkspector


Web page inspector for Elixir.

Funkspector is a web scraper that lets you extract data from web pages and XML sitemaps.

## Usage

### Page Scraping

Simply pass Funkspector the URL of a web page to inspect and it will return its scraped data:

    iex> {:ok, data} = Funkspector.page_scrape("")

### Sitemap Scraping

Funkspector can extract the locations from XML sitemaps, like this:

    iex> {:ok, data} = Funkspector.sitemap_scrape("")

### Custom options

Both `Funkspector.page_scrape` and `Funkspector.sitemap_scrape` accept options to customize the timeout and User Agent string.

For example, you could use:

    Funkspector.page_scrape("", %{recv_timeout: 5_000, user_agent: "My Bot"})
    Funkspector.sitemap_scrape("", %{recv_timeout: 5_000, user_agent: "My Bot"})
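If you only want to override one of these options, a small helper can fill in the rest. The sketch below is not part of Funkspector: the `OptionsSketch` module and its default values are hypothetical, and it only illustrates merging caller options over a set of defaults before passing the map along.

```elixir
defmodule OptionsSketch do
  # Hypothetical defaults for illustration; Funkspector's real defaults may differ.
  @defaults %{recv_timeout: 25_000, user_agent: "Funkspector"}

  # Caller-supplied keys win over the defaults; missing keys fall back to them.
  def with_defaults(options \\ %{}), do: Map.merge(@defaults, options)
end

# Only the timeout is overridden, the user agent comes from the defaults.
options = OptionsSketch.with_defaults(%{recv_timeout: 5_000})
IO.inspect(options)
```

The resulting map could then be passed as the second argument to `Funkspector.page_scrape` or `Funkspector.sitemap_scrape`.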

### Scraped data

Funkspector currently returns the following data for both pages and sitemaps:

* `headers`. Response headers, including content-type etc.
* `body`. Raw body.
* `original_url` and `final_url`. Funkspector follows redirects; these are the URL originally given and the final URL reached after following them.
* `scheme`. For example, "http" or "https".
* `host`. For example, "".
* `root_url`. Root URL for the given URL. For `` it will be ``.
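To make the URL-related fields more concrete, here is a minimal sketch of how `scheme`, `host` and a root URL can be derived with Elixir's standard `URI` module. This is not Funkspector's implementation: the `UrlPartsSketch` module and the example URL are hypothetical, and the trailing slash on the root URL is an assumption.

```elixir
defmodule UrlPartsSketch do
  # Hypothetical helper, not part of Funkspector.
  def parts(url) do
    uri = URI.parse(url)

    %{
      scheme: uri.scheme,
      host: uri.host,
      # Assumption: the root URL is scheme + host with a trailing slash.
      # Any port, path or query string is dropped.
      root_url: "#{uri.scheme}://#{uri.host}/"
    }
  end
end

IO.inspect(UrlPartsSketch.parts("https://example.com/some/path?page=2"))
```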

The PageScraper also returns:

* `links`. Organized in `raw`, `http.internal`, `http.external` and `non_http`.
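As an illustration of what these groups mean, the sketch below sorts a list of absolute URLs into internal, external and non-HTTP links by comparing each link's host with the page's host. It is an assumption about the grouping rule, not Funkspector's own code, and the `LinkGroupsSketch` module and the sample links are hypothetical.

```elixir
defmodule LinkGroupsSketch do
  # Hypothetical helper, not part of Funkspector.
  # Assumes every link is already an absolute URL.
  def group(links, page_host) do
    Enum.group_by(links, fn link ->
      case URI.parse(link) do
        # Same host as the page, over HTTP(S): internal link.
        %URI{scheme: scheme, host: ^page_host} when scheme in ["http", "https"] ->
          :internal

        # Any other host over HTTP(S): external link.
        %URI{scheme: scheme} when scheme in ["http", "https"] ->
          :external

        # Everything else (mailto:, tel:, ftp:, ...) is non-HTTP.
        _ ->
          :non_http
      end
    end)
  end
end

links = [
  "https://example.com/about",
  "http://other.example/",
  "mailto:hi@example.com"
]

IO.inspect(LinkGroupsSketch.group(links, "example.com"))
```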

The SitemapScraper also returns:

* `locs`. Collection of URLs.
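For reference, the locations in a sitemap are the contents of its `<loc>` elements. The sketch below pulls them out of a small, made-up sitemap with a regular expression. It only illustrates the idea under that simplification; it is not how Funkspector parses sitemaps, and a real XML parser is the safer choice for production use.

```elixir
defmodule LocsSketch do
  # Hypothetical helper, not part of Funkspector.
  # Captures the text between <loc> and </loc>, trimming surrounding whitespace.
  def extract(xml) do
    ~r{<loc>\s*(.*?)\s*</loc>}s
    |> Regex.scan(xml, capture: :all_but_first)
    |> List.flatten()
  end
end

xml = """
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
"""

IO.inspect(LocsSketch.extract(xml))
```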

## Error response

In case of error, Funkspector will return the `original_url` and the reason from the server:

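    case Funkspector.page_scrape("") do
      {:ok, data} ->
        # Use the scraped data here
        IO.inspect(data)
      {:error, url, reason} ->
        # inspect/1 keeps this safe when the reason is not a plain string
        IO.puts "Could not scrape #{url} because of #{inspect(reason)}"
    end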

## Installation

If available in Hex, the package can be installed as:

  1. Add funkspector to your list of dependencies in `mix.exs`:

        def deps do
          [{:funkspector, "~> 0.1"}]
        end

  2. Ensure funkspector is started before your application:

        def application do
          [applications: [:funkspector]]
        end