# Scrape
An Elixir package to scrape websites. This is an attempt to rewrite
[meteor-scrape](https://github.com/Anonyfox/meteor-scrape) from scratch,
leveraging the expressiveness and power of Elixir. Current features:
- handles non-UTF-8 sources
- parses common websites
- parses RSS/Atom feeds
## Installation
Add `scrape` to the dependencies in your `mix.exs`:
````Elixir
{:scrape, "~> 0.1"}
````
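For context, here is a minimal sketch of where that tuple usually lives. The module name `MyApp.Mixfile`, the app name `:my_app`, and the version string are placeholders for illustration, not part of this package:

````Elixir
# mix.exs (hypothetical project; only the :scrape tuple comes from this README)
defmodule MyApp.Mixfile do
  use Mix.Project

  def project do
    [app: :my_app, version: "0.0.1", deps: deps()]
  end

  # Dependencies are returned as a list of {name, requirement} tuples.
  defp deps do
    [{:scrape, "~> 0.1"}]
  end
end
````

Running `mix deps.get` afterwards fetches the package.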
## Usage
````Elixir
# Feed scraping:
Scrape.feed "http://feeds.venturebeat.com/VentureBeat"
# result (list of items):
[
  %{
    categories: ["mobile advertising", "Mobile", "ad blockers", "Marketing"],
    description: "<p>Advertisers are very [...shortened...]",
    image: "http://i2.wp.com/venturebeat.com/wp-content/uploads/2015/11/FullSizeRender1.jpg?resize=160%2C140",
    pubdate: %Timex.DateTime{
      calendar: :gregorian,
      day: 5,
      hour: 22,
      minute: 40,
      month: 11,
      ms: 0,
      second: 33,
      timezone: %Timex.TimezoneInfo{
        abbreviation: "UTC",
        from: :min,
        full_name: "UTC",
        offset_std: 0,
        offset_utc: 0,
        until: :max},
      year: 2015},
    title: "Advertising industry challenged to [...shortened...]",
    url: "http://venturebeat.com/2015/11/05/advertising-industry-challenged-to-create-ads-that-people-dont-want-to-block/"},
  %{...},
  ...
]
````
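Since the feed result is a plain list of maps, the usual `Enum` functions apply. The sketch below filters items by category; the `FeedItems` module and the sample items are made up for illustration and stand in for a real `Scrape.feed` result, so it runs without network access:

````Elixir
defmodule FeedItems do
  # Keep only the items whose `categories` list contains `category`
  # (exact, case-sensitive match).
  def with_category(items, category) do
    Enum.filter(items, fn item -> category in item.categories end)
  end
end

# Stand-in data shaped like the result above (assumption: real items
# carry at least the :title, :url and :categories keys shown there).
items = [
  %{title: "First post", url: "http://example.com/1", categories: ["Mobile", "Marketing"]},
  %{title: "Second post", url: "http://example.com/2", categories: ["Security"]}
]

items
|> FeedItems.with_category("Mobile")
|> Enum.each(fn item -> IO.puts("#{item.title} <#{item.url}>") end)
# prints: First post <http://example.com/1>
````

In real use, `items` would be the return value of `Scrape.feed` rather than a hand-written list.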
````Elixir
# Scrape a website:
Scrape.website "http://montrealgazette.com/"
# Result (basic metadata):
%Scrape.Website{
  description: "The latest news and headlines from Montreal and Quebec. Get breaking news, stories and in-depth analysis on business, sports, arts, lifestyle and weather.",
  favicon: "http://0.gravatar.com/blavatar/ab6c5a9287c37a4f2ebe4dac7a314814?s=114",
  feeds: ["http://montrealgazette.com/feed"],
  image: "http://0.gravatar.com/blavatar/ab6c5a9287c37a4f2ebe4dac7a314814?s=200&ts=1446766105",
  title: "Montreal Gazette",
  url: "http://montrealgazette.com/"
}
````
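The `feeds` field makes it possible to chain both calls: scrape a website first, then scrape the first feed it advertises. The following is a rough sketch, not part of the package; `FeedDiscovery` is a hypothetical helper, and it assumes that `Scrape.feed` returns a list of maps with a `title` key as shown earlier. The fetch function is passed in as an argument so the helper can be tried with a stub:

````Elixir
defmodule FeedDiscovery do
  # `site` is anything with a `feeds` list, e.g. a %Scrape.Website{} result.
  # `fetch_feed` defaults to Scrape.feed/1 (requires the :scrape dependency);
  # pass a stub function to experiment without network access.
  def first_feed_titles(site, fetch_feed \\ &Scrape.feed/1) do
    case site.feeds do
      [feed_url | _] -> feed_url |> fetch_feed.() |> Enum.map(& &1.title)
      [] -> []
    end
  end
end
````

With the package installed, `"http://montrealgazette.com/" |> Scrape.website() |> FeedDiscovery.first_feed_titles()` would be the intended call.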
## License
LGPLv3. Use this library however you want, but I want improvements & bugfixes
to flow back into this package.