# Spidra Elixir SDK
The official Elixir SDK for [Spidra](https://spidra.io) that allows you to scrape pages, run browser actions, batch-process URLs, and crawl entire sites. All results come back as structured data ready to feed into your LLM pipelines or store directly.
## Installation
Add `spidra` to your list of dependencies in `mix.exs`:
```elixir
def deps do
  [
    {:spidra, "~> 0.1.0"}
  ]
end
```
Then run `mix deps.get` in your terminal.
Get your API key at [app.spidra.io](https://app.spidra.io) under **Settings** > **API Keys**.
## Quick start
```elixir
# Initialize your configuration
config = Spidra.Config.new(api_key: "spd_YOUR_API_KEY")

# Run a scrape job
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://news.ycombinator.com"}],
  prompt: "List the top 5 stories with title, points, and comment count",
  output: "json"
})

IO.inspect(job["result"]["content"])
```
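In real projects, read the key from the environment instead of hardcoding it:

```elixir
# Load the API key from an environment variable (raises if unset) so the key
# never lands in source control.
config = Spidra.Config.new(api_key: System.fetch_env!("SPIDRA_API_KEY"))
```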
## Table of contents
- [Spidra Elixir SDK](#spidra-elixir-sdk)
  - [Installation](#installation)
  - [Quick start](#quick-start)
  - [Table of contents](#table-of-contents)
  - [Scraping](#scraping)
    - [Basic scrape](#basic-scrape)
    - [Structured output with JSON schema](#structured-output-with-json-schema)
    - [Geo-targeted scraping](#geo-targeted-scraping)
    - [Authenticated pages](#authenticated-pages)
    - [Browser actions](#browser-actions)
    - [Manual job control](#manual-job-control)
  - [Batch scraping](#batch-scraping)
  - [Crawling](#crawling)
  - [Logs](#logs)
  - [Usage statistics](#usage-statistics)
  - [Requirements](#requirements)
  - [License](#license)
## Scraping
All scrape jobs run asynchronously on the Spidra platform. The `Spidra.Scrape.run/2` function submits a job and polls until it finishes. If you need more control, use `submit/2` and `get/2` directly.
Up to 3 URLs can be passed per request, and they are processed in parallel.
### Basic scrape
```elixir
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://example.com/pricing"}],
  prompt: "Extract all pricing plans with name, price, and included features",
  output: "json"
})

IO.inspect(job["result"]["content"])
# "{ \"plans\": [{ \"name\": \"Starter\", \"price\": \"$9/mo\", \"features\": [...] }, ...] }"
```
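Note that with `output: "json"` the content arrives as a JSON-encoded string (note the escaped quotes above), so decode it before use. A sketch, assuming the Jason library is among your dependencies:

```elixir
# Decode the JSON string into a map, then work with it as plain Elixir data.
{:ok, %{"plans" => plans}} = Jason.decode(job["result"]["content"])

for plan <- plans do
  IO.puts("#{plan["name"]}: #{plan["price"]}")
end
```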
### Structured output with JSON schema
When you need a guaranteed shape, pass a `schema`. The API will enforce the structure and return `null` for any missing fields rather than hallucinating values.
```elixir
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://jobs.example.com/senior-engineer"}],
  prompt: "Extract the job listing details",
  output: "json",
  schema: %{
    "type" => "object",
    "required" => ["title", "company", "remote"],
    "properties" => %{
      "title" => %{"type" => "string"},
      "company" => %{"type" => "string"},
      "remote" => %{"type" => ["boolean", "null"]},
      "salary_min" => %{"type" => ["number", "null"]},
      "salary_max" => %{"type" => ["number", "null"]},
      "skills" => %{"type" => "array", "items" => %{"type" => "string"}}
    }
  }
})
```
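A missing nullable field comes back as JSON `null`, which decodes to `nil` in Elixir, so match on it explicitly. A sketch, again assuming Jason for decoding:

```elixir
# JSON null becomes nil after decoding, so nullable schema fields need an
# explicit nil branch.
{:ok, listing} = Jason.decode(job["result"]["content"])

case listing do
  %{"salary_min" => nil} ->
    IO.puts("#{listing["title"]}: no salary listed")

  %{"salary_min" => min, "salary_max" => max} ->
    IO.puts("#{listing["title"]}: #{min}-#{max}")
end
```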
### Geo-targeted scraping
Pass `use_proxy: true` and a `proxy_country` code to route the request through a specific country. Useful for geo-restricted content or localized pricing.
```elixir
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://www.amazon.de/gp/bestsellers"}],
  prompt: "List the top 10 products with name and price",
  use_proxy: true,
  proxy_country: "de"
})
```
Supported country codes include: `us`, `gb`, `de`, `fr`, `jp`, `au`, `ca`, `br`, `in`, `nl`, `sg`, `es`, `it`, `mx`, and 40+ more. Use `"global"` or `"eu"` for regional routing.
### Authenticated pages
Pass cookies as a string to scrape pages that require a login session.
```elixir
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://app.example.com/dashboard"}],
  prompt: "Extract the monthly revenue and active user count",
  cookies: "session=abc123; auth_token=xyz789"
})
```
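If you hold cookies as key/value pairs, joining them into that header-style string is a one-liner:

```elixir
# Build the "name=value; name=value" cookie string from a keyword list.
# The values here are placeholders, not real credentials.
cookie_string =
  Enum.map_join([session: "abc123", auth_token: "xyz789"], "; ", fn {k, v} ->
    "#{k}=#{v}"
  end)
```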
### Browser actions
Actions let you interact with the page before the scrape runs. They execute in order, and the scrape happens after all actions complete.
```elixir
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [
    %{
      url: "https://example.com/products",
      actions: [
        %{type: "click", selector: "#accept-cookies"},
        %{type: "wait", duration: 1000},
        %{type: "scroll", to: "80%"}
      ]
    }
  ],
  prompt: "Extract all product names and prices"
})
```
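Since actions are attached to each URL entry, one request (up to 3 URLs) can run a different interaction per page. Both URLs below are illustrative:

```elixir
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [
    # Dismiss the cookie banner on the first page only
    %{url: "https://example.com/products", actions: [%{type: "click", selector: "#accept-cookies"}]},
    # Scroll the second page to trigger lazy-loaded content
    %{url: "https://example.com/sale", actions: [%{type: "scroll", to: "100%"}]}
  ],
  prompt: "Extract all product names and prices"
})
```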
### Manual job control
Use `submit/2` and `get/2` when you want to manage polling yourself, or fire-and-forget and check back later.
```elixir
# Submit a job and get the job_id immediately
{:ok, %{"jobId" => job_id}} = Spidra.Scrape.submit(config, %{
  urls: [%{url: "https://example.com"}],
  prompt: "Extract the main headline"
})

# Check status at any point
{:ok, status} = Spidra.Scrape.get(config, job_id)

case status["status"] do
  "completed" -> IO.inspect(status["result"]["content"])
  "failed" -> IO.inspect(status["error"])
  _ -> IO.puts("Job is still pending...")
end
```
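A minimal recursive poller built on `submit/2` and `get/2`, assuming the status strings shown above:

```elixir
defmodule PollJob do
  # Re-check the job every `interval_ms` until it completes or fails.
  def await(config, job_id, interval_ms \\ 2_000) do
    {:ok, status} = Spidra.Scrape.get(config, job_id)

    case status["status"] do
      "completed" -> {:ok, status["result"]["content"]}
      "failed" -> {:error, status["error"]}
      _ ->
        Process.sleep(interval_ms)
        await(config, job_id, interval_ms)
    end
  end
end
```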
## Batch scraping
Submit up to 50 URLs in a single request. All URLs are processed in parallel. Each URL is a plain string.
```elixir
{:ok, batch} = Spidra.Batch.run(config, %{
  urls: [
    "https://shop.example.com/product/1",
    "https://shop.example.com/product/2",
    "https://shop.example.com/product/3"
  ],
  prompt: "Extract product name, price, and availability",
  output: "json",
  use_proxy: true
})

for item <- batch["items"] do
  case item["status"] do
    "completed" -> IO.inspect({item["url"], item["result"]})
    "failed" -> IO.inspect({item["url"], item["error"]})
    _ -> :ok
  end
end
```
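For more than 50 URLs, split the list into chunks and submit one batch per chunk. A sketch, where `all_urls` is a hypothetical list of URL strings:

```elixir
# Submit one batch per chunk of 50 (the per-request maximum) and collect the ids.
batch_ids =
  all_urls
  |> Enum.chunk_every(50)
  |> Enum.map(fn chunk ->
    {:ok, %{"batchId" => batch_id}} =
      Spidra.Batch.submit(config, %{urls: chunk, prompt: "Extract the page title"})

    batch_id
  end)
```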
**Retry failed items:**
```elixir
{:ok, %{"batchId" => batch_id}} = Spidra.Batch.submit(config, %{
urls: ["https://example.com/1", "https://example.com/2"],
prompt: "Extract the page title"
})
# Later, after checking status
{:ok, result} = Spidra.Batch.get(config, batch_id)
if result["failedCount"] > 0 do
{:ok, retried} = Spidra.Batch.retry(config, batch_id)
IO.puts("Retried #{retried["retriedCount"]} items")
end
```
**Cancel a running batch:**
```elixir
{:ok, response} = Spidra.Batch.cancel(config, batch_id)
IO.puts("Cancelled #{response["cancelledItems"]} items, refunded #{response["creditsRefunded"]} credits")
```
**List past batches:**
```elixir
{:ok, response} = Spidra.Batch.list(config, page: 1, limit: 20)
for job <- response["jobs"] do
IO.puts("#{job["uuid"]} #{job["status"]} #{job["completedCount"]}/#{job["totalUrls"]}")
end
```
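To walk the full history, request successive pages. A sketch; the exact pagination metadata isn't shown here, so an empty `"jobs"` list is used as the stop condition:

```elixir
defmodule SpidraHelpers do
  # Collect every batch job by paging until an empty page comes back.
  def all_batches(config, page \\ 1) do
    {:ok, %{"jobs" => jobs}} = Spidra.Batch.list(config, page: page, limit: 50)

    case jobs do
      [] -> []
      jobs -> jobs ++ all_batches(config, page + 1)
    end
  end
end
```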
## Crawling
Given a starting URL, Spidra discovers pages automatically according to your instruction and extracts structured data from each one.
```elixir
{:ok, job} = Spidra.Crawl.run(config, %{
  base_url: "https://competitor.com/blog",
  crawl_instruction: "Find all blog posts published in 2024",
  transform_instruction: "Extract the title, author, publish date, and a one-sentence summary",
  max_pages: 30,
  use_proxy: true
})

for page <- job["result"] do
  IO.inspect({page["url"], page["data"]})
end
```
**Get signed download URLs for all crawled pages:**
Each page includes `html_url` and `markdown_url`, which point to pre-signed S3 URLs that expire after 1 hour.
```elixir
{:ok, response} = Spidra.Crawl.pages(config, job_id)
for page <- response["pages"] do
IO.puts("#{page["url"]} - #{page["status"]}")
# Download raw HTML: page["html_url"]
# Download markdown: page["markdown_url"]
end
```
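To keep copies before the signed URLs expire, download them with any HTTP client. A sketch using the `Req` library (an assumed extra dependency, not part of this SDK) and assuming pages report a `"completed"` status like scrape jobs:

```elixir
File.mkdir_p!("crawl_output")

{:ok, response} = Spidra.Crawl.pages(config, job_id)

response["pages"]
|> Enum.filter(&(&1["status"] == "completed"))
|> Enum.with_index()
|> Enum.each(fn {page, i} ->
  # Fetch while the 1-hour signed URL is still valid.
  body = Req.get!(page["markdown_url"]).body
  File.write!("crawl_output/page_#{i}.md", body)
end)
```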
**Re-extract with a new instruction:**
Runs a new AI transformation over an existing completed crawl without re-crawling any pages. Charges credits for the transformation only.
```elixir
{:ok, queued} = Spidra.Crawl.extract(config, source_job_id, "Extract only the product SKUs and prices as a CSV")
# Poll the new job manually
{:ok, result} = Spidra.Crawl.get(config, queued["jobId"])
```
**Crawl history and stats:**
```elixir
{:ok, response} = Spidra.Crawl.history(config, page: 1, limit: 10)
{:ok, stats} = Spidra.Crawl.stats(config)
IO.puts("Total crawls: #{stats["total"]}")
```
## Logs
Scrape logs are stored for every job that runs through the API.
```elixir
# List logs with optional filters
{:ok, response} = Spidra.Logs.list(config, %{
status: "failed",
search_term: "amazon.com",
channel: "api",
date_start: "2024-01-01",
date_end: "2024-12-31",
page: 1,
limit: 20
})
for log <- response["logs"] do
IO.puts("#{hd(log["urls"])["url"]} #{log["status"]} #{log["credits_used"]}")
end
```
**Get a single log with full extraction result:**
```elixir
{:ok, log} = Spidra.Logs.get(config, "log-uuid")
IO.inspect(log["result_data"]) # the full AI output for that job
```
## Usage statistics
Returns credit and request usage broken down by day or week.
```elixir
# Range options: "7d" | "30d" | "weekly"
{:ok, rows} = Spidra.Usage.get(config, "30d")
for row <- rows do
  IO.puts("#{row["date"]} Requests: #{row["requests"]} Credits: #{row["credits"]}")
end
```
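To get a single total for the period, sum the rows client-side:

```elixir
# Sum the per-row credit counts into one period total.
total_credits = rows |> Enum.map(& &1["credits"]) |> Enum.sum()
IO.puts("Total credits: #{total_credits}")
```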
## Requirements
- Elixir 1.14 or later
- A Spidra API key ([sign up free](https://spidra.io))
## License
MIT