defmodule CDPEx do
@moduledoc """
OTP-native Chrome DevTools Protocol (CDP) browser automation for Elixir.
`CDPEx` launches a headless Chrome process and drives it directly over the
Chrome DevTools Protocol on a `Mint.WebSocket` connection — no ChromeDriver
and no Node.js. Browsers and their WebSocket connections are supervised
processes, so a Chrome crash surfaces to callers as `{:error, reason}` rather
than a hung session.
This module is the high-level facade. See `CDPEx.Page` for page operations.
## Example
{:ok, browser} = CDPEx.launch()
{:ok, page} = CDPEx.new_page(browser)
{:ok, _page} = CDPEx.Page.navigate(page, "https://example.com")
{:ok, html} = CDPEx.Page.html(page)
:ok = CDPEx.stop(browser)
Or, resource-safe, with `with_page/3`:
CDPEx.with_page([], fn page ->
{:ok, _} = CDPEx.Page.navigate(page, "https://example.com")
CDPEx.Page.html(page)
end)
Observability is via `:telemetry` — see `CDPEx.Telemetry` for the event taxonomy
(launch / navigate spans, page open/close, and error events). Silent by default.
## Error handling
Every operation returns `{:error, reason}` on failure; `t:error_reason/0` documents
the reason shapes. To drive retries without hard-coding that list, classify the
reason instead of matching it:
case CDPEx.Page.navigate(page, url) do
{:ok, page} ->
{:ok, page}
{:error, reason} ->
if CDPEx.transient?(reason), do: retry(), else: {:error, reason}
end
`classify_error/1` buckets a reason as `:transient` (a fresh attempt may succeed),
`:terminal` (it won't), or `:unknown` (payload-dependent — you decide). It tracks
the error surface as the library evolves, so the transient/terminal decision stays
in one place rather than being reimplemented (and re-drifting) downstream. Retries
stay yours to bound: cap attempts, back off, and on `:transient` re-establish the
resource (a fresh page/browser) rather than reusing a dead handle.
> #### Status {: .info}
>
> Pages default to one WebSocket each (strong crash isolation); opt into
> `sessionId` multiplexing (many pages over the one browser socket) with
> `new_page(browser, transport: :session)`, trading isolation for fewer sockets.
> Stealth / anti-fingerprinting presets remain out of scope.
"""
alias CDPEx.Browser
alias CDPEx.Connect
alias CDPEx.Page
alias CDPEx.Telemetry
@typedoc """
The `reason` shapes that appear in `{:error, reason}` across CDPEx.
Error reasons are part of the public contract — pattern-match the **tagged kinds**
(`{:cdp_error, …}`, `{:timeout, …}`, `{:ws_closed, …}`, …); their payloads (a CDP
method, an exit status, a stderr/contents excerpt) are open and may gain detail.
The only bare, context-free reasons are `:noproc`, the high-level `:timeout`,
`:unknown_page`, `:already_authenticated`, and `:already_intercepting` —
self-describing control-flow outcomes with no payload to carry, the way GenServer
uses `:noproc`. Validation failures that *do* have offending data to surface are
tagged instead (`{:invalid_response_body, excerpt}`, `{:invalid_pdf_data, excerpt}`,
`{:invalid_screenshot_data, excerpt}`).
To act on a failure without hard-coding this list, use `classify_error/1` — it
buckets any reason as `:transient` / `:terminal` / `:unknown` and tracks this union,
so retry logic isn't reimplemented (and re-drifted) downstream.
Two sub-unions are machine-checked: `t:CDPEx.Connection.call_error/0` and
`t:CDPEx.Chrome.launch_error/0` are precisely specced on `call/5` / `launch/1`, so
Dialyzer catches a shape change in *those* at the source. The remaining members —
the page-level tagged kinds and bare atoms — are hand-maintained (kinds such as
`{:cdp_error, method, payload}` also wrap arbitrary CDP data), kept honest by a
compile-time coverage test that fails if any member here lacks a `classify_error/1`
test exemplar — so a member can't be added to this type without being classified.
That guard is one-directional (type → classified): the reverse — an error a producer
returns but never adds here — still relies on review, and such a stray reason would
fall through `classify_error/1` to `:unknown`.
Two timeout shapes, by layer: the low-level `CDPEx.Connection.call/5` and
`await_event/4` return `{:timeout, context}` (a CDP method, or `:await_event`),
while the high-level `CDPEx.Page` `wait_for_*` functions and `CDPEx.Pool.checkout/2`
return a bare `:timeout` ("the awaited condition didn't happen in time").
A WebSocket frame that fails to decode is not a standalone reason: the connection
stops on the decode failure, so callers observe it nested, as
`{:ws_closed, {:ws_decode, _}}`.
"""
@type error_reason ::
CDPEx.Connection.call_error()
| CDPEx.Chrome.launch_error()
| {:ws_connect, term()}
| {:ws_upgrade, term()}
| :timeout
| :unknown_page
| :already_authenticated
| :already_intercepting
| {:timeout, :await_event}
| {:conflict, :authenticated | :intercepting}
| {:navigate, String.t()}
| {:no_document_response, String.t()}
| {:connect_discovery_failed, term()}
| {:capture_failed, term()}
| {:idle_wait_failed, term()}
| {:selector_not_found, String.t()}
| {:not_clickable, String.t()}
| {:unknown_key, String.t()}
| {:evaluate_exception, term()}
| {:unserializable_value, String.t()}
| {:unexpected_evaluate, term()}
| {:invalid_args, term()}
| {:invalid_source, term()}
| {:invalid_error_reason, term()}
| {:invalid_transport, term()}
| {:invalid_proxy, term()}
| {:unsupported_transport, term()}
| {:unsupported_with_connect, term()}
| {:invalid_response_body, String.t()}
| {:invalid_pdf_data, String.t()}
| {:invalid_screenshot_data, String.t()}
| {:write_failed, term()}
@typedoc """
The result of `classify_error/1`.
Intentionally open: match `:transient` (or `:terminal`) explicitly and fall through
with a catch-all rather than enumerating all three atoms, so a future bucket can be
added without breaking exhaustive matches.
"""
@type error_classification :: :transient | :terminal | :unknown
@doc """
Classifies an error `reason` as `:transient`, `:terminal`, or `:unknown`.
`reason` is the value from any `{:error, reason}` this library returns (see
`t:error_reason/0`). The classification answers one question — **might a fresh
attempt succeed?** — so you drive retries from one place instead of reimplementing
the decision (and re-drifting it) in every caller:
case CDPEx.Page.navigate(page, url) do
{:ok, page} ->
{:ok, page}
{:error, reason} ->
case CDPEx.classify_error(reason) do
:transient -> retry_with_fresh_page()
_ -> {:error, reason}
end
end
The buckets:
* `:transient` — environmental or timing failures: the connection dropped or
couldn't be established (`{:ws_closed, _}`, `{:ws_connect, _}`, `{:ws_upgrade, _}`,
`:noproc`), a wait or call timed out (`:timeout`, `{:timeout, _}`), Chrome died
or was slow to start (`{:chrome_exited, _, _}`, `{:debug_url_not_found, _}`,
`{:devtools_file_malformed, _}`), an internal capture/idle helper crashed
(`{:capture_failed, _}`, `{:idle_wait_failed, _}`), or a navigation hit a
connection/network-layer `net::ERR_*` (e.g. `{:navigate, "net::ERR_CONNECTION_REFUSED"}`).
* `:terminal` — deterministic outcomes: a selector didn't match, JS threw, a
usage/validation error, or a missing Chrome binary. Retrying the same call
yields the same error. (`:already_authenticated` / `:already_intercepting` are
terminal for the ordinary double-call; the narrow post-timeout teardown race
`authenticate/4` documents — where a retry can still succeed — is signalled by
the preceding `{:timeout, _}`, which is itself `:transient`.)
* `:unknown` — the outcome depends on a payload or timing this function does not
crack: an ambiguous navigation `net::ERR_*` (DNS `ERR_NAME_NOT_RESOLVED`,
`ERR_ABORTED`, `ERR_BLOCKED_BY_*` — unlike the connection-layer codes above), the
CDP error code (`{:cdp_error, _, _}`), the file-write posix reason
(`{:write_failed, _}`), whether a `{:no_document_response, _}` was a
same-document hop or a slow miss, or a `connect/2` endpoint-discovery failure
(`{:connect_discovery_failed, _}` — a transient network blip and a permanently
bad endpoint are indistinguishable here). Also covers any term `CDPEx` doesn't
produce. Decide the retry policy yourself.
Retries are the caller's responsibility: bound the attempts and back off. A
`:transient` result means **re-establish the resource** — open a fresh page/browser
or call `CDPEx.Pool.checkout/2` again — not retry the same handle (a dead page keeps
returning `:noproc`). Some `:transient` reasons are still a judgment call for your
environment — retrying a `:timeout` / `net::ERR_TIMED_OUT` multiplies wall-time and
browser memory, so a resource-constrained caller may reasonably treat timeouts as
terminal. The input is typed `term()` so the catch-all stays reachable;
routing through this instead of matching `t:error_reason/0` directly trades Dialyzer
exhaustiveness for a stable, library-maintained dispatch point.
"""
@spec classify_error(term()) :: error_classification()
# Transient — a fresh attempt may succeed (connection / process / launch / helper).
def classify_error({:ws_closed, _}), do: :transient
def classify_error({:ws_connect, _}), do: :transient
def classify_error({:ws_upgrade, _}), do: :transient
def classify_error(:noproc), do: :transient
def classify_error(:timeout), do: :transient
def classify_error({:timeout, _}), do: :transient
def classify_error({:chrome_exited, _, _}), do: :transient
def classify_error({:debug_url_not_found, _}), do: :transient
def classify_error({:devtools_file_malformed, _}), do: :transient
def classify_error({:capture_failed, _}), do: :transient
def classify_error({:idle_wait_failed, _}), do: :transient
# Terminal — deterministic; retrying the same call yields the same error.
def classify_error({:chrome_not_found, _}), do: :terminal
def classify_error({:selector_not_found, _}), do: :terminal
def classify_error({:not_clickable, _}), do: :terminal
def classify_error({:unknown_key, _}), do: :terminal
def classify_error({:evaluate_exception, _}), do: :terminal
def classify_error({:unserializable_value, _}), do: :terminal
def classify_error({:unexpected_evaluate, _}), do: :terminal
def classify_error({:invalid_args, _}), do: :terminal
def classify_error({:invalid_source, _}), do: :terminal
def classify_error({:invalid_error_reason, _}), do: :terminal
def classify_error({:invalid_transport, _}), do: :terminal
def classify_error({:invalid_proxy, _}), do: :terminal
def classify_error({:unsupported_transport, _}), do: :terminal
def classify_error({:unsupported_with_connect, _}), do: :terminal
def classify_error({:invalid_response_body, _}), do: :terminal
def classify_error({:invalid_pdf_data, _}), do: :terminal
def classify_error({:invalid_screenshot_data, _}), do: :terminal
def classify_error({:conflict, _}), do: :terminal
def classify_error(:unknown_page), do: :terminal
def classify_error(:already_authenticated), do: :terminal
def classify_error(:already_intercepting), do: :terminal
# A navigation failure carries Chrome's net::ERR_* text: connection/network-layer
# codes are transient (a fresh attempt may succeed); everything else stays :unknown
# (see @transient_net_errors). A non-string payload can't be inspected -> :unknown.
def classify_error({:navigate, error_text}) when is_binary(error_text) do
if transient_net_error?(error_text), do: :transient, else: :unknown
end
# Ambiguous — :unknown until the caller (or a future refinement) inspects the payload
# or timing: a non-string navigate reason, an ambiguous net::ERR_* (DNS / blocked /
# aborted), the CDP error code, the file-write posix reason, or whether a no-document
# navigation was a same-document hop vs a slow miss. Explicit (not the catch-all) so
# they read as decisions and the coverage test holds them.
def classify_error({:navigate, _}), do: :unknown
def classify_error({:cdp_error, _, _}), do: :unknown
def classify_error({:write_failed, _}), do: :unknown
def classify_error({:no_document_response, _}), do: :unknown
def classify_error({:connect_discovery_failed, _}), do: :unknown
# Anything else — a reason CDPEx doesn't produce, or a future shape.
def classify_error(_other), do: :unknown
@doc """
Convenience over `classify_error/1`: `true` only when the error is `:transient`.
Conservative by design — `:unknown` is **not** transient, so an unrecognized or
payload-dependent error won't be auto-retried. Match `classify_error/1` directly
when you want to treat `:unknown` specially, and see its note on bounded,
resource-re-establishing retries — this classifies, it does not retry.
"""
@spec transient?(term()) :: boolean()
def transient?(reason), do: classify_error(reason) == :transient
# Connection/network-layer net::ERR_* codes whose retry outcome is unambiguous: a
# fresh attempt may succeed. These are site-independent Chromium semantics, not
# site-specific guesses. Genuinely ambiguous codes (DNS ERR_NAME_NOT_RESOLVED,
# ERR_ABORTED, ERR_BLOCKED_BY_*, ERR_TOO_MANY_REDIRECTS) are deliberately left
# :unknown for the caller to decide.
@transient_net_errors MapSet.new(~w(
ERR_CONNECTION_REFUSED
ERR_CONNECTION_RESET
ERR_CONNECTION_CLOSED
ERR_CONNECTION_ABORTED
ERR_CONNECTION_TIMED_OUT
ERR_TIMED_OUT
ERR_NETWORK_CHANGED
ERR_INTERNET_DISCONNECTED
ERR_ADDRESS_UNREACHABLE
ERR_SOCKET_NOT_CONNECTED
))
# Exact-token match on the code after the "net::" prefix (Chrome's navigate errorText is
# the bare code) — not a substring scan, so a future allowlist entry can't silently flip
# an ambiguous code that merely contains it. A non-"net::" / suffixed text is :unknown.
defp transient_net_error?("net::" <> code), do: MapSet.member?(@transient_net_errors, code)
defp transient_net_error?(_error_text), do: false
@doc """
Launches a headless Chrome browser and returns its process pid.
Accepts the launch options documented in `CDPEx.Chrome` (e.g. `:headless`,
`:chrome_binary`, `:extra_args`, `:window_size`, `:launch_timeout`). On slow
cold-start hosts (e.g. headless Chrome in a constrained container) raise
`:launch_timeout` — it is a ceiling, not a fixed wait. For long-lived use, prefer
putting `CDPEx.Browser` under your own supervisor with a `:shutdown` timeout.
## Proxy
Pass `:proxy` to route the browser through a proxy — a URL or a keyword list:
CDPEx.launch(proxy: "http://user:pass@host:8080")
CDPEx.launch(proxy: [server: "host:8080", username: "u", password: "p"])
It sets Chrome's `--proxy-server` and, when credentials are given, **automatically
answers the proxy auth challenge on each page** — so you just `new_page/2` and
`CDPEx.Page.navigate/3`, no manual `CDPEx.Page.authenticate/4`. See `CDPEx.Proxy` for
the accepted forms (the keyword form avoids percent-encoding special-character
passwords).
A credentialed proxy requires the default `:dedicated` transport: `new_page(transport:
:session)` on such a browser returns `{:error, {:unsupported_transport, :session}}`,
and an auto-armed page can't also use `enable_request_interception/2` (both drive the
`Fetch` domain). A malformed `:proxy` — or combining it with a full `:args` override —
fails the launch with `{:error, {:invalid_proxy, _}}` (`:proxy` appends to `:extra_args`,
which an `:args` override discards, so the two are mutually exclusive; use one). Don't
set `--proxy-server` in `:extra_args` yourself when using `:proxy`.
"""
@spec launch(keyword()) :: GenServer.on_start()
def launch(opts \\ []) do
Telemetry.span(:launch, %{}, fn ->
result = Browser.start_link(opts)
{result, launch_metadata(result)}
end)
end
# Launch span :stop metadata: empty on success, {error: reason} on failure — so a
# consumer can tell a failed launch from a successful one (mirrors navigate's span).
defp launch_metadata({:ok, _pid}), do: %{}
defp launch_metadata({:error, reason}), do: %{error: reason}
defp launch_metadata(:ignore), do: %{error: :ignore}
@doc """
Connects to an already-running Chrome instead of launching one.
`endpoint` is either a `ws://`/`wss://` browser WebSocket URL (used directly) or
an `http://`/`https://` base URL (the browser socket is discovered via
`GET /json/version`, with the host/port taken from `endpoint`). Returns the same
handle as `launch/1`, so `new_page/2`, `with_page/3`, `close_page/2`, and
`stop/1` all work — but `stop/1` only closes the pages cdp_ex opened and drops
the socket; it never kills the remote Chrome or touches pre-existing tabs.
> #### http(s) discovery is IP/localhost only {: .warning}
>
> Chrome's DevTools HTTP endpoint rejects `/json/version` with **403** when the
> `Host` header is a non-IP, non-`localhost` name (DNS-rebinding protection), so
> the `http(s)://` discovery form works only for IP / `localhost` endpoints (a 403
> surfaces as `{:error, {:connect_discovery_failed, {:http_status, 403}}}`). For a
> **named** remote/sidecar/cloud host, pass the `ws(s)://` browser URL directly
> (or launch that Chrome with a permissive `--remote-allow-origins`/host).
Pages default to `:session` transport; `transport: :dedicated` returns
`{:error, {:unsupported_transport, :dedicated}}` (not yet supported over a
connected browser).
Options: `:insecure` (skip `wss://` cert verification, default `false`),
`:cacertfile` / `:cacerts` (custom CA for `wss://`), `:discovery_timeout` (ceiling
in ms for the whole `http(s)://` `/json/version` exchange — TCP connect, recv, and
body — default `5_000`; raise it for a slow remote endpoint), `:name` (register the
process).
"""
@spec connect(String.t(), keyword()) :: GenServer.on_start()
def connect(endpoint, opts \\ []) when is_binary(endpoint) do
Telemetry.span(:connect, %{}, fn ->
{tls_opts, opts} = Keyword.split(opts, [:insecure, :cacertfile, :cacerts])
{disc_timeout, opts} = Keyword.pop(opts, :discovery_timeout)
disc_opts = if disc_timeout, do: [timeout: disc_timeout], else: []
result =
case Connect.resolve(endpoint, tls_opts, disc_opts) do
{:ok, ws_url} -> Browser.start_link([connect: ws_url, conn_opts: tls_opts] ++ opts)
{:error, _} = error -> error
end
{result, launch_metadata(result)}
end)
end
@doc "Stops a browser started with `launch/1` (kills Chrome) or `connect/2` (closes the pages it opened, leaves Chrome running)."
@spec stop(pid()) :: :ok
def stop(browser), do: Browser.stop(browser)
@doc "Opens a new page. See `CDPEx.Browser.new_page/2` for options."
@spec new_page(pid(), keyword()) :: {:ok, Page.t()} | {:error, term()}
def new_page(browser, opts \\ []), do: Browser.new_page(browser, opts)
@doc """
Closes a page opened with `new_page/2`.
Returns `{:error, :unknown_page}` if `page` was not opened on `browser`.
"""
@spec close_page(pid(), Page.t()) :: :ok | {:error, :unknown_page}
def close_page(browser, page), do: Browser.close_page(browser, page)
@doc """
Runs `fun` with a fresh page, guaranteeing the page (and, when given launch
options, the browser) is cleaned up afterwards — even if `fun` raises.
Pass an existing browser pid to reuse it, or a keyword list of launch options
to spin up a throwaway browser for the duration of the call. A `[connect:
endpoint]` list attaches to an already-running Chrome instead of launching one
(see `connect/2`); teardown then closes only the pages it opened and never reaps
that Chrome. Returns whatever `fun` returns, or `{:error, reason}` if the
page/browser could not be created.
With launch options, the throwaway browser is linked but **contained**: if it
crashes during the call (e.g. its connection drops) `with_page` returns
`{:error, reason}` instead of letting the crash propagate to the caller. To do
that it briefly traps exits in the calling process for the duration of the call.
Only the browser's own `{:EXIT, _, _}` is drained — a *foreign* process linked
to the caller that exits during this window has its exit delivered as a message
left in the caller's mailbox, so a caller that links other processes and relies
on un-trapped exit propagation should pass a pre-launched browser pid instead.
On slow cold-start hosts, raise `:launch_timeout` (a ceiling, not a fixed wait).
# against an existing browser
CDPEx.with_page(browser, fn page ->
{:ok, _} = CDPEx.Page.navigate(page, "https://example.com")
CDPEx.Page.html(page)
end)
# throwaway browser + page
CDPEx.with_page([headless: true], &CDPEx.Page.html/1)
# against an existing Chrome (discovered via /json/version)
CDPEx.with_page([connect: "http://localhost:9222"], &CDPEx.Page.html/1)
"""
@spec with_page(pid() | keyword(), (Page.t() -> result), keyword()) ::
result | {:error, term()}
when result: var
def with_page(browser_or_opts, fun, opts \\ [])
def with_page(browser, fun, opts) when is_pid(browser) and is_function(fun, 1) do
case new_page(browser, opts) do
{:ok, page} ->
try do
fun.(page)
after
# Best-effort: the page/browser may have already exited, and a teardown
# exit must not mask `fun`'s result or a raised exception.
safe(fn -> close_page(browser, page) end)
end
{:error, _} = error ->
error
end
end
def with_page(launch_opts, fun, opts) when is_list(launch_opts) and is_function(fun, 1) do
# `[connect: endpoint]` attaches to a running Chrome instead of launching one;
# the resource-safe trap/stop/drain machinery below is identical (stop/1 on a
# connected browser closes only the pages we opened — it never kills Chrome).
start =
case Keyword.pop(launch_opts, :connect) do
{nil, _} -> fn -> launch(launch_opts) end
{endpoint, rest} -> fn -> connect(endpoint, rest) end
end
case start.() do
{:ok, browser} ->
# launch/1 links `browser` to us. Trap exits for the duration of this call
# so a browser crash (e.g. its connection dropping mid-call) arrives as a
# message — `fun`'s own {:error, _} returns first and is never masked —
# instead of a link exit that would kill the caller, breaking with_page's
# resource-safe contract. We keep the link (not just a monitor) so that if
# the *caller* dies, the browser is still reaped via Browser.terminate/2.
prev_trap = Process.flag(:trap_exit, true)
try do
try do
with_page(browser, fun, opts)
after
safe(fn -> stop(browser) end)
end
after
# Drain the browser's queued {:EXIT, browser, _} (from its crash, or from
# our own stop/1) so it can't surface to the caller once we restore the
# prior trap_exit flag. A foreign linked process's EXIT is left as-is.
drain_exit(browser)
Process.flag(:trap_exit, prev_trap)
end
{:error, _} = error ->
error
end
end
# Run a teardown action, swallowing an already-dead-process exit (or any raise)
# so cleanup in `with_page/3` never overrides the operation's real outcome.
defp safe(fun) do
fun.()
rescue
_ -> :ok
catch
:exit, _ -> :ok
end
# Remove a single queued {:EXIT, browser, _} (delivered as a message because the
# throwaway-browser `with_page/3` clause traps exits) so it can't leak to the
# caller after the prior trap_exit flag is restored. At most one can exist — a
# process exits once.
defp drain_exit(browser) do
receive do
{:EXIT, ^browser, _reason} -> :ok
after
0 -> :ok
end
end
end