lib/pdf/reader/shape.ex

defmodule Pdf.Reader.Shape do
  @moduledoc """
  Polymorphic struct describing an "interactive" or actionable element
  extracted from a PDF — currently link-like elements (URIs, emails,
  intra-document jumps).

  A shape may come from one of three sources:

  - `:annotation` — a real PDF annotation of subtype `/Link` that the
    document author placed on the page (PDF 1.7 § 12.5.6.5).
  - `:inferred` — a URL or email address that appears as plain text in
    the page content but is not wrapped in a clickable annotation. This
    is common in government forms (e.g. the SAT CSF prints
    `http://sat.gob.mx` as text without making it a link). We pattern-
    match URI and email tokens to surface these to callers.
  - `:embedded` — a non-text element drawn into the page content
    (currently raster images via `Do` operators on `/Subtype /Image`
    XObjects, PDF 1.7 § 8.9). The reader surfaces these so callers
    can know an image exists at a position even if they can't decode
    its contents (e.g. a QR code rendered as PNG).

  ## Fields

  - `:type` — one of `:uri | :email | :goto | :launch | :named | :image`
  - `:page` — 1-indexed page number where the shape lives
  - `:rect` — `{x1, y1, x2, y2}` user-space bounding box, or `nil` when
    the source is `:inferred` and the bounding box could not be derived
    from token positions
  - `:target` — for `:uri`/`:email`: the URI/address as a string. For
    `:goto`: a map `%{page: n}`. For `:image`: the indirect ref
    `{n, g}` of the underlying XObject. For `:launch`/`:named`: see
    PDF 1.7 § 12.6.4 — currently surfaced as a raw string when known.
  - `:text` — visible text of the shape (annotation `:contents`, or the
    matched token text for inferred shapes). `nil` for images.
  - `:source` — `:annotation`, `:inferred`, or `:embedded`
  - `:meta` — type-specific extras as a map. For `:image`:
    `%{format: :png_like | :jpeg, width: w, height: h, byte_size: n}`.
    Empty for link-like shapes today; future kinds (`:button`,
    `:form_field`) will populate it.

  ## Spec references

  - PDF 1.7 § 8.9         — Images (XObject /Subtype /Image):
    https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf
  - PDF 1.7 § 12.5.6.5    — Link Annotations
  - PDF 1.7 § 12.6.4      — Action types (URI, GoTo, Launch, Named, …)
  - RFC 3986 § 3          — URI Generic Syntax: https://datatracker.ietf.org/doc/html/rfc3986
  - RFC 5321 § 4.1.2      — SMTP Mailbox/Domain syntax (for `mailto:`):
    https://datatracker.ietf.org/doc/html/rfc5321
  """

  @type type :: :uri | :email | :goto | :launch | :named | :image
  @type source :: :annotation | :inferred | :embedded
  @type rect :: {number(), number(), number(), number()}
  @type target :: String.t() | %{page: pos_integer()} | {pos_integer(), non_neg_integer()} | nil

  @type t :: %__MODULE__{
          type: type(),
          page: pos_integer(),
          rect: rect() | nil,
          target: target(),
          text: String.t() | nil,
          source: source(),
          meta: map()
        }

  defstruct type: nil,
            page: 1,
            rect: nil,
            target: nil,
            text: nil,
            source: :inferred,
            meta: %{}
end