defmodule Pdf.Reader.Document do
@moduledoc """
Struct representing an open PDF document in the reader.
Holds the full PDF binary, the merged cross-reference table, the most-recent
trailer dictionary, a lazy object-resolution cache, and memoized page refs.
The struct is immutable — every `Pdf.Reader.*` function that resolves objects
returns an updated copy with a warmer cache. Dropping the updated copy is safe
(correctness is preserved) but re-resolving the same object will incur a
re-parse. Thread the returned doc forward for performance.
Callers do not construct `Document` directly; obtain one via `Pdf.Reader.open/1`.
## Cache key conventions
The `:cache` field is a plain `%{}` map. Keys used by reader subsystems:
- `{n, g}` — resolved object with xref ref `(n, g)` (used by `ObjectResolver`)
- `{:font_decoder, {n, g}}` — cached decoder closure for the font at ref `(n, g)`,
built by `Pdf.Reader.Font.build_decoder/2`. Present only for fonts accessed via
an indirect reference. Inline font dicts (embedded literally in a resources dict)
are NOT cached — they are rebuilt on every call.
- `{:page_resources, {n, g}}` — resolved `/Resources` map for the leaf page at
ref `(n, g)`. Written by `Pdf.Reader.resolve_page_resources/4` after the first
`/Parent`-chain walk for a given page. Subsequent calls for the same page ref
short-circuit the walk and return the cached map directly. Intermediate ancestor
nodes are NOT cached — only the leaf page ref is used as the key.
## Recovery mode
When opened with `recover: true`, the struct carries two additional fields:
- `:recover_mode` — `true` when recovery is active; `false` (default) for strict mode.
- `:recovery_log` — a reverse-prepend accumulator of structured recovery event tuples.
Exposed in chronological (oldest-first) order via `Pdf.Reader.recovery_log/1`.
Closed set of recovery event tuples (PDF 1.7 § 7.5, § 7.5.4, § 7.5.5, § 7.5.8):
| Tuple | Meaning |
|---|---|
| `{:xref_recovered, n}` | Linear scan recovered `n` object entries (§ 7.5.4, § 7.5.8) |
| `{:eof_marker_missing, :linear_scan_used}` | `%%EOF` absent; linear scan was invoked (§ 7.5.5) |
| `{:page_failed, page_n_or_ref, reason}` | A page was skipped. `page_n_or_ref` is either a `non_neg_integer()` page index OR a `{n, g}` ref-key tuple (used when iteration happens by `/Kids` ref before pages are indexed); `reason` is an atom or term |
| `{:font_skipped, page_n, font_name, reason}` | Font replaced with U+FFFD fallback |
| `{:page_tree_recovered, n}` | Catalog/Pages fallback found `n` page objects |
## Spec references
- PDF 1.7 § 7.5 — PDF file structure
- PDF 1.7 § 7.5.4 — Cross-reference table
- PDF 1.7 § 7.5.5 — File trailer
- PDF 1.7 § 7.5.8 — Cross-reference streams
"""
@type ref :: {pos_integer(), non_neg_integer()}
@type xref_entry ::
{:in_use, offset :: non_neg_integer(), gen :: non_neg_integer()}
| {:compressed, objstm_obj :: pos_integer(), index :: non_neg_integer()}
| :free
@type encryption_context :: %Pdf.Reader.Encryption.StandardHandler{}
@type recovery_event ::
{:eof_marker_missing, atom()}
| {:xref_recovered, non_neg_integer()}
| {:page_tree_recovered, non_neg_integer()}
| {:page_failed, non_neg_integer() | {pos_integer(), non_neg_integer()}, term()}
| {:font_skipped, non_neg_integer(), binary(), term()}
@type t :: %__MODULE__{
binary: binary(),
version: String.t(),
xref: %{ref() => xref_entry()},
trailer: map(),
cache: %{ref() => term()},
page_refs: [ref()] | nil,
encryption: encryption_context() | nil,
recover_mode: boolean(),
recovery_log: [recovery_event()]
}
defstruct binary: <<>>,
version: "1.0",
xref: %{},
trailer: %{},
cache: %{},
page_refs: nil,
encryption: nil,
recover_mode: false,
recovery_log: []
@doc """
Appends a recovery event to the document's internal log (reverse-prepend).
This is the single chokepoint for all recovery event recording.
Callers retrieve events in chronological order via `Pdf.Reader.recovery_log/1`,
which calls `Enum.reverse/1` on the internal accumulator.
## Spec reference
PDF 1.7 § 7.5 — PDF file structure (recovery model).
"""
@spec log_recovery(t(), recovery_event()) :: t()
def log_recovery(%__MODULE__{recovery_log: log} = doc, event) do
%{doc | recovery_log: [event | log]}
end
end