# Changelog

All notable changes to this project are documented here. The format is based on
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and the project aims
to follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.0] - 2026-06-15

First release: a pure-Elixir PDF parsing and lossless surgery engine.

### Parsing & extraction

- Lazy dual-AST parser: classic xref tables, PDF 1.5+ xref streams, object
  streams, Flate + PNG-predictor decoding.
- `PdfEx.open/1`, `page_count/1`, `pages/1`, `extract_text/1,2`.
- Text extraction with positions, fonts, real `/Widths` metrics, and
  ToUnicode/encoding decoding.
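
For readers unfamiliar with the PNG-predictor step named above, here is a minimal sketch of undoing PNG row filters on data that has already been inflated. It is not PdfEx's implementation: the module name `PngPredictorSketch` and the function `decode/2` are hypothetical, and the sketch assumes one byte per pixel (the usual case for xref streams, where `columns` is the row width in bytes).

```elixir
defmodule PngPredictorSketch do
  # Hypothetical illustration, not part of PdfEx.
  # Each row is one filter-type byte followed by `columns` data bytes.
  # Assumes bytes-per-pixel = 1; trailing partial rows are ignored.
  def decode(data, columns) when is_binary(data) and columns > 0 do
    row_size = columns + 1
    rows = for <<row::binary-size(row_size) <- data>>, do: row
    zero_row = List.duplicate(0, columns)

    {decoded, _last_row} =
      Enum.map_reduce(rows, zero_row, fn <<filter, raw::binary>>, prev ->
        row = unfilter(filter, :binary.bin_to_list(raw), prev)
        {row, row}
      end)

    IO.iodata_to_binary(decoded)
  end

  # Filter 0 (None): bytes are stored as-is.
  defp unfilter(0, raw, _prev), do: raw

  # Filter 2 (Up): add the byte directly above, modulo 256.
  defp unfilter(2, raw, prev) do
    raw |> Enum.zip(prev) |> Enum.map(fn {x, up} -> rem(x + up, 256) end)
  end

  # Filters 1 (Sub), 3 (Average), 4 (Paeth) depend on the decoded byte to
  # the left, so walk the row carrying {left, upper_left}.
  defp unfilter(filter, raw, prev) when filter in [1, 3, 4] do
    {out, _state} =
      raw
      |> Enum.zip(prev)
      |> Enum.map_reduce({0, 0}, fn {x, up}, {left, up_left} ->
        predicted =
          case filter do
            1 -> left
            3 -> div(left + up, 2)
            4 -> paeth(left, up, up_left)
          end

        byte = rem(x + predicted, 256)
        {byte, {byte, up}}
      end)

    out
  end

  # Paeth predictor: pick whichever neighbour is closest to a + b - c.
  defp paeth(a, b, c) do
    p = a + b - c
    pa = abs(p - a)
    pb = abs(p - b)
    pc = abs(p - c)

    cond do
      pa <= pb and pa <= pc -> a
      pb <= pc -> b
      true -> c
    end
  end
end
```

An unknown filter byte raises a `FunctionClauseError` in this sketch. A hardened parser would more likely return an error tuple.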

### Editing

- Structural page ops (`PdfEx.Editor`): insert, delete (lossless, by marking
  objects free), and reorder, with inherited-attribute materialization on
  reorder-flatten.
- Run-level text editing (`PdfEx.ContentEdit`): `replace_text/3`,
  `delete_glyph/2`, `run_text/2`. Edits are applied as token-span patches with
  width compensation, for both single-byte fonts and Type0 / Identity-H
  composite fonts.
- Stable per-glyph UIDs and visual position mutation
  (`PdfEx.Convert.apply_visual_mutation/3`).

### Projection

- Visual and semantic HTML (`PdfEx.Convert.to_html/2`) with `data-uid`
  back-references; reverse mapping of edited semantic blocks into per-run text
  ops (`semantic_ops/3`, `apply_semantic_mutation/3`).

### Collaboration

- Supervised per-document editing sessions (`PdfEx.Session`) with a
  crash-surviving snapshot cache, plain-struct operations (`PdfEx.Op`), and
  operational transformation (`PdfEx.OT`) for intention-preserving concurrent
  edits.
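
To illustrate what operational transformation means here, the following sketch transforms two concurrent edits on a plain string so that both application orders converge. It is an assumption-laden toy and not the `PdfEx.OT` or `PdfEx.Op` model: the module `OTSketch`, the tuple shapes `{:ins, pos, text}` and `{:del, pos}`, and the `:left` / `:right` tiebreak are all hypothetical.

```elixir
defmodule OTSketch do
  # Hypothetical illustration on plain strings, not part of PdfEx.
  # Positions are grapheme indices; {:del, pos} removes one grapheme.

  def apply_op(text, :noop), do: text

  def apply_op(text, {:ins, pos, s}) do
    {left, right} = String.split_at(text, pos)
    left <> s <> right
  end

  def apply_op(text, {:del, pos}) do
    {left, right} = String.split_at(text, pos)
    {_removed, rest} = String.split_at(right, 1)
    left <> rest
  end

  # transform(op, other, side) rewrites `op` so it can be applied after
  # `other` has already been applied. `side` breaks ties when two inserts
  # target the same position: the :left op keeps its position.
  def transform(:noop, _other, _side), do: :noop
  def transform(op, :noop, _side), do: op

  def transform({:ins, p, s}, {:ins, q, t}, side) do
    if p < q or (p == q and side == :left) do
      {:ins, p, s}
    else
      {:ins, p + String.length(t), s}
    end
  end

  def transform({:ins, p, s}, {:del, q}, _side) do
    if p <= q, do: {:ins, p, s}, else: {:ins, p - 1, s}
  end

  def transform({:del, p}, {:ins, q, t}, _side) do
    if p < q, do: {:del, p}, else: {:del, p + String.length(t)}
  end

  def transform({:del, p}, {:del, q}, _side) do
    cond do
      p < q -> {:del, p}
      p > q -> {:del, p - 1}
      # Both sides deleted the same grapheme: nothing left to do.
      true -> :noop
    end
  end
end
```

With `a = {:ins, 1, "X"}` and `b = {:del, 1}` on `"abc"`, applying `a` then `transform(b, a, :right)` gives the same string as applying `b` then `transform(a, b, :left)`.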

### Serialization

- Incremental-first serializer (`PdfEx.Serializer`): byte-exact round-trip on
  unmodified documents, xref style matched to the source; opt-in full
  re-serialization (`mode: :full`, a single clean revision, not byte-lossless).

### Fonts

- TrueType glyph-retaining subset surgery (`PdfEx.Font.Surgery`) with
  composite-glyph closure and recomputed checksums.
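
As background on the recomputed checksums: the TrueType/OpenType format defines a table checksum as the sum of the table's big-endian 32-bit words, modulo 2^32, with the table zero-padded to a multiple of four bytes. The sketch below shows that calculation only. The module name `ChecksumSketch` is hypothetical, and this is not the code in `PdfEx.Font.Surgery`.

```elixir
defmodule ChecksumSketch do
  # Hypothetical illustration, not part of PdfEx.
  # Sums big-endian uint32 words modulo 2^32, zero-padding the tail.
  def table_checksum(table) when is_binary(table) do
    pad = rem(4 - rem(byte_size(table), 4), 4)
    padded = table <> :binary.copy(<<0>>, pad)
    sum_words(padded, 0)
  end

  defp sum_words(<<word::unsigned-big-32, rest::binary>>, acc) do
    sum_words(rest, rem(acc + word, 0x1_0000_0000))
  end

  defp sum_words(<<>>, acc), do: acc
end
```

After subsetting changes a table's bytes, its directory entry needs the new value. The whole-font `checkSumAdjustment` field in the `head` table is derived separately and is not covered by this sketch.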

### Robustness

- Hardened against hostile input: atom-table exhaustion, nesting-depth bombs,
  circular xref/`/Length` chains, unbounded xref-stream ranges, malformed
  positioning operands, CR/LF escaping in re-serialized strings, spec-legal
  real number forms, huge-float serialization, and refc binary pinning.
- Real-PDF corpus harness and a deterministic fuzz suite.