# URP

Pure Elixir client for the [UNO Remote Protocol](https://wiki.openoffice.org/wiki/Uno/Binary/Spec/Protocol).
Converts documents by talking directly to an off-the-shelf `soffice`
container over a TCP socket — no custom images, wrappers, or sidecars needed.

## Why?

LibreOffice is a versatile open-source tool for converting between office
document formats, but integrating it into a web app typically requires
intermediate layers:

- **[unoserver](https://github.com/unoconv/unoserver)** — Python daemon that wraps soffice and exposes an HTTP API
- **[Gotenberg](https://gotenberg.dev/)** — Go service that wraps unoserver (which wraps soffice)
- **Python UNO bindings** (`uno`, `unoconv`) — require Python and LibreOffice's UNO runtime installed together

Each layer adds deployment complexity, resource overhead, and failure modes.

URP skips all of that. It speaks the binary UNO Remote Protocol directly over
TCP to a stock `soffice` process — the same protocol LibreOffice uses
internally. No Python runtime, no wrapper services, no custom Docker images.

## Installation

Add `urp` to your dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:urp, "~> 0.5"}
  ]
end
```

## Prerequisites

A running `soffice` process with a URP socket listener:

```sh
soffice --headless --norestore \
  --accept="socket,host=0.0.0.0,port=2002,tcpNoDelay=1;urp;"
```

Or via Docker:

```sh
docker run \
  --detach \
  --name soffice \
  --publish 2002:2002 \
  libreofficedocker/alpine:3.23 \
  soffice --headless --norestore \
    --accept="socket,host=0.0.0.0,port=2002,tcpNoDelay=1;urp;"
```
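
Before pointing a pool at the listener, it can help to confirm that the port is reachable at all. The following is a minimal sketch using only `:gen_tcp` from the Erlang standard library. The `SofficeCheck` module is a hypothetical helper, not part of URP, and it only verifies that something accepts TCP connections; it does not perform a URP handshake.

```elixir
defmodule SofficeCheck do
  @moduledoc """
  Hypothetical helper (not part of URP): checks that something accepts TCP
  connections on the given host and port. No URP handshake is attempted.
  """

  @spec reachable?(String.t(), :inet.port_number(), timeout()) :: boolean()
  def reachable?(host \\ "localhost", port \\ 2002, timeout \\ 1_000) do
    # :gen_tcp expects the host as a charlist (or an IP tuple), not a binary.
    case :gen_tcp.connect(String.to_charlist(host), port, [:binary, active: false], timeout) do
      {:ok, socket} ->
        # Close immediately; we only wanted to know that the connect succeeded.
        :gen_tcp.close(socket)
        true

      {:error, _reason} ->
        false
    end
  end
end
```

Calling `SofficeCheck.reachable?("localhost", 2002)` from `iex` should return `true` once either of the commands above is running.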

## Usage

A default connection pool starts automatically, connecting to `localhost:2002`.
No supervision tree setup needed.

```elixir
# File path — convert to PDF
{:ok, pdf_path} = URP.convert("/path/to/input.docx", filter: "writer_pdf_Export")

# Explicit output path
{:ok, "/tmp/out.pdf"} = URP.convert("/path/to/input.docx", filter: "writer_pdf_Export", output: "/tmp/out.pdf")

# Return bytes in memory
{:ok, pdf_bytes} = URP.convert("/path/to/input.docx", filter: "writer_pdf_Export", output: :binary)

# Raw bytes input
{:ok, pdf_bytes} = URP.convert({:binary, docx_bytes}, filter: "writer_pdf_Export", output: :binary)

# Enumerable input (e.g. File.stream!, S3 download stream)
{:ok, pdf_path} = URP.convert(File.stream!("huge.docx", 65_536), filter: "writer_pdf_Export")

# Convert to Markdown
{:ok, md_path} = URP.convert("/path/to/input.docx", filter: "Markdown")
```

Configure the default pool in `config/runtime.exs`:

```elixir
config :urp, :default,
  host: "soffice",
  port: 2002,
  pool_size: 1
```
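
In containerized deployments the host and port usually come from the environment. A sketch of the same configuration reading them at boot is shown here; the variable names `SOFFICE_HOST` and `SOFFICE_PORT` are assumptions made for this example, not names URP looks for.

```elixir
# config/runtime.exs
import Config

# SOFFICE_HOST and SOFFICE_PORT are hypothetical variable names for this sketch.
# The fallbacks match the default of localhost:2002 described above.
config :urp, :default,
  host: System.get_env("SOFFICE_HOST", "localhost"),
  port: String.to_integer(System.get_env("SOFFICE_PORT", "2002")),
  pool_size: 1
```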

### Output modes

The `:output` option controls where converted bytes go:

```elixir
# Default — write to temp file, return path
{:ok, tmp_path} = URP.convert(input, filter: "writer_pdf_Export")

# Write to specific path
{:ok, path} = URP.convert(input, filter: "writer_pdf_Export", output: "/tmp/output.pdf")

# Return bytes in memory
{:ok, bytes} = URP.convert(input, filter: "writer_pdf_Export", output: :binary)

# Stream chunks to a callback
:ok = URP.convert(input, filter: "writer_pdf_Export", output: fn chunk -> send_chunk(chunk) end)
```
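
As an illustration of the callback mode, the sketch below builds a callback that appends each chunk to a file. `ChunkSink` is a hypothetical helper, not part of URP. It assumes that each chunk is a binary and that the callback's return value is ignored; neither is stated above, so check `URP.convert/2` before relying on it.

```elixir
defmodule ChunkSink do
  @moduledoc """
  Hypothetical helper (not part of URP): builds a one-argument callback that
  appends every chunk it receives to a file.
  """

  @doc "Opens `path` for writing and returns `{device, callback}`."
  @spec open(Path.t()) :: {File.io_device(), (binary() -> :ok)}
  def open(path) do
    # No :raw mode: the device is then a process, so the callback keeps
    # working even if it is invoked from a different process than the opener.
    device = File.open!(path, [:write, :binary])
    callback = fn chunk -> IO.binwrite(device, chunk) end
    {device, callback}
  end
end

# Assumed usage with the callback output mode:
#
#   {device, write_chunk} = ChunkSink.open("/tmp/streamed.pdf")
#   :ok = URP.convert(input, filter: "writer_pdf_Export", output: write_chunk)
#   File.close(device)
```

The caller stays responsible for closing the device, which keeps the callback itself free of lifecycle logic.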

### Named pools

For multiple soffice instances, configure named pools:

```elixir
config :urp, :pools,
  spreadsheets: [host: "soffice-2", port: 2002, pool_size: 3]
```

Named pools are started on first use:

```elixir
{:ok, pdf_path} = URP.convert({:binary, xlsx_bytes}, pool: :spreadsheets, filter: "calc_pdf_Export")
```

## Testing

Stub conversions in tests — no running soffice needed:

```elixir
test "generates invoice PDF" do
  URP.Test.stub(fn _input, _opts ->
    {:ok, "/tmp/fake.pdf"}
  end)

  assert {:ok, _pdf} = MyApp.generate_invoice(order)
end
```

Stubs are per-process and propagate through `$callers` (Tasks, GenServers).
See `URP.Test` for details.
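
If the code under test uses more than one output mode, a stub that mirrors the return shapes from "Output modes" may be more useful than a fixed path. The sketch below is one way to write such a stub body. `FakeConverter` is a hypothetical module, and the sketch assumes the stub receives the same keyword options that were passed to `URP.convert/2` and that its return value is handed back unchanged.

```elixir
defmodule FakeConverter do
  @moduledoc """
  Hypothetical stub body for tests (not part of URP). Returns a value shaped
  like the documented result for each `:output` mode without touching soffice.
  """

  @fake_pdf "%PDF-1.7\n%fake\n"

  def convert(_input, opts) do
    case Keyword.get(opts, :output) do
      # Default mode returns a path; nothing is actually written here.
      nil ->
        {:ok, "/tmp/fake.pdf"}

      :binary ->
        {:ok, @fake_pdf}

      path when is_binary(path) ->
        {:ok, path}

      callback when is_function(callback, 1) ->
        # Deliver the whole fake document as a single chunk.
        callback.(@fake_pdf)
        :ok
    end
  end
end
```

Under those assumptions it would be installed with `URP.Test.stub(&FakeConverter.convert/2)`.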

Integration tests require soffice on `localhost:2002`:

```sh
mix test
```

## Scope

URP implements the subset of the UNO API needed for document conversion:

| UNO interface | Methods used | Purpose |
|---|---|---|
| [`XComponentLoader`](https://api.libreoffice.org/docs/idl/ref/interfacecom_1_1sun_1_1star_1_1frame_1_1XComponentLoader.html) | `loadComponentFromURL` | Open documents (from file URL or `private:stream`) |
| [`XStorable2`](https://api.libreoffice.org/docs/idl/ref/interfacecom_1_1sun_1_1star_1_1frame_1_1XStorable2.html) | `storeToURL` | Export documents (to file URL or `private:stream`) |
| [`XCloseable`](https://api.libreoffice.org/docs/idl/ref/interfacecom_1_1sun_1_1star_1_1util_1_1XCloseable.html) | `close` | Release document resources |
| [`XInputStream`](https://api.libreoffice.org/docs/idl/ref/interfacecom_1_1sun_1_1star_1_1io_1_1XInputStream.html) | `readBytes`, `readSomeBytes`, `skipBytes`, `available`, `closeInput` | Feed document bytes to soffice |
| [`XOutputStream`](https://api.libreoffice.org/docs/idl/ref/interfacecom_1_1sun_1_1star_1_1io_1_1XOutputStream.html) | `writeBytes`, `flush`, `closeOutput` | Receive converted output from soffice |

The output format is controlled by [export filter names](https://help.libreoffice.org/latest/en-US/text/shared/guide/convertfilters.html)
passed via [`MediaDescriptor`](https://api.libreoffice.org/docs/idl/ref/servicecom_1_1sun_1_1star_1_1document_1_1MediaDescriptor.html).
Common filters: `writer_pdf_Export`, `calc_pdf_Export`, `impress_pdf_Export`.

Other UNO APIs (editing, formatting, macros, etc.) are not implemented.
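
Because the PDF filter depends on the document family, callers often pick it from the input's file extension. The following is a minimal sketch of that lookup. `PdfFilter` is a hypothetical helper, not part of URP, and the extension list is an illustrative assumption rather than a complete mapping.

```elixir
defmodule PdfFilter do
  @moduledoc """
  Hypothetical helper (not part of URP): picks one of the common PDF export
  filters based on the input file's extension.
  """

  # Illustrative, incomplete mapping from extension to export filter.
  @filters %{
    ".doc" => "writer_pdf_Export",
    ".docx" => "writer_pdf_Export",
    ".odt" => "writer_pdf_Export",
    ".xls" => "calc_pdf_Export",
    ".xlsx" => "calc_pdf_Export",
    ".ods" => "calc_pdf_Export",
    ".ppt" => "impress_pdf_Export",
    ".pptx" => "impress_pdf_Export",
    ".odp" => "impress_pdf_Export"
  }

  @doc "Returns `{:ok, filter}` for a known extension, `:error` otherwise."
  @spec for_path(Path.t()) :: {:ok, String.t()} | :error
  def for_path(path) do
    # Extensions are matched case-insensitively.
    ext = path |> Path.extname() |> String.downcase()
    Map.fetch(@filters, ext)
  end
end
```

Returning `:error` for unknown extensions leaves the fallback decision to the caller instead of guessing a filter.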

## Architecture

| Module | Role |
|---|---|
| `URP` | Public API — converts via pool with test stub support |
| `URP.Pool` | NimblePool — connection pooling with DisposedException retry |
| `URP.Test` | Test helpers — per-process stubs via NimbleOwnership |
| `URP.Bridge` | Mid-level — UNO operations (handshake, load, store, close, streaming) |
| `URP.Stream` | Bidirectional URP dispatch for XInputStream/XOutputStream |
| `URP.Protocol` | Low-level — binary wire format (framing, encoding, reply parsing) |

## Design: soffice as a network service

This library treats soffice as an external network service — your Elixir app
connects to it over TCP. soffice must be deployed and scaled separately
(e.g. as a sidecar container, a separate Kubernetes deployment, or a standalone
server).

An alternative approach would be to bundle soffice into the same image as
the Elixir app and manage it via Erlang Ports:

| | Network service (this library) | Embedded via Port |
|---|---|---|
| **Scaling** | Scale soffice independently | Tied to app instances |
| **Isolation** | soffice crash doesn't affect the BEAM | Port crash is contained, but the app must detect it and restart soffice |
| **Deployment** | Separate image, simpler app image | Single image, larger and more complex |
| **Latency** | TCP overhead (negligible on local network) | No network hop |
| **Multiple instances** | Deploy N soffice containers | Spawn N Ports per app node |
| **Complexity** | Needs orchestration (Docker/K8s) | Needs Port supervision, lifecycle management |

The network approach is simpler to implement and fits well with containerized
deployments where soffice already runs as a separate service.

### Kubernetes: scaling note

The soffice Docker image is ~1.7 GB, but image layers are shared across all
containers on the same node, so running 10 pods doesn't use 10× the disk.
Each soffice process uses ~50-150 MB RSS (spiking during conversion). The
per-container namespace overhead (~10-20 MB) is negligible in comparison.

Multiple pods give you health checks, restart policies, and per-instance
resource limits for free. The equivalent with embedded Ports means managing
all of that in application code.

Tested with [`libreofficedocker/alpine:3.23`](https://hub.docker.com/r/libreofficedocker/alpine).

## References

- [UNO Binary Protocol Spec](https://wiki.openoffice.org/wiki/Uno/Binary/Spec/Protocol)
- [binaryurp source](https://git.libreoffice.org/core/+/refs/heads/master/binaryurp/) — reader.cxx, writer.cxx, marshal.cxx
- [specialfunctionids.hxx](https://git.libreoffice.org/core/+/refs/heads/master/binaryurp/source/specialfunctionids.hxx)
- [typeclass.h](https://git.libreoffice.org/core/+/refs/heads/master/include/typelib/typeclass.h)
- [XInputStream](https://api.libreoffice.org/docs/idl/ref/interfacecom_1_1sun_1_1star_1_1io_1_1XInputStream.html), [XOutputStream](https://api.libreoffice.org/docs/idl/ref/interfacecom_1_1sun_1_1star_1_1io_1_1XOutputStream.html)
- [MediaDescriptor](https://api.libreoffice.org/docs/idl/ref/servicecom_1_1sun_1_1star_1_1document_1_1MediaDescriptor.html)

## License

MIT — see [LICENSE](LICENSE).

This is an independent implementation based on the public UNO protocol spec.
LibreOffice source was consulted as documentation for protocol details not
covered by the spec. No code was copied.