# LiteParse

Elixir wrapper for [LiteParse](https://github.com/run-llama/liteparse), a fast and lightweight PDF parser written in Rust. Parsing runs locally with no cloud dependencies.

Note: this Elixir binding exposes only a subset of the upstream LiteParse features. See the [upstream project](https://github.com/run-llama/liteparse) for the complete feature set.

## Installation

Add to your `mix.exs`:

```elixir
def deps do
  [
    {:liteparse, "~> 0.1.0"}
  ]
end
```

## Usage

Parse a PDF from disk:

```elixir
{:ok, %{text: text, page_count: n}} = LiteParse.parse("document.pdf")
```
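The bare match above raises a `MatchError` if parsing fails. In application code, a `case` over the result is usually safer. The sketch below assumes failures are returned as `{:error, reason}`; this README only shows the success shape, so check the module docs for the actual error term. `ParseReport` is a hypothetical helper, not part of LiteParse.

```elixir
defmodule ParseReport do
  # Hypothetical helper: turns the result of LiteParse.parse/1 into a short
  # status line. The {:error, reason} clause is an assumed error shape.
  def describe({:ok, %{text: text, page_count: pages}}) do
    "parsed #{pages} page(s), #{String.length(text)} characters"
  end

  def describe({:error, reason}) do
    "parse failed: #{inspect(reason)}"
  end
end

# Usage (requires the :liteparse dependency):
# "document.pdf" |> LiteParse.parse() |> ParseReport.describe() |> IO.puts()
```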

Parse a PDF from binary data:

```elixir
{:ok, %{text: text, page_count: n}} = LiteParse.parse_input(pdf_binary)
```
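When the bytes come from disk, upload, or HTTP, a cheap sanity check before calling `parse_input/1` can give a clearer error than a failed parse. A minimal sketch, assuming you only want to accept PDFs here: well-formed PDF files normally begin with the `%PDF-` header, though some files carry leading bytes and would be rejected by this strict check. `PdfInput` is a hypothetical helper, not part of LiteParse.

```elixir
defmodule PdfInput do
  # Hypothetical helper: reads a file and checks for the "%PDF-" header
  # before the bytes are handed to LiteParse.parse_input/1.
  def load(path) do
    # `with` returns the first non-matching value, e.g. {:error, :enoent}
    # from File.read/1 or {:error, :not_a_pdf} from check_header/1.
    with {:ok, bytes} <- File.read(path),
         :ok <- check_header(bytes) do
      {:ok, bytes}
    end
  end

  def check_header(<<"%PDF-", _rest::binary>>), do: :ok
  def check_header(_other), do: {:error, :not_a_pdf}
end

# Usage (requires the :liteparse dependency):
# with {:ok, pdf_binary} <- PdfInput.load("document.pdf") do
#   LiteParse.parse_input(pdf_binary)
# end
```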

Options can be passed as a keyword list:

```elixir
LiteParse.parse("doc.pdf", max_pages: 100, ocr_enabled: false)
```

Or as a reusable struct:

```elixir
config = LiteParse.Config.new(ocr_language: "spa", max_pages: 50)
LiteParse.parse("doc.pdf", config)
```
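Because the config is a reusable struct, one value can be shared across many files. The sketch below is a hypothetical `BatchParse` helper (not part of LiteParse) that takes the parser as a function, so the config lives in one capture and the helper can be exercised without the library. It runs sequentially; whether parallel calls are safe is not stated here, so that is left out.

```elixir
defmodule BatchParse do
  # Hypothetical helper: applies one parse function to many paths and
  # collects the results in a map keyed by path.
  def run(paths, parse_fun) when is_function(parse_fun, 1) do
    Map.new(paths, fn path -> {path, parse_fun.(path)} end)
  end

  # Splits the result map into successes ({:ok, _}) and everything else.
  def split(results) do
    {ok, failed} =
      Enum.split_with(results, fn {_path, result} -> match?({:ok, _}, result) end)

    {Map.new(ok), Map.new(failed)}
  end
end

# Usage (requires the :liteparse dependency):
# config = LiteParse.Config.new(ocr_language: "spa", max_pages: 50)
# results = BatchParse.run(Path.wildcard("docs/*.pdf"), &LiteParse.parse(&1, config))
# {parsed, failed} = BatchParse.split(results)
```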

See `LiteParse.Config` for the full list of available options.

## Supported Formats

- PDF (`.pdf`)
- Microsoft Office (`.docx`, `.xlsx`, `.pptx`, etc.) — requires LibreOffice
- OpenDocument (`.odt`, `.ods`, `.odp`) — requires LibreOffice
- Images (`.png`, `.jpg`, `.tiff`, etc.) — requires ImageMagick
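Since non-PDF formats depend on external tools, it can help to check up front which tool a file needs. A minimal sketch of a hypothetical `FormatCheck` module (not part of LiteParse): the extension sets contain only the examples named in the list above, so other extensions covered by "etc." fall through to `:unknown`, and the executable names `soffice`, `magick`, and `convert` are assumptions about common installs, not necessarily what LiteParse looks for.

```elixir
defmodule FormatCheck do
  # Extensions taken from the README's examples only.
  @office ~w(.docx .xlsx .pptx .odt .ods .odp)
  @images ~w(.png .jpg .tiff)

  # Returns which external tool (if any) a file is expected to need.
  def requirement(path) do
    ext = path |> Path.extname() |> String.downcase()

    cond do
      ext == ".pdf" -> :none
      ext in @office -> :libreoffice
      ext in @images -> :imagemagick
      true -> :unknown
    end
  end

  # Assumed executable names; checks whether any of them is on PATH.
  def tool_available?(:none), do: true
  def tool_available?(:libreoffice), do: System.find_executable("soffice") != nil

  def tool_available?(:imagemagick),
    do: Enum.any?(["magick", "convert"], &System.find_executable/1)

  def tool_available?(:unknown), do: false
end
```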

## License

MIT. See [LICENSE](https://github.com/luimedi/liteparse/blob/main/LICENSE).