Skip to main content

livebooks/demo.livemd

# Image.OCR Demo

## Prerequisites

> #### Install Tesseract first
>
> `image_ocr` is a NIF binding — it cannot start without **Tesseract ≥ 5.0**
> and Leptonica installed on the host running this Livebook. The first
> `Mix.install/2` cell below will fail to compile the NIF if these are
> missing.
>
> | Platform               | One-liner                                                                                            |
> | ---------------------- | ---------------------------------------------------------------------------------------------------- |
> | macOS                  | `brew install tesseract leptonica pkg-config`                                                        |
> | Debian / Ubuntu 24.04+ | `sudo apt-get install -y build-essential pkg-config libtesseract-dev libleptonica-dev tesseract-ocr` |
> | Fedora / RHEL          | `sudo dnf install -y gcc-c++ pkgconf-pkg-config tesseract-devel leptonica-devel`                     |
> | Arch                   | `sudo pacman -S base-devel pkgconf tesseract leptonica`                                              |
> | Alpine                 | `apk add build-base pkgconf tesseract-ocr-dev leptonica-dev`                                         |
> | Windows                | Use WSL2 with Ubuntu 24.04 and follow the Debian/Ubuntu row.                                         |
>
> Verify with `tesseract --version` and `pkg-config --modversion tesseract`
> in your shell before continuing.

```elixir
Mix.install(
  [
    {:image_ocr, "~> 0.1"},
    {:image, "~> 0.66"},
    {:kino, "~> 0.14"},
    {:nx, "~> 0.10"},
    {:exla, "~> 0.10"}
  ]
)
```

## 1. What's installed

```elixir
%{
  tesseract_version: Image.OCR.tesseract_version(),
  trained_data_path: Image.OCR.Tessdata.datapath(),
  installed_languages: Image.OCR.Tessdata.installed_languages(),
  schedulers_online: System.schedulers_online()
}
```

## 2. Render a test image

`Image.OCR` accepts any `Vix.Vips.Image`. The `Image` library makes it
easy to render text on a white background that Tesseract can read.

```elixir
render_text = fn string ->
  Image.Text.text!(string,
    font_size: 48,
    text_fill_color: :black,
    background_fill_color: :white,
    padding: 40
  )
end

sample = render_text.("The quick brown fox\njumps over the lazy dog.")
Image.to_kino(sample)
```

## 3. One-shot OCR

The simplest possible call — no instance management:

```elixir
{:ok, text} = Image.OCR.quick_read(sample)
text
```

## 4. Reusable instance

For repeated calls, build an instance once and reuse it. The `:locale`
option accepts ISO 639-1 codes (`:en`, `"fr"`), BCP-47 region/script tags
(`"zh-Hans"`, `"sr-Latn"`), or Tesseract codes verbatim (`"frk"`,
`"osd"`). Add the optional `:localize` dependency for full BCP-47 support
(`"en-US"`, `"zh-Hans-CN"`, etc.).

```elixir
{:ok, ocr} = Image.OCR.new(locale: :en, psm: :auto)
ocr
```

```elixir
phrases = [
  "Hello, Tesseract!",
  "Elixir is fun.",
  "Image processing with vips.",
  "Optical character recognition."
]

phrases
|> Enum.map(&{&1, render_text.(&1)})
|> Enum.map(fn {expected, image} ->
  {:ok, recognised} = Image.OCR.read_text(ocr, image)
  %{expected: expected, recognised: String.trim(recognised)}
end)
|> Kino.DataTable.new()
```

## 5. Per-word results with bounding boxes

`Image.OCR.recognize/3` returns each word together with a confidence (0-100)
and bounding box (`{x1, y1, x2, y2}` in image coordinates).

```elixir
sample2 = render_text.("Words have boxes around them.")
{:ok, words} = Image.OCR.recognize(ocr, sample2)

table =
  Enum.map(words, fn %{text: text, confidence: conf, bbox: {x1, y1, x2, y2}} ->
    %{
      text: text,
      confidence: Float.round(conf, 1),
      x1: x1,
      y1: y1,
      x2: x2,
      y2: y2,
      width: x2 - x1,
      height: y2 - y1
    }
  end)

Kino.Layout.grid([Image.to_kino(sample2), Kino.DataTable.new(table)], columns: 1)
```

## 6. Concurrency with `Image.OCR.Pool`

A single Tesseract instance is single-threaded. For real parallelism, use
the supplied `NimblePool`-backed pool — one OCR instance per worker.

```elixir
pool_name = :demo_pool
pool_size = min(4, System.schedulers_online())

# Stop any previous pool we started in this notebook so re-running the
# cell is idempotent.
case Process.whereis(pool_name) do
  nil -> :ok
  pid -> GenServer.stop(pid)
end

{:ok, _} = Image.OCR.Pool.start_link(name: pool_name, locale: :en, pool_size: pool_size)
:ok
```

```elixir
images = Enum.map(phrases, render_text)

time = fn fun ->
  {micros, result} = :timer.tc(fun)
  {Float.round(micros / 1_000, 1), result}
end

{sequential_ms, sequential_results} =
  time.(fn ->
    Enum.map(images, &Image.OCR.read_text(ocr, &1))
  end)

{parallel_ms, parallel_results} =
  time.(fn ->
    images
    |> Task.async_stream(&Image.OCR.Pool.read_text(pool_name, &1),
      max_concurrency: pool_size,
      timeout: 30_000
    )
    |> Enum.map(fn {:ok, r} -> r end)
  end)

%{
  pool_size: pool_size,
  sequential_ms: sequential_ms,
  parallel_ms: parallel_ms,
  speedup: Float.round(sequential_ms / parallel_ms, 2),
  results_match: sequential_results == parallel_results
}
```

## 7. OCR your own image

Drop in a PNG, JPEG, or TIFF — the input pipeline accepts file paths,
in-memory binaries, and live `Vix.Vips.Image` values transparently.

```elixir
upload = Kino.Input.image("Image to OCR")
```

````elixir
case Kino.Input.read(upload) do
  nil ->
    Kino.Markdown.new("_Upload an image above and re-run this cell._")

  kino ->
    {:ok, image} = Image.from_kino(kino)
    {:ok, recognised} = Image.OCR.read_text(ocr, image)

    Kino.Layout.grid(
      [
        Image.to_kino(image),
        Kino.Markdown.new("**Recognised text:**"),
        Kino.Markdown.new("```\n" <> recognised <> "\n```")
      ],
      columns: 1
    )
end
````

## 8. Tweaking accuracy with PSM and SetVariable

Tesseract has 14 page-segmentation modes and exposes ~700 internal
variables. Both can be set on `Image.OCR.new/1`.

```elixir
digits = render_text.("4815162342")

{:ok, default_ocr} = Image.OCR.new()

{:ok, digits_only} =
  Image.OCR.new(
    psm: :single_line,
    variables: [tessedit_char_whitelist: "0123456789"]
  )

%{
  default: Image.OCR.read_text(default_ocr, digits) |> elem(1),
  with_whitelist: Image.OCR.read_text(digits_only, digits) |> elem(1)
}
```

## 9. Adding more languages

To OCR text in other languages, install the relevant trained-data files
from your terminal:

```text
mix image.ocr.tessdata.add fr de        # French + German, "fast" variant
mix image.ocr.tessdata.add en --variant best   # high-accuracy English
mix image.ocr.tessdata.add zh-Hans              # Simplified Chinese
mix image.ocr.tessdata.list                     # show what's installed
mix image.ocr.tessdata.update                   # refresh to latest upstream
```

The destination directory is resolved by `Image.OCR.Tessdata.datapath/0`
(option → `:image_ocr, :tessdata_path` config → `TESSDATA_PREFIX` env →
vendored `priv/tessdata/`).