defmodule NxTfliteMob do
@moduledoc """
Call TensorFlow Lite models from Elixir / BEAM, with full vendor
accelerator access on phones — Apple Neural Engine on iOS, MediaTek
/ Qualcomm GPU+NPU HALs on Android.
## This is NOT an `Nx.Backend`
`NxTfliteMob` does not replace `Nx.BinaryBackend`, `EMLX.Backend`,
`NxVulkan.Backend`, etc. There is no `NxTfliteMob.Backend` module
to set via `Nx.global_default_backend/1`.
TFLite executes pre-compiled model graphs (`.tflite` files)
end-to-end through vendor-optimised delegates. The whole graph
stays opaque so the delegate can fuse + schedule it for ANE / GPU
/ NPU. You don't compose your own ops here — you call a pre-trained
model.
Use `NxTfliteMob` when you have a pre-trained model to run. Use
Nx backends when you're writing arbitrary tensor math in Elixir.
Both can coexist in the same app.
## API surface
Three functions: `load_module/2`, `call/2`, `release_module/1`.
iex> tflite = File.read!("priv/yolov8n_float16.tflite")
iex> {:ok, handle} = NxTfliteMob.load_module(tflite,
...> delegate: "coreml", coreml_ane_only: false)
iex> input_bytes = :binary.copy(<<0>>, 4_915_200) # all-zero 1×640×640×3 FP32 frame
iex> {:ok, [output_bytes]} = NxTfliteMob.call(handle, [input_bytes])
iex> NxTfliteMob.release_module(handle)
:ok
See [the YOLO walkthrough](yolo_walkthrough.html) for a complete
end-to-end example with input prep, inference, and output decode.
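A handle should be released even when inference fails, so it can help to wrap the three calls in `try/after`. The sketch below shows one way to do that. `TfliteOnce.run_once/4` is a hypothetical helper and is not part of this package. The `mod` parameter exists only so a stub can stand in for `NxTfliteMob` in tests.

```elixir
defmodule TfliteOnce do
  # Hypothetical helper (not part of NxTfliteMob): load a model, run a
  # single inference, and release the handle even if call/2 raises.
  def run_once(model_bytes, opts, inputs),
    do: run_once(model_bytes, opts, inputs, NxTfliteMob)

  # `mod` lets a stub module stand in for NxTfliteMob, e.g. in tests.
  def run_once(model_bytes, opts, inputs, mod) do
    # A failed load falls through `with` unchanged as {:error, message}.
    with {:ok, handle} <- mod.load_module(model_bytes, opts) do
      try do
        mod.call(handle, inputs)
      after
        mod.release_module(handle)
      end
    end
  end
end
```

This pattern suits one-off inferences. In a per-frame camera loop, load the model once and reuse the handle, because every `load_module/2` call sets up the delegate again.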
## Delegate options
The `delegate` opt selects how the model graph runs. Per-platform
recommendations (see also [the delegates guide](delegates.html)):
### Android — `delegate: "nnapi"`
NNAPI is Android's neural-net dispatch API. It picks a vendor HAL
driver based on the `accelerator` name:
| `accelerator:` value | What it routes to |
|---|---|
| `"mtk-gpu_shim"` | MediaTek's GPU HAL — fastest for YOLO on Dimensity chips |
| `"mtk-neuron_shim"` | MediaTek's APU/NPU. Only worthwhile if your graph is pure convolution: TFLite runs concat/reshape post-processing on the CPU, and the transfer overhead then dominates |
| `"qti-gpu"` | Qualcomm Snapdragon GPU |
| `"google-edgetpu"` | Pixel TPU |
| `nil` (omit the key) | NNAPI auto-picks. Often the WRONG choice for YOLO: on MediaTek it defaults to the NPU, which is 5× slower than the GPU HAL |
Discover available accelerators on a connected device with
`adb shell` + the standalone `bench` CLI's `list-nnapi` mode (see
the package's `scripts/bench_android/`).
Other Android opts:
* `num_threads:` — XNNPACK CPU thread count (default 6)
* `allow_fp16:` — let NNAPI run FP32 ops in FP16 (default `true`)
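`load_module/2` returns `{:error, message}` when delegate creation fails. If you don't know a device's accelerator set in advance, you can try several option sets in order. The sketch below does this. `DelegateFallback` and its candidate list are hypothetical, and the loader is injectable so the logic can be exercised without the NIF.

```elixir
defmodule DelegateFallback do
  # Hypothetical helper: try each option set in order and return the first
  # one that loads, as {:ok, handle, opts}. If every candidate fails, the
  # last {:error, reason} is returned.
  def load_first(model_bytes, candidates),
    do: load_first(model_bytes, candidates, &NxTfliteMob.load_module/2)

  def load_first(model_bytes, candidates, loader) do
    Enum.reduce_while(candidates, {:error, :no_candidates}, fn opts, _acc ->
      case loader.(model_bytes, opts) do
        {:ok, handle} -> {:halt, {:ok, handle, opts}}
        {:error, _reason} = error -> {:cont, error}
      end
    end)
  end
end

# Example order for an Android device of unknown vendor: MediaTek GPU,
# then Snapdragon GPU, then XNNPACK on the CPU.
candidates = [
  [delegate: "nnapi", accelerator: "mtk-gpu_shim", allow_fp16: true],
  [delegate: "nnapi", accelerator: "qti-gpu", allow_fp16: true],
  [delegate: "xnnpack", num_threads: 6]
]
```

A successful load only means the delegate was created. It says nothing about how much of the graph the delegate actually runs, so benchmark the winning option set on the target device.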
### iOS — `delegate: "coreml"`
Core ML routes the delegated portion through Apple's Core ML
framework, which internally schedules to the Apple Neural Engine
when ops are supported. For YOLOv8n FP16, ~56% of nodes delegate
to the ANE on an iPhone SE 3rd gen A15 (the rest fall to CPU via
XNNPACK), hitting **24 ms** per inference.
Caveats:
* **INT8 + Core ML doesn't work.** Core ML's tooling doesn't
understand the Ultralytics INT8 quant flavour — 0/256 nodes
delegate. Use the FP16 model variant for Core ML.
* `coreml_ane_only:` (default `false`): when `true`, the
delegate returns `nil` instead of falling back to CPU on
devices without an ANE. Useful for "ANE-only or skip" logic;
irrelevant on A12+ devices, where Core ML always has an ANE to use.
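The Ultralytics INT8 model delegates 0 nodes to Core ML. An app that ships both variants could therefore choose the model file from the requested delegate. The sketch below assumes a hypothetical `ModelPick` module and a hypothetical INT8 file name. It also assumes you want the INT8 variant on every delegate other than Core ML.

```elixir
defmodule ModelPick do
  # The FP16 path matches the moduledoc example; the INT8 path is a
  # hypothetical name for wherever your app stores the other variant.
  @fp16 "priv/yolov8n_float16.tflite"
  @int8 "priv/yolov8n_full_integer_quant.tflite"

  # INT8 + Core ML delegates nothing, so use FP16 whenever Core ML is
  # requested; otherwise assume the INT8 variant is preferred.
  def model_path(opts) do
    case Keyword.get(opts, :delegate, "xnnpack") do
      "coreml" -> @fp16
      _other -> @int8
    end
  end
end
```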
### iOS — `delegate: "metal"` (planned)
TFLite ships `TensorFlowLiteCMetal.xcframework` for Metal GPU
inference but the current NIF doesn't expose it as a `delegate:`
option yet. PRs welcome. Core ML is usually faster anyway on Apple
Silicon devices (Core ML can pick GPU when ANE ops are unsupported).
### XNNPACK CPU — `delegate: "xnnpack"` (default)
Bundled into TFLite: a highly optimised CPU + SIMD path, and the
default when no other delegate is set. Surprisingly competitive on
modern phones: ~77 ms on the Moto G Power 5G (tied with the GPU path)
and 27–37 ms on the iPhone SE (3rd gen, A15). Use this when:
* You're on a device without GPU/NPU acceleration
* The vendor delegate fails to delegate (e.g. INT8 + Core ML)
* You want deterministic, reproducible numbers (CPU paths don't
thermal-throttle as aggressively as GPUs)
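When you compare delegates on a device, a median over repeated runs is less sensitive to one-off stalls than a single timing. The sketch below computes one with `:timer.tc/1`. `Latency.median_ms/2` is a hypothetical helper. It does not discard warm-up runs, so call the model once before measuring.

```elixir
defmodule Latency do
  # Hypothetical helper: run `fun` n times and return the median wall-clock
  # latency in milliseconds, as a float.
  def median_ms(fun), do: median_ms(fun, 20)

  def median_ms(fun, n) when is_function(fun, 0) and n > 0 do
    sorted =
      1..n
      |> Enum.map(fn _ ->
        # :timer.tc/1 returns {microseconds, result}
        {micros, _result} = :timer.tc(fun)
        micros / 1000
      end)
      |> Enum.sort()

    mid = div(n, 2)

    if rem(n, 2) == 1 do
      Enum.at(sorted, mid)
    else
      (Enum.at(sorted, mid - 1) + Enum.at(sorted, mid)) / 2
    end
  end
end
```

Typical use: `Latency.median_ms(fn -> NxTfliteMob.call(handle, [input_bytes]) end)`.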
## Input + output byte layout
`call/2` is raw-bytes-in, raw-bytes-out. **The byte layout is
model-specific** — you have to match what the `.tflite` model
expects.
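Inspect a model's expected input/output shapes and dtypes with Python
tooling (e.g. `tf.lite.Interpreter`'s `get_input_details()` and
`get_output_details()`) or by dumping the FlatBuffer with `flatc`.
Alternatively, load an inspector NIF (via `:erlang.load_nif/2`) built
against TFLite's `TfLiteInterpreterGetInputTensor`. Exposing this in
the Elixir API is on the roadmap.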
Common shapes:
| Model | Input | Output |
|---|---|---|
| YOLOv8n INT8 (Ultralytics full_integer_quant) | 1×640×640×3 INT8 NHWC (`1228800` bytes) | 1×84×8400 INT8 (`705600` bytes) |
| YOLOv8n FP16 (Ultralytics float16) | 1×640×640×3 FP32 NHWC (`4915200` bytes — the FP16 model accepts FP32 input that's cast internally) | 1×84×8400 FP32 normalised (`2822400` bytes) |
| YOLOv8n FP32 | 1×640×640×3 FP32 NHWC (`4915200` bytes) | 1×84×8400 FP32 (`2822400` bytes) |
| MobileNetV2 (ImageNet) | 1×224×224×3 FP32 NHWC (`602112` bytes) | 1×1001 FP32 class logits (`4004` bytes) |
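Each byte count in the table is the product of the shape and the element size. The sketch below uses that product to check an input before calling `call/2`. `TensorBytes` is hypothetical, and the dtype atoms follow Nx's naming (`:s8`, `:f32`) only for familiarity.

```elixir
defmodule TensorBytes do
  # Element sizes in bytes for the dtypes used in the table.
  @elem_bytes %{s8: 1, u8: 1, f16: 2, f32: 4}

  # Expected byte size of a tensor with the given shape tuple and dtype.
  def expected(shape, dtype) when is_tuple(shape) do
    elements = shape |> Tuple.to_list() |> Enum.reduce(1, &*/2)
    elements * Map.fetch!(@elem_bytes, dtype)
  end

  # :ok if `bin` has exactly the expected size, otherwise a tagged error.
  def check(bin, shape, dtype) when is_binary(bin) do
    want = expected(shape, dtype)

    case byte_size(bin) do
      ^want -> :ok
      got -> {:error, {:size_mismatch, [expected: want, got: got]}}
    end
  end
end
```

For example, `TensorBytes.expected({1, 84, 8400}, :s8)` gives the `705600` bytes listed for the YOLOv8n INT8 output.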
See the YOLO walkthrough for the layout-aware decoder we use in
production (pure-BEAM, 13 ms for the full INT8 NMS pass).
## Where Nx fits in (optionally)
You CAN use Nx tensors on either side of `call/2`. It's optional —
bytes-in/bytes-out is the canonical interface.
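Input prep with Nx, quantizing FP32 pixels for the INT8 model
(`in_scale` and `in_zero_point` stand for the input tensor's
quantization parameters, read from the model):
input_bytes =
camera_frame_f32_binary
|> Nx.from_binary(:f32)
|> Nx.reshape({1, 640, 640, 3})
|> Nx.divide(in_scale) # quantize: q = round(x / scale) + zero_point
|> Nx.add(in_zero_point)
|> Nx.round()
|> Nx.clip(-128, 127) # saturate to the INT8 range
|> Nx.as_type(:s8)
|> Nx.to_binary()
{:ok, [out]} = NxTfliteMob.call(handle, [input_bytes])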
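Output decode with Nx (`scale` and `zero_point` are the output
tensor's quantization parameters; `extract_detections/1` stands for
your own decoder):
detections =
out
|> Nx.from_binary(:s8)
|> Nx.reshape({1, 84, 8400})
|> Nx.as_type(:f32)
|> Nx.subtract(zero_point) # dequantize: x = scale * (q - zero_point)
|> Nx.multiply(scale)
|> extract_detections()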
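The same affine (de)quantization can be done without Nx using binary comprehensions, which is handy for spot-checking the Nx pipeline. The sketch below assumes per-tensor INT8 quantization with a single `scale` and `zero_point`, read from the model. It also assumes native-endian FP32 input. `Quant` is a hypothetical module.

```elixir
defmodule Quant do
  # FP32 binary -> INT8 binary: q = clamp(round(x / scale) + zero_point, -128, 127)
  def quantize_f32(bin, scale, zero_point) do
    for <<x::float-native-size(32) <- bin>>, into: <<>> do
      q = round(x / scale) + zero_point
      <<max(-128, min(127, q))::signed-size(8)>>
    end
  end

  # INT8 binary -> list of floats: x = scale * (q - zero_point)
  def dequantize_s8(bin, scale, zero_point) do
    for <<q::signed-size(8) <- bin>>, do: scale * (q - zero_point)
  end
end
```

Note the order in `dequantize_s8/3`: subtract the zero point first, then scale.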
In practice we bypass Nx for performance-critical decoding —
`Nx.BinaryBackend` for an argmax across `{80, 8400}` is 1700 ms;
a pure-BEAM `:binary.at/2` loop is 13 ms (130× faster). See
`NxeigenProbe.LiveYoloScreen` for the pure-BEAM decoder pattern.
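The sketch below shows that pure-BEAM pattern for the INT8 output. It is not the `LiveYoloScreen` code itself, and `YoloArgmax` is hypothetical. It assumes the 1×84×8400 row-major layout, with rows 0–3 holding box coordinates and rows 4–83 the 80 class scores. It thresholds on raw INT8 values, which works because dequantization is monotonic when `scale > 0`. Only the surviving candidates then need converting to floats.

```elixir
defmodule YoloArgmax do
  # Assumed layout: 1x84x8400 INT8, row-major. Byte (row * 8400 + anchor)
  # holds row `row` for anchor `anchor`; rows 4..83 are the class scores.
  @anchors 8400
  @classes 80

  # {class_index, raw_int8_score} of the highest-scoring class for one anchor.
  def best_class(out, anchor) do
    Enum.reduce(0..(@classes - 1), {0, -129}, fn c, {best_c, best_s} ->
      s = to_signed(:binary.at(out, (4 + c) * @anchors + anchor))
      if s > best_s, do: {c, s}, else: {best_c, best_s}
    end)
  end

  # All {anchor, class, raw_score} whose best raw score is >= threshold.
  def candidates(out, threshold) when byte_size(out) == 84 * @anchors do
    for anchor <- 0..(@anchors - 1),
        {c, s} <- [best_class(out, anchor)],
        s >= threshold,
        do: {anchor, c, s}
  end

  # :binary.at/2 returns an unsigned byte; reinterpret it as two's-complement INT8.
  defp to_signed(b) when b > 127, do: b - 256
  defp to_signed(b), do: b
end
```

Box decoding and NMS then only need to run on the returned candidates.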
## Using with Mob
If you're building a [Mob](https://github.com/GenericJam/mob) app,
the easiest path is mob_dev's Igniter task:
mix mob.enable tflite
This adds the dep + generates a per-platform default-opts helper
and registers the NIF in mob_dev's static-NIF table. Requires
`mob_dev >= 0.5.9`. See mob_dev's
[`mob.enable` docs](https://hexdocs.pm/mob_dev/Mix.Tasks.Mob.Enable.html)
for details.
After `mix mob.enable tflite`, the auto-generated helper picks
delegate opts per platform:
{:ok, h} = NxTfliteMob.load_module(model_bytes,
MyApp.TfliteInit.default_opts())
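The generated helper's exact contents depend on your mob_dev version. As a rough illustration only, a hand-written equivalent could map a platform atom to the delegate recommendations above. Several parts of the sketch below are assumptions: the module name, the platform atoms, and the use of `"mtk-gpu_shim"`, which only helps on MediaTek chips.

```elixir
defmodule MyApp.TfliteOptsSketch do
  # Hypothetical hand-written stand-in for the generated helper.
  # Platform detection is left to the caller.
  def default_opts(:android),
    do: [delegate: "nnapi", accelerator: "mtk-gpu_shim", allow_fp16: true]

  def default_opts(:ios), do: [delegate: "coreml", coreml_ane_only: false]

  # Anything else (desktop, simulator, unknown): XNNPACK on the CPU.
  def default_opts(_other), do: [delegate: "xnnpack", num_threads: 6]
end
```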
## Building from source (non-Mob)
See the package's `Makefile`: targets `android`, `ios_device`,
`ios_sim`, `mac`. Each target needs a platform-appropriate TFLite
distribution (cached at `~/.mob/cache/` by mob_dev's downloader, or
supplied via per-target overrides for standalone builds).
Mac builds require building `libtensorflowlite_c.dylib` from TF
source first — TFLite has no Mac arm64 prebuilt. See
`docs/build_mac_tflite.md` in the repo.
"""
alias NxTfliteMob.NIF
@typedoc """
Opaque handle to a loaded TFLite model. Pass it to `call/2` and free
it with `release_module/1`. Handles that are never released are also
freed when garbage collected, but explicit release is recommended for
short-lived inferences.
"""
@type module_handle :: reference()
@doc """
Load a TFLite model from raw `.tflite` FlatBuffer bytes.
Returns `{:ok, handle}` on success or `{:error, message}` if the
bytes aren't a valid TFLite model or delegate creation fails.
## Options
All options are documented in detail in the moduledoc:
* `:delegate` (string) — `"xnnpack"` (default), `"nnapi"` (Android),
`"coreml"` (iOS)
* `:accelerator` (string) — vendor accelerator name for NNAPI
(e.g. `"mtk-gpu_shim"`)
* `:num_threads` (integer) — XNNPACK CPU thread count (default 6)
* `:allow_fp16` (boolean) — NNAPI FP32→FP16 promotion (default
`true`)
* `:coreml_ane_only` (boolean): only create the Core ML delegate on
devices with an ANE (default `false`, which allows CPU/GPU fallback)
## Examples
# XNNPACK CPU (cross-platform default)
{:ok, h} = NxTfliteMob.load_module(tflite_bytes, [])
# Android NNAPI → MediaTek GPU HAL
{:ok, h} = NxTfliteMob.load_module(tflite_bytes,
delegate: "nnapi",
accelerator: "mtk-gpu_shim",
allow_fp16: true)
# iOS Core ML → ANE
{:ok, h} = NxTfliteMob.load_module(tflite_bytes,
delegate: "coreml",
coreml_ane_only: false)
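The docs don't say how the NIF treats unknown option keys, so a misspelt key such as `delegat:` could go unnoticed. The sketch below checks keys client-side with `Keyword.validate/2`, which requires Elixir 1.13+. `OptsCheck` is hypothetical, and its allowed list mirrors the options above.

```elixir
defmodule OptsCheck do
  # Keys documented for load_module/2, listed without defaults so that
  # absent keys stay absent.
  @known [:delegate, :accelerator, :num_threads, :allow_fp16, :coreml_ane_only]

  # {:ok, opts} when every key is known, otherwise {:error, unknown_keys}.
  def validate(opts) when is_list(opts), do: Keyword.validate(opts, @known)
end
```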
"""
@spec load_module(binary(), keyword()) ::
{:ok, module_handle()} | {:error, String.t() | charlist()}
def load_module(model_bytes, opts \\ []) when is_binary(model_bytes) do
NIF.load_module(model_bytes, normalize(opts))
end
@doc """
Run inference on a loaded model.
`inputs` is a list of binaries — one per input tensor in the model's
declared input order. Each binary must match the model's expected
shape × dtype byte layout exactly (1×640×640×3 INT8 = 1228800 bytes
for YOLOv8n full_integer_quant, for example).
Returns `{:ok, outputs}` where `outputs` is a list of binaries — one
per output tensor, also in declared order. Decode each according to
the model's documented output layout.
## Examples
# YOLOv8n INT8 — 1×640×640×3 INT8 input, 1×84×8400 INT8 output
input = :binary.copy(<<0>>, 1_228_800) # placeholder: all-zero INT8 frame
{:ok, [output]} = NxTfliteMob.call(handle, [input])
true = byte_size(output) == 705600
## Errors
Returns `{:error, message}` for:
* Input list length doesn't match the model's input-tensor count
* Any input binary's size doesn't match the model's expected size
* `TfLiteInterpreterInvoke` returns a non-OK status
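The `@spec` allows the error message to be either a `String.t()` or a charlist. Callers that log or pattern-match on messages may therefore want to normalise it. `SafeCall` in the sketch below is hypothetical, and its call function is injectable for testing.

```elixir
defmodule SafeCall do
  # Hypothetical wrapper: like call/2, but errors always carry a String.
  def call(handle, inputs), do: call(handle, inputs, &NxTfliteMob.call/2)

  def call(handle, inputs, call_fun) do
    case call_fun.(handle, inputs) do
      {:ok, outputs} -> {:ok, outputs}
      # Charlist messages are converted; Strings pass through unchanged.
      {:error, msg} when is_list(msg) -> {:error, List.to_string(msg)}
      {:error, msg} -> {:error, msg}
    end
  end
end
```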
"""
@spec call(module_handle(), [binary()]) ::
{:ok, [binary()]} | {:error, String.t() | charlist()}
def call(handle, inputs) when is_reference(handle) and is_list(inputs),
do: NIF.call(handle, inputs)
@doc """
Free the model + delegate + interpreter held by `handle`.
Idempotent: calling it on an already-released handle returns `:ok`
(the underlying resource is zeroed, so re-releasing is a no-op).
Resources are also freed on GC if `release_module/1` isn't called,
but explicit release is recommended for tight loops or short-lived
inferences to keep memory predictable.
"""
@spec release_module(module_handle()) :: :ok
def release_module(handle) when is_reference(handle), do: NIF.release_module(handle)
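# Coerce opt values to types the NIF's proplist parser understands
# (strings and integers): booleans and other atoms become strings.
# `nil` values are dropped so `accelerator: nil` means "no key"
# rather than the literal string "nil".
defp normalize(opts) do
opts
|> Enum.reject(fn {_k, v} -> is_nil(v) end)
|> Enum.map(fn
{k, v} when is_boolean(v) -> {k, to_string(v)}
{k, v} when is_atom(v) -> {k, to_string(v)}
{k, v} -> {k, v}
end)
end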
end
defmodule NxTfliteMob.NIF do
@moduledoc false
@on_load :load_nifs
def load_nifs do
path =
try do
case :code.priv_dir(:nx_tflite_mob) do
{:error, _} -> ~c"libtflite_nif"
dir when is_list(dir) -> :filename.join(dir, ~c"native/libtflite_nif")
end
rescue
_ -> ~c"libtflite_nif"
end
:erlang.load_nif(path, 0)
end
def load_module(_bytes, _opts), do: :erlang.nif_error(:nif_not_loaded)
def call(_h, _inputs), do: :erlang.nif_error(:nif_not_loaded)
def release_module(_h), do: :erlang.nif_error(:nif_not_loaded)
end