# Mobile Deployment with ExBurn

## Overview

ExBurn compiles trained models for mobile deployment via Burn's CubeCL backend.
The pipeline optimizes models for the target GPU backend:

- **iOS**: Metal via CubeCL
- **Android**: Vulkan via CubeCL

ExBurn is designed as a library: it provides the Nx backend and GPU
acceleration layer that other frameworks can build on.

## Compiling a Model

```elixir
# Define a model with Axon
model =
  Axon.input("input", shape: {nil, 784})
  |> Axon.dense(128, activation: :relu)
  |> Axon.dropout(rate: 0.2)
  |> Axon.dense(10)

# Compile for training/inference
compiled = ExBurn.Model.compile(model,
  loss: :cross_entropy,
  optimizer: :adam,
  learning_rate: 0.001
)

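# Run inference on one 784-feature input (zeros used as placeholder data)
input_tensor = Nx.broadcast(0.0, {1, 784})
{:ok, output} = ExBurn.Model.predict(compiled, input_tensor)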

# Save for deployment
ExBurn.Model.save(compiled, "model.bin")

# Load
{:ok, loaded} = ExBurn.Model.load(compiled, "model.bin")
```

## Using ExCubecl for GPU Inference

ExBurn integrates with ExCubecl for GPU buffer management and kernel execution:

```elixir
# Create GPU buffers via ExCubecl
{:ok, input_buf} = ExCubecl.buffer([1.0, 2.0, 3.0], [3], :f32)
{:ok, output_buf} = ExCubecl.buffer([0.0, 0.0, 0.0], [3], :f32)

# Run a kernel with two inputs (the same buffer twice) and one output buffer
ExCubecl.run_kernel("elementwise_add", [input_buf, input_buf], output_buf)

# Read results back
{:ok, data} = ExCubecl.read(output_buf)
```
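
To sanity-check a kernel's output, it can help to compute the expected result on the CPU with plain Elixir lists. The sketch below assumes that `"elementwise_add"` adds its two inputs position by position, which the kernel name suggests but this guide does not spell out. `CpuReference` and its functions are hypothetical helpers, not part of ExCubecl.

```elixir
defmodule CpuReference do
  # Hypothetical helper module, not part of ExCubecl.

  # Element-wise sum of two equal-length lists.
  def elementwise_add(a, b) when length(a) == length(b) do
    Enum.zip_with(a, b, fn x, y -> x + y end)
  end

  # True when both lists have the same length and every pair of
  # values differs by at most `tol`.
  def close?(actual, expected, tol \\ 1.0e-6) do
    same_length = length(actual) == length(expected)
    pairs = Enum.zip(actual, expected)
    same_length and Enum.all?(pairs, fn {x, y} -> abs(x - y) <= tol end)
  end
end

input = [1.0, 2.0, 3.0]
expected = CpuReference.elementwise_add(input, input)
IO.inspect(expected)
# → [2.0, 4.0, 6.0]
```

If `ExCubecl.read/1` returns a flat list of floats (an assumption here), `CpuReference.close?(data, expected)` would compare the GPU result against this reference with a small tolerance.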

## Using ExBurn.Serving for Batched Inference

For production inference with concurrent batching:

```elixir
# Build a serving from a compiled model
serving = ExBurn.Serving.build(compiled,
  batch_size: 32,
  batch_timeout: 50
)

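# Run inference inline in the calling process. Requests from concurrent
# callers are only merged into shared batches when the serving is started
# as a process and called through Nx.Serving.batched_run/2.
output = Nx.Serving.run(serving, input_tensor)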
```
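
For batching across concurrent callers, Nx expects the serving to run as a supervised process that clients reach through `Nx.Serving.batched_run/2`. The sketch below shows that pattern with a stand-in serving built from `Nx.Serving.new/1`. It assumes `ExBurn.Serving.build/2` returns an `Nx.Serving` struct that could be passed in its place; the `DemoServing` name and the Nx version requirement are illustrative choices.

```elixir
# Assumes network access so Mix.install can fetch Nx.
Mix.install([{:nx, "~> 0.7"}])

# Stand-in for the serving returned by ExBurn.Serving.build/2:
# it doubles every element of the incoming batch.
serving =
  Nx.Serving.new(fn opts ->
    Nx.Defn.jit(fn batch -> Nx.multiply(batch, 2) end, opts)
  end)

# Start the serving as a supervised process. Requests that arrive within
# batch_timeout milliseconds are grouped into batches of up to batch_size.
children = [
  {Nx.Serving, serving: serving, name: DemoServing, batch_size: 32, batch_timeout: 50}
]

{:ok, _supervisor} = Supervisor.start_link(children, strategy: :one_for_one)

# Any process can now submit work; concurrent calls share batches.
batch = Nx.Batch.stack([Nx.tensor([1, 2, 3])])
result = Nx.Serving.batched_run(DemoServing, batch)
IO.inspect(result)
```

In an application, the child spec would normally live in the app's supervision tree rather than in a script.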

## Model Optimization Tips

1. **Use f16 quantization**: Storing weights as f16 instead of f32 halves their memory footprint, usually with minimal accuracy loss
2. **Reduce model size**: Target under 10 MB for mobile apps
3. **Batch inference**: Process multiple inputs together for better throughput
4. **Use ExCubecl pipelines**: Chain multiple GPU kernels without CPU round-trips
5. **Profile on device**: Benchmark on the target hardware before deploying
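
As a rough worked example of the first two tips, the sketch below estimates weight storage for the 784 → 128 → 10 network from "Compiling a Model" at f32 and f16. `ModelSize` is a hypothetical helper, not part of ExBurn, and the estimate counts only weights and biases, ignoring file-format overhead and optimizer state.

```elixir
defmodule ModelSize do
  # Hypothetical helper, not part of ExBurn.
  @bytes_per_element %{f32: 4, f16: 2}

  # A dense layer has inputs * units weights plus one bias per unit.
  def dense_params(inputs, units), do: inputs * units + units

  # Bytes needed to store `param_count` parameters at the given precision.
  def bytes(param_count, dtype) do
    param_count * Map.fetch!(@bytes_per_element, dtype)
  end
end

# Two dense layers; dropout has no trainable parameters.
params = ModelSize.dense_params(784, 128) + ModelSize.dense_params(128, 10)

IO.puts("parameters: #{params}")
# → parameters: 101770
IO.puts("f32 bytes: #{ModelSize.bytes(params, :f32)}")
# → f32 bytes: 407080
IO.puts("f16 bytes: #{ModelSize.bytes(params, :f16)}")
# → f16 bytes: 203540
```

At roughly 0.4 MB in f32 and 0.2 MB in f16, this particular model sits well under the 10 MB target either way; the same arithmetic scales to larger architectures.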

## Supported Operations

| Operation | iOS (Metal) | Android (Vulkan) |
|-----------|-------------|------------------|
| Dense     | ✅          | ✅               |
| Conv2D    | ✅          | ✅               |
| ReLU      | ✅          | ✅               |
| Sigmoid   | ✅          | ✅               |
| Softmax   | ✅          | ✅               |
| Dropout   | ✅          | ✅               |
| LayerNorm | ✅          | ✅               |
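
Before deploying, it may be worth checking a model's operations against the table above. The sketch below is a minimal pre-flight check over a hand-written list of operation names; `OpSupport`, the atom names, and the `:gelu` example are hypothetical, and anything absent from the table is treated as unverified rather than known to fail.

```elixir
defmodule OpSupport do
  # Hypothetical helper; the atoms mirror the rows of the table above.
  @listed [:dense, :conv2d, :relu, :sigmoid, :softmax, :dropout, :layer_norm]

  # Returns the operations from `ops` that do not appear in the table,
  # preserving first-seen order and dropping duplicates.
  def unlisted(ops) do
    ops
    |> Enum.uniq()
    |> Enum.reject(fn op -> op in @listed end)
  end
end

IO.inspect(OpSupport.unlisted([:dense, :relu, :dropout, :dense]))
# → []
IO.inspect(OpSupport.unlisted([:conv2d, :gelu, :dense, :gelu]))
# → [:gelu]
```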