# Deep Learning with ExBurn: A Step-by-Step Guide
## Table of Contents
1. [What You'll Learn](#what-youll-learn)
2. [Prerequisites](#prerequisites)
3. [Lesson 1: Tensors — The Building Blocks](#lesson-1-tensors--the-building-blocks)
4. [Lesson 2: Your First Neural Network](#lesson-2-your-first-neural-network)
5. [Lesson 3: Training a Classifier](#lesson-3-training-a-classifier)
6. [Lesson 4: Understanding Loss Functions](#lesson-4-understanding-loss-functions)
7. [Lesson 5: Optimizers and Learning Rates](#lesson-5-optimizers-and-learning-rates)
8. [Lesson 6: Overfitting and Regularization](#lesson-6-overfitting-and-regularization)
9. [Lesson 7: Working with Real Data](#lesson-7-working-with-real-data)
10. [Lesson 8: Inference and Deployment](#lesson-8-inference-and-deployment)
11. [Lesson 9: GPU-Accelerated Numerical Functions](#lesson-9-gpu-accelerated-numerical-functions)
12. [Lesson 10: Putting It All Together](#lesson-10-putting-it-all-together)
---
## What You'll Learn
This guide teaches deep learning fundamentals through hands-on ExBurn examples. By the end, you'll be able to:
- Create and manipulate tensors (the core data structure of deep learning)
- Build neural network architectures using Axon
- Train models with different optimizers and learning rate strategies
- Prevent overfitting with regularization techniques
- Preprocess real-world data
- Run inference and deploy models
- Write GPU-accelerated numerical functions with `defn`
Each lesson builds on the previous one. Code examples are complete and runnable.
---
## Prerequisites
- Elixir ~> 1.18 and OTP 27+
- Rust stable (for NIF compilation)
- Basic Elixir knowledge (modules, functions, pipes)
- No prior deep learning experience required
Add to your `mix.exs`:
```elixir
def deps do
[
{:ex_burn, "~> 0.3"},
{:nx, ">= 0.12.0"},
{:axon, "~> 0.8"},
{:ex_cubecl, ">= 0.5.0"}
]
end
```
```bash
mix deps.get
mix compile
```
Check that your GPU is available:
```elixir
ExBurn.default_device() # :gpu or :cpu
ExBurn.device_name() # e.g. "CUDA (NVIDIA GPU)" or "Metal (Apple GPU)"
ExBurn.summary() # full environment summary
```
---
## Lesson 1: Tensors — The Building Blocks
### What is a Tensor?
A tensor is a multi-dimensional array of numbers. Deep learning is essentially tensor math:
| Tensor rank | Example | Shape |
|---|---|---|
| 0 (scalar) | `5.0` | `{}` |
| 1 (vector) | `[1.0, 2.0, 3.0]` | `{3}` |
| 2 (matrix) | `[[1, 2], [3, 4]]` | `{2, 2}` |
| 4 (batch of images) | batch of 8 RGB 32x32 images | `{8, 3, 32, 32}` |
### Creating Tensors
```elixir
import Nx
# From a list
t = Nx.tensor([1.0, 2.0, 3.0])
# 2D tensor (matrix)
m = Nx.tensor([[1.0, 2.0], [3.0, 4.0]])
# With explicit type
t_f64 = Nx.tensor([1.0, 2.0], type: {:f, 64})
t_i32 = Nx.tensor([1, 2, 3], type: {:s, 32})
# Useful constructors
zeros = Nx.broadcast(0.0, {3, 4}) # 3x4 matrix of zeros
ones = Nx.broadcast(1.0, {3, 4}) # 3x4 matrix of ones
iota = Nx.iota({5}) # [0, 1, 2, 3, 4]
eye = Nx.eye(3) # 3x3 identity matrix
```
### Inspecting Tensors
```elixir
Nx.shape(t) # {3} — the shape
Nx.type(t) # {:f, 32} — the element type
Nx.rank(t) # 1 — number of dimensions
Nx.size(t) # 3 — total number of elements
Nx.to_list(t) # [1.0, 2.0, 3.0] — convert to Elixir list
```
### Element-wise Operations
```elixir
a = Nx.tensor([1.0, 2.0, 3.0])
b = Nx.tensor([4.0, 5.0, 6.0])
Nx.add(a, b) # [5.0, 7.0, 9.0]
Nx.subtract(a, b) # [-3.0, -3.0, -3.0]
Nx.multiply(a, b) # [4.0, 10.0, 18.0]
Nx.divide(a, b) # [0.25, 0.4, 0.5]
Nx.negate(a) # [-1.0, -2.0, -3.0]
Nx.abs(a) # [1.0, 2.0, 3.0]
Nx.exp(a) # [2.718, 7.389, 20.085]
Nx.log(a) # [0.0, 0.693, 1.099]
Nx.sqrt(a) # [1.0, 1.414, 1.732]
```
### Broadcasting
When shapes differ but are compatible, Nx automatically broadcasts the smaller tensor to match the larger one:
```elixir
a = Nx.tensor([[1.0, 2.0], [3.0, 4.0]]) # shape {2, 2}
b = Nx.tensor([10.0, 20.0]) # shape {2}
Nx.add(a, b)
# [[11.0, 22.0],
# [13.0, 24.0]]
# b is broadcast across rows
```
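The compatibility rule behind broadcasting can be sketched in plain Elixir. This illustrates the usual trailing-dimension rule (shapes are aligned from the last axis, and each pair of dimensions must be equal or one of them must be 1); it is not Nx's actual implementation, and `BroadcastSketch` is a hypothetical module name.

```elixir
defmodule BroadcastSketch do
  # Hypothetical helper: returns {:ok, result_shape} or :error.
  # Shapes are compared from the last dimension backwards.
  def shape(a, b) do
    a = a |> Tuple.to_list() |> Enum.reverse()
    b = b |> Tuple.to_list() |> Enum.reverse()
    merge(a, b, [])
  end

  defp merge([], [], acc), do: {:ok, List.to_tuple(acc)}
  defp merge([x | xs], [], acc), do: merge(xs, [], [x | acc])
  defp merge([], [y | ys], acc), do: merge([], ys, [y | acc])
  defp merge([x | xs], [y | ys], acc) when x == y or y == 1, do: merge(xs, ys, [x | acc])
  defp merge([x | xs], [y | ys], acc) when x == 1, do: merge(xs, ys, [y | acc])
  defp merge(_, _, _), do: :error
end

BroadcastSketch.shape({2, 2}, {2})     # {:ok, {2, 2}}, the example above
BroadcastSketch.shape({3, 1}, {1, 4})  # {:ok, {3, 4}}
BroadcastSketch.shape({2, 3}, {2})     # :error, trailing dimensions 3 and 2 differ
```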
### Reductions
Collapse dimensions to produce summaries:
```elixir
m = Nx.tensor([[1.0, 2.0], [3.0, 4.0]])
Nx.sum(m) # 10.0 — sum all elements
Nx.mean(m) # 2.5 — mean of all elements
Nx.reduce_max(m) # 4.0 — maximum value
Nx.reduce_min(m) # 1.0 — minimum value
# Reduce along a specific axis
Nx.sum(m, axes: [0]) # [4.0, 6.0]: collapses axis 0 (rows), giving column sums
Nx.sum(m, axes: [1]) # [3.0, 7.0]: collapses axis 1 (columns), giving row sums
```
### Shape Manipulation
```elixir
t = Nx.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Nx.reshape(t, {2, 3})
# [[1.0, 2.0, 3.0],
# [4.0, 5.0, 6.0]]
m = Nx.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
Nx.transpose(m)
# [[1.0, 4.0],
# [2.0, 5.0],
# [3.0, 6.0]]
# Concatenation
a = Nx.tensor([1.0, 2.0])
b = Nx.tensor([3.0, 4.0])
Nx.concatenate([a, b]) # [1.0, 2.0, 3.0, 4.0]
```
### Linear Algebra
```elixir
a = Nx.tensor([[1.0, 2.0], [3.0, 4.0]])
b = Nx.tensor([[5.0, 6.0], [7.0, 8.0]])
Nx.dot(a, b)
# Matrix multiplication:
# [[1*5+2*7, 1*6+2*8],
# [3*5+4*7, 3*6+4*8]]
# = [[19.0, 22.0], [43.0, 50.0]]
# Dot product of vectors
x = Nx.tensor([1.0, 2.0, 3.0])
y = Nx.tensor([4.0, 5.0, 6.0])
Nx.dot(x, y) # 1*4 + 2*5 + 3*6 = 32.0
```
### Try It Yourself
```elixir
# Create a 3x3 matrix, transpose it, then multiply by the original
m = Nx.iota({3, 3}) |> Nx.as_type(:f32)
mt = Nx.transpose(m)
result = Nx.dot(m, mt)
Nx.to_list(result)
```
---
## Lesson 2: Your First Neural Network
### What is a Neural Network?
A neural network is a function that transforms input data into predictions through a series of learned transformations:
```
input → [Linear → Activation] × N → output
```
Each **Linear** layer computes `output = input × weights + bias`. The **Activation** function introduces non-linearity, enabling the network to learn complex patterns.
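To make the formula concrete, here is a minimal sketch of one dense layer for a single sample, written with plain lists instead of tensors. `DenseSketch` and its argument layout are assumptions made for illustration; Axon performs the same arithmetic on batched tensors.

```elixir
defmodule DenseSketch do
  # input:   list of `in` features for one sample
  # weights: nested list with one row per input feature, `out` entries per row
  # bias:    list of `out` values
  def forward(input, weights, bias) do
    weights
    |> Enum.zip_with(& &1)                                            # transpose: one column per output neuron
    |> Enum.zip_with(bias, fn column, b -> dot(input, column) + b end)
  end

  # ReLU applied element by element
  def relu(values), do: Enum.map(values, &max(&1, 0.0))

  defp dot(xs, ys), do: xs |> Enum.zip_with(ys, &(&1 * &2)) |> Enum.sum()
end

input = [1.0, 2.0]
weights = [[1.0, -1.0], [0.5, 0.5]]
bias = [0.0, -0.5]

pre_activation = DenseSketch.forward(input, weights, bias)  # [2.0, -0.5]
activated = DenseSketch.relu(pre_activation)                # [2.0, 0.0]
```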
### Defining a Model with Axon
Axon provides a functional, Keras-like API for building models:
```elixir
model =
Axon.input("input", shape: {nil, 4})
|> Axon.dense(8, activation: :relu)
|> Axon.dense(3, activation: :softmax)
```
Breaking this down:
- `Axon.input("input", shape: {nil, 4})` — defines the input. `nil` means "any batch size", `4` means 4 features per sample.
- `Axon.dense(8, activation: :relu)` — a fully-connected layer with 8 neurons and ReLU activation.
- `Axon.dense(3, activation: :softmax)` — output layer with 3 neurons (one per class) and softmax activation.
### Understanding Layer Shapes
```elixir
# Input: {batch_size, 4}
# ↓ Dense(8) — learns a {4, 8} weight matrix + {8} bias
# Hidden: {batch_size, 8}
# ↓ Dense(3) — learns a {8, 3} weight matrix + {3} bias
# Output: {batch_size, 3}
```
The `nil` in the input shape is the batch dimension — it can be any size.
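The shapes above also fix the number of trainable parameters: each dense layer contributes `inputs * outputs` weights plus `outputs` biases. A small sketch of that arithmetic (the `ParamCount` module is hypothetical, not part of ExBurn or Axon):

```elixir
defmodule ParamCount do
  # sizes lists the width of every layer, starting with the input features
  def dense_params(sizes) do
    sizes
    |> Enum.chunk_every(2, 1, :discard)
    |> Enum.map(fn [n_in, n_out] -> n_in * n_out + n_out end)
    |> Enum.sum()
  end
end

# (4 * 8 + 8) + (8 * 3 + 3) = 40 + 27
ParamCount.dense_params([4, 8, 3])  # 67
```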
### Compiling the Model
Before training, we need to compile the model. This initializes parameters and sets up the optimizer:
```elixir
compiled = ExBurn.Model.compile(model,
loss: :cross_entropy,
optimizer: :adam,
learning_rate: 0.01
)
```
### Inspecting the Model
```elixir
# Keras/PyTorch-style summary
IO.puts(ExBurn.Model.summary(compiled))
# Get model info
info = ExBurn.Model.info(compiled)
IO.puts("Total parameters: #{info.total_params}")
IO.puts("Layers: #{info.layer_count}")
IO.puts("Memory: #{info.estimated_memory_mb} MB")
# Access individual components
ExBurn.Model.parameters(compiled) # parameter map
ExBurn.Model.loss_function(compiled) # :cross_entropy
ExBurn.Model.optimizer(compiled) # :adam
```
### Forward Pass (Inference)
```elixir
# Create some dummy input
input = Nx.tensor([[1.0, 2.0, 3.0, 4.0]])
# Run inference
{:ok, output} = ExBurn.Model.predict(compiled, input)
Nx.to_list(output)
# e.g. [[0.2, 0.5, 0.3]] — class probabilities from softmax
```
### Activation Functions
Activation functions introduce non-linearity. Without them, stacking linear layers would be equivalent to a single linear layer:
```elixir
# Common activations in Axon:
Axon.dense(64, activation: :relu) # ReLU: max(0, x) — most common
Axon.dense(64, activation: :sigmoid) # Sigmoid: 1/(1+e^-x) — outputs in [0,1]
Axon.dense(64, activation: :tanh) # Tanh: outputs in [-1, 1]
Axon.dense(64, activation: :softmax) # Softmax: normalizes to probabilities
```
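**ReLU** (Rectified Linear Unit) is the default choice for hidden layers. It's simple and fast, and because its gradient is 1 for positive inputs, it mitigates the vanishing gradient problem that affects sigmoid and tanh in deep networks.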
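The claim that stacked linear layers collapse into a single linear layer can be checked with scalars. The sketch below uses one-dimensional "layers" built from anonymous functions; all names are illustrative.

```elixir
linear = fn w, b -> fn x -> w * x + b end end
relu = fn x -> max(x, 0.0) end

layer1 = linear.(2.0, 1.0)
layer2 = linear.(3.0, -1.0)

# Two linear layers in a row: 3 * (2x + 1) - 1 = 6x + 2
stacked = fn x -> x |> layer1.() |> layer2.() end
collapsed = linear.(6.0, 2.0)

# Putting ReLU between the layers breaks the equivalence
with_relu = fn x -> x |> layer1.() |> relu.() |> layer2.() end

stacked.(-2.0)    # -10.0, same as collapsed.(-2.0)
with_relu.(-2.0)  # -1.0, because ReLU zeroed the hidden value -3.0
```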
### Try It Yourself
```elixir
# Build a model with 2 hidden layers
model =
Axon.input("x", shape: {nil, 10})
|> Axon.dense(32, activation: :relu, name: "hidden1")
|> Axon.dense(16, activation: :relu, name: "hidden2")
|> Axon.dense(5, name: "output")
compiled = ExBurn.Model.compile(model)
IO.puts(ExBurn.Model.summary(compiled))
```
---
## Lesson 3: Training a Classifier
### The Training Loop
Training is the process of adjusting the model's parameters to minimize the loss function. Each iteration:
1. **Forward pass**: Compute predictions from input data
2. **Loss computation**: Measure how wrong the predictions are
3. **Backward pass**: Compute gradients (how to adjust each parameter)
4. **Optimizer step**: Update parameters to reduce loss
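The four steps above can be sketched on the smallest possible model, `y = w * x` with one parameter, using plain Elixir and a hand-derived gradient. This is only an illustration of the loop structure; `LoopSketch` is a hypothetical module, and ExBurn computes gradients for you.

```elixir
defmodule LoopSketch do
  # Fits y = w * x by gradient descent on mean squared error
  def train(xs, ys, w, lr, steps) do
    Enum.reduce(1..steps, w, fn _step, w ->
      preds = Enum.map(xs, &(w * &1))                           # 1. forward pass
      errors = Enum.zip_with(preds, ys, &(&1 - &2))
      _loss = mean(Enum.map(errors, &(&1 * &1)))                # 2. loss computation
      grad = mean(Enum.zip_with(errors, xs, &(2.0 * &1 * &2)))  # 3. backward pass: d(loss)/dw
      w - lr * grad                                             # 4. optimizer step
    end)
  end

  defp mean(values), do: Enum.sum(values) / length(values)
end

# The data follows y = 2x, so w should approach 2.0
w = LoopSketch.train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], 0.0, 0.1, 20)
```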
### Complete Training Example
```elixir
import Nx
# ── Step 1: Create synthetic data ──────────────────────────
# 100 samples, 4 features, 3 classes
num_samples = 100
num_features = 4
num_classes = 3
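# Random features in [0, 1). Nx.Random threads an explicit key through each call.
key = Nx.Random.key(42)
{x, key} = Nx.Random.uniform(key, shape: {num_samples, num_features})
# Random integer labels (0, 1, or 2); the upper bound is exclusive
{y, _key} = Nx.Random.randint(key, 0, num_classes, shape: {num_samples}, type: {:s, 64})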
# ── Step 2: Split into train/validation ────────────────────
{train, val} = ExBurn.Dataset.split({x, y}, val_split: 0.2, shuffle: false)
{train_x, train_y} = train
{val_x, val_y} = val
# ── Step 3: Define the model ──────────────────────────────
model =
Axon.input("input", shape: {nil, num_features})
|> Axon.dense(8, activation: :relu)
|> Axon.dense(num_classes)
# ── Step 4: Compile ───────────────────────────────────────
compiled = ExBurn.Model.compile(model,
loss: :cross_entropy,
optimizer: :adam,
learning_rate: 0.01
)
# ── Step 5: Train ─────────────────────────────────────────
trained = ExBurn.Training.fit(compiled, {train_x, train_y},
epochs: 20,
batch_size: 16,
validation_data: {val_x, val_y},
verbose: true
)
# ── Step 6: Evaluate ──────────────────────────────────────
{loss, accuracy} = ExBurn.Training.evaluate(trained, {val_x, val_y}, true)
IO.puts("Validation loss: #{loss}, accuracy: #{accuracy}")
```
### Understanding the Output
When `verbose: true`, you'll see output like:
```
Training: 80 samples, 5 batches/epoch, 20 epochs
batch_size=16, effective_batch_size=16, optimizer=adam
Epoch 1: loss=1.0986 (1250 samples/s, 64ms) ETA=1s
Epoch 2: loss=1.0852 (1300 samples/s, 61ms) ETA=1s
...
Epoch 20: loss=0.5234 (1350 samples/s, 59ms)
```
Key metrics:
- **loss**: The average loss per batch (lower is better)
- **samples/s**: Training throughput
- **ETA**: Estimated time remaining
### Batch Size
The `batch_size` controls how many samples are processed before updating parameters:
```elixir
# Small batch: noisier gradients, slower training, less memory
batch_size: 8
# Large batch: smoother gradients, faster training, more memory
batch_size: 64
```
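Batch size also determines how many parameter updates happen per epoch. The arithmetic below reproduces the "5 batches/epoch" figure from the sample output, assuming a partial final batch is kept rather than dropped.

```elixir
train_samples = 80
batch_size = 16
epochs = 20

# Ceiling division: a partial final batch still counts as one update
batches_per_epoch = div(train_samples + batch_size - 1, batch_size)  # 5
total_updates = batches_per_epoch * epochs                           # 100
```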
### Epochs
One epoch = one full pass through the training data. More epochs = more training, but too many can cause overfitting.
### Try It Yourself
```elixir
# Experiment: try different batch sizes and learning rates
# Which combination converges fastest?
# Which gives the best final accuracy?
```
---
## Lesson 4: Understanding Loss Functions
### What is a Loss Function?
A loss function measures how far the model's predictions are from the true values. Training aims to minimize this value.
### Cross-Entropy Loss (Classification)
Used for multi-class classification. Measures the difference between predicted class probabilities and true labels:
```elixir
# Target as integer class indices
pred = Nx.tensor([[2.0, 1.0, 0.1]]) # model logits for 3 classes
target = Nx.tensor([0]) # true class is 0
# Or target as one-hot encoded
target_onehot = Nx.tensor([[1.0, 0.0, 0.0]])
```
The loss is lower when the model assigns high probability to the correct class:
```elixir
model = Axon.input("x", shape: {nil, 3}) |> Axon.dense(3)
compiled = ExBurn.Model.compile(model, loss: :cross_entropy)
# Good prediction → low loss
good_pred = Nx.tensor([[10.0, 0.1, 0.1]]) # confident and correct
{:ok, loss} = ExBurn.Model.compute_loss(compiled, good_pred, Nx.tensor([0]))
# loss ≈ 0.0001
# Bad prediction → high loss
bad_pred = Nx.tensor([[0.1, 0.1, 10.0]]) # confident but wrong
{:ok, loss} = ExBurn.Model.compute_loss(compiled, bad_pred, Nx.tensor([0]))
# loss ≈ 9.9
```
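The two loss values can be reproduced by hand. The sketch below computes cross-entropy for one sample as `-log(softmax(logits)[target])` using a numerically stable log-sum-exp. It assumes the loss is applied to raw logits, as the comments above suggest; `CrossEntropySketch` is a hypothetical module.

```elixir
defmodule CrossEntropySketch do
  # Cross-entropy for one sample: log(sum(exp(logits))) - logits[target]
  def loss(logits, target) do
    max_logit = Enum.max(logits)

    log_sum_exp =
      max_logit + :math.log(Enum.sum(Enum.map(logits, &:math.exp(&1 - max_logit))))

    log_sum_exp - Enum.at(logits, target)
  end
end

good = CrossEntropySketch.loss([10.0, 0.1, 0.1], 0)  # about 0.0001
bad = CrossEntropySketch.loss([0.1, 0.1, 10.0], 0)   # about 9.9
```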
### Mean Squared Error (Regression)
Used for regression tasks where the target is a continuous value:
```elixir
model = Axon.input("x", shape: {nil, 5}) |> Axon.dense(1)
compiled = ExBurn.Model.compile(model, loss: :mse)
pred = Nx.tensor([[3.0]])
target = Nx.tensor([[5.0]])
{:ok, loss} = ExBurn.Model.compute_loss(compiled, pred, target)
# MSE = (3-5)² = 4.0
```
### Binary Cross-Entropy (Binary Classification)
Used when there are exactly two classes:
```elixir
model = Axon.input("x", shape: {nil, 10}) |> Axon.dense(1)
compiled = ExBurn.Model.compile(model, loss: :binary_cross_entropy)
# Targets are 0.0 or 1.0
pred = Nx.tensor([[0.9]]) # model predicts class 1 with 90% confidence
target = Nx.tensor([[1.0]]) # true class is 1
{:ok, loss} = ExBurn.Model.compute_loss(compiled, pred, target)
# loss ≈ 0.105 (low, because prediction matches target)
```
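Both the MSE figure and the binary cross-entropy figure quoted in this lesson can be checked with a few lines of plain Elixir. This sketch assumes the binary cross-entropy input is already a probability; `LossSketch` is a hypothetical module.

```elixir
defmodule LossSketch do
  # Mean of squared differences
  def mse(preds, targets) do
    squared = Enum.zip_with(preds, targets, fn p, t -> (p - t) * (p - t) end)
    Enum.sum(squared) / length(squared)
  end

  # p is a predicted probability strictly between 0 and 1, t is 0.0 or 1.0
  def binary_cross_entropy(p, t) do
    -(t * :math.log(p) + (1.0 - t) * :math.log(1.0 - p))
  end
end

mse = LossSketch.mse([3.0], [5.0])                 # 4.0
bce = LossSketch.binary_cross_entropy(0.9, 1.0)    # about 0.105
```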
### Choosing the Right Loss
| Task | Loss Function | Target Format |
|---|---|---|
| Multi-class classification | `:cross_entropy` | Integer indices or one-hot |
| Binary classification | `:binary_cross_entropy` | 0.0 or 1.0 |
| Regression | `:mse` | Continuous values |
---
## Lesson 5: Optimizers and Learning Rates
### What is an Optimizer?
An optimizer determines how to update the model's parameters based on the computed gradients. Different optimizers have different strategies.
### Adam (Default)
Adam adapts the learning rate for each parameter individually. It's a good default for most tasks:
```elixir
ExBurn.Model.compile(model,
optimizer: :adam,
learning_rate: 0.001 # good starting point
)
```
**When to use**: Default choice. Works well with minimal tuning.
**Tips**:
- If loss oscillates → reduce learning rate (try `0.0001`)
- If convergence is very slow → increase learning rate (try `0.01`)
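To see what "adapts the learning rate for each parameter" means, here is a sketch of the standard Adam update for a single scalar parameter. The default hyperparameters shown are the commonly used ones; whether ExBurn uses the same defaults is an assumption, and `AdamSketch` is a hypothetical module.

```elixir
defmodule AdamSketch do
  # One Adam update for a single scalar parameter.
  # state is {m, v, t}: first moment, second moment, step count.
  def step(param, grad, {m, v, t}, opts \\ []) do
    lr = Keyword.get(opts, :lr, 0.001)
    beta1 = Keyword.get(opts, :beta1, 0.9)
    beta2 = Keyword.get(opts, :beta2, 0.999)
    eps = Keyword.get(opts, :eps, 1.0e-8)

    t = t + 1
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad

    # Bias correction compensates for m and v starting at zero
    m_hat = m / (1.0 - :math.pow(beta1, t))
    v_hat = v / (1.0 - :math.pow(beta2, t))

    {param - lr * m_hat / (:math.sqrt(v_hat) + eps), {m, v, t}}
  end
end

{p_small, _state} = AdamSketch.step(1.0, 0.01, {0.0, 0.0, 0})
{p_large, _state} = AdamSketch.step(1.0, 100.0, {0.0, 0.0, 0})
# Both move by roughly lr = 0.001 despite a 10,000x difference in gradient size
```

Because the step is normalized by the gradient's running magnitude, the learning rate sets the step size almost directly, which is why Adam tolerates a wide range of gradient scales.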
### SGD with Momentum
SGD with momentum accumulates a velocity vector in directions of consistent gradient:
```elixir
ExBurn.Model.compile(model,
optimizer: :sgd,
learning_rate: 0.01 # needs higher LR than Adam
)
# With Nesterov momentum (often converges faster):
ExBurn.Training.fit(model, data, nesterov: true)
```
**When to use**: When you need maximum generalization and have time to tune.
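The velocity idea can be sketched for one scalar parameter. This uses one common formulation (`v <- mu * v + grad`, then `param <- param - lr * v`); the momentum coefficient of 0.9 and the exact formulation ExBurn uses are assumptions.

```elixir
defmodule MomentumSketch do
  # Returns {new_param, new_velocity}
  def step(param, grad, velocity, lr \\ 0.01, mu \\ 0.9) do
    velocity = mu * velocity + grad
    {param - lr * velocity, velocity}
  end
end

# A constant gradient of 1.0 makes each step larger than the last
{p1, v1} = MomentumSketch.step(0.0, 1.0, 0.0)  # velocity 1.0, param -0.01
{p2, v2} = MomentumSketch.step(p1, 1.0, v1)    # velocity 1.9, param about -0.029
```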
### RMSprop
RMSprop adapts learning rates based on the magnitude of recent gradients:
```elixir
ExBurn.Model.compile(model,
optimizer: :rmsprop,
learning_rate: 0.001
)
```
**When to use**: RNNs, LSTMs, or when Adam diverges.
### Learning Rate Schedules
Instead of a fixed learning rate, you can vary it during training:
```elixir
# Step decay: halve LR every 10 epochs
ExBurn.Training.fit(model, data,
lr_schedule: {:step, 0.001, 10, 0.5}
)
# Exponential decay: multiply LR by 0.95 each epoch
ExBurn.Training.fit(model, data,
lr_schedule: {:exponential, 0.001, 0.95}
)
# Cosine annealing: smooth decay (often best results)
ExBurn.Training.fit(model, data,
lr_schedule: {:cosine, 0.001, 1.0e-5}
)
```
Visual comparison:
```
LR
│
0.001 ─┤ ████
│ ████ ╲ Step (sudden drops)
│ ████ ╲ ╲
│ ████ ╲ ╲
│ ████ ╲ ╲
0.0001 ┤ ╲ ╲
│ ╲ ╲ ╲
│ ╲ ╲ ╲
│ ╲ ╲ ╲
0.00001 ┤──────────────╲──── Cosine (smooth)
└──────────────────────── Epochs
```
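The three schedules correspond to simple closed-form formulas. The sketch below assumes the tuple fields mean what the comments in this lesson say, and that cosine annealing runs over the total number of epochs; ExBurn's internal implementation may differ in detail. `ScheduleSketch` is a hypothetical module.

```elixir
defmodule ScheduleSketch do
  # {:step, base, every, factor}: multiply by factor once per `every` epochs
  def lr({:step, base, every, factor}, epoch, _total),
    do: base * :math.pow(factor, div(epoch, every))

  # {:exponential, base, decay}: multiply by decay every epoch
  def lr({:exponential, base, decay}, epoch, _total),
    do: base * :math.pow(decay, epoch)

  # {:cosine, base, min}: half-cosine from base down to min over `total` epochs
  def lr({:cosine, base, min}, epoch, total),
    do: min + 0.5 * (base - min) * (1.0 + :math.cos(:math.pi() * epoch / total))
end

ScheduleSketch.lr({:step, 0.001, 10, 0.5}, 10, 50)     # about 0.0005 after the first drop
ScheduleSketch.lr({:exponential, 0.001, 0.95}, 1, 50)  # about 0.00095
ScheduleSketch.lr({:cosine, 0.001, 1.0e-5}, 50, 50)    # about 1.0e-5 at the final epoch
```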
### Warmup
Gradually increase the learning rate at the start of training for stability:
```elixir
ExBurn.Training.fit(model, data,
callbacks: [
ExBurn.Training.WarmupCallback.linear(5, 1.0e-5, 0.001)
]
)
```
This ramps the LR from `1.0e-5` to `0.001` over the first 5 epochs.
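A linear ramp like this is a straight-line interpolation between the two rates. The sketch below assumes epochs are counted from 0 and that the target rate is reached at the end of the warmup period; the callback's exact indexing may differ.

```elixir
warmup_lr = fn epoch, warmup_epochs, start_lr, target_lr ->
  if epoch >= warmup_epochs do
    target_lr
  else
    start_lr + (target_lr - start_lr) * epoch / warmup_epochs
  end
end

# Learning rates for epochs 0 through 5: starts at 1.0e-5, ends at 0.001
rates = for epoch <- 0..5, do: warmup_lr.(epoch, 5, 1.0e-5, 0.001)
```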
### Reduce on Plateau
Automatically reduce the learning rate when validation loss stops improving:
```elixir
ExBurn.Training.fit(model, data,
callbacks: [
ExBurn.Training.ReduceLROnPlateauCallback.new(
patience: 5,
factor: 0.5,
min_lr: 1.0e-6
)
]
)
```
### Try It Yourself
```elixir
# Compare optimizers on the same data:
# 1. Adam with lr=0.001
# 2. SGD with lr=0.01 and nesterov=true
# 3. Adam with cosine annealing
# Which converges fastest? Which gives the best final loss?
```
---
## Lesson 6: Overfitting and Regularization
### What is Overfitting?
Overfitting happens when the model memorizes the training data instead of learning general patterns. Signs:
- Training loss keeps decreasing, but validation loss starts increasing
- Large gap between training and validation accuracy
```
Loss
│
│ ╲ ╱ ── training loss (keeps decreasing)
│ ╲ ╱
│ ╲ ╱
│ ╲ ╱ ╱── validation loss (starts increasing = overfitting!)
│ ╲╱ ╱
│ ╱
└──────────────── Epochs
```
### Technique 1: Dropout
Dropout randomly zeroes a fraction of activations during training, which forces the network not to rely on any single neuron:
```elixir
model =
Axon.input("x", shape: {nil, 10})
|> Axon.dense(64, activation: :relu)
|> Axon.dropout(rate: 0.5) # drop 50% of neurons
|> Axon.dense(64, activation: :relu)
|> Axon.dropout(rate: 0.3) # drop 30% of neurons
|> Axon.dense(3)
```
**Rule of thumb**: Use `rate: 0.2-0.5` for hidden layers. Don't use dropout on the output layer.
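Mechanically, dropout multiplies activations by a random keep/drop mask. The sketch below shows the common "inverted dropout" variant, which scales the surviving activations by `1 / (1 - rate)` so the expected sum stays the same; that Axon uses this exact variant is an assumption, and `DropoutSketch` is a hypothetical module.

```elixir
defmodule DropoutSketch do
  # keep_mask holds true for units that survive this training step
  def apply_mask(activations, keep_mask, rate) do
    scale = 1.0 / (1.0 - rate)

    Enum.zip_with(activations, keep_mask, fn a, keep? ->
      if keep?, do: a * scale, else: 0.0
    end)
  end

  # Draws a fresh random mask; each unit is dropped with probability `rate`
  def random_mask(n, rate), do: for(_ <- 1..n, do: :rand.uniform() >= rate)
end

dropped = DropoutSketch.apply_mask([1.0, 2.0, 3.0, 4.0], [true, false, true, false], 0.5)
# [2.0, 0.0, 6.0, 0.0]
```

At inference time no mask is applied, so predictions are deterministic.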
### Technique 2: Weight Decay (L2 Regularization)
Penalizes large weights, encouraging the model to learn simpler patterns:
```elixir
ExBurn.Model.compile(model,
weight_decay: 1.0e-4 # L2 regularization coefficient
)
```
**Rule of thumb**:
- `1.0e-4`: good default
- `1.0e-5`: light regularization, for models that barely overfit
- `1.0e-3`: stronger regularization, for large models that overfit
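To see what the coefficient does, here is a sketch of a plain SGD step with the L2 penalty folded into the gradient. This is one formulation of weight decay (adaptive optimizers often use a decoupled variant); which one ExBurn applies is an assumption, and `WeightDecaySketch` is a hypothetical module.

```elixir
defmodule WeightDecaySketch do
  # w <- w - lr * (grad + weight_decay * w)
  def step(weights, grads, lr, weight_decay) do
    Enum.zip_with(weights, grads, fn w, g -> w - lr * (g + weight_decay * w) end)
  end
end

# With a zero task gradient, each weight shrinks toward 0 by a factor of lr * weight_decay
[w1, w2] = WeightDecaySketch.step([10.0, -10.0], [0.0, 0.0], 0.1, 1.0e-4)
# about [9.9999, -9.9999]
```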
### Technique 3: Early Stopping
Stop training when validation loss stops improving:
```elixir
ExBurn.Training.fit(model, data,
validation_data: val_data,
callbacks: [
ExBurn.Training.EarlyStoppingCallback.wait(5, 1.0e-4)
]
)
```
This stops training after 5 epochs without at least `1.0e-4` improvement in validation loss.
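The patience logic can be sketched as a scan over the validation losses. This is an illustration of the rule described above, not the callback's actual source; `EarlyStopSketch` is a hypothetical module.

```elixir
defmodule EarlyStopSketch do
  # Returns the 1-based epoch at which training would stop,
  # or :completed if patience never runs out.
  def stop_epoch(val_losses, patience, min_delta) do
    val_losses
    |> Enum.with_index(1)
    |> Enum.reduce_while({:infinity, 0}, fn {loss, epoch}, {best, waited} ->
      cond do
        improved?(loss, best, min_delta) -> {:cont, {loss, 0}}
        waited + 1 >= patience -> {:halt, {:stopped, epoch}}
        true -> {:cont, {best, waited + 1}}
      end
    end)
    |> case do
      {:stopped, epoch} -> epoch
      _ -> :completed
    end
  end

  defp improved?(_loss, :infinity, _min_delta), do: true
  defp improved?(loss, best, min_delta), do: best - loss >= min_delta
end

# Best loss 0.70 is reached at epoch 2; three epochs pass without a 0.05 improvement
EarlyStopSketch.stop_epoch([0.90, 0.70, 0.69, 0.71, 0.70, 0.72], 3, 0.05)  # 5
```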
### Technique 4: Gradient Clipping
Prevents exploding gradients (which cause NaN loss):
```elixir
ExBurn.Training.fit(model, data,
clip_norm: 1.0, # clip gradient norm to 1.0
clip_value: 5.0 # also clip individual gradient values to [-5, 5]
)
```
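The two clipping modes differ in what they preserve: norm clipping rescales the whole gradient vector and keeps its direction, while value clipping clamps each component independently. A minimal sketch over a flat list of gradients (`ClipSketch` is a hypothetical module):

```elixir
defmodule ClipSketch do
  # Rescales the whole gradient vector when its L2 norm exceeds max_norm
  def by_norm(grads, max_norm) do
    norm = :math.sqrt(Enum.sum(Enum.map(grads, &(&1 * &1))))
    if norm > max_norm, do: Enum.map(grads, &(&1 * max_norm / norm)), else: grads
  end

  # Clamps each component independently to [-limit, limit]
  def by_value(grads, limit), do: Enum.map(grads, &(&1 |> max(-limit) |> min(limit)))
end

norm_clipped = ClipSketch.by_norm([3.0, 4.0], 1.0)          # norm 5.0, rescaled to about [0.6, 0.8]
value_clipped = ClipSketch.by_value([7.0, -9.0, 2.0], 5.0)  # [5.0, -5.0, 2.0]
```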
### Technique 5: Freezing Layers
When fine-tuning a pre-trained model, freeze early layers to preserve learned features:
```elixir
# Freeze the first layer
frozen_model = ExBurn.Model.freeze(model, ["hidden1"])
# Check which layers are frozen
ExBurn.Model.frozen_layers(frozen_model) # MapSet.new(["hidden1"])
# Unfreeze later
unfrozen_model = ExBurn.Model.unfreeze(frozen_model, ["hidden1"])
```
### Try It Yourself
```elixir
# Train a model WITHOUT regularization → observe overfitting
# Then add dropout + weight decay + early stopping → compare
```
---
## Lesson 7: Working with Real Data
### Data Splitting
Always split your data into training, validation, and test sets:
```elixir
# Split into 80% train, 20% validation
{train, val} = ExBurn.Dataset.split({x, y}, val_split: 0.2, shuffle: true, seed: 42)
# For a three-way split:
{train, temp} = ExBurn.Dataset.split({x, y}, val_split: 0.3, seed: 42)
{val, test} = ExBurn.Dataset.split(temp, val_split: 0.5, seed: 42)
# Result: 70% train, 15% val, 15% test
```
Use `seed` for reproducible splits.
### Data Loading
Create a batched data loader for efficient training:
```elixir
loader = ExBurn.Dataset.loader({x, y},
batch_size: 32,
shuffle: true,
drop_last: false # keep partial last batch
)
# Iterate through batches
Enum.each(loader, fn {batch_x, batch_y} ->
# process batch
end)
```
### Normalization
Neural networks train better when input features are on a similar scale:
```elixir
# Standard normalization: zero mean, unit variance
{train_norm, stats} = ExBurn.Dataset.normalize(train_x, method: :standard)
# Apply the same transformation to validation/test data
val_norm = ExBurn.Dataset.normalize_with_stats(val_x, stats)
```
Three normalization methods:
| Method | What it does | When to use |
|---|---|---|
| `:standard` | `(x - mean) / std` | Default for most features |
| `:minmax` | `(x - min) / (max - min)` | When you need values in [0, 1] |
| `:l2` | `x / ‖x‖₂` | When direction matters more than magnitude |
**Important**: Always compute statistics on training data only, then apply them to validation/test data.
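The "fit on training data, apply everywhere" rule looks like this for a single feature column in plain Elixir. The sketch uses the population standard deviation; which variant `ExBurn.Dataset.normalize/2` uses is an assumption, and `NormalizeSketch` is a hypothetical module.

```elixir
defmodule NormalizeSketch do
  # Computes the mean and population standard deviation of one feature column
  def fit(values) do
    n = length(values)
    mean = Enum.sum(values) / n
    variance = Enum.sum(Enum.map(values, &((&1 - mean) * (&1 - mean)))) / n
    %{mean: mean, std: :math.sqrt(variance)}
  end

  # Applies previously computed statistics to any split
  def transform(values, %{mean: mean, std: std}) do
    Enum.map(values, &((&1 - mean) / std))
  end
end

train_col = [2.0, 4.0, 6.0]
val_col = [4.0, 8.0]

col_stats = NormalizeSketch.fit(train_col)                 # statistics from training data only
train_scaled = NormalizeSketch.transform(train_col, col_stats)
val_scaled = NormalizeSketch.transform(val_col, col_stats) # validation reuses the training statistics
```

Computing separate statistics for the validation split would leak information about it into preprocessing and make the two splits incomparable.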
### One-Hot Encoding
Convert integer class labels to one-hot vectors:
```elixir
labels = Nx.tensor([0, 2, 1, 3])
one_hot = ExBurn.Dataset.one_hot(labels, num_classes: 4)
# [[1, 0, 0, 0],
# [0, 0, 1, 0],
# [0, 1, 0, 0],
# [0, 0, 0, 1]]
```
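The encoding itself is simple enough to write by hand, which can help when debugging label shapes. A plain-list sketch (the `one_hot_rows` name is illustrative):

```elixir
# For each label, emit a row with a 1 at the label's index and 0 elsewhere
one_hot_rows = fn labels, num_classes ->
  for label <- labels do
    for class <- 0..(num_classes - 1), do: if(class == label, do: 1, else: 0)
  end
end

encoded = one_hot_rows.([0, 2, 1, 3], 4)
# [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
```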
### Dataset Statistics
```elixir
stats = ExBurn.Dataset.stats({x, y})
# %{num_samples: 100, input_shape: {100, 4}, target_shape: {100},
# input_type: {:f, 32}, target_type: {:s, 64}}
```
### Complete Data Pipeline Example
```elixir
# 1. Load your data (however you get it)
# x = ... # your features
# y = ... # your labels
# 2. Split
{train, val} = ExBurn.Dataset.split({x, y}, val_split: 0.2, seed: 42)
# 3. Normalize
{train_x_norm, norm_stats} = ExBurn.Dataset.normalize(elem(train, 0), method: :standard)
val_x_norm = ExBurn.Dataset.normalize_with_stats(elem(val, 0), norm_stats)
# 4. Compile and train
compiled = ExBurn.Model.compile(model)

trained =
  ExBurn.Training.fit(compiled, {train_x_norm, elem(train, 1)},
    validation_data: {val_x_norm, elem(val, 1)},
    epochs: 50
  )
```
---
## Lesson 8: Inference and Deployment
### Running Inference
After training, use the model to make predictions:
```elixir
# Single prediction
input = Nx.tensor([[1.0, 2.0, 3.0, 4.0]])
{:ok, output} = ExBurn.Model.predict(trained_model, input)
Nx.argmax(output, axis: -1) # predicted class index for each row
# Batch prediction
batch = Nx.tensor([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]])
{:ok, outputs} = ExBurn.Model.predict(trained_model, batch)
```
### GPU vs CPU Inference
```elixir
# GPU inference (via defn compiler)
{:ok, output} = ExBurn.Model.forward(trained_model, input)
# CPU inference (via Axon predict)
{:ok, output} = ExBurn.Model.predict(trained_model, input)
```
### Batched Concurrent Inference with Serving
For production use, `Nx.Serving` handles concurrent batching:
```elixir
serving = ExBurn.Serving.build(trained_model,
batch_size: 32,
batch_timeout: 50,
partitions: System.schedulers_online()
)
# Run inference
output = Nx.Serving.run(serving, input)
```
### Saving and Loading Models
```elixir
# Save to file
ExBurn.Model.save(trained_model, "my_model.bin")
# Load from file
{:ok, loaded_model} = ExBurn.Model.load(compiled_model, "my_model.bin")
# Serialize to binary (for network transfer)
binary = ExBurn.Model.serialize_params(trained_model)
{:ok, params} = ExBurn.Model.deserialize_params(binary)
```
### Export Formats
```elixir
# Compressed Erlang terms (default, portable)
ExBurn.Model.export(model, "model.etf", format: :elixir_terms)
# JSON (human-readable, larger)
ExBurn.Model.export(model, "model.json", format: :json)
# Import
{:ok, model} = ExBurn.Model.import_params(model, "model.etf")
{:ok, model} = ExBurn.Model.import_params(model, "model.json", format: :json)
```
### Model Quantization
Reduce model size for deployment:
```elixir
# Convert to half precision (f16) — 2x smaller
quantized = ExBurn.Model.quantize(trained_model, :f16)
# Or brain float 16 (bf16) — better range than f16
quantized = ExBurn.Model.quantize(trained_model, :bf16)
```
### Benchmarking
Measure inference speed:
```elixir
results = ExBurn.Model.benchmark(trained_model, input, warmup: 3, runs: 10)
# %{avg_ms: 1.234, min_ms: 1.100, max_ms: 1.500,
# median_ms: 1.200, std_ms: 0.120, runs: 10, warmup: 3}
```
---
## Lesson 9: GPU-Accelerated Numerical Functions
### What is `defn`?
`defn` lets you write numerical functions that run on the GPU. The `ExBurn.Defn.Compiler` traces your function and compiles it to Burn GPU kernels.
### Setup
```elixir
Nx.default_backend(ExBurn.Backend)
Nx.Defn.global_default_options(compiler: ExBurn.Defn.Compiler)
```
### Writing `defn` Functions
```elixir
defmodule MyMath do
import Nx.Defn
# Element-wise sigmoid: 1 / (1 + e^(-x))
defn sigmoid(x) do
Nx.divide(1.0, Nx.add(1.0, Nx.exp(Nx.negate(x))))
end
  # Linear regression prediction: x · w + b
  # (variable names must start with a lowercase letter, so the input is `x`, not `X`)
  defn predict(x, w, b) do
    Nx.add(Nx.dot(x, w), b)
end
# Mean squared error
defn mse_loss(y_true, y_pred) do
diff = Nx.subtract(y_true, y_pred)
Nx.mean(Nx.multiply(diff, diff))
end
# ReLU activation
defn relu(x) do
Nx.max(x, 0.0)
end
# L2 normalization
defn l2_normalize(x) do
norm = Nx.sqrt(Nx.sum(Nx.multiply(x, x), axes: [-1], keep_axes: true))
Nx.divide(x, norm)
end
end
# These all run on the GPU!
MyMath.sigmoid(Nx.tensor([1.0, 2.0, 3.0]))
MyMath.relu(Nx.tensor([-1.0, 0.0, 1.0]))
```
### Per-Function Compiler Override
```elixir
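defmodule MyModule do
  import Nx.Defn

  # Intended to run with ExBurn's GPU compiler
  defn gpu_function(x) do
    Nx.sin(x) |> Nx.exp()
  end

  # Intended to run with the default (CPU) compiler
  defn cpu_function(x) do
    Nx.cos(x)
  end
end

# Select the compiler per function at the call site with Nx.Defn.jit/2
gpu_fun = Nx.Defn.jit(&MyModule.gpu_function/1, compiler: ExBurn.Defn.Compiler)
gpu_fun.(Nx.tensor([0.0, 1.0]))

# Nx.Defn.Evaluator is Nx's built-in default compiler
cpu_fun = Nx.Defn.jit(&MyModule.cpu_function/1, compiler: Nx.Defn.Evaluator)
cpu_fun.(Nx.tensor([0.0, 1.0]))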
```
### Control Flow in `defn`
```elixir
defmodule ControlFlow do
import Nx.Defn
defn clip_and_scale(x, min_val, max_val, scale) do
x
|> Nx.clip(min_val, max_val)
|> Nx.multiply(scale)
end
defn conditional_compute(x, threshold) do
# Use Nx.select for conditional operations
Nx.select(
Nx.greater(x, threshold), # condition
Nx.multiply(x, 2.0), # value when true
Nx.divide(x, 2.0) # value when false
)
end
end
```
### Using BurnBridge Directly
For maximum performance, bypass Nx and talk to Burn directly:
```elixir
# Create tensors directly on the GPU
t1 = ExBurn.BurnBridge.zeros([100, 100], :f32)
t2 = ExBurn.BurnBridge.ones([100, 100], :f32)
# Each operation is a single NIF call
t3 = ExBurn.BurnBridge.add(t1, t2)
t4 = ExBurn.BurnBridge.matmul(t1, t2)
t5 = ExBurn.BurnBridge.relu(t3)
# Convert back to Nx when needed
nx_tensor = ExBurn.BurnBridge.to_nx(t3)
```
### Try It Yourself
```elixir
# Implement a GPU-accelerated softmax function using defn
defmodule SoftmaxGPU do
import Nx.Defn
defn softmax(x) do
# Numerically stable softmax
shifted = x - Nx.reduce_max(x, axes: [-1], keep_axes: true)
exp_shifted = Nx.exp(shifted)
exp_shifted / Nx.sum(exp_shifted, axes: [-1], keep_axes: true)
end
end
# Test it
input = Nx.tensor([[1.0, 2.0, 3.0]])
SoftmaxGPU.softmax(input)
# Should sum to 1.0 across the last dimension
```
---
## Lesson 10: Putting It All Together
### Complete Example: Iris-like Classification
This example combines everything from the previous lessons:
```elixir
import Nx
# ── 1. Prepare Data ────────────────────────────────────────
num_samples = 150
num_features = 4
num_classes = 3
# Synthetic data (replace with real data in practice)
key = Nx.Random.key(42)
{x, _key} = Nx.Random.uniform(key, shape: {num_samples, num_features})
y = Nx.remainder(Nx.iota({num_samples}), num_classes) |> Nx.as_type({:s, 64})
# Split
{train, val} = ExBurn.Dataset.split({x, y}, val_split: 0.2, seed: 42)
{train_x, train_y} = train
{val_x, val_y} = val
# Normalize
{train_x_norm, stats} = ExBurn.Dataset.normalize(train_x, method: :standard)
val_x_norm = ExBurn.Dataset.normalize_with_stats(val_x, stats)
# ── 2. Define Model ────────────────────────────────────────
model =
Axon.input("features", shape: {nil, num_features})
|> Axon.dense(32, activation: :relu, name: "hidden1")
|> Axon.dropout(rate: 0.2)
|> Axon.dense(16, activation: :relu, name: "hidden2")
|> Axon.dropout(rate: 0.2)
|> Axon.dense(num_classes, name: "output")
# ── 3. Compile ─────────────────────────────────────────────
compiled = ExBurn.Model.compile(model,
loss: :cross_entropy,
optimizer: :adam,
learning_rate: 0.001,
weight_decay: 1.0e-4
)
IO.puts(ExBurn.Model.summary(compiled))
# ── 4. Train ───────────────────────────────────────────────
trained = ExBurn.Training.fit(compiled,
{train_x_norm, train_y},
epochs: 50,
batch_size: 16,
shuffle: true,
validation_data: {val_x_norm, val_y},
lr_schedule: {:cosine, 0.001, 1.0e-5},
clip_norm: 1.0,
accuracy: true,
callbacks: [
&ExBurn.Training.LoggingCallback.log/1,
ExBurn.Training.EarlyStoppingCallback.wait(10, 1.0e-5),
ExBurn.Training.HistoryCallback.new()
],
verbose: true
)
# ── 5. Evaluate ────────────────────────────────────────────
{loss, accuracy} = ExBurn.Training.evaluate(trained, {val_x_norm, val_y}, true)
IO.puts("Final — loss: #{Float.round(loss, 4)}, accuracy: #{Float.round(accuracy * 100, 1)}%")
# ── 6. Inference ──────────────────────────────────────────
new_sample = Nx.tensor([[5.1, 3.5, 1.4, 0.2]])
new_sample_norm = ExBurn.Dataset.normalize_with_stats(new_sample, stats)
{:ok, prediction} = ExBurn.Model.predict(trained, new_sample_norm)
predicted_class = Nx.argmax(prediction) |> Nx.to_number()
IO.puts("Predicted class: #{predicted_class}")
# ── 7. Save ───────────────────────────────────────────────
ExBurn.Model.save(trained, "iris_model.bin")
IO.puts("Model saved!")
```
### Training Checklist
Use this checklist for every training run:
- [ ] **Data split**: Train/val/test split with a fixed seed
- [ ] **Normalization**: Fit on training data, transform all splits
- [ ] **Model architecture**: Appropriate depth/width for the problem
- [ ] **Loss function**: Matches the task (classification vs regression)
- [ ] **Optimizer**: Start with Adam, lr=0.001
- [ ] **Regularization**: Dropout + weight decay to prevent overfitting
- [ ] **Early stopping**: Stop when validation loss plateaus
- [ ] **Gradient clipping**: Enable if you see NaN loss
- [ ] **Learning rate schedule**: Cosine annealing for best results
- [ ] **Evaluation**: Check both loss and accuracy on validation set
### Common Problems and Solutions
| Problem | Likely Cause | Solution |
|---|---|---|
| Loss is NaN | Exploding gradients | Enable `clip_norm: 1.0`, reduce learning rate |
| Loss doesn't decrease | LR too low, wrong loss | Increase LR, check loss function |
| Loss oscillates | LR too high, batch too small | Reduce LR, increase batch size or use `accumulate_gradients` |
| Overfitting | Model too complex | Add dropout, weight decay, early stopping |
| Training very slow | Large model with numerical gradients | Use `grad_method: :numerical_batch`, reduce model size |
### Next Steps
- [Training Models](02_training.md) — Full API reference for training
- [Training Optimization Guide](05_training_optimization.md) — Advanced tuning techniques
- [Mobile Deployment](03_mobile_deployment.md) — Deploy to iOS/Android
- [Architecture Deep-Dive](04_architecture.md) — How ExBurn works internally