guides/micrograd_demo_walkthrough.md

# Micrograd Demo Walkthrough

## The goal

The demo teaches backpropagation by training a small scalar MLP classifier on a two-dimensional toy dataset. It mirrors the official micrograd workflow while keeping all data generation, loss computation, training, and graph inspection in pure Elixir.

## Scalar autodiff warmup

The notebook starts with scalar values:

```elixir
alias MicrogradEx.Value

x = Value.new(-4.0, label: "x")
y = x |> Value.mul(x) |> Value.relu()

gradients = Value.backward(y)
Value.grad(x, gradients)
```

Each scalar operation creates a new `Value` and records a small local derivative edge. `Value.backward/1` walks the graph in reverse and returns a `Gradients` table.

## The two-moons dataset

The official Python notebook uses sklearn's `make_moons`. MicrogradEx uses `MicrogradEx.Datasets.moons/2`, a deterministic pure-Elixir generator with the same educational role.

Labels are `-1.0` and `1.0` because the max-margin loss uses `yi * scorei`.

## The MLP

The main model is:

```elixir
alias MicrogradEx.NN.MLP

model = MLP.new(2, [16, 16, 1], seed: {1337, 1337, 1337})
```

This means:

* 2 input values;
* first hidden layer with 16 neurons;
* second hidden layer with 16 neurons;
* output layer with 1 neuron.

Hidden layers use ReLU. The final layer is linear.

## Parameter count

The official demo shape has `337` parameters:

```text
First layer: 16 * (2 + 1) = 48
Second layer: 16 * (16 + 1) = 272
Output layer: 1 * (16 + 1) = 17
Total: 337
```

The `+ 1` in each layer is the bias parameter per neuron.

## The max-margin loss

The classification score is the scalar model output. A positive score predicts class `1`; a non-positive score predicts class `-1`.

The loss is:

```text
loss_i = relu(1 - yi * score_i)
data_loss = mean(loss_i)
reg_loss = alpha * sum(p * p)
total_loss = data_loss + reg_loss
```

In code this is `MicrogradEx.Losses.max_margin/4`.

## L2 regularization

The regularization term penalizes large parameters:

```elixir
alpha * sum(p * p for p <- NN.parameters(model))
```

The default `alpha` is `1.0e-4`, matching the official demo.

## The training loop

Training is immutable:

```elixir
gradients = Value.backward(total_loss)
next_model = NN.apply_gradients(model, gradients, learning_rate)
```

`MicrogradEx.Trainer.train/3` runs this loop for 100 steps by default and records loss, data loss, regularization loss, accuracy, and learning rate.

## Plotting loss and accuracy

`MicrogradEx.PlotData` converts training runs into plain rows:

```elixir
PlotData.loss_history(run)
PlotData.accuracy_history(run)
```

The notebook renders those rows with Vega-Lite.

## Decision boundary

The decision boundary is built by evaluating the trained model over a padded grid:

```elixir
PlotData.decision_boundary(run.final_model, dataset, h: 0.25)
```

Every grid point is classified by score sign, then plotted behind the training data.

## Graph inspection

The scalar graph is built during forward operations. MicrogradEx exposes it without mutation:

* `Graph.nodes/2` shows scalar node data and gradients;
* `Graph.edges/1` shows parent-to-child dependencies and local gradients;
* `Graph.to_dot/2` exports DOT text for optional Graphviz rendering.

## What to try next

Change one variable at a time:

* `noise: 0.2`;
* `MLP.new(2, [8, 8, 1])`;
* `steps: 50`;
* `alpha: 0.0`;
* `h: 0.35` for a faster decision-boundary grid.

For a broader set of experiments, open `notebooks/micrograd_extras.livemd`. It compares datasets, model sizes, regularization, learning-rate schedules, decision-boundary resolution, and a spiral dataset challenge.