guides/reading_edifice.md

Select File:
guides/reading_edifice.md

# Reading Edifice
> How to understand and use the code patterns in this library -- Axon computation graphs, the build API, tensor shapes, and running inference.

## What This Guide Covers

Every architecture in Edifice follows the same patterns. Once you understand these patterns,
you can pick up any of the 90+ architectures without re-learning the API. This guide walks
through those patterns with runnable examples.

**Prerequisites:** You should be comfortable with the concepts in
[ML Foundations](ml_foundations.md) and [Core Vocabulary](core_vocabulary.md). Familiarity with
basic Elixir syntax is helpful but not strictly required -- the patterns are simple enough to
follow even if you're new to the language.

## The Stack: Nx, Axon, and Edifice

Edifice sits on top of two foundational Elixir libraries:

```
┌─────────────────────────────────────┐
│           Edifice                    │  90+ architectures, consistent API
│  "What architecture do I want?"      │
├─────────────────────────────────────┤
│           Axon                       │  Model building, computation graphs
│  "How do layers connect?"            │
├─────────────────────────────────────┤
│           Nx                         │  Numerical computing, tensors, autograd
│  "How do I do math on tensors?"      │
├─────────────────────────────────────┤
│     EXLA (optional)                  │  GPU acceleration via XLA compiler
│  "Make it fast on GPU"               │
└─────────────────────────────────────┘
```

**Nx** is Elixir's numerical computing library. It provides tensors (multi-dimensional arrays),
mathematical operations, and automatic differentiation. Think of it as Elixir's equivalent of
NumPy + autograd.

**Axon** builds on Nx to provide a model-building API. You define a neural network as a
**computation graph** -- a description of how data flows through layers. The graph is then
compiled into efficient functions for initialization and prediction.

**Edifice** uses Axon to implement 90+ architectures with a consistent API. Instead of manually
wiring up attention heads, SSM blocks, and normalization layers, you call `Edifice.build/2` and
get a ready-to-use Axon model.

## The Build Pattern

Every architecture module in Edifice has a `build/1` function that returns an Axon model:

```elixir
# The universal pattern
model = SomeModule.build(option1: value1, option2: value2)
```

The model isn't a trained network -- it's a **computation graph** that describes the network's
structure. No weights exist yet. No computation has happened. It's a blueprint.

### Building by Module

You can use any architecture module directly:

```elixir
# Simple feedforward network
model = Edifice.Feedforward.MLP.build(input_size: 256, hidden_sizes: [512, 256])

# Mamba state space model
model = Edifice.SSM.Mamba.build(
  embed_size: 128,
  hidden_size: 256,
  state_size: 16,
  num_layers: 4,
  window_size: 60
)

# Graph convolutional network for classification
model = Edifice.Graph.GCN.build_classifier(
  input_dim: 16,
  hidden_dims: [64, 64],
  num_classes: 2,
  pool: :mean
)
```

### Building by Name (Registry)

The unified registry lets you build any architecture with an atom name:

```elixir
# Same Mamba model, built through the registry
model = Edifice.build(:mamba,
  embed_size: 128,
  hidden_size: 256,
  state_size: 16,
  num_layers: 4,
  window_size: 60
)

# Useful for config-driven experiments
arch_name = :retnet  # could come from a config file
model = Edifice.build(arch_name, embed_size: 256, hidden_size: 512, num_layers: 4)
```

You can explore what's available:

```elixir
# List all 90+ architecture names
Edifice.list_architectures()
# => [:adapter, :ann2snn, :attention, :barlow_twins, :bayesian, :bimamba, ...]

# See architectures grouped by family
Edifice.list_families()
# => %{
#   ssm: [:mamba, :mamba_ssd, :s4, :s4d, :s5, :h3, :hyena, ...],
#   attention: [:attention, :retnet, :rwkv, :gla, :hgrn, ...],
#   feedforward: [:mlp, :kan, :tabnet],
#   ...
# }

# Get the module behind a name
Edifice.module_for(:mamba)
# => Edifice.SSM.Mamba
```

## From Graph to Functions: Axon.build

An Axon model is just a graph. To actually run it, you compile it with `Axon.build/1`:

```elixir
model = Edifice.Feedforward.MLP.build(input_size: 10, hidden_sizes: [64, 32])

# Compile the graph into two functions
{init_fn, predict_fn} = Axon.build(model)
```

This gives you two functions:

- **`init_fn`**: creates the initial (random) parameters
- **`predict_fn`**: runs the forward pass

### Initializing Parameters

`init_fn` takes a **template** (a tensor describing the expected input shape) and an empty
model state:

```elixir
# Template: 1 sample, 10 features -- matches input_size: 10
template = Nx.template({1, 10}, :f32)

# Create random initial parameters
params = init_fn.(template, Axon.ModelState.empty())
```

The template doesn't contain real data -- it just tells Axon the shape and type of inputs to
expect so it can create parameters of the right sizes. `Nx.template/2` creates a placeholder
that takes no memory.

`params` is now an `Axon.ModelState` containing all the network's weights and biases, randomly
initialized. For a 2-layer MLP with sizes [64, 32], this includes:
- Layer 0: a {10, 64} weight matrix + a {64} bias vector
- Layer 1: a {64, 32} weight matrix + a {32} bias vector

### Running Inference

`predict_fn` takes parameters and input data, and runs the forward pass:

```elixir
# Create some input data: 4 samples, 10 features each
input = Nx.broadcast(0.5, {4, 10})

# Run the forward pass
output = predict_fn.(params, input)
# => a tensor of shape {4, 32} (4 samples, 32 features from the last hidden layer)
```

That's it. Three steps: **build** the graph, **init** the parameters, **predict** with data.

## Understanding Tensor Shapes

Shapes are how you reason about what's happening inside a network. Every Edifice architecture
documents its expected input and output shapes.

### Common Shape Patterns

```
{batch_size, features}
  Used by: MLP, classification heads, pooled outputs
  Example: {32, 256} = 32 samples, 256 features each

{batch_size, seq_len, features}
  Used by: Sequence models (Mamba, attention, recurrent, TCN)
  Example: {1, 60, 128} = 1 sample, 60 timesteps, 128 features per step

{batch_size, height, width, channels}
  Used by: Vision models (ViT, ResNet, UNet)
  Example: {16, 224, 224, 3} = 16 RGB images at 224x224

Map with named inputs
  Used by: Graph models (GCN, GAT)
  Example: %{"nodes" => {4, 10, 16}, "adjacency" => {4, 10, 10}}
```

### The Batch Dimension

The first dimension is **always** the batch size. When you see `{nil, 60, 128}` in an Axon
input specification, `nil` means "any batch size." The network doesn't care how many samples
you feed it at once.

```elixir
# These all work with the same model:
predict_fn.(params, Nx.broadcast(0.5, {1, 60, 128}))    # 1 sample
predict_fn.(params, Nx.broadcast(0.5, {32, 60, 128}))   # 32 samples
predict_fn.(params, Nx.broadcast(0.5, {256, 60, 128}))  # 256 samples
```

### Shape Transformations

Most Edifice sequence models output `{batch, hidden_size}` -- they reduce the sequence dimension
by taking the last timestep or pooling. This is because the common use case is classification or
regression from sequences, where you need a fixed-size output regardless of sequence length.

```elixir
# Mamba: sequence in, fixed vector out
model = Edifice.build(:mamba, embed_size: 128, hidden_size: 256, num_layers: 2, window_size: 60)
{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({1, 60, 128}, :f32), Axon.ModelState.empty())
output = predict_fn.(params, Nx.broadcast(0.5, {1, 60, 128}))
# output shape: {1, 256}  -- the 60 timesteps have been reduced to a single vector
```

## Generative Models: The Tuple Pattern

Most architectures return a single Axon model. Generative architectures return **tuples** of
models because they have multiple components that are trained differently:

```elixir
# VAE returns an encoder and a decoder
{encoder, decoder} = Edifice.Generative.VAE.build(
  input_size: 784,
  latent_size: 32,
  encoder_sizes: [512, 256],
  decoder_sizes: [256, 512]
)

# Each is a separate Axon model
{enc_init, enc_predict} = Axon.build(encoder)
{dec_init, dec_predict} = Axon.build(decoder)

# GAN returns a generator and a discriminator
{generator, discriminator} = Edifice.Generative.GAN.build(
  latent_size: 128,
  output_size: 784,
  gen_sizes: [256, 512],
  disc_sizes: [512, 256]
)
```

Generative modules also provide associated utility functions for training:

```elixir
# VAE: reparameterization trick and KL divergence
z = Edifice.Generative.VAE.reparameterize(mu, log_var)
kl_loss = Edifice.Generative.VAE.kl_divergence(mu, log_var)
```

## Graph Models: Map Inputs

Graph models expect **maps** as input because graphs have multiple components (nodes, edges,
adjacency matrices):

```elixir
model = Edifice.Graph.GCN.build_classifier(
  input_dim: 16,
  hidden_dims: [64, 64],
  num_classes: 2,
  pool: :mean
)

{init_fn, predict_fn} = Axon.build(model)

# Graph input is a map with named tensors
input = %{
  "nodes" => Nx.broadcast(0.5, {4, 10, 16}),       # 4 graphs, 10 nodes, 16 features
  "adjacency" => Nx.eye(10) |> Nx.broadcast({4, 10, 10})  # adjacency matrices
}

params = init_fn.(
  %{
    "nodes" => Nx.template({4, 10, 16}, :f32),
    "adjacency" => Nx.template({4, 10, 10}, :f32)
  },
  Axon.ModelState.empty()
)

output = predict_fn.(params, input)
# output shape: {4, 2} -- 4 graphs, 2 class probabilities each
```

## Common Options Across Architectures

While each architecture has unique options, several appear across many modules:

| Option | Meaning | Typical Values |
|--------|---------|----------------|
| `embed_size` | Input feature dimension per token | 64, 128, 256, 512 |
| `hidden_size` | Internal representation width | 128, 256, 512, 1024 |
| `num_layers` | Depth of the network (stacked blocks) | 2, 4, 6, 8, 12 |
| `num_heads` | Number of attention heads | 4, 8, 16 |
| `window_size` | Expected sequence length | 60, 128, 512, 1024 |
| `dropout` | Dropout rate for regularization | 0.0, 0.1, 0.2 |
| `activation` | Activation function | `:relu`, `:silu`, `:gelu` |

**Larger values = more capacity** (can learn more complex patterns) but also **more compute
and more data needed** to train effectively.

## Putting It All Together: A Complete Example

Here's a full example showing the lifecycle from architecture selection to inference:

```elixir
# 1. Choose an architecture for sequence classification
#    We have 60-frame game state sequences with 128 features per frame
#    and want to classify into 5 actions
model = Edifice.build(:mamba,
  embed_size: 128,
  hidden_size: 256,
  state_size: 16,
  num_layers: 4,
  window_size: 60
)

# 2. Add a classification head on top
#    Edifice models output a feature vector; we need class probabilities
classifier =
  model
  |> Axon.dense(5, name: "action_head")
  |> Axon.activation(:softmax)

# 3. Compile the full model
{init_fn, predict_fn} = Axon.build(classifier)

# 4. Initialize parameters
template = Nx.template({1, 60, 128}, :f32)
params = init_fn.(template, Axon.ModelState.empty())

# 5. Run inference on a batch of game states
game_states = Nx.broadcast(0.5, {8, 60, 128})  # 8 sequences of 60 frames
predictions = predict_fn.(params, game_states)
# predictions shape: {8, 5} -- probability distribution over 5 actions for each sequence
```

Notice step 2: Edifice models are composable Axon graphs. You can pipe them into additional
layers, combine multiple Edifice models, or use Edifice layers as components in a larger
architecture. This composability is fundamental to the design.

## Comparing Architectures

Because every architecture follows the same API, swapping one for another is trivial:

```elixir
# Try several sequence models with the same input/output contract
architectures = [
  {:mamba, [embed_size: 128, hidden_size: 256, num_layers: 4, window_size: 60]},
  {:retnet, [embed_size: 128, hidden_size: 256, num_layers: 4, num_heads: 4, window_size: 60]},
  {:lstm, [embed_size: 128, hidden_size: 256, num_layers: 4, window_size: 60]},
  {:griffin, [embed_size: 128, hidden_size: 256, num_layers: 4, window_size: 60]}
]

for {name, opts} <- architectures do
  model = Edifice.build(name, opts)
  {init_fn, predict_fn} = Axon.build(model)
  params = init_fn.(Nx.template({1, 60, 128}, :f32), Axon.ModelState.empty())
  output = predict_fn.(params, Nx.broadcast(0.5, {1, 60, 128}))
  IO.puts("#{name}: output shape #{inspect(Nx.shape(output))}")
end
```

This is one of Edifice's core value propositions: the cost of trying a different architecture
is a one-line change.

## Reading Architecture Moduledocs

Every module in Edifice includes documentation you can access in IEx:

```elixir
# In IEx
h Edifice.SSM.Mamba           # Module overview
h Edifice.SSM.Mamba.build     # Build function options and return type
```

The moduledocs follow a consistent pattern:
1. One-line description of the architecture
2. ASCII diagram of the computation flow
3. Options with types and defaults
4. Usage examples with shapes annotated

## What's Next

With the API patterns understood, you're ready to explore architectures:

1. **[Learning Path](learning_path.md)** -- a guided tour through the 19 families in a logical order
2. Any architecture-specific guide (e.g., [State Space Models](state_space_models.md),
   [Attention Mechanisms](attention_mechanisms.md)) -- you now have the vocabulary and API
   knowledge to follow them