# EMLXAxon

`emlx_axon` provides [Axon](https://github.com/elixir-nx/axon) model rewrites that swap supported nodes for `EMLX.Fast` Metal shader implementations, accelerating inference on Apple Silicon.

It builds on [`emlx`](https://github.com/elixir-nx/emlx) and is intended to be used alongside [Bumblebee](https://github.com/elixir-nx/bumblebee) to serve LLMs on MLX.

## Usage

Add `emlx_axon` as a dependency in your `mix.exs`:

```elixir
def deps do
  [
    {:emlx_axon, github: "elixir-nx/emlx", sparse: "emlx_axon", branch: "main"},
    {:emlx, github: "elixir-nx/emlx", branch: "main"}
  ]
end
```

## Model download

The examples and tests that run inference require local model checkpoints downloaded from Hugging Face.

Install the Hugging Face CLI if you don't have it:

```bash
# Quote the extra so zsh (the default macOS shell) does not treat [cli] as a glob pattern
pipx install "huggingface_hub[cli]"
```

Download the model checkpoints:

```bash
# 0.6B — small, fast to iterate (~400 MB)
huggingface-cli download lmstudio-community/Qwen3-0.6B-MLX-4bit \
  --local-dir ~/models/Qwen3-0.6B-MLX-4bit

# 8B — headline size (~5 GB)
huggingface-cli download lmstudio-community/Qwen3-8B-MLX-4bit \
  --local-dir ~/models/Qwen3-8B-MLX-4bit
```

Export the path before running tests or benchmarks:

```bash
export EMLX_QWEN3_MODEL_PATH=~/models/Qwen3-0.6B-MLX-4bit
```
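
A quick pre-flight check can catch an unset or mistyped path before a long test or benchmark run. The following is a minimal sketch; the function name `check_model_path` is hypothetical and not part of `emlx_axon`.

```shell
# Hypothetical helper: verifies that EMLX_QWEN3_MODEL_PATH is set
# and points at an existing directory.
check_model_path() {
  if [ -z "${EMLX_QWEN3_MODEL_PATH:-}" ]; then
    echo "EMLX_QWEN3_MODEL_PATH is not set" >&2
    return 1
  fi
  if [ ! -d "$EMLX_QWEN3_MODEL_PATH" ]; then
    echo "EMLX_QWEN3_MODEL_PATH is not a directory: $EMLX_QWEN3_MODEL_PATH" >&2
    return 1
  fi
  echo "Using checkpoint at $EMLX_QWEN3_MODEL_PATH"
}
```

Call `check_model_path` after the `export` line; it returns a non-zero status if the variable is missing or the directory does not exist.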

### Pinning a model revision

For golden-token determinism, pin the model revision by passing
`--revision <commit_sha>` to `huggingface-cli download`.
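
As an illustration, a pinned download could be wrapped in a small helper so the repository, revision, and target directory are always given together. This is a sketch only: `download_pinned` is a hypothetical name, and the commit SHA must come from the model repository's history on Hugging Face.

```shell
# Hypothetical helper, not part of emlx_axon.
# Usage: download_pinned <repo_id> <commit_sha> <local_dir>
download_pinned() {
  if [ "$#" -ne 3 ]; then
    echo "usage: download_pinned <repo_id> <commit_sha> <local_dir>" >&2
    return 2
  fi
  # --revision pins the download to the given commit instead of the latest one
  huggingface-cli download "$1" --revision "$2" --local-dir "$3"
}
```

For example, call it with `lmstudio-community/Qwen3-0.6B-MLX-4bit`, the chosen commit SHA, and `~/models/Qwen3-0.6B-MLX-4bit` as the three arguments.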

### CI note

Tests that require a local checkpoint are excluded from the default `mix test` run
and from CI. Do not add a CI job that downloads the checkpoint: the 8B model
is ~5 GB and the tests require local Apple Silicon hardware.
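
When running the checkpoint tests from a local script, a hardware guard can make the Apple Silicon requirement explicit. The sketch below assumes that `uname` reporting `Darwin` and `arm64` is a sufficient signal; `is_apple_silicon` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds only on macOS running natively on arm64.
# Note: a shell running under Rosetta reports x86_64 and fails this check.
is_apple_silicon() {
  [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]
}
```

A local script can call `is_apple_silicon` first and skip the checkpoint tests when it returns a non-zero status.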