# Updating the vendored llama.cpp
erllama vendors a pinned copy of llama.cpp under `c_src/llama.cpp/`.
The current pin is **b9093**.
This file documents the bump procedure.
## Why pin
- Reproducible builds: every developer and CI run compiles the same
source.
- Hex.pm-friendly: published packages contain the full source so no
network access is needed at install time.
- Backend stability: llama.cpp moves fast, especially in the model
zoo. We control when we adopt new architectures.
## What we ship
We vendor only the parts we need. Currently:
```
c_src/llama.cpp/
  CMakeLists.txt      llama.cpp's top-level CMake
  LICENSE             MIT (dual MIT/Apache; llama.cpp picks MIT)
  cmake/              CMake helpers (toolchain files, etc.)
  include/            public headers (llama.h, etc.)
  src/                llama core (model.cpp, context.cpp, etc.)
  ggml/
    CMakeLists.txt
    cmake/            ggml CMake helpers (common.cmake, GitVars.cmake)
    include/          public ggml headers
    src/
      CMakeLists.txt
      ggml*.c, ggml*.cpp, ggml*.h   core ggml + frontends
      gguf.cpp        GGUF file format
      ggml-cpu/       CPU SIMD kernels (mandatory)
      ggml-metal/     Apple GPU backend (Apple Silicon)
      ggml-cuda/      NVIDIA GPU backend (Linux x86-64)
      ggml-blas/      BLAS backend (OpenBLAS / Accelerate)
```
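After a sync, a quick listing makes it easy to confirm the vendored tree
still matches this layout (an illustrative check, not part of the bump
procedure):
```sh
# Compare the actual vendored directory layout against the listing above.
find c_src/llama.cpp -maxdepth 3 -type d | sort
```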
Excluded (unused or out-of-scope for v1):
- `tools/`, `examples/`, `tests/`, `docs/`, `models/`, `gguf-py/`,
`benches/`, `ci/`, `scripts/`, `grammars/`, `vendor/`, `.git/`,
`.github/`, `AUTHORS`, `.devops/`
- ggml backends we do not link: Vulkan, SYCL, OpenCL, CANN, Hexagon,
HIP, MUSA, RPC, ZDNN, ZenDNN, Virtgpu, Webgpu, OpenVINO
If a user needs one of the excluded backends, they can build erllama
against an unvendored llama.cpp via a `git` rebar dependency instead of
the Hex package; that path is possible but unsupported in this scaffold.
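For reference, such a dependency looks roughly like the `rebar.config`
entry below (a sketch only; the repository URL and ref are placeholders,
not the canonical location):
```erlang
%% Sketch: pull erllama from git instead of Hex so the build can be
%% pointed at a system llama.cpp. URL and ref are placeholders.
{deps, [
    {erllama, {git, "https://github.com/OWNER/erllama.git", {branch, "main"}}}
]}.
```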
## Bumping
Pick a tag from <https://github.com/ggml-org/llama.cpp/tags>. Newer
tags are usually fine; check the changelog for breaking C-API changes
to `llama_state_seq_*` (the cache layer depends on those).
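A crude but useful pre-check, once the candidate tag is cloned (step 1
below), is to diff those declarations directly (illustrative; scratch
file names are arbitrary):
```sh
# Diff the llama_state_seq_* declarations between the vendored header
# and the candidate tag; any change warrants a cache-layer review.
grep 'llama_state_seq_' c_src/llama.cpp/include/llama.h   > /tmp/seq_old.txt
grep 'llama_state_seq_' /tmp/llama.cpp.new/include/llama.h > /tmp/seq_new.txt
diff -u /tmp/seq_old.txt /tmp/seq_new.txt
```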
```sh
# 1. Clone the new tag into a scratch directory.
cd /tmp
git clone --depth=1 --branch=<TAG> https://github.com/ggml-org/llama.cpp llama.cpp.new
# 2. Sync the parts we vendor.
cd /path/to/erllama   # your erllama checkout
rm -rf c_src/llama.cpp
mkdir -p c_src/llama.cpp/ggml/src
cp -r /tmp/llama.cpp.new/{src,include,cmake,CMakeLists.txt,LICENSE} \
    c_src/llama.cpp/
cp -r /tmp/llama.cpp.new/ggml/{include,cmake,CMakeLists.txt} \
    c_src/llama.cpp/ggml/
cp /tmp/llama.cpp.new/ggml/src/CMakeLists.txt \
    c_src/llama.cpp/ggml/src/
cp /tmp/llama.cpp.new/ggml/src/ggml*.c \
    /tmp/llama.cpp.new/ggml/src/ggml*.cpp \
    /tmp/llama.cpp.new/ggml/src/ggml*.h \
    /tmp/llama.cpp.new/ggml/src/gguf.cpp \
    c_src/llama.cpp/ggml/src/
cp -r /tmp/llama.cpp.new/ggml/src/{ggml-cpu,ggml-metal,ggml-cuda,ggml-blas} \
    c_src/llama.cpp/ggml/src/
# 3. Rebuild and run the full test gauntlet.
rm -rf _build
rebar3 compile
rebar3 fmt --check && rebar3 lint && rebar3 xref \
    && rebar3 eunit && rebar3 proper && rebar3 ct
# 4. Update the pin reference in this file and in
#    c_src/llama.cpp/.version (if present).
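echo '<TAG>' > c_src/llama.cpp/.version   # illustrative; match the file's existing format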
# 5. Commit with a message naming the new tag.
```
## Configuration knobs
The CMake configure step honours options passed in the `ERLLAMA_OPTS`
environment variable, which `do_cmake.sh` forwards to `cmake`:
```
ERLLAMA_OPTS="-DGGML_CUDA=ON" # enable CUDA on Linux x86-64
ERLLAMA_OPTS="-DGGML_METAL=OFF" # disable Metal on Darwin
ERLLAMA_OPTS="-DGGML_BLAS=OFF" # disable BLAS
ERLLAMA_OPTS="-DCMAKE_BUILD_TYPE=Debug" # debug build
```
The build step honours `ERLLAMA_BUILDOPTS` (passed to `cmake --build`).
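For example, a hypothetical CUDA release build with a parallel compile
(assuming `rebar3 compile` ends up invoking `do_cmake.sh`):
```sh
# Configure-time flags go in ERLLAMA_OPTS; build-time flags in
# ERLLAMA_BUILDOPTS (appended to `cmake --build`).
ERLLAMA_OPTS="-DGGML_CUDA=ON -DCMAKE_BUILD_TYPE=Release" \
ERLLAMA_BUILDOPTS="--parallel 8" \
rebar3 compile
```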
## Why we drop `common/`
llama.cpp's `common/` carries HTTP / Hugging Face download helpers
that pull in cpp-httplib (5 MB). erllama uses the public `llama.h` API
directly and provides its own thin sampling / tokenization helpers in
the NIF; nothing in `common/` is on our critical path.
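A quick way to check that this remains true after a bump (illustrative;
an empty result is the expected outcome):
```sh
# Any hit here would mean our NIF sources started depending on the
# dropped common/ layer.
grep -rn '#include "common' c_src --include='*.c' --include='*.cpp' --include='*.h'
```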