# Getting started

ALLM is a provider-neutral LLM execution library for Elixir. You write
your workflow once — building a request, picking an engine, calling
`generate/3` or `chat/3` — and run it against OpenAI, Anthropic, Gemini,
or any custom adapter without changing the call site.

This guide walks you from a blank `mix.exs` to a working round-trip
against a real provider in five minutes. We'll use `ALLM.Providers.Fake`
(the deterministic test adapter that ships with the library) for the
first pass — it requires no API key and no network — then swap to a real
provider.

## Install

Add ALLM to your `mix.exs` deps:

```elixir
def deps do
  [
    {:allm, "~> 0.3"}
  ]
end
```

Run `mix deps.get`. ALLM pulls in `req`, `finch`, `jason`, and
`telemetry` as transitive deps; you don't need to declare them yourself.

The toolchain floor is Elixir `~> 1.17` and Erlang/OTP 27+.

## Hello, ALLM (no network)

The simplest possible round-trip uses the fake adapter. Open
`iex -S mix` in your project and paste:

    iex> engine = ALLM.Engine.new(
    ...>   adapter: ALLM.Providers.Fake,
    ...>   adapter_opts: [script: [{:text, "Hello, ALLM!"}, {:finish, :stop}]]
    ...> )
    iex> {:ok, %ALLM.ChatResult{final_response: %ALLM.Response{output_text: text}}} =
    ...>   ALLM.chat(engine, [ALLM.user("Hi.")])
    iex> text
    "Hello, ALLM!"

Three things happened:

1. `ALLM.Engine.new/1` built a runtime engine. Engines hold the
   non-serializable bits — adapter module, adapter opts, optional key
   resolver. They're cheap to construct and safe to share across
   processes.
2. `ALLM.chat/3` ran the auto-loop. With no tools declared, the loop
   completes after a single round-trip and returns an `%ALLM.ChatResult{}`
   wrapping the final `%ALLM.Response{}`.
3. The fake adapter ignored the request entirely and returned the
   scripted reply (`"Hello, ALLM!"`). That's the whole point — Fake is
   for testing orchestration, not provider wire fidelity.

## Building a request explicitly

`ALLM.chat/3` accepts either a list of messages or a `%Request{}`. The
list form is shorthand. Here's the explicit form:

    iex> engine = ALLM.Engine.new(
    ...>   adapter: ALLM.Providers.Fake,
    ...>   adapter_opts: [script: [{:text, "Three primes: 2, 3, 5."}, {:finish, :stop}]]
    ...> )
    iex> req = ALLM.request([
    ...>   ALLM.system("Be concise."),
    ...>   ALLM.user("Name three primes.")
    ...> ])
    iex> {:ok, %ALLM.ChatResult{final_response: %ALLM.Response{output_text: text}}} =
    ...>   ALLM.chat(engine, req)
    iex> text
    "Three primes: 2, 3, 5."

`ALLM.request/2` accepts the same opts you'd set on the request struct
directly: `:model`, `:tools`, `:tool_choice`, `:response_format`,
`:stream`, `:max_tokens`, `:temperature`, `:metadata`.
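
As a rough sketch of passing those options, assuming they go in a keyword list as the second argument to `ALLM.request/2`: the option names come from the list above, the values are purely illustrative, and the Fake adapter is used so nothing leaves your machine.

```elixir
# Assumes :allm is already in your deps (see the Install section).
engine =
  ALLM.Engine.new(
    adapter: ALLM.Providers.Fake,
    adapter_opts: [script: [{:text, "2, 3, 5"}, {:finish, :stop}]]
  )

# :max_tokens and :temperature are option names from this guide;
# the values chosen here are arbitrary examples.
req =
  ALLM.request(
    [ALLM.system("Be concise."), ALLM.user("Name three primes.")],
    max_tokens: 64,
    temperature: 0.2
  )

{:ok, result} = ALLM.chat(engine, req)
IO.puts(result.final_response.output_text)
```

Because Fake returns its scripted reply regardless of the request, this only shows where the options go; a real provider is needed to see them take effect.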

## When to reach for what

| You want to… | Use this | Returns |
|---|---|---|
| Get a one-shot completion | `ALLM.generate/3` | `{:ok, %Response{}}` |
| Stream a one-shot completion | `ALLM.stream_generate/3` | `{:ok, Enumerable.t()}` of events |
| Run a single round-trip with tool execution | `ALLM.step/3` | `{:ok, %StepResult{}}` |
| Run a multi-turn auto-loop with tools | `ALLM.chat/3` | `{:ok, %ChatResult{}}` |
| Stream a multi-turn auto-loop | `ALLM.stream/3` | `{:ok, Enumerable.t()}` |
| Persist state between turns of a multi-turn exchange | `ALLM.Session.*` | `{:ok, %Session{}}` |
| Generate / edit / vary images | `ALLM.generate_image/3` etc. | `{:ok, %ImageResponse{}}` |

## Swap to a real provider

The engine is the only thing that changes — everything downstream stays
identical. For OpenAI:

```elixir
engine = ALLM.Engine.new(
  adapter: ALLM.Providers.OpenAI,
  model: "gpt-4.1-mini"
)

{:ok, response} = ALLM.generate(engine, ALLM.request([ALLM.user("Hi.")]))
```

For Anthropic:

```elixir
engine = ALLM.Engine.new(
  adapter: ALLM.Providers.Anthropic,
  model: "claude-sonnet-4-6"
)
```

For Gemini:

```elixir
engine = ALLM.Engine.new(
  adapter: ALLM.Providers.Gemini,
  model: "gemini-3-flash-preview"
)
```

Each provider has its own model strings; otherwise the call site is
byte-identical.
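
One way to keep that swap in a single place is a small lookup helper. The sketch below is only an illustration: `MyApp.LLM`, `engine_opts/1`, and `engine/1` are hypothetical names, not part of ALLM, while the adapter modules and model strings are the ones shown above.

```elixir
defmodule MyApp.LLM do
  @moduledoc false
  # Hypothetical helper, not part of ALLM. Adapter modules and model
  # strings are copied from the examples in this guide.
  @providers %{
    openai: {ALLM.Providers.OpenAI, "gpt-4.1-mini"},
    anthropic: {ALLM.Providers.Anthropic, "claude-sonnet-4-6"},
    gemini: {ALLM.Providers.Gemini, "gemini-3-flash-preview"}
  }

  # Returns the keyword list to hand to ALLM.Engine.new/1.
  # Raises KeyError for a provider that is not in the table.
  def engine_opts(provider) do
    {adapter, model} = Map.fetch!(@providers, provider)
    [adapter: adapter, model: model]
  end

  # Builds the engine; everything downstream of this call is unchanged.
  def engine(provider), do: ALLM.Engine.new(engine_opts(provider))
end
```

Calling `MyApp.LLM.engine(:anthropic)` would then give you an engine to pass to `ALLM.generate/3` exactly as before, with the provider choice driven by a single atom (read from config, for instance).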

## Where do API keys come from?

You have four resolution paths, in priority order:

1. **Per-call** — `ALLM.generate(engine, req, api_key: "sk-...")`. Wins
   over everything. Use this for multi-tenant SaaS where the key changes
   per request.
2. **Engine-level resolver** — `ALLM.Engine.new(adapter: ..., keys: %{my_provider: fn -> System.fetch_env!("MY_KEY") end})`.
3. **Application config** — `config :allm, :keys, openai: "sk-..."`.
4. **Environment variable** — each provider has a default
   (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`).

Engines never persist API keys — they round-trip safely through ETF and
JSON. See `multi_tenant_keys.md` for the full resolution chain.
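
To make the "first match wins" ordering concrete, here is a plain-Elixir sketch of a priority chain. It is not ALLM's resolver: `KeyPrecedenceSketch` and its return shape are hypothetical, and the four sources are stubbed with fixed values so the result is deterministic.

```elixir
defmodule KeyPrecedenceSketch do
  # Illustrative only; a hypothetical module, not ALLM's implementation.
  # `sources` is an ordered keyword list of zero-arity functions that
  # return a key or nil. They are tried in order and evaluation stops
  # at the first non-nil result.
  def resolve(sources) do
    Enum.find_value(sources, {:error, :no_api_key}, fn {name, fetch} ->
      case fetch.() do
        nil -> nil
        key -> {:ok, name, key}
      end
    end)
  end
end

# Stubbed sources, listed in the priority order described above.
sources = [
  per_call: fn -> nil end,
  engine: fn -> nil end,
  app_config: fn -> "sk-from-config" end,
  env: fn -> "sk-from-env" end
]

IO.inspect(KeyPrecedenceSketch.resolve(sources))
# => {:ok, :app_config, "sk-from-config"}
```

Here the per-call and engine-level sources yield nothing, so the application-config value is used and the environment variable is never consulted.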

## Where to next

Pick the path that matches what you're building:

* **Streaming UI** → `streaming.md` — events, filters, cancellation.
* **Tool calls** → `tools.md` — auto loop, manual mode, ask-user.
* **Multi-turn persistence** → `sessions.md` — `%Session{}` and the
  status union.
* **Multi-modal input** → `vision.md` — `TextPart` and `ImagePart`.
* **Image generation** → `image_generation.md` — `generate_image/3`,
  `edit_image/4`, `image_variations/3`.
* **Production hardening** → `errors_and_retries.md` and
  `multi_tenant_keys.md`.

## Testing your integration

`ALLM.Providers.Fake` is the canonical test vehicle. Build your test
engine with it (for example, from settings in `config/test.exs`) and
write deterministic assertions against scripted replies: no network, no
flakes, no mocking infrastructure.

    iex> engine = ALLM.Engine.new(
    ...>   adapter: ALLM.Providers.Fake,
    ...>   adapter_opts: [script: [{:text, "ok"}, {:finish, :stop}]]
    ...> )
    iex> {:ok, %ALLM.Response{output_text: text}} =
    ...>   ALLM.generate(engine, ALLM.request([ALLM.user("ping")]))
    iex> text
    "ok"

The `examples/` directory in the repository contains 15 numbered scripts
you can run against any of the bundled providers — see
`examples/README.md`.
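
The scripted-reply pattern from this section carries over to an ExUnit test more or less unchanged. A minimal sketch, assuming a standard Mix project with `:allm` in its deps; the module name `MyApp.LLMTest` and the test description are hypothetical, and the ALLM calls are the ones shown in this guide.

```elixir
defmodule MyApp.LLMTest do
  use ExUnit.Case

  # Hypothetical test module; place it under test/ in your project.
  test "generate/3 returns the scripted reply" do
    engine =
      ALLM.Engine.new(
        adapter: ALLM.Providers.Fake,
        adapter_opts: [script: [{:text, "ok"}, {:finish, :stop}]]
      )

    # The Fake adapter replies from its script, so the match is deterministic.
    assert {:ok, %ALLM.Response{output_text: "ok"}} =
             ALLM.generate(engine, ALLM.request([ALLM.user("ping")]))
  end
end
```

Run it with `mix test`; no API key is needed because no request leaves the process.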