docs/partners/zai.md

Select File
docs/partners/zai.md

# BeamWeaver Z.ai

BeamWeaver includes a Z.ai provider under `BeamWeaver.ZAI` for GLM-5.2 Chat
Completions.

## Surface

- `BeamWeaver.ZAI.ChatModel` implements `BeamWeaver.Core.ChatModel`.
- `BeamWeaver.ZAI.Client` calls
  `https://api.z.ai/api/paas/v4/chat/completions` by default.
- Model initialization is strict: use `zai:glm-5.2`. Bare `glm-*` identifiers
  and other `zai:*` models are rejected before transport.
- Runtime config uses `config :beam_weaver, :zai`; `runtime.exs` reads
  `ZAI_API_KEY` plus optional `ZAI_BASE_URL` or `ZAI_API_URL`.
- Standard function tools are rendered as OpenAI-compatible Chat Completions
  `tools`. Z.ai `tool_stream: true` is supported for streaming tool-call
  argument chunks and requires `stream: true`.
- Structured output uses JSON object mode:
  `response_format: %{type: "json_object"}`. BeamWeaver schema requests are
  mapped to JSON object mode, the schema is injected as provider-visible
  instructions, and the response is parsed and validated locally. JSON Schema
  request mode is not enabled for this provider.
- Streaming reconstructs text, reasoning content, streamed tool-call chunks,
  final usage chunks, and `finish_reason: "length"` truncation.
- Usage metadata tracks prompt, completion, total, cached input, and reasoning
  output tokens. Cost metadata uses the checked-in GLM-5.2 prices and does not
  double-bill reasoning tokens.
- Token counting uses BeamWeaver's approximate fallback.

## Usage

```elixir
config :beam_weaver,
  zai: [
    api_key: System.fetch_env!("ZAI_API_KEY")
  ]
```

```elixir
{:ok, model} =
  BeamWeaver.Models.init_chat_model("zai:glm-5.2",
    reasoning_effort: :low,
    thinking: %{type: :enabled},
    max_output_tokens: 1_024
  )

{:ok, message} =
  BeamWeaver.Core.ChatModel.invoke(model, [
    BeamWeaver.Core.Message.user("Reply with a concise plan.")
  ])
```

Stream with usage and tool-call argument chunks:

```elixir
{:ok, message} =
  BeamWeaver.ZAI.ChatModel.stream_response(
    model,
    [BeamWeaver.Core.Message.user("Call get_weather for Tokyo.")],
    tools: [weather_tool],
    tool_choice: "auto",
    tool_stream: true
  )
```

## Profile

`zai:glm-5.2` is checked in with:

- 1,000,000 input tokens
- 131,072 maximum output tokens
- text input and output
- reasoning output
- function tools
- JSON object mode
- streaming
- usage metadata
- Chat Completions API only

GLM-5.2 cost metadata:

- input: `$1.40 / 1M tokens`
- cached input: `$0.26 / 1M tokens`
- output: `$4.40 / 1M tokens`

## Unsupported Z.ai Surfaces

- Additional Z.ai models are intentionally not routed yet.
- JSON Schema request mode is not enabled until live or documented support is
  clear for this endpoint.
- Z.ai media, built-in tools, and non-chat APIs are not exposed in BeamWeaver.

## Related Guides

- [Models](../models.md)
- [Prompt Caching](../prompt_caching.md#zai-glm)