# Custom LLM Endpoint Guide

This guide explains how to integrate custom LLM endpoints with GettextTranslator using LangChain 0.4.0.

## Table of Contents

- [Overview](#overview)
- [Supported Models](#supported-models)
- [Response Format Requirements](#response-format-requirements)
  - [Synchronous Response](#synchronous-non-streaming-response)
  - [Streaming Response](#streaming-response)
  - [Error Response](#error-response)
- [Custom Adapter Implementation](#custom-adapter-implementation)
- [Configuration](#configuration)
- [Testing](#testing-your-custom-endpoint)
- [Common Issues](#common-issues)
- [Examples](#examples)

## Overview

GettextTranslator uses LangChain to communicate with LLM providers. As of version 0.5.0, the library uses LangChain 0.4.0, which introduced breaking changes in how messages are structured.

**Key Change in LangChain 0.4.0:**
- Message `content` is now a **list of ContentPart structs** instead of plain strings
- All custom endpoints must return responses in this format

## Supported Models

LangChain 0.4.0 officially supports:

| Provider | Module | Status |
|----------|--------|--------|
| OpenAI | `LangChain.ChatModels.ChatOpenAI` | ✅ Fully Supported |
| Anthropic | `LangChain.ChatModels.ChatAnthropic` | ✅ Fully Supported |
| Google Gemini | `LangChain.ChatModels.ChatGoogleAI` | ✅ Fully Supported |
| Google Vertex AI | `LangChain.ChatModels.ChatVertexAI` | ✅ Fully Supported |
| Ollama | `LangChain.ChatModels.ChatOllamaAI` | ⚠️ May not work |
| Others | Custom implementation | ⚠️ Requires custom adapter |

**Important:** If you're using Ollama or other unsupported models, they may not function correctly with LangChain 0.4.0. Consider:
- Using GettextTranslator 0.4.5 (with LangChain 0.3.3), as sketched after this list
- Switching to a supported provider
- Implementing a custom adapter (see below)
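
If you choose the first option and stay on the older release, pinning it in `mix.exs` might look like this (a minimal sketch; adjust to your existing dependency list):

```elixir
# In mix.exs: pin the release that still ships with LangChain 0.3.3
defp deps do
  [
    {:gettext_translator, "~> 0.4.5"}
  ]
end
```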

## Response Format Requirements

### Synchronous (Non-Streaming) Response

Your LLM endpoint must return a tuple `{:ok, updated_chain}` where the chain contains the response message.

**Required Structure:**

```elixir
{:ok, %LangChain.Chains.LLMChain{
  last_message: %LangChain.Message{
    role: :assistant,
    content: [
      %LangChain.Message.ContentPart{
        type: :text,
        content: "Translated text here"
      }
    ],
    status: :complete,
    index: 0
  },
  messages: [
    # All messages in the conversation
  ]
}}
```

**Critical Requirements:**

1. **Content as List**: `content` MUST be a list of `ContentPart` structs
   ```elixir
   # ✅ CORRECT
   content: [%ContentPart{type: :text, content: "text"}]

   # ❌ WRONG
   content: "text"
   ```

2. **ContentPart Structure**: Each part must have `type` and `content`
   ```elixir
   %LangChain.Message.ContentPart{
     type: :text,        # Required: :text, :image, :tool_call, etc.
     content: "string"   # Required: the actual content
   }
   ```

3. **Message Fields**:
   - `role`: Must be `:assistant` for LLM responses
   - `status`: Should be `:complete` when done
   - `content`: List of ContentPart structs (never a string)
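
A small helper that satisfies all three requirements can live in your adapter module. The function name below is illustrative, not part of any library API:

```elixir
alias LangChain.Message
alias LangChain.Message.ContentPart

# Wraps a plain response string into the required assistant-message shape.
defp to_assistant_message(text) when is_binary(text) do
  %Message{
    role: :assistant,
    status: :complete,
    content: [%ContentPart{type: :text, content: text}]
  }
end
```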

### Streaming Response

For streaming responses, emit deltas via callbacks.

**Delta Structure:**

```elixir
# Each delta emitted
%LangChain.MessageDelta{
  role: :assistant,
  content: [
    %LangChain.Message.ContentPart{
      type: :text,
      content: "Partial text chunk"
    }
  ],
  status: :incomplete  # or :complete for the final delta
}
```

**Streaming Requirements:**

1. The `on_llm_new_delta` callback receives a **list of deltas**:
   ```elixir
   def handle_delta(deltas) when is_list(deltas) do
     # Process list of MessageDelta structs
   end
   ```

2. Merge deltas using `LLMChain.merge_deltas/2`:
   ```elixir
   updated_chain = LLMChain.merge_deltas(current_chain, deltas)
   ```

3. Access merged content via `MessageDelta.merged_content`:
   ```elixir
   text = updated_chain.delta.merged_content
   ```

4. Final delta must have `status: :complete`
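
Putting these requirements together, a delta callback might look like the sketch below. The `handle_delta/2` name and the `%{chain: ...}` state shape are illustrative; the LangChain calls are the ones listed above:

```elixir
alias LangChain.Chains.LLMChain

# Receives the list of deltas from on_llm_new_delta, merges them into the
# chain, and reads back the text accumulated so far.
def handle_delta(deltas, %{chain: chain} = state) when is_list(deltas) do
  updated_chain = LLMChain.merge_deltas(chain, deltas)

  # Partial text so far; complete once the final delta arrives with status: :complete
  _text_so_far = updated_chain.delta && updated_chain.delta.merged_content

  %{state | chain: updated_chain}
end
```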

### Error Response

On error, return a three-element tuple:

```elixir
{:error, updated_chain, reason}
```

**Components:**
- `updated_chain`: The chain state at the time of error (can be `nil`)
- `reason`: Error description (string, atom, or structured error)

**Example:**

```elixir
{:error, chain, "API rate limit exceeded"}
```

GettextTranslator handles errors gracefully:
- Logs the error with context
- Returns an empty translation `{:ok, ""}`
- Allows the translation process to continue
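
Conceptually, that handling boils down to a pattern match like the sketch below (illustrative only; `run_result` stands for whatever the chain run returned, and the real logic lives inside GettextTranslator):

```elixir
require Logger

alias LangChain.Chains.LLMChain
alias LangChain.Message
alias LangChain.Message.ContentPart

case run_result do
  {:ok, %LLMChain{last_message: %Message{content: parts}}} ->
    {:ok, ContentPart.parts_to_string(parts)}

  {:error, _chain, reason} ->
    Logger.error("LLM endpoint error: #{inspect(reason)}")
    # Return an empty translation so the overall run can continue
    {:ok, ""}
end
```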

## Custom Adapter Implementation

### Minimal Adapter Example

```elixir
defmodule MyApp.CustomLLMAdapter do
  @moduledoc """
  Custom LLM adapter for GettextTranslator using LangChain 0.4.0.
  """

  use LangChain.ChatModels.ChatModel

  alias LangChain.Message
  alias LangChain.Message.ContentPart

  defstruct [
    :model,
    :temperature,
    :endpoint,
    :api_key
  ]

  @type t :: %__MODULE__{
    model: String.t(),
    temperature: float(),
    endpoint: String.t(),
    api_key: String.t()
  }

  @doc """
  Creates a new instance of the custom LLM adapter.

  ## Examples

      iex> MyApp.CustomLLMAdapter.new!(%{
      ...>   model: "custom-model-v1",
      ...>   temperature: 0.0,
      ...>   endpoint: "https://api.example.com/v1/chat",
      ...>   api_key: "sk-..."
      ...> })
  """
  @impl true
  def new(attrs \\ %{}) do
    %__MODULE__{
      model: attrs[:model] || "default-model",
      temperature: attrs[:temperature] || 0.7,
      endpoint: attrs[:endpoint] || "https://api.example.com/v1/chat",
      api_key: attrs[:api_key]
    }
    |> validate()
  end

  @impl true
  def new!(attrs \\ %{}) do
    case new(attrs) do
      {:ok, adapter} -> adapter
      {:error, reason} -> raise ArgumentError, reason
    end
  end

  defp validate(adapter) do
    cond do
      is_nil(adapter.api_key) ->
        {:error, "API key is required"}

      is_nil(adapter.endpoint) ->
        {:error, "Endpoint URL is required"}

      true ->
        {:ok, adapter}
    end
  end

  @doc """
  Sends messages to the LLM and returns the response.

  This is the main function called by LangChain to get completions.
  """
  @impl true
  def call(adapter, messages, _functions \\ []) do
    # Build the request payload
    payload = build_payload(adapter, messages)

    # Make the API request
    case make_api_request(adapter, payload) do
      {:ok, response_text} ->
        # Convert to LangChain 0.4.0 format
        message = %Message{
          role: :assistant,
          content: [
            %ContentPart{
              type: :text,
              content: response_text
            }
          ],
          status: :complete
        }

        {:ok, message}

      {:error, reason} ->
        {:error, nil, reason}
    end
  end

  defp build_payload(adapter, messages) do
    %{
      model: adapter.model,
      temperature: adapter.temperature,
      messages: Enum.map(messages, &message_to_api_format/1)
    }
  end

  defp message_to_api_format(%Message{role: role, content: content}) do
    %{
      role: to_string(role),
      content: ContentPart.parts_to_string(content)
    }
  end

  defp make_api_request(adapter, payload) do
    headers = [
      {"Authorization", "Bearer #{adapter.api_key}"},
      {"Content-Type", "application/json"}
    ]

    body = Jason.encode!(payload)

    case HTTPoison.post(adapter.endpoint, body, headers) do
      {:ok, %{status_code: 200, body: response_body}} ->
        case Jason.decode(response_body) do
          {:ok, %{"choices" => [%{"message" => %{"content" => content}} | _]}} ->
            {:ok, content}

          {:error, _} = error ->
            error
        end

      {:ok, %{status_code: status_code, body: body}} ->
        {:error, "API returned status #{status_code}: #{body}"}

      {:error, %HTTPoison.Error{reason: reason}} ->
        {:error, "HTTP request failed: #{inspect(reason)}"}
    end
  end
end
```
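
Once the adapter compiles, you can exercise it directly from IEx, bypassing the chain. The endpoint, model, and environment variable are the placeholders from the example above:

```elixir
alias LangChain.Message
alias LangChain.Message.ContentPart

adapter =
  MyApp.CustomLLMAdapter.new!(%{
    model: "custom-model-v1",
    endpoint: "https://api.example.com/v1/chat",
    api_key: System.get_env("CUSTOM_LLM_API_KEY")
  })

{:ok, %Message{content: parts}} =
  MyApp.CustomLLMAdapter.call(adapter, [Message.new_user!("Translate 'hello' to Spanish")])

IO.puts(ContentPart.parts_to_string(parts))
```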

### Streaming Adapter Example

```elixir
defmodule MyApp.StreamingLLMAdapter do
  use LangChain.ChatModels.ChatModel

  alias LangChain.Message
  alias LangChain.Message.ContentPart
  alias LangChain.MessageDelta
  alias LangChain.Chains.LLMChain

  # ... struct and new/new! implementations ...

  @impl true
  def call(adapter, messages, _functions \\ []) do
    # For streaming, we need to handle Server-Sent Events (SSE)
    payload = build_payload(adapter, messages, stream: true)

    # Initialize the delta accumulator
    delta_acc = %MessageDelta{
      role: :assistant,
      content: [],
      status: :incomplete
    }

    case stream_api_request(adapter, payload, delta_acc) do
      {:ok, final_delta} ->
        # Convert final delta to message
        message = MessageDelta.to_message(final_delta)
        {:ok, message}

      {:error, reason} ->
        {:error, nil, reason}
    end
  end

  defp stream_api_request(adapter, payload, delta_acc) do
    headers = [
      {"Authorization", "Bearer #{adapter.api_key}"},
      {"Content-Type", "application/json"},
      {"Accept", "text/event-stream"}
    ]

    body = Jason.encode!(payload)

    # Use streaming HTTP client
    case HTTPoison.post(adapter.endpoint, body, headers, stream_to: self(), async: :once) do
      {:ok, %HTTPoison.AsyncResponse{id: ref}} ->
        receive_stream(ref, delta_acc)

      {:error, reason} ->
        {:error, reason}
    end
  end

  defp receive_stream(ref, delta_acc) do
    receive do
      %HTTPoison.AsyncChunk{id: ^ref, chunk: chunk} ->
        # Parse SSE chunk
        case parse_sse_chunk(chunk) do
          {:ok, content_chunk} ->
            # Create a delta for this chunk
            new_delta = %MessageDelta{
              role: :assistant,
              content: [%ContentPart{type: :text, content: content_chunk}],
              status: :incomplete
            }

            # Merge with accumulator
            updated_acc = merge_message_deltas(delta_acc, new_delta)

            # Request next chunk
            HTTPoison.stream_next(%HTTPoison.AsyncResponse{id: ref})
            receive_stream(ref, updated_acc)

          :done ->
            # Stream complete
            {:ok, %{delta_acc | status: :complete}}

          {:error, reason} ->
            {:error, reason}
        end

      %HTTPoison.AsyncEnd{id: ^ref} ->
        {:ok, %{delta_acc | status: :complete}}

      %HTTPoison.Error{id: ^ref, reason: reason} ->
        {:error, reason}
    after
      30_000 ->
        {:error, "Stream timeout"}
    end
  end

  defp parse_sse_chunk(chunk) do
    # Parse Server-Sent Events format
    # Example: "data: {\"choices\":[{\"delta\":{\"content\":\"text\"}}]}\n\n"
    case String.trim(chunk) do
      "data: [DONE]" ->
        :done

      "data: " <> json_data ->
        case Jason.decode(json_data) do
          {:ok, %{"choices" => [%{"delta" => %{"content" => content}} | _]}} ->
            {:ok, content}

          _ ->
            {:ok, ""}
        end

      _ ->
        {:ok, ""}
    end
  end

  defp merge_message_deltas(acc, new_delta) do
    # Merge content lists
    merged_content = acc.content ++ new_delta.content
    %{acc | content: merged_content}
  end
end
```

## Configuration

### Basic Configuration

```elixir
# config/config.exs
config :gettext_translator, GettextTranslator,
  endpoint: MyApp.CustomLLMAdapter,
  endpoint_model: "custom-model-v1",
  endpoint_temperature: 0,
  endpoint_config: %{
    "api_key" => System.get_env("CUSTOM_LLM_API_KEY"),
    "endpoint" => "https://api.example.com/v1/chat"
  },
  persona: "You are a professional translator. Translate accurately while preserving meaning and length.",
  style: "Casual, using simple language",
  ignored_languages: ["en"]
```

### Runtime Configuration

The `endpoint_config` map is applied at runtime. GettextTranslator converts each config key into an entry in the `:langchain` application environment:

```elixir
# This config map:
endpoint_config: %{
  "api_key" => "sk-...",
  "custom_setting" => "value"
}

# Becomes:
Application.put_env(:langchain, :api_key, "sk-...")
Application.put_env(:langchain, :custom_setting, "value")
```

This allows you to configure any LangChain-compatible adapter without code changes.
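
A minimal sketch of that conversion (illustrative; the actual implementation lives inside GettextTranslator):

```elixir
# Each "string_key" => value pair becomes an atom key in the :langchain app env.
Enum.each(endpoint_config, fn {key, value} ->
  Application.put_env(:langchain, String.to_atom(key), value)
end)
```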

### Environment-Specific Configuration

```elixir
# config/dev.exs
config :gettext_translator, GettextTranslator,
  endpoint: LangChain.ChatModels.ChatOllamaAI,
  endpoint_model: "llama3.2:latest",
  endpoint_temperature: 0,
  endpoint_config: %{},  # Local Ollama, no config needed
  persona: "You are a professional translator.",
  style: "Casual",
  ignored_languages: ["en"]

# config/prod.exs
config :gettext_translator, GettextTranslator,
  endpoint: LangChain.ChatModels.ChatOpenAI,
  endpoint_model: "gpt-4",
  endpoint_temperature: 0,
  endpoint_config: %{
    "openai_key" => System.get_env("OPENAI_API_KEY")
  },
  persona: "You are a professional translator.",
  style: "Casual",
  ignored_languages: ["en"]
```

## Testing Your Custom Endpoint

### 1. Unit Test Your Adapter

```elixir
# test/my_app/custom_llm_adapter_test.exs
defmodule MyApp.CustomLLMAdapterTest do
  use ExUnit.Case

  alias MyApp.CustomLLMAdapter
  alias LangChain.Message
  alias LangChain.Message.ContentPart

  test "new/1 creates adapter with valid config" do
    {:ok, adapter} = CustomLLMAdapter.new(%{
      model: "test-model",
      api_key: "test-key",
      endpoint: "https://test.com"
    })

    assert adapter.model == "test-model"
    assert adapter.api_key == "test-key"
  end

  test "new/1 validates required fields" do
    assert {:error, _} = CustomLLMAdapter.new(%{model: "test"})
  end

  test "call/2 returns properly formatted message" do
    adapter = CustomLLMAdapter.new!(%{
      model: "test-model",
      api_key: "test-key",
      endpoint: "https://test.com"
    })

    messages = [
      Message.new_user!("Translate 'hello' to Spanish")
    ]

    # Mock the API response
    # ... your mocking logic ...

    assert {:ok, response} = CustomLLMAdapter.call(adapter, messages)
    assert %Message{} = response
    assert response.role == :assistant
    assert is_list(response.content)
    assert [%ContentPart{type: :text, content: text}] = response.content
    assert is_binary(text)
  end
end
```
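
If your adapter calls the endpoint over HTTP as in the minimal example, one way to fill in the mocking step is to stand up a local stub with the `bypass` test dependency (an assumption; any HTTP mocking approach works) and point the adapter at it:

```elixir
setup do
  bypass = Bypass.open()
  {:ok, bypass: bypass}
end

test "call/2 parses an OpenAI-style response", %{bypass: bypass} do
  Bypass.expect_once(bypass, "POST", "/v1/chat", fn conn ->
    body = Jason.encode!(%{"choices" => [%{"message" => %{"content" => "hola"}}]})
    Plug.Conn.resp(conn, 200, body)
  end)

  adapter =
    CustomLLMAdapter.new!(%{
      model: "test-model",
      api_key: "test-key",
      endpoint: "http://localhost:#{bypass.port}/v1/chat"
    })

  messages = [Message.new_user!("Translate 'hello' to Spanish")]

  assert {:ok, %Message{content: [%ContentPart{type: :text, content: "hola"}]}} =
           CustomLLMAdapter.call(adapter, messages)
end
```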

### 2. Integration Test with GettextTranslator

```elixir
# test/integration/translation_test.exs
defmodule GettextTranslatorIntegrationTest do
  use ExUnit.Case

  test "translates with custom adapter" do
    provider = %{
      ignored_languages: ["en"],
      persona: "Professional translator",
      style: "Casual",
      endpoint: %{
        config: %{
          "api_key" => "test-key"
        },
        adapter: MyApp.CustomLLMAdapter,
        model: "test-model",
        temperature: 0
      }
    }

    opts = %{
      language_code: "es",
      message: "Hello, world!"
    }

    assert {:ok, translation} = GettextTranslator.Processor.LLM.translate(provider, opts)
    assert is_binary(translation)
    assert translation != ""
  end
end
```

### 3. Manual Testing

```bash
# Run translation with your custom adapter
mix gettext_translator.run

# Check logs for errors
tail -f log/dev.log | grep -i "error\|translat"
```

### 4. Verify Response Format

Add logging to verify your adapter returns the correct format:

```elixir
require Logger

def call(adapter, messages, _functions) do
  payload = build_payload(adapter, messages)

  case make_api_request(adapter, payload) do
    {:ok, response_text} ->
      message = %Message{
        role: :assistant,
        content: [%ContentPart{type: :text, content: response_text}],
        status: :complete
      }

      # Debug logging
      Logger.debug("Adapter response: #{inspect(message)}")

      {:ok, message}

    {:error, reason} ->
      Logger.error("Adapter error: #{inspect(reason)}")
      {:error, nil, reason}
  end
end
```

## Common Issues

### Issue 1: "content is not a list"

**Error:**
```
** (FunctionClauseError) no function clause matching in ContentPart.parts_to_string/1
```

**Cause:** Your adapter is returning `content` as a string instead of a list of ContentPart structs.

**Solution:**
```elixir
# ❌ WRONG
content: "translated text"

# ✅ CORRECT
content: [%ContentPart{type: :text, content: "translated text"}]
```

### Issue 2: "undefined function ContentPart.new/1"

**Error:**
```
** (UndefinedFunctionError) function LangChain.Message.ContentPart.new/1 is undefined
```

**Cause:** Trying to use `ContentPart.new/1` which doesn't exist.

**Solution:** Use struct syntax instead:
```elixir
# ❌ WRONG
ContentPart.new(%{type: :text, content: "text"})

# ✅ CORRECT
%ContentPart{type: :text, content: "text"}
```

### Issue 3: "pattern match failed on {:ok, result}"

**Error:**
```
** (MatchError) no match of right hand side value: {:ok, %Message{...}}
```

**Cause:** GettextTranslator expects `{:ok, %{last_message: %Message{}}}` but your adapter returns `{:ok, %Message{}}`.

**Solution:** Make sure your LLMChain implementation wraps the message properly:
```elixir
# Your adapter's call/2 should return:
{:ok, %Message{...}}

# LLMChain will wrap it as:
{:ok, %LLMChain{last_message: %Message{...}}}
```
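
In practice this wrapping happens when the adapter is invoked through `LLMChain` rather than called directly. A rough sketch, assuming standard LLMChain usage and that `adapter` is the struct returned by your `new!/1`:

```elixir
alias LangChain.Chains.LLMChain
alias LangChain.Message

{:ok, chain} =
  %{llm: adapter}
  |> LLMChain.new!()
  |> LLMChain.add_message(Message.new_user!("Translate 'hello' to Spanish"))
  |> LLMChain.run()

# chain.last_message is the %Message{} your adapter produced, now wrapped in the chain
chain.last_message
```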

### Issue 4: Empty translations returned

**Symptoms:** Translations complete but return empty strings.

**Possible Causes:**
1. API errors being silently caught
2. Response parsing errors
3. Content extraction failing

**Debug Steps:**
```elixir
# Add detailed logging in your adapter
require Logger

def call(adapter, messages, _functions) do
  Logger.debug("Sending messages: #{inspect(messages)}")

  payload = build_payload(adapter, messages)

  case make_api_request(adapter, payload) do
    {:ok, response_text} ->
      Logger.debug("Received response: #{inspect(response_text)}")
      # ... rest of code

    {:error, reason} ->
      Logger.error("API error: #{inspect(reason)}")
      {:error, nil, reason}
  end
end
```

### Issue 5: Streaming not working

**Symptoms:** Streaming responses timeout or fail.

**Checklist:**
- [ ] Endpoint supports Server-Sent Events (SSE)
- [ ] `Accept: text/event-stream` header is set
- [ ] SSE parsing handles `data:` prefix correctly
- [ ] Stream timeout is sufficient (30+ seconds)
- [ ] Deltas are properly merged
- [ ] Final delta has `status: :complete`

## Examples

### Example 1: OpenAI-Compatible Endpoint

Many providers offer OpenAI-compatible APIs. You can use them with the built-in adapter:

```elixir
config :gettext_translator, GettextTranslator,
  endpoint: LangChain.ChatModels.ChatOpenAI,
  endpoint_model: "your-model-name",
  endpoint_temperature: 0,
  endpoint_config: %{
    "openai_key" => System.get_env("API_KEY"),
    "openai_endpoint" => "https://your-provider.com/v1/chat/completions"
  },
  persona: "Professional translator",
  style: "Casual",
  ignored_languages: ["en"]
```

### Example 2: Azure OpenAI

```elixir
config :gettext_translator, GettextTranslator,
  endpoint: LangChain.ChatModels.ChatOpenAI,
  endpoint_model: "gpt-4",
  endpoint_temperature: 0,
  endpoint_config: %{
    "openai_key" => System.get_env("AZURE_OPENAI_KEY"),
    "openai_endpoint" => "https://your-resource.openai.azure.com/openai/deployments/your-deployment/chat/completions?api-version=2023-05-15"
  },
  persona: "Professional translator",
  style: "Casual",
  ignored_languages: ["en"]
```

### Example 3: Local LLM with Ollama

```elixir
# Note: Ollama support may be limited in LangChain 0.4.0
config :gettext_translator, GettextTranslator,
  endpoint: LangChain.ChatModels.ChatOllamaAI,
  endpoint_model: "llama3.2:latest",
  endpoint_temperature: 0,
  endpoint_config: %{
    # Empty if using default local endpoint
  },
  persona: "Professional translator",
  style: "Casual",
  ignored_languages: ["en"]
```

### Example 4: Multiple Providers (Environment-Based)

```elixir
# config/config.exs
# Helper functions can't be defined in a config file, so select the provider
# with a plain case expression instead:
{endpoint_module, endpoint_config} =
  case System.get_env("LLM_PROVIDER", "ollama") do
    "openai" ->
      {LangChain.ChatModels.ChatOpenAI, %{"openai_key" => System.get_env("OPENAI_API_KEY")}}

    "anthropic" ->
      {LangChain.ChatModels.ChatAnthropic, %{"anthropic_key" => System.get_env("ANTHROPIC_API_KEY")}}

    "gemini" ->
      {LangChain.ChatModels.ChatGoogleAI, %{"google_ai_key" => System.get_env("GOOGLE_AI_KEY")}}

    "ollama" ->
      {LangChain.ChatModels.ChatOllamaAI, %{}}

    _ ->
      {LangChain.ChatModels.ChatOpenAI, %{}}
  end

config :gettext_translator, GettextTranslator,
  endpoint: endpoint_module,
  endpoint_model: System.get_env("LLM_MODEL", "llama3.2:latest"),
  endpoint_temperature: 0,
  endpoint_config: endpoint_config,
  persona: "Professional translator",
  style: "Casual",
  ignored_languages: ["en"]
```

## Building Your Own LLM Gateway

If you're running LLMs locally (e.g., in Docker) and want to provide an HTTP API that works with LangChain:

**📘 See [LLM_GATEWAY_EXAMPLE.md](LLM_GATEWAY_EXAMPLE.md)** for a complete, production-ready Elixir implementation that includes:

- OpenAI-compatible HTTP API endpoint
- Queue system with GenStage for backpressure control
- Support for multiple LLM backends (Ollama, vLLM, TGI)
- API key authentication and rate limiting
- Both streaming (SSE) and synchronous responses
- Complete Docker deployment setup

This gateway sits between LangChain clients and your local LLM, handling queuing, authentication, and protocol translation.

## Additional Resources

- [LangChain Elixir Documentation](https://hexdocs.pm/langchain/)
- [LangChain 0.4.0 Changelog](https://hexdocs.pm/langchain/changelog.html)
- [GettextTranslator Repository](https://github.com/marmend-company/gettext_translator)
- [OpenAI API Documentation](https://platform.openai.com/docs/api-reference)
- [Anthropic API Documentation](https://docs.anthropic.com/claude/reference/)
- [Ollama API Documentation](https://github.com/ollama/ollama/blob/main/docs/api.md)
- [vLLM Documentation](https://docs.vllm.ai/)
- [Text Generation Inference](https://github.com/huggingface/text-generation-inference)

## Support

If you encounter issues with custom endpoints:

1. Check this guide for common issues
2. Enable debug logging in your adapter
3. Verify response format matches LangChain 0.4.0 requirements
4. Open an issue on [GitHub](https://github.com/marmend-company/gettext_translator/issues) with:
   - Your adapter code (sanitized)
   - Error messages and stack traces
   - LangChain version
   - Example request/response payloads