# AcceptLanguage

[![CI](https://github.com/cyril/accept_language.ex/actions/workflows/elixir.yml/badge.svg?branch=main)](https://github.com/cyril/accept_language.ex/actions)
[![Hex Version](https://img.shields.io/hexpm/v/accept_language.svg)](https://hex.pm/packages/accept_language)
[![Hex Docs](https://img.shields.io/badge/hex-docs-lightgreen.svg)](https://hexdocs.pm/accept_language/)
[![Elixir](https://img.shields.io/badge/elixir-~>_1.14-blueviolet.svg)](https://elixir-lang.org/)
[![License](https://img.shields.io/hexpm/l/accept_language.svg)](https://github.com/cyril/accept_language.ex/blob/main/LICENSE)

A lightweight, zero-dependency Elixir library for parsing the `Accept-Language` HTTP header field.

This implementation conforms to:

- [RFC 7231 Section 5.3.5](https://www.rfc-editor.org/rfc/rfc7231#section-5.3.5) — Accept-Language header field definition
- [RFC 7231 Section 5.3.1](https://www.rfc-editor.org/rfc/rfc7231#section-5.3.1) — Quality values syntax
- [RFC 4647 Section 3.3.1](https://www.rfc-editor.org/rfc/rfc4647#section-3.3.1) — Basic Filtering matching scheme
- [BCP 47](https://www.rfc-editor.org/info/bcp47) — Tags for Identifying Languages

> **Note**
> RFC 7231 obsoletes [RFC 2616](https://www.rfc-editor.org/rfc/rfc2616) (the original HTTP/1.1 specification). The `Accept-Language` header behavior defined in RFC 2616 Section 14.4 remains unchanged in RFC 7231, ensuring full backward compatibility.

## Installation

Add `accept_language` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:accept_language, "~> 0.1.0"}
  ]
end
```

## Usage

```elixir
AcceptLanguage.negotiate("da, en-GB;q=0.8, en;q=0.7", [:en, :da])
# => :da
```

## Behavior

### Quality values

Quality values (q-values) indicate relative preference, ranging from `0` (not acceptable) to `1` (most preferred). When omitted, the default is `1`.

Per RFC 7231 Section 5.3.1, valid q-values have at most three decimal places: `0`, `0.7`, `0.85`, `1.000`. Invalid q-values cause the associated language range to be ignored.

```elixir
AcceptLanguage.negotiate("da, en-GB;q=0.8, en;q=0.7", [:en, :da])
# => :da       (q=1 beats q=0.8)

AcceptLanguage.negotiate("da, en-GB;q=0.8, en;q=0.7", [:en, :"en-GB"])
# => :"en-GB"  (q=0.8 beats q=0.7)

AcceptLanguage.negotiate("da, en-GB;q=0.8, en;q=0.7", [:ja])
# => nil       (no match)
```
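
The q-value grammar described above can be checked with a short regular expression. The following is a minimal sketch for illustration only; the module `QValueSketch` and its `valid?/1` function are hypothetical names and are not part of this library's API, and the library's internal validation may differ.

```elixir
defmodule QValueSketch do
  # Hypothetical helper, not part of AcceptLanguage.
  # Mirrors the RFC 7231 Section 5.3.1 grammar:
  #   qvalue = ( "0" [ "." 0*3DIGIT ] ) / ( "1" [ "." 0*3("0") ] )
  def valid?(q) when is_binary(q) do
    Regex.match?(~r/\A(?:0(?:\.[0-9]{0,3})?|1(?:\.0{0,3})?)\z/, q)
  end
end

QValueSketch.valid?("0.85")   # => true
QValueSketch.valid?("1.000")  # => true
QValueSketch.valid?("0.8555") # => false (four decimal places)
QValueSketch.valid?("1.5")    # => false (greater than 1)
```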

### Declaration order

When multiple languages share the same q-value, declaration order in the header determines priority—the first declared language wins:

```elixir
AcceptLanguage.negotiate("en;q=0.8, fr;q=0.8", [:en, :fr])
# => :en  (declared first)

AcceptLanguage.negotiate("fr;q=0.8, en;q=0.8", [:en, :fr])
# => :fr  (declared first)
```
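
One way to obtain this ordering is a stable sort on the q-value in descending order, so that ranges with equal q-values keep their header order. The sketch below assumes the header has already been parsed into `{range, q}` tuples, with `q` stored as integer thousandths to avoid float comparisons; `OrderSketch` is a hypothetical module, not the library's implementation.

```elixir
defmodule OrderSketch do
  # Hypothetical helper, not part of AcceptLanguage.
  # `ranges` is a list of {range, q} tuples in header order,
  # where q is an integer from 0 to 1000 (thousandths).
  def rank(ranges) do
    ranges
    # Enum.sort_by/3 with :desc keeps equal elements in their original order.
    |> Enum.sort_by(fn {_range, q} -> q end, :desc)
    |> Enum.map(fn {range, _q} -> range end)
  end
end

OrderSketch.rank([{"en", 800}, {"de", 500}, {"fr", 800}])
# => ["en", "fr", "de"]
```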

### Basic Filtering

This library implements the Basic Filtering matching scheme defined in RFC 4647 Section 3.3.1. A language range matches a language tag if, in a case-insensitive comparison, it exactly equals the tag, or if it exactly equals a prefix of the tag such that the first character following the prefix is `-`.

```elixir
AcceptLanguage.negotiate("de-de", [:"de-DE-1996"])
# => :"de-DE-1996"  (prefix match)

AcceptLanguage.negotiate("de-de", [:"de-Deva"])
# => nil  ("de-de" is not followed by "-" in "de-Deva")

AcceptLanguage.negotiate("de-de", [:"de-Latn-DE"])
# => nil  ("de-de" is not a prefix of "de-Latn-DE")
```

Prefix matching respects hyphen boundaries:

```elixir
AcceptLanguage.negotiate("zh", [:"zh-TW"])
# => :"zh-TW"  ("zh" matches "zh-TW")

AcceptLanguage.negotiate("zh", [:zhx])
# => nil  ("zh" does not match "zhx" — different language code)

AcceptLanguage.negotiate("zh-TW", [:zh])
# => nil  (more specific range does not match less specific tag)
```
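
The matching rule itself is small enough to sketch in a few lines. The example below is an illustration of the RFC 4647 Section 3.3.1 rule using plain strings; `BasicFilterSketch`, `matches?/2`, and `filter/2` are hypothetical names and do not reflect how this library is organized internally. Wildcards are left out here.

```elixir
defmodule BasicFilterSketch do
  # Hypothetical helper, not part of AcceptLanguage.
  # A range matches a tag when, ignoring case, it equals the tag or equals
  # a prefix of the tag that is immediately followed by "-".
  def matches?(range, tag) when is_binary(range) and is_binary(tag) do
    range = String.downcase(range)
    tag = String.downcase(tag)
    tag == range or String.starts_with?(tag, range <> "-")
  end

  # Returns the tags selected by the range, keeping their original case.
  def filter(range, tags), do: Enum.filter(tags, &matches?(range, &1))
end

BasicFilterSketch.matches?("de-de", "de-DE-1996") # => true
BasicFilterSketch.matches?("de-de", "de-Deva")    # => false
BasicFilterSketch.matches?("zh", "zhx")           # => false
BasicFilterSketch.filter("en", ["EN-GB", "fr", "en"])
# => ["EN-GB", "en"]
```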

### Wildcards

The wildcard `*` matches any language not matched by another range in the header. This behavior is specific to HTTP, as noted in RFC 4647 Section 3.3.1.

```elixir
AcceptLanguage.negotiate("de, *;q=0.5", [:ja])
# => :ja  (matched by wildcard)

AcceptLanguage.negotiate("de, *;q=0.5", [:de, :ja])
# => :de  (explicit match takes precedence)
```

### Exclusions

A q-value of `0` explicitly marks a language as not acceptable:

```elixir
AcceptLanguage.negotiate("*, en;q=0", [:en])
# => nil  (English explicitly excluded)

AcceptLanguage.negotiate("*, en;q=0", [:ja])
# => :ja  (Japanese matched by wildcard)
```

Exclusions apply via prefix matching:

```elixir
AcceptLanguage.negotiate("*, en;q=0", [:"en-GB"])
# => nil  (en-GB excluded via "en" prefix)
```
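
To show how wildcards and exclusions can interact, here is a rough sketch of a selection step under a few assumptions: ranges are already parsed into `{range, q}` tuples with `q` as integer thousandths, tags are plain strings rather than atoms, and exclusions are applied before any matching. `NegotiateSketch.pick/2` is a hypothetical function; the library's actual algorithm may handle edge cases differently.

```elixir
defmodule NegotiateSketch do
  # Hypothetical helper, not part of AcceptLanguage.
  # `ranges`: list of {range, q} in header order, q in 0..1000 (thousandths).
  # `available`: list of tag strings, in the application's preferred order.
  def pick(ranges, available) do
    explicit = for {range, _q} <- ranges, range != "*", do: range
    excluded = for {range, 0} <- ranges, range != "*", do: range

    # Drop every available tag covered by an explicit q=0 range.
    candidates =
      Enum.reject(available, fn tag -> Enum.any?(excluded, &matches?(&1, tag)) end)

    ranges
    |> Enum.reject(fn {_range, q} -> q == 0 end)
    |> Enum.sort_by(fn {_range, q} -> q end, :desc)
    |> Enum.find_value(fn
      # The wildcard only covers tags that no explicit range matches.
      {"*", _q} ->
        Enum.find(candidates, fn tag -> not Enum.any?(explicit, &matches?(&1, tag)) end)

      {range, _q} ->
        Enum.find(candidates, &matches?(range, &1))
    end)
  end

  # Basic Filtering: case-insensitive equality or prefix followed by "-".
  defp matches?(range, tag) do
    range = String.downcase(range)
    tag = String.downcase(tag)
    tag == range or String.starts_with?(tag, range <> "-")
  end
end

NegotiateSketch.pick([{"*", 1000}, {"en", 0}], ["en-GB"])     # => nil
NegotiateSketch.pick([{"*", 1000}, {"en", 0}], ["ja"])        # => "ja"
NegotiateSketch.pick([{"de", 1000}, {"*", 500}], ["de", "ja"]) # => "de"
```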

### Case insensitivity

Matching is case-insensitive per RFC 4647 Section 2, but the original case of available language tags is preserved in the return value:

```elixir
AcceptLanguage.negotiate("EN-GB", [:"en-gb"])
# => :"en-gb"

AcceptLanguage.negotiate("en-gb", [:"EN-GB"])
# => :"EN-GB"
```

### Defensive limits

To prevent denial-of-service via adversarial headers, the parser enforces two limits:

- **Field size**: headers exceeding 4096 bytes are treated as absent (returns `nil`)
- **Range count**: at most 50 language ranges are processed; any beyond this are silently discarded

These thresholds are well above real-world usage (browsers typically send 2–10 ranges in under 200 bytes) and should not affect legitimate traffic.
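
As an illustration of how such limits might be applied before any parsing work is done, consider the sketch below. The module `LimitsSketch` and the function `split_ranges/1` are hypothetical; only the two threshold values come from the description above, and the library's own parser may enforce them differently.

```elixir
defmodule LimitsSketch do
  # Hypothetical helper, not part of AcceptLanguage.
  @max_field_size 4096
  @max_ranges 50

  # Oversized headers are treated as if no header were sent.
  def split_ranges(header) when is_binary(header) and byte_size(header) > @max_field_size,
    do: []

  def split_ranges(header) when is_binary(header) do
    header
    |> String.split(",", trim: true)
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
    # Anything past the first 50 ranges is silently dropped.
    |> Enum.take(@max_ranges)
  end
end

LimitsSketch.split_ranges("da, en-GB;q=0.8")
# => ["da", "en-GB;q=0.8"]
```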

### BCP 47 language tags

The library supports BCP 47 language tags, including script, region, and variant subtags:

```elixir
# Script subtags
AcceptLanguage.negotiate("zh-Hant", [:"zh-Hant-TW", :"zh-Hans-CN"])
# => :"zh-Hant-TW"

# Variant subtags
AcceptLanguage.negotiate("de-1996, de;q=0.9", [:"de-CH-1996", :"de-CH"])
# => :"de-CH-1996"  (matched via "de"; "de-1996" is not a prefix of "de-CH-1996")
```

## Integration examples

### Plug

```elixir
defmodule MyApp.Plug.Locale do
  @behaviour Plug

  @available_locales [:en, :fr, :de]
  @default_locale :en

  @impl true
  def init(opts), do: opts

  @impl true
  def call(conn, _opts) do
    locale =
      conn
      |> Plug.Conn.get_req_header("accept-language")
      |> List.first()
      |> AcceptLanguage.negotiate(@available_locales)
      |> Kernel.||(@default_locale)

    Plug.Conn.assign(conn, :locale, locale)
  end
end
```

### Phoenix

```elixir
defmodule MyAppWeb.SetLocalePlug do
  import Plug.Conn

  @available_locales Gettext.known_locales(MyAppWeb.Gettext) |> Enum.map(&String.to_atom/1)
  @default_locale Gettext.get_locale(MyAppWeb.Gettext) |> String.to_atom()

  def init(opts), do: opts

  def call(conn, _opts) do
    locale =
      conn
      |> get_req_header("accept-language")
      |> List.first()
      |> AcceptLanguage.negotiate(@available_locales)
      |> Kernel.||(@default_locale)

    Gettext.put_locale(MyAppWeb.Gettext, Atom.to_string(locale))
    assign(conn, :locale, locale)
  end
end
```

## Standards compliance

### Supported specifications

| Specification | Description | Status |
|---------------|-------------|--------|
| RFC 7231 §5.3.5 | Accept-Language header field | ✅ Supported |
| RFC 7231 §5.3.1 | Quality values (qvalues) | ✅ Supported |
| RFC 4647 §2.1 | Basic Language Range syntax | ✅ Supported |
| RFC 4647 §3.3.1 | Basic Filtering scheme | ✅ Supported |
| RFC 7230 §3.2.3 | OWS (optional whitespace) handling | ✅ Supported |
| BCP 47 | Language tag structure | ✅ Supported |

### Not implemented

| Specification | Description | Reason |
|---------------|-------------|--------|
| RFC 4647 §2.2 | Extended Language Range | Not used by HTTP |
| RFC 4647 §3.3.2 | Extended Filtering | Not used by HTTP |
| RFC 4647 §3.4 | Lookup scheme | Design choice — Basic Filtering is appropriate for HTTP content negotiation |

## Documentation

- [API documentation on HexDocs](https://hexdocs.pm/accept_language)
- [RFC 7231 — HTTP/1.1 Semantics and Content](https://www.rfc-editor.org/rfc/rfc7231)
- [RFC 4647 — Matching of Language Tags](https://www.rfc-editor.org/rfc/rfc4647)
- [BCP 47 — Tags for Identifying Languages](https://www.rfc-editor.org/info/bcp47)

## See also

- [accept_language.rb](https://github.com/cyril/accept_language.rb) — Ruby equivalent of this library

## Versioning

This library follows [Semantic Versioning 2.0](https://semver.org/).

## License

Available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).