
# KubernetesProbes

Serves Kubernetes liveness and readiness probes for Elixir/Phoenix applications and drains traffic on shutdown using OTP's native shutdown sequence.

Replaces the deprecated [`traffic_drain_plug`](https://gitlab.com/pandosearch/traffic_drain_plug) library.

## How it works

The library has two components that work together:

**`KubernetesProbes.Plug`** is added as the first plug in your Phoenix endpoint. It intercepts requests to the liveness and readiness probe paths and responds immediately, before any other plugs run. The liveness probe returns 200 as long as the BEAM is up. The readiness probe returns 200 when the app is ready to serve traffic, and 503 during startup or while draining.

**`KubernetesProbes.Drainer`** is a `GenServer` added as the **last child** of your application supervisor. On `SIGTERM`, the BEAM starts an orderly shutdown and supervisors terminate their children in reverse start order, so the Drainer terminates first. Its `terminate/2` callback immediately flips the readiness probe to 503 via `:persistent_term` and then sleeps for the configured drain window. This gives Kubernetes time to observe the failing readiness probe and stop routing new traffic to the pod before the Endpoint, Repo, and other resources are torn down.
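
To make the mechanism concrete, here is a minimal sketch of a GenServer following the same pattern. It is not `KubernetesProbes.Drainer`'s source: the module name `MyApp.DrainerSketch`, the `:persistent_term` key, and the `:starting`/`:draining` status atoms are assumptions for illustration. It also shows two details such a process needs under standard OTP semantics: it must trap exits, or the supervisor's shutdown signal kills it without calling `terminate/2`, and its child spec's `:shutdown` must exceed the drain window, or the supervisor kills it mid-sleep (the default for workers is 5000 ms).

```elixir
defmodule MyApp.DrainerSketch do
  # Hypothetical illustration of the drain-on-shutdown pattern;
  # not the library's actual implementation.
  use GenServer

  @key {__MODULE__, :status}
  @default_wait 20_000

  # Status as a probe handler would read it. Before init/1 has run
  # (while earlier children are still starting) this returns :starting.
  def status, do: :persistent_term.get(@key, :starting)

  def child_spec(opts) do
    wait = Keyword.get(opts, :wait, @default_wait)

    %{
      id: __MODULE__,
      start: {__MODULE__, :start_link, [opts]},
      # Give terminate/2 enough time to sleep through the whole drain window.
      shutdown: wait + 1_000
    }
  end

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # Trap exits so the supervisor's :shutdown signal runs terminate/2.
    Process.flag(:trap_exit, true)
    :persistent_term.put(@key, :running)
    {:ok, Keyword.get(opts, :wait, @default_wait)}
  end

  @impl true
  def terminate(_reason, wait) do
    # Flip readiness first, then hold shutdown open for the drain window.
    :persistent_term.put(@key, :draining)
    Process.sleep(wait)
    :ok
  end
end
```

Under a supervisor, `status/0` reports `:running` after start and `:draining` once shutdown begins; a probe handler following the table below would answer 503 for anything other than `:running`.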

## Installation

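Add `kubernetes_probes` to the dependencies in `mix.exs`, then run `mix deps.get`:

```elixir
# mix.exs
defp deps do
  [
    {:kubernetes_probes, "~> 0.1"}
  ]
end
```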

## Usage

**1. Add the Drainer as the last child in your application supervisor:**

```elixir
# lib/my_app/application.ex
children = [
  MyApp.Repo,
  MyAppWeb.Endpoint,
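  # Must be last: children stop in reverse start order, so this one stops first
  {KubernetesProbes.Drainer, wait: 20_000}
]

Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)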
```

**2. Add the Plug as the first plug in your endpoint:**

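```elixir
# lib/my_app_web/endpoint.ex
# Place before any other plug, and use only one of these forms.

# Default paths, always ready while the drainer is running:
plug KubernetesProbes.Plug

# With a custom readiness check (e.g. database connectivity):
plug KubernetesProbes.Plug, ready?: &MyApp.repos_ready?/0

# With custom probe paths (update the Kubernetes manifests to match):
plug KubernetesProbes.Plug, liveness_path: "/healthz", readiness_path: "/readyz"
```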
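
`MyApp.repos_ready?/0` is your own function, not part of the library. One hypothetical way to write it is a cheap check that each Ecto repo process is registered and alive. This assumes the repos use their default registered name (the repo module), and it does not prove the database is reachable; for that you could run a trivial query such as `SELECT 1` with `Ecto.Adapters.SQL.query/4`. Because `ready?` runs on every readiness request, keep it fast.

```elixir
defmodule MyApp do
  # Hypothetical readiness check for:
  #   plug KubernetesProbes.Plug, ready?: &MyApp.repos_ready?/0
  # Assumption: "ready" means every listed repo process is registered and alive.
  @repos [MyApp.Repo]

  def repos_ready? do
    Enum.all?(@repos, fn repo ->
      case Process.whereis(repo) do
        nil -> false
        pid -> Process.alive?(pid)
      end
    end)
  end
end
```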

## Probe endpoints

The default paths are `/probe/liveness` and `/probe/readiness`. Both can be overridden via the `:liveness_path` and `:readiness_path` plug options.

| Path | Method | Description |
|------|--------|-------------|
| `/probe/liveness` | GET | Returns 200 while the BEAM is running |
| `/probe/readiness` | GET | Returns 200 when the drainer is `:running` and `ready?` returns `true`; 503 while draining or not ready |
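
The rules in this table can be modeled as a pure function. The sketch below is a hypothetical model, not the plug's code: it encodes only what the table and the `:ready?` option description state, with `:starting` and `:draining` standing in for whatever non-running states the drainer actually uses.

```elixir
defmodule MyApp.ProbeModel do
  # Hypothetical model of the probe table; not KubernetesProbes.Plug's source.

  # Liveness: 200 whenever the BEAM is up to answer at all.
  def liveness_status, do: 200

  # Readiness: ready_fun is only consulted while the drainer is :running.
  def readiness_status(:running, ready_fun) when is_function(ready_fun, 0) do
    if ready_fun.() == true, do: 200, else: 503
  end

  # Starting up or draining: always 503, without calling ready_fun.
  def readiness_status(_not_running, _ready_fun), do: 503
end
```

One consequence of consulting `ready?` only while `:running` is that a slow or failing check cannot hold back the 503 once draining has started.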

## Configuration

### Drainer options

| Option | Default | Description |
|--------|---------|-------------|
| `:wait` | `20_000` | Drain window in milliseconds |

### Plug options

| Option | Default | Description |
|--------|---------|-------------|
| `:ready?` | `fn -> true end` | Zero-arity function returning a boolean. Called on each readiness request while the drainer is `:running` |
| `:liveness_path` | `"/probe/liveness"` | Path for the liveness probe |
| `:readiness_path` | `"/probe/readiness"` | Path for the readiness probe |

### Shorten the drain window in dev and test

```elixir
# config/dev.exs: avoid a 20 s wait when the dev server shuts down gracefully
config :my_app, KubernetesProbes.Drainer, wait: 100

# config/test.exs: avoid slow suite teardown
config :my_app, KubernetesProbes.Drainer, wait: 10
```

Then pass the configured value when adding the Drainer in `application.ex`:

```elixir
{KubernetesProbes.Drainer,
 wait: Application.compile_env(:my_app, [KubernetesProbes.Drainer, :wait], 20_000)}
```
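
`Application.compile_env/3` reads the value when the module is compiled, so a drain window set in `config/runtime.exs` would not reach it. If you configure it at runtime instead, a minimal sketch that reads the same key when called looks like this (the helper module `MyApp.DrainConfig` is hypothetical):

```elixir
defmodule MyApp.DrainConfig do
  # Hypothetical helper: reads `config :my_app, KubernetesProbes.Drainer, wait: ...`
  # at call time, falling back to the 20 s default when nothing is configured.
  def wait do
    :my_app
    |> Application.get_env(KubernetesProbes.Drainer, [])
    |> Keyword.get(:wait, 20_000)
  end
end
```

The child entry then becomes `{KubernetesProbes.Drainer, wait: MyApp.DrainConfig.wait()}`.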

## Kubernetes deployment

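Set `terminationGracePeriodSeconds` to at least the drain window plus a few seconds for the rest of the shutdown sequence (stopping the Endpoint, Repo, and other children). When the grace period expires, Kubernetes sends `SIGKILL` and whatever shutdown remains is cut short. With the default 20 s drain window, 30 s is a safe value.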
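
As a rough budget for the example manifests, assuming the rest of shutdown fits in the remaining 10 s and the readiness probe keeps Kubernetes' default `failureThreshold` of 3 (the manifests do not set it): the grace period must cover the drain window plus shutdown, and the drain window should be longer than the time Kubernetes needs to see enough consecutive readiness failures to mark the pod unready.

```math
\begin{aligned}
t_{\text{grace}} &\ge t_{\text{drain}} + t_{\text{shutdown}}: \quad 30\,\text{s} \ge 20\,\text{s} + 10\,\text{s} \\
t_{\text{detect}} &= \text{periodSeconds} \times \text{failureThreshold} = 2\,\text{s} \times 3 = 6\,\text{s} < t_{\text{drain}} = 20\,\text{s}
\end{aligned}
```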

### Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
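spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: my-app
          image: my-app:latest # placeholder image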
          ports:
            - containerPort: 4000
          livenessProbe:
            httpGet:
              path: /probe/liveness
              port: 4000
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              path: /probe/readiness
              port: 4000
            initialDelaySeconds: 10
            periodSeconds: 2
            successThreshold: 1
```

### StatefulSet

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app
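spec:
  serviceName: my-app # Service that governs the pods' network identity
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: my-app
          image: my-app:latest # placeholder image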
          ports:
            - containerPort: 4000
          livenessProbe:
            httpGet:
              path: /probe/liveness
              port: 4000
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              path: /probe/readiness
              port: 4000
            initialDelaySeconds: 10
            periodSeconds: 2
            successThreshold: 1
```