lib/existence.ex

defmodule Existence do
  @moduledoc """
  Health-checks management and access module.

  Module provides functions for accessing an overall health-check state and individual dependencies
  checks results.
  Module is also used to start an `Existence` process as a part of an application supervision tree.

  `Existence` works by asynchronously spawning user defined dependencies checks functions.
  Individual dependencies checks functions results are evaluated to establish an overall
  health-check state.
  Overall health-check state is healthy only when all user defined dependencies checks are healthy.
  It is assumed that healthy state is represented by an `:ok` atom for both dependencies checks and
  for the overall health-check.
  Any other result in dependencies checks is associated with an unhealthy dependency check state.
  Overall health-check unhealthy state is represented by an `:error` atom.

  User defined dependencies checks functions are spawned as monitored isolated processes.
  If user dependency check function raises, throws an error, timeouts or fails in any other way it
  doesn't have a negative impact on other processes and it is gracefully handled by the library.

  Current dependencies checks functions results and current overall health-check state are stored
  in an ETS table.
  Whenever user executes any of available state getters, request is made against ETS table which
  has `:read_concurrency` set to `true`.
  In practice it means that library can handle unlimited numbers of requests per second
  without blocking any other processes, adding latency to the response or overloading dependency
  with synchronous  requests.

  Module provides following functions to access checks states:
  * `get_state/1` and `get_state!/1` to get overall health-check state,
  * `get_checks/1` and `get_checks!/1` to get dependencies checks states.

  Functions with bangs are negligibly cheaper computationally because they don't check if ETS table
  storing given Existence instance state exists and they will raise if such table doesn't exist.

  ## Usage
  After defining dependencies checks options, `Existence` can be started using
  your application supervisor:
  ```elixir
  # lib/my_app/application.ex
  def start(_type, _args) do
    health_checks = [
      # minimal dependency check configuration:
      check_1: %{
        mfa: {MyApp.Checks, :check_1, []}
      },
      # complete dependency check configuration:
      check_2: %{
        mfa: {MyApp.Checks, :check_2, []},
        initial_delay: 1_000,
        interval: 30_000,
        state: :ok,
        timeout: 1_000
      }
    ]

    children = [
      {Existence, checks: health_checks, state: :ok}
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
  ```
  When `Existence` is started it has assigned an initial overall health-check state, which
  by default is equal to an `:error` atom, meaning an unhealthy state.
  Initial overall health-check state can be changed with a `:state` key. In a code example above
  initial overall health-check state is set to a healthy state with: `state: :ok`.

   Multiple `Existence` instances can be started by using common Elixir child identifiers
  `:id` and `:name`, for example:
  ```elixir
  children = [
    {Existence, checks: readiness_checks,  id: ExistenceReadiness, name: ReadinessCheck},
    {Existence, checks: liveness_checks, name: {:local, LivenessCheck}}
  ]
  ```

  ## Configuration
  `Existence` options:
  * `:id` - term used to identify the child specification internally. Please refer to the
    `Supervisor` "Child specification" documentation section for details on child `:id` key.
    Default: `Existence`.
  * `:name` - name used to start `Existence` process locally. If defined as an `atom()`
    `:gen_statem.start_link/3` is used to start `Existence` process without registration.
    If defined as a `{:local, atom()}` tuple, `:gen_statem.start_link/4` is invoked and process is
    registered locally with a given name.
    Key value is used to select `Existence` instance when running any of the state getters,
    for example: `get_state(CustomName)`. Default: `Existence`.
  * `:checks` - keyword list with user defined dependencies checks parameters, see description
    below for details. Default: `[]`.
  * `:state` - initial overall `Existence` instance health-check state. Default: `:error`.
  * `:on_state_change` - MFA tuple pointing at user function which will be synchronously applied
    on the overall health-check state change.
    User function should be of two arity. As a first argument it will receive current state as
    `:ok | :error` atom. As a second argument function will receive static arg given in the MFA tuple.
    Default: `nil`.

  Dependencies checks are defined using a keyword list with configuration parameters defined
  as a maps, see code example above.

  Dependencies checks configuration options:
  * `:mfa` - `{module, function, arguments}` tuple specifying user defined function to spawn when
    executing given dependency check. Please refer to `Kernel.spawn_monitor/3` documentation for
    the MFA pattern explanation. Function will be spawned with arguments given in the `:mfa` key.
    Required.
  * `:initial_delay` - amount of time in milliseconds to wait before spawning a dependency check
    for the first time. Can be used to wait for a dependency process to properly initialize before
    executing dependency check function first time when application is started. Default: `100`.
  * `:interval` - time interval in milliseconds specifying how frequently given check should be
    executed and dependency checked. Default: `30_000`.
  * `:state` - initial dependency check state when starting `Existence`. Default: `:error`.
  * `:timeout` - after spawning dependency check function we will wait `:timeout` amount of
    milliseconds for the dependency check function to complete.
    If dependency check function will do not complete within a given timeout, dependency check
    function process will be killed, and dependency check state will assume a `:killed` value.
    Default: `5_000`.

  ## Dependencies checks
  User defined dependencies checks functions must return an `:ok` atom within given `:timeout`
  interval to acquire a healthy state.
  Any other values returned by dependencies checks functions are considered as an unhealthy state.

  Example health-checks callback functions for two popular dependencies, PostgreSQL and Redis:
  ```elixir
  #lib/checks.ex
  def check_postgres() do
    "SELECT 1;"
    |> MyApp.Repo.query()
    |> case do
      {:ok, %Postgrex.Result{num_rows: 1, rows: [[1]]}} -> :ok
      _ -> :error
    end
  end

  def check_redis() do
    case MyApp.Redix.command(["PING"]) do
      {:ok, "PONG"} -> :ok
      _ -> :error
    end
  end
  ```
  Please notice that dependencies checks functions in the code example above are not wrapped in a
  `try/1` blocks.
  Dependencies checks functions are spawned as monitored processes.
  Whenever check function will raise, parent health-check process will be notified with an `:info`
  `:DOWN` message and dependency check status will be assigned a tuple containing an exception and
  a stack trace, for example:
  ```elixir
  # def check_1(), do: raise("CustomError")
  iex> Existence.get_checks()
  [
    check_1: {%RuntimeError{message: "CustomError"}, [ # ... stack trace ]}
  ]
  iex> Existence.get_state()
  :error
  ```
  """

  @behaviour :gen_statem

  @enforce_keys [:mfa]
  defstruct [
    :mfa,
    initial_delay: 100,
    interval: 30_000,
    state: :error,
    timeout: 5_000,
    spawn_proc: {nil, nil}
  ]

  @doc """
  Get dependencies checks states for `name` instance.

  Function gets current dependencies checks states for an instance started with a given `name`,
  by default `Existence`.

  Dependencies checks functions results are returned as a keyword list.
  If no checks were defined function will return an empty list.
  Function returns `:undefined` if `name` instance doesn't exist.

  Dependency check function result equal to an `:ok` atom means healthy state, any other term is
  associated with an unhealthy state.

  ##### Example:
  ```elixir
  iex> Existence.get_checks()
  [check_1: :ok, check_2: :ok]
  ```

  ```
  iex> Existence.get_checks(NotExisting)
  :undefined
  ```

  """
  @spec get_checks(name :: atom()) :: [] | [key: :ok | any()] | :undefined
  def get_checks(name \\ __MODULE__) do
    tbl = ets_table_name(name)

    case :ets.whereis(tbl) do
      :undefined -> :undefined
      _ -> :ets.select(tbl, [{{{:check_state, :"$1"}, :"$2"}, [], [{{:"$1", :"$2"}}]}])
    end
  end

  @doc """
  Same as `get_checks/1` but raises if `name` instance doesn't exist.

  Function will raise with an `ArgumentError` exception if instance `name` doesn't exist.

  ##### Example:
  ```elixir
  iex> Existence.get_checks!()
  [check_1: :ok, check_2: :ok]
  ```

  ```elixir
  iex> Existence.get_checks!(NotExisting)
  ** (ArgumentError) errors were found at the given arguments:
  ```
  """
  @spec get_checks!(name :: atom()) :: [] | [key: :ok | any()]
  def get_checks!(name \\ __MODULE__) do
    name
    |> ets_table_name()
    |> :ets.select([{{{:check_state, :"$1"}, :"$2"}, [], [{{:"$1", :"$2"}}]}])
  end

  @doc """
  Get an overall health-check state for `name` instance.

  Function gets current overall health-check state for an instance started with a given `name`,
  by default `Existence`.

  Function returns an `:ok` when overall health-check state is healthy and an `:error` when state
  is unhealthy.
  Function returns `:undefined` if `name` instance doesn't exist.

  Overall health-check state is healthy only when all dependencies health checks are healthy.

  ##### Example:
  ```elixir
  iex> Existence.get_state()
  :ok
  ```

  ```elixir
  iex> Existence.get_state(NotExisting)
  :undefined
  ```
  """
  @spec get_state(name :: atom()) :: :ok | :error | :undefined
  def get_state(name \\ __MODULE__) do
    with tbl <- ets_table_name(name),
         tid when tid not in [:undefined] <- :ets.whereis(tbl),
         [{:state, state}] <- :ets.lookup(tbl, :state) do
      state
    else
      _ -> :undefined
    end
  end

  @doc """
  Same as `get_state/1` but raises if `name` instance doesn't exist.

  Function will raise with an `ArgumentError` exception if instance `name` doesn't exist.

  ##### Example:
  ```elixir
  iex> Existence.get_state!()
  :ok
  ```

  ```elixir
  iex> Existence.get_state!(NotExisting)
  ** (ArgumentError) errors were found at the given arguments:
  ```
  """
  @spec get_state!(name :: atom()) :: :ok | :error
  def get_state!(name \\ __MODULE__) do
    [{:state, state}] =
      name
      |> ets_table_name()
      |> :ets.lookup(:state)

    state
  end

  @doc false
  def child_spec(init_arg) do
    {id, init_arg} = Keyword.pop(init_arg, :id, __MODULE__)
    Supervisor.child_spec(%{id: id, start: {__MODULE__, :start_link, [init_arg]}}, [])
  end

  @doc false
  def start_link(init_arg) do
    case Keyword.pop(init_arg, :name, __MODULE__) do
      {name, init_arg} when is_atom(name) ->
        init_arg = Keyword.put(init_arg, :ets_name, ets_table_name(name))
        :gen_statem.start_link(__MODULE__, init_arg, [])

      {{:local, name}, init_arg} when is_atom(name) ->
        init_arg = Keyword.put(init_arg, :ets_name, ets_table_name(name))
        :gen_statem.start_link({:local, name}, __MODULE__, init_arg, [])
    end
  end

  @impl true
  def init(args) do
    ets_tab = Keyword.fetch!(args, :ets_name)

    :ets.new(ets_tab, [
      :set,
      :named_table,
      :public,
      read_concurrency: true,
      write_concurrency: false
    ])

    checks =
      args
      |> Keyword.get(:checks, [])
      |> Enum.map(fn {check_id, params} -> {check_id, struct!(__MODULE__, params)} end)

    Enum.each(checks, fn {check_id, params} ->
      set_check_state(%{ets_tab: ets_tab}, check_id, Map.fetch!(params, :state))
      Process.send_after(self(), {:spawn_check, check_id}, Map.fetch!(params, :initial_delay))
    end)

    data = %{
      checks: checks,
      ets_tab: ets_tab,
      on_state_change: Keyword.get(args, :on_state_change)
    }

    case Keyword.get(args, :state, :error) do
      :ok -> {:ok, :healthy, data}
      _ -> {:ok, :unhealthy, data}
    end
  end

  @impl true
  def callback_mode(), do: [:state_functions, :state_enter]

  @impl true
  def terminate(_reason, _state, data), do: set_state(data, :terminate)

  # ________unhealthy
  @doc false
  def unhealthy(:enter, state, data) when state in [:healthy, :unhealthy] do
    set_state(data, :unhealthy)
    :keep_state_and_data
  end

  def unhealthy(:info, {:check_result, result, check_id}, data) do
    set_check_state(data, check_id, result)

    if is_ets_healthy?(data),
      do: {:next_state, :healthy, data},
      else: {:keep_state, data}
  end

  def unhealthy(:info, {:DOWN, ref, :process, pid, :normal}, data) do
    pid
    |> find_check(ref, data)
    |> maybe_respawn_check()

    :keep_state_and_data
  end

  def unhealthy(:info, {:DOWN, ref, :process, pid, error}, data) do
    {check_id, _check_params} = check = find_check(pid, ref, data)
    maybe_respawn_check(check)
    set_check_state(data, check_id, error)

    :keep_state_and_data
  end

  def unhealthy(:info, {:spawn_check, check_id}, data),
    do: {:keep_state, spawn_check(check_id, data)}

  # ________healthy
  @doc false
  def healthy(:enter, state, data) when state in [:healthy, :unhealthy] do
    set_state(data, :healthy)
    :keep_state_and_data
  end

  def healthy(:info, {:check_result, result, check_id}, data) do
    set_check_state(data, check_id, result)

    case result do
      :ok -> {:keep_state, data}
      _err -> {:next_state, :unhealthy, data}
    end
  end

  def healthy(:info, {:DOWN, ref, :process, pid, :normal}, data) do
    pid
    |> find_check(ref, data)
    |> maybe_respawn_check()

    :keep_state_and_data
  end

  def healthy(:info, {:DOWN, ref, :process, pid, error}, data) do
    {check_id, _check_params} = check = find_check(pid, ref, data)
    maybe_respawn_check(check)
    set_check_state(data, check_id, error)

    {:next_state, :unhealthy, data}
  end

  def healthy(:info, {:spawn_check, check_id}, data),
    do: {:keep_state, spawn_check(check_id, data)}

  # ________helpers
  defp ets_table_name(name), do: Module.concat(name, Table)

  defp find_check(pid, ref, %{checks: checks}),
    do: Enum.find(checks, nil, fn {_check_id, params} -> {pid, ref} == params.spawn_proc end)

  defp maybe_respawn_check({check_id, check_params}),
    do: Process.send_after(self(), {:spawn_check, check_id}, check_params.interval)

  defp maybe_respawn_check(_invalid_check), do: :ok

  defp spawn_check(check_id, %{checks: checks} = data) do
    %{mfa: {m, f, a}, timeout: timeout} = params = Keyword.fetch!(checks, check_id)
    from = self()

    {pid, ref} =
      spawn_monitor(fn ->
        :timer.kill_after(timeout, self())
        result = apply(m, f, a)
        send(from, {:check_result, result, check_id})
      end)

    params = Map.put(params, :spawn_proc, {pid, ref})
    checks = Keyword.put(checks, check_id, params)
    Map.put(data, :checks, checks)
  end

  defp set_state(%{ets_tab: ets_tab, on_state_change: {m, f, a}}, state) do
    state = parse_state(state)
    :ets.insert(ets_tab, {:state, state})
    apply(m, f, [state, a])
  end

  defp set_state(%{ets_tab: ets_tab, on_state_change: nil}, state) do
    state = parse_state(state)
    :ets.insert(ets_tab, {:state, state})
  end

  defp set_check_state(%{ets_tab: ets_tab}, check_id, result),
    do: :ets.insert(ets_tab, {{:check_state, check_id}, result})

  defp is_ets_healthy?(%{ets_tab: ets_tab}) do
    case :ets.select(ets_tab, [
           {{{:check_state, :"$1"}, :"$2"}, [{:"/=", :"$2", :ok}], [:"$2"]}
         ]) do
      [] -> true
      _ -> false
    end
  end

  defp parse_state(:healthy), do: :ok
  defp parse_state(_), do: :error
end