# lib/existence.ex

defmodule Existence do
  @moduledoc """
  Starts health checks and provides access to their state.

  This module provides functions for accessing the overall health-check state and the results of
  individual dependency checks.
  It is also used to start an `Existence` process as part of an application supervision tree.

  `Existence` works by asynchronously spawning user-defined dependency check functions.
  The results of the individual dependency check functions are evaluated to establish the overall
  health-check state.
  The overall health-check state is healthy only when all user-defined dependency checks are
  healthy.
  A healthy state is represented by the `:ok` atom, both for dependency checks and for the overall
  health check.
  Any other dependency check result is treated as an unhealthy dependency check state.
  An unhealthy overall health-check state is represented by the `:error` atom.
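
  As a rough illustration of the rule above, the overall state can be thought of as a reduction
  over the individual results. This is a minimal sketch, not the library's implementation; the
  `overall_state` function is hypothetical and the initial `:state` option is ignored:
  ```elixir
  # Hypothetical helper, not part of the `Existence` API.
  overall_state = fn checks ->
    if Enum.all?(checks, fn {_check_id, result} -> result == :ok end), do: :ok, else: :error
  end

  overall_state.([check_1: :ok, check_2: :ok])
  # => :ok
  overall_state.([check_1: :ok, check_2: {:error, :timeout}])
  # => :error
  ```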

  User-defined dependency check functions are spawned as monitored, isolated processes.
  If a dependency check function raises, throws, times out or fails in any other way, it does not
  affect other processes, including the user application.
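
  The isolation relies on standard process monitoring. The sketch below is not taken from the
  library; it only shows, with plain `spawn_monitor/1`, how a crash in a spawned function reaches
  the parent as a `:DOWN` message instead of an exception:
  ```elixir
  # The spawned function raises, but the calling process keeps running.
  {pid, ref} = spawn_monitor(fn -> raise "CustomError" end)

  result =
    receive do
      {:DOWN, ^ref, :process, ^pid, {%RuntimeError{message: message}, _stacktrace}} ->
        {:check_failed, message}
    after
      1_000 -> :no_message
    end

  # result == {:check_failed, "CustomError"}
  ```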

  Current dependency check results and the current overall health-check state are stored
  in an ETS table.
  Whenever one of the state getters, `get_state/1` or `get_checks/1`, is called, the
  request is made directly against the ETS table, which has `:read_concurrency` set to `true`.
  Reads therefore never go through the `Existence` process, so many concurrent requests can be
  served without blocking any other process.

  The module provides two functions for accessing check states:
  * `get_state/1` to get the overall health-check state,
  * `get_checks/1` to get the dependency check states.
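
  A typical consumer of these getters is an HTTP health endpoint. The sketch below assumes a
  hypothetical `MyApp.HealthStatus` module (not part of `Existence`) that maps the overall state
  to an HTTP status code:
  ```elixir
  defmodule MyApp.HealthStatus do
    # Hypothetical helper: `:ok` maps to 200, any other state to 503.
    def http_status(:ok), do: 200
    def http_status(_unhealthy), do: 503
  end
  ```
  A request handler could then respond with `MyApp.HealthStatus.http_status(Existence.get_state())`.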

  ## Usage
  After defining the dependency check parameters, `Existence` can be started under
  your application supervisor:
  ```elixir
  # lib/my_app/application.ex
  def start(_type, _args) do
    health_checks = [
      # minimal dependency check configuration:
      check_1: %{
        mfa: {MyApp.Checks, :check_1, []}
      },
      # complete dependency check configuration:
      check_2: %{
        mfa: {MyApp.Checks, :check_2, []},
        initial_delay: 1_000,
        interval: 30_000,
        state: :ok,
        timeout: 1_000
      }
    ]

    children = [
      {Existence, checks: health_checks, state: :ok}
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
  ```
  When `Existence` is started, it is assigned an initial overall health-check state, which
  defaults to the `:error` atom, meaning an unhealthy state.
  The initial overall health-check state can be changed with the `:state` key. In the code example
  above, the initial overall health-check state is set to healthy with `state: :ok`.

  `Existence` supports starting multiple instances by using common Elixir child identifiers:
  `:id` and `:name`, for example:
  ```elixir
  children = [
    {Existence, id: ExistenceReadiness, name: ReadinessCheck},
    {Existence, name: {:local, LivenessCheck}}
  ]
  ```
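  Assuming the two instances above are running, each one is queried by the atom given in its
  `:name` option, with or without the `{:local, ...}` wrapper:
  ```elixir
  Existence.get_state(ReadinessCheck)
  Existence.get_checks(LivenessCheck)
  ```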

  ## Configuration
  `Existence` startup options:
  * `:id` - any term used to identify the child specification internally. Please refer to the
  `Supervisor` "Child specification" documentation section for details on the child `:id` key.
  Default: `Existence`.
  * `:name` - name used to start the `Existence` `:gen_statem` process locally. If defined
  as an `atom()`, `:gen_statem.start_link/3` is used to start the `Existence` process
  without registration.
  If defined as a `{:local, atom()}` tuple, `:gen_statem.start_link/4` is invoked and the process
  is registered locally under the given name.
  The key value is used to select the `Existence` instance when calling `get_state/1` or
  `get_checks/1`.
  Default: `Existence`.
  * `:checks` - keyword list with user-defined dependency check parameters, see the description
  below for details. Default: `[]`.
  * `:state` - initial overall health-check state of the `Existence` instance. Default: `:error`.

  Dependency checks are defined using a keyword list, with the configuration parameters of each
  check given as a map.

  Dependency check configuration options:
  * `:mfa` - `{module, function, arguments}` tuple specifying the user-defined function to spawn
  when executing the given dependency check. Please refer to the `Kernel.apply/3` documentation
  for an explanation of the MFA pattern. Required.
  * `:initial_delay` - amount of time in milliseconds to wait before spawning the dependency check
  for the first time. Can be used to give a dependency process time to initialize before
  the dependency check function runs for the first time after application start. Default: `100`.
  * `:interval` - time interval in milliseconds specifying how frequently the check should be
  executed. The next run is scheduled when the previous check process exits. Default: `30_000`.
  * `:state` - initial dependency check state when starting `Existence`. Default: `:error`.
  * `:timeout` - after spawning the dependency check function, the library waits `:timeout`
  milliseconds for it to complete.
  If the dependency check function does not complete within the given timeout, its process is
  killed and the dependency check state is set to the `:killed` atom.
  Default: `5_000`.
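
  The timeout mechanism can be reproduced in isolation. This sketch mirrors the approach with
  plain `spawn_monitor/1` and `:timer.kill_after/2`; the 50 ms timeout and the sleeping function
  are illustrative assumptions:
  ```elixir
  {pid, ref} =
    spawn_monitor(fn ->
      # Schedule a kill signal for this process, then simulate a check that never returns.
      :timer.kill_after(50, self())
      Process.sleep(:infinity)
    end)

  reason =
    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> reason
    after
      1_000 -> :no_message
    end

  # reason == :killed
  ```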

  ## Dependency checks
  User-defined dependency check functions must return the `:ok` atom for a healthy state.
  Any other value returned by a dependency check function is considered an unhealthy state.

  Example checks for two popular dependencies, PostgreSQL and Redis:
  ```elixir
  # lib/checks.ex
  def check_postgres() do
    "SELECT 1;"
    |> MyApp.Repo.query()
    |> case do
      {:ok, %Postgrex.Result{num_rows: 1, rows: [[1]]}} -> :ok
      _ -> :error
    end
  end

  def check_redis() do
    case MyApp.Redix.command(["PING"]) do
      {:ok, "PONG"} -> :ok
      _ -> :error
    end
  end
  ```
  Note that the dependency check functions in the code example above are not wrapped in
  `try/1` blocks.
  Dependency check functions are spawned as monitored processes.
  Whenever a check function raises, the parent health-check process is notified with an `:info`
  `:DOWN` message and the dependency check state is set to a tuple containing the exception and
  the stack trace, for example:
  ```elixir
  # def check_1(), do: raise("CustomError")
  iex> Existence.get_checks()
  [
    check_1: {%RuntimeError{message: "CustomError"}, [ # ... stack trace ]}
  ]
  iex> Existence.get_state()
  :error
  ```
  """

  @behaviour :gen_statem

  @enforce_keys [:mfa]
  defstruct [
    :mfa,
    initial_delay: 100,
    interval: 30_000,
    state: :error,
    timeout: 5_000,
    spawn_proc: {nil, nil}
  ]

  # TODO decide if ets_exists?/1 should be executed on each get_* call
  # def ets_exists?(table \\ @ets_table_name), do: :ets.whereis(table) != :undefined

  @doc """
  Get dependency check states.

  Gets the current dependency check states for the `Existence` instance started with
  the given `name`.
  If `name` is not provided, check states for the instance with the default `:name` (`Existence`)
  are returned.

  Dependency check results are returned as a keyword list.
  If no checks were defined, the function returns an empty list.

  A dependency check result equal to the `:ok` atom means a healthy state; any other term
  indicates an unhealthy state.

  Raises an `ArgumentError` exception if no `Existence` instance with the given `name` exists.

  ##### Example:
  ```elixir
  iex> Existence.get_checks()
  [check_1: :ok, check_2: :ok]
  ```

  ```elixir
  iex> Existence.get_checks(NotExisting)
  ** (ArgumentError) errors were found at the given arguments:
  ```
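
  Since the result is a plain keyword list, unhealthy checks can be picked out with `Enum`.
  The `checks` value below is a hand-written stand-in for a `get_checks/1` result:
  ```elixir
  checks = [check_1: :ok, check_2: :killed]

  unhealthy = Enum.reject(checks, fn {_check_id, result} -> result == :ok end)
  # unhealthy == [check_2: :killed]
  ```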

  """
  @spec get_checks(name :: atom()) :: [{atom(), :ok | any()}]
  def get_checks(name \\ __MODULE__) do
    name
    |> ets_table_name()
    |> :ets.select([{{{:check_state, :"$1"}, :"$2"}, [], [{{:"$1", :"$2"}}]}])
  end

  @doc """
  Get the overall health-check state.

  Gets the current overall health-check state for the `Existence` instance started with
  the given `name`.
  If `name` is not provided, the overall health-check state for the instance with the default
  `:name` (`Existence`) is returned.

  Returns the `:ok` atom when the overall health-check state is healthy and the `:error` atom
  otherwise.
  The overall health-check state is healthy only when all dependency checks are healthy.

  Raises an `ArgumentError` exception if no `Existence` instance with the given `name` exists.

  ##### Example:
  ```elixir
  iex> Existence.get_state()
  :ok
  ```

  ```elixir
  iex> Existence.get_state(NotExisting)
  ** (ArgumentError) errors were found at the given arguments:
  ```
  """
  @spec get_state(name :: atom()) :: :ok | :error
  def get_state(name \\ __MODULE__) do
    [{:state, state}] =
      name
      |> ets_table_name()
      |> :ets.lookup(:state)

    state
  end

  @doc false
  def child_spec(init_arg) do
    {id, init_arg} = Keyword.pop(init_arg, :id, __MODULE__)
    Supervisor.child_spec(%{id: id, start: {__MODULE__, :start_link, [init_arg]}}, [])
  end

  @doc false
  def start_link(init_arg) do
    case Keyword.pop(init_arg, :name, __MODULE__) do
      {name, init_arg} when is_atom(name) ->
        init_arg = Keyword.put(init_arg, :ets_name, ets_table_name(name))
        :gen_statem.start_link(__MODULE__, init_arg, [])

      {{:local, name}, init_arg} when is_atom(name) ->
        init_arg = Keyword.put(init_arg, :ets_name, ets_table_name(name))
        :gen_statem.start_link({:local, name}, __MODULE__, init_arg, [])
    end
  end

  @impl true
  def init(args) do
    Process.flag(:trap_exit, true)
    ets_tab = Keyword.fetch!(args, :ets_name)

    :ets.new(ets_tab, [
      :set,
      :named_table,
      :public,
      read_concurrency: true,
      write_concurrency: false
    ])

    checks =
      args
      |> Keyword.get(:checks, [])
      |> Enum.map(fn {check_id, params} -> {check_id, struct!(__MODULE__, params)} end)

    Enum.each(checks, fn {check_id, params} ->
      set_check_state(%{ets_tab: ets_tab}, check_id, Map.fetch!(params, :state))
      Process.send_after(self(), {:spawn_check, check_id}, Map.fetch!(params, :initial_delay))
    end)

    data = %{checks: checks, ets_tab: ets_tab}

    case Keyword.get(args, :state, :error) do
      :ok -> {:ok, :healthy, data}
      _ -> {:ok, :unhealthy, data}
    end
  end

  @impl true
  def callback_mode(), do: [:state_functions, :state_enter]

  @impl true
  def terminate(_reason, _state, data), do: set_state(data, :terminate)

  # ________unhealthy
  @doc false
  def unhealthy(:enter, state, data) when state in [:healthy, :unhealthy] do
    set_state(data, :unhealthy)
    :keep_state_and_data
  end

  def unhealthy(:info, {:check_result, result, check_id}, data) do
    set_check_state(data, check_id, result)

    if is_ets_healthy?(data),
      do: {:next_state, :healthy, data},
      else: {:keep_state, data}
  end

  def unhealthy(:info, {:DOWN, ref, :process, pid, :normal}, data) do
    find_check(pid, ref, data)
    |> maybe_respawn_check()

    :keep_state_and_data
  end

  def unhealthy(:info, {:DOWN, ref, :process, pid, error}, data) do
    {check_id, _check_params} = check = find_check(pid, ref, data)
    maybe_respawn_check(check)
    set_check_state(data, check_id, error)

    :keep_state_and_data
  end

  def unhealthy(:info, {:spawn_check, check_id}, data),
    do: {:keep_state, spawn_check(check_id, data)}

  # ________healthy
  @doc false
  def healthy(:enter, state, data) when state in [:healthy, :unhealthy] do
    set_state(data, :healthy)
    :keep_state_and_data
  end

  def healthy(:info, {:check_result, result, check_id}, data) do
    set_check_state(data, check_id, result)

    case result do
      :ok -> {:keep_state, data}
      _err -> {:next_state, :unhealthy, data}
    end
  end

  def healthy(:info, {:DOWN, ref, :process, pid, :normal}, data) do
    find_check(pid, ref, data)
    |> maybe_respawn_check()

    :keep_state_and_data
  end

  def healthy(:info, {:DOWN, ref, :process, pid, error}, data) do
    {check_id, _check_params} = check = find_check(pid, ref, data)
    maybe_respawn_check(check)
    set_check_state(data, check_id, error)

    {:next_state, :unhealthy, data}
  end

  def healthy(:info, {:spawn_check, check_id}, data),
    do: {:keep_state, spawn_check(check_id, data)}

  # ________helpers
  defp ets_table_name(name), do: Module.concat(name, Table)

  defp find_check(pid, ref, %{checks: checks}),
    do: Enum.find(checks, nil, fn {_check_id, params} -> {pid, ref} == params.spawn_proc end)

  defp maybe_respawn_check({check_id, check_params}),
    do: Process.send_after(self(), {:spawn_check, check_id}, check_params.interval)

  defp maybe_respawn_check(_invalid_check), do: :ok

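  defp spawn_check(check_id, %{checks: checks} = data) do
    %{mfa: {m, f, a}, timeout: timeout} = params = Keyword.fetch!(checks, check_id)
    from = self()

    # Run the check in a monitored process so a crash only reaches us as a :DOWN message.
    {pid, ref} =
      spawn_monitor(fn ->
        # Kill the check process if it runs longer than `timeout` milliseconds.
        :timer.kill_after(timeout, self())
        result = apply(m, f, a)
        send(from, {:check_result, result, check_id})
      end)

    # Remember the pid and monitor reference to match a later :DOWN message to this check.
    params = Map.put(params, :spawn_proc, {pid, ref})
    checks = Keyword.put(checks, check_id, params)
    Map.put(data, :checks, checks)
  end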

  defp set_state(%{ets_tab: ets_tab}, state) do
    case state do
      :healthy -> :ets.insert(ets_tab, {:state, :ok})
      _ -> :ets.insert(ets_tab, {:state, :error})
    end
  end

  defp set_check_state(%{ets_tab: ets_tab}, check_id, result),
    do: :ets.insert(ets_tab, {{:check_state, check_id}, result})

  defp is_ets_healthy?(%{ets_tab: ets_tab}) do
    # Select the stored results of all checks whose state is not :ok.
    case :ets.select(ets_tab, [
           {{{:check_state, :"$1"}, :"$2"}, [{:"/=", :"$2", :ok}], [:"$2"]}
         ]) do
      [] -> true
      _ -> false
    end
  end
end