lib/data_schema.ex

defmodule DataSchema do
  @moduledoc """
  DataSchemas are declarative specifications of how to create structs from some kind of
  data source. For example, you can define a schema that knows how to turn an Elixir map
  into a struct, casting all of the values as it goes. Alternatively, you can set up a
  schema to ingest XML data and create structs from the values inside the XML.

  Below is an example of a simple schema:

      defmodule Blog do
        import DataSchema, only: [data_schema: 1]

        data_schema([
          field: {:name, "name", &{:ok, to_string(&1)}}
        ])
      end

  This says we will create a struct with a `:name` key and will get the value for that key
  from under the `"name"` key in the source data. That value will be passed to `to_string/1`
  and the result of that function will end up as the value under `:name` in the resulting
  struct.

  In general this is the format for a field:

      field: {:content, "text", &cast_string/1}
      #  ^          ^      ^              ^
      # field type  |      |              |
      # struct key name    |              |
      #    path to data in the source     |
      #                            casting function

  Depending on your input data type, the path pointing to a value in it may need to be
  interpreted differently. For our example of a map input type, the "path" is really just
  a key on that input map. But there is still flexibility in how we use that key to access
  the value; we could use `Map.get/2` or `Map.fetch/2`, for example. For other input data
  types, what the path looks like and how it is used to access data can differ. If your
  input data were XML, the path could be an XPath such as ".//MyNode", and what you do
  with that XPath is going to be different from what you would do with a map key.

  DataSchema allows for different schemas to handle different input types AND allows for
  the same input type to be handled differently in different schemas.
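
  As a minimal sketch of that flexibility, the hypothetical `NestedMapAccessor` below
  treats a path as a list of keys and walks it with `get_in/2`. It assumes an accessor
  only needs the four functions that `to_struct` calls on it (`field/2`, `list_of/2`,
  `has_one/2` and `has_many/2`) and that the schema module sets `@data_accessor` before
  calling `data_schema/1`. `Author` is likewise an illustrative name.

  ```elixir
  defmodule NestedMapAccessor do
    # Hypothetical accessor: every path is a list of keys walked with get_in/2.
    def field(data, path), do: get_in(data, path)
    def list_of(data, path), do: get_in(data, path)
    def has_one(data, path), do: get_in(data, path)
    def has_many(data, path), do: get_in(data, path)
  end

  defmodule Author do
    import DataSchema, only: [data_schema: 1]

    # Set before data_schema/1 is invoked so the macro can pick it up.
    @data_accessor NestedMapAccessor
    data_schema(field: {:name, ["author", "name"], &{:ok, to_string(&1)}})
  end

  DataSchema.to_struct(%{"author" => %{"name" => "Frank"}}, Author)
  # => {:ok, %Author{name: "Frank"}}
  ```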

  Finally, when creating the struct we can choose to stop as soon as we find an error or to
  simply put whatever is returned from a casting function into the struct we are making.
  The latter approach encourages people to raise exceptions from their casting functions
  to halt the creation of the struct. `DataSchema.to_struct/2` takes the former approach:
  it halts at the first casting function that returns `{:error, reason}` or `:error`.

  ### Field Types

  There are 5 kinds of struct fields we can have:

  1. `field`     - The value will be a cast value from the source data.
  2. `list_of`   - The value will be a list of cast values created from the source data.
  3. `has_one`   - The value will be created from a nested data schema (so will be a struct).
  4. `has_many`  - The value will be created by casting a list of values into a data schema
     (you end up with a list of structs defined by the provided schema). Similar to
     `has_many` in Ecto.
  5. `aggregate` - The value will be a cast value formed from multiple bits of data in the source.
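
  The sketch below uses all five field types in one schema. `Post`, `Comment` and the
  source keys are illustrative names, and it assumes the default map accessor looks each
  path up as a plain key on the input map.

  ```elixir
  defmodule Comment do
    import DataSchema, only: [data_schema: 1]

    data_schema(field: {:text, "text", &{:ok, to_string(&1)}})
  end

  defmodule Post do
    import DataSchema, only: [data_schema: 1]

    # Inline fields for the aggregate; they read from the same input as the parent.
    @datetime_fields [
      field: {:date, "date", &Date.from_iso8601/1},
      field: {:time, "time", &Time.from_iso8601/1}
    ]

    data_schema(
      field: {:title, "title", &{:ok, to_string(&1)}},
      list_of: {:tags, "tags", &{:ok, String.downcase(&1)}},
      has_one: {:top_comment, "top_comment", Comment},
      has_many: {:comments, "comments", Comment},
      aggregate: {:posted_at, @datetime_fields, &NaiveDateTime.new(&1.date, &1.time)}
    )
  end

  input = %{
    "title" => "Hello",
    "tags" => ["Elixir", "XML"],
    "top_comment" => %{"text" => "First!"},
    "comments" => [%{"text" => "First!"}, %{"text" => "Nice"}],
    "date" => "2021-01-01",
    "time" => "12:30:00"
  }

  DataSchema.to_struct(input, Post)
  ```

  Note that the `aggregate` casting function receives a map of the already cast `:date`
  and `:time` values, not the raw input.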

  ### Examples

  See the guides for more in-depth examples, but below you can see how we create a schema
  that will take a map of data and create a struct out of it. Given the following schema:

      defmodule Sandwich do
        require DataSchema

        DataSchema.data_schema([
          field: {:type, "the_type", &{:ok, String.upcase(&1)}},
          list_of: {:fillings, "the_fillings", &({:ok, String.downcase(&1["name"])})}
        ])
      end

      input_data = %{
        "the_type" => "fake steak",
        "the_fillings" => [
          %{"name" => "fake stake", "good?" => true},
          %{"name" => "SAUCE"},
          %{"name" => "sweetcorn"},
        ]
      }

      DataSchema.to_struct(input_data, Sandwich)
      # outputs the following:
      {:ok,
       %Sandwich{
         type: "FAKE STEAK",
         fillings: ["fake stake", "sauce", "sweetcorn"]
       }}
  """
  @available_types [:field, :has_one, :has_many, :aggregate, :list_of]
  @non_null_error_message "Field was marked as not null but was found to be null."

  @doc """
  A macro that creates a data schema. By default all struct fields are required but you
  can specify that a field be optional by passing the correct option in. See the Options
  section below for more.

  ### Field Types

  There are 5 kinds of struct fields we can have:

  1. `field`     - The value will be a cast value from the source data.
  2. `list_of`   - The value will be a list of cast values created from the source data.
  3. `has_one`   - The value will be created from a nested data schema (so will be a struct).
  4. `has_many`  - The value will be created by casting a list of values into a data schema
     (you end up with a list of structs defined by the provided schema). Similar to
     `has_many` in Ecto.
  5. `aggregate` - The value will be a cast value formed from multiple bits of data in the source.

  ### Options

  Available options are:

    - `:optional?` - specifies whether or not the field in the struct should be included in
    the `@enforce_keys` for the struct. By default all fields are required but you can mark
    them as optional by setting this to `true`. This will also be checked when creating a
    struct with `DataSchema.to_struct/2` returning an error if the required field is null.

  For example:

      defmodule Sandwich do
        require DataSchema

        DataSchema.data_schema([
          field: {:type, "the_type", &{:ok, String.upcase(&1)}, optional?: true},
        ])
      end
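
  As a sketch of how `optional?` plays out at runtime, the hypothetical `Drink` schema
  below makes `:size` optional. It assumes the default map accessor returns `nil` for a
  missing key, and it uses casting functions that pass `nil` through untouched.

  ```elixir
  defmodule Drink do
    require DataSchema

    DataSchema.data_schema(
      field: {:name, "name", &{:ok, &1}},
      field: {:size, "size", &{:ok, &1}, optional?: true}
    )
  end

  # "size" is absent, so its cast returns {:ok, nil}. The field is optional, so the
  # struct is still built with `size: nil`.
  DataSchema.to_struct(%{"name" => "tea"}, Drink)

  # "name" is required, so a nil value halts struct creation with an {:error, _} tuple.
  DataSchema.to_struct(%{"size" => "large"}, Drink)
  ```

  A casting function such as `&{:ok, String.upcase(&1)}` would raise when handed `nil`,
  so optional fields generally need a cast that handles `nil` explicitly.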

  ### Examples

  See the guides for more in-depth examples, but below you can see how we create a schema
  that will take a map of data and create a struct out of it. Given the following schema:

      defmodule Sandwich do
        require DataSchema

        DataSchema.data_schema([
          field: {:type, "the_type", &{:ok, String.upcase(&1)}},
          list_of: {:fillings, "the_fillings", &({:ok, String.downcase(&1["name"])})}
        ])
      end

      input_data = %{
        "the_type" => "fake steak",
        "the_fillings" => [
          %{"name" => "fake stake", "good?" => true},
          %{"name" => "SAUCE"},
          %{"name" => "sweetcorn"},
        ]
      }

      DataSchema.to_struct(input_data, Sandwich)
      # outputs the following:
      {:ok,
       %Sandwich{
         type: "FAKE STEAK",
         fillings: ["fake stake", "sauce", "sweetcorn"]
       }}
  """
  defmacro data_schema(fields) do
    quote do
      @doc false
      def __data_schema_fields, do: unquote(fields)
      # __MODULE__ refers to the module that this macro is used in, i.e. the schema module.
      # We add a __data_accessor function so that to_struct can call it. If the schema
      # module does not set @data_accessor, we default to the Map accessor.
      if Module.has_attribute?(__MODULE__, :data_accessor) do
        @doc false
        def __data_accessor do
          @data_accessor
        end
      else
        @doc false
        def __data_accessor, do: DataSchema.MapAccessor
      end

      @enforce_keys Enum.reduce(
                      unquote(fields),
                      [],
                      fn
                        # Validates the shape of the fields at compile time.
                        {type, {_, _, _, _}}, _acc
                        when type not in unquote(@available_types) ->
                          message = """
                          Field #{inspect(type)} is not a valid field type.
                          Check the docs in DataSchema for more information on how fields should be written.
                          The available types are: #{inspect(unquote(@available_types))}
                          """

                          raise DataSchema.InvalidSchemaError, message: message

                        {type, {_, _, _}}, _acc
                        when type not in unquote(@available_types) ->
                          message = """
                          Field #{inspect(type)} is not a valid field type.
                          Check the docs in DataSchema for more information on how fields should be written.
                          The available types are: #{inspect(unquote(@available_types))}
                          """

                          raise DataSchema.InvalidSchemaError, message: message

                        {:aggregate, {_, schema, _, _}}, _acc
                        when not is_atom(schema) and not is_list(schema) ->
                          raise DataSchema.InvalidSchemaError, """
                          An :aggregate field should provide a nested schema to describe the data to be extracted.
                          This can be a module of another DataSchema or a list of schema fields:

                              defmodule Thing do
                                import DataSchema, only: [data_schema: 1]

                                @fields [
                                  field: {:date, "date", &Date.from_iso8601/1},
                                  field: {:time, "time", &Time.from_iso8601/1}
                                ]

                                data_schema([
                                  aggregate: {:datetime, @fields, &NaiveDateTime.new(&1.date, &1.time)}
                                ])
                              end

                          Or:

                              defmodule Thing do
                                import DataSchema, only: [data_schema: 1]

                                defmodule DateTime do
                                  import DataSchema, only: [data_schema: 1]

                                  data_schema([
                                    field: {:date, "date", &Date.from_iso8601/1},
                                    field: {:time, "time", &Time.from_iso8601/1}
                                  ])
                                end

                                data_schema([
                                  aggregate: {:datetime, DateTime, &NaiveDateTime.new(&1.date, &1.time)}
                                ])
                              end

                          Provided schema: #{inspect(schema)}
                          """

                        {:aggregate, {_, schema, _}}, _acc
                        when not is_atom(schema) and not is_list(schema) ->
                          raise DataSchema.InvalidSchemaError, """
                          An :aggregate field should provide a nested schema to describe the data to be extracted.
                          This can be a module of another DataSchema or a list of schema fields:

                              defmodule Thing do
                                import DataSchema, only: [data_schema: 1]

                                @fields [
                                  field: {:date, "date", &Date.from_iso8601/1},
                                  field: {:time, "time", &Time.from_iso8601/1}
                                ]

                                data_schema([
                                  aggregate: {:datetime, @fields, &NaiveDateTime.new(&1.date, &1.time)}
                                ])
                              end

                          Or:

                              defmodule Thing do
                                import DataSchema, only: [data_schema: 1]

                                defmodule DateTime do
                                  import DataSchema, only: [data_schema: 1]

                                  data_schema([
                                    field: {:date, "date", &Date.from_iso8601/1},
                                    field: {:time, "time", &Time.from_iso8601/1}
                                  ])
                                end

                                data_schema([
                                  aggregate: {:datetime, DateTime, &NaiveDateTime.new(&1.date, &1.time)}
                                ])
                              end

                          Provided schema: #{inspect(schema)}
                          """

                        {type, {_, _, module, _}}, _acc
                        when type in [:has_one, :has_many] and not is_atom(module) and
                               not is_list(module) ->
                          message = """
                          #{type} fields require a DataSchema module as their casting function:

                              data_schema([
                                #{type}: {:foo, "path", Foo}
                                #                        ^^
                                # Should be a DataSchema module
                              ])

                          Or an inline list of fields like so:

                              @foo_fields [
                                field: {:bar, "bar", &{:ok, to_string(&1)}}
                              ]

                              data_schema([
                                #{type}: {:foo, "path", @foo_fields}
                                #                          ^^
                                # Or a list of fields inline.
                              ])

                          You provided the following as a schema: #{inspect(module)}.
                          Ensure you haven't used the wrong field type.
                          """

                          raise DataSchema.InvalidSchemaError, message: message

                        {type, {_, _, module}}, _acc
                        when type in [:has_one, :has_many] and not is_atom(module) and
                               not is_list(module) ->
                          message = """
                          #{type} fields require a DataSchema module as their casting function:

                              data_schema([
                                #{type}: {:foo, "path", Foo}
                                #                        ^^
                                # Should be a DataSchema module
                              ])

                          Or an inline list of fields like so:

                              @foo_fields [
                                field: {:bar, "bar", &{:ok, to_string(&1)}}
                              ]

                              data_schema([
                                #{type}: {:foo, "path", @foo_fields}
                                #                          ^^
                                # Or a list of fields inline.
                              ])

                          You provided the following as a schema: #{inspect(module)}.
                          Ensure you haven't used the wrong field type.
                          """

                          raise DataSchema.InvalidSchemaError, message: message

                        {_, {field, _, _, opts}}, acc ->
                          # By default fields are required but they can be marked as optional.
                          if Keyword.get(opts, :optional?, false) do
                            acc
                          else
                            [field | acc]
                          end

                        # If no options are provided the field is enforced.
                        {_, {field, _, _}}, acc ->
                          [field | acc]

                        _, _ ->
                          raise DataSchema.InvalidSchemaError
                      end
                    )
      defstruct Enum.map(unquote(fields), fn
                  {_, {field, _, _}} when not is_atom(field) ->
                    message = """
                    The provided struct keys must be atoms. See docs for more information:

                        data_schema([
                          field: {:foo, "foo", &{:ok, &1}}
                        #          ^^^
                        #   must be an atom!
                        ])
                    """

                    raise DataSchema.InvalidSchemaError, message: message

                  {_, {field, _, _, _}} when not is_atom(field) ->
                    message = """
                    The provided struct keys must be atoms. See docs for more information:

                        data_schema([
                          field: {:foo, "foo", &{:ok, &1}}
                        #          ^^^
                        #   must be an atom!
                        ])
                    """

                    raise DataSchema.InvalidSchemaError, message: message

                  {_, {field, _, _}} ->
                    field

                  {_, {field, _, _, _}} ->
                    field

                  _ ->
                    raise DataSchema.InvalidSchemaError
                end)
    end
  end

  @doc """
  Accepts a data schema module and some source data and attempts to create the struct
  defined in the schema from the source data recursively.

  We essentially visit each field in the schema and extract the data the field points to
  from the source data, passing it to the field's casting function before setting the
  result of that as the value on the struct.

  Casting functions must return `{:ok, value}`, `{:error, reason}` or `:error`. On success
  the `value` is set on the struct field and `to_struct/2` returns `{:ok, struct}`. Struct
  creation halts at the first error, which is returned to the caller.

  ### Examples

      data = %{"spice" => "enables space travel"}

      defmodule Foo do
        require DataSchema

        DataSchema.data_schema(
          field: {:a_rocket, "spice", &({:ok, &1})}
        )
      end

      DataSchema.to_struct(data, Foo)
      # => Outputs the following:
      {:ok, %Foo{a_rocket: "enables space travel"}}
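
  A casting function signals failure by returning `{:error, reason}` or `:error`, which
  halts struct creation. The sketch below shows one way a caller might handle that. The
  `Counter` schema and its error message are hypothetical, and it assumes the default map
  accessor reads `"count"` with a plain key lookup.

  ```elixir
  defmodule Counter do
    require DataSchema

    DataSchema.data_schema(
      field:
        {:count, "count",
         fn value ->
           # Only accept input that parses fully as an integer.
           case Integer.parse(to_string(value)) do
             {int, ""} -> {:ok, int}
             _ -> {:error, "expected an integer"}
           end
         end}
    )
  end

  case DataSchema.to_struct(%{"count" => "three"}, Counter) do
    {:ok, counter} -> counter
    {:error, reason} -> reason
    :error -> :error
  end
  ```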
  """
  def to_struct(data, %schema{}) do
    to_struct(data, schema, [])
  end

  def to_struct(data, schema) do
    to_struct(data, schema, [])
  end

  def to_struct(data, %schema{}, opts) do
    to_struct(data, schema, opts)
  end

  def to_struct(data, schema, opts) when is_atom(schema) do
    if !function_exported?(schema, :__data_schema_fields, 0) do
      raise "Provided schema is not a valid DataSchema: #{inspect(schema)}"
    end

    fields = schema.__data_schema_fields()
    accessor = schema.__data_accessor()
    struct = struct(schema, %{})
    to_struct(data, struct, fields, accessor, opts)
  end

  # Notes on "schemaless" usage, where the struct, fields and accessor are passed in
  # directly rather than read from a schema module:
  #
  #   defmodule DateAndTime do
  #     defstruct [:date, :time]
  #   end
  #
  #   data = %{"date" => "1", "time" => "2"}
  #   fields = [
  #     field: {:date, "date", &Date.from_iso8601/1},
  #     field: {:time, "time", &Time.from_iso8601/1}
  #   ]
  #   accessor = MapAccessor
  #   struct_or_schema = DateAndTime
  #
  # Open question: should we default the accessor and opts? We'd need a map input to not
  # clash on arity though, or a new name for this like "schemaless_to_struct" or
  # "to_existing_struct".
  def to_struct(data, struct, fields, accessor) do
    to_struct(data, struct, fields, accessor, [])
  end

  def to_struct(data, struct, fields, accessor, opts) do
    # Right now we fail as soon as we get an error. If this error is nested deep then we
    # generate a recursive error that points to the value that caused it. We can imagine
    # instead "collecting" errors - meaning continuing with struct creation to gather up
    # all possible errors that will happen on struct creation. How to do this boggles the
    # mind a bit. But we'd need an option I do know that....
    # collect_errors? = Keyword.get(opts, :collect_errors, false)

    Enum.reduce_while(fields, struct, fn
      {:aggregate, {field, schema_mod, cast_fn, field_opts}}, struct when is_atom(schema_mod) ->
        nullable? = Keyword.get(field_opts, :optional?, false)
        fields = schema_mod.__data_schema_fields()
        accessor = schema_mod.__data_accessor()
        aggregate = struct(schema_mod, %{})
        aggregate(fields, accessor, data, opts, field, cast_fn, aggregate, struct, nullable?)

      {:aggregate, {field, schema_mod, cast_fn}}, struct when is_atom(schema_mod) ->
        fields = schema_mod.__data_schema_fields()
        accessor = schema_mod.__data_accessor()
        aggregate = struct(schema_mod, %{})
        aggregate(fields, accessor, data, opts, field, cast_fn, aggregate, struct, false)

      {:aggregate, {field, fields, cast_fn, field_opts}}, struct when is_list(fields) ->
        nullable? = Keyword.get(field_opts, :optional?, false)
        aggregate(fields, accessor, data, opts, field, cast_fn, %{}, struct, nullable?)

      {:aggregate, {field, fields, cast_fn}}, struct when is_list(fields) ->
        aggregate(fields, accessor, data, opts, field, cast_fn, %{}, struct, false)

      {field_type, {field, paths, cast_fn, field_opts}}, struct ->
        nullable? = Keyword.get(field_opts, :optional?, false)
        process_field({field_type, {field, paths, cast_fn}}, struct, nullable?, accessor, data)

      {_, {_, _, _}} = field, struct ->
        # By default fields are not nullable.
        nullable? = false
        process_field(field, struct, nullable?, accessor, data)
    end)
    |> case do
      :error -> :error
      {:error, error_message} -> {:error, error_message}
      struct -> {:ok, struct}
    end
  end

  defp process_field({:field, {field, path, cast_fn}}, struct, nullable?, accessor, data) do
    case call_cast_fn(cast_fn, accessor.field(data, path)) do
      {:ok, nil} ->
        if nullable? do
          {:cont, Map.put(struct, field, nil)}
        else
          # To collect errors we would have to continue here instead of halting.
          {:halt, {:error, null_error(%DataSchema.Errors{}, field)}}
        end

      {:ok, value} ->
        {:cont, Map.put(struct, field, value)}

      {:error, _} = error ->
        {:halt, error}

      :error ->
        {:halt, :error}
    end
  end

  defp process_field({:has_one, {field, path, cast_module}}, struct, nullable?, accessor, data) do
    case accessor.has_one(data, path) do
      nil ->
        if nullable? do
          # Should we still call cast fn? There is no cast to happen here as cast is to_struct
          # which happens automatically.
          {:cont, Map.put(struct, field, nil)}
        else
          {:halt, {:error, null_error(%DataSchema.Errors{}, field)}}
        end

      value ->
        case to_struct(value, cast_module) do
          # It's not possible for to_struct to return nil so we don't handle that case here
          {:ok, value} -> {:cont, Map.put(struct, field, value)}
          {:error, _} = error -> {:halt, error}
          :error -> {:halt, :error}
        end
    end
  end

  defp process_field(
         {:has_many, {field, path, cast_module}},
         struct,
         nullable?,
         accessor,
         data
       ) do
    case accessor.has_many(data, path) do
      nil ->
        if nullable? do
          {:cont, Map.put(struct, field, nil)}
        else
          {:halt, {:error, null_error(%DataSchema.Errors{}, field)}}
        end

      data ->
        data
        |> Enum.reduce_while([], fn datum, acc ->
          # It's not possible for to_struct to return nil so we don't worry about it here.
          case to_struct(datum, cast_module) do
            {:ok, struct} -> {:cont, [struct | acc]}
            {:error, _} = error -> {:halt, error}
            :error -> {:halt, :error}
          end
        end)
        |> case do
          {:error, _} = error ->
            {:halt, error}

          :error ->
            {:halt, :error}

          relations when is_list(relations) ->
            {:cont, %{struct | field => :lists.reverse(relations)}}
        end
    end
  end

  defp process_field({:list_of, {field, path, cast_module}}, struct, nullable?, accessor, data) do
    case accessor.list_of(data, path) do
      nil ->
        if nullable? do
          {:cont, Map.put(struct, field, nil)}
        else
          {:halt, {:error, null_error(%DataSchema.Errors{}, field)}}
        end

      data ->
        data
        |> Enum.reduce_while([], fn datum, acc ->
          case call_cast_fn(cast_module, datum) do
            {:ok, nil} ->
              if nullable? do
                # Do we add nil or do we remove them? a list of nils seeeeems bad. But is it
                # better to not remove information...?
                # {:cont, [nil | acc]}

                {:cont, acc}
              else
                {:halt, {:error, "Got null for a field that can't be null."}}
              end

            {:ok, value} ->
              {:cont, [value | acc]}

            {:error, _} = error ->
              {:halt, error}

            :error ->
              {:halt, :error}
          end
        end)
        |> case do
          {:error, error} ->
            {:halt, {:error, %DataSchema.Errors{errors: [{field, error}]}}}

          :error ->
            {:halt, :error}

          relations when is_list(relations) ->
            {:cont, %{struct | field => :lists.reverse(relations)}}
        end
    end
  end

  defp aggregate(fields, accessor, data, opts, field, cast_fn, aggregate, parent, nullable?) do
    case to_struct(data, aggregate, fields, accessor, opts) do
      :error ->
        {:halt, :error}

      {:error, error} ->
        {:halt, {:error, %DataSchema.Errors{errors: [{field, error}]}}}

      {:ok, values_map} ->
        case call_cast_fn(cast_fn, values_map) do
          {:ok, nil} ->
            if nullable? do
              {:cont, Map.put(parent, field, nil)}
            else
              {:halt, {:error, null_error(%DataSchema.Errors{}, field)}}
            end

          {:ok, value} ->
            {:cont, Map.put(parent, field, value)}

          {:error, error} ->
            {:halt, {:error, %DataSchema.Errors{errors: [{field, error}]}}}

          :error ->
            {:halt, :error}
        end
    end
  end

  defp null_error(error, field) do
    DataSchema.Errors.add_error(error, {field, @non_null_error_message})
  end

  # This just lets us use either a module name for the data type OR a one arity fn.
  defp call_cast_fn(module, value) when is_atom(module), do: module.cast(value)
  defp call_cast_fn(fun, value) when is_function(fun, 1), do: fun.(value)
end