lib/polymer.ex

defmodule Bio.Polymer do
  @moduledoc """
  Deals with conversions between polymers.

  The sequences that this will work with must define an implementation for the
  `Bio.Polymeric` protocol. This is then used with the definition of
  the `to/1` callbacks for the `Bio.Convertible` behaviour. These will
  be given the kmer enumeration that they define with that function.

  This module wraps the logic of accessing a given polymer's defined
  conversions. The primary idea is that I wanted to expose the ability to
  provide a non-default conversion without losing the semantics of a simple
  default when it's present.

  To put that in more concrete terms, I wanted this to be viable:

      iex>dna = DnaStrand.new("ttagccgt", label: "a label")
      ...>Bio.Polymer.convert(dna, RnaStrand)
      {:ok, %RnaStrand{sequence: "uuagccgu", length: 8, label: "a label"}}

  But, and this is the important part, other conversions are not well defined by
  defaults. For example:

      iex>amino = AminoAcid.new("maktg")
      ...>Bio.Polymer.convert(amino, DnaStrand)
      {:error, :undef_conversion}

  The `:undef_conversion` indicates that there is no viable default
  implementation of the conversion between these polymers. It _does not_
  indicate that there is none. Obviously one can convert from an amino acid to
  _some_ DNA strand. However, because this would imply making a selection from
  the available codons, that is left to the logic of whatever application is
  doing so.

  The way that you would do that is straight forward, you would define a
  conversion module and pass it to the `convert/3` function as the keyword
  argument `:conversion`. For example, if we wanted to defined a mapping that
  converted into a compressed DNA representation, we could do:

      iex>defmodule CompressedAminoConversion do
      ...>  use Bio.Convertible do
      ...>    def to(DnaStrand), do: {:ok, &compressed/2, 1}
      ...>  end
      ...>
      ...>  def compressed({:ok, knumerable, data}, _) do
      ...>    data = data
      ...>           |> Map.drop([ :length ])
      ...>           |> Map.to_list()
      ...>    knumerable
      ...>    |> Enum.map(&to_codon/1)
      ...>    |> Enum.join("")
      ...>    |> DnaStrand.new(data)
      ...>  end
      ...>
      ...>  defp to_codon(aa) do
      ...>    case aa do
      ...>      "a" -> "gcn"
      ...>      "r" -> "cgn"
      ...>      "n" -> "aay"
      ...>      "d" -> "gay"
      ...>      "c" -> "tgy"
      ...>      "e" -> "gar"
      ...>      "q" -> "car"
      ...>      "g" -> "ggn"
      ...>      "h" -> "cay"
      ...>      "i" -> "ath"
      ...>      "l" -> "ctn"
      ...>      "k" -> "aar"
      ...>      "m" -> "atg"
      ...>      "f" -> "tty"
      ...>      "p" -> "ccn"
      ...>      "s" -> "tcn"
      ...>      "t" -> "acn"
      ...>      "w" -> "tgg"
      ...>      "y" -> "tay"
      ...>      "v" -> "gtn"
      ...>    end
      ...>  end
      ...>end
      ...>amino = AminoAcid.new("maktg", label: "polypeptide-∂")
      ...>Bio.Polymer.convert(amino, DnaStrand, conversion: CompressedAminoConversion)
      {:ok, %DnaStrand{sequence: "atggcnaaracnggn", length: 15, label: "polypeptide-∂"}}

  This is made possible because of the simple implementation of the
  `Bio.Polymeric` interface for the `Bio.Sequence.AminoAcid`. If
  you want to define your own convertible polymer types, you can. It requires
  defining the module and the implementation of `convert/1`. You can read the
  `Bio.Sequence.AminoAcid` source for more clarity on the details.

  This package attempts to define reasonable defaults for all the occasions
  which it can. This includes converting DNA into RNA, and RNA to DNA. The
  conversions from DNA/RNA to Amino Acid are done using standard codon tables.

  The Conversion module idea is provided as an escape hatch for more particular
  applications which may require bespoke logic. An example would be converting
  Amino Acids into a DNA sequence, as above. There are likely more use cases
  than I could possibly compile on my own, so I tried to come up with a way to
  alleviate that pressure.
  """
  alias Bio.Polymeric

  @doc """
  Apply a conversion to a given datum.

  The `convert/3` function is at the core of using the `Bio.Polymer`
  module. By passing the function a struct and the module you wish to convert
  to, you are hooking into the underlying implementation of the
  `Bio.Convertible` for that module. This means that both the struct
  you given _as well as the module_ must have this implemented.

  # Examples
  Given a struct and module with a known conversion:

      iex>dna = DnaStrand.new("ttagccgt", label: "a label")
      ...>Bio.Polymer.convert(dna, RnaStrand)
      {:ok, %RnaStrand{sequence: "uuagccgu", length: 8, label: "a label"}}

  Given a struct and module with unknown conversions:

      iex>amino = AminoAcid.new("maktg")
      ...>Bio.Polymer.convert(amino, DnaStrand)
      {:error, :undef_conversion}

  Given a struct that doesn't implement `Bio.Sequential`:

      iex>Bio.Polymer.convert(%SomeModule{}, DnaStrand)
      {:error, :no_converter}
  """
  @spec convert(struct(), module(), keyword()) :: {:ok, struct()} | {:error, :undef_conversion}
  def convert(%_{} = data, module, opts \\ []) do
    case Keyword.get(opts, :conversion) do
      nil ->
        conversion_module = apply(data.__struct__, :converter, [])

        case apply(conversion_module, :to, [module]) do
          {:ok, kwise_converter, k} ->
            data
            |> Polymeric.kmers(k)
            |> kwise_converter.(module)
            |> then(&{:ok, &1})

          otherwise ->
            otherwise
        end

      conversion_module ->
        case apply(conversion_module, :to, [module]) do
          {:ok, kwise_converter, k} ->
            data
            |> Polymeric.kmers(k)
            |> kwise_converter.(module)
            |> then(&{:ok, &1})

          otherwise ->
            otherwise
        end
    end
  rescue
    UndefinedFunctionError -> {:error, :no_converter}
  end

  def valid?(%_{} = data, alphabet \\ nil) do
    case {Map.get(data, :alphabet), alphabet} do
      {nil, nil} -> false
      {builtin, nil} -> Polymeric.valid?(data, builtin)
      {_, given} -> Polymeric.valid?(data, given)
    end
  end

  @doc """
  Validate a given sequence struct according to its `Bio.Polymeric` implementation.
  """
  @spec validate(struct(), String.t() | nil) ::
          {:ok, struct()}
          | {:error, :no_alpha}
          | {:error, {atom(), String.t(), integer()}}
          | {:error, [{atom(), String.t(), integer()}]}
  def validate(data, alphabet \\ nil)

  def validate(%_{} = data, alphabet) do
    case {Map.get(data, :alphabet), alphabet} do
      {nil, nil} -> {:error, :no_alpha}
      {builtin, nil} -> Polymeric.validate(data, builtin)
      {_, given} -> Polymeric.validate(data, given)
    end
  end
end