lib/sequence.ex

defmodule Bio.Sequence do
  @moduledoc """
  `Bio.Sequence` is the basic building block of the sequence types.

  The core concept is that a polymer is a sequence of elements encoded as a
  binary. This is stored in the base `%Bio.Sequence{}` struct, which always has
  `sequence` and `length` fields and may also carry `label` and `alphabet`
  fields.

  The struct is intentionally sparse because it is meant to compose into larger
  data types. For example, the `Bio.Sequence.DnaDoubleStrand` struct holds two
  `Bio.Sequence.DnaStrand` polymers in its `top_strand` and `bottom_strand`
  fields.

  Because many of the sequence behaviors are shared, they are implemented by
  `Bio.BaseSequence` and used in the modules that need them. This allows us to
  ensure that there is a consistent implementation of the `Enumerable` protocol,
  which in turn allows for common interaction patterns _a la_ Python strings:

      iex>"gmc" in Bio.Sequence.new("agmctbo")
      true

      iex>Bio.Sequence.new("agmctbo")
      ...>|> Enum.map(&(&1))
      ["a", "g", "m", "c", "t", "b", "o"]
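
  To illustrate how an `Enumerable` implementation makes the calls above work,
  here is a minimal sketch. It is not the actual `Bio.BaseSequence` code, which
  may differ: `SketchSeq` is a hypothetical stand-in struct, and treating
  membership as a substring check is an assumption based on the `in` example
  above.

  ```elixir
  defmodule SketchSeq do
    # Hypothetical stand-in for a sequence struct; not part of this library.
    defstruct sequence: "", length: 0
  end

  defimpl Enumerable, for: SketchSeq do
    # The length is stored on the struct, so counting needs no traversal.
    def count(%SketchSeq{length: n}), do: {:ok, n}

    # Assumption: membership means "is a substring of the sequence".
    def member?(%SketchSeq{sequence: seq}, element) when is_binary(element) do
      {:ok, String.contains?(seq, element)}
    end

    def member?(%SketchSeq{}, _element), do: {:ok, false}

    # Returning an error tuple makes Enum fall back to the reduce-based slice.
    def slice(%SketchSeq{}), do: {:error, __MODULE__}

    # Walk the binary one grapheme at a time by delegating to the list reducer.
    def reduce(%SketchSeq{sequence: seq}, acc, fun) do
      Enumerable.List.reduce(String.graphemes(seq), acc, fun)
    end
  end

  seq = %SketchSeq{sequence: "agmctbo", length: 7}

  "gmc" in seq          # true
  Enum.map(seq, & &1)   # ["a", "g", "m", "c", "t", "b", "o"]
  Enum.count(seq)       # 7
  ```

  Because `in` and the `Enum` functions only go through the protocol, any
  struct with such an implementation gets them without further code.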

  My hope is that this alleviates some of the pain of coming from a language
  where strings are slightly more complex objects.

  See also the `Bio.Enum` module, which covers cases where the default `Enum`
  functions behave oddly for sequences. Where it makes sense, its functions
  return the same type they were given. For example, `Enum.slice/3` returns a
  charlist:

      iex>Bio.Sequence.new("agmctbo")
      ...>|> Enum.slice(2, 2)
      'mc'

  versus the `Bio.Enum` equivalent, which returns a `%Bio.Sequence{}`:

      iex>alias Bio.Enum, as: Bnum
      ...>Bio.Sequence.new("agmctbo")
      ...>|> Bnum.slice(2, 2)
      %Bio.Sequence{sequence: "mc", length: 2}
  """
  use Bio.BaseSequence

  defmodule Conversions do
    @moduledoc false
    use Bio.Convertible
  end

  @impl Bio.Sequential
  def converter, do: Conversions

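  # Renders the sequence as a FASTA record: a ">label" header line followed by
  # the sequence on its own line. A nil label interpolates as an empty string.
  @impl Bio.Sequential
  def fasta_line(%__MODULE__{sequence: seq, label: label}), do: ">#{label}\n#{seq}\n"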
end