lib/sequence/sequence.ex

defmodule Bio.Sequence do
  @moduledoc """
  `Bio.Sequence` is the basic building block of the sequence types.

  The core concept here is that a polymer is a sequence of elements encoded as a
  binary. This is stored in the base `%Bio.Sequence{}` struct, which has both a
  `sequence` and `length` field, and may carry a `label` as well.

  The struct is intentionally sparse on information since it is meant to
  compose into larger data types. For example, the `Bio.Sequence.DnaDoubleStrand`
  struct holds two `Bio.Sequence.DnaStrand` polymers in its `top_strand` and
  `bottom_strand` fields.

  Because many sequence behaviors are shared, they are implemented once in
  `Bio.SimpleSequence` and pulled in with `use` by the modules that need them.
  This ensures a consistent implementation of the `Enumerable` protocol, which
  in turn allows common interaction patterns à la Python strings:

  # Examples
      iex>"gmc" in Bio.Sequence.new("agmctbo")
      true

      iex>Bio.Sequence.new("agmctbo")
      ...>|> Enum.map(&(&1))
      ["a", "g", "m", "c", "t", "b", "o"]

      iex>alias Bio.Enum, as: Bnum
      ...>Bio.Sequence.new("agmctbo")
      ...>|> Bnum.slice(2, 2)
      %Bio.Sequence{sequence: "mc", length: 2, label: ""}

  My hope is that this alleviates some of the pain of coming from a language
  where strings are slightly more complex objects.
  """
  use Bio.SimpleSequence

  defmodule Conversions do
    @moduledoc false
    use Bio.Behaviours.Converter
  end

  @impl Bio.Behaviours.Sequence
  def converter, do: Conversions

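  # Renders the sequence as a FASTA record: a `>label` header line, then the
  # sequence on its own line, each terminated by a newline.
  @impl Bio.Behaviours.Sequence
  def fasta_line(%__MODULE__{sequence: seq, label: label}), do: ">#{label}\n#{seq}\n"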
end
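
# The moduledoc above says that `Bio.SimpleSequence` supplies a shared
# `Enumerable` implementation. What follows is a minimal, self-contained sketch
# of how a binary-backed struct could implement that protocol. `SketchSequence`
# is a hypothetical name used only for illustration; it is not part of the
# library, and `Bio.SimpleSequence` may implement these callbacks differently.
defmodule SketchSequence do
  defstruct sequence: "", length: 0, label: ""

  # Build the struct from a binary, caching its length in graphemes.
  def new(seq) when is_binary(seq) do
    %__MODULE__{sequence: seq, length: String.length(seq)}
  end
end

defimpl Enumerable, for: SketchSequence do
  # The cached length answers `Enum.count/1` without a traversal.
  def count(%SketchSequence{length: length}), do: {:ok, length}

  # Membership is a substring check, so `"gmc" in seq` behaves like Python's `in`.
  def member?(%SketchSequence{sequence: seq}, element) when is_binary(element) do
    {:ok, String.contains?(seq, element)}
  end

  def member?(%SketchSequence{}, _element), do: {:ok, false}

  # Returning an error tuple makes `Enum` fall back to `reduce/3` for slicing.
  def slice(%SketchSequence{}), do: {:error, __MODULE__}

  # Enumerate one grapheme at a time by delegating to the list implementation.
  def reduce(%SketchSequence{sequence: seq}, acc, fun) do
    Enumerable.List.reduce(String.graphemes(seq), acc, fun)
  end
end

# Usage of the sketch:
#
#     seq = SketchSequence.new("agmctbo")
#     "gmc" in seq          # true
#     Enum.map(seq, & &1)   # ["a", "g", "m", "c", "t", "b", "o"]
#     Enum.count(seq)       # 7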