lib/bitwise_ip.ex

defmodule BitwiseIp do
  @moduledoc """
  A struct representing an IP address encoded as an integer.

  The [Internet Protocol](https://en.wikipedia.org/wiki/Internet_Protocol)
  defines computer network addresses using fixed-width integers, which are
  efficient for both transmission and the implementation of logic using bitwise
  operations. [IPv4](https://en.wikipedia.org/wiki/IPv4) uses 32-bit integers,
  providing a space of 4,294,967,296 unique addresses. Due to the growing size
  of the internet, [IPv6](https://en.wikipedia.org/wiki/IPv6) uses 128-bit
  integers to provide an absurdly large address space.

  These integers, however, are hard for humans to read. Therefore, we've
  adopted customary notations that are a little easier to digest. IPv4 uses a
  dotted octet notation, where each of the four bytes are written in decimal
  notation and separated by `.`, as in `127.0.0.1`. IPv6 is similar, but uses
  hexadecimal notation on each of eight hextets separated by `:`, as in
  `a:1:b:2:c:3:d:4`.

  As such, representations for IP addresses in modern software have drifted
  away from fixed-width integers. `:inet` represents IP addresses as tuples
  like `{127, 0, 0, 1}` for IPv4 and `{0xA, 1, 0xB, 2, 0xC, 3, 0xD, 4}` for
  IPv6. These are less efficient in both the space to store the addresses and
  the time it takes to perform various operations. For example, whereas
  comparing two 32-bit IPv4 addresses is typically one machine instruction,
  comparing two tuples involves memory indirection for the tuple layout and 4
  separate integer comparisons. This could be even worse if you represent IPs
  as strings in their human-readable format.

  The difference is probably negligible for your application. In fact, Elixir &
  Erlang don't have great support for fixed-width integer representations (see
  `t:t/0` for details). But in the interest of getting back to basics,
  `BitwiseIp` provides the missing interface for manipulating IP addresses as
  the integers they were designed to be. This makes certain logic much easier
  to express and improves micro-benchmarks compared to tuple-based libraries
  (for whatever that's worth). The most useful functionality is in
  `BitwiseIp.Block`, which represents a
  [CIDR](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing) block.
  However, `BitwiseIp` is the fundamental structure that `BitwiseIp.Block` is
  built on.
  """

  defstruct [:proto, :addr]

  @typedoc """
  An integer-encoded IP address.

  This type takes on two different shapes depending on the IP protocol. The
  supported protocols are IPv4 and IPv6.

  Normally, the distinction would be down to the number of bits in a
  fixed-width integer representation. However, the Erlang VM doesn't support
  fixed-width integers, so there's no way to tell IPv4 addresses apart from
  IPv6 addresses using just a number. Therefore, this type is a struct with two
  fields:

  * `:proto` - the protocol, either `:v4` or `:v6`
  * `:addr` - the integer encoding of the address

  But again, the VM does not support fixed-width integers for the `:addr`. In
  the Erlang runtime system, the smallest unit of memory is a *word*: 4 bytes
  on a 32-bit architecture, 8 bytes on a 64-bit architecture. Data is stored
  using *tagged pointers*, where one word has 4 bits reserved as a *tag*
  enumerating type information. One pattern of 4 bits says "I'm a float",
  another pattern says "I'm an integer", and so on. When the data is small
  enough to fit in the remaining bits of the word (28 bits or 60 bits,
  depending on the architecture), it is stored as an *immediate* value.
  Otherwise, it is *boxed* and the word instead contains a pointer to a section
  of memory on the heap, which can basically be arbitrarily large. Read more in
  [*A staged tag scheme for
  Erlang*](http://www.it.uu.se/research/publications/reports/2000-029/) by
  Mikael Pettersson.

  What this means for us is that `:addr` may or may not spill onto the heap. On
  a 32-bit machine, only IP addresses in the range of 0 to 2^28 fit as
  immediate values. This covers most of the IPv4 range, but only a small
  portion of the IPv6 range. 64-bit machines have 60 bits to play with, which
  would comfortably fit any IPv4 address, but still requires boxing of IPv6
  addresses. According to the [Erlang efficiency
  guide](http://erlang.org/doc/efficiency_guide/advanced.html), large integers
  are stored across at least 3 words. What's more, because we have to
  distinguish between integers using the struct with the `:proto` field, each
  IP address requires an additional map allocation, which carries some
  overhead.

  So this isn't going to be a maximally compact representation of an IP
  address. Such a thing isn't really possible on the Erlang VM. However,
  storing the bulk of it as a single integer still lets us perform efficient
  bitwise operations with less overhead than, say, `:inet`-style tuples of
  multiple integers.
  """

  @type t() :: v4() | v6()

  @typedoc """
  An IPv4 address.

  The `:addr` is an unsigned integer between 0 and 2^32 - 1. See `t:t/0` for
  discussion about the in-memory representation.
  """

  @type v4() :: %BitwiseIp{proto: :v4, addr: integer()}

  @typedoc """
  An IPv6 address.

  The `:addr` is an unsigned integer between 0 and 2^128 - 1. See `t:t/0` for
  discussion about the in-memory representation.
  """

  @type v6() :: %BitwiseIp{proto: :v6, addr: integer()}

  @doc """
  An error-raising variant of `parse/1`.

  This function parses IPv4 and IPv6 strings in their respective notations and
  produces an encoded `BitwiseIp` struct. If the string is invalid, it raises
  an `ArgumentError`.

  `BitwiseIp` implements the `String.Chars` protocol, so parsing can be undone
  using `to_string/1`.

  ## Examples

  ```
  iex> BitwiseIp.parse!("127.0.0.1")
  %BitwiseIp{proto: :v4, addr: 2130706433}

  iex> BitwiseIp.parse!("::1")
  %BitwiseIp{proto: :v6, addr: 1}

  iex> BitwiseIp.parse!("not an ip")
  ** (ArgumentError) Invalid IP address "not an ip"

  iex> BitwiseIp.parse!("192.168.0.1") |> to_string()
  "192.168.0.1"

  iex> BitwiseIp.parse!("fc00::") |> to_string()
  "fc00::"
  ```
  """

  @spec parse!(String.t()) :: t()

  def parse!(address) do
    case parse(address) do
      {:ok, ip} -> ip
      {:error, message} -> raise ArgumentError, message
    end
  end

  @doc """
  Parses a string into a bitwise IP.

  This function parses IPv4 and IPv6 strings in their respective notations and
  produces an encoded `BitwiseIp` struct. This is done in an error-safe way by
  returning a tagged tuple. To raise an error, use `parse!/1` instead.

  `BitwiseIp` implements the `String.Chars` protocol, so parsing can be undone
  using `to_string/1`.

  ## Examples

  ```
  iex> BitwiseIp.parse("127.0.0.1")
  {:ok, %BitwiseIp{proto: :v4, addr: 2130706433}}

  iex> BitwiseIp.parse("::1")
  {:ok, %BitwiseIp{proto: :v6, addr: 1}}

  iex> BitwiseIp.parse("not an ip")
  {:error, "Invalid IP address \\"not an ip\\""}

  iex> BitwiseIp.parse("192.168.0.1") |> elem(1) |> to_string()
  "192.168.0.1"

  iex> BitwiseIp.parse("fc00::") |> elem(1) |> to_string()
  "fc00::"
  ```
  """

  @spec parse(String.t()) :: {:ok, t()} | {:error, String.t()}

  def parse(address) do
    case :inet.parse_strict_address(address |> to_charlist()) do
      {:ok, ip} -> {:ok, encode(ip)}
      {:error, _} -> {:error, "Invalid IP address #{inspect(address)}"}
    end
  end

  @doc """
  Encodes an `:inet`-style tuple as a bitwise IP.

  The Erlang standard library represents IP addresses as tuples of integers: 4
  octet values for IPv4, 8 hextet values for IPv6. This function encodes the
  separate values as a single number, which gets wrapped into a `BitwiseIp`
  struct. This can be undone with `decode/1`.

  Beware of redundant usage in performance-critical paths. Because of the
  overhead in encoding the integer, excessive translation back & forth between
  the formats may outweigh any benefits gained from other operations on the
  single-integer representation.

  ## Examples

  ```
  iex> BitwiseIp.encode({127, 0, 0, 1})
  %BitwiseIp{proto: :v4, addr: 2130706433}

  iex> BitwiseIp.encode({0, 0, 0, 0, 0, 0, 0, 1})
  %BitwiseIp{proto: :v6, addr: 1}
  ```
  """

  def encode(inet)

  @spec encode(:inet.ip4_address()) :: v4()

  def encode({a, b, c, d}) do
    <<ip::32>> = <<a::8, b::8, c::8, d::8>>
    %BitwiseIp{proto: :v4, addr: ip}
  end

  @spec encode(:inet.ip6_address()) :: v6()

  def encode({a, b, c, d, e, f, g, h}) do
    <<ip::128>> = <<a::16, b::16, c::16, d::16, e::16, f::16, g::16, h::16>>
    %BitwiseIp{proto: :v6, addr: ip}
  end

  @doc """
  Decodes a bitwise IP into an `:inet`-style tuple.

  The Erlang standard library represents IP addresses as tuples of integers: 4
  octet values for IPv4, 8 hextet values for IPv6. This function decodes the
  single number from a `BitwiseIp` struct into its constituent parts. This can
  be undone with `encode/1`.

  Beware of redundant usage in performance-critical paths. Because of the
  overhead in decoding the integer, excessive translation back & forth between
  the formats may outweigh any benefits gained from other operations on the
  single-integer representation.

  ## Examples

  ```
  iex> BitwiseIp.decode(%BitwiseIp{proto: :v4, addr: 2130706433})
  {127, 0, 0, 1}

  iex> BitwiseIp.decode(%BitwiseIp{proto: :v6, addr: 1})
  {0, 0, 0, 0, 0, 0, 0, 1}
  ```
  """

  def decode(bitwise_ip)

  @spec decode(v4()) :: :inet.ip4_address()

  def decode(%BitwiseIp{proto: :v4, addr: ip}) do
    <<a::8, b::8, c::8, d::8>> = <<ip::32>>
    {a, b, c, d}
  end

  @spec decode(v6()) :: :inet.ip6_address()

  def decode(%BitwiseIp{proto: :v6, addr: ip}) do
    <<a::16, b::16, c::16, d::16, e::16, f::16, g::16, h::16>> = <<ip::128>>
    {a, b, c, d, e, f, g, h}
  end

  defimpl String.Chars do
    def to_string(ip) do
      BitwiseIp.decode(ip) |> :inet.ntoa() |> Kernel.to_string()
    end
  end
end