# MIT License
#
# Copyright (c) 2019-2023 Knoxen
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
defmodule Puid do
@moduledoc """
Simple, fast, flexible and efficient generation of probably unique identifiers (`puid`, aka
random strings) of intuitively specified entropy using pre-defined or custom characters.
## Overview
`Puid` provides fast and efficient generation of random IDs. For the purposes of `Puid`, a random
ID is considered a random string used in a context of uniqueness, that is, random IDs are a bunch
of random strings that are hopefully unique.
Random string generation can be thought of as a _transformation_ of some random source of entropy
into a string _representation_ of randomness. A general purpose random string library used for
random IDs should therefore provide user specification for each of the following three key
aspects:
### Entropy source
What source of randomness is being transformed? `Puid` allows easy specification of the function
used for source randomness.
### ID characters
What characters are used in the ID? `Puid` provides 16 pre-defined character sets, as well as
allows custom character designation, including Unicode
### ID randomness
What is the resulting “randomness” of the IDs? Note this isn't necessarily the same as the
randomness of the entropy source. `Puid` allows explicit specification of ID randomness in an
intuitive manner.
## Examples
Creating a random ID generator using `Puid` is a simple as:
```elixir
iex> defmodule(RandId, do: use(Puid))
iex> RandId.generate()
"8nGA2UaIfaawX-Og61go5A"
```
Options allow easy and complete control of ID generation.
### Entropy Source
`Puid` uses
[:crypto.strong_rand_bytes/1](https://www.erlang.org/doc/man/crypto.html#strong_rand_bytes-1) as
the default entropy source. The `rand_bytes` option can be used to specify any function of the
form `(non_neg_integer) -> binary` as the source:
```elixir
iex > defmodule(PrngPuid, do: use(Puid, rand_bytes: &:rand.bytes/1))
iex> PrngPuid.generate()
"bIkrSeU6Yr8_1WHGvO0H3M"
```
### ID Characters
By default, `Puid` use the [RFC 4648](https://tools.ietf.org/html/rfc4648#section-5) file system &
URL safe characters. The `chars` option can by used to specify any of 16 [pre-defined character
sets](#Chars) or custom characters, including Unicode:
```elixir
iex> defmodule(HexPuid, do: use(Puid, chars: :hex))
iex> HexPuid.generate()
"13fb81e35cb89e5daa5649802ad4bbbd"
iex> defmodule(DingoskyPuid, do: use(Puid, chars: "dingosky"))
iex> DingoskyPuid.generate()
"yiidgidnygkgydkodggysonydodndsnkgksgonisnko"
iex> defmodule(DingoskyUnicodePuid, do: use(Puid, chars: "dîñgø$kyDÎÑGØßK¥", total: 2.5e6, risk: 1.0e15))
iex> DingoskyUnicodePuid.generate()
"øßK$ggKñø$dyGîñdyØøØÎîk"
```
### ID Randomness
Generated IDs have 128-bit entropy by default. `Puid` provides a simple, intuitive way to specify
ID randomness by declaring a `total` number of possible IDs with a specified `risk` of a repeat in
that many IDs:
To generate up to _10 million_ random IDs with _1 in a trillion_ chance of repeat:
```elixir
iex> defmodule(MyPuid, do: use(Puid, total: 10.0e6, risk: 1.0e15))
iex> MyPuid.generate()
"T0bFZadxBYVKs5lA"
```
The `bits` option can be used to directly specify an amount of ID randomness:
```elixir
iex> defmodule(Token, do: use(Puid, bits: 256, chars: :hex_upper))
iex> Token.generate()
"6E908C2A1AA7BF101E7041338D43B87266AFA73734F423B6C3C3A17599F40F2A"
```
## Module API
Module functions:
- **generate/0**: Generate a random **puid**
- **total/1**: total **puid**s which can be generated at a specified `risk`
- **risk/1**: risk of generating `total` **puid**s
- **encode/1**: Encode `bytes` into a **puid**
- **decode/1**: Decode a `puid` into **bytes**
- **info/0**: Module information
The `total/1`, `risk/1` functions provide approximations to the **risk** of a repeat in some **total** number of generated **puid**s. The mathematical approximations used purposely _overestimate_ **risk** and _underestimate_ **total**.
The `encode/1`, `decode/1` functions convert **puid**s to and from **bits** to facilitate binary data storage, e.g. as an **Ecto** type. Note that for efficiency `Puid` operates at a bit level, so `decode/1` of a **puid** produces _representative_ bytes such that `encode/1` of those **bytes** produces the same **puid**. The **bytes** are the **puid** specific _bitstring_ with 0 bit values appended to the ending byte boundary.
The `info/0` function returns a `Puid.Info` structure consisting of:
- source characters
- name of pre-defined `Puid.Chars` or `:custom`
- entropy bits per character
- total entropy bits
- may be larger than the specified `bits` since it is a multiple of the entropy bits per
character
- entropy representation efficiency
- ratio of the **puid** entropy to the bits required for **puid** string representation
- entropy source function
- **puid** string length
#### Example
```elixir
iex> defmodule(SafeId, do: use(Puid))
iex> SafeId.generate()
"CSWEPL3AiethdYFlCbSaVC"
iex> SafeId.total(1_000_000)
104350568690606000
iex> SafeId.risk(1.0e12)
9007199254740992
iex> SafeId.decode("CSWEPL3AiethdYFlCbSaVC")
<<9, 37, 132, 60, 189, 192, 137, 235, 97, 117, 129, 101, 9, 180, 154, 84, 32>>
iex> SafeId.encode(<<9, 37, 132, 60, 189, 192, 137, 235, 97, 117, 129, 101, 9, 180, 154, 84, 32>>)
"CSWEPL3AiethdYFlCbSaVC"
iex> SafeId.info()
%Puid.Info{
characters: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_",
char_set: :safe64,
entropy_bits: 132.0,
entropy_bits_per_char: 6.0,
ere: 0.75,
length: 22,
rand_bytes: &:crypto.strong_rand_bytes/1
}
```
"""
import Puid.Entropy
import Puid.Util
@type t :: binary
@doc false
defmacro __using__(opts) do
quote do
alias Puid.Chars
puid_default = %Puid.Info{}
chars = unquote(opts)[:chars]
bits = unquote(opts)[:bits]
risk = unquote(opts)[:risk]
total = unquote(opts)[:total]
{puid_charlist, puid_char_set} =
if is_nil(chars) do
{puid_default.characters |> to_charlist(), puid_default.char_set}
else
charlist = Chars.charlist!(chars)
if is_atom(chars), do: {charlist, chars}, else: {charlist, :custom}
end
chars_encoding = Chars.encoding(puid_charlist)
if !is_nil(total) and is_nil(risk),
do: raise(Puid.Error, "Must specify risk when specifying total")
if is_nil(total) and !is_nil(risk),
do: raise(Puid.Error, "Must specify total when specifying risk")
entropy_bits =
cond do
is_nil(bits) and is_nil(total) ->
puid_default.entropy_bits
is_number(bits) and bits < 1 ->
raise Puid.Error, "Invalid bits. Must be greater than 1"
is_number(bits) ->
bits
!is_nil(bits) ->
raise Puid.Error, "Invalid bits. Must be numeric"
true ->
bits(total, risk)
end
rand_bytes = unquote(opts[:rand_bytes]) || (&:crypto.strong_rand_bytes/1)
if !is_function(rand_bytes), do: raise(Puid.Error, "rand_bytes not a function")
if :erlang.fun_info(rand_bytes)[:arity] !== 1,
do: raise(Puid.Error, "rand_bytes not arity 1")
chars_count = length(puid_charlist)
entropy_bits_per_char = :math.log2(chars_count)
puid_len = (entropy_bits / entropy_bits_per_char) |> :math.ceil() |> round()
avg_rep_bits_per_char =
puid_charlist
|> to_string()
|> byte_size()
|> Kernel.*(8)
|> Kernel./(chars_count)
ere = (entropy_bits_per_char / avg_rep_bits_per_char) |> Float.round(2)
puid_bits_per_char = log_ceil(chars_count)
@entropy_bits entropy_bits_per_char * puid_len
@bits_per_puid puid_len * puid_bits_per_char
@puid_len puid_len
defmodule __MODULE__.Bits,
do:
use(Puid.Bits,
chars_count: chars_count,
puid_len: puid_len,
rand_bytes: rand_bytes
)
if chars_encoding == :ascii do
defmodule __MODULE__.Encoder,
do:
use(Puid.Encoder.ASCII,
charlist: puid_charlist,
bits_per_char: puid_bits_per_char,
puid_len: puid_len
)
defmodule __MODULE__.Decoder,
do:
use(Puid.Decoder.ASCII,
charlist: puid_charlist,
puid_len: puid_len
)
else
defmodule __MODULE__.Encoder,
do:
use(Puid.Encoder.Utf8,
charlist: puid_charlist,
bits_per_char: puid_bits_per_char,
puid_len: puid_len
)
end
@doc """
Generate a `puid`
"""
@spec generate() :: String.t()
def generate(),
do: __MODULE__.Bits.generate() |> __MODULE__.Encoder.encode()
@doc """
Encode `bits` into a `puid`.
`bits` must contain enough bits to create a `puid`. The rest are ignored.
"""
@spec encode(bits :: bitstring()) :: String.t() | Puid.Error.t()
def encode(bits)
def encode(<<_::size(@bits_per_puid)>> = bits) do
try do
__MODULE__.Encoder.encode(bits)
rescue
_ ->
{:error, "unable to encode"}
end
end
def encode(_),
do: {:error, "unable to encode"}
@doc """
Decode `puid` into representative `bits`.
`puid` must a representative **puid** from this module.
NOTE: `decode/1` is not supported for non-ascii character sets
"""
@spec decode(puid :: String.t()) :: bitstring() | Puid.Error.t()
def decode(puid)
if chars_encoding == :ascii do
def decode(puid),
do: __MODULE__.Decoder.decode(puid)
else
def decode(_),
do: {:error, "not supported for non-ascii characters sets"}
end
@doc """
Approximate **total** possible **puid**s at a specified `risk`
"""
@spec total(risk :: float()) :: integer()
def total(risk),
do: round(Puid.Entropy.total(@entropy_bits, risk))
@doc """
Approximate **risk** in genertating `total` **puid**s
"""
@spec risk(total :: float()) :: integer()
def risk(total),
do: round(Puid.Entropy.risk(@entropy_bits, total))
mod_info = %Puid.Info{
characters: puid_charlist |> to_string(),
char_set: puid_char_set,
entropy_bits_per_char: Float.round(entropy_bits_per_char, 2),
entropy_bits: Float.round(@entropy_bits, 2),
ere: ere,
length: puid_len,
rand_bytes: rand_bytes
}
@puid_mod_info mod_info
@doc """
`Puid.Info` module info
"""
@spec info() :: %Puid.Info{}
def info(),
do: @puid_mod_info
end
end
end