# UUUIDv7 or uUUIDv7 or microUUIDv7 for Elixir

Version 7 UUIDs with submicrosecond precision, bulk generation, and a
boxed-struct Ecto type that skips the hex round-trip on inserts.

Standard UUIDv7 timestamps have millisecond precision, so UUIDs generated
within the same millisecond have no guaranteed order relative to each
other. This library stores a sub-millisecond fraction in the 12-bit
`rand_a` field, which restores ordering at the cost of 12 bits of
randomness. The 62 random bits of `rand_b` remain, so a collision still
requires two UUIDs to share both the same timestamp value and the same
62-bit random value.
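The resulting bit layout can be sketched with Elixir's binary syntax. This is a minimal illustration, not the library's implementation: the module name `LayoutSketch`, the function `build/1`, and the rule for scaling nanoseconds into 12 bits are all assumptions made for the example.

```elixir
defmodule LayoutSketch do
  # Hypothetical helper, not part of the library's API.
  # Builds a 16-byte UUIDv7-shaped binary from a Unix timestamp in nanoseconds.
  def build(unix_ns) when is_integer(unix_ns) and unix_ns >= 0 do
    ms = div(unix_ns, 1_000_000)

    # Assumed scaling: map the nanoseconds inside the current millisecond
    # onto the 12-bit range 0..4095.
    sub_ms = div(rem(unix_ns, 1_000_000) * 4096, 1_000_000)

    # 62 random bits for rand_b; the two spare bits are discarded.
    <<rand_b::62, _::2>> = :crypto.strong_rand_bytes(8)

    # 48-bit timestamp, version 7, 12-bit fraction, variant 0b10, rand_b.
    <<ms::48, 7::4, sub_ms::12, 2::2, rand_b::62>>
  end
end
```

Because the fraction sits directly after the 48-bit timestamp, a plain byte-wise comparison of two such binaries orders them by time first and by sub-millisecond position second.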

Requires Elixir 1.18+.

## Usage

### Basic UUID generation

```elixir
# Hex string
UUIDv7.generate()
# => "018e90d8-06e8-7f9f-bfd7-6730ba98a51b"

# Raw 16-byte binary
UUIDv7.bingenerate()
# => <<1, 142, 144, 216, 6, 232, 127, 159, 191, 215, 103, 48, 186, 152, 165, 27>>

# For a specific datetime
{:ok, dt, _} = DateTime.from_iso8601("2024-01-01T00:00:00.000Z")
UUIDv7.generate_from_datetime(dt)
```

### Bulk generation

`bingenerate_many/1` / `generate_many/1` produce strictly-ordered batches
much faster than calling `bingenerate/0` in a loop — the system clock is
read once and all the random bytes come from a single
`:crypto.strong_rand_bytes/1` call.

```elixir
UUIDv7.bingenerate_many(1000)
# => [<<...>>, <<...>>, ...]  # 1000 raw binaries, strictly ordered

UUIDv7.generate_many(1000)
# => ["018e...", "018e...", ...]  # same as hex strings
```

For batches of `n <= 4096` the 12-bit sub-millisecond field is
partitioned into `n` equal slots, with a crypto-strong jitter inside
each slot:

  * **strict ordering** — slot `i+1`'s minimum is always above slot
    `i`'s maximum
  * **randomized positions** inside each slot, so the 12-bit field
    doesn't fall on predictable multiples of `step`
  * **no PRNG overhead** — both `rand_b` (62 bits) and the jitter
    (12 bits) come from the same crypto call
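The slot scheme above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `SlotSketch` and `sub_ms_values/1` are hypothetical names, the slot width is taken as `div(4096, n)`, and only the jitter bytes are drawn here (the library is described as drawing `rand_b` from the same call).

```elixir
defmodule SlotSketch do
  @slots 4096

  # Hypothetical helper: returns n strictly increasing 12-bit values,
  # one per slot, for 1 <= n <= 4096.
  def sub_ms_values(n) when n in 1..4096 do
    # Width of each slot; any remainder at the top of the range is unused.
    step = div(@slots, n)

    # Two random bytes per value, all taken from a single crypto call.
    random = :crypto.strong_rand_bytes(2 * n)
    jitters = for <<r::16 <- random>>, do: rem(r, step)

    jitters
    |> Enum.with_index()
    |> Enum.map(fn {jitter, i} -> i * step + jitter end)
  end
end
```

Slot `i` covers `i * step` through `i * step + step - 1`, so each value is strictly greater than the previous one regardless of the jitter drawn. Note that `rem/2` introduces a slight modulo bias when `step` is not a power of two; a real implementation may handle this differently.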

For `n > 4096` the batch is auto-chunked across consecutive
milliseconds (each ms can hold at most 4096 strictly-ordered values):

```elixir
uuids = UUIDv7.bingenerate_many(10_000)
length(uuids)                         # => 10_000
uuids == Enum.sort(uuids)             # => true
```

This burns `div(n - 1, 4096)` "future" milliseconds; subsequent
`bingenerate/0` calls landing in those same milliseconds still won't
collide thanks to the 62 random bits per UUID, but the batch reaches
ahead of the wall clock.
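The arithmetic behind the chunking can be worked through in a few lines. The sketch below assumes full chunks are filled first and the remainder goes last; the library may distribute values differently, and `ChunkSketch.plan/1` is a hypothetical name.

```elixir
defmodule ChunkSketch do
  @per_ms 4096

  # Hypothetical helper: returns {ms_offset, count} pairs describing how many
  # UUIDs land in each consecutive millisecond.
  def plan(n) when is_integer(n) and n > 0 do
    # Number of milliseconds borrowed beyond the starting one.
    future_ms = div(n - 1, @per_ms)

    for offset <- 0..future_ms do
      {offset, min(@per_ms, n - offset * @per_ms)}
    end
  end
end

ChunkSketch.plan(10_000)
# => [{0, 4096}, {1, 4096}, {2, 1808}]
```

For `n = 10_000`, `div(9_999, 4096)` is 2, so the batch occupies the current millisecond plus two future ones.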

### Benchmarks

From `bench/bench.exs` on Ryzen 7 7840HS (Elixir 1.19, JIT on):

| n     | `bingenerate/0` loop | `bingenerate_many/1` | speedup |
|-------|----------------------|----------------------|---------|
| 10    | 7.95 μs              | 1.22 μs              | 6.5×    |
| 100   | 75.40 μs             | 2.97 μs              | 25.4×   |
| 1000  | 802.05 μs            | 24.52 μs             | 32.7×   |
| 4096  | 3275 μs              | 207.56 μs            | 15.8×   |

### Boxed UUID struct + Ecto type

`UUID` is a struct that wraps the raw 16-byte binary instead of the
hex-encoded string. `UUIDv7.Boxed` is the matching Ecto type:

```elixir
# @primary_key must be set before the schema block for Ecto to pick it up
@primary_key {:id, UUIDv7.Boxed, autogenerate: true}
schema "users" do
  ...
end
```

Why use it? `Ecto.UUID` carries UUIDs as 36-char hex strings, which
forces a `hex → binary` decode on every `Repo.insert/1` (because
Postgres stores them as 16-byte binaries). `UUIDv7.Boxed` keeps the raw
bytes in a `%UUID{}` struct and only formats to hex on `to_string/1`,
`inspect/1`, and JSON encoding — so the round-trip cost is paid once at
display time, not on every write.

**Round-trip benchmark** (`autogenerate/0` + `dump/1`, per row):

| Type                    | ips    | avg     | memory |
|-------------------------|--------|---------|--------|
| `UUIDv7.Boxed` (struct) | 1.04 M | 0.96 μs | 272 B  |
| `UUIDv7` (hex string)   | 0.73 M | 1.37 μs | 358 B  |

1.43× faster and 24% less memory per insert.

```elixir
# Constructing
uuid = UUID.new("018e90d8-06e8-7f9f-bfd7-6730ba98a51b")
UUID.new(<<1, 142, 144, 216, 6, 232, 127, 159, 191, 215, 103, 48, 186, 152, 165, 27>>)

# Inspecting renders pasteable code — no import needed
inspect(uuid)
# => "UUID.new(\"018e90d8-06e8-7f9f-bfd7-6730ba98a51b\")"

# String/JSON encoding lazily formats to hex
to_string(uuid)
# => "018e90d8-06e8-7f9f-bfd7-6730ba98a51b"

# Bulk-generate boxed values
UUIDv7.Boxed.generate_many(1000)
# => [%UUID{}, %UUID{}, ...]
```

The `UUID` struct is version-agnostic — any 16-byte UUID can live
inside it. `UUIDv7.Boxed` is the v7 autogen integration.
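To make the boxed idea concrete, here is a minimal sketch of a struct that holds raw bytes and formats hex only on request. `BoxedSketch` is a hypothetical stand-in, not the library's `UUID` module; the `dump/1` function merely mirrors the `{:ok, value}` return shape of an `Ecto.Type` dump callback.

```elixir
defmodule BoxedSketch do
  # Hypothetical stand-in for the library's `UUID` struct; not its real code.
  defstruct [:bytes]

  # Wrap an already-raw 16-byte value without touching its contents.
  def new(<<_::128>> = bytes), do: %__MODULE__{bytes: bytes}

  # Shaped like an Ecto.Type dump/1 callback: the raw bytes pass straight
  # through, with no hex decoding step.
  def dump(%__MODULE__{bytes: bytes}), do: {:ok, bytes}

  # Hex formatting happens only when a caller asks for text.
  def to_hex(%__MODULE__{bytes: bytes}) do
    <<a::binary-size(4), b::binary-size(2), c::binary-size(2), d::binary-size(2),
      e::binary-size(6)>> = bytes

    Enum.map_join([a, b, c, d, e], "-", &Base.encode16(&1, case: :lower))
  end
end
```

With this shape, the write path never formats or parses hex; only display code calls `to_hex/1`.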

### Date / DateTime <-> UUID

`min_uuid/1` returns a UUID with the given timestamp and all random bits
set to zero — useful for half-closed time-range queries on a UUIDv7
primary key. Accepts both `Date` (midnight UTC) and `DateTime`:

```elixir
# WHERE uuid >= ^UUIDv7.min_uuid(~D[2024-01-01])
#   AND uuid <  ^UUIDv7.min_uuid(~D[2024-02-01])

UUIDv7.min_uuid(~D[2024-01-01])
# => "018cc251-f400-7000-8000-000000000000"

UUIDv7.min_uuid(~U[2024-01-01 12:30:00Z])
# => "018cc500-9940-7000-8000-000000000000"
```

Ranges are continuous with no gaps or overlaps — ideal for partition
pruning when the table is range-partitioned on a UUIDv7 primary key.
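A lower-bound value of this kind can be sketched directly from the bit layout. This is an assumption-labeled illustration rather than the library's code: `MinUuidSketch.lower_bound/1` is a hypothetical name, and it returns the raw 16-byte binary instead of a hex string.

```elixir
defmodule MinUuidSketch do
  # Hypothetical helper: a Date is treated as midnight UTC of that day.
  def lower_bound(%Date{} = date) do
    date
    |> DateTime.new!(~T[00:00:00], "Etc/UTC")
    |> lower_bound()
  end

  def lower_bound(%DateTime{} = dt) do
    ms = DateTime.to_unix(dt, :millisecond)

    # 48-bit timestamp, version 7, zeroed sub-ms field, variant 0b10,
    # zeroed rand_b. Every real UUIDv7 from this millisecond sorts at or
    # above this value.
    <<ms::48, 7::4, 0::12, 2::2, 0::62>>
  end
end
```

Since the timestamp occupies the leading bytes, the bound for a later date always compares greater than the bound for an earlier one, which is what keeps adjacent ranges free of gaps and overlaps.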

The inverse extractors return UTC values:

```elixir
UUIDv7.to_datetime("018ecb40-c457-73e6-a400-000398daddd9")
# => ~U[2024-04-11 03:43:23.223Z]

UUIDv7.to_date("018ecb40-c457-73e6-a400-000398daddd9")
# => ~D[2024-04-11]
```

Both accept the hex string form or the raw 16-byte binary.
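The extraction itself is a short pattern match on the first 48 bits. The sketch below is a hedged illustration, with `ExtractSketch.to_datetime/1` as a hypothetical name; it handles both input forms by decoding the hex string to bytes first.

```elixir
defmodule ExtractSketch do
  # Raw 16-byte form: the first 48 bits are the Unix timestamp in milliseconds.
  def to_datetime(<<ms::48, 7::4, _sub_ms::12, 2::2, _rand_b::62>>) do
    DateTime.from_unix!(ms, :millisecond)
  end

  # 36-character hex form: strip the dashes, decode to bytes, then reuse
  # the clause above.
  def to_datetime(<<_::binary-size(36)>> = hex) do
    hex
    |> String.replace("-", "")
    |> Base.decode16!(case: :mixed)
    |> to_datetime()
  end
end

ExtractSketch.to_datetime("018ecb40-c457-73e6-a400-000398daddd9")
# => ~U[2024-04-11 03:43:23.223Z]
```

The sub-millisecond field is ignored here, so the result has millisecond precision, matching the example output above.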

### Utility functions

```elixir
# Extract the millisecond timestamp from a UUID
UUIDv7.extract_timestamp(uuid)
# => 1712807003223

# Encode/decode between hex string and 16-byte binary
raw = UUIDv7.decode("018e90d8-06e8-7f9f-bfd7-6730ba98a51b")
hex = UUIDv7.encode(raw)
```

## Installation

```elixir
def deps do
  [
    {:uuuidv7, "~> 0.3.0"}
  ]
end
```

Optional dependencies — all auto-detected:

  * `ecto` — required for `UUIDv7` and `UUIDv7.Boxed` Ecto type integration
  * `jason` — `UUID` gets a `Jason.Encoder` impl when Jason is present
    (otherwise the Elixir 1.18+ built-in `JSON` module's
    `JSON.Encoder` impl is used)

## Running benchmarks

```bash
mix run bench/bench.exs
```