# FeistelCipher

Encrypted integer IDs using a Feistel cipher.

> **Database Support**: PostgreSQL only (uses PostgreSQL triggers and functions)

## Why?

**Problem**: Sequential IDs (1, 2, 3...) leak business information:
- Competitors can estimate your growth rate
- Users can enumerate resources (`/posts/1`, `/posts/2`...)
- Total record counts are exposed

**Common Solutions & Issues**:
- **UUIDs**: Strong uniqueness, but values differ across seed runs and are often too long for URLs (`/posts/550e8400-e29b-41d4-a716-446655440000`)
- **Random integers**: Shorter than UUIDs, but introduce collision risk and extra generation complexity

**This Library's Approach**:
- Store sequential integers internally
- Expose encrypted integers externally (non-sequential, unpredictable)
- Deterministic cipher core: the same `seq` value always maps to the same encrypted data component
- Automatic encryption via database trigger
- Adjustable bit size per column
- **Time-based prefix** for PostgreSQL incremental backup optimization

> If you need fully stable IDs across seed runs/environments, use `time_bits: 0` so IDs are generated from the ciphered data component only.

## Installation

> **Using Ash Framework?**
>
> If you're using [Ash Framework](https://ash-hq.org/), use [`ash_feistel_cipher`](https://github.com/devall-org/ash_feistel_cipher) instead! It provides a declarative DSL to configure Feistel cipher encryption directly in your Ash resources.
>
> For plain Ecto users, continue below.

### Using igniter (Recommended)

```bash
mix igniter.install feistel_cipher
```

### Manual Installation

```elixir
# mix.exs
def deps do
  [{:feistel_cipher, "~> 1.0"}]
end
```

Then run:
```bash
mix deps.get
mix feistel_cipher.install
```

> ⚠️ `mix feistel_cipher.install` is provided by Igniter. If your project does not use Igniter, create a migration manually and call `FeistelCipher.up_v1_functions/1` in `up` and `FeistelCipher.down_v1_functions/1` in `down`.

### Installation Options

Both methods support the following options:

* `--repo` or `-r`: Specify an Ecto repo (optional if auto-detection finds one)
* `--functions-prefix` or `-p`: PostgreSQL schema prefix (default: `public`)
* `--functions-salt` or `-s`: Cipher salt constant, max 2^31-1 (default: randomly generated)

> ⚠️ **Security Note**: A cryptographically random salt is generated by default for each project. This ensures that encryption patterns cannot be analyzed across different projects. Never use the same salt across multiple production projects.

> **Fun Fact**: Notice the timestamp `19730501000000` in the migration file generated during installation? That's May 1, 1973 - the day [Horst Feistel published his groundbreaking paper](https://en.wikipedia.org/wiki/Feistel_cipher#History) at IBM, introducing the cipher structure that powers this library. We thought it deserved a permanent timestamp in your database history! 🎂


## Upgrading from v0.x

See [UPGRADE.md](UPGRADE.md) for the migration guide.

## Usage Example

### 1. Create Migration

```elixir
defmodule MyApp.Repo.Migrations.CreatePosts do
  use Ecto.Migration

  def up do
    create table(:posts) do
      add :seq, :bigserial
      add :title, :string
    end

    # 1 day buckets
    execute FeistelCipher.up_for_v1_trigger("public", "posts", "seq", "id",
      time_bucket: 86400
    )
  end

  def down do
    execute FeistelCipher.down_for_v1_trigger("public", "posts", "seq", "id")
    drop table(:posts)
  end
end
```

### 2. Define Schema

```elixir
defmodule MyApp.Post do
  use Ecto.Schema

  # Hide seq in API responses
  @derive {Jason.Encoder, except: [:seq]}

  schema "posts" do
    field :seq, :id, read_after_writes: true
    field :title, :string
  end
end
```

The `read_after_writes: true` option tells Ecto to fetch the `seq` value after INSERT (since it's generated by the database).

Now when you insert a record, `seq` auto-increments and the trigger automatically sets `id = [time_prefix | feistel_cipher_v1(seq)]`:

```elixir
%Post{title: "Hello"} |> Repo.insert!()
# => %Post{id: 8234567890123, seq: 1, title: "Hello"}

# In API responses, only id is exposed (seq is hidden)
```

**Security Note**: Keep `seq` internal. Only expose `id` in APIs to prevent enumeration attacks.

## ID Structure

The generated ID has the structure `[time_bits | data_bits]`:

```
┌─────────────────┬──────────────────────────────────────────┐
│   time_bits     │              data_bits                   │
│   (15 bits)     │              (38 bits)                   │
│   time prefix   │     feistel_cipher_v1(seq)               │
└─────────────────┴──────────────────────────────────────────┘
```

- **time_bits** (upper): Derived from current time. Rows created in the same time bucket share the same prefix, clustering them on nearby PostgreSQL pages.
- **data_bits** (lower): The sequential value encrypted with Feistel cipher.
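
As a rough illustration of this layout, the sketch below packs and unpacks a time prefix and a data component with plain bit operations. `IdLayout`, `compose/3`, and `split/2` are hypothetical names used only for this example; they are not part of the library, where the packing happens inside the PostgreSQL trigger.

```elixir
defmodule IdLayout do
  import Bitwise

  # Hypothetical helper: place the time prefix in the upper bits and the
  # data component in the lower `data_bits` bits.
  def compose(time_prefix, data_component, data_bits) do
    (time_prefix <<< data_bits) ||| data_component
  end

  # Hypothetical helper: recover {time_prefix, data_component} from an ID.
  def split(id, data_bits) do
    mask = (1 <<< data_bits) - 1
    {id >>> data_bits, id &&& mask}
  end
end

id = IdLayout.compose(5, 123, 38)
IO.inspect(IdLayout.split(id, 38))
```

With the default `data_bits: 38`, every ID created in the same time bucket shares the same upper bits, so those IDs fall in one contiguous numeric range.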

### Why Time Prefix?

PostgreSQL incremental backups (e.g., pg_basebackup with WAL, pgBackRest) back up entire **pages** (8KB blocks). Without a time prefix, Feistel cipher distributes IDs uniformly across all pages — meaning each new row touches a different page, and incremental backups become as large as full backups.

With a time prefix, rows from the same time bucket land on nearby pages, so incremental backups only need to capture the recently-modified pages.


### When to Use Time Prefix (`time_bits > 0`)

Use a time prefix when you want write locality and smaller incremental backups on large/high-write tables.

- Example: `events`, `logs`, `orders`, `messages` tables that receive continuous inserts.
- Typical config: `time_bits: 15`, `time_bucket: 86400` (daily, default) or `3600` (hourly for tighter locality windows).
- With `time_bits: 15`, `time_bucket: 86400`, and `encrypt_time: false`, the time prefix wraps after about 89 years 9 months.
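
The wrap-around figure follows from the prefix repeating every `time_bucket * 2^time_bits` seconds. A minimal sketch of that arithmetic, assuming a year of 365.25 days (`WrapPeriod` is a hypothetical module name, not a library API):

```elixir
defmodule WrapPeriod do
  # Seconds until the unencrypted time prefix repeats.
  def seconds(time_bits, time_bucket) do
    time_bucket * Integer.pow(2, time_bits)
  end

  # Same period in years, assuming 365.25-day years.
  def years(time_bits, time_bucket) do
    seconds(time_bits, time_bucket) / (365.25 * 86_400)
  end
end

# Default daily buckets with 15 time bits
IO.inspect(Float.round(WrapPeriod.years(15, 86_400), 1))
# Hourly buckets wrap 24 times sooner with the same number of bits
IO.inspect(Float.round(WrapPeriod.years(15, 3_600), 1))
```

Shrinking `time_bucket` shortens the wrap period proportionally, so hourly buckets need more `time_bits` to cover the same time span.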

### When NOT to Use Time Prefix (`time_bits: 0`)

Disable time prefix when you only need opaque IDs and don't need backup/page-locality optimization.

- Example: small reference tables (`countries`, `roles`, `currencies`) or low-write admin/config tables.
- Also useful when you want the simplest mode: `id = feistel_cipher_v1(seq)` with no time component.

## Trigger Options

`up_for_v1_trigger/5` takes 4 positional arguments and an options keyword list:

- Positional arguments: `prefix`, `table`, `from`, `to`
- Options:

> ⚠️ **Important**: Handle parameter changes as explicit migrations. Some options (`time_bits`, `time_bucket`, `encrypt_time`) can technically be changed, but IDs generated before and after the change will follow different semantics. Treat the core cipher options (`data_bits`, `key`, `rounds`) as immutable for an existing trigger.

- `time_bits`: Time prefix bits (default: 15). Set to 0 for no time prefix
- `time_bucket`: Time bucket size in seconds (default: `86400`)
  - Example: `86400` for 1 day (default), `3600` for 1 hour
  - Rows inserted within the same bucket share the same time prefix
- `time_offset`: Time offset in seconds applied before bucket calculation (default: `0`)
  - Formula: `time_value = floor((epoch + time_offset) / time_bucket)`
  - Sign convention: positive values move the boundary earlier in local time; negative values move it later
  - Example: `time_bucket: 86400`, `time_offset: 21600` shifts daily boundary from `00:00 UTC` to `18:00 UTC` (`03:00 KST`)
  - Use this when business day boundaries differ from UTC midnight, or when multiple countries need a stable operational cutover time
- `encrypt_time`: Whether to encrypt the time prefix with Feistel cipher (default: `false`)
  - `false`: Time prefix may reflect recent bucket progression, but it is **not** a globally orderable timestamp
  - `true`: Time prefix is encrypted (hides time patterns, but same-bucket rows still share prefix). `time_bits` must be even
- `data_bits`: Data cipher bits (default: 38, must be even)
  - **Choose different sizes per column**: Unlike UUIDs (fixed 16 bytes), tailor each column's ID length
  - Example: User ID = 32 bits (~4B values), Post ID = 40 bits (~1T values)
  - Input values in `from` must fit this range (`0..2^data_bits-1`), or INSERT/UPDATE fails with a database error
- `rounds`: Number of Feistel rounds (default: 16, min: 1, max: 32)
  - **Default 16** provides good security/performance balance
  - **Note**: Diagrams and proofs in this README use 2 rounds for simplicity
  - More rounds = more secure but slower
  - Odd rounds (1, 3, 5...) and even rounds (2, 4, 6...) are both supported
- `key`: Encryption key (auto-generated if not specified)
- `functions_prefix`: Schema where cipher functions reside (default: `public`)
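
To get a feel for how `data_bits` bounds the usable range of the `from` column, the sketch below computes the capacity for a given bit size and checks whether a value fits. `DataBits` is a hypothetical helper for illustration only; the library itself enforces the range in the database.

```elixir
defmodule DataBits do
  # Number of distinct values a data component of `data_bits` bits can hold.
  def capacity(data_bits), do: Integer.pow(2, data_bits)

  # True when `value` lies in 0..2^data_bits - 1, the range required of `from`.
  def fits?(value, data_bits) when is_integer(value) do
    value >= 0 and value < capacity(data_bits)
  end
end

IO.inspect(DataBits.capacity(32))   # about 4 billion
IO.inspect(DataBits.capacity(40))   # about 1 trillion
IO.inspect(DataBits.fits?(4_294_967_296, 32))
```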

**Constraints**:
- `time_bits + data_bits` must be ≤ 63 when `encrypt_time: false`, and ≤ 62 when `encrypt_time: true`
- `time_bits` must be even when `encrypt_time: true`
- `data_bits` must be even
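
These constraints can be checked before writing a migration. The following is a minimal sketch, assuming the defaults listed above (`time_bits: 15`, `data_bits: 38`, `encrypt_time: false`); `OptionCheck.validate/1` and its error atoms are hypothetical and not provided by the library.

```elixir
defmodule OptionCheck do
  # Hypothetical pre-flight check mirroring the constraints listed above.
  def validate(opts) do
    time_bits = Keyword.get(opts, :time_bits, 15)
    data_bits = Keyword.get(opts, :data_bits, 38)
    encrypt_time = Keyword.get(opts, :encrypt_time, false)
    max_total = if encrypt_time, do: 62, else: 63

    cond do
      rem(data_bits, 2) != 0 -> {:error, :data_bits_must_be_even}
      encrypt_time and rem(time_bits, 2) != 0 -> {:error, :time_bits_must_be_even}
      time_bits + data_bits > max_total -> {:error, :too_many_bits}
      true -> :ok
    end
  end
end

IO.inspect(OptionCheck.validate([]))
IO.inspect(OptionCheck.validate(encrypt_time: true))
```

Under these assumptions, enabling `encrypt_time: true` while keeping the default `time_bits: 15` fails the evenness check, so an even value such as 14 or 16 would be needed.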

> ⚠️ You cannot reliably compare IDs by `time_bits` alone to determine temporal order. Because the stored prefix is `floor((epoch + time_offset) / time_bucket) mod 2^time_bits`, it wraps after `time_bucket * 2^time_bits` seconds. This feature is intended to improve PostgreSQL incremental backup locality, not to provide UUIDv7-style global time ordering.

### Why `time_offset` Exists

`time_bucket` alone uses UTC-based boundaries. For daily buckets, that means bucket changes at UTC midnight, which may split a local business day at awkward local times (for example, evening in the Americas or early morning in Europe).

`time_offset` lets you align bucket boundaries to your operational day (for example, 03:00 local cutover) without changing `time_bucket` size. This improves practical continuity for time-prefix clustering, especially when `encrypt_time: true` is enabled and the prefix itself is not human-readable.

In this library, `time_offset` is added to epoch before bucketing. That is why `+21600` (not `-21600`) gives a 03:00 KST boundary for daily buckets.
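
The sign convention can be checked with a few lines of arithmetic. This sketch reimplements only the bucket formula `floor((epoch + time_offset) / time_bucket)` in Elixir; `TimeBucket.index/3` is a hypothetical name, and the library computes the value in SQL.

```elixir
defmodule TimeBucket do
  # floor((epoch + time_offset) / time_bucket), as in the `time_offset` formula.
  def index(epoch, time_bucket, time_offset \\ 0) do
    Integer.floor_div(epoch + time_offset, time_bucket)
  end
end

before_cutover = DateTime.to_unix(~U[2024-01-01 17:59:59Z])
at_cutover = DateTime.to_unix(~U[2024-01-01 18:00:00Z])

# Without an offset, both instants fall in the same UTC-day bucket.
IO.inspect(TimeBucket.index(before_cutover, 86_400) == TimeBucket.index(at_cutover, 86_400))
# prints true

# With +21600, a new bucket starts at 18:00 UTC (03:00 KST).
IO.inspect(
  TimeBucket.index(at_cutover, 86_400, 21_600) -
    TimeBucket.index(before_cutover, 86_400, 21_600)
)
# prints 1
```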

Example with custom options:
```elixir
execute FeistelCipher.up_for_v1_trigger(
  "public", "posts", "seq", "id",
  time_bits: 8,
  time_bucket: 86400,
  time_offset: 21600,
  data_bits: 32,
  key: 123456789,
  rounds: 8,
  functions_prefix: "crypto"
)
```

Example without time prefix:
```elixir
execute FeistelCipher.up_for_v1_trigger(
  "public", "posts", "seq", "id",
  time_bits: 0
)
```

## Advanced Usage

### Column Rename

When renaming columns that have triggers, drop and recreate the trigger:

```elixir
defmodule MyApp.Repo.Migrations.RenamePostsColumns do
  use Ecto.Migration

  def change do
    # 1. Drop the old trigger
    execute FeistelCipher.down_for_v1_trigger("public", "posts", "seq", "id")

    # 2. Rename columns
    rename table(:posts), :seq, to: :sequence
    rename table(:posts), :id, to: :external_id

    # 3. Recreate trigger with SAME encryption parameters
    # IMPORTANT: Generate key using OLD column names (seq, id)
    old_key = FeistelCipher.generate_key("public", "posts", "seq", "id")

    execute FeistelCipher.up_for_v1_trigger("public", "posts", "sequence", "external_id",
      time_bits: 15,               # Must match original
      time_bucket: 86400,          # Must match original
      data_bits: 38,               # Must match original
      key: old_key,                # Key from OLD column names
      rounds: 16,                  # Must match original
      functions_prefix: "public"   # Must match original
    )
  end
end
```

**⚠️ Critical**: When recreating triggers, ALL encryption parameters (`time_bits`, `time_bucket`, `data_bits`, `key`, `rounds`, `functions_prefix`) MUST match the original values. Otherwise:
- Updates will fail with exceptions
- 1:1 mapping breaks (new inserts may produce duplicate encrypted values)

> **⚠️ Warning**: Dropping a trigger removes encryption for that column pair. Only use this when intentionally removing or recreating the trigger.

## Alternative: Display-Only IDs

If you prefer to keep your sequential `id` as the primary key, you can use Feistel cipher for display-only columns. This approach is similar to using [Hashids](https://hashids.org/) or other ID obfuscation libraries, but with database-native encryption.

```elixir
# Migration
create table(:posts) do
  add :disp_id, :bigint    # Encrypted, for public APIs
  add :title, :string
end

create unique_index(:posts, [:disp_id])

execute FeistelCipher.up_for_v1_trigger("public", "posts", "id", "disp_id",
  time_bucket: 86400
)

# Schema
defmodule MyApp.Post do
  use Ecto.Schema

  # Hide internal id in API responses
  @derive {Jason.Encoder, except: [:id]}

  schema "posts" do
    field :disp_id, :id, read_after_writes: true
    field :title, :string
  end
end
```

Then only expose `disp_id` in your APIs while keeping `id` internal.

**Advantage over Hashids:** The value is generated and stored in the database, so the application needs no encode/decode step.

## Performance

Encrypting 100,000 sequential values:

| Rounds | Total Time | Per Encryption |
|--------|------------|----------------|
| 1      | 180 ms     | ~1.8μs         |
| 2      | 285 ms     | ~2.8μs         |
| 4      | 475 ms     | ~4.7μs         |
| 8      | 824 ms     | ~8.2μs         |
| **16** | **1709 ms** | **~17.1μs**   |
| 32     | 3171 ms    | ~31.7μs        |

**The default is 16 rounds**, which provides a good security/performance balance with cryptographic HMAC-SHA256. The overhead per INSERT/UPDATE is negligible for most applications.

### Benchmark Environment

- **CPU**: Apple M1 Pro (10 cores)
- **Database**: PostgreSQL (local)
- **OS**: macOS
- **Elixir**: 1.19.4 / OTP 28

### Running Benchmarks

```bash
MIX_ENV=test mix run benchmark/rounds_benchmark.exs
```

Prerequisites:
- Local PostgreSQL reachable at the `config/test.exs` settings (`username: postgres`, `password: postgres`, `database: feistel_cipher_test`)
- Database/user created before running the benchmark command

The benchmark encrypts 100,000 sequential values (1 to 100,000) using a SQL batch function to minimize overhead and measure pure encryption performance.

## How It Works

The Feistel cipher is a symmetric structure used in the construction of block ciphers. This library implements a configurable Feistel network that transforms sequential integers into non-sequential encrypted integers with a one-to-one mapping.

<p align="center">
  <img src="assets/feistel-diagram.png" alt="Feistel Cipher Diagram" width="66%">
</p>

> **Note**: The diagram above illustrates a 2-round Feistel cipher for simplicity. By default, this library uses **16 rounds** for better security. The number of rounds is configurable (see [Trigger Options](#trigger-options)).

### Self-Inverse Property

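With the same round function $F$ in every round and the two halves swapped in the final output, as in the proof below, the Feistel cipher is **self-inverse**: applying the same function twice returns the original value. This means encryption and decryption use the exact same algorithm.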

**Mathematical Proof:**

Let's denote the input as $(L_1, R_1)$ and the round function as $F(x)$.

**First application (Encryption):**

$$
\begin{aligned}
L_2 &= R_1, & R_2 &= L_1 \oplus F(R_1) \\
L_3 &= R_2, & R_3 &= L_2 \oplus F(R_2) \\
\text{Output} &= (R_3, L_3)
\end{aligned}
$$

**Second application (Decryption) - Starting with $(R_3, L_3)$:**

$$
\begin{aligned}
L_2' &= L_3, & R_2' &= R_3 \oplus F(L_3) \\
&= L_3, & &= R_3 \oplus F(R_2) \\
&= L_3, & &= (L_2 \oplus F(R_2)) \oplus F(R_2) \\
&= L_3, & &= L_2 = R_1 \quad \text{(XOR cancellation)} \\
\\
L_3' &= R_2' = R_1, & R_3' &= L_2' \oplus F(R_2') \\
&= R_1, & &= L_3 \oplus F(R_1) \\
&= R_1, & &= R_2 \oplus F(R_1) \\
&= R_1, & &= (L_1 \oplus F(R_1)) \oplus F(R_1) \\
&= R_1, & &= L_1 \quad \text{(XOR cancellation)} \\
\\
\text{Output} &= (R_3', L_3') = (L_1, R_1) \quad \checkmark
\end{aligned}
$$

**Key Insight:** The XOR operation's property $a \oplus b \oplus b = a$ ensures that each transformation is reversed when applied twice.
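
The same property can be observed with a toy implementation. The sketch below is not the library's SQL function: `ToyFeistel` is a hypothetical module, and `:erlang.phash2/2` stands in for the real round function purely for illustration. It applies the same round function in every round and swaps the halves in the output, as in the proof.

```elixir
defmodule ToyFeistel do
  import Bitwise

  # Toy Feistel network over `data_bits` bits (even, at most 64).
  def cipher(input, data_bits, key, rounds)
      when rem(data_bits, 2) == 0 and data_bits <= 64 and rounds >= 1 do
    half_bits = div(data_bits, 2)
    mask = (1 <<< half_bits) - 1
    left = (input >>> half_bits) &&& mask
    right = input &&& mask

    # Each round maps (L, R) to (R, L xor F(R)).
    {l, r} =
      Enum.reduce(1..rounds, {left, right}, fn _round, {l, r} ->
        {r, bxor(l, round_fn(r, key, half_bits))}
      end)

    # Output the halves swapped: (R, L).
    (r <<< half_bits) ||| l
  end

  # Stand-in round function (assumption): any deterministic function works.
  defp round_fn(value, key, half_bits) do
    :erlang.phash2({value, key}, 1 <<< half_bits)
  end
end

encrypted = ToyFeistel.cipher(1, 16, 42, 4)
IO.inspect(ToyFeistel.cipher(encrypted, 16, 42, 4))
# prints 1
```

Because applying the function twice returns the input, the mapping is necessarily one-to-one over the whole `0..2^data_bits - 1` range, whatever round function is plugged in.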

**Database Implementation:**

In the database trigger implementation, this means:
```sql
-- Encryption: seq → data part of id
data_component = feistel_cipher_v1(seq, data_bits, key, rounds)

-- Decryption: data part of id → seq (using the same function!)
seq = feistel_cipher_v1(data_component, data_bits, key, rounds)
```

### Key Properties

- **Deterministic**: Same input always produces same output
- **Non-sequential**: Sequential inputs produce seemingly random outputs
- **Collision-free**: One-to-one mapping within the bit range

## License

MIT