# Persistence
ExDataSketch provides five persistence backends for storing and recovering
sketch state. All backends serialize sketches using the EXSK v2 binary format,
whose integrity is protected by a CRC32C checksum.
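
To make the integrity check concrete, the following is a minimal sketch of CRC32C (Castagnoli) in plain Elixir. It is illustrative only: the module name `Crc32cSketch` and the trailing-checksum framing in `frame/1` and `verify/1` are assumptions made for this example and do not describe the actual EXSK v2 layout. Note that `:erlang.crc32/1` computes the IEEE CRC-32, which uses a different polynomial.

```elixir
defmodule Crc32cSketch do
  import Bitwise

  # Reflected form of the Castagnoli polynomial.
  @poly 0x82F63B78

  @doc "Bitwise CRC32C of a binary (slow, but easy to follow)."
  def checksum(data) when is_binary(data) do
    crc =
      for <<byte <- data>>, reduce: 0xFFFFFFFF do
        acc -> shift(bxor(acc, byte), 8)
      end

    bxor(crc, 0xFFFFFFFF)
  end

  # Hypothetical framing: payload followed by a 4-byte big-endian checksum.
  def frame(payload) when is_binary(payload) do
    <<payload::binary, checksum(payload)::32>>
  end

  # Splits off the trailing checksum and compares it with a fresh one.
  def verify(framed) when byte_size(framed) >= 4 do
    payload_size = byte_size(framed) - 4
    <<payload::binary-size(payload_size), stored::32>> = framed

    if checksum(payload) == stored do
      {:ok, payload}
    else
      {:error, :checksum_mismatch}
    end
  end

  # Process one bit per step, eight steps per input byte.
  defp shift(crc, 0), do: crc
  defp shift(crc, n) when band(crc, 1) == 1, do: shift(bxor(bsr(crc, 1), @poly), n - 1)
  defp shift(crc, n), do: shift(bsr(crc, 1), n - 1)
end
```

`Crc32cSketch.checksum("123456789")` yields the standard CRC32C check value `0xE3069283`, which is a convenient way to confirm the polynomial and bit order.
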
## Supported Backends
| Backend | Module                        | Distribution | Durability       | Transactional  |
|---------|-------------------------------|--------------|------------------|----------------|
| ETS     | `ExDataSketch.Storage.ETS`    | Per-node     | Process lifetime | No (RMW)       |
| DETS    | `ExDataSketch.Storage.DETS`   | Per-node     | Disk             | No (file lock) |
| CubDB   | `ExDataSketch.Storage.CubDB`  | Per-node     | Disk             | Yes (MVCC)     |
| Mnesia  | `ExDataSketch.Storage.Mnesia` | Multi-node   | Disk+RAM         | Yes (ACID)     |
| Ecto    | `ExDataSketch.Storage.Ecto`   | Multi-node   | Database         | Yes (DB)       |
> **Note:** ETS `merge/3` uses a read-modify-write cycle without table-level
> locking, so concurrent writers may overwrite each other. For atomic merge
> guarantees, use Mnesia or Ecto. DETS provides file-lock serialization for
> single-node atomicity only.
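
One common way to avoid lost updates on ETS is to route every merge for a table through a single process. The sketch below is self-contained and uses plain `:ets` with a caller-supplied merge function; `SerializedMerger` is a hypothetical module and not part of ExDataSketch. In a real application, the body of `handle_call/3` could delegate to `ExDataSketch.Storage.ETS.merge/3` instead.

```elixir
defmodule SerializedMerger do
  use GenServer

  def start_link(table), do: GenServer.start_link(__MODULE__, table)

  # Every merge goes through one process, so read-modify-write
  # cycles on the table never interleave.
  def merge(server, key, value, merge_fun) do
    GenServer.call(server, {:merge, key, value, merge_fun})
  end

  @impl true
  def init(table) do
    # :protected lets other processes read while only this process writes.
    ^table = :ets.new(table, [:set, :protected, :named_table])
    {:ok, table}
  end

  @impl true
  def handle_call({:merge, key, value, merge_fun}, _from, table) do
    merged =
      case :ets.lookup(table, key) do
        [{^key, existing}] -> merge_fun.(existing, value)
        [] -> value
      end

    true = :ets.insert(table, {key, merged})
    {:reply, :ok, table}
  end
end

# Usage: 100 concurrent writers, each contributing one element.
{:ok, merger} = SerializedMerger.start_link(:merge_demo)

1..100
|> Task.async_stream(fn i ->
  SerializedMerger.merge(merger, "k", MapSet.new([i]), &MapSet.union/2)
end)
|> Stream.run()

[{"k", merged_set}] = :ets.lookup(:merge_demo, "k")
```

The trade-off is throughput: one process becomes the write bottleneck for the table, which is often acceptable when merges are infrequent relative to reads.
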
## Unified API
Every backend implements the same operations:
```elixir
# Save a sketch under a key
:ok = Backend.save(sketch, storage, key)
# Load a sketch by key and module
{:ok, sketch} = Backend.load(SketchModule, storage, key)
# Merge into persisted sketch
:ok = Backend.merge(sketch, storage, key)
# Delete a sketch by key
:ok = Backend.delete(storage, key)
```
The `storage` argument varies by backend:
- ETS/DETS: table name (atom)
- CubDB: CubDB pid or name
- Mnesia: table name (atom)
- Ecto: Ecto repo module
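
Because the four calls share one shape, application code can take the backend module as a parameter. The following is a minimal sketch under that assumption. `SketchStore` and `StubBackend` are hypothetical names, and the stub stores plain terms in ETS so that the example runs without ExDataSketch. The `{:error, :not_found}` return is the stub's own choice and is not taken from the library.

```elixir
defmodule SketchStore do
  # `backend` is any module exposing save/3 and load/3 with the
  # argument order shown above.
  def round_trip(backend, sketch, sketch_module, storage, key) do
    with :ok <- backend.save(sketch, storage, key) do
      backend.load(sketch_module, storage, key)
    end
  end
end

defmodule StubBackend do
  # Stand-in backend for illustration only; it skips serialization.
  def save(sketch, table, key) do
    true = :ets.insert(table, {key, sketch})
    :ok
  end

  def load(_sketch_module, table, key) do
    case :ets.lookup(table, key) do
      [{^key, sketch}] -> {:ok, sketch}
      [] -> {:error, :not_found}
    end
  end
end

:ets.new(:stub_sketches, [:set, :public, :named_table])
result = SketchStore.round_trip(StubBackend, %{registers: [1, 2, 3]}, :ignored, :stub_sketches, "k")
```

Swapping `StubBackend` for one of the real backend modules, together with the matching `storage` value from the list above, should leave the calling code unchanged.
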
## ETS
ETS provides fast in-memory storage. It is always available (no extra
dependencies required).
```elixir
# Create the table (application concern)
:ets.new(:sketches, [:set, :public, :named_table])
# Save
:ok = ExDataSketch.Storage.ETS.save(sketch, :sketches, "cardinality:2024-01")
# Load
{:ok, loaded} = ExDataSketch.Storage.ETS.load(ExDataSketch.HLL, :sketches, "cardinality:2024-01")
# Merge (read-modify-write, not truly atomic under concurrency)
:ok = ExDataSketch.Storage.ETS.merge(partial, :sketches, "cardinality:2024-01")
# Delete
:ok = ExDataSketch.Storage.ETS.delete(:sketches, "cardinality:2024-01")
```
ETS tables must be `:set` or `:ordered_set` type.
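
A table's type can be checked up front with `:ets.info/2`, which is part of OTP. The helper below is a hypothetical guard for application code, not a function provided by ExDataSketch.

```elixir
defmodule TableCheck do
  # Returns :ok for :set and :ordered_set tables, an error tuple otherwise.
  def ensure_supported(table) do
    case :ets.info(table, :type) do
      type when type in [:set, :ordered_set] -> :ok
      :undefined -> {:error, :no_such_table}
      other -> {:error, {:unsupported_table_type, other}}
    end
  end
end

:ets.new(:check_set, [:set, :public, :named_table])
:ets.new(:check_bag, [:bag, :public, :named_table])

set_result = TableCheck.ensure_supported(:check_set)
bag_result = TableCheck.ensure_supported(:check_bag)
```
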
## DETS
DETS provides disk-backed storage that survives process and node restarts.
```elixir
# Open the table (application concern)
{:ok, _} = :dets.open_file(:sketches, [type: :set])
# Save, load, merge, delete -- same API as ETS
:ok = ExDataSketch.Storage.DETS.save(sketch, :sketches, "cardinality:2024-01")
{:ok, loaded} = ExDataSketch.Storage.DETS.load(ExDataSketch.HLL, :sketches, "cardinality:2024-01")
:ok = ExDataSketch.Storage.DETS.merge(partial, :sketches, "cardinality:2024-01")
:ok = ExDataSketch.Storage.DETS.delete(:sketches, "cardinality:2024-01")
# Close when done
:ok = :dets.close(:sketches)
```
DETS tables must be `:set` type. `:bag` and `:duplicate_bag` tables are not
supported, and DETS itself has no `:ordered_set` type.
DETS files cannot exceed 2 GB.
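
Since the size ceiling is easy to hit with many large sketches, it can help to monitor the file. The sketch below uses `:dets.info/2` from OTP; the module name `DetsHeadroom`, the table name, and the temporary file path are all made up for the example.

```elixir
defmodule DetsHeadroom do
  @max_bytes 2 * 1024 * 1024 * 1024

  # Returns the number of bytes left before the 2 GB ceiling.
  def remaining(table) do
    case :dets.info(table, :file_size) do
      :undefined -> {:error, :not_open}
      size when is_integer(size) -> {:ok, @max_bytes - size}
    end
  end
end

# Open a throwaway table in the system temp directory.
path =
  Path.join(System.tmp_dir!(), "headroom_demo_#{System.unique_integer([:positive])}.dets")

{:ok, :headroom_demo} =
  :dets.open_file(:headroom_demo, file: String.to_charlist(path), type: :set)

table_type = :dets.info(:headroom_demo, :type)
{:ok, bytes_left} = DetsHeadroom.remaining(:headroom_demo)

:ok = :dets.close(:headroom_demo)
File.rm!(path)
```
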
## CubDB
CubDB provides disk-backed key-value storage with MVCC transactions. It
requires the `:cubdb` dependency.
Dependencies:
```elixir
{:cubdb, "~> 2.0"}
```
```elixir
# Start CubDB (application concern)
{:ok, db} = CubDB.start_link(data_dir: "/path/to/data")
# Save
:ok = ExDataSketch.Storage.CubDB.save(sketch, db, "cardinality:2024-01")
# Load
{:ok, loaded} = ExDataSketch.Storage.CubDB.load(ExDataSketch.HLL, db, "cardinality:2024-01")
# Atomic merge (uses CubDB transaction)
:ok = ExDataSketch.Storage.CubDB.merge(partial, db, "cardinality:2024-01")
# Delete
:ok = ExDataSketch.Storage.CubDB.delete(db, "cardinality:2024-01")
```
## Mnesia
Mnesia provides distributed, transactional storage across BEAM cluster nodes.
It is always available (no extra dependencies required).
```elixir
# Set up the table (once per node)
:ok = ExDataSketch.Storage.Mnesia.setup(:sketches)
# Or with disc copies:
:ok = ExDataSketch.Storage.Mnesia.setup(:sketches, disc_copies: [node()])
# Save
:ok = ExDataSketch.Storage.Mnesia.save(sketch, :sketches, "cardinality:2024-01")
# Load
{:ok, loaded} = ExDataSketch.Storage.Mnesia.load(ExDataSketch.HLL, :sketches, "cardinality:2024-01")
# Atomic merge (uses Mnesia transaction)
:ok = ExDataSketch.Storage.Mnesia.merge(partial, :sketches, "cardinality:2024-01")
# Delete
:ok = ExDataSketch.Storage.Mnesia.delete(:sketches, "cardinality:2024-01")
```
### Distributed Mnesia
For multi-node setups, create the table on all nodes before use:
```elixir
:ok = ExDataSketch.Storage.Mnesia.setup(:sketches, disc_copies: [node(), :other@host])
```
Mnesia transactions ensure atomic merge across all replicas. For operational
concerns including network partition recovery, refer to the Mnesia
documentation.
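
To illustrate what a transactional merge looks like at the Mnesia level, here is a minimal sketch using only OTP's `:mnesia` module. It is not ExDataSketch's implementation: `MnesiaMergeDemo`, the `:demo_sketches` table, and the caller-supplied `merge_fun` are assumptions for the example, and the table is RAM-only so the snippet runs on a single unnamed node.

```elixir
defmodule MnesiaMergeDemo do
  @table :demo_sketches

  def setup do
    :ok = :mnesia.start()

    case :mnesia.create_table(@table, attributes: [:key, :value], ram_copies: [node()]) do
      {:atomic, :ok} -> :ok
      {:aborted, {:already_exists, @table}} -> :ok
    end
  end

  # Read with a write lock, combine, and write back in one transaction.
  # Mnesia retries or aborts the whole function on conflict.
  def merge(key, incoming, merge_fun) do
    :mnesia.transaction(fn ->
      merged =
        case :mnesia.read(@table, key, :write) do
          [{@table, ^key, existing}] -> merge_fun.(existing, incoming)
          [] -> incoming
        end

      :mnesia.write({@table, key, merged})
    end)
  end

  def fetch(key) do
    :mnesia.transaction(fn -> :mnesia.read(@table, key) end)
  end
end

:ok = MnesiaMergeDemo.setup()
{:atomic, :ok} = MnesiaMergeDemo.merge("k", MapSet.new([1]), &MapSet.union/2)
{:atomic, :ok} = MnesiaMergeDemo.merge("k", MapSet.new([2]), &MapSet.union/2)
{:atomic, [{:demo_sketches, "k", merged}]} = MnesiaMergeDemo.fetch("k")
```

In a Mix project, `:mnesia` needs to be listed under `extra_applications` for this to start cleanly.
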
## Ecto
The Ecto backend stores sketches in a SQL database. It requires `:ecto_sql`.
Dependencies:
```elixir
{:ecto_sql, "~> 3.0"}
```
### Setup
Generate and run the migration:
```bash
mix ex_data_sketch.gen.migration --repo MyApp.Repo
mix ecto.migrate
```
Or add the migration manually:
```elixir
defmodule MyApp.Repo.Migrations.AddExDataSketchSketches do
  use Ecto.Migration

  def up do
    ExDataSketch.Storage.Ecto.Migration.up()
  end

  def down do
    ExDataSketch.Storage.Ecto.Migration.down()
  end
end
```
### Usage
```elixir
# Save
:ok = ExDataSketch.Storage.Ecto.save(sketch, MyApp.Repo, "cardinality:2024-01")
# Load
{:ok, loaded} = ExDataSketch.Storage.Ecto.load(ExDataSketch.HLL, MyApp.Repo, "cardinality:2024-01")
# Atomic merge (uses Ecto transaction with SELECT FOR UPDATE)
:ok = ExDataSketch.Storage.Ecto.merge(partial, MyApp.Repo, "cardinality:2024-01")
# Delete
:ok = ExDataSketch.Storage.Ecto.delete(MyApp.Repo, "cardinality:2024-01")
```
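The Ecto backend wraps each merge in a transaction and reads the existing row
with `SELECT ... FOR UPDATE`, so concurrent merges into the same stored row are
serialized by the database's row lock instead of overwriting each other.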
## Choosing a Backend
| Use Case | Recommended Backend |
|----------|---------------------|
| Fast in-memory cache | ETS |
| Survive process restarts | DETS or CubDB |
| Simple disk persistence | CubDB |
| Distributed cluster | Mnesia |
| Existing Ecto app | Ecto |
| SQL database required | Ecto |
| Need ACID across nodes | Mnesia or Ecto |
## Configuration
Backends can be enabled or disabled via application config:
```elixir
config :ex_data_sketch,
  persistence_backends: [
    ets: [enabled: true],
    dets: [enabled: true],
    cubdb: [enabled: true],
    mnesia: [enabled: true],
    ecto: [enabled: true]
  ]
```
When not explicitly configured, a backend defaults to enabled if its runtime
dependency is available. Set `enabled: false` to disable a backend regardless
of dependency availability.
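
The defaulting rule described above could be modeled roughly as follows. This is a hypothetical sketch, not the library's actual resolution logic: `BackendConfigSketch` and the probe modules chosen for each backend are assumptions made for the example.

```elixir
defmodule BackendConfigSketch do
  # Assumed mapping from backend key to a module whose presence
  # signals that the runtime dependency is available.
  @probe_modules [
    ets: :ets,
    dets: :dets,
    cubdb: CubDB,
    mnesia: :mnesia,
    ecto: Ecto.Adapters.SQL
  ]

  # An explicit `enabled:` setting wins; otherwise fall back to
  # whether the probe module can be loaded.
  def enabled?(backend) do
    default = Code.ensure_loaded?(Keyword.fetch!(@probe_modules, backend))

    :ex_data_sketch
    |> Application.get_env(:persistence_backends, [])
    |> Keyword.get(backend, [])
    |> Keyword.get(:enabled, default)
  end
end

# With no explicit configuration, ETS resolves to enabled because
# the :ets module ships with OTP.
ets_default = BackendConfigSketch.enabled?(:ets)

# An explicit `enabled: false` overrides availability.
Application.put_env(:ex_data_sketch, :persistence_backends, ets: [enabled: false])
ets_disabled = BackendConfigSketch.enabled?(:ets)
Application.delete_env(:ex_data_sketch, :persistence_backends)
```
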
## See Also
- [Streaming Sketches](streaming_sketches.md)
- [Integration Guide](integrations.md)