README.md

# Goblin

A lightweight, embedded database for Elixir.

## Features

- **ACID transactions** with serial write execution
- **Concurrent readers** that do not block each other
- **Crash recovery** via write-ahead logging
- **Automatic background compaction**
- **Bloom filters** for fast negative lookups
- **Any Elixir term** as key or value

## Usage

Install by adding `:goblin` as a dependency:
```elixir
def deps do
  [
    {:goblin, "~> 0.6.0"}
  ]
end
```
Then run `mix deps.get`.

### Starting a database

```elixir
{:ok, db} = Goblin.start_link(
  name: MyApp.DB,
  data_dir: "/path/to/db"
)
```

Options:
- `:name` - Registered name for the database (optional, defaults to `Goblin`)
- `:data_dir` - Directory path for database files (required)
- `:mem_limit` - Bytes to buffer in memory before flushing to disk (default: 64 MB)
- `:bf_fpp` - Bloom filter false positive probability (default: 0.01)

### Basic operations

```elixir
Goblin.put(db, :alice, "Alice")
# => :ok

Goblin.get(db, :alice)
# => "Alice"

Goblin.get(db, :nonexistent)
# => nil

Goblin.get(db, :nonexistent, default: :not_found)
# => :not_found

Goblin.remove(db, :alice)
# => :ok
```

> #### Note {: .warning}
>
> A key can be any Elixir term, but avoid mixing floats and integers as keys
> (e.g. both `1` and `1.0`). This can cause unexpected reads.

### Batch operations

```elixir
Goblin.put_multi(db, [{:alice, "Alice"}, {:bob, "Bob"}, {:charlie, "Charlie"}])
# => :ok

Goblin.get_multi(db, [:alice, :bob, :charlie])
# => [{:alice, "Alice"}, {:bob, "Bob"}, {:charlie, "Charlie"}]

Goblin.remove_multi(db, [:bob, :charlie])
# => :ok
```

### Range queries

```elixir
Goblin.select(db) |> Enum.to_list()
# => [{:alice, "Alice"}, {:bob, "Bob"}, {:charlie, "Charlie"}]

Goblin.select(db, min: :bob) |> Enum.to_list()
# => [{:bob, "Bob"}, {:charlie, "Charlie"}]

Goblin.select(db, min: :alice, max: :bob) |> Enum.to_list()
# => [{:alice, "Alice"}, {:bob, "Bob"}]
```

### Transactions

There are two types of transactions: **write** and **read**.

Write transactions (`Goblin.transaction/2`) are executed serially.
The function must return `{:commit, tx, reply}` to commit or `:abort` to abort.

```elixir
Goblin.transaction(db, fn tx ->
  counter = Goblin.Tx.get(tx, :counter, default: 0)
  tx = Goblin.Tx.put(tx, :counter, counter + 1)
  {:commit, tx, :ok}
end)
# => :ok

Goblin.transaction(db, fn tx ->
  counter = Goblin.Tx.get(tx, :counter, default: 0)

  if counter < 100 do
    tx = Goblin.Tx.put(tx, :counter, counter + 1)
    {:commit, tx, counter + 1}
  else
    :abort
  end
end)
# => 2 (if committed) or {:error, :aborted} (if aborted)
```

Read transactions (`Goblin.read/2`) take a snapshot and do not block each other.
Attempting to write within a read transaction raises.

```elixir
Goblin.read(db, fn tx ->
  alice = Goblin.Tx.get(tx, :alice)
  bob = Goblin.Tx.get(tx, :bob)
  {alice, bob}
end)
# => {"Alice", "Bob"}
```

### Tags

Keys can be namespaced under a tag. Tagged and untagged data are separate:
reading without a tag returns only untagged data, and reading with a tag
returns only data under that tag.

```elixir
Goblin.put(db, :alice, "Alice", tag: :admins)
# => :ok

Goblin.get(db, :alice)
# => nil

Goblin.get(db, :alice, tag: :admins)
# => "Alice"

Goblin.select(db, tag: :admins) |> Enum.to_list()
# => [{:alice, "Alice"}]

# Removing without a tag does not affect tagged data
Goblin.remove(db, :alice)
Goblin.get(db, :alice, tag: :admins)
# => "Alice"

# Remove with the matching tag
Goblin.remove(db, :alice, tag: :admins)
Goblin.get(db, :alice, tag: :admins)
# => nil
```

### PubSub

Processes can subscribe to write notifications. On each committed write,
subscribers receive `{:put, key, value}` or `{:remove, key}` messages.

```elixir
Goblin.subscribe(db)
# => :ok

Goblin.put(db, :alice, "Alice")
# subscriber receives {:put, :alice, "Alice"}

Goblin.remove(db, :alice)
# subscriber receives {:remove, :alice}

Goblin.unsubscribe(db)
# => :ok
```

### Backups

A snapshot of the database can be exported as a `.tar.gz` archive.
The archive can be unpacked and used as the `data_dir` for a new instance.

```elixir
Goblin.export(db, "/backups")
# => {:ok, "/backups/goblin_20260220T120000Z.tar.gz"}
```

### Supervision tree

```elixir
defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    children = [
      {Goblin, name: MyApp.DB, data_dir: "/var/lib/myapp/db"}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end

Goblin.put(MyApp.DB, :alice, "Alice")
# => :ok

Goblin.get(MyApp.DB, :alice)
# => "Alice"
```

## Disk table format

Database files use the `<number>.goblin` naming convention.

| SST | SEPARATOR | FOOTER |
| --- | --- | --- |
| n * 1024 bytes | 16 bytes  | variable size |

### Sorted String Table (SST)

The SST consists of multiple blocks.
Each block is a multiple of 1024 bytes and contains:
- **Block ID** (16 bytes): `"GOBLINBLOCK00000"`
- **Span** (2 bytes): Number of blocks this entry spans
- **Data** (variable): Erlang term encoding of `{sequence, key, value}`
- **Padding** (variable): Zero-filled to reach n * 1024 bytes

### Footer

The footer contains metadata for efficient lookups:
- **Separator** (16 bytes): `"GOBLINSEP0000000"`
- **Bloom filter** (variable): Encoded bloom filter for membership testing
- **Key range** (variable): Min and max keys in the SST
- **Sequence range** (variable): Lowest and highest sequence numbers
- **Metadata** (57 bytes): Positions and sizes of footer components
- **CRC** (32-bit): Checksum over SST, bloom filter, key range, sequence range and metadata
- **Magic** (16 bytes): `"GOBLINFILE000000"`

### Migration

If a previous version of the disk table format is detected on startup,
the file is automatically migrated to the current version.

## Too many open files?

If you see `:emfile` errors, the file descriptor limit has been reached.
Increase it with `ulimit -n <limit>`.

## References

- [CubDB](https://github.com/lucaong/cubdb) - Embedded COW B+Tree database
- [RocksDB Documentation](https://github.com/facebook/rocksdb/wiki) - Facebook's LSM key-value store