# RaRegistry

A distributed Registry for Elixir GenServers using [Ra](https://github.com/rabbitmq/ra) (RabbitMQ's Raft implementation).

## Overview

RaRegistry provides similar functionality to Elixir's built-in [Registry](https://hexdocs.pm/elixir/Registry.html) module, but with distributed consensus via Ra, making it suitable for distributed applications across multiple nodes.

Key features:
- Support for both `:unique` and `:duplicate` registration modes
- Automatic process monitoring and cleanup
- Built on Ra, RabbitMQ's implementation of the Raft consensus protocol
- Strongly consistent operations during normal cluster operation
- Familiar API similar to Elixir's built-in Registry
- Recovery mechanisms for abrupt node failures, such as a node killed with SIGKILL
- Seamless integration with GenServer via the `:via` tuple registration

## Installation

The package can be installed by adding `ra_registry` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:ra_registry, "~> 0.1"}
  ]
end
```

## Usage

The most common and recommended way to use RaRegistry is with GenServer via the `:via` tuple registration. This ensures your GenServers can be discovered across all nodes in your cluster:

```elixir
defmodule MyApp do
  use Application

  # Add RaRegistry to your application supervision tree
  def start(_type, _args) do
    children = [
      # Start RaRegistry before any services that depend on it.
      # Options for the underlying :ra cluster go under :ra_config.
      # :wait_for_nodes_range_ms is a range in milliseconds; a random delay
      # is picked from it to give the nodes time to connect to each other.
      {
        RaRegistry,
        keys: :unique,
        name: MyApp.Registry,
        ra_config: %{data_dir: ~c"/tmp/ra"},
        wait_for_nodes_range_ms: 3000..5000
      }
      # Other children in your supervision tree...
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

defmodule MyApp.Server do
  use GenServer

  def start_link(opts) do
    # Start with an empty map, since the handlers below use Map.get/2 and Map.put/3
    GenServer.start_link(__MODULE__, %{}, name: via_tuple(opts[:id]))
  end

  def call(id, message) do
    GenServer.call(via_tuple(id), message)
  end

  defp via_tuple(id), do: {:via, RaRegistry, {MyApp.Registry, id}}

  # GenServer implementation
  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
  def handle_call({:get, key}, _from, state), do: {:reply, Map.get(state, key), state}
  def handle_call({:set, key, value}, _from, state), do: {:reply, :ok, Map.put(state, key, value)}
end

# Then, in your application code:
{:ok, pid} = MyApp.Server.start_link(id: "user_123")

# This call will work from any node in the cluster
MyApp.Server.call("user_123", {:set, :name, "John"})
MyApp.Server.call("user_123", {:get, :name}) # => "John"

# Starting the same id again returns :already_started, regardless of which node you call it from
{:error, {:already_started, ^pid}} = MyApp.Server.start_link(id: "user_123")
```
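
Because a second `start_link` for the same id returns `{:error, {:already_started, pid}}`, callers often want a "start or reuse" helper. The sketch below is one way to write it. `EnsureStarted` and `ensure_started/1` are hypothetical names, not part of RaRegistry, and the demo uses a locally named `Agent` as a stand-in so that it runs without a cluster:

```elixir
defmodule EnsureStarted do
  # Hypothetical helper, not part of RaRegistry.
  # Runs `start_fun` and treats "already started" as success,
  # returning the pid of the existing process.
  def ensure_started(start_fun) when is_function(start_fun, 0) do
    case start_fun.() do
      {:ok, pid} -> {:ok, pid}
      {:error, {:already_started, pid}} -> {:ok, pid}
      other -> other
    end
  end
end

# Stand-in for MyApp.Server.start_link/1: a locally named Agent.
start = fn -> Agent.start_link(fn -> 0 end, name: :ensure_started_demo) end

{:ok, first} = EnsureStarted.ensure_started(start)
{:ok, again} = EnsureStarted.ensure_started(start)
IO.inspect(first == again, label: "same pid")
```

With the server above, the equivalent call would be `EnsureStarted.ensure_started(fn -> MyApp.Server.start_link(id: "user_123") end)`.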

## Direct API Usage

RaRegistry can also be used directly for more complex scenarios:

```elixir
# Start registries (typically done in your application supervision tree)
RaRegistry.start_link(keys: :unique, name: MyRegistry)
RaRegistry.start_link(keys: :duplicate, name: DuplicateRegistry)

# Register processes
RaRegistry.register(MyRegistry, "unique_key", :some_value)
RaRegistry.register(DuplicateRegistry, "shared_key", :some_value)

# Look up processes
RaRegistry.lookup(MyRegistry, "unique_key")
# => [{#PID<0.123.0>, :some_value}]

# Assuming a second process registered :other_value under the same key
RaRegistry.lookup(DuplicateRegistry, "shared_key")
# => [{#PID<0.123.0>, :some_value}, {#PID<0.124.0>, :other_value}]

# Count registered processes
RaRegistry.count(MyRegistry, "unique_key") # => 1
RaRegistry.count(DuplicateRegistry, "shared_key") # => 2

# Unregister processes
RaRegistry.unregister(MyRegistry, "unique_key")

# To update a value, first register the key
:ok = RaRegistry.register(MyRegistry, "key_update", 1)

# Then update it with a function of the current value
{:ok, 2} = RaRegistry.update_value(MyRegistry, "key_update", fn val -> val + 1 end)
```
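
For comparison, here is a small sketch of the same `:unique` and `:duplicate` semantics using Elixir's built-in Registry on a single node. It only uses the standard library; the registry names `LocalUnique` and `LocalDuplicate` are made up for the example. Note that the return shapes differ from the RaRegistry examples above, which match on `:ok` for `register/3` and `{:ok, 2}` for `update_value/3`:

```elixir
# Built-in Registry is local to one node; no consensus is involved.
{:ok, _} = Registry.start_link(keys: :unique, name: LocalUnique)
{:ok, _} = Registry.start_link(keys: :duplicate, name: LocalDuplicate)

# :unique mode: a second registration of the same key is rejected.
{:ok, _owner} = Registry.register(LocalUnique, "unique_key", 1)
second = Registry.register(LocalUnique, "unique_key", 2)
IO.inspect(second, label: "second registration")

# :duplicate mode: the same key may hold several entries.
{:ok, _} = Registry.register(LocalDuplicate, "shared_key", :some_value)
{:ok, _} = Registry.register(LocalDuplicate, "shared_key", :other_value)
entries = Enum.sort(Registry.lookup(LocalDuplicate, "shared_key"))
IO.inspect(entries, label: "shared_key entries")

# The built-in update_value/3 returns {new_value, old_value}.
updated = Registry.update_value(LocalUnique, "unique_key", &(&1 + 1))
IO.inspect(updated, label: "update_value")
```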

## Debugging

You can inspect the RaRegistry cluster with the following function:

```elixir
# Get current cluster members
RaRegistry.Manager.get_members(MyApp.Registry)
```
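
When a registry does not see all of its members, it can help to first check plain Erlang distribution. The sketch below uses only `Node.self/0` and `Node.list/0` from the standard library; `ClusterCheck` is a hypothetical helper, not part of RaRegistry, and the list of expected nodes is something you would supply yourself:

```elixir
defmodule ClusterCheck do
  # Hypothetical helper, not part of RaRegistry.
  # Returns the expected nodes that are neither this node
  # nor currently connected to it.
  def missing(expected_nodes) when is_list(expected_nodes) do
    expected_nodes -- [Node.self() | Node.list()]
  end
end

IO.inspect(Node.self(), label: "this node")
IO.inspect(Node.list(), label: "connected nodes")
IO.inspect(ClusterCheck.missing([Node.self()]), label: "missing nodes")
```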

## Consistency and Recovery

### Consistency Model

RaRegistry offers these consistency guarantees:

- **Normal Operation**: Operations use the Raft consensus protocol via Ra, providing strong consistency when a majority of nodes are available
- **State Machine Atomicity**: Operations within the Ra state machine are atomic and either fully succeed or have no effect
- **Best-Effort Recovery**: In failure scenarios such as the leader being killed with SIGKILL, RaRegistry uses aggressive recovery mechanisms that prioritize availability and eventual recovery

It's important to understand that:
- The custom recovery mechanisms we've implemented extend beyond the standard Raft protocol
- During severe failures, the implementation might briefly prioritize availability over strict consistency
- After recovery, the system returns to a consistent state, though some in-flight operations might be lost
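
Since some in-flight operations might be lost while the cluster recovers, callers may want to retry. The following is a minimal sketch, assuming a failed call surfaces as an exit (as `GenServer.call/2` does when the target process cannot be found). `RetryCall` is a hypothetical helper, not part of RaRegistry, and retrying is only safe for idempotent operations:

```elixir
defmodule RetryCall do
  # Hypothetical helper, not part of RaRegistry.
  # Runs `fun`; if it exits, waits `backoff_ms`, doubles the backoff,
  # and tries again until `attempts` is used up.
  def run(fun, attempts \\ 3, backoff_ms \\ 100)
      when is_function(fun, 0) and attempts > 0 do
    try do
      {:ok, fun.()}
    catch
      :exit, reason ->
        if attempts == 1 do
          {:error, reason}
        else
          Process.sleep(backoff_ms)
          run(fun, attempts - 1, backoff_ms * 2)
        end
    end
  end
end

# Demo: an operation that exits twice before succeeding.
{:ok, counter} = Agent.start_link(fn -> 0 end)

flaky = fn ->
  attempt = Agent.get_and_update(counter, fn n -> {n + 1, n + 1} end)
  if attempt < 3, do: exit(:noproc), else: :pong
end

result = RetryCall.run(flaky, 3, 10)
IO.inspect(result, label: "result after retries")
```

With the server from the Usage section, the wrapped function could be `fn -> MyApp.Server.call("user_123", :ping) end`.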

### Recovery Capabilities

RaRegistry includes specialized recovery mechanisms to handle various failure scenarios:

- Automatic leader election after clean node failures
- Emergency recovery procedures for SIGKILL scenarios
- Self-healing mechanisms when nodes rejoin the cluster
- Cleanup of dead process registrations

For critical systems, we recommend running at least 3 nodes to ensure quorum is maintained even if one node fails. This allows the system to continue operating consistently during most types of failures.
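
The arithmetic behind the 3-node recommendation can be sketched as follows. Raft needs a majority of members to make progress, so a cluster of `n` nodes has a quorum of `div(n, 2) + 1`. `QuorumMath` is an illustrative module written for this README, not part of RaRegistry or Ra:

```elixir
defmodule QuorumMath do
  @moduledoc "Majority arithmetic for a Raft cluster (illustrative only)."

  # Smallest number of nodes that forms a majority of `n` nodes.
  def quorum(n) when is_integer(n) and n > 0, do: div(n, 2) + 1

  # Number of nodes that can fail while a majority remains.
  def tolerated_failures(n) when is_integer(n) and n > 0, do: n - quorum(n)
end

for n <- [1, 2, 3, 5] do
  IO.puts(
    "#{n} nodes: quorum #{QuorumMath.quorum(n)}, " <>
      "tolerates #{QuorumMath.tolerated_failures(n)} failure(s)"
  )
end
```

A 2-node cluster tolerates no failures, which is why 3 is the smallest size that survives the loss of one node.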

## Benchmarks

The project includes comprehensive benchmarks for performance evaluation:

```bash
# Run all benchmarks
mix run benchmarks/run_all.exs

# Or run individual benchmarks:
mix run benchmarks/registry_bench.exs
mix run benchmarks/comparison_bench.exs
```

These benchmarks test operations such as register, lookup, and unregister under different workloads and concurrency levels. The comparison benchmarks help you understand the performance tradeoffs between RaRegistry and Elixir's built-in Registry.

Note that RaRegistry is optimized for distributed consistency rather than raw performance. The built-in Registry will typically perform faster in a single-node environment, while RaRegistry provides the benefit of strong consistency across a cluster.
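
For a quick single-node baseline without the benchmark suite, a rough timing can be taken with `:timer.tc/1`. This is only a sketch: `MicroBench` is a hypothetical helper, it measures the built-in Registry rather than RaRegistry, and the numbers it prints depend entirely on your machine:

```elixir
defmodule MicroBench do
  # Hypothetical helper: average wall-clock time of `fun` in microseconds.
  def avg_us(fun, iterations)
      when is_function(fun, 0) and is_integer(iterations) and iterations > 0 do
    {total_us, _} =
      :timer.tc(fn -> Enum.each(1..iterations, fn _ -> fun.() end) end)

    total_us / iterations
  end
end

{:ok, _} = Registry.start_link(keys: :unique, name: BenchRegistry)
{:ok, _} = Registry.register(BenchRegistry, "key", :value)

avg = MicroBench.avg_us(fn -> Registry.lookup(BenchRegistry, "key") end, 10_000)
IO.puts("built-in Registry.lookup/2: #{Float.round(avg, 3)} us per call")
```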

## License

Apache License 2.0