README.md

# RaftFleet

Elixir library to run multiple [Raft](https://raft.github.io/) consensus groups in a cluster of ErlangVMs

- [API Documentation](http://hexdocs.pm/raft_fleet/)
- [Hex package information](https://hex.pm/packages/raft_fleet)

[![Hex.pm](http://img.shields.io/hexpm/v/raft_fleet.svg)](https://hex.pm/packages/raft_fleet)
[![Build Status](https://travis-ci.org/skirino/raft_fleet.svg)](https://travis-ci.org/skirino/raft_fleet)
[![Coverage Status](https://coveralls.io/repos/github/skirino/raft_fleet/badge.svg?branch=master)](https://coveralls.io/github/skirino/raft_fleet?branch=master)

## Feature & Design

- Easy hosting of multiple "cluster-wide state"s
- Reasonably scalable placement of processes for multiple Raft consensus groups
    - consensus member processes are distributed to ErlangVMs in a data center-aware manner using [rendezvous hashing](https://en.wikipedia.org/wiki/Rendezvous_hashing)
    - automatic rebalancing on adding/removing nodes
- Each consensus group leader is accessible using the name of the consensus group (which must be an atom)
    - Actual pids of consensus leader processes are cached in a local ETS table for fast access
- Flexible data model (defined by [rafted_value](https://github.com/skirino/rafted_value))
- Decentralized architecture and fault tolerance

## Notes on backward compatibility

- Users of `<= 0.6.0` should upgrade to `0.6.1` before upgrading to `0.7.x` due to a change in internal data structure.
  While `<= 0.6.0` and `0.7.x` are not compatible, `0.6.1` should be able to interact with both `<= 0.6.0` and `0.7.x`.

## Example

Suppose we have a cluster of 4 erlang nodes:

```ex
$ iex --sname 1 -S mix
iex(1@skirino-Manjaro)>

$ iex --sname 2 -S mix
iex(2@skirino-Manjaro)> Node.connect(:"1@skirino-Manjaro")

$ iex --sname 3 -S mix
iex(3@skirino-Manjaro)> Node.connect(:"1@skirino-Manjaro")

$ iex --sname 4 -S mix
iex(4@skirino-Manjaro)> Node.connect(:"1@skirino-Manjaro")
```

Load the following module that implements `RaftedValue.Data` behaviour on all nodes in the cluster.

```ex
defmodule JustAnInt do
  @behaviour RaftedValue.Data
  def new(), do: 0
  def command(i, {:set, j}), do: {i, j    }
  def command(i, :inc     ), do: {i, i + 1}
  def query(i, :get), do: i
end
```

Call `RaftFleet.activate/1` on all nodes.

```ex
iex(1@skirino-Manjaro)> RaftFleet.activate("zone1")

iex(2@skirino-Manjaro)> RaftFleet.activate("zone2")

iex(3@skirino-Manjaro)> RaftFleet.activate("zone1")

iex(4@skirino-Manjaro)> RaftFleet.activate("zone2")
```

Create 5 consensus groups each of which replicates an integer and has 3 consensus members.

```ex
iex(1@skirino-Manjaro)> rv_config = RaftedValue.make_config(JustAnInt)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus1, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus2, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus3, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus4, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus5, 3, rv_config)
```

Now we can run query/command from any node in the cluster:

```ex
iex(1@skirino-Manjaro)> RaftFleet.query(:consensus1, :get)
{:ok, 0}

iex(2@skirino-Manjaro)> RaftFleet.command(:consensus1, :inc)
{:ok, 0}

iex(3@skirino-Manjaro)> RaftFleet.query(:consensus1, :get)
{:ok, 1}
```

Activating/deactivating a node in the cluster triggers rebalancing of consensus member processes.

## Deployment notes

To run `raft_fleet` within an ErlangVM cluster, the followings are our general recommendations.

Cluster should consist of at least 3 nodes to tolerate 1 node failure.
Similarly cluster nodes should span 3 (or more) data centers, so that the system keeps on functioning in the face of 1 data center failure.

When you add new ErlangVM nodes, each node should run the following initialization steps:

1. establish connections to other running nodes,
2. call `RaftFleet.activate/1`.

These steps are typically done within `start/2` of the main OTP application.
Information of other running nodes should be available from e.g. IaaS API.

When terminating a node you should proceed as follows
(although `raft_fleet` tolerates failures that don't break quorums,
it's much better to tell `raft_fleet` to make preparations beforehand):

1. call `RaftFleet.deactivate/0` within the node-to-be-terminated,
2. wait for a while (say, 10 min) so that existing consensus group members are migrated to the other nodes, then
3. finally shutdown the node.

## Links

- [Raft official website](https://raft.github.io/)
- [The original paper](http://ramcloud.stanford.edu/raft.pdf)
- [The thesis](https://ramcloud.stanford.edu/~ongaro/thesis.pdf)
- [`rafted_value`](https://github.com/skirino/rafted_value) : Elixir implementation of the Raft consensus algorithm
- [My slides to introduce rafted_value and raft_fleet](https://skirino.github.io/slides/raft_fleet.html#/)