README.md

## mstore [![Build Status](https://travis-ci.org/dalmatinerdb/mstore.svg?branch=master)](https://travis-ci.org/dalmatinerdb/mstore)

MStore is a experimental metric store build in erlang, the primary functions are `open`, `new`, `get` and `put`.

A datastore is defined by:
 * The size of the consistant hashing ring.
 * The number of entreis per metrics.
 * The initial offset.

For each chunk a index is created (defining the position of the metrics) and a datafile which holds the values. This makes reading a number of metrics as simple as a calculation and a sequential read.

For a store holding 1000 metrics writing to the the numbers 0-999 would be in the file 0, 1000-1999 would be in the second file etc.

## Idea

The basic idea is to take advantage of the special characteristics metrics have and modern filesystems. The following assumptions about metrics and filesystems are taken:

* Metrics occour in a regular interval (i.e. every second) skips happen but are rare
* Metrics are immutable. (i.e. once the cpu temperature was recorded for a measurement period it won't ever change again).
* Reads are highly sequential, 'give me the values between X and Y'.
* Metrics are written nearly sequentially, the delta of time between two metrics written will propably be small, this allows to limit the amount of open files.
* Metrics can be represented as 64bit integers. (this might change!)
* The filesystem uses checksums for data, this means we don't need to cehcksum values.
* The filesystem allows compression. This means longer stratches of non written metrics don't have a big impact since a bunch fo 0's on the FS will easiely be compressed away.
* The filesystem has a decent cacheing strategy (no need for mmap nonsense).
* The filesystem actually is ZFS.

## File Layout

### Set

A set allows to group metrics into a hash ring, this limits the size of single files open. The directory layout will be like this:

```
<base dir>/<chash key>/<offset>.{mstore,idx} - data and store index files
<base dir>/mstore - set index file
```

#### Index File (for a set)

The index file is simply a Erlang file that can be read via consult:

```
{FileSize, CHashSize, Seed, Metrics}.
```
* FileSize: The number of points per metric stored in the file.
* CHashSize: The number of elements in the CHash ring.
* Seed: A seed used to hash the metric keys, this is needed to allow putting a set behind another CHash ring (i.e. riak core). W/o the seed the distribution would not be even.
* Metrics: A list of all metrics stored in this set, used for looking up metrics.

### Store
#### Data File

Currently data is fixed to 64 bit (8 byte) integers this means a data file is layed out like this:

```
<metric 1:FileSize*8><metric 2:FileSize*8><metric 2:FileSize*8>
```

#### Index File (for a metric)

The index file is simply a Erlang file that can be read via consult:
```
{Offse, FileSize, [{Metric, Index}]}.
```

* Offset: the base offset of the file.
* FileSize: The number of points per metric stored in the file.
* Metric and Index: A list of metricses and their indexes in the file.