# k-ordered unique identity

The library develops an identity schema suitable for rough, lexicographical objects ordering in distributed systems.

[![Build Status](]( 
[![Coverage Status](]( 

## Inspiration

The library provide interface to generate k-ordered unique identifiers in lock-free and decentralized manner for Erlang application. 

> An array `A[1..n]` is said to be k-ordered if `A[i − k] ≤ A[i] ≤ A[i + k]` for all `i` such that `k < i ≤ n−k`.

The library aims following objectives:

* use _64-bit_ for _storage friendly_ identity schema. It allows to reduce indexes footprints and optimize lookup latency.

* generated values are suitable for partial event ordering in distributed environment and helps on detection of causality violation.

* values are roughly sortable by time of allocation.

* the concept is similar to Twitter's snowflake identity.

## Background

The library has developed a dual schema: 64-bit unique identity to be used within Erlang virtual machine and 96-bit unique identity for distributed environments. Client applications benefits from short 64-bit identity using it locally and dynamically switches schema while communicating with remote peers. 

The k-ordered value consists of time-stamp with millisecond resolution (50-bit) that is used to roughly sort events. The time-stamp ensures distinct sorting within virtual machine where system clock is controlled by operation system. A locally monotonic padding (14-bits) prevents the collisions (Note: 14-bits allows to have about 16K allocations per millisecond). 

The time-stamp based sorting fails on distributed systems unless it implements precises time synchronization protocol. Therefore, it was decided to uses location aspect (node fingerprint) as component of k-ordered value. This component allows to keep ordering consistent even if clocks on other node is skewed. The developed schema allows to control the quality of the ordering (precision) depending on the length of _neighborhood interval_. The neighborhood interval is a time span where the location has higher priority then time. The usage of this time interval relaxes a clock synchronization requirements while preserving an ordering.  

The library represents k-ordered values as tuples but implements binary and url friendly encoding similar to base64. The serialization protocol has been changed after library version `1.2.0`. New serialization improves allocation performance of k-order values. It uses Erlang OTP/18 feature `erlang:unique_integer(...)` to generate locally monotonic value. The library allocates about 13M k-ordered values per second on reference hardware.

### 64-bit, local

        20 bit         20 bit       10 bit        14 bit
          A              B             C           Seq
   A, B, C - time stamp in micro seconds (casted to 50 bits)
   Seq     - sequence value generated by erlang:unique_integer([monotonic])

   -type l() :: {uid, t(), seq()}.

* 50-bit time-stamp with millisecond resolution. The library uses Erlang native representation of time-stamp: `A` mega seconds 10⁶, `B` seconds, `C` micro seconds 10⁻⁶.

* `Seq` is 14-bit monotonic sequence identifier obtained from `erlang:unique_integer(...)`.

### 96-bit, global

        20 bit        20 - I bit      32 bit       I bit      10 bit       14 bit
          A              B0            Node          B1         C           Seq

   A, B, C - time stamp in microseconds (casted to 50 bits)
   Seq     - sequence value generated by erlang:unique_integer([monotonic])
   Node    - node fingerprint erlang:phash(erlang:node(), 1 bsl 32).
   I       - time neighborhood interval 

   -type g() :: {uid, id(), t(), seq()}.

* 50-bit time-stamp with millisecond resolution. The library uses Erlang native representation of time-stamp: `A` mega seconds 10⁶, `B` seconds, `C` micro seconds 10⁻⁶.

* `Seq` is 14-bit monotonic sequence identifier obtained from `erlang:unique_integer(...)`.

* `Node` 32-bit (optionally 64-bit) represent unique node identifier allocated randomly to each node (derived as hash from long node name). Node id has higher sorting priority then seconds fraction of time stamp. This allows to build reliable sorting without time synchronization.

* `I` is length of neighborhood interval that is configurable by application according to its needs. Its maximum value 10 bits that roughly corresponds to 15 minutes (default one 8 bits - 5 minutes).

## Getting Started

The latest version of the library is available at its `master` branch. All development, including new features and bug fixes, take place on the `master` branch using forking and pull requests as described in contribution guidelines.

The stable library release is available via hex packages, add the library as dependency to `rebar.config`

{deps, [{uid}]}.

Libraries implements a simple [api](src/uid.erl).  

%% generate local identity
%%   {uid,{1539,625275,432128},1}
A = uid:l().

%% convert local identity to global one
%%   {uid,'uid@',{1539,625275,432128},1}
B = uid:g(A).

%% encode value to binary
%%   <<0,96,57,138,3,235,39,50,123,105,128,1>>

%% encode value to base64
%%   <<".5.tXVEf8n8vPN.0">>

## How To Contribute

The library is Apache 2.0 licensed and accepts contributions via GitHub pull requests:

1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Added some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create new Pull Request

The build process requires [Erlang/OTP]( version 19.0 or later and essential build tools.

**Build** and **run** service in your development console. The following command boots Erlang virtual machine and opens Erlang shell.

git clone
cd uid
make run

### commit message

The commit message helps us to write a good release note, speed-up review process. The message should address two question what changed and why. The project follows the template defined by chapter [Contributing to a Project]( of Git book.

> Short (50 chars or less) summary of changes
> More detailed explanatory text, if necessary. Wrap it to about 72 characters or so. In some contexts, the first line is treated as the subject of an email and the rest of the text as the body. The blank line separating the summary from the body is critical (unless you omit the body entirely); tools like rebase can get confused if you run the two together.
> Further paragraphs come after blank lines.
> Bullet points are okay, too
> Typically a hyphen or asterisk is used for the bullet, preceded by a single space, with blank lines in between, but conventions vary here

### bugs

If you experience any issues with the library, please let us know via [GitHub issues]( We appreciate detailed and accurate reports that help us to identity and replicate the issue. 

* **Specify** the configuration of your environment. Include which operating system you use and the versions of runtime environments. 

* **Attach** logs, screenshots and exceptions, in possible.

* **Reveal** the steps you took to reproduce the problem, include code snippet or links to your project.

## License