# k-ordered unique identity
The library develops an identity schema suitable for rough, lexicographical objects ordering.
## Inspiration
The library provide interface to generate k-ordered unique identifiers in lock-free and decentralized manner for Erlang application.
> An array `A[1..n]` is said to be k-ordered if `A[i − k] ≤ A[i] ≤ A[i + k]` for all `i` such that `k < i ≤ n−k`.
The library aims following objectives:
* use _64-bit_ for _storage friendly_ identity schema. It allows to reduce indexes footprints and optimize lookup latency.
* generated values are suitable for partial event ordering in distributed environment and helps on detection of causality violation.
* values are roughly sortable by time of allocation.
* the concept is similar to Twitter's snowflake identity.
## Background
The library has developed a dual schema: 64-bit unique identity to be used within Erlang virtual machine and 96-bit unique identity for distributed environments. Client applications benefits from short 64-bit identity using it locally and dynamically switches schema while communicating with remote peers.
The k-ordered value consists of time-stamp with millisecond resolution (50-bit) that is used to roughly sort events. The time-stamp ensures distinct sorting within virtual machine where system clock is controlled by operation system. A locally monotonic padding (14-bits) prevents the collisions (Note: 14-bits allows to have about 16K allocations per millisecond).
The time-stamp based sorting fails on distributed systems unless it implements precises time synchronization protocol. Therefore, it was decided to uses location aspect (node fingerprint) as component of k-ordered value. This component allows to keep ordering consistent even if clocks on other node is skewed. The developed schema allows to control the quality of the ordering (precision) depending on the length of _neighborhood interval_. The neighborhood interval is a time span where the location has higher priority then time.
The library represents k-ordered values as tuples but implements binary and url friendly base64 encoding. The serialization protocol has been changed after library version `1.2.0`. New serialization improves allocation performance of k-order values. It uses Erlang OTP/18 feature `erlang:unique_integer(...)` to generate locally monotonic value. The library allocates about 13M k-ordered values per second on reference hardware.
### 64-bit, local
```
20 bit 20 bit 10 bit 14 bit
|--------------|--------------|----------|-----------------|
A B C Seq
A, B, C - time stamp in micro seconds (casted to 50 bits)
Seq - sequence value generated by erlang:unique_integer([monotonic])
-type l() :: {uid, t(), seq()}.
```
* 50-bit time-stamp with millisecond resolution. The library uses Erlang native representation of time-stamp: `A` mega seconds 10⁶, `B` seconds, `C` micro seconds 10⁻⁶.
* `Seq` is 14-bit monotonic sequence identifier obtained from `erlang:unique_integer(...)`.
### 96-bit, global
```
20 bit 20 - I bit 32 bit I bit 10 bit 14 bit
|--------------|--------------|--------------|----------|----------|-----------------|
A B0 Node B1 C Seq
A, B, C - time stamp in microseconds (casted to 50 bits)
Seq - sequence value generated by erlang:unique_integer([monotonic])
Node - node fingerprint erlang:phash(erlang:node(), 1 bsl 32).
I - time neighborhood interval
-type g() :: {uid, id(), t(), seq()}.
```
* 50-bit time-stamp with millisecond resolution. The library uses Erlang native representation of time-stamp: `A` mega seconds 10⁶, `B` seconds, `C` micro seconds 10⁻⁶.
* `Seq` is 14-bit monotonic sequence identifier obtained from `erlang:unique_integer(...)`.
* `Node` 32-bit (optionally 64-bit) represent unique node identifier allocated randomly to each node (derived as hash from long node name). Node id has higher sorting priority then seconds fraction of time stamp. This allows to build reliable sorting without time synchronization.
* `I` is length of neighborhood interval that is configurable by application according to its needs. Its maximum value 10 bits that roughly corresponds to 15 minutes (default one 8 bits - 5 minutes).
## Getting Started
The latest version of the library is available at its `master` branch. All development, including new features and bug fixes, take place on the `master` branch using forking and pull requests as described in contribution guidelines.
The stable library release is available via hex packages, add the library as dependency to `rebar.config`
```erlang
{deps, [{uid}]}.
```
Libraries implements a simple [api](src/uid.erl).
```erlang
%%
%% generate local identity
%% {uid,{1539,625275,432128},1}
A = uid:l().
%%
%% convert local identity to global one
%% {uid,'uid@127.0.0.1',{1539,625275,432128},1}
B = uid:g(A).
%%
%% encode value to binary
%% <<0,96,57,138,3,235,39,50,123,105,128,1>>
uid:encode(B).
%%
%% encode value to base64
%% <<"AGA5igPrJzJ7aYAB">>
uid:encode64(B).
```
## How To Contribute
The library is Apache 2.0 licensed and accepts contributions via GitHub pull requests:
* Fork the repository on GitHub
* Read build instructions
* Make a pull request
The build process requires [Erlang/OTP](http://www.erlang.org/downloads) version 19.0 or later and essential build tools.
**Build** and **run** service in your development console. The following command boots Erlang virtual machine and opens Erlang shell.
```bash
git clone https://github.com/fogfish/uid
cd uid
make
make run
```
### commit message
The commit message helps us to write a good release note, speed-up review process. The message should address two question what changed and why. The project follows the template defined by chapter [Contributing to a Project](http://git-scm.com/book/ch5-2.html) of Git book.
>
> Short (50 chars or less) summary of changes
>
> More detailed explanatory text, if necessary. Wrap it to about 72 characters or so. In some contexts, the first line is treated as the subject of an email and the rest of the text as the body. The blank line separating the summary from the body is critical (unless you omit the body entirely); tools like rebase can get confused if you run the two together.
>
> Further paragraphs come after blank lines.
>
> Bullet points are okay, too
>
> Typically a hyphen or asterisk is used for the bullet, preceded by a single space, with blank lines in between, but conventions vary here
>
>
### bugs
If you experience any issues with the library, please let us know via [GitHub issues](https://github.com/fogfish/uid/issue). We appreciate detailed and accurate reports that help us to identity and replicate the issue.
* **Specify** the configuration of your environment. Include which operating system you use and the versions of runtime environments.
* **Attach** logs, screenshots and exceptions, in possible.
* **Reveal** the steps you took to reproduce the problem, include code snippet or links to your project.
## License
Copyright 2012 Dmitry Kolesnikov
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.