
# erlmld

This application allows Kinesis and DynamoDB streams to be processed using Erlang (by way
of the KCL MultiLangDaemon).


## Erlang - MultiLangDaemon concept

The [Kinesis Client Library](https://github.com/awslabs/amazon-kinesis-client) abstracts
away the distribution of work among a group of workers which are processing data found on
a Kinesis stream, including lease management and the assignment of specific workers to
specific shards.

A component of the KCL is the MultiLangDaemon (MLD), which allows non-Java applications
to be more easily integrated with the KCL by way of a simple JSON protocol: it launches
one subprocess per owned shard and communicates with each over stdin/stdout.
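
To make the protocol concrete: as we understand the KCL's published MultiLangDaemon
protocol, the daemon writes one JSON object per line carrying an `action` field (such as
`initialize`, `processRecords`, or `shutdown`) and expects a `status` acknowledgement
naming the action that was handled. The following is a minimal sketch of building that
acknowledgement. The module and function names are hypothetical and are not part of
`erlmld`, and the field names should be checked against the KCL version in use.

```erlang
-module(mld_protocol_sketch).
-export([status_response/1]).

%% Hypothetical helper, not erlmld's API.
%%
%% Builds the one-line JSON acknowledgement for an action received from the
%% MultiLangDaemon, e.g. for <<"initialize">>:
%%
%%   {"action":"status","responseFor":"initialize"}
%%
%% Assumes Action is a plain action name that needs no JSON escaping.
status_response(Action) when is_binary(Action) ->
    iolist_to_binary([<<"{\"action\":\"status\",\"responseFor\":\"">>,
                      Action,
                      <<"\"}\n">>]).
```

A real implementation would use a JSON library for both decoding and encoding; the
hand-built binary here only keeps the sketch free of dependencies.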

AdRoll needed the ability to perform Kinesis data processing in an Erlang system, and thus
this project was born.  We launch one `erlmld` supervision tree per Kinesis/DynamoDB
stream being processed in an Erlang node.  This results in an instance of the
MultiLangDaemon being launched as a port program for each stream being processed.

Each MLD process in turn launches one subprocess for each owned shard; these are simply
netcat or socat processes which map I/O back to the Erlang node via TCP, allowing us to do
all data processing in Erlang:

![Erlang - MultiLangDaemon concept](img/beam-mld.png)
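
The TCP side of this arrangement can be sketched as follows. This is not `erlmld`'s
actual listener, only an illustration under stated assumptions: the module name, the
loopback address, and the echo-style reply are all hypothetical. It listens on an
ephemeral port, accepts one connection (as one would arrive from a per-shard netcat or
socat process), and reads a single newline-terminated message.

```erlang
-module(tcp_bridge_sketch).
-export([start/0, accept_one/1]).

%% Open a listening socket on the loopback interface. Port 0 asks the OS to
%% choose a free port, which is then looked up with inet:port/1.
%% {packet, line} makes each recv return one newline-terminated message;
%% note that lines longer than the socket buffer are truncated, so a real
%% listener would need to size the buffer for large record batches.
start() ->
    {ok, Listen} = gen_tcp:listen(0, [binary,
                                      {packet, line},
                                      {active, false},
                                      {reuseaddr, true},
                                      {ip, {127, 0, 0, 1}}]),
    {ok, Port} = inet:port(Listen),
    {ok, Listen, Port}.

%% Accept a single connection, read one line from it, send back a
%% placeholder reply, and close the socket. Returns the line that was read.
accept_one(Listen) ->
    {ok, Sock} = gen_tcp:accept(Listen),
    {ok, Line} = gen_tcp:recv(Sock, 0),
    ok = gen_tcp:send(Sock, [<<"received ">>, Line]),
    ok = gen_tcp:close(Sock),
    Line.
```

With the port returned by `start/0`, a subprocess along the lines of
`socat STDIO TCP:127.0.0.1:<port>` would relay its stdin/stdout to this socket; the exact
command used by `erlmld` may differ.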

A `gen_statem` worker implements the MLD protocol on the Erlang side; one such worker
process runs for each shard owned by the current worker.
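
As a rough illustration of why a state machine fits here, the sketch below models a
per-shard worker that must be initialized before it accepts record batches and that stops
on shutdown. It is not the `erlmld` implementation: the module name, state names, and the
Erlang-term form of the actions are assumptions made for the example, and JSON decoding
and checkpointing are left out.

```erlang
-module(shard_worker_sketch).
-behaviour(gen_statem).

-export([start_link/0, action/2]).
-export([init/1, callback_mode/0, handle_event/4]).

start_link() ->
    gen_statem:start_link(?MODULE, [], []).

%% Deliver one (already decoded) protocol action and wait for the reply.
action(Pid, Action) ->
    gen_statem:call(Pid, {action, Action}).

callback_mode() ->
    handle_event_function.

%% Start in a state where only `initialize` is acceptable.
init([]) ->
    {ok, waiting_for_initialize, #{batches => 0}}.

%% initialize: move to the processing state and acknowledge.
handle_event({call, From}, {action, initialize}, waiting_for_initialize, Data) ->
    {next_state, processing, Data, [{reply, From, {status, initialize}}]};
%% process_records: count the batch (a real worker would process Records here).
handle_event({call, From}, {action, {process_records, _Records}}, processing,
             Data = #{batches := N}) ->
    {keep_state, Data#{batches := N + 1},
     [{reply, From, {status, process_records}}]};
%% shutdown: acknowledge and stop normally.
handle_event({call, From}, {action, shutdown}, processing, _Data) ->
    {stop_and_reply, normal, [{reply, From, {status, shutdown}}]};
%% Anything else is out of order for the current state.
handle_event({call, From}, {action, Other}, _State, _Data) ->
    {keep_state_and_data, [{reply, From, {error, {unexpected, Other}}}]}.
```

For example, calling `action(Pid, initialize)` and then
`action(Pid, {process_records, []})` is accepted in that order, while sending a batch
before `initialize` is rejected with an `{error, {unexpected, _}}` reply.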

A [MultiLangDaemon
adapter](https://github.com/awslabs/dynamodb-streams-kinesis-adapter/blob/master/src/main/java/com/amazonaws/services/dynamodbv2/streamsadapter/StreamsMultiLangDaemon.java)
supports DynamoDB streams using the same MLD protocol.

AdRoll uses this to implement a high-performance global cache population system which
supports a real-time bidding platform also implemented in Erlang.