# GenStage
GenStage is a specification for exchanging events between producers and consumers.
This project currently provides the following functionality:
* `Experimental.GenStage` ([docs](https://hexdocs.pm/gen_stage/Experimental.GenStage.html)) - a behaviour for implementing producer and consumer stages
* `Experimental.DynamicSupervisor` ([docs](https://hexdocs.pm/gen_stage/Experimental.DynamicSupervisor.html)) - a supervisor designed for starting children dynamically. Besides being a replacement for the `:simple_one_for_one` strategy in the regular `Supervisor`, a `DynamicSupervisor` can also
be used as a stage consumer, making it straight-forward to spawn a new process for every
event in a stage pipeline
The module names are marked as `Experimental` to avoid conflicts as they are meant to be included in future Elixir releases. In your code, you may add `alias Experimental.{DynamicSupervisor, GenStage}` to the top of your files and use the relative names from then on.
You can find examples on how to use the modules above in the [examples](examples) directory:
* [ProducerConsumer](examples/producer_consumer.exs) - a simple example of setting up a pipeline of `A -> B -> C` stages and having events flowing through
* [DymamicSupervisor](examples/dynamic_supervisor.exs) - an example of how to use one or more `DynamicSupervisor` as a consumer to a producer that works as a counter
* [GenEvent](examples/gen_event.exs) - an example of how to use `GenStage` to implement a `GenEvent` replacement that leverages concurrency and provides more flexibility regarding buffer size and back-pressure
## Installation
GenStage requires Elixir v1.3.
1. Add `:gen_stage` to your list of dependencies in mix.exs:
def deps do
[{:gen_stage, "~> 0.1"}]
end
2. Ensure `:gen_stage` is started before your application:
def application do
[applications: [:gen_stage]]
end
## Future research
Here is a list of potential topics to be explored by this project (in no particular order or guarantee):
* Provide examples with delivery guarantees and/or checkpoints
* Consider using DynamicSupervisor to implement Task.Supervisor (as a consumer)
* TCP and UDP acceptors as producers
* Ability to attach filtering to producers - today, if we need to filter events from a producer, we need an intermediary process which incurs extra copying
* Connecting stages across nodes - because there is no guarantee demand is delivered, the demand driven approach in `GenStage` won't perform well in a distributed setup
* Integration with streams - how the `GenStage` foundation integrates with Elixir streams? In particular, streams composition is still purely functional while stages introduce asynchronicity.
* Exploit parallelism - how the GenStage foundation can help us implement conveniences like pmap, chunked map, farming and so on
* Explore different windowing strategies - the ideas behind the Apache Beam project are interesting, specially the mechanism that divides operations between what/where/when/how (1, 2) as well as windowing from the perspective of aggregation (3)
* Introduce key-based functions - after a `group_by` is performed, there are many operations that can be inlined like `map_by_key`, `reduce_by_key` and so on. The Spark project, for example, provides many functions for key-based functionality (4)
Other research topics include the Titan (5), Naiad's Differential Dataflow engine (6) and Lasp (7).
### Links
1. https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
2. http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
3. http://www.vldb.org/pvldb/vol8/p702-tangwongsan.pdf
4. http://spark.apache.org/docs/latest/programming-guide.html#working-with-key-value-pairs
5. http://asc.di.fct.unl.pt/~nmp/pubs/clouddp-2013.pdf
6. http://research-srv.microsoft.com/pubs/176693/differentialdataflow.pdf
7. https://lasp-lang.org/
## License
Same as Elixir.