Honeydew 💪🏻🍈
========
[![Build Status](https://travis-ci.org/koudelka/honeydew.svg?branch=master)](https://travis-ci.org/koudelka/honeydew)
[![Hex pm](https://img.shields.io/hexpm/v/honeydew.svg?style=flat)](https://hex.pm/packages/honeydew)
Honeydew (["Honey, do!"](http://en.wiktionary.org/wiki/honey_do_list)) is a pluggable job queue + worker pool for Elixir.
- Workers are permanent and hold immutable state (a database connection, for example).
- Workers are issued only one job at a time; a job is only ever removed from the queue when it succeeds.
- Queues can exist locally, on another node in the cluster, in your Ecto database, or on a remote queue server (RabbitMQ, etc.).
- If a worker crashes while processing a job, the job is recovered and a "failure mode" (abandon, move, retry, etc.) is executed.
- Jobs are enqueued using `async/3` and you can receive replies with `yield/2`, somewhat like [Task](http://elixir-lang.org/docs/stable/elixir/Task.html).
- Queues, workers, dispatch strategies and failure/success modes are all pluggable with user modules.
- Can optionally heal your cluster after a disconnect or downed node.
Honeydew attempts to provide "at least once" job execution. It's possible that circumstances could conspire to execute a job and then prevent Honeydew from reporting that success back to the queue, in which case the job may run again. I encourage you to write your jobs idempotently.
Honeydew isn't intended as a simple resource pool; the user's code isn't executed in the requesting process. Though you may use it as such, there are likely other alternatives that would fit your situation better.
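As a rough illustration of idempotency, the sketch below guards a side effect with a record of already-processed keys, so a job that runs twice only acts once. The `IdempotentJob` module, its function names and the in-memory `Agent` store are hypothetical; a real job would keep this record somewhere durable, such as a database row with a unique constraint.

```elixir
defmodule IdempotentJob do
  # In-memory stand-in for a durable store of keys that were already handled.
  def start_store, do: Agent.start_link(fn -> MapSet.new() end)

  # Runs `side_effect` for `key` at most once, however often the job is retried.
  def run_once(store, key, side_effect) do
    Agent.get_and_update(store, fn done ->
      if MapSet.member?(done, key) do
        # A previous attempt already did the work, so skip it.
        {:already_done, done}
      else
        side_effect.(key)
        {:ok, MapSet.put(done, key)}
      end
    end)
  end
end
```

Calling `run_once/3` twice with the same key performs the side effect the first time and returns `:already_done` the second.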
### tl;dr
- Check out the [examples](https://github.com/koudelka/honeydew/tree/master/examples).
- Enqueue and receive responses with `async/3` and `yield/2`.
- Emit job progress with `progress/1`
- Queue/Worker status with `Honeydew.status/1`
- Suspend and resume with `Honeydew.suspend/1` and `Honeydew.resume/1`
- List jobs with `Honeydew.filter/2`
- Cancel jobs with `Honeydew.cancel/2`
### Queue API Support
| | async/2 + yield/2 | filter/2 | status/1 | cancel/2 | suspend/1 + resume/1 |
|------------------------|:-----------------:|:------------------:|:--------------:|:--------------:|:--------------------:|
| ErlangQueue (`:queue`) | ✅ | ✅<sup>1</sup> | ✅ | ✅<sup>1</sup>| ✅ |
| Mnesia | ✅ | ✅<sup>1</sup> | ✅<sup>1</sup> | ✅ | ✅ |
| Ecto Poll Queue | ❌ | ❌ | ✅ | ✅<sup>2</sup>| ✅ |
[1] this is "slow", O(num_job)
[2] can't return `{:error, :in_progress}`, only `:ok` or `{:error, :not_found}`
### Queue Comparison
| | disk-backed<sup>1</sup> | replicated<sup>2</sup> | datastore-coordinated | auto-enqueue |
|------------------------|:-----------------------:|:----------------------:|----------------------:|-------------:|
| ErlangQueue (`:queue`) | ❌ | ❌ |❌ |❌ |
| Mnesia | ✅ (dets) | ❌ |❌ |❌ |
| Ecto Poll Queue | ✅ | ✅ |✅ |✅ |
[1] survives node crashes
[2] assuming you chose a replicated database to back Ecto (tested with CockroachDB and PostgreSQL).
Mnesia replication may require manual intervention after a significant netsplit.
### Ecto Poll Queue
The Ecto Poll Queue is an experimental queue designed to painlessly turn an already-existing Ecto schema into a queue, using your repo as the backing store. This eliminates the possibility of your database and work queue falling out of sync, and removes the need to run a separate queue node.
Check out the included [example project](https://github.com/koudelka/honeydew/tree/master/examples/ecto_poll_queue), and its README.
## Getting Started
In your mix.exs file:
```elixir
defp deps do
  [{:honeydew, "~> 1.1.5"}]
end
```
You can run honeydew on a single node, or distributed over a cluster. Please see the README files included with the [examples](https://github.com/koudelka/honeydew/tree/master/examples).
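A minimal single-node setup might look like the sketch below, modelled on the bundled examples. `Greeter`, `MyApp.Queues` and `:my_queue` are placeholder names, and the options accepted by `queue_spec/2` and `worker_spec/3` (here only `num`, the number of workers) should be checked against the `Honeydew` module docs for your version.

```elixir
defmodule Greeter do
  # Job functions are plain public functions on the worker module.
  # This worker is stateless, so it defines no `init/1` (an assumption
  # about what the library permits).
  def hello(name), do: "Hello #{name}!"
end

defmodule MyApp.Queues do
  # Starts one local queue and a small pool of workers under a supervisor.
  def start_link do
    children = [
      Honeydew.queue_spec(:my_queue, []),
      Honeydew.worker_spec(:my_queue, Greeter, num: 5)
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end

  # Enqueues a `{function_name, args}` task and waits for its reply.
  def greet(name) do
    {:hello, [name]}
    |> Honeydew.async(:my_queue, reply: true)
    |> Honeydew.yield()
  end
end
```

After `MyApp.Queues.start_link()`, a call to `MyApp.Queues.greet("World")` should hand the task to one of the workers and return its reply.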
### Suspend and Resume
You can suspend a queue (halt the distribution of new jobs to workers), by calling `Honeydew.suspend(:my_queue)`, then resume with `Honeydew.resume(:my_queue)`.
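One way to use the pair is a small wrapper that guarantees the queue is resumed again, sketched below. `QueueMaintenance` and `with_suspended/3` are hypothetical names; the third argument exists only so the sketch can be exercised without a running queue.

```elixir
defmodule QueueMaintenance do
  # Suspends `queue`, runs `fun`, and always resumes the queue afterwards,
  # even if `fun` raises.
  def with_suspended(queue, fun, honeydew \\ Honeydew) do
    honeydew.suspend(queue)

    try do
      fun.()
    after
      honeydew.resume(queue)
    end
  end
end
```

For example, `QueueMaintenance.with_suspended(:my_queue, fn -> run_migration() end)` would pause job distribution for the duration of a (hypothetical) `run_migration/0`.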
### Cancelling Jobs
To cancel a job that hasn't yet run, use `Honeydew.cancel/2`. It returns `:ok` if the job was successfully cancelled before execution, `{:error, :not_found}` if the job wasn't present in the queue, and `{:error, :in_progress}` if the job is currently being executed.
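The possible outcomes lend themselves to a pattern match. The helper below is a hypothetical sketch that turns whatever `Honeydew.cancel/2` returned into a log-friendly message; it also accepts `nil` defensively for a job that was not found.

```elixir
defmodule CancelResult do
  # Translates the value returned by a cancel call into a short description.
  def describe(:ok), do: "cancelled before it ran"
  def describe({:error, :in_progress}), do: "already running, not cancelled"
  def describe({:error, :not_found}), do: "not in the queue"
  def describe(nil), do: "not in the queue"
end
```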
### Job Progress
Your jobs can emit their current status, e.g. "downloaded 10/50 items", using the `progress/1` function given to your job module by `use Honeydew.Progress`.
Check out the [simple example](https://github.com/koudelka/honeydew/tree/master/examples/local/simple.exs).
### Queue Options
There are various options you can pass to `queue_spec/2` and `worker_spec/3`; see the [Honeydew](https://github.com/koudelka/honeydew/blob/master/lib/honeydew.ex) module.
### Failure Modes
When a worker crashes, a monitoring process runs the `handle_failure/3` function from the selected module on the queue's node. Honeydew ships with three failure modes, at present:
- `Honeydew.FailureMode.Abandon`: Simply forgets about the job.
- `Honeydew.FailureMode.Move`: Removes the job from the original queue, and places it on another.
- `Honeydew.FailureMode.Retry`: Re-attempts the job on its original queue a number of times, then calls another failure mode after the final failure.
See `Honeydew.queue_spec/2` to select a failure mode.
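As a sketch of how the modes can be chained, the configuration below retries a crashed job a few times and then moves it to a second queue. The option names (`failure_mode`, `times`, `finally`, `queue`) reflect my reading of the module docs and may differ between versions, and the `MyApp.QueueConfig` module and queue names are placeholders, so verify them before relying on this.

```elixir
defmodule MyApp.QueueConfig do
  # Options for the main queue: retry a crashed job up to three times,
  # then move it to a separate queue for later inspection.
  # The option keys are assumptions, see the note above.
  def main_queue_opts do
    [
      failure_mode:
        {Honeydew.FailureMode.Retry,
         times: 3,
         finally: {Honeydew.FailureMode.Move, queue: :failed_jobs}}
    ]
  end

  # Child specs for both queues, ready to hand to a supervisor.
  def child_specs do
    [
      Honeydew.queue_spec(:failed_jobs, []),
      Honeydew.queue_spec(:main, main_queue_opts())
    ]
  end
end
```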
### Success Modes
When a job completes successfully, the monitoring process runs the `handle_success/2` function from the selected module on the queue's node. You'll likely want to use this callback for monitoring purposes. You can use a job's `:enqueued_at`, `:started_at` and `:completed_at` fields to calculate various time intervals.
See `Honeydew.queue_spec/2` to select a success mode.
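The timing calculation could look like the sketch below. `TimingSuccessMode` is a hypothetical module; it assumes the three fields are integers in the same time unit, and the real success mode behaviour may require callbacks beyond `handle_success/2`.

```elixir
defmodule TimingSuccessMode do
  # Returns how long the job waited in the queue and how long it ran.
  # Assumes all three timestamps are integers in the same unit.
  def timings(%{enqueued_at: enqueued, started_at: started, completed_at: completed}) do
    %{
      queue_time: started - enqueued,
      run_time: completed - started,
      total_time: completed - enqueued
    }
  end

  # Shape of the callback named above; the second argument is whatever was
  # configured alongside the success mode and is unused here.
  def handle_success(job, _args) do
    %{queue_time: queue_time, run_time: run_time} = timings(job)
    IO.puts("job finished: queued for #{queue_time}, ran for #{run_time}")
  end
end
```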
## The Dungeon
### Job Lifecycle
In general, a job goes through the following stages:
```
- The requesting process calls `async/2`, which packages the task tuple/fn up into a "job" then sends
  it to a member of the queue group.
- The queue process will enqueue the job, then take one of the following actions:
  ├─ If there is a worker available, the queue will dispatch the job immediately to the waiting
  |  worker via the selected dispatch strategy.
  └─ If there aren't any workers available, the job will remain in the queue until a worker announces
     that it's ready.
- Upon dispatch, the queue "reserves" the job (marks it as in-progress), then spawns a local Monitor
  process to watch the worker. The monitor starts a timer after which the job will be returned to the queue.
  This is done to avoid blocking the queue waiting for confirmation from a worker that it has received the job.
  └─ When the worker receives the job, it informs the monitor associated with the job. The monitor
     then watches the worker in case the job crashes.
     ├─ When the job succeeds:
     |  ├─ If the job was enqueued with `reply: true`, the result is sent.
     |  ├─ The worker sends an acknowledgement message to the monitor.
     |  ├─ The monitor sends an acknowledgement to the queue to remove the job.
     |  ├─ The monitor executes the selected success mode.
     |  └─ The worker informs the queue that it's ready for a new job. The queue checks the worker in with the
     |     dispatcher.
     └─ If the worker crashes, the monitor executes the selected failure mode and terminates.
```
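The monitor's part of this lifecycle can be sketched in plain Elixir. This is not Honeydew's implementation, only an illustration of the pattern: `MonitorSketch` is a hypothetical module that runs a job in a separate process, reports success when the worker acknowledges with a result, and reports failure if the worker dies or the timer expires first.

```elixir
defmodule MonitorSketch do
  # Runs `fun` in a monitored process and classifies the outcome.
  def run(fun, timeout \\ 5_000) do
    parent = self()
    ack = make_ref()

    # The "worker": does the job, then acknowledges with the result.
    {pid, monitor_ref} = spawn_monitor(fn -> send(parent, {ack, fun.()}) end)

    receive do
      {^ack, result} ->
        # Acknowledged: stop watching and discard the pending :DOWN message.
        Process.demonitor(monitor_ref, [:flush])
        {:success, result}

      {:DOWN, ^monitor_ref, :process, ^pid, reason} ->
        # The worker died before acknowledging; a failure mode would run here.
        {:failure, reason}
    after
      timeout ->
        # No word from the worker in time: give up on this attempt.
        Process.exit(pid, :kill)
        Process.demonitor(monitor_ref, [:flush])
        {:failure, :timeout}
    end
  end
end
```

Here `MonitorSketch.run(fn -> 1 + 2 end)` yields a `:success` tuple, while a function that exits abnormally yields a `:failure` tuple carrying the exit reason.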
### Queues
Queues are the most critical location of state in Honeydew; a job will not be removed from the queue unless it has either been successfully executed, or been dealt with by the configured failure mode.
Honeydew includes a few basic queue modules:
- A simple FIFO queue implemented with the `:queue` and `Map` modules, this is the default.
- An Mnesia queue, configurable in all the ways Mnesia is, for example:
  * Run with replication (with queues running on multiple nodes)
  * Persist jobs to disk (dets)
  * Follow various safety modes ("access contexts").
- An Ecto-backed queue that automatically enqueues jobs when a new row is inserted.
If you want to implement your own queue, check out the included queues as a guide. Try to keep in mind where exactly your queue state lives: do jobs live in your queue process(es), or is the queue a completely stateless connector for some external broker? Or a hybrid? I'm excited to see what you come up with, please open a PR! <3
### Dispatchers
Honeydew provides the following dispatchers:
- `Honeydew.Dispatcher.LRUNode` - Least Recently Used Node (sends jobs to the least recently used worker on the least recently used node, the default for global queues)
- `Honeydew.Dispatcher.LRU` - Least Recently Used Worker (FIFO, the default for local queues)
- `Honeydew.Dispatcher.MRU` - Most Recently Used Worker (LIFO)
You can also use your own dispatching strategy by passing it to `Honeydew.queue_spec/2`. Check out the [built-in dispatchers](https://github.com/koudelka/honeydew/tree/master/lib/honeydew/dispatcher) for reference.
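The difference between the LRU and MRU strategies comes down to the data structure holding idle workers. The sketch below is not the dispatcher behaviour itself (its callbacks are defined in the built-in dispatchers linked above); `DispatchSketch` and its function names are hypothetical and only contrast a FIFO queue with a LIFO stack.

```elixir
defmodule DispatchSketch do
  # LRU: idle workers wait in a FIFO queue, so the longest-idle worker is chosen.
  def lru_new, do: :queue.new()
  def lru_check_in(queue, worker), do: :queue.in(worker, queue)

  def lru_check_out(queue) do
    case :queue.out(queue) do
      {{:value, worker}, rest} -> {worker, rest}
      {:empty, rest} -> {nil, rest}
    end
  end

  # MRU: idle workers wait on a stack, so the most recently used worker is chosen.
  def mru_new, do: []
  def mru_check_in(stack, worker), do: [worker | stack]
  def mru_check_out([worker | rest]), do: {worker, rest}
  def mru_check_out([]), do: {nil, []}
end
```

With workers `:a`, `:b` and `:c` checked in in that order, the LRU variant hands out `:a` first and the MRU variant hands out `:c`.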
### Worker State
Worker state is immutable; the only way to change it is to cause the worker to crash and let the supervisor restart it.
Your worker module's `init/1` function must return `{:ok, state}`. If anything else is returned or the function raises an error, the worker will die and restart after a given time interval (by default, five seconds).
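A stateful worker might be sketched as follows. `KeyValueWorker` and its arguments are hypothetical, and the sketch assumes that the worker's state is handed to each job function as its final argument, which you should confirm against the bundled examples.

```elixir
defmodule KeyValueWorker do
  # Called once when the worker starts; the returned state stays fixed for
  # the worker's lifetime. A real worker would open a connection here.
  def init([host, port]) do
    {:ok, %{host: host, port: port}}
  end

  # Assumption: the state arrives as the last argument of the job function.
  def describe(key, state) do
    "#{key} via #{state.host}:#{state.port}"
  end
end
```

Such a worker would presumably be started by passing the module together with its `init/1` arguments to `worker_spec/3`, for example `{KeyValueWorker, ["localhost", 8087]}`.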
### TODO:
- let the user decide if they want to `:ignore` during their init/1, to allow errors to bubble up the supervision tree.
- statistics?
- `yield_many/2` support?
- benchmark mnesia queue's dual filter implementations, discard one?
### Acknowledgements
Thanks to Marcelo Gornstein (@marcelo), for his [failing worker restart strategy](https://web.archive.org/web/20170929101642/http://inaka.net/blog/2012/11/29/every-day-erlang/).