# Director
Flexible, fast and powerful supervisor library for Erlang processes.
## synopsis
##### What is process supervisor?
According to the [**erlang.org**](http://erlang.org/doc/man/supervisor.html):
A supervisor is a process that supervises other processes called child processes.
A child process can either be another supervisor or a worker process.
Supervisors are used to build a hierarchical process structure called a supervision tree, a nice way to structure a fault-tolerant application.
##### Can we use Director instead of standard OTP/supervisor?
Yes, you can also call OTP/supervisor API for your `director` process !
##### What is advantage of using Director instead of OTP/supervisor?
For example:
* You can have 5 restarts with 1000 mili-seconds interval between them for specific child(ren).
* You can tell `director` to delete specific child(ren) from supervision tree after 10th restart.
* You can tell `director` to stop and exit supervisor with crash reason of specific child(ren) process.
* You can tell `director` to restart specific child if it crashed with reason `oops` and restart it after 5000 mili-second if it crashed with reason `damn` and finally delete from supervisor if it crashed with reason `bye`.
You can make it like OTP/supervisor `simple_one_for_one` in efficient and flexible approach.
...
## Download
```sh
~ $ git clone https://github.com/Pouriya-Jahanbakhsh/director.git
```
## Compile
Note that **OTP>=19** required (if you want to upgrade it using `release_handler`).
Go to `director` and use `rebar` or `rebar3`.
```sh
~ $ cd director
```
rebar
```sh
~/director $ rebar compile
==> director_test (compile)
Compiled src/director.erl
~/director $
```
rebar3
```sh
~/director $ rebar3 compile
===> Verifying dependencies...
===> Compiling director
~/director $
```
## How it works
**director** needs a callback module (like OTP/supervisor).
In callback module you should export function `init/1`.
What `init/1` should return? wait, i'll explain step by step.
```erlang
-module(foo).
-export([init/1]).
init(_InitArg) ->
{ok, []}.
```
Save above code in `foo.erl` in **director** directory and go to the Erlang shell.
Use `erl -pa ./ebin` if you used `rebar` to compile it and use `rebar3 shell` if you used `rebar3`.
```erlang
Erlang/OTP 19 [erts-8.3] [source-d5c06c6] [64-bit] [smp:8:8] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V8.3 (abort with ^G)
1> c(foo).
{ok,foo}
2> Mod = foo.
foo
3> InitArg = undefined. %% i don't need it yet.
undefined
4> {ok, Pid} = director:start_link(Mod, InitArg).
{ok,<0.112.0>}
5>
```
Now we have a supervisor without children.
Good news is that **director** comes with full OTP/supervisor API and it has its advanced features and specific approachs too.
```erlang
5> director:which_children(Pid). %% You can use supervisor:which_children(Pid) too :)
[]
6> director:count_children(Pid). %% You can use supervisor:count_children(Pid) too :)
[{specs,0},{active,0},{supervisors,0},{workers,0}]
7> director:get_pids(Pid). %% You can NOT use supervisor:get_pids(Pid) because it hasn't :D
[]
```
OK, I'll make simple `gen_server` and give it to our **director**.
```erlang
-module(bar).
-behaviour(gen_server).
-export([start_link/0
,init/1
,terminate/2]). %% i am not going to use handle_call, handle_cast ,etc.
start_link() ->
gen_server:start_link(?MODULE, undefined, []).
init(_GenServerInitArg) ->
{ok, state}.
terminate(_Reason, _State) ->
ok.
```
Save above code in `bar.erl` and got back to the Shell.
```erlang
8> c(bar).
bar.erl:2: Warning: undefined callback function code_change/3 (behaviour 'gen_server')
bar.erl:2: Warning: undefined callback function handle_call/3 (behaviour 'gen_server')
bar.erl:2: Warning: undefined callback function handle_cast/2 (behaviour 'gen_server')
bar.erl:2: Warning: undefined callback function handle_info/2 (behaviour 'gen_server')
{ok,bar}
%% You should define unique id for your process.
9> Id = bar_id.
bar_id
%% You should tell director about start module and function for your process.
%% Should be tuple {Module, Function, Args}.
%% If your start function doesn't need arguments (like our example)
%% just use {Module, function}.
10> start = {bar, start_link}.
{bar,start_link}
%% What is your plan for your process?
%% Plan should be an empty list or list with n elemenst.
%% Every element can be one of
%% 'restart'
%% 'delete'
%% 'wait'
%% 'stop'
%% {'stop', Reason::term()}
%% {'restart', Time::pos_integer()}
%% for example my plan is:
%% [restart, {restart, 5000}, delete]
%% In first crash director will restart my process,
%% after next crash director will restart it after 5000 mili-seconds
%% and after third crash director will not restart it and will delete it
11> Plan = [restart, {restart, 5000}, delete].
[restart,{restart,5000},delete]
%% What if i want to restart my process 500 times?
%% Do i need a list with 500 'restart's?
%% No, you just need a list with one element, I'll explain it later.
12> Childspec = #{id => Id
,start => Start
,plan => Plan}.
#{id => bar_id,
plan => [restart,{restart,5000},delete],
start => {bar,start_link}}
13> director:start_child(Pid, Childspec). %% You can use supervisor:start_child(Pid, ChildSpec) too :)
{ok,<0.160.0>}
14>
```
Lets check it
```erlang
14> director:which_children(Pid).
[{bar_id,<0.160.0>,worker,[bar]}]
15> director:count_children(Pid).
[{specs,1},{active,1},{supervisors,0},{workers,1}]
%% What was get_pids/1?
%% It will returns all RUNNING ids with their pids.
16> director:get_pids(Pid).
[{bar_id,<0.160.0>}]
%% We can get Pid for specific RUNNING id too
17> {ok, BarPid1} = director:get_pid(Pid, bar_id).
{ok,<0.160.0>}
%% I want to kill that process
18> erlang:exit(BarPid1, kill).
true
%% Check all running pids again
19> director:get_pids(Pid).
[{bar_id,<0.174.0>}] %% pid was changed (restarted)
%% I want to kill that process again
%% and i will check children before spending time
20> {ok, BarPid2} = director:get_pid(Pid, bar_id), erlang:exit(BarPid2, kill).
true
21> director:get_pids(Pid).
[]
22> director:which_children(Pid).
[{bar_id,restarting,worker,[bar]}] %% restarting
23> director:get_pid(Pid, bare_id).
{error,not_found}
%% after 5000 ms
24> director:get_pids(Pid).
[{bar_id,<0.181.0>}]
25> %% Yoooohoooooo
```
I mentioned **advanced features**, what are they?
Lets see other acceptable keys for `Childspec` map.
```erlang
-type childspec() :: #{'id' => id()
,'start' => start()
,'plan' => plan()
,'count' => count()
,'terminate_timeout' => terminate_timeout()
,'type' => type()
,'modules' => modules()
,'append' => append()}.
%% 'id' is mandatory and can be any Erlang term
-type id() :: term().
%% Sometimes 'start' is optional ! just wait and read carefully
-type start() :: {module(), function()} % default Args is []
| mfa().
%% I explained 'restart' and {'restart', MiliSeconds}
%%
%% 'stop': director will crash with crash reason of child process.
%%
%% {'stop', Reason}: director exactly will crash with reason Reason.
%%
%% 'wait': director will not restart process,
%% but you can restart it using director:restart_child/2 and you can use supervisor:restart_child/2 too.
%%
%% fun/2: director will execute fun with 2 arguments.
%% First argument is crash reason for process and second argument is restart count for process.
%% Fun should return one of previous terms.
%% Default plan is:
%% [fun
%% (normal, _RestartCount) ->
%% delete;
%% (shutdown, _RestartCount) ->
%% delete;
%% ({shutdown, _Reason}, _RestartCount) ->
%% delete;
%% (_Reason, _RestartCount) ->
%% restart
%% end]
-type plan() :: [plan_element()] | [].
-type plan_element() :: 'restart'
| {'restart', pos_integer()}
| 'wait'
| 'stop'
| 'delete'
| {'stop', Reason::term()}
| fun((Reason::term()
,RestartCount::pos_integer()) ->
'restart'
| {'restart', pos_integer()}
| 'wait'
| 'stop'
| 'delete'
| {'stop', Reason::term()}).
%% How much time you want to run plan?
%% Default value is 1.
%% What if i want to restart my process 500 times?
%% Do i need a list with 500 'restart's?
%% You just need plan ['restart'] and 'count' 500 :)
-type count() :: 'infinity' | non_neg_integer().
%% How much time director should wait for process termination?
%% 0 means brutal kill and director will kill your process using erlang:exit(YourProcess, kill).
%% For workers default value is 1000 mili-seconds and for supervisors default value is 'infinity'.
-type terminate_timeout() :: 'infinity' | non_neg_integer().
%% default is 'worker'
-type type() :: 'worker' | 'supervisor'.
%% Default is first element of 'start' (process start module)
-type modules() :: [module()] | 'dynamic'.
%% :)
%% Default value is 'false'
%% I'll explan it later.
-type append() :: boolean().
```
Edit `foo` module:
```erlang
-module(foo).
-export([start_link/0
,init/1]).
start_link() ->
director:start_link({local, foo_sup}, ?MODULE, undefined).
init(_InitArg) ->
Childspec = #{id => bar_id
,plan => [wait]
,start => {bar,start_link}
,count => 1
,terminate_timeout => 2000},
{ok, [Childspec]}.
```
Go to the Erlang shell again:
```erlang
1> c(foo).
{ok,foo}
2> foo:start_link().
{ok,<0.121.0>}
3> director:get_childspec(foo_sup, bar_id).
{ok,#{append => false,count => 1,id => bar_id,
modules => [bar],
plan => [wait],
start => {bar,start_link,[]},
terminate_timeout => 2000,type => worker}}
4> {ok, Pid} = director:get_pid(foo_sup, bar_id), erlang:exit(Pid, kill).
true
5> director:which_children(foo_sup).
[{bar_id,undefined,worker,[bar]}] %% undefined
6> director:count_children(foo_sup).
[{specs,1},{active,0},{supervisors,0},{workers,1}]
7> director:get_plan(foo_sup, bar_id).
{ok,[wait]}
%% I can change process plan
%% I killed process one time.
%% If i kill it again, entire supervisor will crash with reason {reached_max_restart_plan... because 'count' is 1
%% But after changing plan, its counter will restart from 0.
8> director:change_plan(foo_sup, bar_id, [restart]).
ok
9> director:get_childspec(foo_sup, bar_id).
{ok,#{append => false,count => 1,id => bar_id,
modules => [bar],
plan => [restart], %% here
start => {bar,start_link,[]},
terminate_timeout => 2000,type => worker}}
10> director:get_pids(foo_sup).
[]
11> director:restart_child(foo_sup, bar_id).
{ok,<0.111.0>}
12> {ok, Pid2} = director:get_pid(foo_sup, bar_id), erlang:exit(Pid2, kill).
true
13> director:get_pid(foo_sup, bar_id).
{ok,<0.113.0>}
14> %% Hold on
```
##### Finally what the `append` key is?
actually always we have one `DefaultChildspec`.
```erlang
14> director:get_default_childspec(foo_sup).
#{count => 0,modules => [],plan => [],terminate_timeout => 0}
15>
```
`DefaultChildspec` is like normal childspecs except that it can't accept `id` and `append` keys.
If i change `append` value to `true` in my `Childspec`:
My `terminate_timeout` will be added to `terminate_timeout` of `DefaultChildspec`.
My `count` will be added to `count` of `DefaultChildspec`.
My `modules` will be added to `modules` of `DefaultChildspec`.
My `plan` will be added to `plan` of `DefaultChildspec`.
And if i have `start` key with value `{ModX, FuncX, ArgsX}` in `DefaultChildspec` and `start` key with value `{ModY, FunY, ArgsY}` in `Childspec`, final value will be `{ModY, FuncY, ArgsX ++ ArgsY}`.
And finally if i have `start` key with value `{Mod, Func, Args}` in `DefaultChildspec`, `start` key in `Childspec` is optional for me.
You can return your own `DefaultChildspec` as third element of tuple in `init/1`.
Edit `foo.erl`:
```erlang
-module(foo).
-behaviour(director). %% Yes, this is a behaviour
-export([start_link/0
,init/1]).
start_link() ->
director:start_link({local, foo_sup}, ?MODULE, null).
init(_InitArg) ->
Childspec = #{id => bar_id
,plan => [wait]
,start => {bar,start_link}
,count => 1
,terminate_timeout => 2000},
DefaultChildspec = #{start => {bar, start_link}
,terminate_timeout => 1000
,plan => [restart]
,count => 5},
{ok, [Childspec], DefaultChildspec}.
```
Restart the shell:
```erlang
1> c(foo).
{ok,foo}
2> foo:start_link().
{ok,<0.111.0>}
3> director:get_pids(foo_sup).
[{bar_id,<0.112.0>}]
4> director:get_default_childspec(foo_sup).
#{count => 5,
plan => [restart],
start => {bar,start_link,[]},
terminate_timeout => 1000}
5> Childspec1 = #{id => 1, append => true},
%% Default 'plan' is [Fun], so 'plan' will be [restart] ++ [Fun] or [restart, Fun].
%% Default 'count' is 1, so 'count' will be 1 + 5 or 6.
%% Args in above Childspec is [], so Args will be [] ++ [] or [].
%% Default 'terminate_timeout' is 1000, so 'terminate_timeout' will be 1000 + 1000 or 2000.
%% Default 'modules' is [bar], so 'modules' will be [bar] ++ [] or [bar].
5> director:start_child(foo_sup, Childspec1).
{ok,<0.116.0>}
%% Test
6> director:get_childspec(foo_sup, 1).
{ok,#{append => true,
count => 6,
id => 1,
modules => [bar],
plan => [restart,#Fun<director.default_plan_element_fun.2>],
start => {bar,start_link,[]},
terminate_timeout => 2000,
type => worker}}
7> director:get_pids(foo_sup).
[{bar_id,<0.112.0>},{1,<0.116.0>}]
%% I want to have 9 more children like that
8> [director:start_child(foo_sup
,#{id => Count, append => true})
|| Count <- lists:seq(2, 10)].
[{ok,<0.126.0>},
{ok,<0.127.0>},
{ok,<0.128.0>},
{ok,<0.129.0>},
{ok,<0.130.0>},
{ok,<0.131.0>},
{ok,<0.132.0>},
{ok,<0.133.0>},
{ok,<0.134.0>}]
10> director:count_children(foo_sup).
[{specs,11},{active,11},{supervisors,0},{workers,11}]
11>
```
You can change `defaultChildspec` dynamically using `change_default_childspec/2` !
And you can change `Childspec` of children dynamically too and set their `append` to `true` !
But with changing them in different parts of code, you will make [**spaghetti code**](https://en.wikipedia.org/wiki/Spaghetti_code)
### Can i debug director?
Yessssss, **diorector** has its own debug and accepts standard `sys:dbg_opt/0`.
**director** sends valid logs to `sasl` and `error_logger` in different states too.
```erlang
1> Name = {local, dname},
Mod = foo,
InitArg = undefined,
DbgOpts = [trace],
Opts = [{debug, DbgOpts}].
[{debug,[trace]}]
2> director:start_link(Name, Mod, InitArg, Opts).
{ok,<0.106.0>}
3>
3> director:count_children(dname).
*DBG* director "dname" got request "count_children" from "<0.102.0>"
*DBG* director "dname" sent "[{specs,1},
{active,1},
{supervisors,0},
{workers,1}]" to "<0.102.0>"
[{specs,1},{active,1},{supervisors,0},{workers,1}]
4> director:change_plan(dname, bar_id, [{restart, 5000}]).
*DBG* director "dname" got request "{change_plan,bar_id,[{restart,5000}]}" from "<0.102.0>"
*DBG* director "dname" sent "ok" to "<0.102.0>"
ok
5> {ok, Pid} = director:get_pid(dname, bar_id).
*DBG* director "dname" got request "{get_pid,bar_id}" from "<0.102.0>"
*DBG* director "dname" sent "{ok,<0.107.0>}" to "<0.102.0>"
{ok,<0.107.0>}
%% Start SASL
6> application:start(sasl).
ok
... %% Log about starting SASL
7> erlang:exit(Pid, kill).
*DBG* director "dname" got exit signal for pid "<0.107.0>" with reason "killed"
*DBG* director "dname" is running plan "{restart, 5000}" for id "bar_id"
true
=SUPERVISOR REPORT==== 4-May-2017::12:37:41 ===
Supervisor: dname
Context: child_terminated
Reason: killed
Offender: [{id,bar_id},
{pid,<0.107.0>},
{plan,[{restart,5000}]},
{count,1},
{count2,0},
{restart_count,0},
{mfargs,{bar,start_link,[]}},
{plan_element_index,1},
{plan_length,1},
{timer_reference,undefined},
{terminate_timeout,2000},
{extra,undefined},
{modules,[bar]},
{type,worker},
{append,false}]
8>
%% After 5000 mili-seconds
*DBG* director "dname" got restart event for child-id "bar_id" with timer reference "#Ref<0.0.1.176>"
=PROGRESS REPORT==== 4-May-2017::12:37:46 ===
supervisor: dname
started: [{id,bar_id},
{pid,<0.122.0>},
{plan,[{restart,5000}]},
{count,1},
{count2,1},
{restart_count,1},
{mfargs,{bar,start_link,[]}},
{plan_element_index,1},
{plan_length,1},
{timer_reference,#Ref<0.0.1.176>},
{terminate_timeout,2000},
{extra,undefined},
{modules,[bar]},
{type,worker},
{append,false}]
8>
```
## Warnings
* If you have plans like:
```erlang
Childspec1 = #{id => foo
,start => {bar, baz}
,plan => [restart,restart,delete,wait,wait, {restart, 4000}]
,count => infinity}.
Childspec2 = #{id => foo
,start => {bar, baz}
,plan => [restart,restart,stop,wait, {restart, 20000}, restart]
,count => infinity}.
Childspec3 = #{id => foo
,start => {bar, baz}
,plan => [restart,restart,stop,wait, {restart, 20000}, restart]
,count => 0}.
Childspec4 = #{id => foo
,start => {bar, baz}
,plan => []
,count => infinity}.
```
The rest of `delete` element in `Childspec1` and the rest of `stop` element in `Childspec2` will never evaluate!
In `Childspec3` you want to run your plan 0 times!
In `ChildSpec4` you have not any plan to run `infinity` times!
* When you upgrade a release using `release_handler`, `release_handler` calls `supervisor:get_callback_module/1` for fetching its callback module.
In OTP<19 `get_callback_module/1` uses supervisor internal state record for giving its callback module. Our **director** does not know about supervisor internal state record, then `supervisor:get_callback_module/1` does not work with **director**s.
Good news is that in OTP>=19 `supervisor:get_callback_module/1` works perfectly with **director**s :).
```erlang
1> foo:start_link().
{ok,<0.105.0>}
2> supervisor:get_callback_module(foo_sup).
foo
3>
```
### License
`BSD 3-Clause`
### Links
[**Github**](https://github.com/Pouriya-Jahanbakhsh/director)
This documentation is availible in [http://docs.codefather.org/director](http://docs.codefather.org/director) too.