README.org

#+TITLE: Lee
[[https://travis-ci.org/k32/Lee.svg?branch=master]]

* User stories

- As a power user I want to configure tools without looking into their
  code. I want a useful error message instead of a BEAM dump when I
  make an error in the config. I want documentation about all
  configurable parameters, their purpose and type.

- As a software designer I want to focus on the business logic instead
  of dealing with the boring configuration-related stuff. I want to
  have a =?magic:get(Key)= function that returns a value that is
  guaranteed safe.

- As a designer I want to work with native Erlang data types

There are a few approaches to this conflict:

[[file:doc/images/explanation.png]]

This library /does/ provide =?magic:get/1= function. The below
document explains how. (If you just want to check what the API looks
like, skip to [[#gathering-it-all-together][Gathering it all together]])

* Introduction

/Lee/ helps creating type-safe, self-documenting configuration for
Erlang applications. It is basically a data modeling DSL, vaguely
inspired by [[https://tools.ietf.org/html/rfc7950][YANG]], however scaled down /a lot/.

Software configuration is a solved problem. The solution is to gather
all information about the user-facing commands and parameters in one
place called /data model/ and generate all kinds of code and
documentation from it, instead of spending a lot of time trying to
keep everything in sync and inevitably failing in the end.

This approach has been widely used in telecom where the number of
configurable parameters per device can easily reach
thousands. Unfortunately the existing solutions are extremely heavy
and difficult to deal with, also almost none of them is open
source. One doesn't want to mess with YANG compilers and proprietary
libraries for a mere small tool, and it's understandable. /Lee/
attempts to implement a /reasonably useful/ data modeling embedded
DSL, some bare-bones libraries for CLI and config file parsing,
together with the model validation routines in under 3000 LOC or
so. And be fully Erlang-native too.

The below document explains Lee from the bottom-up.

* Type reflections

As advertized, Lee configuration is fully aware of the Dialyzer
types. In order to make use of them to the runtime, Lee relies on
[[https://github.com/k32/typerefl][typerefl]] library.

* Defining models

/Model/ is the central concept in Lee. Generally speaking, Lee model
can be seen as a schema of the configuration. The model is a tree-like
structure where each node represent some entity.

Lee models are made of two basic building blocks: /namespaces/ and
/mnodes/. Namespace is a regular Erlang map where values are either
mnodes or other namespaces.

Mnode is a tuple that looks like this:

#+BEGIN_SRC erlang
{ MetaTypes      :: [MetaType :: atom()]
, MetaParameters :: #{atom() => term()}
, Children       :: lee:namespace()
}
#+END_SRC

or this:

#+BEGIN_SRC erlang
{ MetaTypes      :: [atom()]
, MetaParameters :: #{atom() => term()}
}
#+END_SRC

(The latter is just a shortcut where =Children= is an empty map.)

=MetaTypes= is a list of /metatype/ IDs which are applicable to the
mnode. Example metatypes:

 - =value= metatype means the mnode denotes some runtime data

 - =list= metatype denotes that mnode is a container for some data

=MetaParameters= field contains data relevant to the metatypes
assigned to the mnode. There are no strict rules about it. For
example, =value= metatype requires =type= metaparameter and optional
=default= one.

Finally, =Children= field allows nesting of models.

Any mnode can be identified by /model key/. Model key is a list of
namespace keys or =$children= atoms denoting mnode children.

The following example shows how to define a /Lee model module/:

#+BEGIN_SRC erlang
-spec model() -> lee:lee_module().
model() ->
    #{ foo =>
           {[value],
            #{ type => boolean()
             , oneliner => "This value controls fooing"
             }}
     , bar =>
           #{ baz =>
                  {[value],
                   #{ type => integer()
                    , oneliner => "This value controls bazing"
                    , default => 42
                    }}
            , quux =>
                  {[value],
                   #{ type => nonempty_list(atom())
                    , oneliner => "This value controls quuxing"
                    , default => [foo]
                    }}
            }
     }.
#+END_SRC

=[foo]=, =[bar, baz]= and =[bar, quux]= are valid /model keys/ in the
above model.

Model modules have a nice property: they are /composable/ as long as
their keys do not clash. One or many model modules make up a
/model/. Note: technically there is absolutely no difference between
/model module/ and /model/. The latter term denotes something that is
complete from the application point of view. Therefore in the rest of
the document both terms are used interchangeably.

Model modules should be merged and compiled to a machine-friendly form
before use. =lee_model:compile/2= function does that. Note that it
takes two arguments, both are lists of Lee models. The second argument
is application model (or *the* model), and the first one is a
/metamodel/, where all metatypes used in the application model are
defined.

* Data storage

Now what about actual data described by the models? Lee provides an
abstraction called =lee_storage= to keep track of it. Essentially any
key-value storage (from proplist to a mnesia table) can serve as a
=lee_storage=. There are a few prepackaged implementations:

 - =lee_map_storage= the most basic one storing data in a regular map
 - =lee_mnesia_storage= uses mnesia as storage, reads are transactional
 - =lee_dirty_mnesia_storage= same, but reads are dirty

Storage contents can be modified via /patches/. The following example
illustrates how to create a new storage and populate it with some
data:

#+BEGIN_SRC erlang
-spec data() -> lee:data().
data() ->
    %% Create am empty storage:
    Data0 = lee_storage:new(lee_map_storage),
    %% Define a patch:
    Patch = [ %% Set some values:
              {set, [foo], false}
            , {set, [bar, quux], [quux]}
              %% Delete a value (if present):
            , {rm, [bar, baz]}
            ],
    %% Apply the patch:
    lee_storage:patch(Data0, Patch).
#+END_SRC

** Data validation
It is possible to verify the entire storage of data against a model:

#+BEGIN_SRC erlang
main() ->
    {ok, Model} = lee_model:compile( [lee:base_metamodel()]
                                   , [lee:base_model(), model()]
                                   ),
    Data = data(),
    {ok, _Warnings} = lee:validate(Model, Data),
    ...
#+END_SRC

Successful validation ensures the following properties of =Data=:

 - All values described in the model are either present in =Data=, or
   =Model= declares their default values
 - All values present in =Data= have correct types

** Getting the data

Now when we know that data is complete and type-safe, getting values
becomes extremely simple:

#+BEGIN_SRC erlang
    [quux] = lee:get(Model, Data, [bar, quux]),
    false = lee:get(Model, Data, [foo]),
#+END_SRC

Note that =lee:get= returns plain values rather than something like
={ok, Value} | undefined=. This is perfectly safe, as long as the data
is validated using =lee:validate=.

Complete code of the example can be found [[file:doc/example/example_model.erl][here]].

* Creating patches

Creating patches can be model-driven too. Lee comes with a few modules
for reading data:

 - =lee_cli= read data by parsing CLI arguments
 - =lee_consult= read data from files via =file:consult=
 - =lee_os_env= read data from environment variables

In order to utilize these modules one should extend the model with new
metatypes and metaparameters:

#+BEGIN_SRC erlang
-spec model() -> lee:lee_module().
model() ->
    #{ foo =>
           {[value, cli_param], %% This value is read from CLI
            #{ type => boolean()
             , oneliner => "This value controls fooing"
             , cli_short => $f
             , cli_operand => "foo"
             }}
     , bar =>
           #{ baz =>
                  {[value, os_env], %% This value is read from environment variable
                   #{ type => integer()
                    , oneliner => "This value controls bazing"
                    , default => 42
                    , os_env => "BAZ"
                    }}
            , quux =>
                  {[value, cli_param, os_env],  %% This value is read from both CLI and environment
                   #{ type => nonempty_list(atom())
                    , oneliner => "This value controls quuxing"
                    , default => [foo]
                    , cli_operand => "quux"
                    , os_env => "QUUX"
                    }}
            }
     }.
#+END_SRC

Reading data is done like this:

#+BEGIN_SRC erlang
%% Test data:
-spec data(lee:model(), [string()]) -> lee:data().
data(Model, CliArgs) ->
    %% Create an empty storage:
    Data0 = lee_storage:new(lee_map_storage),
    %% Read environment variables:
    Data1 = lee_os_env:read_to(Model, Data0),
    %% Read CLI arguments and return the resulting data:
    lee_cli:read_to(Model, CliArgs, Data1).
#+END_SRC

Full code of the example can be found [[file:doc/example/example_model2.erl][here]].

* Extracting documentation from the model

It is possible to extract user manuals from a Lee model. First, one
has to annotate the model with =oneliner= and =doc= metaparameters,
like in the following example:

#+BEGIN_SRC erlang
#{ foo =>
     {[value],
      #{ oneliner => "This value controls fooing"
       , type     => integer()
       , default  => 41
       , doc      => "<para>This is a long and elaborate description of
                      the parameter using docbook markup.</para>
                      <para>It just goes on and on...</para>"
       }}
 }.
#+END_SRC

=oneliner= is a one sentence summary. =doc= is a more elaborate
description formatted using [[https://docbook.org/][DocBook]] markup. Also element with key
=['$doc_root']= should be added to the model, it contains general
information about the application:

#+BEGIN_SRC erlang
#{ '$doc_root' =>
     {[doc_root],
       #{ oneliner  => "This is a test model for doc extraction"
        , app_name  => "Lee Test Application"
        , doc       => "<para>Long and elaborate description of this
                        application</para>"
          %% Name of executable:
        , prog_name => "lee_test"
        }}
 }.
#+END_SRC

Then Lee does the job of assembling an intermediate DocBook file from
the fragments. Finally, [[https://pandoc.org/][pandoc]] is used to transform DocBook to HTML
([[https://k32.github.io/Lee/Lee%20Test%20Application.html][example]]), manpages, texinfo and what not.

Export of documentation is triggered like this:

#+BEGIN_SRC erlang
%% List of metatypes that should be mentioned in the documentation,
%% together with metatype-specific options affecting extraction
MTs = [ os_env
      , cli_param
      , consult
      , {consult, #{ filter      => [foo]
                   , config_name => "foo.conf"
                   }}
      , {consult, #{ filter      => [bar]
                   , config_name => "bar.conf"
                   }}
      , value
      ],
Config = #{ metatypes => MTs
          , run_pandoc => true
          },
lee_doc:make_docs(model(), Config)
#+END_SRC

** Why DocBook

DocBook is not the most popular and concise markup language, however
it was chosen because of the following properties:

 + It's the easiest format to assemble from small fragments
 + It's a supported source format in pandoc
 + It's whitespace insensitive. Given that the docstrings come from
   literals embedded into Erlang code, formatting of the source code
   should not affect the resulting documents. Also it generally
   focuses on structure rather than representation

* Gathering it all together

All Lee APIs that we discussed so far were stateless. Although being
stateless makes Lee extremely flexible, passing state around is hardly
practical for a configuration library. Lee comes with a simple
reference implementation of configuration server, that holds all data
in Mnesia and ensures that the data is always sound by validating
patches. The following example briefly shows how it can be used:

#+BEGIN_SRC erlang
main() ->
    application:ensure_all_started(lee),
    %% Apply patches in a transaction (invalid ones won't be applied).
    ok = lee_server:patch(fun(Model, Data) ->
                             Quux = lee:get(Model, Data, [quux]),
                             ...
                             %% Return a patch:
                             {ok, [ {set, [foo], Val}
                                  , {rm, [bar]}
                                  , ...
                                  ]}
                          end),
    %% Get data by dirty read:
    Foo = lee_server:get_d([foo]),
    %% Or get a consistent snapshot of data in a transaction:
    mnesia:transaction(
      fun() ->
        Foo = lee_server:get([foo]),
        Keys = lee_server:list([foo, ?children]),
        ...
      end),
   ...
#+END_SRC

* TODO Documentation

Note that the model already contains the docstrings which can be
easily transformed to manpages and what not. TBD

* Name?

This library is named after Tsung-Dao Lee, a physicist who predicted
P-symmetry violation together with Chen-Ning Yang.

* Design goals
** Composability

Be a library rather than framework. Don't enforce ways of
working. Some example use cases:

 - Safe and consistent configs. Lee should be able to interwork with
   mnesia-like DBs
 - On the other side configuration of the database itself may use Lee,
   so Lee itself should not rely on transactions after all

** Speed

Tl;dr: getting values from config should be very fast, but updating
and validating config may be slow.

It should be possible to use =lee:get= in hotspots. It means any call
to =lee:get= should be theoretically possible to implement using at
most (N + 1) hashtable lookups (N for the number of configuration
overlays and 1 for the defaults).