# BencheeAsync
A `Benchee` plugin for benchmarking the multi-process performance of async work.
This plugin allows optimization of systems that are spread out over multiple processes. Benchee only benchmarks a single function within a single executing process and cannot track work performed across processes. **This plugin measures units of async work done, allowing us to optimize our async pipelines.**
The goal of this library is to **approximately** track units of async work done and the rate of completion.
## Installation
```elixir
def deps do
  [
    # benchee is used internally
    {:benchee, "~> 1.0", only: [:dev, :test]},
    {:benchee_async, "~> 0.1.0", only: [:dev, :test]}
  ]
end
```
## Usage
The following **must be configured**:
1. Start the `BencheeAsync.Reporter` GenServer.
2. Benchmark functions must call `BencheeAsync.Reporter.record/0` to record a unit of work completed.
3. Set the `extended_statistics: true` option for `Benchee.Formatters.Console` to view the units of work done.
### Example
This shows an example of running Benchee from within an ExUnit test suite.
```elixir
defmodule MyAppTest do
  use ExUnit.Case, async: false

  test "measure async work!" do
    # start the reporter process
    start_supervised!(BencheeAsync.Reporter)

    # use BencheeAsync instead of Benchee
    BencheeAsync.run(
      %{
        "case_100_ms" => fn ->
          Task.start(fn ->
            :timer.sleep(100)
            BencheeAsync.Reporter.record()
          end)

          :timer.sleep(2500)
        end,
        "case_1000_ms" => fn ->
          Task.start(fn ->
            :timer.sleep(1000)
            BencheeAsync.Reporter.record()
          end)

          :timer.sleep(1500)
        end
      },
      time: 1,
      warmup: 3,
      # use extended_statistics to view units of work done
      formatters: [{Benchee.Formatters.Console, extended_statistics: true}]
    )
  end
end
```
The resulting console output will be as follows:
```
Operating System: macOS
CPU Information: Apple M1 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.15.5
Erlang 26.1.2
Benchmark suite executing with the following configuration:
warmup: 3 s
time: 1 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 8 s
Benchmarking case_1000_ms ...
Benchmarking case_100_ms ...
Name                ips        average  deviation         median         99th %
case_100_ms        9.92        0.101 s     ±0.20%        0.101 s        0.101 s
case_1000_ms       1.00         1.00 s     ±0.04%         1.00 s         1.00 s

Comparison:
case_100_ms        9.92
case_1000_ms       1.00 - 9.93x slower +0.90 s

Extended statistics:
Name                 minimum        maximum    sample size           mode
case_100_ms          0.101 s        0.101 s              3           None
case_1000_ms          1.00 s         1.00 s              1           None
```
Interpretation differences from `Benchee` are as follows:
- `ips`: The maximum iterations per second of the async process(es) if the async logic were executed repeatedly in isolation.
- `average`, `deviation`, `median`, `99th %`: Statistics of the execution time between reported units of work.
- `sample size`: The number of reported units of work, which corresponds to the number of `BencheeAsync.Reporter.record/0` calls.
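As in plain Benchee, `ips` is one second divided by the `average` duration, and the numbers in the example output above are consistent with that. A small sketch for cross-checking the two columns by hand; `IpsCheck` is a hypothetical helper, not part of BencheeAsync:

```elixir
defmodule IpsCheck do
  @moduledoc """
  Hypothetical helper: recomputes the `ips` column from the `average` column,
  using Benchee's definition of ips (one second divided by the average).
  """

  @doc "Iterations per second for an average duration given in seconds."
  def ips(average_seconds) when average_seconds > 0, do: 1 / average_seconds
end

# case_100_ms: an average of 0.101 s between reports gives roughly 9.9 units per second
IpsCheck.ips(0.101) |> Float.round(2)
#=> 9.9
```

The console prints 9.92 rather than 9.90 because the displayed average of 0.101 s is itself rounded.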
### Usage with Inputs
Inputs also work, with no additional configuration needed.
```
Operating System: macOS
CPU Information: Apple M1 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.15.5
Erlang 26.1.2
Benchmark suite executing with the following configuration:
warmup: 0 ns
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: Bigger, Medium, Small
Estimated total run time: 18 s
Benchmarking case_faster with input Bigger ...
Benchmarking case_faster with input Medium ...
Benchmarking case_faster with input Small ...
Benchmarking case_slower with input Bigger ...
Benchmarking case_slower with input Medium ...
Benchmarking case_slower with input Small ...
##### With input Bigger #####
Name                 ips       average  deviation        median        99th %
case_faster       1.08 M    0.00092 ms    ±36.87%    0.00092 ms    0.00154 ms
case_slower    0.00001 M      75.90 ms     ±0.22%      75.94 ms      76.03 ms

Comparison:
case_faster       1.08 M
case_slower    0.00001 M - 82215.44x slower +75.90 ms

Extended statistics:
Name                minimum       maximum   sample size          mode
case_faster      0.00013 ms    0.00154 ms            39    0.00088 ms
case_slower        75.27 ms      76.03 ms            20          None
##### With input Medium #####
Name                 ips       average  deviation        median        99th %
case_faster     982.25 K    0.00102 ms   ±151.32%    0.00083 ms     0.0123 ms
case_slower     0.0196 K      51.04 ms     ±0.76%      51.00 ms      52.96 ms

Comparison:
case_faster     982.25 K
case_slower     0.0196 K - 50138.38x slower +51.04 ms

Extended statistics:
Name                minimum       maximum   sample size          mode
case_faster      0.00013 ms     0.0123 ms            58    0.00075 ms
case_slower        50.49 ms      52.96 ms            30          None
##### With input Small #####
Name                 ips       average  deviation        median        99th %
case_faster       1.68 M    0.00059 ms    ±38.29%    0.00058 ms    0.00108 ms
case_slower    0.00009 M      11.00 ms     ±1.08%      11.01 ms      11.61 ms

Comparison:
case_faster       1.68 M
case_slower    0.00009 M - 18489.07x slower +11.00 ms

Extended statistics:
Name                minimum       maximum   sample size          mode
case_faster      0.00013 ms    0.00275 ms           272    0.00063 ms
case_slower        10.44 ms      11.69 ms           143    11.02 ms, 11.04 ms, 11.01 ms
```
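The job definitions behind the output above are not shown. A hypothetical suite with inputs, mirroring the earlier ExUnit example, might look like the following; the job name, input values, and sleep durations are made up for illustration. It assumes `BencheeAsync.run/2` passes Benchee's `inputs:` option through unchanged, so each job function receives the current input as its only argument, as in plain Benchee.

```elixir
defmodule MyAppInputsTest do
  use ExUnit.Case, async: false

  test "measure async work with inputs" do
    start_supervised!(BencheeAsync.Reporter)

    BencheeAsync.run(
      %{
        # hypothetical job: the input is the simulated async latency in ms
        "case_slower" => fn delay_ms ->
          Task.start(fn ->
            :timer.sleep(delay_ms)
            BencheeAsync.Reporter.record()
          end)

          # keep the calling process busy for longer than the async work
          :timer.sleep(delay_ms * 2)
        end
      },
      # each input is benchmarked as its own scenario
      inputs: %{"Small" => 10, "Medium" => 50, "Bigger" => 75},
      warmup: 0,
      time: 3,
      formatters: [{Benchee.Formatters.Console, extended_statistics: true}]
    )
  end
end
```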
### Usage in a Real World Application
Consider mocking your async functions with [`:meck`](https://hexdocs.pm/meck/meck.html) or [`Mimic`](https://hexdocs.pm/mimic/Mimic.html). The mocked function is where you call `BencheeAsync.Reporter.record/0`.
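A sketch of this approach with `:meck`, assuming `:meck` has been added as a test dependency. `MyApp.Pipeline.enqueue/1` and `MyApp.Notifier.deliver/1` are hypothetical stand-ins for your own code: the pipeline hands work to other processes, and `deliver/1` is the last step of each unit of work. `:meck` replaces the module's code for the whole VM, so calls made from any process, including the async ones, hit the mock; this is also why the test runs with `async: false`.

```elixir
defmodule MyApp.PipelineBenchTest do
  use ExUnit.Case, async: false

  setup do
    start_supervised!(BencheeAsync.Reporter)

    # Hypothetical final step of the pipeline: record a unit of work instead
    # of performing the real side effect. :passthrough keeps other functions.
    :meck.new(MyApp.Notifier, [:passthrough])

    :meck.expect(MyApp.Notifier, :deliver, fn _payload ->
      BencheeAsync.Reporter.record()
      :ok
    end)

    on_exit(fn -> :meck.unload(MyApp.Notifier) end)
    :ok
  end

  test "pipeline throughput" do
    BencheeAsync.run(
      %{
        # hypothetical entry point that eventually calls MyApp.Notifier.deliver/1
        "pipeline" => fn -> MyApp.Pipeline.enqueue(%{id: 1}) end
      },
      time: 5,
      formatters: [{Benchee.Formatters.Console, extended_statistics: true}]
    )
  end
end
```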
### Internals and Behavior
This library injects hooks into `Benchee.run/1` to benchmark async work.
`BencheeAsync` uses only `Benchee`'s public APIs to inject these hooks. All user-provided hooks are executed **after** the injected hooks.
Global hooks are injected to start tracking timings after warmup and after each scenario.
To let `BencheeAsync.Reporter.record/0` work without a scenario name or input name, the input is used in a local `:before_scenario` hook to identify the scenario-input combination being benchmarked. The input is hashed with `:erlang.phash2/2` for internal referencing.
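The actual hook is not shown here, so the following is only an illustration of the idea under stated assumptions: `ScenarioTagging`, `tag_jobs/2`, and the `Agent` used as storage are hypothetical. It wraps plain job functions in Benchee's `{function, hooks}` tuple form and attaches a local `:before_scenario` hook that stores the job name with a `:erlang.phash2/2` hash of the input, returning the input unchanged as Benchee requires.

```elixir
defmodule ScenarioTagging do
  @moduledoc """
  Hypothetical illustration, not BencheeAsync's actual code: tag each job with a
  local `:before_scenario` hook that remembers the job/input pair about to run.
  """

  # 2^32, the largest range accepted by :erlang.phash2/2
  @hash_range 4_294_967_296

  # Only plain functions are handled; jobs that already carry their own
  # hooks would need those merged in (user hooks run after injected ones).
  def tag_jobs(jobs, agent) when is_map(jobs) do
    Map.new(jobs, fn {name, fun} when is_function(fun) ->
      hook = fn input ->
        # store the job name together with a hash of the current input
        Agent.update(agent, fn _ -> {name, :erlang.phash2(input, @hash_range)} end)
        # before_scenario must return the (possibly modified) input
        input
      end

      {name, {fun, before_scenario: hook}}
    end)
  end
end

{:ok, agent} = Agent.start_link(fn -> nil end)
jobs = ScenarioTagging.tag_jobs(%{"case_slower" => fn input -> input end}, agent)
{_fun, before_scenario: hook} = jobs["case_slower"]
hook.("Small")
Agent.get(agent, & &1)
```

Hashing keeps the stored key small and fixed-size regardless of how large the benchmark input is.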
### Limitations
Benchee's `memory_time` and `reduction_time` options extend the total execution time, so the sample size will include units of work recorded beyond the configured `time` value.