# Rx

Drive R from Elixir.

Rx runs your R code in a persistent external `Rscript` process and moves data
back and forth across the boundary: JSON for scalars and vectors, Apache Arrow
IPC (or an optional no-Arrow path) for data frames. Because R lives in a
separate OS process, **R can't crash the BEAM**: a misbehaving model or a
segfaulting package takes down its own process, not your application.

The public API covers init, eval, decode, print, capture mode, plot capture,
optional Kino plot rendering, an R `plotly` → `plotly_ex` handoff, data-frame
conversion with or without Arrow, and Explorer integration. An experimental
embedded native (NIF) backend also exists for opt-in, high-throughput
workflows — see [Experimental native backend](#experimental-native-backend) —
but the external process backend is the default and the one to reach for first.

## Why this exists

Rx was inspired by [Pythonx](https://github.com/livebook-dev/pythonx), which
embeds Python in the BEAM. The goal here is the same in spirit: let Elixir reach
into another language's ecosystem instead of reimplementing it.

To be clear up front — Elixir's own numerical and data tooling is genuinely good
now. [`Nx`](https://hex.pm/packages/nx) handles tensors and gets you to the GPU,
[`Explorer`](https://hex.pm/packages/explorer) gives you fast Polars-backed data
frames, [`Scholar`](https://hex.pm/packages/scholar) covers a growing slice of
classical machine learning, [`Statistics`](https://hex.pm/packages/statistics)
fills in common distributions and descriptive stats, and
[`Tucan`](https://hex.pm/packages/tucan) makes plotting pleasant. For a lot of
work, you don't need R at all, and you shouldn't reach for it reflexively.

But R and CRAN are the product of decades of statisticians shipping code, and
that's a deep well: 20,000+ packages covering survival analysis, mixed-effects
models, econometrics, Bayesian inference, bioinformatics, spatial statistics,
psychometrics, and countless niche methods that simply don't have an Elixir
equivalent yet. When you need `lme4`, `survival`, `forecast`,
or some specialized package your domain depends on, rewriting it is rarely the
right call.

Rx is a bridge for exactly those moments: keep your application, orchestration,
and most of your data work in Elixir, and call out to R for the specific things
R does best.

## What ships

- `Rx.eval/3` — evaluate R source with Elixir-supplied globals; returns a result
  handle plus the R variables the code assigned.
- `Rx.decode/1` — bring R values back into Elixir (scalars, vectors, typed
  `NA`s, named lists, tables).
- `Rx.print/2` — render an R object's console output (fitted models, summaries)
  as text.
- `Rx.encode!/1` — turn an Elixir value into a reusable R-side handle.
- **Capture mode** — collect stdout, messages, and warnings instead of routing
  them to IO.
- `Rx.plot/3` — capture base and `ggplot2` plots as PNG.
- `Rx.Kino` *(optional)* — render captured plots in Livebook.
- `Rx.Plotly` *(optional)* — convert R `plotly` objects to `plotly_ex` figures.
- `Rx.DataFrame` / `Rx.Explorer` / `Rx.decode_arrow/1` — data-frame exchange,
  with or without the R `arrow` package.
- `Rx.renv_init/2` — run a session inside a reproducible `renv` project.

## Installation

```elixir
def deps do
  [
    {:rx, "~> 0.1.0"}
  ]
end
```

Optional integrations need their matching Elixir dependencies:

```elixir
{:explorer, "~> 0.11"}
{:kino, "~> 0.19.0"}
{:plotly_ex, "~> 0.1"}
```

## Requirements

- `Rscript` on your `PATH`.
- R packages:
  - `jsonlite` (**required** — used for all scalar and vector exchange)
  - `arrow` (optional — required only for Arrow/Explorer data-frame exchange)
  - `ggplot2` (optional — required for ggplot examples)
  - `plotly` (optional — required for `Rx.Plotly`)
  - `renv` (optional — required only for `Rx.renv_init/2` workflows)

Rx validates required R packages through the process backend. It fails early if
`jsonlite` is unavailable; `arrow` is checked only when a data-frame exchange API
is used, and `plotly` is checked only when an `Rx.Plotly` API is used.

Linux and macOS are supported. Windows is not currently supported.

## Basic usage

```elixir
:ok = Rx.init()

{result, globals} =
  Rx.eval(
    """
    y <- 10
    x + y
    """,
    %{"x" => 1}
  )

Rx.decode(result)
#=> 11.0

globals["y"]
#=> #Rx.Object<...>
```

`Rx.eval/3` does not reuse the previous call's R local variables automatically.
The backend keeps one persistent `Rscript` process, but each eval runs in a
fresh R environment populated from the `globals` argument. Variables your R code
assigns come back in the returned `globals` map; pass those handles into a later
eval when you want separate calls (or notebook cells) to share R objects.
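
Threading handles through several evals can be wrapped in a small fold. The
sketch below uses a hypothetical `RxCells.run/3` helper (not part of Rx) that
takes the eval function as an argument, so it can be exercised without R; in a
real session you would pass `&Rx.eval(&1, &2)`. It assumes, as described above,
that each call returns `{result, assigned_globals}`.

```elixir
defmodule RxCells do
  @moduledoc """
  Hypothetical helper, not part of Rx: runs R snippets in order and feeds
  every handle returned so far into the next eval, mimicking notebook cells.
  """

  # `eval` must behave like Rx.eval/3 called with two arguments:
  # it receives (code, globals) and returns {result, assigned_globals}.
  def run(cells, eval, initial_globals \\ %{}) do
    Enum.map_reduce(cells, initial_globals, fn code, globals ->
      {result, assigned} = eval.(code, globals)
      # Later assignments shadow earlier ones with the same name.
      {result, Map.merge(globals, assigned)}
    end)
  end
end

# Usage with a live R session (requires Rx and Rscript):
#
#   {[_, _, result], _globals} =
#     RxCells.run(["a <- 1", "b <- a + 1", "b * 2"], &Rx.eval(&1, &2))
#
#   Rx.decode(result)
```

Injecting the eval function keeps the accumulation logic separate from R, and
returning the final globals lets you keep extending the same "session" later.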

Raw scalar globals — `nil`, booleans, finite numbers, and strings — can be
passed directly. To pass a vector, or to reuse an encoded value, use
`Rx.encode!/1` or a `%Rx.Object{}` handle from a previous eval:

```elixir
numbers = Rx.encode!([1, 2, 3])
{total, _} = Rx.eval("sum(numbers)", %{"numbers" => numbers})
Rx.decode(total)
#=> 6
```
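
When globals come from mixed Elixir data, a small wrapper can apply the rule
above mechanically. `RxGlobals.prepare/2` is a hypothetical helper, not part of
Rx: it leaves `nil`, booleans, numbers, strings, and existing `%Rx.Object{}`
handles untouched and routes everything else through an encoder function
(normally `&Rx.encode!/1`). Injecting the encoder keeps the sketch testable
without R.

```elixir
defmodule RxGlobals do
  @moduledoc """
  Hypothetical helper, not part of Rx: prepares a globals map for Rx.eval/3.
  """

  # nil, booleans, numbers, and strings can be passed to Rx.eval/3 directly.
  # BEAM floats cannot be infinite or NaN, so every float is finite.
  def raw_scalar?(nil), do: true
  def raw_scalar?(value) when is_boolean(value), do: true
  def raw_scalar?(value) when is_number(value), do: true
  def raw_scalar?(value) when is_binary(value), do: true
  def raw_scalar?(_value), do: false

  # Leaves raw scalars and existing %Rx.Object{} handles as-is; anything
  # else (lists, maps, ...) is passed through `encode`, e.g. &Rx.encode!/1.
  def prepare(globals, encode) when is_map(globals) and is_function(encode, 1) do
    Map.new(globals, fn {name, value} ->
      if raw_scalar?(value) or is_struct(value, Rx.Object) do
        {name, value}
      else
        {name, encode.(value)}
      end
    end)
  end
end

# Usage with a live R session:
#
#   globals = RxGlobals.prepare(%{"n" => 3, "numbers" => [1, 2, 3]}, &Rx.encode!/1)
#   {total, _} = Rx.eval("sum(numbers) * n", globals)
```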

## Capture mode

```elixir
%Rx.EvalResult{} =
  Rx.eval("print('hello'); warning('careful'); 1", %{}, capture: true)
```

Capture mode collects stdout, messages, and warnings into the result struct
instead of routing them to IO devices.

## Plot capture

`Rx.plot/3` evaluates R source with a temporary PNG graphics device and returns
every PNG page produced. Base plots render directly, and visible `ggplot2` plot
objects returned by top-level expressions are printed automatically.

```elixir
plots =
  Rx.plot(
    """
    plot(1:5, (1:5)^2, type = "b", main = "Rx plot")
    """,
    %{},
    width: 640,
    height: 420
  )

[%Rx.Plot{format: :png, data: png, page: 1} | _] = plots
byte_size(png)
```

`ggplot2` plots work without an explicit `print(p)` call:

```elixir
[ggplot] =
  Rx.plot(
    """
    library(ggplot2)
    ggplot(mtcars, aes(wt, mpg)) + geom_point()
    """,
    %{}
  )
```

With `capture: true`, `Rx.plot/3` returns a `%Rx.PlotResult{}` struct with
`plots`, `stdout`, `messages`, and `warnings` fields instead of a bare list. Plot
options include `width`, `height`, `res`, `pointsize`, `max_pages`, and
`max_bytes`.
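
Outside Livebook, the PNG bytes in each `%Rx.Plot{}` can simply be written to
disk. The `PlotFiles` module below is a hypothetical helper, not part of Rx; it
relies only on the `data` and `page` fields shown above, checks the standard
8-byte PNG signature before writing, and names files after the page number.

```elixir
defmodule PlotFiles do
  @moduledoc """
  Hypothetical helper, not part of Rx: saves captured plots as PNG files.
  Works with %Rx.Plot{} structs or any map with :data and :page keys.
  """

  # Every PNG file starts with the same 8-byte signature.
  def png?(<<0x89, "PNG", 0x0D, 0x0A, 0x1A, 0x0A, _rest::binary>>), do: true
  def png?(_other), do: false

  # Writes each plot to `dir/<prefix>_<page>.png` and returns the paths.
  def write_all(plots, dir, prefix \\ "plot") do
    File.mkdir_p!(dir)

    Enum.map(plots, fn %{data: data, page: page} ->
      unless png?(data), do: raise(ArgumentError, "page #{page} is not PNG data")

      path = Path.join(dir, "#{prefix}_#{page}.png")
      File.write!(path, data)
      path
    end)
  end
end

# Usage with a live R session:
#
#   "plot(1:5)" |> Rx.plot(%{}) |> PlotFiles.write_all("tmp/plots")
```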

## Livebook plot rendering with Kino

`Rx.Kino` is an optional bridge for rendering captured PNG plots in Livebook.
Add Kino only where you render plots:

```elixir
{:rx, "~> 0.1.0"},
{:kino, "~> 0.19.0"}
```

```elixir
[plot] = Rx.plot("plot(1:5)", %{})
Rx.Kino.image(plot)
```

Or capture and render in one call (`:columns` controls the Kino grid only and is
not passed to `Rx.plot/3`):

```elixir
Rx.Kino.plot(
  """
  plot(1:3)
  plot(3:1)
  """,
  %{},
  width: 640,
  height: 420,
  columns: 2
)
```

## R plotly → plotly_ex

The optional `Rx.Plotly` module bridges R `plotly` objects to
[`plotly_ex`](https://hex.pm/packages/plotly_ex) `%Plotly.Figure{}` structs.

```elixir
{:rx, "~> 0.1.0"},
{:plotly_ex, "~> 0.1"},
{:kino, "~> 0.19.0"}   # only if you want Plotly.show/1 in Livebook
```

```r
install.packages("plotly")
```

```elixir
{r_plot, _} =
  Rx.eval(
    """
    plotly::plot_ly(x = c(1, 2, 3), y = c(2, 4, 8), type = "scatter", mode = "lines")
    """,
    %{}
  )

{:ok, fig} = Rx.Plotly.from_r(r_plot)
Plotly.show(fig)
```

Outside Livebook, pass the resulting `%Plotly.Figure{}` to the relevant
`plotly_ex` Phoenix component or serialize it with `Plotly.Figure.to_json/1`.
`Rx.Plotly.json_from_r/1` returns the raw Plotly.js JSON string when you'd
rather work with that directly.

## Data frames without Arrow

`Rx.DataFrame` provides an explicit data-frame conversion path that does not
require the R `arrow` package.

```elixir
{r_df, _} =
  Rx.eval("""
  data.frame(
    x = c(1L, NA_integer_, 3L),
    label = c("a", NA_character_, "c"),
    stringsAsFactors = FALSE
  )
  """, %{})

{:ok, df} = Rx.DataFrame.from_r(r_df, engine: :no_arrow)
df.names
#=> ["x", "label"]
```

Arrow is the default data-frame engine because it is faster for larger frames
when Explorer and the R `arrow` package are available. Use `engine: :no_arrow`
when portability matters or installing R `arrow` is undesirable. The no-Arrow
path supports logical, integer, double, character, and typed `%Rx.NA{}` columns;
it rejects factors, dates, POSIX values, list/matrix columns, custom row names,
and non-finite doubles.
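
If you know the R class of each column (for example from
`vapply(df, function(col) class(col)[1], character(1))` in R), you can decide up
front which engine to request. `DataFrameEngine.from_r_opts/2` is a hypothetical
helper, not part of Rx: it returns the default options (Arrow) when Explorer and
R `arrow` are available, `engine: :no_arrow` when every column class is on the
supported list, and an error naming the offending classes otherwise. It checks
classes only, so custom row names and non-finite doubles still need separate
handling, and it assumes the Arrow engine accepts the richer types, which you
should confirm for your data.

```elixir
defmodule DataFrameEngine do
  @moduledoc """
  Hypothetical helper, not part of Rx: picks Rx.DataFrame.from_r/2 options
  from the first R class of each column.
  """

  # Column classes the no-Arrow path accepts (a double column reports "numeric").
  @no_arrow_classes ~w(logical integer numeric character)

  # `arrow_available?` should be true only when both Explorer and R `arrow`
  # are installed.
  def from_r_opts(column_classes, arrow_available?) do
    unsupported = Enum.reject(column_classes, &(&1 in @no_arrow_classes))

    cond do
      # Arrow is the default engine, so no extra options are needed.
      arrow_available? -> {:ok, []}
      unsupported == [] -> {:ok, [engine: :no_arrow]}
      true -> {:error, {:unsupported_columns, unsupported}}
    end
  end
end

# Usage:
#
#   {:ok, opts} = DataFrameEngine.from_r_opts(["integer", "character"], false)
#   {:ok, df} = Rx.DataFrame.from_r(r_df, opts)
```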

## Arrow IPC data frames

`Rx.decode_arrow/1` requires the R `arrow` package and returns raw Arrow IPC
stream bytes that any Arrow-capable library can read.

```elixir
{df_object, _} = Rx.eval("data.frame(x = 1:3, y = c('a','b','c'))", %{})
{:ok, arrow_ipc_bytes} = Rx.decode_arrow(df_object)
```
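
Because `Rx.decode_arrow/1` hands back raw bytes, a cheap sanity check before
passing them on can catch mix-ups early. In the Arrow IPC streaming format, each
encapsulated message begins with the 4-byte continuation marker `0xFFFFFFFF`
followed by a little-endian 32-bit metadata length, and a marker followed by a
zero length signals end of stream. The `ArrowIpc` module below is a hypothetical
helper, not part of Rx, that inspects only that first header; it does not parse
the schema, and legacy streams written before Arrow 0.15 (which lack the marker)
are reported as not Arrow.

```elixir
defmodule ArrowIpc do
  @moduledoc """
  Hypothetical helper, not part of Rx: checks the header of an Arrow IPC
  stream without decoding it.
  """

  # Continuation marker, then the metadata length of the first message
  # (normally the schema).
  def first_message_length(
        <<0xFF, 0xFF, 0xFF, 0xFF, len::little-signed-integer-size(32), _rest::binary>>
      )
      when len > 0 do
    {:ok, len}
  end

  # A continuation marker followed by a zero length is the end-of-stream marker.
  def first_message_length(<<0xFF, 0xFF, 0xFF, 0xFF, 0::size(32), _rest::binary>>) do
    {:error, :empty_stream}
  end

  def first_message_length(_other), do: {:error, :not_an_arrow_ipc_stream}
end

# Usage with a live R session:
#
#   {:ok, bytes} = Rx.decode_arrow(df_object)
#   {:ok, _schema_len} = ArrowIpc.first_message_length(bytes)
```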

## Explorer.DataFrame integration

The optional `Rx.Explorer` module bridges R data frames and
[`Explorer.DataFrame`](https://hexdocs.pm/explorer). Requires the R `arrow`
package (`install.packages("arrow")`).

```elixir
{:rx, "~> 0.1.0"},
{:explorer, "~> 0.11"}
```

```elixir
{obj, _} = Rx.eval("data.frame(x = 1:3, y = c('a','b','c'))", %{})
{:ok, df} = Rx.Explorer.from_r(obj)
Explorer.DataFrame.n_rows(df)
#=> 3

df = Explorer.DataFrame.new(%{"x" => [1, 2, 3]})
{:ok, r_obj} = Rx.Explorer.to_r(df)
{result, _} = Rx.eval("sum(df$x)", %{"df" => r_obj})
Rx.decode(result)
#=> 6.0
```

## Object printing

Most classed R objects (fitted models, summaries) stay opaque when decoded, but
their R print methods are available through `Rx.print/2`. The one structured
exception is R `table` values, which decode to `%Rx.Table{}`.

```elixir
{model, _} =
  Rx.eval(
    """
    x <- c(1, 2, 3, 4, 5)
    y <- c(2.1, 4.0, 6.2, 7.9, 10.1)
    lm(y ~ x)
    """,
    %{}
  )

%Rx.Object{} = Rx.decode(model)

Rx.print(model)
#=> "\nCall:\nlm(formula = y ~ x)\n..."
```

`Rx.print/2` honors a temporary `width:` (10–10000) and `max_print:` for that
call, then restores the previous R options.

## Supported data

Rx's decode support is intentionally narrow:

- `nil` / R `NULL`
- Booleans / logical
- Integers in R's non-`NA` 32-bit integer range
- Doubles and strings
- Flat homogeneous atomic vectors
- Typed R missing values as `%Rx.NA{}`
- R named lists as maps; unnamed or partially named lists as `%Rx.RList{}`
- R `table` values as `%Rx.Table{}`
- Data frames via `Rx.DataFrame` (no-Arrow) or Arrow IPC (`decode_arrow/1`)
- Everything else stays an opaque `%Rx.Object{}`; use `Rx.print/2` for its
  console-style display

## Reproducible R packages with renv

`renv` is optional. Ordinary `Rx.init/1`, `Rx.eval/3`, `Rx.plot/3`, and the
data-frame APIs do not search for or activate an `renv.lock` file. Use
`Rx.renv_init/2` when a session should run inside a specific `renv` project.

Validate and load an already-restored project without installing packages:

```elixir
:ok = Rx.renv_init("path/to/project")
```

Restore packages explicitly when the project library should be populated from
the lockfile:

```elixir
:ok = Rx.renv_init("path/to/project", restore: true)
```

The first argument can be a project directory containing `renv.lock` or an
explicit lockfile path. Restore writes to the `renv` project library and may use
the configured `renv` cache; Rx does not mutate your global R library directly.
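
Since the argument may be either a project directory or a lockfile path, a
pre-flight check can give a clearer error than a failed init. `RenvPath.lockfile/1`
is a hypothetical helper, not part of Rx; it assumes a directory argument means
`<dir>/renv.lock` and only checks that the file exists, leaving validation of
its contents to `Rx.renv_init/2`.

```elixir
defmodule RenvPath do
  @moduledoc """
  Hypothetical helper, not part of Rx: resolves the lockfile for a project
  directory or an explicit lockfile path.
  """

  def lockfile(path) do
    # Assumption: a directory holds its lockfile as renv.lock.
    candidate = if File.dir?(path), do: Path.join(path, "renv.lock"), else: path

    if File.regular?(candidate),
      do: {:ok, candidate},
      else: {:error, {:no_lockfile, candidate}}
  end
end

# Usage:
#
#   with {:ok, _lockfile} <- RenvPath.lockfile("path/to/project") do
#     Rx.renv_init("path/to/project")
#   end
```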

Changing the `renv` project, lockfile path, lockfile content, resolved project
library, or selected `renv` environment resets the Rscript process and
invalidates previously returned `%Rx.Object{}` handles. Recreate those objects in
the new session before passing them back to R.

Native `renv` activation is not supported. Use the process backend for
reproducible package environments.

## Experimental native backend

Alongside the default external process backend, Rx ships an **experimental,
opt-in** embedded native backend that loads R directly into the BEAM through a
NIF. It exists for high-throughput workflows where the cost of crossing the
process boundary dominates — and that cost can be substantial.

In a 100,000-row regression benchmark (build a data frame, transfer it to R, fit
`stats::lm`, extract the summary and printed output, capture a plot), the native
backend ran the end-to-end path in roughly **0.41 s versus ~14 s** for the
process backend — about **34× faster overall**. The boundary crossings dominated:
data transfer was ~200× faster and summary extraction ~270× faster, while the raw
model fit itself was a wash.

The trade-off is real: because embedded R shares the BEAM's address space, a
crash in R *can* take down the BEAM. That's why this backend is opt-in and not
production-hardened — the external process backend remains the safe default.

### Building

The native backend is not built or loaded by default. To build it, set
**exactly one** build gate; never set both, since both implementations produce
the same `priv/rx_nif.so`:

- `RX_BUILD_NIF=1` — build the **C** NIF.
- `RX_BUILD_RUST_NIF=1` — build the **Rust** NIF (needs Rust/Cargo from
  `rustup`).

You'll also need R's headers and the embedded R shared library (`libR.so` on
Linux, `libR.dylib` on macOS), plus `make` and a C compiler for the C path.
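
A CI script can enforce the one-gate rule before compiling. `NifGate.resolve/1`
is a hypothetical helper, not part of Rx's build; it assumes a gate counts as set
only when its value is exactly `"1"`, matching the settings shown above.

```elixir
defmodule NifGate do
  @moduledoc """
  Hypothetical pre-build check, not part of Rx: the C and Rust NIFs both
  produce priv/rx_nif.so, so at most one build gate may be set.
  """

  # Assumption: a gate is "set" only when its value is exactly "1".
  def resolve(env \\ System.get_env()) do
    c? = Map.get(env, "RX_BUILD_NIF") == "1"
    rust? = Map.get(env, "RX_BUILD_RUST_NIF") == "1"

    case {c?, rust?} do
      {true, true} -> {:error, :both_gates_set}
      {true, false} -> {:ok, :c}
      {false, true} -> {:ok, :rust}
      {false, false} -> {:ok, :none}
    end
  end
end
```

Matching `{:ok, _} = NifGate.resolve()` at the top of a build script fails fast
with a `MatchError` when both gates are set.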

### Enabling

Either select it via the `RX_BACKEND` environment variable for auto-init:

- `RX_BACKEND=native` — use the native backend strictly (raises if unavailable).
- `RX_BACKEND=native_fallback` — try native, fall back to the process backend
  only on a retryable pre-R init failure.

…or initialize it explicitly:

```elixir
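# Ask R for its home directory. Matching on exit status 0 fails loudly if `R`
# errors; stderr is not merged into the output, so warnings can't pollute it.
{r_home_output, 0} = System.cmd("R", ["RHOME"])
r_home = String.trim(r_home_output)

# The embedded R shared library is libR.so on Linux and libR.dylib on macOS.
lib_r_path =
  Enum.find(
    [Path.join([r_home, "lib", "libR.so"]), Path.join([r_home, "lib", "libR.dylib"])],
    &File.exists?/1
  ) || raise "could not find libR.so or libR.dylib in #{Path.join(r_home, "lib")}"

:ok = Rx.system_init(backend: :native, r_home: r_home, lib_r_path: lib_r_path)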
```

The public `Rx.eval/3`, `Rx.decode/1`, `Rx.print/2`, capture mode, and data-frame
APIs work the same on the native backend. There is no in-BEAM shutdown: once
native R has initialized, restart the BEAM (or the Livebook runtime) to switch
backends or get a clean R state. On macOS/arm64 the native path has been
validated with both the C and Rust gates, including direct Arrow data-frame
exchange, but you should still validate package-heavy native workflows in your
target environment before relying on them.

## Licensing and R

Rx itself is released under the
[MIT License](https://github.com/pklonowski/rx/blob/main/LICENSE). R is
distributed under `GPL-2 | GPL-3`, so keep the backend boundary in mind when you
distribute software built on Rx.

The default process backend starts a user-provided `Rscript` executable and
talks to it over stdin/stdout. Rx does not bundle R, link against R, or ship R
binaries in that mode — which makes it the simplest license boundary for normal
package use. You're still responsible for complying with the licenses of your
installed R runtime and R packages.

The native backend is different: it loads an embedded R shared library into the
BEAM process. Distributors who ship native builds, prebuilt artifacts,
containers, or appliances that include or link R should evaluate R's GPL terms
for that combined distribution. Rx ships no prebuilt native R-linked binaries.
This is engineering guidance, not legal advice.

## Learn more

- [HexDocs](https://hexdocs.pm/rx) — full API reference.

Runnable Livebook notebooks (each installs Rx with `Mix.install/1`):

- [`notebooks/rx_tour.livemd`](https://github.com/pklonowski/rx/blob/main/notebooks/rx_tour.livemd)
  — an API tour covering eval, decode, capture, plots, Arrow, Explorer, and
  Plotly on the default process backend.
- [`notebooks/iris_classification_r_guide.livemd`](https://github.com/pklonowski/rx/blob/main/notebooks/iris_classification_r_guide.livemd)
  — an Iris classification walkthrough that runs on either the process backend
  or the experimental native backend.
- [`notebooks/renv_process_backend_smoke.livemd`](https://github.com/pklonowski/rx/blob/main/notebooks/renv_process_backend_smoke.livemd)
  — builds an isolated `renv` project and runs a small `datasets::airquality`
  analysis through `Rx.renv_init/2` for reproducible package sets.
- [`notebooks/port_arrow_native_benchmark.livemd`](https://github.com/pklonowski/rx/blob/main/notebooks/port_arrow_native_benchmark.livemd)
  — a Benchee head-to-head comparing the process and native backends across
  data transfer, model fitting, summary extraction, and plot capture.