
# SafeNIF

![Elixir CI](https://github.com/probably-not/safe-nif/actions/workflows/pipeline.yaml/badge.svg)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Hex version badge](https://img.shields.io/hexpm/v/safe_nif.svg)](https://hex.pm/packages/safe_nif)

<!-- README START -->

<!-- HEX PACKAGE DESCRIPTION START -->

Wrap your untrusted NIFs so that they can never crash your node.

<!-- HEX PACKAGE DESCRIPTION END -->

## Motivation

NIFs are great, sometimes. When they are written safely, have been in use for a long time, and are trusted by the community, most of the bugs in their underlying source have likely been found already. However, new libraries come out that have not been as battle-tested as you'd like. Some have bugs, and when a NIF has a bug, it can crash your entire BEAM node! Code running inside a NIF does not get the safety guarantees that the BEAM gives to regular Erlang and Elixir code.

But... what if it could?

I recently ran into this issue while using a NIF-based library whose underlying source was crashing sporadically. I don't own the library, nor do I own the underlying C source, so while I can submit PRs to get it fixed, I still need some way to guarantee safety in the meantime. And thus, SafeNIF was born!

SafeNIF allows you to wrap your NIFs to run on an isolated peer node raised on the same machine. If the NIF crashes, only this peer node dies.
The guarantees of the BEAM continue, and you get fault tolerance and crash isolation, even for NIFs, all in native Elixir (with a touch of Erlang's standard library).

## Benchmarks

Benchmarks can be found in the `bench` directory.

As of v0.2.0, SafeNIF uses a lazy pool of reusable peer nodes which scales down when idle.
On cold starts, a startup cost is incurred to initialize the peer node, which can take anywhere from 100ms to over a second,
depending on how much code needs to be loaded onto the peer node. Pooling also costs memory and CPU, since each pooled peer is a separate node running on the same machine.

The benchmarks show that a CLI-based Port is slower than SafeNIF. However, different types of workloads and Ports may yield different results.
For example, a Port that stays alive and answers requests using a protocol over `:stdio` may perform better than a CLI-based Port.

Ports have both upsides and downsides just like NIFs, so your mileage may vary as you work with them.
SafeNIF's main concern is letting consumers wrap any NIF with a single call to `SafeNIF.wrap/1` and immediately get the safety and isolation that the BEAM natively provides.

**The following information was generated by Claude and reviewed by @probably-not. If you find issues in this README, feel free to open a PR to fix them!**

## Usage

### Basic Usage

SafeNIF provides a single function: `SafeNIF.wrap/2`. Pass it an MFA (module, function, arguments) tuple and it runs on an isolated peer node:

```elixir
# Successful execution returns {:ok, result}
{:ok, 6} = SafeNIF.wrap({Kernel, :+, [2, 4]})

# Complex return values work fine
{:ok, %{name: "test"}} = SafeNIF.wrap({Map, :put, [%{}, :name, "test"]})
```

### Wrapping Potentially Dangerous NIFs

The primary use case is wrapping NIFs that might crash:

```elixir
defmodule MyApp.ImageProcessor do
  def safe_process(image_binary) do
    # UntrustedNIF.process/1 might crash the BEAM
    case SafeNIF.wrap({UntrustedNIF, :process, [image_binary]}) do
      {:ok, processed} -> 
        {:ok, processed}
      {:error, :noconnection} -> 
        # The NIF crashed the peer node
        {:error, :nif_crashed}
      {:error, :timeout} -> 
        {:error, :processing_timeout}
      {:error, reason} -> 
        {:error, reason}
    end
  end
end
```

### Timeouts

The default timeout is 5 seconds. Specify a custom timeout as the second argument using `to_timeout/1`:

```elixir
# 30 second timeout for long-running operations
SafeNIF.wrap({HeavyComputation, :run, [data]}, to_timeout(second: 30))

# 2 minute timeout for very long operations
SafeNIF.wrap({BatchJob, :process, [items]}, to_timeout(minute: 2))

# 500ms timeout for quick operations
SafeNIF.wrap({QuickCheck, :validate, [input]}, to_timeout(millisecond: 500))
```

When a timeout occurs, the peer node is killed and `{:error, :timeout}` is returned.
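
For reference, `to_timeout/1` (part of `Kernel` since Elixir 1.17) only converts a keyword list of time units into an integer number of milliseconds. The short sketch below shows the values produced by the calls above; it does not touch SafeNIF at all.

```elixir
# to_timeout/1 requires Elixir 1.17 or later.
thirty_seconds = to_timeout(second: 30)
two_minutes = to_timeout(minute: 2)
half_second = to_timeout(millisecond: 500)

# Units can be combined in a single call.
ninety_seconds = to_timeout(minute: 1, second: 30)

# All four values are plain integers in milliseconds.
IO.inspect({thirty_seconds, two_minutes, half_second, ninety_seconds})
# => {30000, 120000, 500, 90000}
```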

### Anonymous Functions

Anonymous functions are supported but with an important caveat: the module that defines the function must be loadable on the peer node.

```elixir
# Works
SafeNIF.wrap(fn -> 1 + 1 end)

# Works (application modules are loaded on the peer)
SafeNIF.wrap(fn -> MyApp.Worker.do_work() end)

# May fail if defined inside a code path that is not part of the application.
defmodule MyTest do
  def run_test do
    SafeNIF.wrap(fn -> :test_result end)
  end
end
```

For maximum reliability, prefer MFA tuples over anonymous functions.
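
The caveat follows from how the BEAM represents anonymous functions: a closure is a reference to compiled code inside the module that created it, so the peer can only run it if it can load that same module. The sketch below uses `Function.info/2` to show which module a function is tied to. `ClosureInfo` is a hypothetical module used only for illustration.

```elixir
defmodule ClosureInfo do
  # Returns the module whose compiled code backs the given function.
  def defining_module(fun) when is_function(fun) do
    {:module, module} = Function.info(fun, :module)
    module
  end

  # An anonymous function created inside this module.
  def sample_closure, do: fn -> :test_result end
end

# A closure created in ClosureInfo is tied to ClosureInfo.
IO.inspect(ClosureInfo.defining_module(ClosureInfo.sample_closure()))
# => ClosureInfo

# A captured remote function is tied to the module it names.
IO.inspect(ClosureInfo.defining_module(&String.upcase/1))
# => String
```

If the module reported here is not available on the peer, the anonymous function cannot run there, which is why an MFA tuple pointing at an application module is the safer choice.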

### Error Handling

SafeNIF returns tagged tuples to distinguish between successful results and failures:

```elixir
case SafeNIF.wrap({SomeModule, :some_function, [arg]}) do
  {:ok, result} ->
    # Function executed successfully, result is the return value
    handle_success(result)
    
  {:error, :timeout} ->
    # Function exceeded the timeout
    handle_timeout()
    
  {:error, :noconnection} ->
    # Peer node crashed (NIF crash, :erlang.halt, etc.)
    handle_crash()
    
  {:error, :not_alive} ->
    # Current node isn't running in distributed mode
    handle_not_distributed()
    
  {:error, reason} ->
    # Function raised/exited with reason
    handle_error(reason)
end
```

Note that if your wrapped function returns an error tuple, it's wrapped in `{:ok, ...}`:

```elixir
# Function returns {:error, :not_found}
{:ok, {:error, :not_found}} = SafeNIF.wrap({MyModule, :find, [123]})
```

This follows the same convention as `Task.async_stream/5`.
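
If callers would rather deal with a single level of tagging, the nested result can be collapsed with a small helper. This is only a sketch: `MyApp.SafeResult` and `flatten/1` are hypothetical and not part of SafeNIF, and the `{:safe_nif, reason}` tag is an arbitrary choice that keeps SafeNIF-level failures distinguishable from errors returned by the wrapped function.

```elixir
defmodule MyApp.SafeResult do
  # The wrapped function ran and returned its own {:ok, value}.
  def flatten({:ok, {:ok, value}}), do: {:ok, value}

  # The wrapped function ran and returned its own error tuple.
  def flatten({:ok, {:error, reason}}), do: {:error, reason}

  # The wrapped function ran and returned any other value.
  def flatten({:ok, value}), do: {:ok, value}

  # The failure happened at the SafeNIF level (timeout, peer crash, ...).
  def flatten({:error, reason}), do: {:error, {:safe_nif, reason}}
end

IO.inspect(MyApp.SafeResult.flatten({:ok, {:error, :not_found}}))
# => {:error, :not_found}

IO.inspect(MyApp.SafeResult.flatten({:error, :timeout}))
# => {:error, {:safe_nif, :timeout}}
```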

## Requirements

### Distributed Mode

SafeNIF requires your node to be running in distributed mode. If you call `SafeNIF.wrap/2` on a non-distributed node, you'll get `{:error, :not_alive}`.

For development, start IEx with a node name:

```bash
iex --sname myapp -S mix
```

For production releases, ensure your node is started with distribution enabled.
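
To catch a missing distribution setup early, for example at application start, the node can be checked with `Node.alive?/0` before any NIF is wrapped. A minimal sketch, where `MyApp.DistributionCheck` is a hypothetical module name:

```elixir
defmodule MyApp.DistributionCheck do
  # Returns {:ok, node_name} when the node is distributed, and otherwise
  # the same error tuple that SafeNIF.wrap/2 is documented to return.
  def check do
    if Node.alive?() do
      {:ok, Node.self()}
    else
      {:error, :not_alive}
    end
  end
end

IO.inspect(MyApp.DistributionCheck.check())
```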

### Running Tests

Tests require distribution. Add this to your `test/test_helper.exs`:

```elixir
# A host part containing dots (such as an IP address) requires :longnames
{:ok, _} = Node.start(:"test@127.0.0.1", :longnames)
ExUnit.start()
```

Or run tests with:

```bash
elixir --sname test -S mix test
```

## How It Works

When you call `SafeNIF.wrap/2`:

1. A new BEAM node is started as a hidden peer using OTP's `:peer` module
2. All code paths and application configuration are copied to the peer
3. Applications are started on the peer
4. Your function executes on the peer node
5. The result is sent back via Erlang distribution
6. The peer node shuts down (since v0.2.0, peers come from a lazy pool of reusable nodes that scales down when idle)
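
To make these steps concrete, here is a rough sketch of the same idea using `:peer` directly (OTP 25 or later). It is not SafeNIF's implementation. It connects to the peer over standard I/O rather than Erlang distribution so that it also runs on a non-distributed node, and it skips copying code paths and starting applications, so only Erlang/OTP modules such as `:erlang` are callable on the peer. `PeerSketch` is a hypothetical module name.

```elixir
defmodule PeerSketch do
  # Starts a throwaway peer node, runs one MFA on it, then stops the peer.
  def run({module, function, args}, timeout \\ 5_000) do
    # :peer.start/1 returns {:ok, pid} or {:ok, pid, node} depending on
    # the options, so both shapes are accepted here.
    peer =
      case :peer.start(%{connection: :standard_io}) do
        {:ok, pid} -> pid
        {:ok, pid, _node} -> pid
      end

    try do
      {:ok, :peer.call(peer, module, function, args, timeout)}
    catch
      # Covers exits (peer died, timeout) and errors re-raised from the peer.
      kind, reason -> {:error, {kind, reason}}
    after
      stop_quietly(peer)
    end
  end

  # Stopping a peer that is already gone exits; that is fine here.
  defp stop_quietly(peer) do
    :peer.stop(peer)
  catch
    :exit, _reason -> :ok
  end
end

IO.inspect(PeerSketch.run({:erlang, :+, [2, 4]}))
# => {:ok, 6}
```

Calling `PeerSketch.run({:erlang, :halt, [1]})` kills the peer's operating system process, and the call should come back as an `{:error, _}` tuple while the calling node keeps running. That is the isolation property SafeNIF builds on.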

### Hidden Nodes

Peer nodes are started with the `-hidden` flag. This means they:

- Don't appear in `Node.list/0`
- Don't trigger `:net_kernel.monitor_nodes/1` callbacks
- Won't be discovered by clustering libraries (libcluster, Horde, etc.)

This prevents SafeNIF's ephemeral peers from interfering with your cluster topology.
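
Hidden connections are still visible to code that asks for them explicitly, which can help when debugging. The sketch below uses only `Node.list/0` and `Node.list/1` from the standard library. On a node with no connections, all three lists are empty.

```elixir
# Visible connections only; hidden nodes are excluded.
visible = Node.list()

# Hidden connections, which is where nodes started with -hidden show up.
hidden = Node.list(:hidden)

# Every connected node, visible or hidden.
connected = Node.list(:connected)

IO.inspect(%{visible: visible, hidden: hidden, connected: connected})
```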

## Performance Considerations

Since v0.2.0, SafeNIF keeps a lazy pool of peer nodes ready for reuse.

This does not mean, however, that SafeNIF is without overhead.
Messages still have to be sent between the nodes, and the function has to be wrapped so that it can communicate with the caller.

SafeNIF is designed for "performant-enough" isolation, not high performance: its goal is to let functions, specifically untrusted NIFs, run without affecting the current node.
Use it for:

- Untrusted or potentially crashy NIFs
- Operations where safety trumps speed

Don't use it for:

- Trusted code that won't crash the node
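
To judge whether the overhead is acceptable for a given workload, it can help to time a direct call and a wrapped call side by side. The helper below is a minimal sketch built on `:timer.tc/1`. `OverheadProbe` is a hypothetical module, and the `bench` directory remains the reference for real measurements.

```elixir
defmodule OverheadProbe do
  # Returns the mean wall-clock time per call, in microseconds.
  def average_us(fun, runs \\ 100) when is_function(fun, 0) and runs > 0 do
    {total_us, :ok} =
      :timer.tc(fn -> Enum.each(1..runs, fn _ -> fun.() end) end)

    total_us / runs
  end
end

direct_us = OverheadProbe.average_us(fn -> :erlang.md5("payload") end)
IO.puts("direct call: #{direct_us} us per call")
```

On a distributed node, the same helper can be given `fn -> SafeNIF.wrap({:erlang, :md5, ["payload"]}) end` to see the added cost of the round trip to the peer.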

## Installation

[SafeNIF is available on Hex](https://hex.pm/packages/safe_nif).

To install, add it to your dependencies in your project's `mix.exs`.

```elixir
def deps do
  [
    {:safe_nif, ">= 0.0.1"}
  ]
end
```

Documentation can be generated with [ExDoc](https://github.com/elixir-lang/ex_doc)
and published on [HexDocs](https://hexdocs.pm). Once published, the docs can
be found at <https://hexdocs.pm/safe_nif>.

<!-- README END -->