
# SafeNIF

![Elixir CI](https://github.com/probably-not/safe-nif/actions/workflows/pipeline.yaml/badge.svg)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Hex version badge](https://img.shields.io/hexpm/v/safe_nif.svg)](https://hex.pm/packages/safe_nif)

<!-- README START -->

<!-- HEX PACKAGE DESCRIPTION START -->

Wrap your untrusted NIFs so that they can never crash your node.

<!-- HEX PACKAGE DESCRIPTION END -->

> ### SafeNIF Is Experimental {: .warning}
> SafeNIF is in early development and subject to changes in the behaviour and the API.
>
> For now, it can wrap any function or MFA that might crash the BEAM node, running it in isolation so that the crash cannot take your node down.
> It currently pays the cost of starting a peer node and loading code on every call; warm node pooling is in development to reduce this overhead.

## Benchmarks

Benchmarks can be found in the `bench` directory. As of v0.1.0, SafeNIF does not pool peer nodes.
This means every call pays the full cost of starting a peer node, which can take anywhere from 100ms to over a second,
depending on how much code needs to be loaded onto the peer. Of the three approaches benchmarked (CLI+Port, NIF, and SafeNIF),
SafeNIF is currently the slowest because of this cost.

Pooling is planned for v0.2.0 and should make SafeNIF far more efficient, since the startup cost will be paid only once per pooled node.
Pooling brings its own costs, though: memory and CPU, because every pooled peer is a full BEAM node running on the same machine.

**The following information was generated by Claude and reviewed by @probably-not. If you find issues in this README, feel free to open a PR to fix them!**

## The Problem

NIFs (Native Implemented Functions) are powerful but dangerous. A buggy or malicious NIF can crash your entire BEAM node, taking down all processes and connections with it. There's no way to catch or recover from a NIF crash: your node simply dies.

SafeNIF solves this by running untrusted code on isolated peer nodes. If the NIF crashes, only the peer dies. Your main node continues running, and you get a clean error tuple back.

## Usage

### Basic Usage

SafeNIF provides a single function, `SafeNIF.wrap/2`. Pass it an MFA (module, function, arguments) tuple and it runs that call on an isolated peer node. The second argument, a timeout, is optional:

```elixir
# Successful execution returns {:ok, result}
{:ok, 6} = SafeNIF.wrap({Kernel, :+, [2, 4]})

# Complex return values work fine
{:ok, %{name: "test"}} = SafeNIF.wrap({Map, :put, [%{}, :name, "test"]})
```

### Wrapping Potentially Dangerous NIFs

The primary use case is wrapping NIFs that might crash:

```elixir
defmodule MyApp.ImageProcessor do
  def safe_process(image_binary) do
    # UntrustedNIF.process/1 might crash the BEAM
    case SafeNIF.wrap({UntrustedNIF, :process, [image_binary]}) do
      {:ok, processed} -> 
        {:ok, processed}
      {:error, :noconnection} -> 
        # The NIF crashed the peer node
        {:error, :nif_crashed}
      {:error, :timeout} -> 
        {:error, :processing_timeout}
      {:error, reason} -> 
        {:error, reason}
    end
  end
end
```

### Timeouts

The default timeout is 5 seconds. To set a custom timeout, pass it in milliseconds as the second argument; `to_timeout/1` (Elixir 1.17+) builds one from a readable duration:

```elixir
# 30 second timeout for long-running operations
SafeNIF.wrap({HeavyComputation, :run, [data]}, to_timeout(second: 30))

# 2 minute timeout for very long operations
SafeNIF.wrap({BatchJob, :process, [items]}, to_timeout(minute: 2))

# 500ms timeout for quick operations
SafeNIF.wrap({QuickCheck, :validate, [input]}, to_timeout(millisecond: 500))
```

When a timeout occurs, the peer node is killed and `{:error, :timeout}` is returned.
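`to_timeout/1` simply returns an integer number of milliseconds, which is what the calls above end up passing. The snippet below only checks those conversions; the commented line assumes, based on that return type rather than on SafeNIF's documentation, that an integer literal works the same way.

```elixir
# to_timeout/1 turns a duration into an integer number of milliseconds
30_000 = to_timeout(second: 30)
120_000 = to_timeout(minute: 2)
500 = to_timeout(millisecond: 500)

# Assumption: passing the integer directly is equivalent, e.g.
# SafeNIF.wrap({HeavyComputation, :run, [data]}, 30_000)
```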

### Anonymous Functions

Anonymous functions are supported but with an important caveat: the module that defines the function must be loadable on the peer node.

```elixir
# Works
SafeNIF.wrap(fn -> 1 + 1 end)

# Works (application modules are loaded on the peer)
SafeNIF.wrap(fn -> MyApp.Worker.do_work() end)

# May fail: if MyTest is compiled in memory (e.g. defined in an .exs test file), the peer cannot load its code.
defmodule MyTest do
  def run_test do
    SafeNIF.wrap(fn -> :test_result end)
  end
end
```

For maximum reliability, prefer MFA tuples over anonymous functions.
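If you want to check the caveat above before wrapping a function, one option is to look up which module defines it and whether that module resolves to a `.beam` file on the code path. The helper below is a hypothetical sketch, not part of SafeNIF, built on `Function.info/2` and `:code.which/1`. Treat it as a heuristic: modules compiled in memory typically report an empty path, and preloaded modules report `false` here even though every node has them.

```elixir
defmodule MyApp.FunCheck do
  # Hypothetical helper, not part of SafeNIF. Heuristic only: returns true
  # when the module that defines `fun` resolves to a .beam file on the code
  # path, which a peer that shares this node's code paths could also load.
  def loadable_from_disk?(fun) when is_function(fun) do
    {:module, mod} = Function.info(fun, :module)

    case :code.which(mod) do
      # A non-empty charlist is the path to the module's .beam file
      [_ | _] -> true
      # Anything else: [] (compiled in memory), :non_existing,
      # :preloaded or :cover_compiled
      _ -> false
    end
  end
end
```

For example, `MyApp.FunCheck.loadable_from_disk?(&Enum.map/2)` returns `true`, because `Enum` ships as a `.beam` file with Elixir.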

### Error Handling

SafeNIF returns tagged tuples to distinguish between successful results and failures:

```elixir
case SafeNIF.wrap({SomeModule, :some_function, [arg]}) do
  {:ok, result} ->
    # Function executed successfully, result is the return value
    handle_success(result)
    
  {:error, :timeout} ->
    # Function exceeded the timeout
    handle_timeout()
    
  {:error, :noconnection} ->
    # Peer node crashed (NIF crash, :erlang.halt, etc.)
    handle_crash()
    
  {:error, :not_alive} ->
    # Current node isn't running in distributed mode
    handle_not_distributed()
    
  {:error, reason} ->
    # Function raised/exited with reason
    handle_error(reason)
end
```

Note that if your wrapped function returns an error tuple, it's wrapped in `{:ok, ...}`:

```elixir
# Function returns {:error, :not_found}
{:ok, {:error, :not_found}} = SafeNIF.wrap({MyModule, :find, [123]})
```

This follows the same convention as `Task.async_stream/5`.
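For comparison, `Task.async_stream/3` tags every successful call with `:ok` in the same way, including calls whose own return value is an error tuple:

```elixir
# Every successful call is tagged {:ok, return_value}, even when the
# function's own return value is an error tuple
results =
  [1, 2]
  |> Task.async_stream(fn
    1 -> {:error, :not_found}
    n -> n * 10
  end)
  |> Enum.to_list()

# results == [ok: {:error, :not_found}, ok: 20]
```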

## Requirements

### Distributed Mode

SafeNIF requires your node to be running in distributed mode. If you call `SafeNIF.wrap/2` on a non-distributed node, you'll get `{:error, :not_alive}`.

For development, start IEx with a node name:

```bash
iex --sname myapp -S mix
```

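For production releases, ensure your node is started with distribution enabled. Mix releases start distribution by default; the mode is controlled by the `RELEASE_DISTRIBUTION` environment variable, so make sure it isn't set to `none`.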

### Running Tests

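Tests that call `SafeNIF.wrap/2` also need distribution. Add this to your `test/test_helper.exs`:

```elixir
:os.cmd(~c"epmd -daemon")  # ensure epmd is running
{:ok, _} = Node.start(:"test@127.0.0.1", :longnames)  # IP hosts need long names
ExUnit.start()
```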

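Alternatively, start the VM in distributed mode from the command line and leave `Node.start/2` out of `test/test_helper.exs` (its `{:ok, _}` match would fail on a node that is already distributed):

```bash
elixir --sname test -S mix test
```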

## How It Works

When you call `SafeNIF.wrap/2`:

1. A new BEAM node is started as a hidden peer using OTP's `:peer` module
2. All code paths and application configuration are copied to the peer
3. Applications are started on the peer
4. Your function executes on the peer node
5. The result is sent back via Erlang distribution
6. The peer node shuts down
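The lifecycle above can be approximated with OTP's `:peer` module (OTP 25+) and `:erpc`. The module below is a hypothetical, simplified sketch, not SafeNIF's actual implementation: it copies code paths but skips configuration transfer and application startup, and it maps `:erpc` failures onto error tuples shaped like the ones described under Error Handling.

```elixir
defmodule MyApp.PeerSketch do
  # Hypothetical sketch of a wrap-style call, NOT SafeNIF's implementation.
  # Covers steps 1, 2 (code paths only), 4, 5 and 6; config transfer and
  # application startup are left out for brevity.
  def run({mod, fun, args}, timeout \\ 5_000) do
    if Node.alive?() do
      # 1. Start a hidden peer. :peer.start/1 does not link it to the
      #    caller, so a crashing peer cannot take this process down too.
      case :peer.start(%{name: :peer.random_name(), args: [~c"-hidden"]}) do
        {:ok, peer, node} -> call_and_stop(peer, node, {mod, fun, args}, timeout)
        {:error, reason} -> {:error, reason}
      end
    else
      {:error, :not_alive}
    end
  end

  defp call_and_stop(peer, node, {mod, fun, args}, timeout) do
    # 2. Give the peer the same code paths as this node
    :ok = :erpc.call(node, :code, :add_paths, [:code.get_path()])
    # 4-5. Run remotely; the result travels back over Erlang distribution
    {:ok, :erpc.call(node, mod, fun, args, timeout)}
  catch
    # :erpc signals timeouts and lost connections as {:erpc, reason}
    :error, {:erpc, reason} -> {:error, reason}
    # Remote exceptions, exits and throws are re-raised locally by :erpc
    :error, reason -> {:error, reason}
    :exit, reason -> {:error, reason}
    :throw, value -> {:error, {:throw, value}}
  after
    # 6. Shut the peer down; it may already be gone if it crashed
    stop_quietly(peer)
  end

  defp stop_quietly(peer) do
    :peer.stop(peer)
  catch
    :exit, _ -> :ok
  end
end
```

Using `:peer.start/1` rather than `:peer.start_link/1` is a deliberate choice in this sketch: a link would propagate the peer's death to the calling process, which is exactly what isolation is meant to prevent.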

### Hidden Nodes

Peer nodes are started with the `-hidden` flag. This means they:

- Don't appear in `Node.list/0`
- Don't trigger `:nodeup`/`:nodedown` messages for processes subscribed via `:net_kernel.monitor_nodes/1` (unless they explicitly ask for hidden nodes)
- Won't be discovered by clustering libraries (libcluster, Horde, etc.)

This prevents SafeNIF's ephemeral peers from interfering with your cluster topology.
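To see the `Node.list/0` behaviour for yourself, you can start a hidden peer by hand and compare `Node.list/0` with `Node.list(:hidden)`. This is a hypothetical demo that uses OTP's `:peer` module directly (OTP 25+), independent of SafeNIF, and it must be called on a distributed node.

```elixir
defmodule MyApp.HiddenPeerCheck do
  # Hypothetical demo using OTP's :peer directly, independent of SafeNIF.
  # Must be called on a distributed node.
  # Returns {visible?, hidden?} for a freshly started hidden peer.
  def check do
    {:ok, peer, peer_node} =
      :peer.start(%{name: :peer.random_name(), args: [~c"-hidden"]})

    try do
      # Node.list/0 only returns visible nodes...
      visible? = peer_node in Node.list()
      # ...while Node.list(:hidden) includes hidden connections
      hidden? = peer_node in Node.list(:hidden)
      {visible?, hidden?}
    after
      :peer.stop(peer)
    end
  end
end
```

On a distributed node, `MyApp.HiddenPeerCheck.check()` should return `{false, true}`.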

## Performance Considerations

Starting a peer node is expensive. Each call to `SafeNIF.wrap/2` incurs:

- BEAM VM startup time
- Code path initialization
- Application configuration transfer
- Application startup

This can take 500ms-2s depending on your application size. SafeNIF is designed for isolation, not performance. Use it for:

- Untrusted or potentially crashy NIFs
- User-submitted code execution
- Operations where safety trumps speed

Don't use it for:

- High-frequency calls
- Latency-sensitive operations
- Trusted code that won't crash

> **Note:** Warm node pooling is planned for a future release to amortize startup costs across multiple calls.
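To get a feel for the VM-startup part of this cost on your own hardware, the hypothetical helper below times a bare `:peer` start and stop with `:timer.tc/1`. It skips code path, configuration, and application transfer, so it gives only a lower bound on what a `SafeNIF.wrap/2` call costs, and it must be called on a distributed node.

```elixir
defmodule MyApp.PeerStartupTimer do
  # Hypothetical measurement helper, not part of SafeNIF. Must be called on a
  # distributed node. Returns the wall-clock milliseconds needed to boot and
  # stop a bare hidden peer; no code paths, config, or applications are
  # transferred, so real SafeNIF calls are expected to take longer.
  def measure_ms do
    {micros, :ok} =
      :timer.tc(fn ->
        {:ok, peer, _node} =
          :peer.start(%{name: :peer.random_name(), args: [~c"-hidden"]})

        :peer.stop(peer)
      end)

    div(micros, 1_000)
  end
end
```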

## Installation

[SafeNIF is available on Hex](https://hex.pm/packages/safe_nif).

To install, add `safe_nif` to the dependencies in your project's `mix.exs`:

```elixir
def deps do
  [
    {:safe_nif, ">= 0.0.1"}
  ]
end
```

Documentation can be generated with [ExDoc](https://github.com/elixir-lang/ex_doc)
and published on [HexDocs](https://hexdocs.pm). Once published, the docs can
be found at <https://hexdocs.pm/safe_nif>.

<!-- README END -->