README.md

# Handoff

Handoff is a library for building and executing Directed Acyclic Graphs (DAGs) of functions in Elixir.

## Features

- **Graph-based computation**: Define and execute complex computational graphs with dependencies.
- **Resource-aware scheduling**: Optimize computation based on available resources.
- **Distributed execution**: Run graph workloads across multiple nodes.
- **Fault tolerance**: Task retries on failure up to a configured maximum number of retries.

## Installation

The package can be installed by adding `handoff` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:handoff, "~> 0.1.0"}
  ]
end
```

## Usage

### Basic DAG Construction

Here's a simple example of how to create and validate a DAG.
We'll assume we have a module `MyTasks` with a few helper functions:

```elixir
defmodule MyTasks do
  def format_output(value), do: "The final answer is: #{value}"
end
```

Now, let's build the DAG:

```elixir
alias Handoff.{DAG, Function}

# Create a new empty DAG
dag = DAG.new() # Handoff.DAG.new() is also valid

# Define a source function with no dependencies
source_fn = %Function{
  id: :source,
  args: [],
  code: &Elixir.Function.identity/1,
  extra_args: [42]
}

# Define a function that depends on the source function
transform_fn = %Function{
  id: :transform,
  args: [:source], # Result of :source is passed to &*/2
  code: &*/2,
  extra_args: [2]
}

# Define a sink function that depends on the transform function
sink_fn = %Function{
  id: :sink,
  args: [:transform], # Result of :transform is passed to MyTasks.format_output/1
  code: &MyTasks.format_output/1
}

# Add functions to the DAG
dag =
  dag
  |> DAG.add_function(source_fn)
  |> DAG.add_function(transform_fn)
  |> DAG.add_function(sink_fn)

# Validate the DAG to ensure it has no cycles and all dependencies exist
case DAG.validate(dag) do
  :ok ->
    IO.puts("DAG is valid!")
  {:error, reason} ->
    IO.puts("DAG is invalid: #{inspect(reason)}")
end
```

### Executing a DAG

Once a DAG is constructed and validated, you can execute it. Handoff's `DistributedExecutor` can run DAGs locally or across multiple nodes. For local execution:

```elixir
# (Assuming dag from the previous example is validated and :ok)

# Ensure Handoff application is started (typically in your application.ex)
# For scripts or Livebooks, you might need:
# {:ok, _pid} = Handoff.start_link() # Or Handoff.Application.start(:normal, [])
# Handoff.DistributedExecutor.register_local_node(%{cpu: 2, memory: 1024}) # Example capabilities

case Handoff.DistributedExecutor.execute(dag) do
  {:ok, results_map} ->
    # results_map is %{dag_id: ..., results: %{source: 42, transform: 84, sink: "The final answer is: 84"}, allocations: %{...}}
    final_message = results_map.results[:sink]
    IO.puts(final_message)
    # => "The final answer is: 84"

  {:error, reason} ->
    IO.puts("Error executing DAG: #{inspect(reason)}")
end
```

Note: The `Handoff.DistributedExecutor.execute/2` function returns a map containing the `:dag_id`, the execution `:results` (a map of function ID to its output), and `:allocations` (a map of function ID to the node it ran on).

<!-- TODO: Add note about starting Handoff application if not already running -->

### Distributed Execution

Handoff is designed to distribute DAG execution across multiple Erlang nodes. To enable this, you need to:

1. Ensure nodes are connected in an Erlang cluster.
2. Start the `Handoff` application on each node.
3. Register each node's capabilities (e.g., CPU, memory) with `Handoff.DistributedExecutor.register_local_node/1`.
4. Optionally, specify node placement preferences or resource costs for your functions.

Here's a conceptual example. We'll reuse `MyTasks.format_output/1` and imagine a `MyDistributedTasks` module:

```elixir
# On all participating nodes:
# Ensure Handoff application is started (e.g., in application.ex or manually)
# Handoff.Application.start(:normal, []) # Or Handoff.start_link()

# Register node capabilities (example for one node)
# This would typically be done on each node with its specific resources.
node_capabilities = %{cpu: 4, memory: 8192, gpu: 1}
Handoff.DistributedExecutor.register_local_node(node_capabilities)

# Example Task Module (must be available on all nodes)
defmodule MyDistributedTasks do
  def load_data(path), do: "Loaded: #{path}" # Simulate data loading
  def process_on_gpu(data), do: "Processed [#{data}] on GPU" # Simulate GPU work
end
```

Then, on the orchestrating node, define and execute the DAG:

```elixir
alias Handoff.{DAG, Function}

dag = DAG.new("distributed_example_dag")

# Function that might run on any node based on availability
load_fn = %Function{
  id: :load_data,
  args: [],
  code: &MyDistributedTasks.load_data/1,
  extra_args: ["/path/to/shared/data.txt"],
  cost: %{cpu: 1, memory: 1000} # Basic cost
}

# Function that explicitly requests GPU resources
gpu_fn = %Function{
  id: :gpu_process,
  args: [:load_data],
  code: &MyDistributedTasks.process_on_gpu/1,
  cost: %{cpu: 2, memory: 2048, gpu: 1} # Requires a GPU
}

# Function to format the output, can run anywhere
format_fn = %Function{
  id: :format_result,
  args: [:gpu_process],
  code: &MyTasks.format_output/1, # Reusing from previous example
  cost: %{cpu: 1, memory: 500}
}

distributed_dag =
  dag
  |> DAG.add_function(load_fn)
  |> DAG.add_function(gpu_fn)
  |> DAG.add_function(format_fn)

# Validate
:ok = DAG.validate(distributed_dag)

# Execute the DAG - Handoff will attempt to distribute based on costs and availability
case Handoff.DistributedExecutor.execute(distributed_dag) do
  {:ok, results_map} ->
    IO.puts("Distributed DAG completed!")
    IO.puts("Final result: #{results_map.results[:format_result]}")
    IO.inspect(results_map.allocations, label: "Function Allocations")
  {:error, reason} ->
    IO.puts("Distributed DAG execution failed: #{inspect(reason)}")
end
```

By default, `Handoff.DistributedExecutor` will try to allocate functions to nodes that satisfy their resource `:cost` and are available. If all functions can be satisfied by the local node and no other nodes are registered or available, execution will be local-only. You can also specify `node: :some_node_name` in the `Handoff.Function` struct to pin a function to a particular node, or use more advanced allocation strategies via options in `execute/2`.

## Interactive Examples

We provide interactive [Livebook](https://livebook.dev/) examples to help you get started:

- **Simple Pipeline** - A basic data processing pipeline showing core concepts
- **Distributed Image Processing** - A complex example with resource-aware distributed execution

Check out the [`livebooks/`](livebooks/) directory for these interactive notebooks.

## Running Tests

To run the test suite, use the following command:

```bash
mix test
```

## Documentation

Official documentation is available at [https://hexdocs.pm/handoff](https://hexdocs.pm/handoff)

### Building Documentation Locally

This project uses [ExDoc](https://github.com/elixir-lang/ex_doc) to generate documentation. To build the documentation locally, run:

```bash
mix docs
```

This will generate documentation in the `doc/` directory. You can open `doc/index.html` in your browser to view it.

## Contributing

Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for details.

## License

This project is licensed under the Apache License - see the [LICENSE](LICENSE) file for details.