README.md

<p align="center">
  <img src="assets/flowstone.svg" alt="FlowStone Logo" width="220" height="248">
</p>

# FlowStone

**Asset-first orchestration for the BEAM.**

FlowStone is a data orchestration framework for Elixir that treats data artifacts (assets) as first-class citizens. Inspired by [Dagster](https://dagster.io/), but built to leverage BEAM's fault tolerance, real-time capabilities, and operational simplicity.

## Why FlowStone?

Traditional workflow orchestration is task-centric: define steps, execute in order, hope nothing breaks. This leads to:

- Implicit data contracts between steps
- Difficult lineage tracking ("where did this data come from?")
- Non-reproducible results
- Painful debugging

**FlowStone inverts this model.** You define *assets* (data artifacts), and the execution graph is derived from their dependencies. Every materialization is tracked, lineage is automatic, and you can answer "what happens if this source changes?" with a single query.

## Quick Example

```elixir
defmodule MyApp.Pipeline do
  use FlowStone.Pipeline

  asset :raw_events do
    description "Raw events from source systems"
    io_manager :s3
    bucket "data-lake"
    partitioned_by :date

    execute fn context, _deps ->
      fetch_events_for_date(context.partition)
    end
  end

  asset :cleaned_events do
    description "Validated and cleaned events"
    depends_on [:raw_events]
    io_manager :postgres
    table "analytics.cleaned_events"

    execute fn context, %{raw_events: events} ->
      {:ok, clean_and_validate(events)}
    end
  end

  asset :daily_report do
    description "Aggregated daily metrics"
    depends_on [:cleaned_events]
    io_manager :s3
    path fn partition -> "reports/#{partition}.json" end

    execute fn context, %{cleaned_events: events} ->
      {:ok, aggregate_metrics(events)}
    end
  end
end

# Materialize an asset
FlowStone.materialize(:daily_report, partition: ~D[2025-01-15])

# Materialize with all dependencies
FlowStone.materialize_all(:daily_report, partition: ~D[2025-01-15])

# Backfill historical data
FlowStone.backfill(:daily_report,
  partitions: Date.range(~D[2024-01-01], ~D[2024-12-31]),
  max_parallel: 10
)

# Query lineage
FlowStone.Lineage.upstream(:daily_report, ~D[2025-01-15])
# => [{:cleaned_events, ~D[2025-01-15]}, {:raw_events, ~D[2025-01-15]}]
```

## Key Features

### Asset-First Architecture
- **Explicit Dependencies**: Assets declare what they depend on
- **Automatic DAG Construction**: Execution order derived from dependencies
- **Compile-Time Validation**: Catch errors before runtime

### Materialization Tracking
- **Every Execution Recorded**: Who ran what, when, with what inputs
- **Lineage Queries**: Trace any output back to its sources
- **Impact Analysis**: Know what breaks before making changes

### Partitioning & Multi-Tenancy
- **Time-Based Partitions**: Daily, hourly, or custom time windows
- **Custom Partitions**: Tenant, region, or any composite key
- **Row-Level Security**: Built-in tenant isolation

### Human-in-the-Loop
- **Checkpoint Gates**: Pause workflows for approval
- **Configurable Timeouts**: Escalation and auto-decision
- **Full Audit Trail**: Who approved what and why

### Real-Time UI
- **LiveView Dashboard**: WebSocket-powered, no polling
- **Asset Graph Visualization**: Interactive DAG explorer
- **Approval Queue**: Review and approve pending checkpoints

### Integrations
- **I/O Managers**: PostgreSQL, S3, Parquet, custom
- **LLM Support**: Rate-limited API calls with retry
- **Python/ML**: Erlport integration for ML pipelines
- **dbt**: Run dbt models as FlowStone assets

## Installation

Add to your `mix.exs`:

```elixir
def deps do
  [
    {:flowstone, "~> 0.1.0"}
  ]
end
```

## Documentation

### Architecture Decision Records

The `/docs/adr/` directory contains comprehensive ADRs documenting every design decision:

| ADR | Topic |
|-----|-------|
| [0001](docs/adr/0001-asset-first-orchestration.md) | Asset-first architecture |
| [0002](docs/adr/0002-dag-engine-persistence.md) | DAG engine and persistence |
| [0003](docs/adr/0003-partitioning-isolation.md) | Partitioning and tenant isolation |
| [0004](docs/adr/0004-io-manager-abstraction.md) | I/O manager abstraction |
| [0005](docs/adr/0005-checkpoint-approval-gates.md) | Checkpoint and approval gates |
| [0006](docs/adr/0006-oban-job-execution.md) | Oban job execution |
| [0007](docs/adr/0007-scheduling-sensors.md) | Scheduling and sensors |
| [0008](docs/adr/0008-resource-injection.md) | Resource injection |
| [0009](docs/adr/0009-error-handling.md) | Error handling |
| [0010](docs/adr/0010-elixir-dsl-not-yaml.md) | Elixir DSL (not YAML) |
| [0011](docs/adr/0011-observability-telemetry.md) | Observability and telemetry |
| [0012](docs/adr/0012-liveview-ui.md) | LiveView UI |
| [0013](docs/adr/0013-testing-strategies.md) | Testing strategies |
| [0014](docs/adr/0014-lineage-reporting.md) | Lineage and audit reporting |
| [0015](docs/adr/0015-external-integrations.md) | External integrations |

## Comparison

### vs. Dagster

| Feature | Dagster | FlowStone |
|---------|---------|-----------|
| Language | Python | Elixir |
| Asset-first | Yes | Yes |
| Lineage | Yes | Yes |
| Real-time UI | Polling (React) | Push (LiveView) |
| Fault tolerance | Process-based | OTP supervision |
| Hot code reload | Restart required | Zero-downtime |
| Memory footprint | ~1.2GB+ | ~300MB |
| Integrations | 100+ | ~15 (growing) |

### vs. Oban

| Feature | Oban | FlowStone |
|---------|------|-----------|
| Job queue | Yes | Uses Oban |
| Asset-first | No | Yes |
| Lineage | No | Yes |
| Partitioning | No | Yes |
| DAG execution | No | Yes |
| Checkpoints | No | Yes |

FlowStone uses Oban for job execution but adds the asset-first layer on top.

## Philosophy

1. **Assets, not tasks**: Define what data you want, not how to compute it
2. **BEAM-native**: Leverage OTP for fault tolerance, LiveView for UI
3. **Explicit over implicit**: Dependencies, resources, and context are always explicit
4. **Compile-time safety**: Catch configuration errors before runtime
5. **Testable by design**: Every component can be tested in isolation

## Status

FlowStone is currently in **design phase**. The ADRs document the complete architecture, and implementation is planned.

### Roadmap

1. **Phase 1 (2 weeks)**: Core MVP - Asset DSL, DAG, in-memory I/O
2. **Phase 2 (2 weeks)**: Persistence - PostgreSQL, S3, Oban integration
3. **Phase 3 (2 weeks)**: UI - LiveView dashboard, asset graph
4. **Phase 4 (1 week)**: Scheduling - Cron, sensors
5. **Phase 5 (ongoing)**: Integrations - LLM, Python, dbt

## Contributing

Contributions welcome! Please read the ADRs first to understand the design philosophy.

## License

MIT