README.md

# SqlParserEx

An Elixir library wrapping the Apache DataFusion SQL parser ([`sqlparser` Rust crate v0.61](https://crates.io/crates/sqlparser)) via Rustler NIFs. Precompiled binaries are provided — no Rust toolchain required for end users.

## Installation

```elixir
def deps do
  [
    {:sql_parser_ex, "~> 0.1"}
  ]
end
```

## Usage

```elixir
# Parse a single statement
{:ok, ast} = SqlParserEx.parse("SELECT id, name FROM users WHERE id = 1")

# Parse with a specific dialect
{:ok, ast} = SqlParserEx.parse("SELECT $1::int", dialect: :postgres)

# Parse multiple statements
{:ok, [stmt1, stmt2]} = SqlParserEx.parse_many("SELECT 1; SELECT 2")

# Reconstruct SQL from AST (round-trip)
{:ok, sql} = SqlParserEx.to_sql(ast)

# List supported dialects
SqlParserEx.dialects()
```

## AST Output

The AST mirrors the full structure of the [`sqlparser` Rust crate](https://docs.rs/sqlparser), including source position `span` fields for every token. The example below omits spans for readability.

```elixir
sql = """
  SELECT DISTINCT u.id, u.name, COUNT(o.id) AS order_count
  FROM users AS u
  JOIN orders AS o ON u.id = o.user_id
  WHERE u.active = true
  GROUP BY u.id, u.name
  HAVING COUNT(o.id) > 0
  ORDER BY order_count DESC
  LIMIT 10
"""
{:ok, ast} = SqlParserEx.parse(sql)
```

Returns:

```json
{
  "Query": {
    "body": {
      "Select": {
        "distinct": "Distinct",
        "projection": [
          {
            "UnnamedExpr": {
              "CompoundIdentifier": [
                { "value": "u", "quote_style": null },
                { "value": "id", "quote_style": null }
              ]
            }
          },
          {
            "UnnamedExpr": {
              "CompoundIdentifier": [
                { "value": "u", "quote_style": null },
                { "value": "name", "quote_style": null }
              ]
            }
          },
          {
            "ExprWithAlias": {
              "expr": {
                "Function": {
                  "name": [{ "Identifier": { "value": "COUNT", "quote_style": null } }],
                  "args": {
                    "List": {
                      "args": [
                        {
                          "Unnamed": {
                            "Expr": {
                              "CompoundIdentifier": [
                                { "value": "o", "quote_style": null },
                                { "value": "id", "quote_style": null }
                              ]
                            }
                          }
                        }
                      ],
                      "clauses": [],
                      "duplicate_treatment": null
                    }
                  },
                  "filter": null,
                  "null_treatment": null,
                  "over": null,
                  "parameters": "None",
                  "within_group": [],
                  "uses_odbc_syntax": false
                }
              },
              "alias": { "value": "order_count", "quote_style": null }
            }
          }
        ],
        "from": [
          {
            "relation": {
              "Table": {
                "name": [{ "Identifier": { "value": "users", "quote_style": null } }],
                "alias": {
                  "name": { "value": "u", "quote_style": null },
                  "columns": [],
                  "explicit": true
                }
              }
            },
            "joins": [
              {
                "relation": {
                  "Table": {
                    "name": [{ "Identifier": { "value": "orders", "quote_style": null } }],
                    "alias": {
                      "name": { "value": "o", "quote_style": null },
                      "columns": [],
                      "explicit": true
                    }
                  }
                },
                "join_operator": {
                  "Join": {
                    "On": {
                      "BinaryOp": {
                        "left": {
                          "CompoundIdentifier": [
                            { "value": "u", "quote_style": null },
                            { "value": "id", "quote_style": null }
                          ]
                        },
                        "op": "Eq",
                        "right": {
                          "CompoundIdentifier": [
                            { "value": "o", "quote_style": null },
                            { "value": "user_id", "quote_style": null }
                          ]
                        }
                      }
                    }
                  }
                },
                "global": false
              }
            ]
          }
        ],
        "selection": {
          "BinaryOp": {
            "left": {
              "CompoundIdentifier": [
                { "value": "u", "quote_style": null },
                { "value": "active", "quote_style": null }
              ]
            },
            "op": "Eq",
            "right": { "Value": { "value": { "Boolean": true } } }
          }
        },
        "group_by": {
          "Expressions": [
            [
              {
                "CompoundIdentifier": [
                  { "value": "u", "quote_style": null },
                  { "value": "id", "quote_style": null }
                ]
              },
              {
                "CompoundIdentifier": [
                  { "value": "u", "quote_style": null },
                  { "value": "name", "quote_style": null }
                ]
              }
            ],
            []
          ]
        },
        "having": {
          "BinaryOp": {
            "left": {
              "Function": {
                "name": [{ "Identifier": { "value": "COUNT", "quote_style": null } }],
                "args": {
                  "List": {
                    "args": [
                      {
                        "Unnamed": {
                          "Expr": {
                            "CompoundIdentifier": [
                              { "value": "o", "quote_style": null },
                              { "value": "id", "quote_style": null }
                            ]
                          }
                        }
                      }
                    ],
                    "clauses": [],
                    "duplicate_treatment": null
                  }
                }
              }
            },
            "op": "Gt",
            "right": { "Value": { "value": { "Number": ["0", false] } } }
          }
        },
        "flavor": "Standard",
        "cluster_by": [],
        "distribute_by": [],
        "sort_by": [],
        "lateral_views": [],
        "named_window": [],
        "connect_by": []
      }
    },
    "order_by": {
      "kind": {
        "Expressions": [
          {
            "expr": { "Identifier": { "value": "order_count", "quote_style": null } },
            "options": { "asc": false, "nulls_first": null },
            "with_fill": null
          }
        ]
      },
      "interpolate": null
    },
    "limit_clause": {
      "LimitOffset": {
        "limit": { "Value": { "value": { "Number": ["10", false] } } },
        "offset": null,
        "limit_by": []
      }
    },
    "with": null,
    "locks": [],
    "fetch": null,
    "pipe_operators": [],
    "for_clause": null,
    "format_clause": null,
    "settings": null
  }
}
```

## Supported Dialects

| Atom | SQL Dialect |
|------|-------------|
| `:generic` | Generic (permissive, default) |
| `:ansi` | ANSI SQL:2011 |
| `:postgres` | PostgreSQL |
| `:mysql` | MySQL |
| `:sqlite` | SQLite |
| `:mssql` | Microsoft SQL Server |
| `:bigquery` | Google BigQuery |
| `:clickhouse` | ClickHouse |
| `:duckdb` | DuckDB |
| `:databricks` | Databricks |
| `:hive` | Apache Hive |
| `:redshift` | Amazon Redshift |
| `:snowflake` | Snowflake |

## Error Handling

All functions return `{:ok, result}` or `{:error, reason}`:

```elixir
# Invalid SQL
{:error, "sql parser error: ..."} = SqlParserEx.parse("SELECT * FROM")

# Unknown dialect
{:error, {:unknown_dialect, :cobol}} = SqlParserEx.parse("SELECT 1", dialect: :cobol)

# Multiple statements passed to parse/2
{:error, "expected exactly one statement; use parse_many/2 for multiple"} =
  SqlParserEx.parse("SELECT 1; SELECT 2")
```

## Building from Source

If you want to compile the NIF locally (requires Rust stable):

```bash
SQL_PARSER_EX_BUILD=1 mix deps.get
SQL_PARSER_EX_BUILD=1 mix compile
```

> **Note for consumers**: When forcing a local build, you must also add `{:rustler, "~> 0.37"}` to your own `deps/0`, as it is an optional dependency of this library:
>
> ```elixir
> def deps do
>   [
>     {:sql_parser_ex, "~> 0.1"},
>     {:rustler, "~> 0.37"}   # only needed when SQL_PARSER_EX_BUILD=1
>   ]
> end
> ```

## License

Copyright 2025 Andre Anastacio. Licensed under the [Apache License, Version 2.0](LICENSE).