# onigleam

[![Package Version](https://img.shields.io/hexpm/v/onigleam)](https://hex.pm/packages/onigleam)
[![Hex Docs](https://img.shields.io/badge/hex-docs-ffaff3)](https://hexdocs.pm/onigleam/)

A Gleam library for converting [Oniguruma](https://github.com/kkos/oniguruma) regex patterns to patterns compatible with `gleam_regexp`. It is intended for working with TextMate grammars in Gleam, since TextMate grammars write their syntax highlighting rules in Oniguruma's regex syntax.

> **Attribution:** This library is a Gleam port of [oniguruma-to-es](https://github.com/slevithan/oniguruma-to-es) and [oniguruma-parser](https://github.com/slevithan/oniguruma-parser) by [Steven Levithan](https://github.com/slevithan).
> 
> This port was developed with LLM assistance (Claude).

## Installation

```sh
gleam add onigleam@1
```

## Quick Start

```gleam
import onigleam
import onigleam/options
import gleam/dict
import gleam/regexp

// Convert a TextMate-style pattern with named capture groups
let assert Ok(result) = onigleam.convert(
  "(?<keyword>fn|let|pub)\\s+(?<name>[a-z_]\\w*)"
)

// Named groups become numbered, with a mapping preserved
result.pattern
// "(fn|let|pub)\\s+([a-z_]\\w*)"

dict.get(result.capture_names, "keyword")  // Ok(1)
dict.get(result.capture_names, "name")     // Ok(2)

// Compile and use directly
let assert Ok(re) = onigleam.to_regexp(
  "(?<num>\\d+)",
  options.default_options(),
)
let assert [match] = regexp.scan(re, "value: 42")
match.content  // "42"
```

## Usage

### Named Capture Groups

Oniguruma's named capture groups `(?<name>...)` are converted to standard numbered groups, since `gleam_regexp` doesn't expose named groups. The name-to-number mapping is returned so you can still reference captures by name:

```gleam
import onigleam
import gleam/dict

let assert Ok(result) = onigleam.convert(
  "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})"
)

result.pattern
// "(\\d{4})-(\\d{2})-(\\d{2})"

dict.get(result.capture_names, "year")   // Ok(1)
dict.get(result.capture_names, "month")  // Ok(2)
dict.get(result.capture_names, "day")    // Ok(3)
```
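
To read a capture by name after matching, the mapping can be combined with the `submatches` list that `gleam_regexp` returns. The following is a minimal sketch rather than part of onigleam's API: `named_capture` is a hypothetical helper, and the example assumes that `regexp_options` can be passed directly to `regexp.compile`.

```gleam
import gleam/dict.{type Dict}
import gleam/io
import gleam/list
import gleam/option.{type Option, None}
import gleam/regexp
import gleam/result
import onigleam

/// Hypothetical helper (not provided by onigleam): find the text captured
/// by a named group, using the name -> group number mapping.
pub fn named_capture(
  match: regexp.Match,
  capture_names: Dict(String, Int),
  name: String,
) -> Option(String) {
  case dict.get(capture_names, name) {
    Ok(index) ->
      // Group numbers start at 1, while `submatches` is a zero-based list
      match.submatches
      |> list.drop(index - 1)
      |> list.first
      |> result.unwrap(None)
    Error(Nil) -> None
  }
}

pub fn main() {
  let assert Ok(converted) =
    onigleam.convert("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})")

  // Assumption: regexp_options is a gleam/regexp Options value
  let assert Ok(re) =
    regexp.compile(converted.pattern, converted.regexp_options)

  let assert [match] = regexp.scan(re, "released 2024-05-17")
  let month = named_capture(match, converted.capture_names, "month")
  io.println(option.unwrap(month, "(no month captured)"))
}
```

The helper returns `None` both when the name is unknown and when the group did not take part in the match, so callers that need to tell these cases apart would check `capture_names` first.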

### Unicode and Hex Escapes

Oniguruma's various escape formats are converted to their literal characters:

```gleam
import onigleam

// Hex escapes
let assert Ok(r1) = onigleam.convert("\\x41\\x42\\x43")
r1.pattern  // "ABC"

// Unicode escapes
let assert Ok(r2) = onigleam.convert("caf\\u00e9")
r2.pattern  // "café"
```

### TextMate Grammar Patterns

TextMate grammars sometimes reference capture groups that don't exist in the current pattern (orphan backreferences). Use `convert_textmate` to handle these gracefully:

```gleam
import onigleam

// This pattern references \1 but defines no capture group of its own,
// as in a TextMate `end` pattern that refers to a group captured by `begin`.
// Normal conversion would fail, but convert_textmate allows it
let assert Ok(result) = onigleam.convert_textmate(
  "\\s*\\1"  // \1 is an orphan backreference here
)
// Returns Ok with a warning about the orphan backref
```
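
Since warnings are returned as data rather than printed, a caller may want to surface them. The sketch below prints each entry of the `warnings` field (a `List(String)` in `ConversionResult`). The pattern is illustrative, and the exact warning text is not specified here.

```gleam
import gleam/io
import gleam/list
import onigleam

pub fn main() {
  // Illustrative pattern: \1 has no matching capture group in this pattern
  case onigleam.convert_textmate("\\s*\\1") {
    // Print every warning attached to the successful conversion
    Ok(result) -> list.each(result.warnings, io.println)
    // Fall back to the formatted error if conversion fails anyway
    Error(err) -> io.println_error(onigleam.format_error(err))
  }
}
```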

### Flags and Options

```gleam
import onigleam
import onigleam/options

// Case-insensitive matching
let assert Ok(result) = onigleam.convert_with_flags(
  "(?<tag>html|body|div)",
  "i"
)
result.regexp_options.case_insensitive  // True

// Full control with options builder
let opts = options.default_options()
  |> options.with_flags("i")
  |> options.allow_orphan_backrefs

let assert Ok(result) = onigleam.to_regexp_details(
  "(?<open><\\w+>).*?(?<close></\\w+>)",
  opts,
)
```

## API Reference

### Main Functions

| Function | Description |
|----------|-------------|
| `convert(pattern)` | Convert with default options |
| `convert_with_flags(pattern, flags)` | Convert with Oniguruma flags |
| `convert_textmate(pattern)` | Convert with TextMate-friendly options |
| `to_regexp(pattern, options)` | Convert and compile to `Regexp` |
| `to_regexp_details(pattern, options)` | Convert with full result details |
| `format_error(error)` | Format error as human-readable string |

### ConversionResult

```gleam
pub type ConversionResult {
  ConversionResult(
    pattern: String,                   // Generated pattern string
    regexp_options: Options,           // Options for gleam_regexp
    capture_names: Dict(String, Int),  // Name -> group number mapping
    warnings: List(String),            // Any warnings generated
  )
}
```

## Supported Features

| Feature | Status | Notes |
|---------|--------|-------|
| Literals, escapes | Supported | Direct mapping |
| Character classes `[abc]` | Supported | Including ranges, negation |
| Quantifiers `*`, `+`, `?`, `{n,m}` | Supported | Greedy and lazy |
| Capturing groups `(...)` | Supported | Named groups converted to numbered |
| Non-capturing groups `(?:...)` | Supported | Direct mapping |
| Lookahead `(?=...)`, `(?!...)` | Supported | Both positive and negative |
| Lookbehind `(?<=...)`, `(?<!...)` | Supported | Both positive and negative |
| Anchors `^`, `$`, `\A`, `\z` | Supported | Direct mapping |
| Word boundaries `\b`, `\B` | Supported | Platform differences may apply |
| Character shorthands `\d`, `\w`, `\s` | Supported | Direct mapping |
| Alternation `a\|b` | Supported | Direct mapping |
| Unicode escapes `\uHHHH` | Supported | Converted to literal |
| Hex escapes `\xHH` | Supported | Converted to literal |

### Unsupported Features (Will Error)

| Feature | Why |
|---------|-----|
| Atomic groups `(?>...)` | Cannot emulate in gleam_regexp |
| Possessive quantifiers `*+`, `++` | Cannot emulate in gleam_regexp |
| Recursion `\g<0>` | Not supported on every target engine (JavaScript's `RegExp` lacks it) |
| Subroutines `\g<name>` | Not supported on every target engine (JavaScript's `RegExp` lacks it) |
| Search start `\G` | Requires stateful regex |
| Absence functions `(?~...)` | Cannot emulate |

### Partial Support / Workarounds

| Feature | Handling |
|---------|----------|
| Named captures | Converted to numbered; mapping returned |
| dotAll mode | `.` replaced with `[\s\S]` when enabled |
| Flag modifiers `(?i:...)` | Flags applied during transformation |
| `\K` directive | Warning issued; full match returned |

## Platform Compatibility

This library generates patterns compatible with both:
- Erlang's `re` module (PCRE)
- JavaScript's `RegExp`

Run tests on both targets:

```sh
gleam test --target erlang
gleam test --target javascript
```

## Error Handling

```gleam
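import gleam/io
import onigleam

case onigleam.convert("(?>atomic)") {
  Ok(result) -> io.println(result.pattern)
  Error(err) -> {
    // format_error turns the error into a human-readable message
    let message = onigleam.format_error(err)
    // "Atomic groups are not supported. ..."
    io.println_error(message)
  }
}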
```
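
When converting many patterns at once, for example every rule in a grammar, it can be useful to collect failures instead of stopping at the first one. The following is a sketch under that assumption: the pattern list is made up for illustration, and only `convert` and `format_error` come from this library.

```gleam
import gleam/int
import gleam/io
import gleam/list
import gleam/result
import onigleam

pub fn main() {
  // Illustrative patterns; the second uses an atomic group, which is unsupported
  let patterns = ["\\bfn\\b", "(?>atomic)", "(?<name>[a-z_]\\w*)"]

  // Convert each pattern, keeping the source pattern next to its formatted
  // error, then split the results into successes and failures
  let #(converted, failures) =
    patterns
    |> list.map(fn(pattern) {
      onigleam.convert(pattern)
      |> result.map_error(fn(err) { #(pattern, onigleam.format_error(err)) })
    })
    |> result.partition

  io.println(int.to_string(list.length(converted)) <> " patterns converted")
  list.each(failures, fn(failure) {
    let #(pattern, message) = failure
    io.println_error(pattern <> ": " <> message)
  })
}
```

Note that `result.partition` returns both lists in reverse order relative to the input, so reverse them if the original order matters.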

## Development

```sh
gleam test                      # Test on the default target
gleam test --target javascript  # Test on JavaScript target
gleam test --target erlang      # Test on Erlang target
```

Further documentation can be found at <https://hexdocs.pm/onigleam>.

## License

MIT License. See [LICENSE](LICENSE) for details.