
# Expath

[![Hex.pm](https://img.shields.io/hexpm/v/expath.svg)](https://hex.pm/packages/expath)
[![Documentation](https://img.shields.io/badge/docs-hexdocs-blue.svg)](https://hexdocs.pm/expath)
[![CI](https://github.com/wearecococo/expath/workflows/CI/badge.svg)](https://github.com/wearecococo/expath/actions)

**Lightning-fast XML parsing and XPath querying for Elixir, powered by Rust NIFs.**

Expath provides blazing-fast XML processing through Rust's battle-tested `sxd-document` and `sxd-xpath` libraries, delivering **2-10x performance improvements** over SweetXml in the project's benchmarks.

## ✨ Key Features

- **🚀 Blazing Fast**: 2-10x faster than SweetXml with Rust-powered NIFs
- **🔄 Parse-Once, Query-Many**: Efficient document reuse for multiple XPath queries
- **🛡️ Battle-Tested**: Built on proven Rust XML libraries (sxd-document, sxd-xpath)
- **🎯 Simple API**: Clean, intuitive interface with comprehensive documentation
- **⚡ Thread-Safe**: Safe concurrent access to parsed documents
- **🔧 No External Parsers**: No external XML parsers or system XML libraries required

## 🚀 Quick Start

### Installation

Add `expath` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:expath, "~> 0.1.0"}
  ]
end
```

Then run:

```bash
mix deps.get
mix deps.compile
```

### Basic Usage

```elixir
# Simple XPath queries
xml = """
<library>
  <book id="1">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
  </book>
  <book id="2">
    <title>1984</title>
    <author>George Orwell</author>
  </book>
</library>
"""

# Extract all book titles
{:ok, titles} = Expath.select(xml, "//title/text()")
# titles => ["The Great Gatsby", "1984"]

# Find a specific book
{:ok, [title]} = Expath.select(xml, "//book[@id='1']/title/text()")
# title => "The Great Gatsby"

# Count books (the count comes back as a string inside a list)
{:ok, [count]} = Expath.select(xml, "count(//book)")
# count => "2"
```

### Parse-Once, Query-Many (Recommended for Multiple Queries)

```elixir
# Parse document once
{:ok, doc} = Expath.new(xml)

# Run multiple queries efficiently
{:ok, titles} = Expath.query(doc, "//title/text()")
{:ok, authors} = Expath.query(doc, "//author/text()")
{:ok, book_count} = Expath.query(doc, "count(//book)")

# No explicit cleanup: the parsed document is freed once nothing references it and it is garbage-collected
```

## 📊 Performance Benchmarks

Real-world performance comparison with SweetXml across different document sizes:

| Document Size   | Speedup vs SweetXml | Typical Use Case                 |
|-----------------|---------------------|----------------------------------|
| Small (644 B)   | **2-3x faster**     | API responses, config files      |
| Medium (5.6 KB) | **2.3x faster**     | RSS feeds, small datasets        |
| Large (904 KB)  | **8-10x faster**    | Large documents, bulk processing |

### Benchmark Results Summary

```
*** Large XML Performance ***
Expath (Rust NIFs)    78.27 iterations/sec (12.78 ms avg)
SweetXml               7.77 iterations/sec (128.64 ms avg)

Comparison: Expath is 10.07x faster
```

Run your own benchmarks:
```bash
mix run bench/benchmark.exs
```

## 📖 API Reference

### Core Functions

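#### `Expath.select/2` - Single Query
Perfect for one-off XPath queries. The XML is parsed on every call, so use `new/1` with `query/2` when running several queries against the same document.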

```elixir
Expath.select(xml_string, xpath_expression)
# Returns: {:ok, results} | {:error, reason}
```

#### `Expath.new/1` - Parse Document
Creates a reusable document for multiple queries.

```elixir
{:ok, doc} = Expath.new(xml_string)
# Returns: {:ok, %Expath.Document{}} | {:error, reason}
```

#### `Expath.query/2` - Query Parsed Document
Query a previously parsed document.

```elixir
{:ok, results} = Expath.query(document, xpath_expression)
# Returns: {:ok, results} | {:error, reason}
```
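When a query depends on a successful parse, `with` lets both steps share one error path. A minimal sketch using only `new/1` and `query/2` as documented above; the `BookQueries` module and `titles_and_authors/1` function are hypothetical names:

```elixir
defmodule BookQueries do
  # Hypothetical helper: any {:error, reason} from parsing or querying
  # falls through `with` unchanged and is returned to the caller.
  def titles_and_authors(xml) do
    with {:ok, doc} <- Expath.new(xml),
         {:ok, titles} <- Expath.query(doc, "//title/text()"),
         {:ok, authors} <- Expath.query(doc, "//author/text()") do
      # Pair each title with the author at the same position
      {:ok, Enum.zip(titles, authors)}
    end
  end
end
```

Pairing by position assumes every `<book>` has both a title and an author.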

### XPath Support

Expath supports the full XPath 1.0 specification:

```elixir
# Node selection
Expath.select(xml, "//book")                    # All book elements
Expath.select(xml, "/library/book[1]")          # First book
Expath.select(xml, "//book[@id='1']")           # Book with id="1"

# Text extraction
Expath.select(xml, "//title/text()")            # All title text
Expath.select(xml, "//book/@id")                # All id attributes

# Functions
Expath.select(xml, "count(//book)")             # Count books
Expath.select(xml, "//book[position()=1]")      # First book
Expath.select(xml, "//book[contains(@class,'fiction')]") # Contains filter

# Complex expressions
Expath.select(xml, "//book[price > 10]/title/text()") # Conditional selection
```
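The `price` and `class` predicates above assume elements and attributes that the Basic Usage document does not have. A sketch with a hypothetical catalog that does, to show how those predicates behave:

```elixir
# Hypothetical catalog with a `class` attribute and a <price> element
catalog = """
<library>
  <book id="1" class="fiction classic"><title>The Great Gatsby</title><price>12.50</price></book>
  <book id="2" class="fiction"><title>1984</title><price>8.99</price></book>
  <book id="3" class="reference"><title>XPath Guide</title><price>25.00</price></book>
</library>
"""

# `price > 10` converts each <price> text to a number before comparing:
# selects the titles of books 1 and 3
{:ok, pricey} = Expath.select(catalog, "//book[price > 10]/title/text()")

# contains/2 is a substring test on the attribute value:
# selects the titles of books 1 and 2
{:ok, fiction} = Expath.select(catalog, "//book[contains(@class, 'fiction')]/title/text()")
```

Because `contains/2` matches substrings, a `class="nonfiction"` book would also match; the XPath 1.0 idiom `contains(concat(' ', @class, ' '), ' fiction ')` gives a whole-word test instead.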

## Error Handling

Expath returns tagged `{:error, reason}` tuples:

```elixir
# Invalid XML (reported when select/2 parses the input)
{:error, :invalid_xml} = Expath.select("<root><unclosed>", "/*")

# Invalid XPath expression
{:error, :invalid_xpath} = Expath.select(xml, "//[invalid")

# XPath evaluation errors (doc is a document returned by Expath.new/1)
{:error, :xpath_error} = Expath.query(doc, "unknown-function()")
```
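In application code these tuples are usually matched with `case`. A sketch that falls back to a default for malformed input but raises for a broken expression; `SafeXPath` and `first_text/3` are hypothetical, and only the error atoms listed above are assumed:

```elixir
defmodule SafeXPath do
  require Logger

  # Hypothetical helper: returns the first result, or `default` when there
  # is no match or the XML itself is malformed.
  def first_text(xml, xpath, default \\ nil) do
    case Expath.select(xml, xpath) do
      {:ok, [first | _]} ->
        first

      {:ok, []} ->
        default

      {:error, :invalid_xml} ->
        Logger.warning("skipping malformed XML document")
        default

      {:error, reason} ->
        # :invalid_xpath or :xpath_error usually point at a bug in the expression
        raise ArgumentError, "XPath #{inspect(xpath)} failed: #{inspect(reason)}"
    end
  end
end
```

Treating bad input as recoverable and bad expressions as fatal is one design choice among several; a caller could just as well propagate every `{:error, reason}` unchanged.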

## Performance

Expath is designed for high-performance XML processing:

- **Native Speed**: Parsing and XPath evaluation run as compiled Rust code inside NIFs
- **Zero-Copy**: Efficient string handling between Elixir and Rust
- **Resource Caching**: Parse once, query many times without re-parsing
- **Memory Efficient**: Automatic memory management via Erlang garbage collection

### Performance Example

```elixir
# Large XML document
xml = File.read!("large_document.xml")

# Parse once (expensive operation)
{:ok, doc} = Expath.new(xml)

# Multiple queries (very fast - no re-parsing)
Enum.each(1..1000, fn _i ->
  {:ok, _results} = Expath.query(doc, "//some/xpath")
end)
```
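To see the parse-once effect on your own machine, a rough comparison with Erlang's `:timer.tc/1` (which returns `{microseconds, result}`) is enough. This sketch builds a synthetic document so it runs without a file; it is not the project's `bench/benchmark.exs`, and the document size, XPath, and run count are arbitrary:

```elixir
# Synthetic document: 5,000 <item> elements (arbitrary size for illustration)
items = Enum.map_join(1..5_000, fn i -> ~s(<item id="#{i}"><name>Item #{i}</name></item>) end)
xml = "<items>" <> items <> "</items>"
xpath = "//item[@id='2500']/name/text()"
runs = 100

# Re-parse the XML on every call with select/2
{reparse_us, _} =
  :timer.tc(fn ->
    Enum.each(1..runs, fn _ -> {:ok, _} = Expath.select(xml, xpath) end)
  end)

# Parse once, then query the cached document
{cached_us, _} =
  :timer.tc(fn ->
    {:ok, doc} = Expath.new(xml)
    Enum.each(1..runs, fn _ -> {:ok, _} = Expath.query(doc, xpath) end)
  end)

IO.puts("select/2 x #{runs}:        #{div(reparse_us, 1000)} ms")
IO.puts("new/1 + query/2 x #{runs}: #{div(cached_us, 1000)} ms")
```

For numbers you intend to publish, use a benchmarking tool with warm-up and repeated samples rather than a single `:timer.tc/1` run.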

## Platform Support

Expath should build on any platform where both Rust and Erlang/OTP are available, including:

- **Linux** (x86_64, aarch64)
- **macOS** (Intel, Apple Silicon)
- **Windows** (x86_64)

### Apple Silicon (M1/M2) Setup

Expath includes special configuration for Apple Silicon Macs. If you encounter linking issues, ensure you have:

1. Native Erlang installation (not x86_64 via Rosetta)
2. Native Rust toolchain for aarch64-apple-darwin

The included Cargo configuration handles the necessary linker flags automatically.
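Point 1 can be checked from `iex`: `:erlang.system_info(:system_architecture)` reports the architecture the VM was built for, as a charlist. A native build on Apple Silicon reports a triple starting with `aarch64-apple-darwin`, while an x86_64 build running under Rosetta starts with `x86_64-apple-darwin`. For point 2, `rustc -vV` prints the toolchain's `host:` line. A small sketch of the Erlang check:

```elixir
# Architecture the running Erlang VM was compiled for (charlist -> string)
arch = :erlang.system_info(:system_architecture) |> List.to_string()

if String.starts_with?(arch, "aarch64-apple-darwin") do
  IO.puts("Native Apple Silicon Erlang: #{arch}")
else
  IO.puts("Not a native arm64 macOS Erlang build: #{arch}")
end
```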

## Examples

### RSS Feed Processing

```elixir
defmodule RSSProcessor do
  def process_feed(rss_xml) do
    {:ok, doc} = Expath.new(rss_xml)

    {:ok, titles} = Expath.query(doc, "//item/title/text()")
    {:ok, links} = Expath.query(doc, "//item/link/text()")
    {:ok, descriptions} = Expath.query(doc, "//item/description/text()")

    # Zip the three lists positionally; assumes every <item> has all three fields
    [titles, links, descriptions]
    |> Enum.zip()
    |> Enum.map(fn {title, link, description} ->
      %{title: title, link: link, description: description}
    end)
  end
end
```

### Configuration File Parsing

```elixir
defmodule ConfigParser do
  def parse_config(xml_config) do
    {:ok, doc} = Expath.new(xml_config)

    # Each query returns a list; destructure the single expected value
    {:ok, [database_host]} = Expath.query(doc, "//database/@host")
    {:ok, [database_port]} = Expath.query(doc, "//database/@port")
    {:ok, features} = Expath.query(doc, "//features/feature/@name")

    %{
      database: %{host: database_host, port: database_port},
      features: features
    }
  end
end
```

### Data Extraction Pipeline

```elixir
defmodule DataExtractor do
  def extract_products(xml_data) do
    {:ok, doc} = Expath.new(xml_data)

    # Run the queries concurrently; all tasks share the same parsed document
    tasks = [
      Task.async(fn -> Expath.query(doc, "//product/@id") end),
      Task.async(fn -> Expath.query(doc, "//product/name/text()") end),
      Task.async(fn -> Expath.query(doc, "//product/price/text()") end),
      Task.async(fn -> Expath.query(doc, "//product/category/text()") end)
    ]

    [ids, names, prices, categories] =
      tasks
      |> Enum.map(&Task.await/1)
      |> Enum.map(fn {:ok, results} -> results end)

    [ids, names, prices, categories]
    |> Enum.zip()
    |> Enum.map(fn {id, name, price, category} ->
      %{id: id, name: name, price: price, category: category}
    end)
  end
end
```
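Zipping whole-document result lists, as in the examples above, assumes every `<product>` has every field; a single missing `<category>` shifts all later values. A sketch of an alternative that queries each product by position. It assumes, based on the `count(//book)` example, that `count()` comes back as a one-element list holding a numeric string; `AlignedExtractor` is a hypothetical module:

```elixir
defmodule AlignedExtractor do
  # Field name => XPath relative to one <product>
  @fields [id: "@id", name: "name/text()", price: "price/text()", category: "category/text()"]

  def extract_products(xml_data) do
    {:ok, doc} = Expath.new(xml_data)
    {:ok, [count]} = Expath.query(doc, "count(//product)")
    # Integer.parse/1 accepts both "3" and "3.0"
    {n, _rest} = Integer.parse(count)

    # 1..n//1 is empty when n == 0
    for i <- 1..n//1 do
      Map.new(@fields, fn {key, path} ->
        # (//product)[i] is the i-th product in document order
        {:ok, values} = Expath.query(doc, "(//product)[#{i}]/#{path}")
        # A missing field yields [] and therefore nil
        {key, List.first(values)}
      end)
    end
  end
end
```

This issues one query per field per product, trading some speed for correctness on irregular data; each query still reuses the parsed document instead of re-parsing.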

## Development

### Prerequisites

- Elixir 1.18 or later
- Erlang/OTP 27 or later
- Rust 1.70 or later
- C compiler (gcc, clang, or MSVC)

### Building from Source

```bash
git clone https://github.com/wearecococo/expath.git
cd expath
mix deps.get
mix compile
```

### Running Tests

```bash
mix test
```

### Building Documentation

```bash
mix docs
```

### Docker Development

For cross-platform testing or if you prefer containerized development, Expath includes comprehensive Docker support:

#### Quick Start with Docker

```bash
# Run all tests in Linux container
./scripts/docker-test.sh

# Or use docker-compose for specific tasks
docker-compose run test
docker-compose run benchmark
docker-compose run quality
```

#### Available Docker Services

- **`dev`**: Development environment with all dependencies
- **`test`**: Run the full test suite
- **`benchmark`**: Execute performance benchmarks
- **`quality`**: Run code quality checks (Credo)

#### Docker Commands

```bash
# Build and test everything
docker-compose up test

# Run interactive development shell
docker-compose run dev iex -S mix

# Execute benchmarks
docker-compose run benchmark

# Check code quality
docker-compose run quality

# Clean up containers
docker-compose down --volumes
```

#### Multi-Architecture Testing

The Docker setup supports testing on different architectures:

```bash
# Test on current architecture
docker-compose run test

# Build for specific platform (requires BuildKit)
DOCKER_PLATFORM=linux/amd64 docker-compose run test
```

This is particularly useful for ensuring your NIFs work correctly across different platforms before deployment.

## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Write tests for your changes
4. Ensure all tests pass (`mix test`)
5. Commit your changes (`git commit -am 'Add some feature'`)
6. Push to the branch (`git push origin my-new-feature`)
7. Create a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Built on top of the excellent [sxd-document](https://crates.io/crates/sxd-document) and [sxd-xpath](https://crates.io/crates/sxd-xpath) Rust crates
- Uses [Rustler](https://github.com/rusterlium/rustler) for safe Elixir-Rust interoperability
- Inspired by the need for high-performance XML processing in Elixir applications

## Changelog

### v0.1.0 (Initial Release)

- High-performance XML parsing via Rust NIFs
- Full XPath 1.0 support
- Parse-once, query-many Document resource API
- Comprehensive error handling
- Apple Silicon support
- Complete test suite and documentation