# Stephen
[Run in Livebook](https://livebook.dev/run?url=https%3A%2F%2Fgithub.com%2Fgeorgeguimaraes%2Fstephen%2Fblob%2Fmain%2Flivebook%2Fgetting_started.livemd)
ColBERT-style neural retrieval for Elixir.
Stephen implements late interaction retrieval using per-token embeddings and MaxSim scoring. Instead of compressing text into a single vector, it keeps one embedding per token, enabling fine-grained semantic matching.
## Installation
```elixir
def deps do
  [
    {:stephen, "~> 0.1.0"},
    {:exla, "~> 0.9"}  # optional, for GPU acceleration
  ]
end
```
For GPU acceleration, set EXLA as the default Nx backend:
```elixir
# config/config.exs
config :nx, default_backend: EXLA.Backend
```
## Quick Start
```elixir
# Load encoder (downloads model on first use)
{:ok, encoder} = Stephen.load_encoder()

# Create index and add documents
index = Stephen.new_index(encoder)

index = Stephen.index(encoder, index, [
  {"colbert", "Stephen Colbert hosted The Colbert Report before The Late Show"},
  {"conan", "Conan O'Brien is known for his self-deprecating humor and tall hair"},
  {"seth", "Seth Meyers was head writer at SNL before hosting Late Night"}
])

# Search
results = Stephen.search(encoder, index, "late night comedy")
# => [%{doc_id: "colbert", score: 15.2}, ...]

# Save/load
:ok = Stephen.save_index(index, "my_index")
{:ok, index} = Stephen.load_index("my_index")
```
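Results are plain maps, so standard `Enum` functions work on them. A minimal sketch, assuming only the `doc_id` and `score` keys shown in the output comment above:

```elixir
# Print the top three hits; the result shape is taken from the example output above.
results
|> Enum.take(3)
|> Enum.each(fn %{doc_id: id, score: score} ->
  IO.puts("#{id}: #{Float.round(score, 2)}")
end)
```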
## Why ColBERT?
Traditional dense retrieval compresses each text into a single vector. ColBERT keeps per-token embeddings and matches query tokens to document tokens individually:
1. Each token gets its own embedding
2. For each query token, find the best-matching document token (MaxSim)
3. Sum these maximum similarities
This captures nuanced relevance that single-vector methods miss.
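In tensor terms, MaxSim is a similarity matrix followed by a row-wise max and a sum. A minimal Nx sketch of the scoring step (illustrative only, not Stephen's internal API; it assumes L2-normalized embeddings so the dot product equals cosine similarity):

```elixir
defmodule MaxSimSketch do
  import Nx.Defn

  # q: {query_tokens, dim}, d: {doc_tokens, dim}, both L2-normalized
  defn score(q, d) do
    q
    |> Nx.dot(Nx.transpose(d))    # {query_tokens, doc_tokens} similarity matrix
    |> Nx.reduce_max(axes: [1])   # best document token for each query token
    |> Nx.sum()                   # sum of maxima is the relevance score
  end
end
```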
## Reranking
Use Stephen to rerank candidates from a faster first-stage retriever:
```elixir
# From indexed documents
results = Stephen.rerank(encoder, index, "query", ["doc1", "doc5", "doc12"])

# From raw text (no index needed)
candidates = [
  {"colbert", "Stephen Colbert interviews politicians with satirical wit"},
  {"conan", "Conan O'Brien traveled the world for his travel show"}
]

results = Stephen.rerank_texts(encoder, "political satire comedy", candidates)
```
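In a typical pipeline, a cheap first-stage retriever produces a large candidate set and Stephen re-scores it. A sketch under that assumption; `fetch_candidates` below is a hypothetical stand-in for any first-stage retriever, not part of Stephen:

```elixir
# Hypothetical first-stage retriever; stands in for BM25, trigram search, etc.
fetch_candidates = fn _query, _opts ->
  [
    {"colbert", "Stephen Colbert interviews politicians with satirical wit"},
    {"conan", "Conan O'Brien traveled the world for his travel show"}
  ]
end

query = "political satire comedy"
candidates = fetch_candidates.(query, limit: 100)
top_results = Stephen.rerank_texts(encoder, query, candidates)
```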
## Query Expansion (PRF)
Improve recall with pseudo-relevance feedback:
```elixir
results = Stephen.search_with_prf(encoder, index, "late night hosts")

# Tune expansion parameters
results = Stephen.search_with_prf(encoder, index, query,
  feedback_docs: 5,
  expansion_tokens: 15,
  expansion_weight: 0.3
)
```
PRF expands the query with terms drawn from the top-ranked documents of an initial search, helping retrieve relevant documents that do not contain the exact query terms.
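Conceptually, token-level PRF pools token embeddings from the feedback documents, keeps a handful of them, down-weights them, and appends them to the query's token embeddings. A rough Nx sketch of that idea (not Stephen's implementation; the salience-based token selection is simplified to taking the first few pooled tokens):

```elixir
defmodule PrfSketch do
  # q: {query_tokens, dim}; feedback: list of {doc_tokens, dim} tensors
  # taken from the top-ranked documents of the initial search.
  def expand(q, feedback, expansion_tokens, weight) do
    feedback
    |> Nx.concatenate(axis: 0)                            # pool feedback doc tokens
    |> Nx.slice_along_axis(0, expansion_tokens, axis: 0)  # pick expansion tokens (real impl selects by salience)
    |> Nx.multiply(weight)                                # down-weight expansion terms
    |> then(&Nx.concatenate([q, &1], axis: 0))            # append to the original query tokens
  end
end
```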
## Debugging Scores
Understand why a document scored the way it did:
```elixir
explanation = Stephen.explain(encoder, "satirical comedy", "Colbert is a satirical host")

# Print formatted explanation
explanation |> Stephen.Scorer.format_explanation() |> IO.puts()

# Score: 15.20
#
# Query Token -> Doc Token Similarity
# --------------------------------------------------------
# satirical -> satirical 0.95
# comedy -> host 0.72
# ...
```
This shows which query tokens matched which document tokens and their similarity scores.
## Index Types
| Index | Use Case |
|-------|----------|
| `Stephen.Index` | Small-medium collections, fast updates |
| `Stephen.Plaid` | Larger collections, sub-linear search |
| `Stephen.Index.Compressed` | Memory-constrained, 4-32x compression |
## Documentation
Try the interactive [Livebook tutorial](livebook/getting_started.livemd) to explore Stephen hands-on.
See the [guides](guides/) for detailed documentation:
- [Architecture](guides/architecture.md): how ColBERT and Stephen work
- [Index Types](guides/index_types.md): choosing and configuring indexes
- [Compression](guides/compression.md): residual quantization for memory efficiency
- [Chunking](guides/chunking.md): handling long documents
- [Configuration](guides/configuration.md): encoder and index options
## License
Copyright (c) 2025 George Guimarães
Licensed under the MIT License. See [LICENSE](LICENSE) for details.