# hoopdb

[![Project Logo][logo]][logo-large]

*A lightweight, bring-your-own-boards semantic search substrate for Erlang RAG systems*

> ⚠️ **Status: early research preview.** hoopdb is at the inception stage. Core
> architecture decisions are still being settled by measurement (see
> [Status & roadmap](#status--roadmap)), and the API will change without notice
> before 1.0. This is a placeholder release — not yet production-ready.

---

## The pitch

If you want the full barrel, use Benoit Chesneau's
[barrel-db](https://github.com/barrel-db) — a complete, clustered vector database
with RocksDB persistence, embedders, a gateway, and a Raft cluster story.

**If bring-your-own-boards fits your needs, we've got the hoops.**

The metaphor is load-bearing. A barrel is staves (boards) bound by hoops around a
chosen volume. A full vector database ships the assembled cask: its own storage,
its own embedding boundary, its own clustering. **hoopdb ships the hoops** — the
retrieval algorithms and the seams between them — and lets *you* supply the
boards: your persistence (DETS/Mnesia), your embeddings (offline or sidecar), your
process architecture.

## What it aims to be

A small, honest retrieval core for building graph/RAG systems **in Erlang**, with
three retrievers that share one result shape so they fuse cleanly:

- **BM25** — pure-Erlang lexical search. No ML dependency, no embedding runtime.
- **Vector search** — semantic recall over embeddings (HNSW and/or exact
  brute-force k-NN).
- **Hybrid** — reciprocal-rank or linear fusion of the two.
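Because the retrievers share one result shape, fusion can ignore their raw scores and work on ranks alone. The sketch below assumes that shape is a best-first list of `{DocId, Score}` tuples and implements reciprocal rank fusion (RRF): each document earns `1 / (K + Rank)` from every list it appears in. The module name `hoop_rrf_sketch` and the result shape are illustrative assumptions, not hoopdb's actual API.

```erlang
%% Hypothetical module: a sketch of reciprocal rank fusion (RRF).
%% Assumption: each retriever returns [{DocId, Score}] sorted best-first.
-module(hoop_rrf_sketch).
-export([fuse/2]).

%% fuse(Rankings, K) -> [{DocId, FusedScore}], highest FusedScore first.
%% Only each document's rank (1-based position) matters; raw scores from
%% BM25 and vector search live on different scales and are ignored.
-spec fuse([[{term(), number()}]], pos_integer()) -> [{term(), float()}].
fuse(Rankings, K) ->
    Scores = lists:foldl(fun(Ranking, Acc) -> add_ranking(Ranking, K, Acc) end,
                         #{}, Rankings),
    lists:sort(fun({_, A}, {_, B}) -> A >= B end, maps:to_list(Scores)).

%% Add 1 / (K + Rank) for every document in one ranked list.
add_ranking(Ranking, K, Acc0) ->
    {Acc, _NextRank} =
        lists:foldl(
          fun({DocId, _Score}, {AccIn, Rank}) ->
                  Contribution = 1 / (K + Rank),
                  {maps:update_with(DocId, fun(S) -> S + Contribution end,
                                    Contribution, AccIn),
                   Rank + 1}
          end,
          {Acc0, 1}, Ranking),
    Acc.
```

For example, fusing a BM25 list `[{a, 4.2}, {b, 3.1}, {c, 0.5}]` with a vector list `[{b, 0.93}, {c, 0.90}, {d, 0.42}]` at `K = 60` ranks `b` first, because it places well in both lists. `K = 60` is a commonly used default; a linear fusion variant would instead normalise and weight the raw scores.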

Around that core, hoopdb owns the parts a vector database usually hides:
structure-aware Markdown **chunking**, a **fusion** layer, and a thin
**persistence seam** that treats the index as one opaque, rebuildable blob.
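One way to read "one opaque, rebuildable blob": the whole index is an ordinary Erlang term, serialised with `term_to_binary/2` and handed to whatever storage the user chooses. The sketch below assumes the index is a single term (for example a map) and uses a plain file as the board; the module name `hoop_blob_sketch` and the `{hoop_index, 1, Index}` version tag are hypothetical.

```erlang
%% Hypothetical module: a sketch of a blob-style persistence seam.
%% Assumption: the entire index is one Erlang term.
-module(hoop_blob_sketch).
-export([to_blob/1, from_blob/1, save/2, load/1]).

%% Serialise the index into one opaque binary. The compressed option
%% trades a little CPU for a smaller blob.
to_blob(Index) ->
    term_to_binary({hoop_index, 1, Index}, [compressed]).

%% Decode a blob produced by to_blob/1. The version tag lets a future
%% format be detected instead of silently misread.
from_blob(Blob) ->
    case binary_to_term(Blob) of
        {hoop_index, 1, Index} -> {ok, Index};
        _Other -> {error, unknown_format}
    end.

%% One possible board: a plain file.
save(Path, Index) ->
    file:write_file(Path, to_blob(Index)).

load(Path) ->
    case file:read_file(Path) of
        {ok, Blob} -> from_blob(Blob);
        {error, _} = Err -> Err
    end.
```

Because the blob is just a binary, a DETS table or an Mnesia record could hold it under a single key instead of a file, and if the blob is lost the index can be rebuilt from the source corpus.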

## Hard constraints (the spine of the project)

- **Pure Erlang at runtime**, optionally accelerated by a *prebuilt* plain-C SIMD
  NIF. The accelerator is optional; the pure-Erlang path always works.
- **No Rust, Elixir, or Python at the user's build or runtime.** Embedding a
  corpus is treated as an offline, build-time step (like running a compiler) — the
  tool never ships. For live text queries, BM25 needs no embedding runtime at all.
- **A C compiler at build time is fine; a prebuilt `.so`/`.dll` is preferred.**
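An optional accelerator can follow the usual OTP pattern for a NIF with a fallback: define each function in pure Erlang, then try `erlang:load_nif/2` from an `-on_load` hook. If loading succeeds, the NIF replaces the Erlang definition; if it fails, the hook still returns `ok` and the pure-Erlang code stays in place. This is a sketch under assumptions: the module `hoop_distance_sketch`, the application name `hoopdb`, and the library name `hoop_simd` are all hypothetical.

```erlang
%% Hypothetical module: pure-Erlang vector math with an optional NIF.
-module(hoop_distance_sketch).
-export([dot/2, accelerated/0]).
-on_load(init/0).

%% Try to load a prebuilt library from priv/. A failed load is not an
%% error: returning ok keeps the module loaded with its Erlang bodies.
init() ->
    Loaded = case erlang:load_nif(nif_path(), 0) of
                 ok -> true;
                 {error, _Reason} -> false
             end,
    persistent_term:put({?MODULE, nif_loaded}, Loaded),
    ok.

nif_path() ->
    PrivDir = case code:priv_dir(hoopdb) of
                  {error, bad_name} -> "priv";   % application not installed
                  Dir -> Dir
              end,
    filename:join(PrivDir, "hoop_simd").

%% true when the NIF replaced the Erlang definition of dot/2.
accelerated() ->
    persistent_term:get({?MODULE, nif_loaded}, false).

%% Pure-Erlang dot product; a loaded NIF would replace dot/2.
dot(A, B) -> dot(A, B, 0.0).

dot([X | Xs], [Y | Ys], Acc) -> dot(Xs, Ys, Acc + X * Y);
dot([], [], Acc) -> Acc.
```

Keeping a real Erlang body, rather than an `erlang:nif_error/1` stub, is what makes the accelerator genuinely optional: without the `.so`/`.dll`, callers get the same answers, only slower.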

## Who it's for

Erlang/OTP developers building retrieval or RAG over **small, curated corpora**:
think a handful of books, manuals, internal docs, or a knowledge base
(thousands of chunks, not millions). If your corpus is small enough that you'd
rather own a few hundred lines of transparent Erlang than operate a separate
vector-database service, hoopdb is aimed at you.

It is **not** trying to be a clustered, planet-scale vector store. For that, use
the full barrel.

## Status & roadmap

hoopdb is being built measurement-first. Two questions are open and under active
investigation, and the answers will shape the defaults:

1. **Vector path:** at this scale, does an approximate index (HNSW) earn its
   complexity over exact brute-force k-NN, or is brute-force the better default?
   HNSW is treated as a *hypothesis to validate*, not a settled choice.
2. **Retrieval quality:** how should textbook-style Markdown be chunked, and is the
   vector path even necessary as the *primary* retriever — or is BM25 (plus
   hybrid) enough for technical corpora?
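For reference, the exact baseline that HNSW has to beat is short. The sketch below assumes vectors are plain lists of floats, scores every document by cosine similarity, and returns a best-first list of `{DocId, Score}` tuples (an assumed result shape); the module `hoop_knn_sketch` is hypothetical.

```erlang
%% Hypothetical module: exact (brute-force) k-nearest-neighbour search.
-module(hoop_knn_sketch).
-export([search/3]).

%% search(Query, Corpus, K) -> up to K {DocId, Similarity} tuples, best first.
%% Corpus is a list of {DocId, Vector}; every vector has Query's dimension.
%% Cost is one full pass over the corpus plus a sort: O(N * D + N log N).
search(Query, Corpus, K) ->
    QNorm = norm(Query),
    Scored = [{Id, cosine(Query, QNorm, Vec)} || {Id, Vec} <- Corpus],
    Sorted = lists:sort(fun({_, A}, {_, B}) -> A >= B end, Scored),
    lists:sublist(Sorted, K).

%% Cosine similarity, treating a zero-length vector as similarity 0.0.
cosine(Query, QNorm, Vec) ->
    Denom = QNorm * norm(Vec),
    if
        Denom == 0 -> 0.0;
        true -> dot(Query, Vec, 0.0) / Denom
    end.

norm(V) -> math:sqrt(dot(V, V, 0.0)).

dot([X | Xs], [Y | Ys], Acc) -> dot(Xs, Ys, Acc + X * Y);
dot([], [], Acc) -> Acc.
```

Because this search is exact, it can also serve as the ground truth when measuring an HNSW index's recall on the same corpus.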

Deferred for later milestones: a knowledge-graph layer over chunks, quantization
tuning, and any clustering/serving concerns (use the full barrel if you need
those).

## Built on `barrel_vectordb`

hoopdb will likely build upon portions of
[`barrel_vectordb`](https://github.com/barrel-db/barrel_vectordb) (Apache-2.0),
reusing its well-engineered, storage-agnostic pure-Erlang HNSW, distance, and BM25
modules. Attribution is a feature, not a footnote. hoopdb is a deliberately
narrowed assembly of those modules, re-bound around different boards — not a fork
that hides its origins.

## License

Apache-2.0. See [LICENSE.md](LICENSE.md).

[//]: ---Named-Links---

[logo]: priv/images/logo.png
[logo-large]: priv/images/logo-large.png