Skip to main content

README.md

# SchemaOrg

A strictly-typed builder for generating SEO [Schema.org](https://schema.org)
JSON-LD in Elixir and Phoenix applications.

You should not have to memorise the Schema.org vocabulary. This library ships
**1000+ generated struct modules** (`SchemaOrg.Product`, `SchemaOrg.Offer`, …),
one per Schema.org Class. Build a graph with ordinary struct literals — your
editor auto-completes the valid fields and the compiler rejects the rest — then
serialise it with `to_json_ld/1`.

```elixir
%SchemaOrg.Product{
  name: "MacBook Pro",
  offers: %SchemaOrg.Offer{price: 1999.00, price_currency: "USD"}
}
|> SchemaOrg.to_json_ld()
```

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "MacBook Pro",
  "offers": { "@type": "Offer", "price": 1999.0, "priceCurrency": "USD" }
}
```

## Quick Start

```bash
mix deps.get          # fetch jason + ex_doc
mix compile           # compile the library and its generated types
mix test              # run the suite
```

## Commands

| Command | Description |
|---|---|
| `mix compile` | Compile the library and the generated type modules |
| `mix test` | Run the test suite |
| `mix test --failed` | Re-run only previously failing tests |
| `mix precommit` | Format check + compile (warnings as errors) + full test suite |
| `mix schema_org.build_types` | **Maintainer only** — regenerate `lib/schema_org/types/` from the vendored Schema.org graph |
| `mix docs` | Generate HTML documentation with ExDoc |

## Architecture

Two layers, one hand-written and one generated:

- **Runtime API** (`lib/schema_org.ex`, `lib/schema_org/thing.ex`) — hand-written.
  `SchemaOrg.to_json_ld/1` serialises any generated struct (recursively) into a
  `@context`/`@type`-annotated JSON-LD map and encodes it with Jason.
- **Generated types** (`lib/schema_org/types/`) — 1000+ files, one per Schema.org
  Class. Each is a plain struct (every valid property, direct and inherited, is a
  field) plus a `new/0` constructor. Build values with struct literals; a field
  is untyped, so it accepts Schema.org's loose value model directly (a scalar or
  a nested struct; a single value or a list). **Never edit these by hand** — they
  are produced by the code-generation task and overwritten on every run. (See
  [ADR-002](https://github.com/mike-kostov/schema_org/blob/main/docs/decisions/ADR-002-struct-literal-api-over-pipe-setters.md) for
  why building is struct-literal rather than pipe-setter based.)
- **Code generation** (`lib/mix/tasks/schema_org.build_types.ex`) — a
  maintainer-only Mix task that ingests the official Schema.org JSON-LD graph
  (`priv/schemaorg-current-https.jsonld`), maps Properties onto Classes via
  `domainIncludes` and `subClassOf` inheritance, and renders each module through
  an EEx template (`priv/templates/type.ex.eex`).

```
priv/schemaorg-current-https.jsonld ──▶ mix schema_org.build_types ──▶ lib/schema_org/types/*.ex
                                              (EEx template)
```

## Documentation

| Path | Contents |
|---|---|
| `docs/specs/` | Feature specs — one file per capability, updated in place as scope evolves |
| `docs/plans/` | Implementation plans — one file per spec, task-by-task breakdown with acceptance criteria |
| `docs/decisions/` | Architecture Decision Records (ADRs) — immutable; record the *why* behind significant choices |
| `docs/ideas/` | Early problem framing, refined before a spec is written |

Current docs:

- [`docs/specs/01-type-generation.md`](https://github.com/mike-kostov/schema_org/blob/main/docs/specs/01-type-generation.md) — Code-generation pipeline: JSON-LD parsing, EEx template, type module layout — **Implemented**
- [`docs/plans/01-type-generation.md`](https://github.com/mike-kostov/schema_org/blob/main/docs/plans/01-type-generation.md) — Task-by-task breakdown for the above
- [`docs/decisions/ADR-001-build-time-codegen-committed-artifacts.md`](https://github.com/mike-kostov/schema_org/blob/main/docs/decisions/ADR-001-build-time-codegen-committed-artifacts.md) — Build-time generation, committed as artifacts
- [`docs/decisions/ADR-002-struct-literal-api-over-pipe-setters.md`](https://github.com/mike-kostov/schema_org/blob/main/docs/decisions/ADR-002-struct-literal-api-over-pipe-setters.md) — Struct-literal building API (performance)

## Schema.org in Plain Terms

Schema.org is a shared vocabulary that search engines (Google, Bing, Yandex)
understand. When you embed it as JSON-LD in a page, you are telling the crawler
"this page is about a *Product* called *MacBook Pro* that has an *Offer* of
*$1999*". That structured data is what powers rich results — star ratings,
price snippets, breadcrumbs, FAQ accordions.

The vocabulary is a **graph of Types (Classes) and Properties**:

- A **Type** is a thing you can describe — `Product`, `Person`, `Recipe`, `Event`.
- A **Property** is an attribute — `name`, `price`, `author`, `startDate`.
- Each property declares which types it is valid on (`domainIncludes`) and what
  kind of value it expects (`rangeIncludes`).
- Types form an inheritance tree (`subClassOf`): every type ultimately descends
  from `Thing`, so every type has `name`, `description`, `url`, and so on.

Writing this JSON by hand is error-prone — there are ~800 types and ~1500
properties, and putting a property on the wrong type produces silently-invalid
markup. This library turns that graph into typed Elixir so the compiler and your
editor catch mistakes before a crawler ever sees them.

---

## Abbreviations

| Abbreviation | Full name | What it means in this project |
|---|---|---|
| **JSON-LD** | JSON for Linking Data | The JSON-based serialisation of Schema.org that search engines read. The output format this library produces |
| **SEO** | Search Engine Optimisation | The reason structured data exists — richer, higher-ranking search results |
| **DX** | Developer Experience | The guiding goal of this package: typed, autocompletable struct APIs |
| **Class** | Schema.org Class | A describable type (`Product`, `Offer`). Becomes one generated Elixir module |
| **Property** | Schema.org Property | An attribute (`name`, `price`). Becomes one struct field |
| **domainIncludes** | — | The Schema.org relation declaring which Classes a Property is valid on. Drives field→module mapping |
| **rangeIncludes** | — | The Schema.org relation declaring what value types a Property accepts |
| **EEx** | Embedded Elixir | The templating language used to render the generated `.ex` files |