Skip to main content

doc/ccxt-structure-schema.md

# CCXT Structure Schema Layer

This document defines the first database-facing contract layer for the Elixir
CCXT target. The goal is to derive database schemas from CCXT unified
structures without coupling exchange runtime code to a specific database.

## Layering

The schema path has three layers:

1. Raw exchange payloads
   - Store original websocket messages, REST responses, and WS API responses.
   - Use JSONB or the target database equivalent.
   - Raw retention is mandatory because unified structures intentionally omit
     exchange-specific fields.
2. CCXT unified structures
   - Store normalized `ticker`, `trade`, `order_book`, `ohlcv`, `balance`,
     `order`, `position`, and `liquidation` structures.
   - The contract lives in
     `priv/ccxt_structures/binance_pro_structures.json`.
   - Numeric values are database decimals even when current runtime output is an
     Elixir number or numeric string.
3. Application state
   - Strategy state, risk state, reconciliation markers, and workflow status
     belong outside the CCXT structure layer.

## Manifest

The first manifest is:

```text
priv/ccxt_structures/binance_pro_structures.json
```

It records:

- `sourceMethods`: Elixir methods that emit the structure.
- `canonicalTable`: suggested current-state table.
- `historyTable`: suggested event/history table when the structure is mutable.
- `fields`: CCXT unified keys, runtime accepted types, nullable policy, and
  recommended database type.
- `primaryKey`: suggested logical identity, not a generated migration.

The manifest is intentionally database-agnostic. Ecto schema and migration
templates are generated from it, while runtime metadata is exposed through
`Ccxt.StructureSchema`.

## State And Event Split

Use separate state and event tables for mutable websocket domains:

| Structure | Current State | Event/History |
| --- | --- | --- |
| ticker | `ccxt_tickers` | `ccxt_ticker_events` |
| order book | `ccxt_order_books` | `ccxt_order_book_events` |
| balance | `ccxt_balances` | `ccxt_balance_events` |
| order | `ccxt_orders` | `ccxt_order_events` |
| position | `ccxt_positions` | `ccxt_position_events` |
| trade | none required | `ccxt_trades` |
| ohlcv | none required | `ccxt_ohlcvs` |
| liquidation | none required | `ccxt_liquidations` |

State tables should be upserted by logical identity. Event tables should be
append-oriented and should keep enough exchange metadata to rebuild state.

## Numeric Policy

The runtime currently emits a mix of Elixir numbers and numeric strings,
depending on the parser path and how close the field is to CCXT TypeScript
semantics. The database contract is stricter:

```text
price, amount, cost, volume, fee, pnl, contract values -> decimal
timestamp -> bigint
datetime -> utc_datetime_usec
raw payload / info / nested fees / nested trades -> jsonb
```

Do not store prices, amounts, or balances as floats in the database. Convert at
the boundary into `Decimal` or the database driver's decimal type.

## Raw Payload Retention

Each normalized row should be traceable to raw data.

- If the parser output has `info`, store it in the structure table.
- Always keep a raw event/response table for websocket messages and WS API
  responses.
- Balance structures may not expose a single `info` field for all paths, so raw
  balance responses must be retained in the raw layer.

`Ccxt.RawPayload` provides the runtime contract for that raw layer:

```elixir
{:ok, raw_attrs} =
  Ccxt.RawPayload.normalize_ws_event(payload,
    exchange_id: "binance",
    env: "prod",
    market_type: "spot",
    stream: "btcusdt@miniTicker"
  )

{:ok, %{raw: raw, raw_plan: raw_plan, attrs: attrs, plans: plans}} =
  Ccxt.RawPayload.normalize_structure_with_raw(:ticker, ticker, payload,
    exchange_id: "binance",
    env: "prod",
    market_type: "spot"
  )

attrs.raw_ref_id == raw.id
```

Suggested raw tables:

| Kind | Table | Operation |
| --- | --- | --- |
| websocket event | `ccxt_raw_ws_events` | append-only insert |
| websocket API response | `ccxt_raw_ws_api_responses` | append-only insert |

Raw attrs include:

```text
id
exchange_id
env
market_type
account_type
transport
channel
stream
request_id
message_hash
url
payload
received_at
```

`message_hash` is a deterministic SHA-256 hash over the canonicalized Elixir
payload term. It is intended for dedupe/debugging and does not replace the raw
payload itself.

## Suggested Tables

The manifest currently maps to these canonical table names:

- `ccxt_tickers`
- `ccxt_ticker_events`
- `ccxt_order_books`
- `ccxt_order_book_events`
- `ccxt_trades`
- `ccxt_ohlcvs`
- `ccxt_balances`
- `ccxt_balance_events`
- `ccxt_orders`
- `ccxt_order_events`
- `ccxt_positions`
- `ccxt_position_events`
- `ccxt_liquidations`

The first migration generator should add common columns to every table:

```text
id
exchange_id
market_type
account_type
symbol
inserted_at
updated_at
raw_ref_id
```

Only fields that make sense for the specific structure should be populated.

## Verification

`test/ccxt_pro_binance_test.exs` loads the manifest and validates it against
current parser output for:

- ticker
- trade
- order book
- ohlcv
- balance
- order
- position
- liquidation

This keeps the schema layer tied to the generated/runtime output without adding
database concerns to `Ccxt.Pro.Binance`.

## Generated Templates

The manifest can already emit database templates:

```bash
npm run generateElixirStructureSchemas
npm run assertElixirStructureSchemas
```

Generated files:

```text
priv/ccxt_structures/generated/binance_pro_ecto_schemas.exs
priv/ccxt_structures/generated/binance_pro_migration.exs
```

These are templates, not compiled runtime modules in this package. Copy them
into the consuming Ecto application, review table prefixes/index names, and wire
them to the application repo module before running migrations.

## Runtime Metadata

`Ccxt.StructureSchema` exposes the same manifest contract at runtime:

```elixir
Ccxt.StructureSchema.structures()
Ccxt.StructureSchema.table_names()
Ccxt.StructureSchema.schema_targets(:ticker)
Ccxt.StructureSchema.field_entries(:order)
Ccxt.StructureSchema.required_fields(:ohlcv)
Ccxt.StructureSchema.primary_key(:order)
Ccxt.StructureSchema.source_methods(:order_book)
```

This is the stable lookup layer for a consuming app that wants to derive
schemas, changesets, upsert keys, or persistence routing from CCXT unified
structures without parsing generated `.exs` templates.

## Attr Normalization

`Ccxt.StructureNormalizer` converts CCXT unified output into attrs shaped for
the generated Ecto templates:

```elixir
{:ok, attrs} =
  Ccxt.StructureNormalizer.normalize(:ticker, ticker,
    exchange_id: "binance",
    market_type: "spot"
  )
```

The normalizer:

- converts CCXT camelCase fields to snake_case
- maps CCXT business `id` to `ccxt_id` to avoid Ecto primary-key collisions
- emits decimal fields as strings for downstream Ecto `:decimal` casting
- wraps JSONB list values as `%{"data" => list}` so generated `:map` fields can
  accept them
- expands balance maps into one attrs row per currency

## Persistence Contract

`Ccxt.StructurePersistence` derives insert/upsert plans from the same manifest:

```elixir
{:ok, [state_plan, event_plan]} = Ccxt.StructurePersistence.plans(:ticker)

state_plan.table
# "ccxt_tickers"

state_plan.operation
# :upsert

Ccxt.StructurePersistence.ecto_options(state_plan)
# [
#   conflict_target: [:exchange_id, :market_type, :symbol],
#   on_conflict: {:replace, state_plan.update_fields}
# ]
```

The contract is intentionally Repo-agnostic:

- canonical state tables use `:upsert` when the manifest defines a stable
  logical identity
- history/event tables use append-only `:insert`
- event-only structures such as trades and liquidations use append-only
  `:insert`
- `source_method_plans/1` maps a high-level Pro method such as
  `:stream_order_book` to the persistence plans for its emitted structure

Runnable persistence dry-run example:

```bash
cd elixir
mix run examples/pro_structure_persistence.exs
```

The example streams public ticker updates, prints a raw insert plan first, then
normalizes the ticker with `raw_ref_id` and prints the state/history
persistence plans a consuming application would pass to its database layer.