# Schema Design Guide

This guide covers schema design principles, field types, and best practices for building efficient search applications with TantivyEx.

## Related Documentation

- **[Document Operations Guide](documents.md)** - Learn how to create documents that conform to your schema
- **[Indexing Guide](indexing.md)** - Index documents based on your schema design
- **[Search Guide](search.md)** - Query documents using schema-aware searches
- **[Tokenizers Guide](tokenizers.md)** - Choose the right text processing for your fields

## Table of Contents

- [Understanding Schemas](#understanding-schemas)
- [Schema Basics](#schema-basics)
- [Field Types Reference](#field-types-reference)
- [Schema Design Patterns](#schema-design-patterns)
- [Performance Considerations](#performance-considerations)
- [Migration Strategies](#migration-strategies)
- [Best Practices](#best-practices)
- [Troubleshooting](#troubleshooting)
- [Field Options Deep Dive](#field-options-deep-dive)
- [Common Pitfalls](#common-pitfalls)
- [Real-World Examples](#real-world-examples)
- [Summary](#summary)

## Understanding Schemas

### What is a Schema?

A **schema** in TantivyEx is a blueprint that defines:

1. **Document Structure**: What fields your documents contain
2. **Field Types**: How each field's data should be interpreted (text, numbers, dates, etc.)
3. **Indexing Strategy**: How each field should be processed for search
4. **Storage Options**: Whether field values should be retrievable from search results

Think of a schema as a contract between your application and the search engine: it tells TantivyEx exactly how to index, store, and interpret each field of your data.

### Schema Design Philosophy

When designing a schema, consider these key principles:

- **Search Requirements**: What types of queries will you run?
- **Performance Needs**: What are your speed and memory constraints?
- **Data Characteristics**: What types of data are you working with?
- **Future Growth**: How might your requirements evolve?

## Schema Basics

### Creating a Schema

```elixir
alias TantivyEx.Schema

# Create a new schema
{:ok, schema} = Schema.new()

# Add fields to the schema
{:ok, schema} = Schema.add_text_field(schema, "title", :text_stored)
{:ok, schema} = Schema.add_text_field(schema, "body", :text)
{:ok, schema} = Schema.add_u64_field(schema, "timestamp", :fast_stored)
{:ok, schema} = Schema.add_f64_field(schema, "price", :fast_stored)
{:ok, schema} = Schema.add_facet_field(schema, "category", :facet)
```
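
When a schema has many fields, the repeated `{:ok, schema} = ...` rebinding can be folded into a reduction over a field list. The following is a minimal sketch, not part of TantivyEx: the `SchemaSketch.Builder` module, the `{name, type, option}` tuple shape, and the injectable `schema_mod` argument are assumptions made for illustration. It relies only on the `Schema.new/0` and `Schema.add_*_field/3` calls shown above.

```elixir
defmodule SchemaSketch.Builder do
  # Hypothetical helper: maps an illustrative type tag to the Schema
  # function name used in this guide.
  @adders %{
    text: :add_text_field,
    u64: :add_u64_field,
    f64: :add_f64_field,
    facet: :add_facet_field
  }

  # `schema_mod` defaults to TantivyEx.Schema. Passing another module that
  # exposes the same functions lets the builder be tested in isolation.
  def build(fields, schema_mod \\ TantivyEx.Schema) do
    with {:ok, schema} <- schema_mod.new() do
      Enum.reduce_while(fields, {:ok, schema}, fn {name, type, option}, {:ok, acc} ->
        case Map.fetch(@adders, type) do
          {:ok, fun} ->
            case apply(schema_mod, fun, [acc, name, option]) do
              {:ok, next} -> {:cont, {:ok, next}}
              {:error, reason} -> {:halt, {:error, {name, reason}}}
            end

          :error ->
            {:halt, {:error, {name, :unknown_type}}}
        end
      end)
    end
  end
end

# Usage (requires TantivyEx to be available):
#
#   fields = [
#     {"title", :text, :text_stored},
#     {"body", :text, :text},
#     {"timestamp", :u64, :fast_stored}
#   ]
#   {:ok, schema} = SchemaSketch.Builder.build(fields)
```

Stopping at the first failing field and returning its name keeps the error close to the offending definition, which is harder to achieve with a chain of asserted `{:ok, schema}` matches.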

### Schema Introspection

```elixir
# Get all field names
{:ok, fields} = Schema.get_field_names(schema)
# Returns: ["title", "body", "timestamp", "price", "category"]

# Get specific field information
{:ok, field_info} = Schema.get_field_info(schema, "title")
# Returns detailed information about the field configuration
```

### Schema Validation

Always validate your schema before creating an index:

```elixir
defmodule MyApp.SchemaValidator do
  alias TantivyEx.Schema

  def validate_schema(schema) do
    with {:ok, fields} <- Schema.get_field_names(schema),
         :ok <- check_required_fields(fields),
         :ok <- check_field_types(schema, fields) do
      {:ok, schema}
    else
      {:error, reason} -> {:error, "Schema validation failed: #{reason}"}
    end
  end

  defp check_required_fields(fields) do
    required = ["title", "content"]
    missing = required -- fields

    case missing do
      [] -> :ok
      # validate_schema/1 already adds the "Schema validation failed" prefix
      missing_fields -> {:error, "Missing required fields: #{inspect(missing_fields)}"}
    end
  end

  defp check_field_types(schema, fields) do
    # Validate that each field has appropriate type for its intended use
    Enum.reduce_while(fields, :ok, fn field, acc ->
      case Schema.get_field_info(schema, field) do
        {:ok, _info} -> {:cont, acc}
        {:error, reason} -> {:halt, {:error, "Invalid field #{field}: #{reason}"}}
      end
    end)
  end
end
```

## Field Types Reference

### Text Fields

Text fields are used for full-text search and support various indexing options.

#### Options

- `:text` - Indexed for search only
- `:text_stored` - Indexed and stored (retrievable)
- `:stored` - Stored only (not searchable)
- `:fast` - Indexed and optimized for fast access
- `:fast_stored` - Indexed with positions for phrase queries, stored, and optimized for fast access

#### Examples

```elixir
# Full-text searchable title that can be retrieved
{:ok, schema} = Schema.add_text_field(schema, "title", :text_stored)

# Full-text searchable content (not stored to save space)
{:ok, schema} = Schema.add_text_field(schema, "content", :text)

# Metadata that's stored but not searchable
{:ok, schema} = Schema.add_text_field(schema, "metadata", :stored)
```

#### With Custom Tokenizers

```elixir
# Use simple tokenizer for exact matching
{:ok, schema} = Schema.add_text_field_with_tokenizer(
  schema,
  "product_code",
  :text_stored,
  "simple"
)

# Use whitespace tokenizer for basic word splitting
{:ok, schema} = Schema.add_text_field_with_tokenizer(
  schema,
  "tags",
  :text,
  "whitespace"
)
```

### Numeric Fields

Numeric fields support range queries and sorting.

#### U64 Fields (Unsigned 64-bit integers)

```elixir
# Indexed timestamp for range queries
{:ok, schema} = Schema.add_u64_field(schema, "created_at", :indexed)

# Stored and indexed user ID
{:ok, schema} = Schema.add_u64_field(schema, "user_id", :indexed_stored)

# Stored-only view count (not queryable)
{:ok, schema} = Schema.add_u64_field(schema, "view_count", :stored)
```
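
Because a u64 field cannot hold negative values, timestamps stored in one need a non-negative representation. A minimal sketch, assuming timestamps are stored as Unix seconds (this guide does not prescribe an encoding, and `TimestampSketch` is a hypothetical helper):

```elixir
defmodule TimestampSketch do
  # Converts a DateTime to Unix seconds suitable for a u64 field.
  # Dates before 1970-01-01 produce negative values, which a u64 cannot
  # represent; such data would need an i64 field instead.
  def to_u64(%DateTime{} = dt) do
    case DateTime.to_unix(dt, :second) do
      seconds when seconds >= 0 -> {:ok, seconds}
      _negative -> {:error, :before_unix_epoch}
    end
  end
end
```

Returning an error tuple rather than clamping to zero makes out-of-range data visible at ingestion time.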

#### I64 Fields (Signed 64-bit integers)

```elixir
# Temperature readings (can be negative)
{:ok, schema} = Schema.add_i64_field(schema, "temperature", :indexed)

# Profit/loss calculations
{:ok, schema} = Schema.add_i64_field(schema, "profit", :indexed_stored)
```

#### F64 Fields (64-bit floating point)

```elixir
# Product prices for range filtering
{:ok, schema} = Schema.add_f64_field(schema, "price", :indexed)

# Geographic coordinates
{:ok, schema} = Schema.add_f64_field(schema, "latitude", :indexed_stored)
{:ok, schema} = Schema.add_f64_field(schema, "longitude", :indexed_stored)

# Rating scores
{:ok, schema} = Schema.add_f64_field(schema, "rating", :indexed)
```

### Binary Fields

Binary fields store arbitrary byte data.

```elixir
# Store file content
{:ok, schema} = Schema.add_bytes_field(schema, "file_data", :stored)

# Store and index binary checksums
{:ok, schema} = Schema.add_bytes_field(schema, "checksum", :indexed_stored)
```

### Date Fields

Date fields provide optimized date/time handling.

```elixir
# Article publication date
{:ok, schema} = Schema.add_date_field(schema, "published_at", :indexed)

# User registration with storage
{:ok, schema} = Schema.add_date_field(schema, "registered_at", :indexed_stored)
```

### JSON Fields

JSON fields store structured data as JSON objects.

```elixir
# Store user preferences
{:ok, schema} = Schema.add_json_field(schema, "preferences", :stored)

# Store and index configuration
{:ok, schema} = Schema.add_json_field(schema, "config", :indexed_stored)
```

### IP Address Fields

Specialized fields for IPv4 and IPv6 addresses.

```elixir
# Client IP addresses
{:ok, schema} = Schema.add_ip_addr_field(schema, "client_ip", :indexed)

# Server addresses with storage
{:ok, schema} = Schema.add_ip_addr_field(schema, "server_ip", :indexed_stored)
```

### Facet Fields

Facet fields enable hierarchical categorization and faceted search.

```elixir
# Product categories (e.g., "/electronics/phones/smartphones")
{:ok, schema} = Schema.add_facet_field(schema, "category", :indexed)

# Geographic hierarchy with storage
{:ok, schema} = Schema.add_facet_field(schema, "location", :indexed_stored)
```
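
Facet values are hierarchical paths such as `/electronics/phones/smartphones`. The sketch below builds such a path from a list of category segments. `FacetSketch.path/1` is a hypothetical helper, not a TantivyEx function, and rejecting segments that contain `/` is a simplifying assumption.

```elixir
defmodule FacetSketch do
  # Joins segments into a facet path with a leading slash.
  def path(segments) when is_list(segments) and segments != [] do
    if Enum.any?(segments, &invalid_segment?/1) do
      {:error, :invalid_segment}
    else
      {:ok, "/" <> Enum.join(segments, "/")}
    end
  end

  def path(_other), do: {:error, :invalid_segment}

  # A segment must be a non-empty string without a path separator.
  defp invalid_segment?(seg) do
    not is_binary(seg) or seg == "" or String.contains?(seg, "/")
  end
end
```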

## Schema Design Patterns

### E-commerce Product Catalog

```elixir
{:ok, schema} = Schema.new()

# Basic product information
{:ok, schema} = Schema.add_text_field(schema, "name", :text_stored)
{:ok, schema} = Schema.add_text_field(schema, "description", :text)
{:ok, schema} = Schema.add_text_field(schema, "brand", :text_stored)

# Pricing and inventory
{:ok, schema} = Schema.add_f64_field(schema, "price", :indexed)
{:ok, schema} = Schema.add_u64_field(schema, "stock_quantity", :indexed)

# Categories and attributes
{:ok, schema} = Schema.add_facet_field(schema, "category", :indexed)
{:ok, schema} = Schema.add_json_field(schema, "attributes", :stored)

# Ratings and reviews
{:ok, schema} = Schema.add_f64_field(schema, "average_rating", :indexed)
{:ok, schema} = Schema.add_u64_field(schema, "review_count", :indexed)

# Metadata
{:ok, schema} = Schema.add_date_field(schema, "created_at", :indexed)
{:ok, schema} = Schema.add_date_field(schema, "updated_at", :indexed)
```

### Blog/CMS System

```elixir
{:ok, schema} = Schema.new()

# Content fields
{:ok, schema} = Schema.add_text_field(schema, "title", :text_stored)
{:ok, schema} = Schema.add_text_field(schema, "content", :text)
{:ok, schema} = Schema.add_text_field(schema, "excerpt", :text_stored)
{:ok, schema} = Schema.add_text_field(schema, "slug", :stored)

# Author information
{:ok, schema} = Schema.add_text_field(schema, "author_name", :text_stored)
{:ok, schema} = Schema.add_u64_field(schema, "author_id", :indexed)

# Categorization
{:ok, schema} = Schema.add_facet_field(schema, "category", :indexed)
{:ok, schema} = Schema.add_text_field_with_tokenizer(
  schema, "tags", :text, "whitespace"
)

# Publishing workflow
{:ok, schema} = Schema.add_text_field(schema, "status", :fast)  # text fields take text options, not :indexed
{:ok, schema} = Schema.add_date_field(schema, "published_at", :indexed)
{:ok, schema} = Schema.add_date_field(schema, "created_at", :indexed)
```

### Log Analysis System

```elixir
{:ok, schema} = Schema.new()

# Log entry basics
{:ok, schema} = Schema.add_text_field(schema, "message", :text)
{:ok, schema} = Schema.add_text_field(schema, "level", :fast)  # text fields take text options, not :indexed
{:ok, schema} = Schema.add_date_field(schema, "timestamp", :indexed)

# Source information
{:ok, schema} = Schema.add_text_field(schema, "service", :fast)
{:ok, schema} = Schema.add_text_field(schema, "host", :fast)
{:ok, schema} = Schema.add_ip_addr_field(schema, "client_ip", :indexed)

# Structured data
{:ok, schema} = Schema.add_json_field(schema, "metadata", :stored)
{:ok, schema} = Schema.add_u64_field(schema, "request_id", :indexed)

# Performance metrics
{:ok, schema} = Schema.add_f64_field(schema, "response_time", :indexed)
{:ok, schema} = Schema.add_u64_field(schema, "status_code", :indexed)
```

## Performance Considerations

### Field Storage Strategy

**Store only what you need to retrieve:**

- Use `:text` instead of `:text_stored` for large content that you don't need to display
- Store frequently accessed fields for better retrieval performance
- Consider the trade-off between index size and retrieval speed

**Example:**

```elixir
# Good: Store title for display, don't store body (search only)
{:ok, schema} = Schema.add_text_field(schema, "title", :text_stored)
{:ok, schema} = Schema.add_text_field(schema, "body", :text)

# Bad: Storing large content unnecessarily
{:ok, schema} = Schema.add_text_field(schema, "body", :text_stored)  # Bloats index
```

### Indexing Strategy

**Index only queryable fields:**

- Don't index fields that are only used for display
- Use appropriate numeric types for range queries
- Consider facet fields for categorical data

**Example:**

```elixir
# Good: Index searchable and filterable fields
{:ok, schema} = Schema.add_text_field(schema, "searchable_content", :text)
{:ok, schema} = Schema.add_u64_field(schema, "category_id", :indexed)
{:ok, schema} = Schema.add_text_field(schema, "display_only", :stored)

# Bad: Indexing display-only data
{:ok, schema} = Schema.add_text_field(schema, "display_only", :text)  # Wastes space
```

### Tokenizer Selection

Choose tokenizers based on your search requirements:

- **Default**: Good for most text search scenarios
- **Simple**: For exact matching (product codes, IDs)
- **Whitespace**: For tag-like data where punctuation matters
- **Keyword**: For fields that should be treated as single terms
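
To make the differences concrete, the following pure-Elixir sketch approximates how three tokenization strategies split the same input. It does not call TantivyEx and is not the library's implementation; the function names are hypothetical, and the real tokenizers may differ in details such as lowercasing or token length limits.

```elixir
defmodule TokenizerSketch do
  # Whitespace-style: split on whitespace only, punctuation is preserved.
  def whitespace(text), do: String.split(text)

  # Keyword-style: the whole input is treated as a single term.
  def keyword(text), do: [text]

  # Word-style: lowercase, then split on any run of characters that are
  # neither letters nor digits.
  def words(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
  end
end
```

Running each function over a sample value such as a product code shows quickly whether a query for the full value can still match a single indexed term.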

## Migration Strategies

### Schema Evolution

Tantivy schemas are immutable once an index is created. For schema changes:

1. **Create a new index** with the updated schema
2. **Reindex all documents** into the new index
3. **Switch over** to the new index atomically
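
The switch-over in step 3 can be as simple as swapping a single reference that all readers consult. A minimal sketch, assuming the application resolves the active index path through one process; `IndexPointerSketch` and the versioned directory names are hypothetical:

```elixir
defmodule IndexPointerSketch do
  use Agent

  # Starts the pointer with the path of the index currently in use.
  def start_link(initial_path) do
    Agent.start_link(fn -> initial_path end, name: __MODULE__)
  end

  # Path that searches should currently use.
  def current, do: Agent.get(__MODULE__, & &1)

  # Swaps in the new path in one step and returns the previous one, so the
  # old index can be removed once in-flight searches have finished.
  def switch_to(new_path) do
    Agent.get_and_update(__MODULE__, fn old_path -> {old_path, new_path} end)
  end
end

# Usage:
#   {:ok, _pid} = IndexPointerSketch.start_link("/data/index_v1")
#   old_path = IndexPointerSketch.switch_to("/data/index_v2")
```

Because the Agent handles one message at a time, readers see either the old path or the new one, never a partial state.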

### Backwards Compatibility

When designing schemas, consider future needs:

```elixir
# Add optional fields that can be null/empty
{:ok, schema} = Schema.add_json_field(schema, "extensions", :stored)

# Use generic field names for flexibility
{:ok, schema} = Schema.add_f64_field(schema, "metric_1", :indexed)
{:ok, schema} = Schema.add_f64_field(schema, "metric_2", :indexed)
```

### Data Migration Example

```elixir
defmodule MyApp.IndexMigration do
  alias TantivyEx.{Index, Schema}

  def migrate_to_new_schema(old_index_path, new_index_path) do
    # Create new schema
    {:ok, new_schema} = create_new_schema()

    # Create new index
    {:ok, new_index} = Index.create(new_index_path, new_schema)

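    # Read from old index and write to new.
    # create_new_schema/0, read_all_documents/1 and derive_new_field_value/1
    # are application-specific helpers that you need to implement.
    old_docs = read_all_documents(old_index_path)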

    {:ok, writer} = TantivyEx.IndexWriter.new(new_index)

    Enum.each(old_docs, fn doc ->
      transformed_doc = transform_document(doc)
      TantivyEx.IndexWriter.add_document(writer, transformed_doc)
    end)

    TantivyEx.IndexWriter.commit(writer)
  end

  defp transform_document(old_doc) do
    # Transform document structure for new schema
    # Handle field renames, type changes, etc.
    old_doc
    |> Map.put("new_field", derive_new_field_value(old_doc))
    |> Map.delete("deprecated_field")
  end
end
```

## Best Practices

1. **Plan ahead**: Design schemas with future requirements in mind
2. **Test with real data**: Validate schema performance with representative datasets
3. **Monitor index size**: Balance between functionality and storage/memory usage
4. **Document your schema**: Keep clear documentation of field purposes and constraints
5. **Use consistent naming**: Follow naming conventions across your application
6. **Consider query patterns**: Design fields to support your most common query types

## Troubleshooting

### Common Schema Issues

**Field not searchable:**

- Ensure the field is indexed (`:text`, `:indexed`, etc.)
- Check that the correct tokenizer is used for text fields

**Large index size:**

- Review which fields are stored vs. indexed
- Consider using `:text` instead of `:text_stored` for large content

**Slow queries:**

- Ensure filtered fields are indexed
- Consider using facet fields for categorical data
- Review tokenizer choice for text fields

**Type mismatches:**

- Ensure document field types match schema definitions
- Use appropriate numeric types (u64 vs. i64 vs. f64)
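
Type mismatches are easier to catch before a document reaches the index writer. Below is a minimal sketch of a pre-flight check in plain Elixir. The `DocCheckSketch` module and the `%{field => type}` map are assumptions for illustration; the map is maintained by the application, not read from TantivyEx, and accepting integers for `:f64` is likewise an assumption.

```elixir
defmodule DocCheckSketch do
  # Returns :ok, or {:error, [{field, expected_type, actual_value}, ...]}.
  # Fields missing from `types` are reported as :unknown_field.
  def check(doc, types) when is_map(doc) and is_map(types) do
    mismatches =
      doc
      |> Enum.reject(fn {field, value} -> valid?(Map.get(types, field), value) end)
      |> Enum.map(fn {field, value} -> {field, Map.get(types, field, :unknown_field), value} end)
      |> Enum.sort()

    if mismatches == [], do: :ok, else: {:error, mismatches}
  end

  defp valid?(:text, v), do: is_binary(v)
  defp valid?(:u64, v), do: is_integer(v) and v >= 0
  defp valid?(:i64, v), do: is_integer(v)
  # Assumption: integers are acceptable where a float is expected.
  defp valid?(:f64, v), do: is_number(v)
  defp valid?(_unknown, _v), do: false
end
```

A negative value in a `:u64` field or a numeric string in an `:f64` field is reported with the field name, which narrows down the usual causes of indexing errors.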

## Field Options Deep Dive

Understanding field options is crucial for optimal performance and functionality. Each option serves specific use cases and has performance implications.

### Text Field Options Explained

#### `:text` - Search Only

- **Use case**: Large content fields (article body, descriptions)
- **Storage**: Not stored in index (saves space)
- **Searchable**: Yes (full-text search)
- **Retrievable**: No
- **Performance**: Fastest indexing, smallest index size

```elixir
# Perfect for large content you only need to search
{:ok, schema} = Schema.add_text_field(schema, "article_content", :text)
```

#### `:text_stored` - Search and Retrieve

- **Use case**: Titles, names, short descriptions
- **Storage**: Stored in index
- **Searchable**: Yes (full-text search)
- **Retrievable**: Yes
- **Performance**: Larger index size, retrieval without external lookup

```elixir
# Perfect for fields you need in search results
{:ok, schema} = Schema.add_text_field(schema, "title", :text_stored)
```

#### `:stored` - Storage Only

- **Use case**: Metadata, IDs, non-searchable data
- **Storage**: Stored in index
- **Searchable**: No
- **Retrievable**: Yes
- **Performance**: Minimal indexing overhead

```elixir
# Perfect for display-only data
{:ok, schema} = Schema.add_text_field(schema, "internal_id", :stored)
```

#### `:fast` - Optimized Access

- **Use case**: Fields used in sorting, faceting, or frequent filtering
- **Storage**: Not stored (saves space)
- **Searchable**: Yes (term queries, not full-text)
- **Retrievable**: No
- **Performance**: Fast random access, optimized for aggregations

```elixir
# Perfect for categorical data used in filters
{:ok, schema} = Schema.add_text_field(schema, "status", :fast)
```

#### `:fast_stored` - Complete Functionality

- **Use case**: Fields needing full functionality (search, retrieve, sort, phrase queries)
- **Storage**: Stored with position information
- **Searchable**: Yes (full-text with phrase queries)
- **Retrievable**: Yes
- **Performance**: Largest index size, most functionality

```elixir
# Perfect for important fields needing all features
{:ok, schema} = Schema.add_text_field(schema, "product_name", :fast_stored)
```

### Numeric Field Options

#### `:indexed` - Basic Indexing

- **Use case**: Range queries, basic filtering
- **Functionality**: Range queries (`field:[10 TO 20]`)
- **Storage**: Not stored
- **Performance**: Good for filtering, can't retrieve values

#### `:fast` - Optimized Performance

- **Use case**: High-performance filtering, sorting, aggregations
- **Functionality**: Very fast range queries, sorting
- **Storage**: Not stored
- **Performance**: Optimized data structure, fastest queries

### Field Option Decision Matrix

| Need to... | Text Option | Numeric Option |
|------------|-------------|----------------|
| Search only | `:text` | `:indexed` |
| Search + retrieve | `:text_stored` | `:indexed_stored` |
| Fast operations only | `:fast` | `:fast` |
| Everything | `:fast_stored` | `:fast_stored` |
| Store only | `:stored` | `:stored` |
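
The matrix can also be expressed as a lookup. A small sketch in plain Elixir; `OptionSketch.choose/2` and its need atoms are hypothetical names that simply mirror the table rows:

```elixir
defmodule OptionSketch do
  # One entry per row of the decision matrix.
  @matrix %{
    search_only: %{text: :text, numeric: :indexed},
    search_and_retrieve: %{text: :text_stored, numeric: :indexed_stored},
    fast_only: %{text: :fast, numeric: :fast},
    everything: %{text: :fast_stored, numeric: :fast_stored},
    store_only: %{text: :stored, numeric: :stored}
  }

  # Returns the option atom for a need and a field kind (:text or :numeric).
  def choose(need, kind) when kind in [:text, :numeric] do
    case Map.fetch(@matrix, need) do
      {:ok, row} -> {:ok, Map.fetch!(row, kind)}
      :error -> {:error, {:unknown_need, need}}
    end
  end
end
```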

## Common Pitfalls

### 1. Over-storing Data

**Problem**: Storing every field makes indexes unnecessarily large.

```elixir
# ❌ Bad: Storing large content unnecessarily
{:ok, schema} = Schema.add_text_field(schema, "full_article", :text_stored)

# ✅ Good: Store summary, search full content
{:ok, schema} = Schema.add_text_field(schema, "full_article", :text)
{:ok, schema} = Schema.add_text_field(schema, "summary", :text_stored)
```

### 2. Wrong Field Types for Data

**Problem**: Using text fields for structured data that should be numeric or faceted.

```elixir
# ❌ Bad: String for numeric data
{:ok, schema} = Schema.add_text_field(schema, "price", :text_stored)

# ✅ Good: Proper numeric type
{:ok, schema} = Schema.add_f64_field(schema, "price", :fast_stored)

# ❌ Bad: Text for categories
{:ok, schema} = Schema.add_text_field(schema, "category", :text)

# ✅ Good: Facet for hierarchical categories
{:ok, schema} = Schema.add_facet_field(schema, "category", :facet)
```

### 3. Inadequate Field Options

**Problem**: Choosing field options that don't support your query patterns.

```elixir
# ❌ Bad: Can't do phrase queries
{:ok, schema} = Schema.add_text_field(schema, "title", :text_stored)

# If you need phrase queries ("exact phrase"), use:
# ✅ Good: Supports phrase queries
{:ok, schema} = Schema.add_text_field(schema, "title", :fast_stored)
```

### 4. Inconsistent Field Naming

**Problem**: Inconsistent naming makes code harder to maintain.

```elixir
# ❌ Bad: Inconsistent naming
{:ok, schema} = Schema.add_text_field(schema, "Title", :text_stored)
{:ok, schema} = Schema.add_text_field(schema, "article_content", :text)
{:ok, schema} = Schema.add_u64_field(schema, "created", :indexed)

# ✅ Good: Consistent naming convention
{:ok, schema} = Schema.add_text_field(schema, "title", :text_stored)
{:ok, schema} = Schema.add_text_field(schema, "content", :text)
{:ok, schema} = Schema.add_u64_field(schema, "created_at", :indexed)
```
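
A naming convention is easier to keep when it is checked mechanically. A minimal sketch, assuming lowercase snake_case is the chosen convention; `NamingSketch` is a hypothetical helper that could run over the list returned by `Schema.get_field_names/1`:

```elixir
defmodule NamingSketch do
  # Returns the field names that do not follow lowercase snake_case.
  def violations(field_names) do
    Enum.reject(field_names, &snake_case?/1)
  end

  # Lowercase letter first, then lowercase letters or digits, with single
  # underscores allowed between groups.
  defp snake_case?(name) do
    Regex.match?(~r/\A[a-z][a-z0-9]*(_[a-z0-9]+)*\z/, name)
  end
end
```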

### 5. Missing Required Fields

**Problem**: Forgetting fields needed for core functionality.

```elixir
# ✅ Always include essential fields
defmodule MyApp.SchemaBuilder do
  def build_schema do
    {:ok, schema} = Schema.new()

    # Core searchable content
    {:ok, schema} = Schema.add_text_field(schema, "title", :text_stored)
    {:ok, schema} = Schema.add_text_field(schema, "content", :text)

    # Essential metadata
    {:ok, schema} = Schema.add_u64_field(schema, "created_at", :fast_stored)
    {:ok, schema} = Schema.add_u64_field(schema, "updated_at", :fast_stored)

    # Unique identifier (always store)
    {:ok, schema} = Schema.add_text_field(schema, "id", :stored)

    {:ok, schema}
  end
end
```

## Real-World Examples

### E-commerce Search Engine

This example shows a comprehensive e-commerce product search schema:

```elixir
defmodule EcommerceSchema do
  alias TantivyEx.Schema

  def create_product_schema do
    {:ok, schema} = Schema.new()

    # Product identification and basic info
    {:ok, schema} = Schema.add_text_field(schema, "id", :stored)
    {:ok, schema} = Schema.add_text_field(schema, "sku", :text_stored)
    {:ok, schema} = Schema.add_text_field(schema, "name", :fast_stored)  # Phrase queries for exact names
    {:ok, schema} = Schema.add_text_field(schema, "description", :text)  # Search only, save space

    # Brand and manufacturer
    {:ok, schema} = Schema.add_text_field(schema, "brand", :fast_stored)  # Fast filtering + display
    {:ok, schema} = Schema.add_text_field(schema, "manufacturer", :text_stored)

    # Pricing and inventory
    {:ok, schema} = Schema.add_f64_field(schema, "price", :fast_stored)  # Range queries + display
    {:ok, schema} = Schema.add_f64_field(schema, "sale_price", :fast_stored)
    {:ok, schema} = Schema.add_u64_field(schema, "stock_quantity", :fast)  # Fast filtering
    {:ok, schema} = Schema.add_text_field(schema, "availability", :fast)  # in_stock, out_of_stock, etc.

    # Categories and classification
    {:ok, schema} = Schema.add_facet_field(schema, "category", :facet)  # /electronics/phones/smartphones
    {:ok, schema} = Schema.add_facet_field(schema, "department", :facet)  # /men/clothing/shirts
    {:ok, schema} = Schema.add_text_field_with_tokenizer(schema, "tags", :text, "whitespace")

    # Product attributes (color, size, etc.)
    {:ok, schema} = Schema.add_json_field(schema, "attributes", :stored)  # Flexible storage
    {:ok, schema} = Schema.add_text_field(schema, "color", :fast)  # Fast filtering
    {:ok, schema} = Schema.add_text_field(schema, "size", :fast)
    {:ok, schema} = Schema.add_text_field(schema, "material", :text_stored)

    # Ratings and reviews
    {:ok, schema} = Schema.add_f64_field(schema, "average_rating", :fast_stored)
    {:ok, schema} = Schema.add_u64_field(schema, "review_count", :fast_stored)
    {:ok, schema} = Schema.add_u64_field(schema, "five_star_count", :fast)

    # SEO and metadata
    {:ok, schema} = Schema.add_text_field(schema, "meta_title", :stored)
    {:ok, schema} = Schema.add_text_field(schema, "meta_description", :stored)
    {:ok, schema} = Schema.add_text_field(schema, "url_slug", :stored)

    # Timestamps and versioning
    {:ok, schema} = Schema.add_date_field(schema, "created_at", :fast_stored)
    {:ok, schema} = Schema.add_date_field(schema, "updated_at", :fast_stored)
    {:ok, schema} = Schema.add_date_field(schema, "published_at", :fast)

    # Sales and popularity metrics
    {:ok, schema} = Schema.add_u64_field(schema, "sales_count", :fast)
    {:ok, schema} = Schema.add_u64_field(schema, "view_count", :fast)
    {:ok, schema} = Schema.add_f64_field(schema, "popularity_score", :fast)

    {:ok, schema}
  end

  # Example usage
  def search_products(index, query_params) do
    query = build_search_query(query_params)
    {:ok, searcher} = TantivyEx.Searcher.new(index)
    TantivyEx.Searcher.search(searcher, query, 50)
  end

  defp build_search_query(%{
    text: text,
    brand: brand,
    category: category,
    min_price: min_price,
    max_price: max_price,
    min_rating: min_rating
  }) do
    # Build each clause as a string, or nil/false when the filter is absent.
    # Rebinding a `parts` variable inside `if` blocks would not work here:
    # a binding made inside the block is not visible outside it.
    [
      # Text search in name and description
      present?(text) && "(name:#{text} OR description:#{text})",
      # Brand filter
      present?(brand) && "brand:\"#{brand}\"",
      # Category filter (facet)
      present?(category) && "category:\"#{category}\"",
      # Price range (open-ended when only one bound is given)
      (min_price || max_price) && "price:[#{min_price || "*"} TO #{max_price || "*"}]",
      # Rating filter
      min_rating && "average_rating:[#{min_rating} TO *]"
    ]
    # Drop absent clauses, then combine with AND
    |> Enum.filter(& &1)
    |> Enum.join(" AND ")
  end

  defp present?(value), do: value not in [nil, ""]
end
```

### Document Management System

This example shows a schema for a legal document management system:

```elixir
defmodule DocumentManagementSchema do
  alias TantivyEx.Schema

  def create_document_schema do
    {:ok, schema} = Schema.new()

    # Document identification
    {:ok, schema} = Schema.add_text_field(schema, "id", :stored)
    {:ok, schema} = Schema.add_text_field(schema, "document_number", :text_stored)
    {:ok, schema} = Schema.add_text_field(schema, "title", :fast_stored)

    # Content fields
    {:ok, schema} = Schema.add_text_field(schema, "content", :text)  # Full-text search only
    {:ok, schema} = Schema.add_text_field(schema, "summary", :text_stored)  # Display summary
    {:ok, schema} = Schema.add_text_field(schema, "abstract", :text_stored)

    # Document classification
    {:ok, schema} = Schema.add_facet_field(schema, "document_type", :facet)  # /legal/contracts/employment
    {:ok, schema} = Schema.add_facet_field(schema, "practice_area", :facet)  # /corporate/mergers
    {:ok, schema} = Schema.add_text_field(schema, "subject_matter", :text_stored)

    # Legal-specific fields
    {:ok, schema} = Schema.add_text_field(schema, "jurisdiction", :fast_stored)
    {:ok, schema} = Schema.add_text_field(schema, "court", :text_stored)
    {:ok, schema} = Schema.add_text_field(schema, "case_number", :text_stored)
    {:ok, schema} = Schema.add_date_field(schema, "filing_date", :fast_stored)
    {:ok, schema} = Schema.add_date_field(schema, "decision_date", :fast_stored)

    # Parties and entities
    {:ok, schema} = Schema.add_text_field(schema, "plaintiff", :text_stored)
    {:ok, schema} = Schema.add_text_field(schema, "defendant", :text_stored)
    {:ok, schema} = Schema.add_text_field(schema, "judge", :text_stored)
    {:ok, schema} = Schema.add_text_field(schema, "attorney_firm", :text_stored)

    # Document metadata
    {:ok, schema} = Schema.add_text_field(schema, "language", :fast)
    {:ok, schema} = Schema.add_u64_field(schema, "page_count", :fast_stored)
    {:ok, schema} = Schema.add_f64_field(schema, "confidence_score", :fast)  # OCR confidence
    {:ok, schema} = Schema.add_text_field(schema, "file_format", :fast)

    # Access control and security
    {:ok, schema} = Schema.add_facet_field(schema, "security_classification", :facet)
    {:ok, schema} = Schema.add_json_field(schema, "access_permissions", :stored)
    {:ok, schema} = Schema.add_text_field(schema, "owner", :fast)

    # Versioning and workflow
    {:ok, schema} = Schema.add_u64_field(schema, "version", :fast_stored)
    {:ok, schema} = Schema.add_text_field(schema, "status", :fast)  # draft, reviewed, approved, archived
    {:ok, schema} = Schema.add_text_field(schema, "workflow_stage", :fast)

    # Citations and references
    {:ok, schema} = Schema.add_text_field_with_tokenizer(schema, "cited_cases", :text, "whitespace")
    {:ok, schema} = Schema.add_text_field_with_tokenizer(schema, "cited_statutes", :text, "whitespace")
    {:ok, schema} = Schema.add_u64_field(schema, "citation_count", :fast)

    # Timestamps
    {:ok, schema} = Schema.add_date_field(schema, "created_at", :fast_stored)
    {:ok, schema} = Schema.add_date_field(schema, "updated_at", :fast_stored)
    {:ok, schema} = Schema.add_date_field(schema, "last_accessed", :fast)

    {:ok, schema}
  end
end
```

### Social Media Analytics

This example shows a schema for social media post analysis:

```elixir
defmodule SocialMediaSchema do
  alias TantivyEx.Schema

  def create_social_post_schema do
    {:ok, schema} = Schema.new()

    # Post identification
    {:ok, schema} = Schema.add_text_field(schema, "id", :stored)
    {:ok, schema} = Schema.add_text_field(schema, "platform_id", :text_stored)  # Original platform ID
    {:ok, schema} = Schema.add_text_field(schema, "platform", :fast)  # twitter, facebook, instagram

    # Content fields
    {:ok, schema} = Schema.add_text_field(schema, "content", :text_stored)  # Need to display
    {:ok, schema} = Schema.add_text_field(schema, "title", :text_stored)
    {:ok, schema} = Schema.add_text_field_with_tokenizer(schema, "hashtags", :text, "whitespace")
    {:ok, schema} = Schema.add_text_field_with_tokenizer(schema, "mentions", :text, "whitespace")

    # Author information
    {:ok, schema} = Schema.add_text_field(schema, "author_username", :fast_stored)
    {:ok, schema} = Schema.add_text_field(schema, "author_display_name", :text_stored)
    {:ok, schema} = Schema.add_u64_field(schema, "author_follower_count", :fast)
    {:ok, schema} = Schema.add_text_field(schema, "author_verified", :fast)  # true/false

    # Engagement metrics
    {:ok, schema} = Schema.add_u64_field(schema, "like_count", :fast_stored)
    {:ok, schema} = Schema.add_u64_field(schema, "share_count", :fast_stored)
    {:ok, schema} = Schema.add_u64_field(schema, "comment_count", :fast_stored)
    {:ok, schema} = Schema.add_u64_field(schema, "view_count", :fast)
    {:ok, schema} = Schema.add_f64_field(schema, "engagement_rate", :fast)

    # Sentiment and analysis
    {:ok, schema} = Schema.add_text_field(schema, "sentiment", :fast)  # positive, negative, neutral
    {:ok, schema} = Schema.add_f64_field(schema, "sentiment_score", :fast_stored)  # -1.0 to 1.0
    {:ok, schema} = Schema.add_f64_field(schema, "toxicity_score", :fast)
    {:ok, schema} = Schema.add_text_field_with_tokenizer(schema, "topics", :text, "whitespace")

    # Geographic and temporal data
    {:ok, schema} = Schema.add_text_field(schema, "country", :fast)
    {:ok, schema} = Schema.add_text_field(schema, "city", :fast)
    {:ok, schema} = Schema.add_f64_field(schema, "latitude", :fast)
    {:ok, schema} = Schema.add_f64_field(schema, "longitude", :fast)
    {:ok, schema} = Schema.add_date_field(schema, "posted_at", :fast_stored)
    {:ok, schema} = Schema.add_u64_field(schema, "hour_of_day", :fast)  # 0-23
    {:ok, schema} = Schema.add_u64_field(schema, "day_of_week", :fast)  # 1-7

    # Content classification
    {:ok, schema} = Schema.add_facet_field(schema, "content_type", :facet)  # /text, /image, /video
    {:ok, schema} = Schema.add_text_field(schema, "language", :fast)
    {:ok, schema} = Schema.add_text_field(schema, "adult_content", :fast)  # safe, questionable, explicit

    # Campaign and tracking
    {:ok, schema} = Schema.add_text_field(schema, "campaign_id", :fast)
    {:ok, schema} = Schema.add_text_field_with_tokenizer(schema, "tracking_codes", :text, "whitespace")
    {:ok, schema} = Schema.add_text_field(schema, "source", :fast)  # organic, paid, influencer

    # Media attachments
    {:ok, schema} = Schema.add_u64_field(schema, "media_count", :fast)
    {:ok, schema} = Schema.add_json_field(schema, "media_metadata", :stored)

    {:ok, schema}
  end

  # Example query patterns for social media analytics
  def trending_hashtags_query(time_range_start, time_range_end) do
    "posted_at:[#{time_range_start} TO #{time_range_end}] AND engagement_rate:[0.05 TO *]"
  end

  def viral_content_query(min_shares \\ 100) do
    "share_count:[#{min_shares} TO *] AND sentiment:positive"
  end

  def brand_mention_query(brand_name) do
    "(content:\"#{brand_name}\" OR mentions:\"@#{brand_name}\")"
  end
end
```

These real-world examples demonstrate how to design schemas for different domains, showing the thought process behind field selection, option choices, and query patterns. Each schema is optimized for its specific use case while maintaining good performance characteristics.

---

## Summary

Effective schema design is crucial for search performance and functionality. Remember these key principles:

1. **Choose appropriate field types** for your data
2. **Select field options** based on query requirements
3. **Balance storage and performance** needs
4. **Plan for future requirements** when possible
5. **Validate your schema** before production use

Take time to understand your search requirements and data characteristics before designing your schema. A well-designed schema will serve as the foundation for a fast, reliable search experience.