# Aggregations Guide
**Updated for TantivyEx v0.2.0.** This guide covers the aggregation system in TantivyEx, which follows Elasticsearch's aggregation request and response format for data analysis and search insights.
## Quick Start
```elixir
# Simple terms aggregation to group by category
aggregations = %{
"categories" => %{
"terms" => %{
"field" => "category",
"size" => 10
}
}
}
{:ok, results} = TantivyEx.Aggregation.run(searcher, query, aggregations)
IO.inspect(results["categories"]["buckets"])
# [%{"key" => "electronics", "doc_count" => 42}, ...]
# Histogram aggregation for price distribution
price_histogram = %{
"price_ranges" => %{
"histogram" => %{
"field" => "price",
"interval" => 50.0
}
}
}
{:ok, results} = TantivyEx.Aggregation.run(searcher, query, price_histogram)
# Combined search with aggregations
{:ok, search_results, agg_results} = TantivyEx.Aggregation.search_with_aggregations(
searcher,
query,
aggregations,
20 # search limit
)
```
## Related Documentation
- **[Search Guide](search.md)** - Understand how to combine search with aggregations
- **[Search Results Guide](search_results.md)** - Process aggregation results effectively
- **[Schema Design Guide](schema.md)** - Design schemas for optimal aggregation performance
- **[Performance Tuning](performance-tuning.md)** - Optimize aggregation performance
## Table of Contents
- [Quick Start](#quick-start)
- [Understanding Aggregations](#understanding-aggregations)
- [TantivyEx.Aggregation Module](#tantivyexaggregation-module)
- [Bucket Aggregations](#bucket-aggregations)
- [Metric Aggregations](#metric-aggregations)
- [Nested Aggregations](#nested-aggregations)
- [Advanced Features](#advanced-features)
- [Elasticsearch Compatibility](#elasticsearch-compatibility)
- [Performance Optimization](#performance-optimization)
- [Real-world Examples](#real-world-examples)
- [Aggregation Helpers](#aggregation-helpers)
- [Error Handling](#error-handling)
- [Troubleshooting](#troubleshooting)
## Understanding Aggregations
Aggregations allow you to analyze and summarize your data beyond simple search results. They provide insights into data distribution, statistical summaries, and patterns within your document collection.
### What Aggregations Do
Aggregations perform two main functions:
1. **Bucket Aggregations**: Group documents into buckets based on field values, ranges, or intervals
2. **Metric Aggregations**: Calculate statistical values (averages, sums, counts) across document sets
### Aggregation Pipeline
```text
Documents → Bucket Grouping → Metric Calculation → Results
```
**Example transformation:**
```text
1000 product documents
→ Group by category (bucket aggregation)
→ Calculate average price per category (metric aggregation)
→ Result: %{"electronics" => %{avg_price: 299.99}, "books" => %{avg_price: 24.99}}
```
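The two stages can be illustrated in plain Elixir. This is only a conceptual sketch over an in-memory list with made-up products; it is not how TantivyEx executes aggregations, which run against the index.

```elixir
# Hypothetical in-memory documents, used only to illustrate the pipeline
products = [
  %{category: "electronics", price: 300.0},
  %{category: "electronics", price: 100.0},
  %{category: "books", price: 20.0}
]

avg_price_by_category =
  products
  # Bucket step: group documents by a field value
  |> Enum.group_by(& &1.category, & &1.price)
  # Metric step: reduce each bucket to a single number
  |> Map.new(fn {category, prices} ->
    {category, Enum.sum(prices) / length(prices)}
  end)

IO.inspect(avg_price_by_category)
# => %{"books" => 20.0, "electronics" => 200.0}
```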
### Benefits
- **Data Insights**: Understand data distribution and patterns
- **Faceted Search**: Provide search result refinement options
- **Analytics**: Generate reports and dashboards
- **Performance**: Aggregating inside the search engine avoids fetching every matching document and summarizing it in application code
- **Elasticsearch Compatibility**: Familiar API for developers
## TantivyEx.Aggregation Module
**New in v0.2.0:** The `TantivyEx.Aggregation` module provides comprehensive aggregation functionality with an Elasticsearch-compatible API.
### Core Functions
#### Basic Aggregation Operations
```elixir
# Run aggregations on search results
{:ok, results} = TantivyEx.Aggregation.run(searcher, query, aggregations)
# Combine search with aggregations
{:ok, search_results, agg_results} = TantivyEx.Aggregation.search_with_aggregations(
searcher,
query,
aggregations,
search_limit
)
```
#### Basic Helper Usage
```elixir
# Build terms aggregation
terms_agg = TantivyEx.Aggregation.terms("category", 10)
# Build histogram aggregation
histogram_agg = TantivyEx.Aggregation.histogram("price", 50.0)
# Build metric aggregations
avg_agg = TantivyEx.Aggregation.avg("price")
stats_agg = TantivyEx.Aggregation.stats("price")
```
#### Request Building
```elixir
# Build complex aggregation requests
request = TantivyEx.Aggregation.build_request([
{"categories", TantivyEx.Aggregation.terms("category", 10)},
{"price_stats", TantivyEx.Aggregation.stats("price")}
])
```
## Bucket Aggregations
Bucket aggregations group documents into buckets based on field values or criteria.
### Terms Aggregation
Groups documents by unique field values.
**Use Cases:**
- Category facets in e-commerce
- Author grouping for articles
- Tag distribution analysis
- Status value counts
**Basic Example:**
```elixir
# Group products by category
aggregations = %{
"categories" => %{
"terms" => %{
"field" => "category",
"size" => 10
}
}
}
{:ok, results} = TantivyEx.Aggregation.run(searcher, query, aggregations)
# Result format:
# %{
# "categories" => %{
# "buckets" => [
# %{"key" => "electronics", "doc_count" => 42},
# %{"key" => "books", "doc_count" => 28},
# %{"key" => "clothing", "doc_count" => 15}
# ]
# }
# }
```
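Terms buckets usually end up as facet entries in a UI. The sketch below assumes the result shape shown above; `FacetSketch` and `facets/2` are illustrative names, not part of TantivyEx. Note that the share is computed relative to the returned buckets only, not to all matching documents.

```elixir
defmodule FacetSketch do
  # Turns terms buckets into {label, count, share_percent} tuples for display.
  def facets(results, name) do
    buckets = get_in(results, [name, "buckets"]) || []
    total = buckets |> Enum.map(& &1["doc_count"]) |> Enum.sum()

    Enum.map(buckets, fn %{"key" => key, "doc_count" => count} ->
      share = if total == 0, do: 0.0, else: Float.round(count / total * 100, 1)
      {key, count, share}
    end)
  end
end

# Sample data copied from the result format above
results = %{
  "categories" => %{
    "buckets" => [
      %{"key" => "electronics", "doc_count" => 42},
      %{"key" => "books", "doc_count" => 28},
      %{"key" => "clothing", "doc_count" => 15}
    ]
  }
}

facets = FacetSketch.facets(results, "categories")
IO.inspect(facets)
```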
**Advanced Options:**
```elixir
# Terms aggregation with all options
advanced_terms = %{
"popular_tags" => %{
"terms" => %{
"field" => "tags",
"size" => 20, # Number of top buckets to return
"min_doc_count" => 5, # Minimum documents required for bucket
"order" => %{"_count" => "desc"} # Sort by document count (descending)
}
}
}
```
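To make the meaning of these options concrete, here is a plain-Elixir sketch that applies the same three ideas to a list of buckets: drop buckets below `min_doc_count`, sort by count descending, keep the first `size`. The module name and the default values are assumptions for illustration; the engine applies these options internally and may differ in detail.

```elixir
defmodule TermsOptionsSketch do
  # Filters, orders, and truncates a bucket list. Illustrative only.
  def apply_options(buckets, opts) do
    min_doc_count = Keyword.get(opts, :min_doc_count, 1)
    size = Keyword.get(opts, :size, 10)

    buckets
    |> Enum.filter(&(&1["doc_count"] >= min_doc_count))
    |> Enum.sort_by(& &1["doc_count"], :desc)
    |> Enum.take(size)
  end
end

# Hypothetical tag counts
buckets = [
  %{"key" => "misc", "doc_count" => 3},
  %{"key" => "rust", "doc_count" => 7},
  %{"key" => "elixir", "doc_count" => 12}
]

top = TermsOptionsSketch.apply_options(buckets, min_doc_count: 5, size: 20)
IO.inspect(Enum.map(top, & &1["key"]))
# => ["elixir", "rust"]
```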
**Helper Function:**
```elixir
# Using the helper function
categories_agg = TantivyEx.Aggregation.terms("category", 10)
# Returns %{"terms" => %{"field" => "category", "size" => 10}};
# place it under a name such as "categories" to form a request
```
### Histogram Aggregation
Groups numeric values into buckets with fixed intervals.
**Use Cases:**
- Price distribution analysis
- Performance metrics grouping
- Age range analysis
- Score distribution
**Basic Example:**
```elixir
# Price distribution with $50 intervals
aggregations = %{
"price_distribution" => %{
"histogram" => %{
"field" => "price",
"interval" => 50.0
}
}
}
{:ok, results} = TantivyEx.Aggregation.run(searcher, query, aggregations)
# Result format:
# %{
# "price_distribution" => %{
# "buckets" => [
# %{"key" => 0.0, "doc_count" => 15}, # $0-50
# %{"key" => 50.0, "doc_count" => 32}, # $50-100
# %{"key" => 100.0, "doc_count" => 28} # $100-150
# ]
# }
# }
```
**Advanced Options:**
```elixir
# Histogram with extended bounds and a minimum document count
advanced_histogram = %{
"rating_distribution" => %{
"histogram" => %{
"field" => "rating",
"interval" => 1.0,
"min_doc_count" => 1,
"extended_bounds" => %{
"min" => 1.0,
"max" => 5.0
}
}
}
}
```
**Helper Function:**
```elixir
# Using the helper function
price_hist = TantivyEx.Aggregation.histogram("price", 50.0)
```
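Each bucket key is the lower bound of its interval. The sketch below reproduces that rule in plain Elixir, assuming the usual formula `floor(value / interval) * interval` with no offset; `HistogramSketch` is an illustrative module, not a TantivyEx API.

```elixir
defmodule HistogramSketch do
  # Lower bound of the bucket that contains `value` for a fixed `interval`.
  def bucket_key(value, interval) when interval > 0 do
    Float.floor(value / interval) * interval
  end

  # Counts values per bucket and returns buckets in ascending key order.
  def histogram(values, interval) do
    values
    |> Enum.frequencies_by(&bucket_key(&1, interval))
    |> Enum.sort()
    |> Enum.map(fn {key, count} -> %{"key" => key, "doc_count" => count} end)
  end
end

# 12.5 and 49.99 fall into the 0.0 bucket, 50.0 into 50.0, 120.0 into 100.0
histogram = HistogramSketch.histogram([12.5, 49.99, 50.0, 120.0], 50.0)
IO.inspect(histogram)
```

A value equal to a boundary (here `50.0`) belongs to the bucket that starts at that boundary.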
### Date Histogram Aggregation
Groups date values into time-based buckets.
**Use Cases:**
- Time-series analysis
- Publication date trends
- Activity monitoring
- Seasonal analysis
**Example:**
```elixir
# Group articles by publication month
aggregations = %{
"articles_over_time" => %{
"date_histogram" => %{
"field" => "published_at",
"calendar_interval" => "month",
"format" => "yyyy-MM"
}
}
}
{:ok, results} = TantivyEx.Aggregation.run(searcher, query, aggregations)
# Result format:
# %{
# "articles_over_time" => %{
# "buckets" => [
# %{"key" => "2024-01", "key_as_string" => "2024-01", "doc_count" => 25},
# %{"key" => "2024-02", "key_as_string" => "2024-02", "doc_count" => 18}
# ]
# }
# }
```
**Calendar Intervals:**
- `"second"`, `"minute"`, `"hour"`
- `"day"`, `"week"`, `"month"`, `"quarter"`, `"year"`
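Conceptually, a `"month"` interval truncates each date to its calendar month before counting. The plain-Elixir sketch below shows that idea with `Calendar.strftime/2` on a few hypothetical dates; it only mirrors the `"key_as_string"` and `"doc_count"` fields from the result format above and is not the engine's implementation.

```elixir
# Hypothetical publication dates
dates = [~D[2024-01-03], ~D[2024-01-19], ~D[2024-02-07]]

buckets =
  dates
  # Truncate each date to its month, formatted like "yyyy-MM"
  |> Enum.frequencies_by(&Calendar.strftime(&1, "%Y-%m"))
  |> Enum.sort()
  |> Enum.map(fn {month, count} ->
    %{"key_as_string" => month, "doc_count" => count}
  end)

IO.inspect(buckets)
```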
### Range Aggregation
Groups documents into custom value ranges.
**Use Cases:**
- Price range facets
- Age group analysis
- Performance tier classification
- Custom score ranges
**Example:**
```elixir
# Group products by price ranges
aggregations = %{
"price_ranges" => %{
"range" => %{
"field" => "price",
"ranges" => [
%{"to" => 50.0, "key" => "budget"},
%{"from" => 50.0, "to" => 200.0, "key" => "mid_range"},
%{"from" => 200.0, "key" => "premium"}
]
}
}
}
{:ok, results} = TantivyEx.Aggregation.run(searcher, query, aggregations)
# Result format:
# %{
# "price_ranges" => %{
# "buckets" => [
# %{"key" => "budget", "to" => 50.0, "doc_count" => 42},
# %{"key" => "mid_range", "from" => 50.0, "to" => 200.0, "doc_count" => 28},
# %{"key" => "premium", "from" => 200.0, "doc_count" => 8}
# ]
# }
# }
```
**Tuple Format (Alternative):**
```elixir
# Range aggregation using tuple format
aggregations = %{
"score_ranges" => %{
"range" => %{
"field" => "score",
"ranges" => [
{nil, 3.0}, # score < 3.0
{3.0, 4.0}, # 3.0 <= score < 4.0
{4.0, nil} # score >= 4.0
]
}
}
}
```
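The comments in the tuple example imply that `from` is inclusive, `to` is exclusive, and `nil` means unbounded. A small sketch of that rule, plus a conversion from the tuple shorthand to the map form, could look like this. `RangeSketch` is a hypothetical helper; how TantivyEx converts tuples internally is not shown in this guide.

```elixir
defmodule RangeSketch do
  # Converts {from, to} into the map form, dropping open (nil) ends.
  def to_map({from, to}) do
    [{"from", from}, {"to", to}]
    |> Enum.reject(fn {_key, bound} -> is_nil(bound) end)
    |> Map.new()
  end

  # A value belongs to a range when from <= value < to (nil means unbounded).
  def in_range?(value, {from, to}) do
    (is_nil(from) or value >= from) and (is_nil(to) or value < to)
  end
end

IO.inspect(RangeSketch.to_map({nil, 3.0}))
IO.inspect(RangeSketch.to_map({3.0, 4.0}))
IO.inspect(RangeSketch.in_range?(3.0, {3.0, 4.0}))
```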
## Metric Aggregations
Metric aggregations calculate statistical values across document sets.
### Average Aggregation
Calculates the average value of a numeric field.
**Example:**
```elixir
aggregations = %{
"average_price" => %{
"avg" => %{
"field" => "price"
}
}
}
{:ok, results} = TantivyEx.Aggregation.run(searcher, query, aggregations)
# Result: %{"average_price" => %{"value" => 129.99}}
```
### Min/Max Aggregations
Find minimum and maximum values.
**Example:**
```elixir
aggregations = %{
"min_price" => %{"min" => %{"field" => "price"}},
"max_price" => %{"max" => %{"field" => "price"}}
}
{:ok, results} = TantivyEx.Aggregation.run(searcher, query, aggregations)
# Result:
# %{
# "min_price" => %{"value" => 9.99},
# "max_price" => %{"value" => 999.99}
# }
```
### Sum Aggregation
Calculates the sum of numeric field values.
**Example:**
```elixir
aggregations = %{
"total_sales" => %{
"sum" => %{
"field" => "sales_amount"
}
}
}
{:ok, results} = TantivyEx.Aggregation.run(searcher, query, aggregations)
# Result: %{"total_sales" => %{"value" => 45628.50}}
```
### Count Aggregation
Counts the values of a field across the matching documents (`value_count` aggregation).
**Example:**
```elixir
aggregations = %{
"product_count" => %{
"value_count" => %{
"field" => "product_id"
}
}
}
{:ok, results} = TantivyEx.Aggregation.run(searcher, query, aggregations)
# Result: %{"product_count" => %{"value" => 1250}}
```
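The distinction between counting values and counting documents matters for multi-valued or optional fields. The sketch below uses hypothetical in-memory documents and assumes Elasticsearch-style semantics, where every stored value is counted and documents without a value contribute nothing.

```elixir
# Hypothetical documents with a multi-valued "tags" field
docs = [
  %{id: 1, tags: ["elixir", "search", "nif"]},
  %{id: 2, tags: ["rust"]},
  %{id: 3, tags: []}
]

doc_count = length(docs)
value_count = docs |> Enum.flat_map(& &1.tags) |> length()

IO.inspect({doc_count, value_count})
# => {3, 4}
```

For a single-valued field that is present on every document, such as `product_id` above, the two numbers coincide.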
### Stats Aggregation
Calculates multiple statistics in one aggregation.
**Example:**
```elixir
aggregations = %{
"price_stats" => %{
"stats" => %{
"field" => "price"
}
}
}
{:ok, results} = TantivyEx.Aggregation.run(searcher, query, aggregations)
# Result:
# %{
# "price_stats" => %{
# "count" => 1000,
# "min" => 9.99,
# "max" => 999.99,
# "avg" => 129.45,
# "sum" => 129450.00
# }
# }
```
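The five fields are related: `avg` is `sum / count`. The plain-Elixir sketch below computes the same five values in one pass over a non-empty list, which is a convenient way to sanity-check results against a small sample. `StatsSketch` is an illustrative module, not part of TantivyEx.

```elixir
defmodule StatsSketch do
  # Computes count, min, max, avg, and sum for a non-empty list of numbers.
  def stats([first | _] = values) do
    {count, sum, lowest, highest} =
      Enum.reduce(values, {0, 0.0, first, first}, fn v, {c, s, lo, hi} ->
        {c + 1, s + v, min(lo, v), max(hi, v)}
      end)

    %{
      "count" => count,
      "min" => lowest,
      "max" => highest,
      "avg" => sum / count,
      "sum" => sum
    }
  end
end

price_stats = StatsSketch.stats([10.0, 20.0, 30.0])
IO.inspect(price_stats)
```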
### Percentiles Aggregation
Calculates percentile values for statistical analysis.
**Example:**
```elixir
aggregations = %{
"response_time_percentiles" => %{
"percentiles" => %{
"field" => "response_time",
"percents" => [50, 95, 99]
}
}
}
{:ok, results} = TantivyEx.Aggregation.run(searcher, query, aggregations)
# Result:
# %{
# "response_time_percentiles" => %{
# "values" => %{
# "50.0" => 125.0,
# "95.0" => 450.0,
# "99.0" => 750.0
# }
# }
# }
```
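The percentile keys in the result are strings such as `"95.0"`, even though the request used integers. A small lookup helper avoids scattering that conversion through calling code. This sketch assumes the result shape shown above; `PercentileSketch.fetch/3` is a hypothetical name.

```elixir
defmodule PercentileSketch do
  # Looks up one percentile; converts 95 or 95.0 to the string key "95.0".
  def fetch(results, name, percent) when is_number(percent) do
    key = Float.to_string(percent / 1)

    case get_in(results, [name, "values", key]) do
      nil -> :error
      value -> {:ok, value}
    end
  end
end

# Sample data copied from the result format above
results = %{
  "response_time_percentiles" => %{
    "values" => %{"50.0" => 125.0, "95.0" => 450.0, "99.0" => 750.0}
  }
}

IO.inspect(PercentileSketch.fetch(results, "response_time_percentiles", 95))
# => {:ok, 450.0}
```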
## Nested Aggregations
Nest metric or bucket aggregations under a bucket aggregation's `"aggs"` key to compute results for each bucket.
### Terms with Metrics
Calculate statistics for each bucket in a terms aggregation.
**Example:**
```elixir
# Average price per category
aggregations = %{
"categories" => %{
"terms" => %{
"field" => "category",
"size" => 10
},
"aggs" => %{
"avg_price" => %{
"avg" => %{"field" => "price"}
},
"price_stats" => %{
"stats" => %{"field" => "price"}
}
}
}
}
{:ok, results} = TantivyEx.Aggregation.run(searcher, query, aggregations)
# Result:
# %{
# "categories" => %{
# "buckets" => [
# %{
# "key" => "electronics",
# "doc_count" => 42,
# "avg_price" => %{"value" => 299.99},
# "price_stats" => %{
# "count" => 42,
# "min" => 49.99,
# "max" => 999.99,
# "avg" => 299.99,
# "sum" => 12599.58
# }
# }
# ]
# }
# }
```
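In the result, each sub-aggregation appears inside its bucket next to `"key"` and `"doc_count"`. The following sketch flattens that shape into one row per category; it assumes the result format shown above, and `CategoryReport` is an illustrative module name.

```elixir
defmodule CategoryReport do
  # Builds one row per bucket from the nested result.
  def rows(results) do
    results
    |> get_in(["categories", "buckets"])
    |> List.wrap()
    |> Enum.map(fn bucket ->
      %{
        category: bucket["key"],
        docs: bucket["doc_count"],
        avg_price: get_in(bucket, ["avg_price", "value"]),
        max_price: get_in(bucket, ["price_stats", "max"])
      }
    end)
  end
end

# Sample data following the result format above
results = %{
  "categories" => %{
    "buckets" => [
      %{
        "key" => "electronics",
        "doc_count" => 42,
        "avg_price" => %{"value" => 299.99},
        "price_stats" => %{"count" => 42, "min" => 49.99, "max" => 999.99}
      }
    ]
  }
}

rows = CategoryReport.rows(results)
IO.inspect(rows)
```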
### Histogram with Sub-aggregations
Analyze data distribution with detailed metrics per bucket.
**Example:**
```elixir
# Price distribution with category breakdown
aggregations = %{
"price_histogram" => %{
"histogram" => %{
"field" => "price",
"interval" => 100.0
},
"aggs" => %{
"categories" => %{
"terms" => %{
"field" => "category",
"size" => 5
}
},
"avg_rating" => %{
"avg" => %{"field" => "rating"}
}
}
}
}
```
### Multi-Level Nesting
Create complex hierarchical aggregations.
**Example:**
```elixir
# Category → Brand → Price Statistics
aggregations = %{
"categories" => %{
"terms" => %{
"field" => "category",
"size" => 10
},
"aggs" => %{
"brands" => %{
"terms" => %{
"field" => "brand",
"size" => 5
},
"aggs" => %{
"price_stats" => %{
"stats" => %{"field" => "price"}
},
"rating_avg" => %{
"avg" => %{"field" => "rating"}
}
}
}
}
}
}
```
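Reading a multi-level result means walking buckets inside buckets. The sketch below assumes nested results follow the same pattern as single-level ones (each sub-aggregation sits inside its parent bucket); the sample data and the `NestedWalk` module are made up for illustration.

```elixir
defmodule NestedWalk do
  # Flattens category -> brand buckets into {category, brand, avg_price} tuples.
  def category_brand_rows(results) do
    for category <- get_in(results, ["categories", "buckets"]) || [],
        brand <- get_in(category, ["brands", "buckets"]) || [] do
      {category["key"], brand["key"], get_in(brand, ["price_stats", "avg"])}
    end
  end
end

# Hypothetical result for the Category -> Brand -> Price Statistics request
results = %{
  "categories" => %{
    "buckets" => [
      %{
        "key" => "electronics",
        "doc_count" => 3,
        "brands" => %{
          "buckets" => [
            %{"key" => "acme", "doc_count" => 2, "price_stats" => %{"avg" => 150.0}},
            %{"key" => "globex", "doc_count" => 1, "price_stats" => %{"avg" => 80.0}}
          ]
        }
      }
    ]
  }
}

rows = NestedWalk.category_brand_rows(results)
IO.inspect(rows)
```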
## Advanced Features
### Memory Management
TantivyEx provides built-in memory limits and optimizations for large aggregations.
```elixir
# The aggregation system automatically manages memory usage
# and applies limits to prevent excessive memory consumption
# For very large datasets, consider:
# 1. Using smaller "size" parameters in terms aggregations
# 2. Adding "min_doc_count" filters to reduce bucket count
# 3. Using range aggregations instead of histograms for very large ranges
```
### Error Handling
Invalid requests return `{:error, reason}` rather than raising, so match on both outcomes.
```elixir
# Invalid aggregation request
invalid_agg = %{
"bad_terms" => %{
"terms" => %{
# Missing required "field" parameter
"size" => 10
}
}
}
case TantivyEx.Aggregation.run(searcher, query, invalid_agg) do
{:ok, results} ->
IO.inspect(results)
{:error, reason} ->
IO.puts("Aggregation error: #{reason}")
# "Field parameter is required for terms aggregation"
end
```
### Request Validation
All aggregation requests are validated before execution.
```elixir
# Validation catches common issues:
# - Missing required fields
# - Invalid field names
# - Malformed range specifications
# - Incorrect data types
# - Unsupported aggregation types
```
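Two of these checks (missing `"field"` and unsupported type) are easy to approximate in application code, for example to give users early feedback in a query builder. The sketch below is a hypothetical pre-check, not the library's validator: the list of types is taken from the examples in this guide, and it inspects top-level aggregations only, without descending into `"aggs"`.

```elixir
defmodule RequestPrecheck do
  # Aggregation types used in this guide's examples; not an authoritative list.
  @known_types ~w(terms histogram date_histogram range avg min max sum value_count stats percentiles)

  # Returns :ok or {:error, [messages]} for a map of name => definition.
  def check(aggregations) when is_map(aggregations) do
    errors =
      Enum.flat_map(aggregations, fn {name, definition} ->
        definition
        |> Map.delete("aggs")
        |> Enum.flat_map(fn {type, params} -> check_one(name, type, params) end)
      end)

    if errors == [], do: :ok, else: {:error, errors}
  end

  defp check_one(name, type, _params) when type not in @known_types,
    do: ["#{name}: unsupported aggregation type #{inspect(type)}"]

  defp check_one(name, type, params) do
    if is_map(params) and is_binary(params["field"]),
      do: [],
      else: ["#{name}: #{type} requires a \"field\" parameter"]
  end
end

IO.inspect(RequestPrecheck.check(%{"bad_terms" => %{"terms" => %{"size" => 10}}}))
IO.inspect(RequestPrecheck.check(%{"ok" => %{"avg" => %{"field" => "price"}}}))
```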
## Elasticsearch Compatibility
TantivyEx aggregations are designed to be compatible with Elasticsearch aggregation syntax.
### Request Format
```elixir
# TantivyEx format (matches Elasticsearch)
elasticsearch_format = %{
"aggs" => %{
"categories" => %{
"terms" => %{
"field" => "category",
"size" => 10
}
},
"price_histogram" => %{
"histogram" => %{
"field" => "price",
"interval" => 50
}
}
}
}
# Also accepts "aggregations" key
alternative_format = %{
"aggregations" => %{
# same structure
}
}
```
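If application code needs to inspect or merge requests that may arrive in either shape, it can help to normalize them first. The following is a minimal client-side sketch, assuming the wrapper key is either `"aggs"` or `"aggregations"` as described above; `RequestShape.unwrap/1` is a hypothetical helper and says nothing about how TantivyEx handles the wrapper internally.

```elixir
defmodule RequestShape do
  # Returns the name => definition map, with or without an outer wrapper key.
  def unwrap(%{"aggs" => inner}) when is_map(inner), do: inner
  def unwrap(%{"aggregations" => inner}) when is_map(inner), do: inner
  def unwrap(request) when is_map(request), do: request
end

definition = %{"terms" => %{"field" => "category", "size" => 10}}

IO.inspect(RequestShape.unwrap(%{"aggs" => %{"categories" => definition}}))
IO.inspect(RequestShape.unwrap(%{"categories" => definition}))
```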
### Response Format
```elixir
# Response format matches Elasticsearch structure
response = %{
"categories" => %{
"buckets" => [
%{"key" => "electronics", "doc_count" => 42}
]
},
"price_histogram" => %{
"buckets" => [
%{"key" => 0.0, "doc_count" => 15}
]
}
}
```
### Migration from Elasticsearch
Elasticsearch aggregation requests that use the aggregation types covered in this guide can usually be passed to TantivyEx unchanged:
```elixir
# Direct migration example
elasticsearch_query = %{
"aggs" => %{
"status_counts" => %{
"terms" => %{
"field" => "status",
"size" => 10
}
}
}
}
# Works directly with TantivyEx
{:ok, results} = TantivyEx.Aggregation.run(searcher, query, elasticsearch_query)
```
## Performance Optimization
### Schema Design for Aggregations
Design your schema with aggregations in mind:
```elixir
# Use appropriate field options for aggregation fields
schema = TantivyEx.Schema.new()
|> TantivyEx.Schema.add_text_field("title", :text_stored)
|> TantivyEx.Schema.add_text_field("category", :text) # For terms aggregation
|> TantivyEx.Schema.add_f64_field("price", :fast) # :fast for efficient aggregations
|> TantivyEx.Schema.add_u64_field("rating", :fast) # :fast for numeric aggregations
|> TantivyEx.Schema.add_date_field("created_at", :fast) # :fast for date histograms
```
### Aggregation Best Practices
1. **Use Fast Fields**: Add `:fast` option to fields used in aggregations
2. **Limit Bucket Count**: Use reasonable `size` parameters in terms aggregations
3. **Filter Early**: Apply filters before aggregations to reduce data volume
4. **Batch Operations**: Combine multiple aggregations in single request
5. **Index Design**: Consider field cardinality when designing aggregations
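For the batching advice, request maps built in different parts of an application can be merged into one before execution. The sketch below is a hypothetical helper that raises on duplicate names, so one aggregation cannot silently replace another; it operates on plain maps and makes no TantivyEx calls.

```elixir
defmodule AggBatch do
  # Merges several name => definition maps into a single request map.
  def combine(requests) do
    Enum.reduce(requests, %{}, fn request, acc ->
      Map.merge(acc, request, fn name, _old, _new ->
        raise ArgumentError, "duplicate aggregation name: #{name}"
      end)
    end)
  end
end

facets = %{"categories" => %{"terms" => %{"field" => "category", "size" => 10}}}
metrics = %{"price_stats" => %{"stats" => %{"field" => "price"}}}

combined = AggBatch.combine([facets, metrics])
IO.inspect(Map.keys(combined))
# => ["categories", "price_stats"]
```

The combined map can then be passed to a single aggregation call instead of one call per map.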
### Memory Optimization
```elixir
# Optimize memory usage with smart limits
optimized_agg = %{
"categories" => %{
"terms" => %{
"field" => "category",
"size" => 50, # Reasonable limit
"min_doc_count" => 5 # Filter low-count buckets
}
}
}
# Use range aggregations for high-cardinality fields
range_agg = %{
"price_ranges" => %{
"range" => %{
"field" => "price",
"ranges" => [
%{"to" => 100}, %{"from" => 100, "to" => 500}, %{"from" => 500}
]
}
}
}
```
## Real-world Examples
### E-commerce Product Analytics
```elixir
defmodule EcommerceAnalytics do
alias TantivyEx.Aggregation
def product_analytics(searcher, query) do
aggregations = %{
# Category distribution
"categories" => %{
"terms" => %{
"field" => "category",
"size" => 20
},
"aggs" => %{
"avg_price" => %{"avg" => %{"field" => "price"}},
"avg_rating" => %{"avg" => %{"field" => "rating"}}
}
},
# Price distribution
"price_histogram" => %{
"histogram" => %{
"field" => "price",
"interval" => 25.0
}
},
# Rating distribution
"rating_distribution" => %{
"terms" => %{
"field" => "rating",
"size" => 10
}
},
# Price ranges
"price_ranges" => %{
"range" => %{
"field" => "price",
"ranges" => [
%{"to" => 25.0, "key" => "budget"},
%{"from" => 25.0, "to" => 100.0, "key" => "mid_range"},
%{"from" => 100.0, "to" => 500.0, "key" => "premium"},
%{"from" => 500.0, "key" => "luxury"}
]
}
},
# Overall statistics
"price_stats" => %{
"stats" => %{"field" => "price"}
}
}
case Aggregation.run(searcher, query, aggregations) do
{:ok, results} ->
%{
category_breakdown: results["categories"]["buckets"],
price_distribution: results["price_histogram"]["buckets"],
rating_counts: results["rating_distribution"]["buckets"],
price_ranges: results["price_ranges"]["buckets"],
price_statistics: results["price_stats"]
}
{:error, reason} ->
{:error, "Analytics failed: #{reason}"}
end
end
end
```
### Blog Content Analysis
```elixir
defmodule BlogAnalytics do
alias TantivyEx.Aggregation
def content_insights(searcher, query) do
aggregations = %{
# Popular authors
"top_authors" => %{
"terms" => %{
"field" => "author",
"size" => 10
},
"aggs" => %{
"avg_views" => %{"avg" => %{"field" => "view_count"}},
"total_articles" => %{"value_count" => %{"field" => "article_id"}}
}
},
# Publication timeline
"publication_timeline" => %{
"date_histogram" => %{
"field" => "published_at",
"calendar_interval" => "month",
"format" => "yyyy-MM"
}
},
# Popular tags
"popular_tags" => %{
"terms" => %{
"field" => "tags",
"size" => 20,
"min_doc_count" => 3
}
},
# Reading time distribution
"reading_time_ranges" => %{
"range" => %{
"field" => "reading_time_minutes",
"ranges" => [
%{"to" => 3, "key" => "quick_read"},
%{"from" => 3, "to" => 10, "key" => "medium_read"},
%{"from" => 10, "key" => "long_read"}
]
}
}
}
Aggregation.run(searcher, query, aggregations)
end
end
```
### User Activity Analysis
```elixir
defmodule UserActivityAnalytics do
alias TantivyEx.Aggregation
def activity_report(searcher, query) do
aggregations = %{
# Activity per hour; "HH" labels each hourly bucket with its hour of day
"hourly_activity" => %{
"date_histogram" => %{
"field" => "timestamp",
"calendar_interval" => "hour",
"format" => "HH"
}
},
# Activity per calendar day; "EEEE" labels each daily bucket with its weekday name
"daily_activity" => %{
"date_histogram" => %{
"field" => "timestamp",
"calendar_interval" => "day",
"format" => "EEEE"
}
},
# Action types
"action_types" => %{
"terms" => %{
"field" => "action_type",
"size" => 15
}
},
# User agent distribution
"browsers" => %{
"terms" => %{
"field" => "browser",
"size" => 10
}
},
# Session duration ranges
"session_duration" => %{
"histogram" => %{
"field" => "session_duration_seconds",
"interval" => 300 # 5-minute intervals
}
}
}
Aggregation.run(searcher, query, aggregations)
end
end
```
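A date histogram produces one bucket per interval on the timeline, so a report spanning several weeks contains several buckets with the same weekday label. To get a true day-of-week (or hour-of-day) profile, sum the buckets that share a label. The sketch below assumes each bucket carries the formatted label in `"key_as_string"`, as in the Date Histogram result format; `WeekdayFold` and the sample buckets are illustrative.

```elixir
defmodule WeekdayFold do
  # Sums doc_count across buckets that share the same formatted label.
  def by_label(buckets) do
    Enum.reduce(buckets, %{}, fn bucket, acc ->
      count = bucket["doc_count"]
      Map.update(acc, bucket["key_as_string"], count, &(&1 + count))
    end)
  end
end

# Hypothetical daily buckets covering more than one week
buckets = [
  %{"key_as_string" => "Monday", "doc_count" => 10},
  %{"key_as_string" => "Tuesday", "doc_count" => 4},
  %{"key_as_string" => "Monday", "doc_count" => 6}
]

totals = WeekdayFold.by_label(buckets)
IO.inspect(totals)
# => %{"Monday" => 16, "Tuesday" => 4}
```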
## Aggregation Helpers
TantivyEx provides helper functions to simplify aggregation creation.
### Helper Functions
```elixir
# Terms aggregation helper
terms_agg = TantivyEx.Aggregation.terms("category", 10)
# Creates: %{"terms" => %{"field" => "category", "size" => 10}}
# Histogram aggregation helper
histogram_agg = TantivyEx.Aggregation.histogram("price", 50.0)
# Creates: %{"histogram" => %{"field" => "price", "interval" => 50.0}}
# Metric aggregation helpers
avg_agg = TantivyEx.Aggregation.avg("price")
min_agg = TantivyEx.Aggregation.min("price")
max_agg = TantivyEx.Aggregation.max("price")
sum_agg = TantivyEx.Aggregation.sum("sales")
stats_agg = TantivyEx.Aggregation.stats("performance")
percentiles_agg = TantivyEx.Aggregation.percentiles("response_time", [50, 95, 99])
```
### Building Complex Requests
```elixir
# Build request using helpers
request = TantivyEx.Aggregation.build_request([
{"categories", TantivyEx.Aggregation.terms("category", 10)},
{"price_stats", TantivyEx.Aggregation.stats("price")},
{"rating_histogram", TantivyEx.Aggregation.histogram("rating", 1.0)}
])
# Add nested aggregations
nested_request = TantivyEx.Aggregation.build_request([
{"categories",
TantivyEx.Aggregation.terms("category", 10)
|> TantivyEx.Aggregation.add_sub_aggregation("avg_price", TantivyEx.Aggregation.avg("price"))
|> TantivyEx.Aggregation.add_sub_aggregation("top_brands", TantivyEx.Aggregation.terms("brand", 5))
}
])
```
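To see what the helpers assemble, here is a plain-map sketch that builds the same kind of structure. It is an illustration under the assumption that sub-aggregations live under an `"aggs"` key, as in the Nested Aggregations examples; `AggBuilderSketch` is not the library's implementation.

```elixir
defmodule AggBuilderSketch do
  # Inner definitions, matching the shapes shown under Helper Functions.
  def terms(field, size), do: %{"terms" => %{"field" => field, "size" => size}}
  def avg(field), do: %{"avg" => %{"field" => field}}

  # Nests `sub` under the "aggs" key of `agg`, keeping earlier sub-aggregations.
  def add_sub_aggregation(agg, name, sub) do
    Map.update(agg, "aggs", %{name => sub}, &Map.put(&1, name, sub))
  end

  # Turns a list of {name, definition} pairs into a request map.
  def build_request(pairs), do: Map.new(pairs)
end

alias AggBuilderSketch, as: B

request =
  B.build_request([
    {"categories",
     B.terms("category", 10)
     |> B.add_sub_aggregation("avg_price", B.avg("price"))
     |> B.add_sub_aggregation("top_brands", B.terms("brand", 5))}
  ])

IO.inspect(request, pretty: true)
```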
### Validation Helpers
```elixir
# Validate aggregation requests before execution
case TantivyEx.Aggregation.validate_request(aggregations) do
:ok ->
{:ok, results} = TantivyEx.Aggregation.run(searcher, query, aggregations)
{:error, errors} ->
IO.puts("Validation failed: #{inspect(errors)}")
end
```
## Error Handling
### Common Errors and Solutions
#### Field Not Found
```elixir
# Error: Field 'non_existent_field' not found in schema
aggregations = %{
"bad_agg" => %{
"terms" => %{
"field" => "non_existent_field"
}
}
}
# Solution: Check field names in schema
field_names = TantivyEx.Schema.get_field_names(schema)
IO.inspect(field_names)
```
#### Invalid Aggregation Type
```elixir
# Error: Unknown aggregation type 'invalid_type'
aggregations = %{
"bad_agg" => %{
"invalid_type" => %{
"field" => "category"
}
}
}
# Solution: Use supported aggregation types
# Supported: terms, histogram, date_histogram, range, avg, min, max, sum, value_count, stats, percentiles
```
#### Malformed Request
```elixir
# Error: Missing required field parameter
aggregations = %{
"incomplete_agg" => %{
"terms" => %{
"size" => 10 # Missing "field" parameter
}
}
}
# Solution: Include all required parameters
correct_agg = %{
"complete_agg" => %{
"terms" => %{
"field" => "category",
"size" => 10
}
}
}
```
### Error Handling Best Practices
```elixir
defmodule SafeAggregations do
  alias TantivyEx.Aggregation
  # Logger's macros must be required before they can be called
  require Logger

  def safe_run(searcher, query, aggregations) do
    # Validate request first
    case Aggregation.validate_request(aggregations) do
      :ok ->
        # Run aggregation
        case Aggregation.run(searcher, query, aggregations) do
          {:ok, results} ->
            {:ok, results}

          {:error, reason} ->
            Logger.error("Aggregation execution failed: #{inspect(reason)}")
            {:error, :execution_failed}
        end

      {:error, validation_errors} ->
        Logger.error("Aggregation validation failed: #{inspect(validation_errors)}")
        {:error, :validation_failed}
    end
  end

  # Falls back to a simpler aggregation when the primary one fails
  def with_fallback(searcher, query, primary_agg, fallback_agg) do
    case safe_run(searcher, query, primary_agg) do
      {:ok, results} -> {:ok, results}
      {:error, _} -> safe_run(searcher, query, fallback_agg)
    end
  end
end
```
## Troubleshooting
### Performance Issues
**Problem**: Aggregations are slow or use too much memory.
**Solutions**:
1. Use `:fast` field options for aggregation fields
2. Reduce `size` parameters in terms aggregations
3. Add `min_doc_count` filters to reduce bucket count
4. Use range aggregations instead of histograms for high-cardinality fields
5. Apply filters before aggregations to reduce data volume
```elixir
# Before: Slow aggregation
slow_agg = %{
"all_users" => %{
"terms" => %{
"field" => "user_id", # High cardinality field
"size" => 10000 # Too large
}
}
}
# After: Optimized aggregation
fast_agg = %{
"active_users" => %{
"terms" => %{
"field" => "user_id",
"size" => 100, # Reasonable size
"min_doc_count" => 5 # Filter low activity users
}
}
}
```
### Memory Issues
**Problem**: Out of memory errors during aggregation.
**Solutions**:
1. Reduce aggregation complexity
2. Use smaller bucket limits
3. Filter data before aggregation
4. Use range aggregations for high-cardinality data
```elixir
# Memory-efficient aggregation design
memory_friendly = %{
"price_ranges" => %{
"range" => %{
"field" => "price",
"ranges" => [
%{"to" => 50}, %{"from" => 50, "to" => 200}, %{"from" => 200}
]
}
}
}
```
### Data Type Issues
**Problem**: Aggregation fails with data type errors.
**Solutions**:
1. Ensure field types match aggregation requirements
2. Use text fields for terms aggregations
3. Use numeric fields for histogram/range aggregations
4. Check schema field definitions
```elixir
# Check field types before aggregation
# (SchemaCheck is an illustrative module name; `def` must live inside a module)
defmodule SchemaCheck do
  def check_field_type(schema, field_name) do
    case TantivyEx.Schema.get_field_type(schema, field_name) do
      {:ok, field_type} ->
        IO.puts("Field #{field_name} is type: #{inspect(field_type)}")

      {:error, _} ->
        IO.puts("Field #{field_name} not found")
    end
  end
end
```
### Common Patterns
#### Debugging Aggregations
```elixir
defmodule AggregationDebugger do
def debug_aggregation(searcher, query, aggregations) do
IO.puts("=== Aggregation Debug ===")
IO.puts("Query: #{inspect(query)}")
IO.puts("Aggregations: #{inspect(aggregations, pretty: true)}")
case TantivyEx.Aggregation.run(searcher, query, aggregations) do
{:ok, results} ->
IO.puts("Success!")
IO.puts("Results: #{inspect(results, pretty: true)}")
{:ok, results}
{:error, reason} ->
IO.puts("Error: #{reason}")
{:error, reason}
end
end
end
```
#### Progressive Aggregation Building
```elixir
defmodule ProgressiveAggregations do
def build_step_by_step(searcher, query) do
# Start simple
simple_agg = %{"count" => %{"value_count" => %{"field" => "id"}}}
case TantivyEx.Aggregation.run(searcher, query, simple_agg) do
{:ok, _} ->
# Add complexity gradually
add_terms_aggregation(searcher, query)
{:error, reason} ->
{:error, "Failed at basic aggregation: #{reason}"}
end
end
defp add_terms_aggregation(searcher, query) do
terms_agg = %{
"count" => %{"value_count" => %{"field" => "id"}},
"categories" => %{"terms" => %{"field" => "category", "size" => 5}}
}
case TantivyEx.Aggregation.run(searcher, query, terms_agg) do
{:ok, results} -> {:ok, results}
{:error, reason} -> {:error, "Failed at terms aggregation: #{reason}"}
end
end
end
```
---
**Ready to analyze your data?** Start with simple aggregations and gradually build complexity as you understand your data patterns! 📊