# Recursive Language Model (RLM) Patterns
This guide covers implementing RLM patterns in ptc_runner for processing large datasets that exceed practical context limits.
## What is RLM?
Recursive Language Models ([arXiv:2512.24601](https://arxiv.org/abs/2512.24601)) is an approach where the LLM acts as an **orchestrator** rather than a processor. Instead of feeding massive data into a single prompt, the model writes code that:
1. **Chunks** large data into manageable pieces
2. **Fans out** work to parallel sub-agents
3. **Aggregates** results into a final answer
The key insight: put bulk context in the data namespace (`data/`), let the model manipulate it via code, and use sub-LLM calls strategically.
## The Problem: Context Limitations
Traditional approaches fail with large datasets:
| Approach | Problem |
|----------|---------|
| Stuff everything in prompt | Attention dilution, cost explosion |
| Sequential chunk processing | Slow, loses cross-chunk patterns |
| RAG retrieval | Only sees "relevant" snippets, misses structure |
## The Solution: RLM with PTC-Lisp
ptc_runner is well-suited for RLM because:
- **Programmatic data access**: Large data can stay in runtime context and be summarized by PTC-Lisp programs
- **Native parallelism**: `pmap` spawns concurrent BEAM processes
- **Pre-chunking**: `PtcRunner.Chunker` handles chunking in Elixir (recommended)
- **Budget introspection**: `(budget/remaining)` enables adaptive strategies
- **Recursive agents**: Agents can call themselves for divide-and-conquer
## Basic Pattern: Chunk-Map-Aggregate
The simplest RLM pattern pre-chunks data in Elixir using `PtcRunner.Chunker`:
```elixir
alias PtcRunner.{SubAgent, Chunker}
# 1. Pre-chunk in Elixir with overlap (recommended)
corpus = File.read!("logs/production.log")
chunks = Chunker.by_tokens(corpus, 4000, overlap: 200)
# 2. Define a simple worker agent
worker = SubAgent.new(
prompt: """
Analyze the log chunk in data/chunk for CRITICAL or ERROR incidents.
Return a list of incident descriptions found.
""",
signature: "(chunk :string) -> {incidents [:string]}",
max_turns: 3,
llm: :haiku
)
# 3. Define the orchestrator (no chunking logic needed)
orchestrator = SubAgent.new(
prompt: """
Process the pre-chunked logs in data/chunks.
Use pmap with 'analyze' tool to process all chunks in parallel.
Aggregate and return total count and first 10 unique incidents.
""",
signature: "(chunks [:string]) -> {total :int, incidents [:string]}",
tools: %{"analyze" => SubAgent.as_tool(worker)},
max_turns: 5,
llm: :sonnet
)
# 4. Run with pre-chunked data and budget control
{:ok, step} = SubAgent.run(orchestrator,
context: %{"chunks" => chunks},
token_limit: 100_000,
on_budget_exceeded: :return_partial
)
```
The orchestrator generates simple PTC-Lisp (no chunking logic):
```clojure
(let [results (pmap #(tool/analyze {:chunk %}) data/chunks)
all-incidents (flatten (map :incidents results))
unique (distinct all-incidents)]
(return {:total (count unique)
:incidents (take 10 unique)}))
```
## Recursive Pattern: Self-Subdividing Agents
For hierarchical decomposition, use the `:self` sentinel in the tools map:
```elixir
analyzer = SubAgent.new(
prompt: """
Analyze the data chunk in data/chunk.
If the chunk is small (< 1000 lines), analyze directly.
If large, subdivide into smaller chunks and use the 'worker' tool recursively.
Aggregate child results before returning.
""",
signature: "(chunk :string) -> {findings [:string]}",
tools: %{"worker" => :self}, # Self-recursion via :self sentinel
max_depth: 3,
max_turns: 5,
llm: :haiku
)
{:ok, step} = SubAgent.run(analyzer,
context: %{"chunk" => large_data},
llm_registry: registry
)
```
The agent decides dynamically whether to process or subdivide:
```clojure
(let [lines (split-lines data/chunk)
n (count lines)]
(if (< n 1000)
;; Base case: analyze directly
(return {:findings (analyze-for-patterns lines)})
;; Recursive case: subdivide
(let [halves (partition (/ n 2) lines)
results (pmap #(tool/worker {:chunk (join "\n" %)}) halves)]
(return {:findings (flatten (map :findings results))}))))
```
## Budget-Aware Orchestration
Agents can query remaining budget via `(budget/remaining)`:
```clojure
(budget/remaining)
;; => {:turns 15
;; "work-turns" 10
;; "retry-turns" 5
;; :depth {:current 1 :max 3}
;; :tokens {:input 5000 :output 2000 :total 7000}
;; "llm-requests" 3}
```
Use this to make smart decisions about parallelization:
```clojure
(let [b (budget/remaining)
chunk-count (count chunks)]
(if (> chunk-count (:turns b))
;; Not enough budget for all chunks, batch them
(let [batch-size (max 1 (/ chunk-count (:turns b)))
batches (partition batch-size chunks)]
(pmap #(tool/analyze-batch {:chunks %}) batches))
;; Enough budget, process individually
(pmap #(tool/analyze {:chunk %}) chunks)))
```
For recursive agents, check depth before subdividing:
```clojure
(let [b (budget/remaining)
at-max-depth? (>= (get-in b [:depth :current])
(dec (get-in b [:depth :max])))]
(if at-max-depth?
(analyze-directly data/chunk)
(subdivide-and-recurse data/chunk)))
```
## Chunking Strategies
### Pre-Chunking in Elixir (Recommended)
Use `PtcRunner.Chunker` to chunk data before passing to the agent. Token-based chunking with overlap is safest for production:
```elixir
alias PtcRunner.Chunker
# Token-based with overlap (recommended for production)
# - Handles variable line lengths (JSON blobs, stack traces)
# - Overlap ensures boundary incidents aren't split
chunks = Chunker.by_tokens(corpus, 4000, overlap: 200)
# Line-based (simpler, fine if line lengths are predictable)
chunks = Chunker.by_lines(corpus, 2000)
# Line-based with overlap
chunks = Chunker.by_lines(corpus, 2000, overlap: 100)
SubAgent.run(orchestrator,
context: %{"chunks" => chunks}
)
```
The agent then processes pre-chunked data directly:
```clojure
(pmap #(tool/analyze {:chunk %}) data/chunks)
```
This approach is simpler and more reliable than having the LLM generate chunking code.
### LLM-Generated Chunking (Alternative)
For dynamic chunking where the LLM decides how to split:
#### By Line Count
```clojure
(let [lines (split-lines data/corpus)
chunks (partition 2000 lines)] ; 2000 lines per chunk
...)
```
#### By Delimiter
```clojure
(let [sections (split data/corpus #"---\n") ; Split on a regex delimiter
chunks (partition 5 sections)] ; Group 5 sections per chunk
...)
```
## Model Selection Strategy
| Role | Model | Rationale |
|------|-------|-----------|
| Orchestrator | Sonnet/Opus | Needs reasoning for strategy |
| Chunk workers | Haiku | Fast, cheap, parallelizable |
| Aggregator | Sonnet | Synthesis requires intelligence |
```elixir
# Workers use haiku (bound at tool creation)
worker_tool = SubAgent.as_tool(worker, llm: :haiku)
# Orchestrator uses sonnet (at runtime)
SubAgent.run(orchestrator, llm: :sonnet, tools: %{"worker" => worker_tool})
```
## Budget Enforcement
For operator-level cost control, use `token_limit` or a custom `budget` callback:
```elixir
# Simple token limit
SubAgent.run(orchestrator,
llm: llm,
token_limit: 100_000,
on_budget_exceeded: :return_partial # or :fail (default)
)
# Custom callback for fine-grained control
SubAgent.run(orchestrator,
llm: llm,
budget: fn usage ->
cond do
usage.total_tokens > 100_000 -> :stop
usage.llm_requests > 50 -> :stop
true -> :continue
end
end
)
```
The callback receives `%{total_tokens, input_tokens, output_tokens, llm_requests}`.
## Comparison with Alternatives
| Feature | Standard RAG | Long Context | RLM (ptc_runner) |
|---------|--------------|--------------|------------------|
| Data scope | Retrieved snippets | Everything in prompt | Everything via code |
| Logic | Fixed retrieval | Probabilistic | Orchestrated map-reduce |
| Parallelism | None | None | Native (`pmap`) |
| Cost | Low | Very high | Medium (structured) |
| Cross-chunk patterns | Poor | Good (but diluted) | Good (aggregation) |
## Best Practices
1. **Pre-chunk in Elixir**: Use `PtcRunner.Chunker` instead of LLM-generated chunking for reliability.
2. **Size chunks appropriately**: 1000-3000 lines is typical. Too small = overhead, too large = attention issues.
3. **Use fast models for workers**: Haiku processes chunks; Sonnet orchestrates.
4. **Set budget limits**: Use `token_limit` to control costs in production.
5. **Set depth limits**: Recursive agents should have `max_depth: 3` or less to prevent runaway recursion.
6. **Use `memory_strategy: :rollback`**: Recursive agents that build large intermediate results can exceed memory limits. With `:rollback`, the memory is reverted to pre-turn state and the error is fed back to the LLM, giving it a chance to try a different approach (e.g., smaller batches or deeper recursion).
7. **Monitor with telemetry**: Track `llm_requests` and `duration_ms` to tune chunk sizes.
## Production Considerations
### Boundary Handling with Overlap
Multi-line incidents (stack traces, JSON blobs) can be split across chunk boundaries. Use overlap to ensure nothing is missed:
```elixir
# 200 tokens of overlap ensures incidents at boundaries are seen by both chunks
chunks = Chunker.by_tokens(corpus, 4000, overlap: 200)
```
The worker may report the same incident twice, but the final `distinct` handles deduplication.
### Token-based vs Line-based Chunking
Line-based chunking (`by_lines`) is intuitive but risky - a single JSON log line could be 10KB. Token-based chunking (`by_tokens`) ensures workers never hit context limits:
```elixir
# Safer for logs with variable line lengths
chunks = Chunker.by_tokens(corpus, 4000)
# vs. line-based (fine if line lengths are predictable)
chunks = Chunker.by_lines(corpus, 2000)
```
### Worker Failure Handling
Currently, if one worker in `pmap` fails, the entire operation fails. For fault-tolerant RLM:
**Option 1: Prompt engineering** - Instruct the planner to handle partial results:
```
"If some worker calls fail, proceed with available results and note which chunks failed."
```
**Option 2: Defensive Lisp** - Wrap worker calls in error handling (future library feature).
For most use cases, fail-fast is acceptable. For mission-critical RLM over unreliable data, consider pre-validating chunks in Elixir.
### Aggregation Patterns
For simple aggregation (flatten, distinct, take), inline Lisp is fine:
```clojure
(let [all (flatten (map :incidents results))]
(return {:total (count (distinct all))
:incidents (take 10 (distinct all))}))
```
For complex aggregation (merging time-series, conflict resolution, weighted scoring), consider:
- A dedicated Aggregator agent with its own prompt
- Pre-processing in Elixir before returning to the user
## Example: Log Analysis
See `examples/parallel_workers/` for a complete working example that:
- Generates a 10k+ line test corpus with hidden incidents
- Uses Sonnet as planner, Haiku as workers
- Demonstrates parallel chunk processing
- Aggregates findings into a final report
```bash
# Generate test data
mix run examples/parallel_workers/gen_data.exs
# Run the parallel workers workflow
mix run examples/parallel_workers/run.exs
```
## See Also
- [Composition Patterns](subagent-patterns.md) - SubAgents as tools, orchestration
- [Core Concepts](subagent-concepts.md) - Context and memory
- [PTC-Lisp Specification](../ptc-lisp-specification.md) - `pmap`, `partition`, etc.
- [Observability](subagent-observability.md) - Tracking parallel execution