# em_filter
An Erlang library for building Emergence filters and agents connected to an `em_disco` discovery service.
## Features
- Connects your filter or agent to `em_disco` over a persistent WebSocket
- Automatically registers on startup and reconnects on failure
- Optionally announces agent capabilities to the `em_disco` registry
- Optionally enables per-agent memory, backed by ETS, that is carried from one query to the next
- Includes a set of HTML scraping utilities
## Concepts
**Filter** — stateless node. Receives a query, returns results, remembers nothing. This is the 1.0.0 behaviour, unchanged.
**Agent** — extends a filter with two optional features:
- **Capabilities** — a list of strings (`<<"summarize">>`, `<<"llm">>`, …) announced to `em_disco` at startup via `agent_hello`. The Queen agent reads `GET /registry` to discover them.
- **Memory** — a persistent map passed to `handle/2` on every query and updated with the returned value. Backed by a local ETS table.
A filter started without capabilities or memory is identical to a 1.0.0 filter.
## Installation
Add to your `rebar.config`:
```erlang
{deps, [
    {em_filter, "1.1.0"}
]}.
```
## Usage
### Plain filter (unchanged from 1.0.0)
```erlang
-module(my_filter).
-export([handle/1]).

%% Body carries the incoming query; this minimal example ignores it.
handle(_Body) ->
    %% fetch, scrape, compute, then return the result encoded as JSON
    json:encode(#{<<"embryo_list">> => []}).
```
```erlang
em_filter:start_filter(my_filter, my_filter).
```
### Agent with capabilities (no memory)
Announces itself to `em_disco` so the Queen can find it via `GET /registry`.
`handle/1` is still used — behaviour is stateless.
```erlang
em_filter:start_agent(my_agent, my_handler, #{
    capabilities => [<<"summarize">>, <<"llm">>]
}).
```
### Agent with memory
`handle/2` receives the current memory map and must return `{Result, NewMemory}`.
The updated memory is stored in a local ETS table and passed on the next query.
```erlang
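-module(my_agent).
-export([handle/2]).

handle(Body, Memory) ->
    %% Queries seen so far; empty on the first call.
    Seen = maps:get(seen, Memory, []),
    %% do_work/2 is a placeholder for your own logic (not defined here).
    Result = do_work(Body, Seen),
    NewMem = Memory#{seen => [Body | Seen]},
    {json:encode(Result), NewMem}.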
```
```erlang
em_filter:start_agent(my_agent, my_agent, #{
    capabilities => [<<"summarize">>],
    memory => ets
}).
```
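The memory round trip described above (load the stored map, call `handle/2`, store the returned map) can be pictured with the following sketch. It is not the library's implementation: the module `memory_sketch`, the functions `new/0` and `run_query/4`, and the table layout are all hypothetical, and the handler is passed as a fun so the example stays self-contained.

```erlang
-module(memory_sketch).
-export([new/0, run_query/4]).

%% Hypothetical helper: create a table holding one memory map per agent key.
new() ->
    ets:new(agent_memory, [set, public]).

%% Load the memory stored under Key (an empty map on first use), call the
%% handler, store the memory it returns, and hand back only the result.
%% Handle is any fun of arity 2, e.g. fun my_agent:handle/2.
run_query(Table, Key, Handle, Body) ->
    Memory = case ets:lookup(Table, Key) of
                 [{Key, Stored}] -> Stored;
                 [] -> #{}
             end,
    {Result, NewMemory} = Handle(Body, Memory),
    true = ets:insert(Table, {Key, NewMemory}),
    Result.
```

With a handler such as `fun my_agent:handle/2`, each call to `run_query/4` sees the memory left behind by the previous call under the same key.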
## Configuration
The `em_disco` address is resolved in this order:
1. Environment variables `EM_DISCO_HOST` / `EM_DISCO_PORT`
2. `~/.config/emergence/emergence.conf` (Linux/macOS) or `%APPDATA%\emergence\emergence.conf` (Windows)
3. Defaults: `localhost:8080`
`emergence.conf` example:
```ini
[em_disco]
host = 192.168.1.10
port = 8080
```
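The lookup order above can be sketched as follows. This is an illustration under assumptions, not the library's code: the module `disco_addr_sketch` and `resolve/1` are hypothetical, the `[em_disco]` section is assumed to be already parsed into a map, and host and port are assumed to fall back independently of each other.

```erlang
-module(disco_addr_sketch).
-export([resolve/1]).

%% FileConf is the already-parsed [em_disco] section, for example
%% #{host => "192.168.1.10", port => 8080}. Parsing the INI file itself
%% is out of scope for this sketch.
resolve(FileConf) ->
    Host = pick(os:getenv("EM_DISCO_HOST"),
                maps:get(host, FileConf, undefined),
                "localhost"),
    Port = pick(to_port(os:getenv("EM_DISCO_PORT")),
                maps:get(port, FileConf, undefined),
                8080),
    {Host, Port}.

%% Environment value first, then the config file value, then the default.
%% os:getenv/1 returns false when the variable is not set.
pick(Env, _File, _Default) when Env =/= false -> Env;
pick(_Env, File, _Default) when File =/= undefined -> File;
pick(_Env, _File, Default) -> Default.

%% Convert the port string from the environment to an integer.
to_port(false) -> false;
to_port(Str) -> list_to_integer(Str).
```

Calling `disco_addr_sketch:resolve(#{})` with neither environment variable set yields the documented defaults.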
## HTML utilities
The following helpers are available for filters that scrape HTML:
| Function | Description |
|---|---|
| `strip_scripts/1` | Removes `<script>` tags |
| `extract_elements/2` | CSS-style element extraction |
| `get_text/1` | Strips all HTML tags |
| `extract_attribute/2` | Extracts a tag attribute value |
| `clean_text/3` | Strips noise and decodes entities |
| `decode_html_entities/1` | Decodes `&amp;`, `&#x…;`, `&#…;` |
| `should_skip_link/2` | Filters out unwanted URLs |
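
The table lists only names and arities, so the argument types of these helpers are not shown here. To give a feel for the kind of work a tag-stripping helper does, here is a standalone sketch built on the standard `re` module. It does not call `em_filter`; the module `html_text_sketch` and `strip_tags/1` are hypothetical names, and the regex approach is an assumption rather than the library's method.

```erlang
-module(html_text_sketch).
-export([strip_tags/1]).

%% Remove every <...> span and return the remaining text as a binary.
%% A regex is enough for a demonstration; it does not cope with ">"
%% inside attribute values or with HTML comments.
strip_tags(Html) when is_binary(Html) ->
    re:replace(Html, <<"<[^>]*>">>, <<>>, [global, {return, binary}]).
```

For real scraping, prefer the helpers in the table, which presumably handle more of these edge cases.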
## License
Apache 2.0 — see [LICENSE.md](LICENSE.md).