# em_filter
[](https://hex.pm/packages/em_filter)
[](https://hexdocs.pm/em_filter)
[](LICENSE.md)
An Erlang library for building Emergence agents connected to an `em_disco` discovery service.
## Features
- Connects your agent to one or more `em_disco` nodes over persistent WebSockets
- Automatically registers on startup and reconnects on failure
- Announces agent capabilities to the `em_disco` registry via `agent_hello`
- Optional persistent memory (ETS) passed across queries
- Full set of HTML scraping utilities included
## Concepts
Every node in the Emergence system is an **agent**. An agent has two optional features:
- **Capabilities** — a list of strings (`<<"rss">>`, `<<"dns">>`, …) announced to `em_disco` at startup. Used by disco to route queries to relevant agents only.
- **Memory** — a map passed to `handle/2` on every query and updated with the returned value.
- `ram` (default): lives in the process state, resets to `#{}` on restart.
- `ets`: persisted in a local ETS table, survives worker restarts within the same BEAM session.
Memory is best used for caching expensive operations (HTTP responses, DNS lookups, rate limit state).
**Do not use memory to deduplicate results** — deduplication is handled upstream by the Emquest pipeline.
### Handler contract
Every handler module must export `handle/2`:
```erlang
handle(Body :: binary(), Memory :: map()) ->
{Result :: term(), NewMemory :: map()}
```
`Body` is the raw JSON query binary. `Result` is typically a list of embryo maps.
Returning the same map as `NewMemory` is valid for stateless behaviour.
### Embryo format
Agents return a list of embryo maps:
```erlang
#{
<<"type">> => <<"rss">>, %% agent-defined type
<<"properties">> => #{
<<"url">> => <<"https://...">>,
<<"title">> => <<"...">>,
<<"resume">> => <<"...">>
}
}
```
## Installation
Add to your `rebar.config`:
```erlang
{deps, [
{em_filter, "1.2.0"}
]}.
```
## Usage
### Stateless agent
Announces capabilities but does not persist state between queries.
```erlang
em_filter:start_agent(my_agent, my_handler, #{
capabilities => [<<"search">>, <<"web">>]
}).
```
```erlang
-module(my_handler).
-export([handle/2]).
handle(Body, Memory) ->
Results = do_search(Body),
{Results, Memory}.
```
### Agent with memory (cache)
Memory is useful for caching.
```erlang
-module(my_handler).
-export([handle/2]).
handle(Body, Memory) ->
Cache = maps:get(cache, Memory, #{}),
case maps:get(Body, Cache, undefined) of
undefined ->
Results = fetch_from_api(Body),
NewCache = Cache#{Body => Results},
{Results, Memory#{cache => NewCache}};
Cached ->
{Cached, Memory}
end.
```
```erlang
em_filter:start_agent(my_agent, my_handler, #{
capabilities => [<<"search">>],
memory => ets
}).
```
## Multi-disco connectivity
An agent connects to every disco node listed in `emergence.conf`.
Each node gets its own persistent WebSocket connection and worker process.
```ini
[em_disco]
nodes = localhost:8080, em-disco.roques.me
```
With this config, `start_agent/3` spawns two workers automatically:
- `my_agent_localhost_8080_server` — connected to local disco
- `my_agent_em_disco_roques_me_443_server` — connected to public disco
Port and transport resolution:
- `localhost` / `127.0.0.1` → port 8080, plain TCP (default)
- any other host without port → port 443, TLS (default)
- explicit port 443 → TLS
- any other explicit port → plain TCP
## Configuration
The `em_disco` address is resolved in this order:
1. `[em_disco] nodes` in `emergence.conf` (recommended)
2. `EM_DISCO_HOST` / `EM_DISCO_PORT` environment variables (legacy, single node)
3. Default: `localhost:8080`
`emergence.conf` locations:
- Linux/macOS: `~/.config/emergence/emergence.conf`
- Windows: `%APPDATA%\emergence\emergence.conf`
Full example:
```ini
[em_disco]
nodes = localhost:8080, em-disco.roques.me
```
## HTML utilities
The following helpers are available for agents that scrape HTML:
| Function | Description |
|---|---|
| `strip_scripts/1` | Removes `<script>` tags |
| `extract_elements/2` | CSS-style element extraction |
| `get_text/1` | Strips all HTML tags |
| `extract_attribute/2` | Extracts a tag attribute value |
| `clean_text/3` | Strips noise and decodes entities |
| `decode_html_entities/1` | Decodes `&`, `&#x...;`, `&#...;` |
| `should_skip_link/2` | Filters out unwanted URLs |
## License
Apache 2.0 — see [LICENSE.md](LICENSE.md).