usage-rules/filtering.md

# Complete Filtering Guide

PcapFileEx provides three different filtering systems. This guide explains when and how to use each one.

## Filtering Systems Overview

| Filter Type | Where | Performance | Flexibility | Best For |
|-------------|-------|-------------|-------------|----------|
| **PreFilter** | Rust-side (pre-decode) | ⚡⚡⚡ Fastest (10-100x faster than Elixir-side) | Simple criteria only (IP, port, protocol) | Large files, selective queries |
| **Filter** | Elixir-side (post-decode) | ⚡ Standard | Very flexible | Complex logic, small files |
| **DisplayFilter** | Elixir-side (post-decode) | ⚡ Standard | Wireshark-style | Familiar syntax |

## Decision Tree: Which Filter to Use?

```
Is file > 100MB?
├─ YES: Is query selective (<10% of packets)?
│   ├─ YES: Are the criteria simple (IP/port/protocol)?
│   │   ├─ YES: Use PreFilter ⚡⚡⚡
│   │   └─ NO: Use Filter/DisplayFilter ⚡
│   └─ NO: Use Filter/DisplayFilter ⚡
└─ NO: Is syntax important?
    ├─ Wireshark-style preferred: Use DisplayFilter
    ├─ Function-based preferred: Use Filter
    └─ Simple criteria: Use PreFilter (small benefit)
```
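
The decision tree can also be written as a small function, which is handy if a tool has to pick a strategy automatically. This is only a sketch: `FilterChoice` and `choose/4` are illustrative names that are not part of PcapFileEx, the thresholds are the rules of thumb from this guide, and the "simple criteria on a small file" branch is folded into the syntax preference.

```elixir
defmodule FilterChoice do
  @moduledoc """
  Illustrative encoding of the filter decision tree.
  Not part of PcapFileEx; thresholds are rules of thumb.
  """

  # 100MB expressed in bytes
  @large_file_bytes 100 * 1024 * 1024
  # A query is "selective" if it matches fewer than 10% of packets
  @selective_ratio 0.10

  @doc """
  Returns `:pre_filter`, `:filter`, or `:display_filter`.

    * `file_bytes`  - size of the capture file in bytes
    * `match_ratio` - expected fraction of matching packets (0.0 to 1.0)
    * `simple?`     - true if the criteria are only IP/port/protocol
    * `syntax`      - `:wireshark` or `:functions` (preferred style)
  """
  def choose(file_bytes, match_ratio, simple?, syntax \\ :functions) do
    cond do
      # Large file, selective query, simple criteria: filter before decoding
      file_bytes > @large_file_bytes and match_ratio < @selective_ratio and simple? ->
        :pre_filter

      # Otherwise the choice is a matter of preferred syntax
      syntax == :wireshark ->
        :display_filter

      true ->
        :filter
    end
  end
end
```

For example, `FilterChoice.choose(500 * 1024 * 1024, 0.01, true)` picks `:pre_filter`, while the same file with non-simple criteria falls back to `:filter`.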

## PreFilter (Rust-Side Filtering)

### Overview

- **Location**: Rust native code
- **Timing**: Before packet decode
- **Performance**: 10-100x faster than Elixir filtering
- **Limitation**: Only simple criteria (IP, port, protocol)

### When to Use PreFilter

✅ **Use PreFilter when:**
- The file is large (>100MB)
- You need a small subset of packets (<10%)
- The criteria are simple (IP, port, protocol)
- You stop early (for example with `Enum.take/2` or `Enum.find/2`)

❌ **Don't use PreFilter when:**
- The file is small (<10MB): the overhead is not worth it
- You need most packets (>50%)
- You need complex application logic
- You need to inspect decoded payloads

### Available PreFilter Functions

#### Protocol Filtering

```elixir
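# `PreFilter` is written unqualified throughout this guide; this assumes
# an alias such as `alias PcapFileEx.PreFilter` is in scope.

# Single protocol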
PreFilter.protocol("tcp")
PreFilter.protocol("udp")
PreFilter.protocol("icmp")
PreFilter.protocol("http")

# Multiple protocols (OR)
PreFilter.any([
  PreFilter.protocol("tcp"),
  PreFilter.protocol("udp")
])
```

#### Port Filtering

```elixir
# Destination port
PreFilter.port_dest(80)
PreFilter.port_dest(443)

# Source port
PreFilter.port_source(8080)

# Either source or destination
PreFilter.port(443)

# Multiple ports (OR)
PreFilter.any([
  PreFilter.port_dest(80),
  PreFilter.port_dest(443),
  PreFilter.port_dest(8080)
])
```

#### IP Address Filtering

```elixir
# Source IP (exact)
PreFilter.ip_source("192.168.1.1")

# Destination IP (exact)
PreFilter.ip_dest("10.0.0.1")

# Either source or destination
PreFilter.ip("192.168.1.1")

# CIDR range
PreFilter.ip_source_cidr("192.168.0.0/16")
PreFilter.ip_dest_cidr("10.0.0.0/8")
```

#### Combining Filters

```elixir
# AND semantics (all must match)
PreFilter.all([
  PreFilter.protocol("tcp"),
  PreFilter.port_dest(80)
])
# Packet must be TCP AND destination port 80

# OR semantics (any can match)
PreFilter.any([
  PreFilter.port_dest(80),
  PreFilter.port_dest(443)
])
# Packet can have destination port 80 OR 443

# Nested combinations
PreFilter.all([
  PreFilter.protocol("tcp"),
  PreFilter.any([
    PreFilter.port_dest(80),
    PreFilter.port_dest(443),
    PreFilter.port_dest(8080)
  ])
])
# TCP packets to ports 80, 443, or 8080
```

### PreFilter Examples

```elixir
# Example 1: Find HTTPS traffic
{:ok, reader} = PcapFileEx.open("capture.pcap")
:ok = PcapFileEx.Pcap.set_filter(reader, [
  PreFilter.protocol("tcp"),
  PreFilter.port_dest(443)
])
packets = PcapFileEx.Stream.from_reader!(reader) |> Enum.take(100)
PcapFileEx.Pcap.close(reader)

# Example 2: Internal network traffic
{:ok, reader} = PcapFileEx.open("capture.pcap")
:ok = PcapFileEx.Pcap.set_filter(reader, [
  PreFilter.ip_source_cidr("10.0.0.0/8")
])
packets = PcapFileEx.Stream.from_reader!(reader) |> Enum.to_list()
PcapFileEx.Pcap.close(reader)

# Example 3: Web traffic (HTTP or HTTPS)
{:ok, reader} = PcapFileEx.open("capture.pcap")
:ok = PcapFileEx.Pcap.set_filter(reader, [
  PreFilter.protocol("tcp"),
  PreFilter.any([
    PreFilter.port_dest(80),
    PreFilter.port_dest(443)
  ])
])
packets = PcapFileEx.Stream.from_reader!(reader) |> Enum.to_list()
PcapFileEx.Pcap.close(reader)

# Example 4: Clearing filter
{:ok, reader} = PcapFileEx.open("capture.pcap")
:ok = PcapFileEx.Pcap.set_filter(reader, [PreFilter.protocol("tcp")])
tcp_packets = PcapFileEx.Stream.from_reader!(reader) |> Enum.take(100)

:ok = PcapFileEx.Pcap.clear_filter(reader)  # Back to all packets
all_packets = PcapFileEx.Stream.from_reader!(reader) |> Enum.take(100)
PcapFileEx.Pcap.close(reader)
```

## Filter (Elixir-Side Filtering)

### Overview

- **Location**: Elixir code
- **Timing**: After packet decode
- **Performance**: Standard
- **Flexibility**: Very flexible, full Elixir logic

### Available Filter Functions

#### Protocol Filtering

```elixir
# Filter by single protocol
PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.Filter.by_protocol(:tcp)
|> Enum.to_list()

# Filter by multiple protocols
PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.Filter.by_protocol([:tcp, :udp])
|> Enum.to_list()
```

#### Size Filtering

```elixir
# Exact size
PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.Filter.by_size(1500)
|> Enum.to_list()

# Size range
PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.Filter.by_size(100..1500)
|> Enum.to_list()

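# Minimum size (Elixir has no open-ended ranges such as `1000..`,
# so use a predicate instead)
PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.Filter.matching(fn p -> byte_size(p.data) >= 1000 end)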
|> Enum.to_list()
```

#### Time Range Filtering

```elixir
start_time = ~U[2025-01-01 00:00:00Z]
end_time = ~U[2025-01-02 00:00:00Z]

PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.Filter.by_time_range(start_time, end_time)
|> Enum.to_list()
```

#### Endpoint Filtering

```elixir
# By source endpoint
endpoint = %PcapFileEx.Endpoint{ip: "192.168.1.1", port: 8080}
PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.Filter.by_source(endpoint)
|> Enum.to_list()

# By destination endpoint
endpoint = %PcapFileEx.Endpoint{ip: "10.0.0.1", port: 80}
PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.Filter.by_destination(endpoint)
|> Enum.to_list()

# By either source or destination
endpoint = %PcapFileEx.Endpoint{ip: "192.168.1.1", port: nil}
PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.Filter.by_endpoint(endpoint)
|> Enum.to_list()
```

#### Custom Matching

```elixir
# Custom predicate function
PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.Filter.matching(fn packet ->
  # Any custom logic
  :http in packet.protocols and
  byte_size(packet.data) > 1000 and
  packet.timestamp.hour >= 9 and
  packet.timestamp.hour <= 17
end)
|> Enum.to_list()
```

### Chaining Filters

```elixir
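# Combine multiple filters (time bounds as in "Time Range Filtering")
start_time = ~U[2025-01-01 00:00:00Z]
end_time = ~U[2025-01-02 00:00:00Z]
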
PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.Filter.by_protocol(:tcp)
|> PcapFileEx.Filter.by_size(100..1500)
|> PcapFileEx.Filter.by_time_range(start_time, end_time)
|> PcapFileEx.Filter.matching(fn p ->
  p.dst.port in [80, 443, 8080]
end)
|> Enum.to_list()
```

### Filter Examples

```elixir
# Example 1: Large HTTP packets
PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.Filter.by_protocol(:http)
|> PcapFileEx.Filter.matching(fn p -> byte_size(p.data) >= 1000 end)
|> Enum.to_list()

# Example 2: Traffic to specific server during business hours
server = %PcapFileEx.Endpoint{ip: "10.0.0.1", port: nil}
PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.Filter.by_destination(server)
|> PcapFileEx.Filter.matching(fn p ->
  p.timestamp.hour >= 9 and p.timestamp.hour <= 17
end)
|> Enum.to_list()

# Example 3: Complex application logic
PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.Filter.matching(fn packet ->
  cond do
    :http in packet.protocols ->
      http = packet.decoded[:http]
      http.method == "POST" and String.contains?(http.path || "", "/api/")

    :tcp in packet.protocols ->
      packet.dst.port in [80, 443, 8080]

    true ->
      false
  end
end)
|> Enum.to_list()
```

## DisplayFilter (Wireshark-Style)

### Overview

- **Location**: Elixir code
- **Timing**: After packet decode
- **Syntax**: Wireshark-style expressions
- **Best for**: Users familiar with Wireshark

### Supported Operators

#### Comparison Operators

```
==    Equal
!=    Not equal
>     Greater than
<     Less than
>=    Greater than or equal
<=    Less than or equal
```

#### Logical Operators

```
&&    AND
||    OR
!     NOT
```

#### Field Types

```
String fields:   "value" or 'value'
Numeric fields:  123, 456.78
IP addresses:    192.168.1.1
Boolean:         true, false
```

### Available Fields

#### IP Layer

```
ip.src          Source IP address
ip.dst          Destination IP address
ip.version      IP version (4 or 6)
```

#### TCP Layer

```
tcp.srcport     Source port
tcp.dstport     Destination port
tcp.flags.syn   SYN flag
tcp.flags.ack   ACK flag
tcp.flags.fin   FIN flag
tcp.flags.rst   RST flag
```

#### UDP Layer

```
udp.srcport     Source port
udp.dstport     Destination port
```

#### HTTP Layer

```
http.request.method      HTTP method (GET, POST, etc.)
http.request.uri         Request URI/path
http.request.version     HTTP version
http.response.code       Response status code
http.host                Host header
```

#### Packet Metadata

```
frame.len        Packet length (bytes)
frame.time       Packet timestamp
```

### DisplayFilter Examples

```elixir
# Example 1: Simple inline filter
packets = PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.DisplayFilter.filter("tcp.dstport == 80")
|> Enum.to_list()

# Example 2: Compiled filter (reuse)
{:ok, filter} = PcapFileEx.DisplayFilter.compile("ip.src == 192.168.1.1 && tcp.dstport == 443")
packets = PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.DisplayFilter.run(filter)
|> Enum.to_list()

# Example 3: HTTP GET requests
packets = PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.DisplayFilter.filter("http.request.method == \"GET\"")
|> Enum.to_list()

# Example 4: Complex expression
packets = PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.DisplayFilter.filter("""
  (ip.src == 192.168.1.1 || ip.dst == 192.168.1.1) &&
  (tcp.dstport == 80 || tcp.dstport == 443) &&
  frame.len > 1000
""")
|> Enum.to_list()

# Example 5: HTTP responses with errors
packets = PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.DisplayFilter.filter("http.response.code >= 400")
|> Enum.to_list()

# Example 6: SYN packets
packets = PcapFileEx.stream!("capture.pcap")
|> PcapFileEx.DisplayFilter.filter("tcp.flags.syn == true && tcp.flags.ack == false")
|> Enum.to_list()
```

## Comparing the Three Approaches

### Same Query, Three Ways

Find all HTTPS traffic from 192.168.1.0/24:

#### Method 1: PreFilter (Fastest for large files)

```elixir
{:ok, reader} = PcapFileEx.open("large.pcap")
:ok = PcapFileEx.Pcap.set_filter(reader, [
  PreFilter.protocol("tcp"),
  PreFilter.port_dest(443),
  PreFilter.ip_source_cidr("192.168.1.0/24")
])
packets = PcapFileEx.Stream.from_reader!(reader) |> Enum.to_list()
PcapFileEx.Pcap.close(reader)
```

#### Method 2: Filter (Most flexible)

```elixir
# ip_in_cidr?/2 is called unqualified, so it must be defined or imported
# by the calling module; this guide does not define it.
packets = PcapFileEx.stream!("large.pcap")
|> PcapFileEx.Filter.by_protocol(:tcp)
|> PcapFileEx.Filter.matching(fn p ->
  p.dst.port == 443 and ip_in_cidr?(p.src.ip, "192.168.1.0/24")
end)
|> Enum.to_list()
```
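
Method 2 relies on an `ip_in_cidr?/2` helper that the snippet does not define. One possible implementation for IPv4 is sketched below. The module name `CidrSketch` is made up for illustration, and the sketch assumes addresses arrive either as dotted strings (as in the `Endpoint` examples above) or as `{a, b, c, d}` tuples; IPv6 is not handled.

```elixir
defmodule CidrSketch do
  import Bitwise

  @doc """
  Returns true if the IPv4 address `ip` lies inside `cidr`
  (for example "192.168.1.0/24"). Raises on malformed input.
  """
  def ip_in_cidr?(ip, cidr) when is_binary(cidr) do
    [network, prefix] = String.split(cidr, "/")
    prefix_len = String.to_integer(prefix)

    # 32-bit mask with the top `prefix_len` bits set
    mask = (0xFFFFFFFF <<< (32 - prefix_len)) &&& 0xFFFFFFFF

    (to_int(ip) &&& mask) == (to_int(network) &&& mask)
  end

  # Dotted string -> 32-bit integer (via Erlang's strict IPv4 parser)
  defp to_int(ip) when is_binary(ip) do
    {:ok, tuple} = :inet.parse_ipv4strict_address(String.to_charlist(ip))
    to_int(tuple)
  end

  # {a, b, c, d} tuple -> 32-bit integer
  defp to_int({a, b, c, d}), do: (a <<< 24) + (b <<< 16) + (c <<< 8) + d
end
```

With `import CidrSketch` in the calling module, the predicate in Method 2 can stay exactly as written.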

#### Method 3: DisplayFilter (Wireshark syntax)

```elixir
packets = PcapFileEx.stream!("large.pcap")
|> PcapFileEx.DisplayFilter.filter("""
  tcp.dstport == 443 &&
  ip.src >= 192.168.1.0 &&
  ip.src <= 192.168.1.255
""")
|> Enum.to_list()
```

## Advanced Filtering Patterns

### Pattern 1: Two-Stage Filtering

Combine PreFilter (fast) with Elixir Filter (flexible):

```elixir
# Stage 1: PreFilter eliminates ~90% of packets (fast)
{:ok, reader} = PcapFileEx.open("huge.pcap")
:ok = PcapFileEx.Pcap.set_filter(reader, [
  PreFilter.protocol("tcp"),
  PreFilter.port_dest(80)
])

# Stage 2: Elixir Filter for complex logic (on remaining 10%)
packets = PcapFileEx.Stream.from_reader!(reader)
|> Stream.filter(fn p ->
  :http in p.protocols and
  p.decoded[:http].method == "POST" and
  String.contains?(p.decoded[:http].path || "", "/api/users")
end)
|> Enum.to_list()

PcapFileEx.Pcap.close(reader)
```

### Pattern 2: Conditional Filtering

```elixir
# Different filters based on packet type
packets = PcapFileEx.stream!("capture.pcap")
|> Stream.filter(fn packet ->
  cond do
    :http in packet.protocols ->
      http = packet.decoded[:http]
      http.method in ["POST", "PUT", "DELETE"]

    :dns in packet.protocols ->
      # DNS query packets
      true

    :tcp in packet.protocols ->
      packet.dst.port in [22, 3389]  # SSH or RDP

    true ->
      false
  end
end)
|> Enum.to_list()
```

### Pattern 3: Stateful Filtering

```elixir
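# Track TCP connections, filter by connection state.
# The state map is threaded through the accumulator; rebinding an outer
# variable inside the function would not carry over between iterations.
# next_state/2 and should_include?/2 are application-defined placeholders.
{_connections, kept} =
  PcapFileEx.stream!("capture.pcap")
  |> Enum.reduce({%{}, []}, fn packet, {connections, acc} ->
    if :tcp in packet.protocols do
      conn_key = {packet.src, packet.dst}

      # Update connection state
      # ... stateful logic ...
      state = next_state(packet, connections[conn_key])
      connections = Map.put(connections, conn_key, state)

      # Filter based on state
      if should_include?(packet, state) do
        {connections, [packet | acc]}
      else
        {connections, acc}
      end
    else
      {connections, acc}
    end
  end)

packets = Enum.reverse(kept)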
```
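
For a concrete instance of the stateful pattern, the sketch below keeps only packets that belong to connections whose SYN appeared earlier in the stream. To stay runnable without a capture file it works on plain maps with a hypothetical shape (`:src`, `:dst`, `:syn`) rather than PcapFileEx packet structs, and `StatefulSketch` is an illustrative name.

```elixir
defmodule StatefulSketch do
  @doc """
  Keeps packets of connections whose SYN was seen earlier in the stream.
  The SYN packets themselves are dropped.
  """
  def established_only(packets) do
    {_seen, kept} =
      Enum.reduce(packets, {MapSet.new(), []}, fn packet, {seen, kept} ->
        key = conn_key(packet)

        cond do
          # A SYN opens the connection: remember it, do not keep the packet
          packet.syn -> {MapSet.put(seen, key), kept}
          # Later packets are kept only if their connection is known
          MapSet.member?(seen, key) -> {seen, [packet | kept]}
          true -> {seen, kept}
        end
      end)

    # Packets were prepended, so restore capture order
    Enum.reverse(kept)
  end

  # Direction-independent key: both directions map to the same connection
  defp conn_key(%{src: src, dst: dst}), do: Enum.sort([src, dst])
end
```

The important detail is that the set of known connections travels in the accumulator next to the kept packets, so each iteration sees the state left by the previous one.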

### Pattern 4: Sampling

```elixir
# Keep every Nth packet
packets = PcapFileEx.stream!("huge.pcap")
|> Stream.with_index()
|> Stream.filter(fn {_packet, index} -> rem(index, 100) == 0 end)
|> Stream.map(fn {packet, _index} -> packet end)
|> Enum.to_list()

# Random sampling (10%)
packets = PcapFileEx.stream!("huge.pcap")
|> Stream.filter(fn _packet -> :rand.uniform() < 0.1 end)
|> Enum.to_list()
```
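
Two small refinements may be useful here. `Stream.take_every/2` from the Elixir standard library expresses "every Nth element" without the index bookkeeping, and seeding `:rand` makes a random sample repeatable between runs. The sketch below works on any enumerable; `SamplingSketch` is an illustrative name, not part of PcapFileEx.

```elixir
defmodule SamplingSketch do
  @doc "Lazily keeps every `n`-th element; the first element is always kept."
  def every_nth(enumerable, n), do: Stream.take_every(enumerable, n)

  @doc """
  Keeps each element with probability `fraction`. The same `seed`
  (a three-integer tuple) yields the same sample. Returns a list.
  """
  def random_sample(enumerable, fraction, seed) do
    # Seeds the random generator of the calling process
    :rand.seed(:exsss, seed)
    # Eager on purpose, so the seed and the draws cannot be interleaved
    Enum.filter(enumerable, fn _ -> :rand.uniform() < fraction end)
  end
end
```

Applied to a capture, `PcapFileEx.stream!("huge.pcap") |> SamplingSketch.every_nth(100)` would replace the `with_index`/`filter`/`map` chain above.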

## Filter Performance Comparison

### Benchmark: 10GB file, 50M packets, find 100 TCP:443 packets

| Method | Time | Memory | Notes |
|--------|------|--------|-------|
| PreFilter | 1.2s | 50MB | Fastest, Rust-side |
| Filter | 120s | 50MB | 100x slower, Elixir-side |
| DisplayFilter | 125s | 50MB | Similar to Filter |
| Two-stage | 5s | 50MB | PreFilter + complex Elixir logic |
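
Figures like these depend on hardware and on the contents of the capture, so it is worth measuring on your own files. A minimal timing helper, assuming nothing beyond `:timer.tc/1` from the Erlang standard library, could look like this (`TimingSketch` and `time_ms/1` are illustrative names):

```elixir
defmodule TimingSketch do
  @doc "Runs `fun` and returns `{elapsed_milliseconds, result}`."
  def time_ms(fun) when is_function(fun, 0) do
    # :timer.tc/1 reports elapsed wall-clock time in microseconds
    {micros, result} = :timer.tc(fun)
    {micros / 1000, result}
  end
end

# Usage with the calls from this guide (commented out: needs a capture file):
#
#   {ms, packets} =
#     TimingSketch.time_ms(fn ->
#       PcapFileEx.stream!("capture.pcap")
#       |> PcapFileEx.DisplayFilter.filter("tcp.dstport == 443")
#       |> Enum.take(100)
#     end)
```

Wrapping each of the approaches in `time_ms/1` against the same file gives a like-for-like comparison.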

## Common Filtering Mistakes

### ❌ Mistake 1: Wrong Filter Choice for Large Files

```elixir
# DON'T: Use Elixir filter on 10GB file for simple query
PcapFileEx.stream!("10gb.pcap")
|> Stream.filter(fn p -> :tcp in p.protocols and p.dst.port == 443 end)
|> Enum.take(10)  # Takes 2 minutes!

# DO: Use PreFilter
{:ok, r} = PcapFileEx.open("10gb.pcap")
:ok = PcapFileEx.Pcap.set_filter(r, [
  PreFilter.protocol("tcp"),
  PreFilter.port_dest(443)
])
packets = PcapFileEx.Stream.from_reader!(r) |> Enum.take(10)  # Takes 1 second!
PcapFileEx.Pcap.close(r)
```

### ❌ Mistake 2: Forgetting to Close Reader

```elixir
# DON'T: Forget to close
{:ok, r} = PcapFileEx.open("file.pcap")
:ok = PcapFileEx.Pcap.set_filter(r, [...])
packets = PcapFileEx.Stream.from_reader!(r) |> Enum.to_list()
# Missing close!

# DO: Always close
{:ok, r} = PcapFileEx.open("file.pcap")

# Bind the result of `try`: variables assigned inside the block
# are not visible after it.
packets =
  try do
    :ok = PcapFileEx.Pcap.set_filter(r, [...])
    PcapFileEx.Stream.from_reader!(r) |> Enum.to_list()
  after
    PcapFileEx.Pcap.close(r)
  end
```
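
If the open/use/close sequence appears in many places, it can be wrapped once. The helper below is a sketch, not a PcapFileEx API: `ReaderScope.with_resource/3` is a made-up name, and the open and close steps are passed in as functions so the example stays self-contained.

```elixir
defmodule ReaderScope do
  @doc """
  Opens a resource with `open_fun` (must return `{:ok, resource}`),
  passes it to `fun`, and always calls `close_fun`, even if `fun` raises.
  Returns whatever `fun` returns.
  """
  def with_resource(open_fun, close_fun, fun) do
    {:ok, resource} = open_fun.()

    try do
      fun.(resource)
    after
      close_fun.(resource)
    end
  end
end

# Usage with the calls from this guide (commented out: needs a capture file):
#
#   packets =
#     ReaderScope.with_resource(
#       fn -> PcapFileEx.open("file.pcap") end,
#       &PcapFileEx.Pcap.close/1,
#       fn r -> PcapFileEx.Stream.from_reader!(r) |> Enum.to_list() end
#     )
```

Note that the stream has to be consumed inside the function: a lazy stream returned from it would be read after the reader is already closed.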

### ❌ Mistake 3: Using PreFilter for Broad Queries

```elixir
# DON'T: PreFilter that matches most packets (overhead not worth it)
{:ok, r} = PcapFileEx.open("file.pcap")
:ok = PcapFileEx.Pcap.set_filter(r, [
  PreFilter.any([  # Matches 90% of packets!
    PreFilter.protocol("tcp"),
    PreFilter.protocol("udp")
  ])
])

# DO: Use Elixir filter or no filter at all
packets = PcapFileEx.stream!("file.pcap")
|> Stream.filter(fn p -> :tcp in p.protocols or :udp in p.protocols end)
|> Enum.to_list()
```

## Summary: Filter Selection Guide

**Use PreFilter when:**
- ✅ File > 100MB
- ✅ Selective query (<10% of packets)
- ✅ Simple criteria (IP/port/protocol)
- ✅ Need maximum performance

**Use Filter when:**
- ✅ Complex application logic
- ✅ Need to check decoded payloads
- ✅ Flexible predicate functions
- ✅ File < 100MB

**Use DisplayFilter when:**
- ✅ Familiar with Wireshark syntax
- ✅ Want readable filter expressions
- ✅ Field-based queries
- ✅ Network engineer background