README.md

# ExBags

A duplicate bag (multiset) implementation for Elixir with set-like operations.

## Installation

Add to your `mix.exs`:

```elixir
def deps do
  [
    {:ex_bags, "~> 0.1.0"}
  ]
end
```

## Overview

ExBags provides a duplicate bag implementation that allows multiple values for the same key. A duplicate bag is like a map but stores values as lists, enabling you to track multiple occurrences of the same value for each key.

Use cases:
- Data reconciliation and synchronization
- Multiset operations and counting
- Data analysis and comparison
- Inventory management and tracking

## Core Bag Operations

### `new/0`

Creates a new empty bag.

```elixir
iex> ExBags.new()
%{}
```

### `put/3`

Adds a value to the bag for the given key.

```elixir
iex> bag = ExBags.new()
iex> bag = ExBags.put(bag, :a, 1)
iex> bag = ExBags.put(bag, :a, 2)
iex> ExBags.get(bag, :a)
[1, 2]
```

### `get/2`

Gets all values for a given key from the bag.

```elixir
iex> bag = %{a: [1, 2, 3], b: ["hello"]}
iex> ExBags.get(bag, :a)
[1, 2, 3]

iex> ExBags.get(bag, :c)
[]
```

### `keys/1`

Gets all keys from the bag.

```elixir
iex> bag = %{a: [1, 2], b: [3], c: []}
iex> ExBags.keys(bag) |> Enum.sort()
[:a, :b, :c]
```

### `values/1`

Gets all values from the bag, flattened into a single list.

```elixir
iex> bag = %{a: [1, 2], b: [3, 4]}
iex> ExBags.values(bag) |> Enum.sort()
[1, 2, 3, 4]
```

### `update/3`

Updates the values for a key in the bag using a function.

```elixir
iex> bag = %{a: [1, 2, 3]}
iex> ExBags.update(bag, :a, fn values -> Enum.map(values, &(&1 * 2)) end)
%{a: [2, 4, 6]}
```

## Set Operations

### `intersect/2`

Returns a bag containing only the key-value pairs that exist in both bags.

```elixir
iex> ExBags.intersect(%{a: [1, 2], b: [2, 3]}, %{b: [2, 4], c: [5]})
%{b: [2, 3, 2, 4]}

iex> ExBags.intersect(%{a: [1, 1, 2], b: [2, 2, 3]}, %{a: [1, 2], b: [2, 4]})
%{a: [1, 1, 2, 1, 2], b: [2, 2, 3, 2, 4]}
```

### `difference/2`

Returns a bag containing only the key-value pairs that exist in the first bag but not in the second bag.

```elixir
iex> ExBags.difference(%{a: [1, 2], b: [2, 3]}, %{b: [2, 4], c: [5]})
%{a: [1, 2], b: [3]}

iex> ExBags.difference(%{a: [1, 1, 2], b: [2, 2, 3]}, %{a: [1], b: [2]})
%{a: [1, 2], b: [2, 3]}
```

### `symmetric_difference/2`

Returns a bag containing key-value pairs that exist in either bag but not in both.

```elixir
iex> ExBags.symmetric_difference(%{a: [1, 2], b: [2, 3]}, %{b: [2, 4], c: [5]})
%{a: [1, 2], b: [3, 4], c: [5]}

iex> ExBags.symmetric_difference(%{a: [1, 1, 2]}, %{a: [1, 2, 2]})
%{a: [1, 2]}
```

### `reconcile/2`

Performs a reconciliation operation similar to SQL's FULL OUTER JOIN for duplicate bags. Returns a tuple of three bags:

1. Intersection: Key-value pairs that exist in both bags
2. Only in first: Key-value pairs that exist only in the first bag
3. Only in second: Key-value pairs that exist only in the second bag

```elixir
iex> ExBags.reconcile(%{a: [1, 2], b: [2, 3]}, %{b: [2, 4], c: [5]})
{%{b: [2]}, %{a: [1, 2], b: [3]}, %{b: [4], c: [5]}}

iex> ExBags.reconcile(%{a: [1]}, %{b: [2]})
{%{}, %{a: [1]}, %{b: [2]}}
```

## Stream Functions

Memory-efficient stream versions of all functions:

### `intersect_stream/2`, `difference_stream/2`, `symmetric_difference_stream/2`, `reconcile_stream/2`

These functions return streams instead of bags for processing large datasets.

```elixir
iex> ExBags.intersect_stream(%{a: [1, 2], b: [2, 3]}, %{b: [2, 4], c: [5]}) |> Enum.to_list() |> Enum.sort()
[{:b, [2, 3, 2, 4]}]

iex> stream = ExBags.intersect_stream(large_bag1, large_bag2)
iex> first_ten = stream |> Stream.take(10) |> Enum.to_list()

iex> result = ExBags.intersect_stream(bag1, bag2)
...> |> Stream.filter(fn {_key, values} -> length(values) > 1 end)
...> |> Stream.map(fn {key, values} -> {key, Enum.map(values, &(&1 * 2))} end)
...> |> Enum.to_list()

iex> {common, only_first, only_second} = ExBags.reconcile_stream(bag1, bag2)
iex> {Enum.to_list(common) |> Enum.sort(), Enum.to_list(only_first) |> Enum.sort(), Enum.to_list(only_second) |> Enum.sort()}
```

## Use Cases

### Inventory Management

```elixir
inventory = ExBags.new()
inventory = ExBags.put(inventory, :apples, "red")
inventory = ExBags.put(inventory, :apples, "green")
inventory = ExBags.put(inventory, :bananas, "yellow")

ExBags.get(inventory, :apples)
# ["red", "green"]

common_items = ExBags.intersect(warehouse_inventory, store_inventory)
```

### Data Synchronization

```elixir
local_data = %{users: ["alice", "bob"], sessions: ["session1", "session2"]}
remote_data = %{users: ["alice", "charlie"], sessions: ["session2", "session3"]}

{common, local_only, remote_only} = ExBags.reconcile(local_data, remote_data)
# common: %{users: ["alice"], sessions: ["session2"]}
# local_only: %{users: ["bob"], sessions: ["session1"]}
# remote_only: %{users: ["charlie"], sessions: ["session3"]}
```

### Event Tracking

```elixir
events = ExBags.new()
events = ExBags.put(events, :user1, "login")
events = ExBags.put(events, :user1, "view_page")
events = ExBags.put(events, :user2, "login")

common_events = ExBags.intersect(user_events, admin_events)
```

## Performance

- Optimized using Elixir's built-in Map operations
- Time complexity: O(n) where n is the number of keys
- Memory efficient with lazy evaluation for streams
- Handles edge cases gracefully

## Testing

```bash
mix test
```

Test coverage:
- 33 unit tests
- 36 doctests
- 22 property tests using StreamData

Property testing validates functions with various inputs including different value types, empty bags, and large datasets.

## License

MIT License - see [LICENSE](LICENSE) file for details.

## Changelog

### 0.1.0
- Initial release
- Duplicate bag implementation with core operations
- Set operations: intersect, difference, symmetric_difference, reconcile
- Stream versions for memory-efficient processing
- Comprehensive test coverage