# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.7.9] - 2025-11-16
### Added
- **Gateway Supervision Refactoring**: Implemented proper OTP supervision tree
- New 3-tier architecture: `macula_gateway_sup` (root) supervises `macula_gateway_quic_server`, `macula_gateway`, `macula_gateway_workers_sup`
- Added `macula_gateway_quic_server.erl` - Dedicated QUIC transport layer (248 LOC, 17 tests)
- Added `macula_gateway_workers_sup.erl` - Supervises business logic workers (152 LOC, 24 tests)
- Added `macula_gateway_clients.erl` - Renamed from `macula_gateway_client_manager` (clearer naming)
- Circular dependency resolution via `set_gateway/2` callback pattern
- `rest_for_one` supervision strategy for controlled fault isolation
### Changed
- **Gateway Architecture**: Refactored from manual process management to supervised architecture
- Gateway now finds siblings via supervisor instead of starting them manually
- Simplified `macula_gateway` init/1 - uses `find_parent_supervisor/0` and `find_sibling/2`
- Removed manual lifecycle management - supervisor handles cleanup
- Updated `macula_gateway_sup.erl` to be root supervisor (was workers supervisor)
- All gateway tests updated for new supervision tree (106 tests, 0 failures)
### Fixed
- **CRITICAL**: Gateway now actually USES DHT-routed pub/sub (v0.7.8 had the code but wasn't calling it!)
- Bug: Gateway's `handle_publish` was still using v0.7.7 endpoint-based routing
- Impact: v0.7.8 protocol infrastructure existed but gateway bypassed it entirely
- Root cause: `handle_publish` (macula_gateway.erl:885-943) never called `macula_pubsub_routing`
- Solution: Rewrote `handle_publish` to use `macula_pubsub_routing:wrap_publish` and send via `pubsub_route` messages
- Flow: Gateway now queries DHT for `node_id` (not endpoint), wraps PUBLISH in `pubsub_route`, sends via mesh connection manager
- Result: Messages now actually route via multi-hop Kademlia DHT to remote subscribers
- Fixed test failures in `macula_connection_tests` - replaced invalid `connected` message type with `subscribe`
- Fixed edoc warning in `macula_gateway_sup.erl` - replaced markdown code fence with HTML pre tags for proper documentation generation
### Improved
- **Fault Tolerance**: Automatic recovery from gateway/QUIC/worker crashes
- **Production Stability**: Proper OTP supervision with configurable restart strategies
- **Code Organization**: Clean separation between transport (QUIC), coordination (gateway), and business logic (workers)
- **Testability**: Each module tested independently with comprehensive coverage
### Technical Details
- v0.7.8 added `pubsub_route` protocol + routing modules but gateway never used them
- v0.7.9 integrates the v0.7.8 infrastructure into gateway's publish flow
- This completes the DHT-routed pub/sub implementation started in v0.7.8
- Supervision refactoring provides +2/10 scalability improvement (foundational infrastructure)
- Enables future optimizations: process pools, connection pooling, horizontal scaling
## [0.7.8] - 2025-11-16
### Fixed
- **CRITICAL**: Implemented multi-hop DHT routing for pub/sub to fix matchmaking
- Bug: v0.7.7 gateway queried DHT but routed to endpoints, which failed for NAT peers
- Impact: Matchmaking still broken - messages couldn't reach subscribers behind NAT
- Root cause: Split-brain architecture - subscribers register locally but routing via gateway
- Solution: Multi-hop Kademlia DHT routing (same pattern as RPC routing)
### Added
- **Protocol Layer**: New `pubsub_route` message type (0x13)
- Wraps PUBLISH messages for multi-hop routing through mesh
- Fields: `destination_node_id`, `source_node_id`, `hop_count`, `max_hops`, `topic`, `payload`
- Protocol encoder/decoder support with validation
- 8 encoder tests + 3 decoder tests added
- **Routing Module**: `macula_pubsub_routing.erl` (NEW - 115 LOC)
- Stateless routing logic for pub/sub messages
- `wrap_publish/4` - Wraps PUBLISH in routing envelope
- `route_or_deliver/3` - Routes to next hop or delivers locally
- `should_deliver_locally/2` - Checks if destination matches
- TTL protection via `max_hops` (default: 10)
- 14 comprehensive tests (all passing)
- **Gateway Integration**: Enhanced `macula_gateway.erl`
- Added `handle_decoded_message` clause for `pubsub_route` messages
- Routes via XOR distance to next hop OR delivers locally
- `handle_pubsub_route_deliver/2` - Unwraps and delivers to local subscribers
- `forward_pubsub_route/3` - Forwards to next hop through mesh
- **Pub/Sub Handler**: Updated `macula_pubsub_dht.erl`
- `route_to_subscribers/5` now uses actual DHT routing (was TODO stub)
- Extracts subscriber `node_id` (not endpoint) from DHT results
- Wraps PUBLISH in `pubsub_route` envelope
- Sends via connection manager which routes through gateway
### Technical Details
**v0.7.7 Architecture (BROKEN):**
- ❌ Publisher queries DHT for subscriber endpoints
- ❌ Tries to route directly to endpoints
- ❌ Fails for NAT peers (can't accept connections)
- ❌ Matchmaking stuck on "Looking for opponent..."
**v0.7.8 Architecture (FIXED):**
- ✅ Publisher queries DHT for subscriber node IDs
- ✅ Wraps PUBLISH in `pubsub_route` envelope
- ✅ Routes via multi-hop Kademlia (same as RPC)
- ✅ Works with relay OR direct connections
- ✅ Matchmaking succeeds across NAT peers
**Message Flow:**
```
Publisher Gateway Node A Subscriber
| | | |
|--pubsub_route---------->| | |
| dest: Subscriber |--pubsub_route----->| |
| topic: "matchmaking" | (forward closer) |--pubsub_route------->|
| payload: {msg} | | |
| | | | Deliver locally
```
### Tests
- Protocol encoder: 49 tests (8 new for pubsub_route)
- Protocol decoder: 35 tests (3 new for pubsub_route)
- Pub/sub routing: 14 tests (all passing)
- wrap_publish envelope creation
- should_deliver_locally checks
- route_or_deliver decision logic
- TTL exhaustion handling
- No-route error handling
### Architecture Documentation
- Added `architecture/dht_routed_pubsub.md` with complete design
- Future refactoring note: Consider unifying RPC and pub/sub routing modules (nearly identical logic)
**This completes the DHT-routed pub/sub implementation and should enable working matchmaking.**
---
## [0.7.7] - 2025-11-15
### Fixed
- **CRITICAL**: Gateway pub/sub now queries DHT for remote subscribers
- Bug: Gateway only checked local subscriptions, never queried DHT for remote subscribers
- Impact: Distributed pub/sub and matchmaking completely broken - remote peers couldn't receive messages
- Root cause: `handle_publish` only called `macula_gateway_pubsub:get_subscribers` (local streams only)
- Fix Phase 1: Added endpoint → stream PID tracking in `macula_gateway_client_manager`
- New state field: `endpoint_to_stream :: #{binary() => pid()}`
- New API: `get_stream_by_endpoint/2`
- Updated `store_client_stream/4` to track endpoints
- Updated `remove_client/2` to clean up endpoint mappings
- Fix Phase 2: Modified `handle_publish` to query DHT
- Queries local subscribers (existing behavior)
- Queries DHT for remote subscribers via `crypto:hash(sha256, Topic)`
- Converts remote endpoints to stream PIDs using client_manager
- Combines local + remote and delivers to all
- Fix Phase 3: Added `macula_gateway_dht:lookup_value/1`
- Synchronous lookup from local DHT storage
- Calls `macula_routing_server:find_value/3` with K=20
- Returns `{ok, [Subscriber]}` or `{error, not_found}`
- Tests: 90 tests passing (39 client_manager + 49 gateway + 7 endpoint + 5 pub/sub DHT)
**This completes the distributed pub/sub fix and enables working matchmaking across multiple peer containers.**
### Technical Details
Before v0.7.7:
- ❌ Gateway only queried `macula_gateway_pubsub` (local subscriptions)
- ❌ Remote subscribers stored in DHT but never looked up
- ❌ Pub/sub messages only delivered to local streams
- ❌ Multi-peer matchmaking broken
After v0.7.7:
- ✅ Gateway queries both local + DHT for subscribers
- ✅ Remote endpoints resolved to stream PIDs via endpoint tracking
- ✅ Messages delivered to all subscribers (local + remote)
- ✅ Multi-peer matchmaking works correctly
The architecture remains hub-and-spoke (v0.7.x):
- All peers connect to gateway
- Gateway routes all pub/sub messages
- Subscriptions stored in DHT for discovery
- Gateway has stream PIDs for all connected peers
---
## [0.8.0] - TBD (Q2 2025)
### Planned - True Mesh Architecture
- **BREAKING**: Opportunistic NAT hole punching for direct peer-to-peer connections
- 80% direct P2P connections (cone NAT, no firewall)
- 20% gateway relay fallback (symmetric NAT, strict firewalls)
- True mesh topology (no single point of failure)
- New modules: `macula_nat_discovery`, `macula_hole_punch`, `macula_connection_upgrade`
- Backward compatible with v0.7.x gateway relay architecture
**This will transform Macula from hub-and-spoke (star topology) to true decentralized mesh.**
See `architecture/NAT_TRAVERSAL_ROADMAP.md` for complete design.
---
## [0.7.6] - 2025-11-15
### Fixed
- **CRITICAL**: Disabled QUIC transport-layer idle timeout causing connection closures
- Root cause: MsQuic default idle timeout of 30 seconds (2x = 60s to closure)
- v0.7.4-0.7.5 application-level PING/PONG worked but didn't reset QUIC transport timer
- Added `idle_timeout_ms => 0` to both client connection and gateway listener options
- Setting to 0 disables QUIC idle timeout entirely
- Connections now stay alive indefinitely (application PING/PONG provides health checks)
- Modified: `macula_quic:connect/4` and `macula_quic:listen/2`
**This completes the connection stability fix started in v0.7.4-0.7.5.**
### Tests
- Added `test/macula_quic_idle_timeout_tests.erl` with 7 tests
- Client connection idle timeout configuration
- Gateway listener idle timeout configuration
- Option structure and value validation
- Defense-in-depth architecture documentation
### Technical Details
**Defense in Depth** approach:
1. **Transport Layer** (v0.7.6): QUIC idle timeout disabled (`idle_timeout_ms => 0`)
2. **Application Layer** (v0.7.4-0.7.5): PING/PONG keep-alive every 30 seconds
3. **Result**: Connections stay alive + health monitoring
Previous versions had application keep-alive but QUIC transport still enforced 30s idle timeout independently.
---
## [0.7.5] - 2025-11-15
### Fixed
- **CRITICAL**: Gateway PING message handler missing, preventing keep-alive from working
- v0.7.4 implemented keep-alive on edge peer side only
- Gateway had no handler for incoming PING messages
- Result: PINGs sent but never acknowledged, connections still timed out after 2 minutes
- Added `handle_decoded_message({ok, {ping, PingMsg}}, ...)` to gateway
- Gateway now responds with PONG to all incoming PING messages
- Keep-alive now works bidirectionally (edge peer ↔ gateway)
- Also added PONG message handler to gateway for completeness
**This completes the keep-alive implementation started in v0.7.4.**
### Technical Details
The keep-alive flow now works correctly:
1. Edge peer timer fires every 30 seconds (configurable)
2. Edge peer sends PING to gateway
3. **Gateway receives PING and responds with PONG** (new in v0.7.5)
4. Edge peer receives PONG confirmation
5. QUIC connection stays alive (no idle timeout)
Without this fix, PINGs were sent but ignored, causing connections to timeout despite v0.7.4's implementation.
---
## [0.7.4] - 2025-11-15
### Fixed
- **CRITICAL**: Configurable keep-alive mechanism to prevent QUIC connection timeouts
- PING/PONG message support in `macula_connection`
- Default keep-alive interval: 30 seconds (configurable)
- Keep-alive enabled by default (can be disabled via options)
- Automatic PONG response to incoming PING messages
- Configuration via `macula_connection:default_config/0`
- Prevents 2-minute connection timeout that broke distributed matchmaking
- Added 6 tests for keep-alive functionality (all passing)
**This is a critical fix for production deployments where QUIC connections timeout after ~2 minutes of inactivity, breaking pub/sub and matchmaking.**
### Configuration
Enable/disable keep-alive:
```erlang
%% Enable with custom interval (milliseconds)
Opts = #{
keepalive_enabled => true,
keepalive_interval => 30000 %% 30 seconds
}.
%% Disable keep-alive
Opts = #{
keepalive_enabled => false
}.
%% Use defaults (enabled, 30 second interval)
DefaultConfig = macula_connection:default_config().
```
### Architecture Note
**v0.7.4 maintains hub-and-spoke (star) topology**:
- Edge peers connect to gateway (not each other)
- Gateway routes all messages (relay architecture)
- Gateway is single point of failure (by design for now)
- DHT routing table exists but routing happens at gateway
- True peer-to-peer mesh deferred to v0.8.0 (NAT traversal required)
## [0.7.3] - 2025-11-15
### Fixed
- **CRITICAL**: Fixed DHT routing table address serialization crash in `macula_gateway_dht`
- Bug: Gateway stored parsed address **tuples** `{{127,0,0,1}, 9443}` in DHT instead of binary strings
- Impact: When FIND_VALUE replies tried to serialize node addresses, msgpack returned error `{:error, {:badarg, {{127,0,0,1}, 9443}}}`
- Root cause: `macula_gateway.erl:522` used `Address` (tuple from `parse_endpoint/1`) instead of `Endpoint` (binary string)
- Error chain: DHT stored tuples → encode_node_info extracted tuples → msgpack:pack failed → byte_size crashed
- Symptoms: Gateway crashed with "ArgumentError: 1st argument not a bitstring" when peers queried DHT
- Fix: Store original `Endpoint` binary string in DHT routing table instead of parsed tuple
- Added test: `dht_address_serialization_test` documents bug and validates fix
**This is a critical fix for distributed matchmaking and service discovery. Without it, DHT queries crash the gateway.**
## [0.7.2] - 2025-11-15
### Fixed
- **CRITICAL**: Fixed gateway crash in `parse_endpoint/1` when DNS resolution fails
- Bug: `inet:getaddr/2` error tuple was not handled, causing ArgumentError when passed to `byte_size/1`
- Impact: Gateway crashed repeatedly, closing all client connections and preventing pub/sub communication
- Symptoms: "Failed to publish to topic: :closed", "Failed to send STORE for subscription: :closed"
- Fix: Added proper error handling with localhost fallback when DNS resolution fails
- Now returns `{{127,0,0,1}, Port}` fallback instead of crashing
**This is a critical fix for production deployments where endpoint DNS resolution may fail.**
## [0.7.1] - 2025-11-15
### Fixed
- **CRITICAL**: Fixed ArithmeticError in `macula_pubsub_handler` message ID handling
- Bug: Was assigning binary MsgId to counter instead of integer NewCounter
- Impact: Caused pub/sub to crash on second publish attempt with "bad argument in arithmetic expression"
- Fix: Corrected destructuring in line 300 to use `{_MsgId, NewCounter}` instead of `{MsgIdCounter, _}`
- Now properly increments integer counter instead of trying to do arithmetic on binary
**This is a critical fix for anyone using pub/sub functionality in v0.7.0.**
## [0.7.0] - 2025-11-15
### Changed
- **BREAKING**: Major nomenclature refactoring for clarity and industry alignment
- Renamed `macula_connection` → `macula_peer` (mesh participant facade - high-level API)
- Renamed `macula_connection_manager` → `macula_connection` (QUIC transport layer - low-level)
- Follows industry standards used by libp2p, IPFS, and BitTorrent
- Clear separation: `macula_peer` = mesh participant, `macula_connection` = transport
### Added
- Comprehensive transport layer test coverage (36 tests total)
- 11 new tests for message decoding, buffering, URL parsing, and realm normalization
- All tests passing with zero regressions
- Complete v0.7.0 documentation in CLAUDE.md
- Migration guide with specific API examples
- Architecture rationale and benefits
- Status tracking for implementation phases
### Migration Guide (0.6.x → 0.7.0)
**API Changes:**
All high-level mesh operations now use `macula_peer` instead of `macula_connection`:
```erlang
%% Before (0.6.x)
{ok, Client} = macula_connection:start_link(Url, Opts).
ok = macula_connection:publish(Client, Topic, Data).
{ok, SubRef} = macula_connection:subscribe(Client, Topic, Callback).
{ok, Result} = macula_connection:call(Client, Procedure, Args).
%% After (0.7.0)
{ok, Client} = macula_peer:start_link(Url, Opts).
ok = macula_peer:publish(Client, Topic, Data).
{ok, SubRef} = macula_peer:subscribe(Client, Topic, Callback).
{ok, Result} = macula_peer:call(Client, Procedure, Args).
```
**Why This Change?**
The original naming was confusing:
- ❌ `macula_connection` served both facade AND transport roles
- ❌ Mixed high-level mesh operations with low-level QUIC handling
- ❌ Not aligned with P2P industry standards
After v0.7.0:
- ✅ `macula_peer` = mesh participant (clear high-level API for pub/sub, RPC, DHT)
- ✅ `macula_connection` = QUIC transport (clear low-level transport layer)
- ✅ Follows libp2p/IPFS/BitTorrent naming conventions
**Note:** The `macula_client` wrapper module has been updated to use `macula_peer` internally, so if you're using `macula_client`, no changes are required.
## [0.6.7] - 2025-11-15
### Fixed
- **CRITICAL:** Fixed all installation examples to use Hex package references instead of git dependencies
- README.md: Changed from git-based to `{:macula, "~> 0.6"}` (Elixir) and `{macula, "0.6.7"}` (Erlang)
- HELLO_WORLD.md: Updated to use proper Hex package format
- architecture/macula_http3_mesh_hello_world.md: Fixed tutorial installation examples
- architecture/macula_http3_mesh_rpc_guide.md: Fixed migration guide examples
- All code examples now show proper Hex.pm installation for published package
## [0.6.6] - 2025-11-15
### Fixed
- Fixed navigation links in documentation guides to use ex_doc HTML filenames
- Changed GitHub-style relative paths (`../README.md`) to ex_doc HTML references (`readme.html`)
- Fixed all navigation links in EXECUTIVE_SUMMARY.md, COMPARISONS.md, USE_CASES.md, and DEVELOPMENT.md
- Links now work correctly in published Hexdocs without "page not found" errors
## [0.6.5] - 2025-11-15
### Changed
- Updated to modern alternative logo (macula-alt-logo.svg) in both README.md and ex_doc
- Changed tutorial greeting to brand-specific "Hello, Macula!" instead of generic greeting
### Fixed
- Replaced old color logo with cleaner, more modern alternative logo for better visual appeal
## [0.6.4] - 2025-11-15
### Changed
- **Documentation restructuring** - Split README.md into focused landing page with table of contents
- Created `docs/EXECUTIVE_SUMMARY.md` - Why Macula and the case for decentralization
- Created `docs/COMPARISONS.md` - How Macula compares to libp2p, Distributed Erlang, Akka, etc.
- Created `docs/USE_CASES.md` - Real-world applications across business, IoT, and AI domains
- Created `docs/DEVELOPMENT.md` - Complete development guide and coding standards
- README.md now serves as concise landing page (119 lines vs 372 lines)
- All detailed content accessible via clear table of contents
- Removed Mermaid diagram from README.md (ex_doc doesn't support Mermaid - works on GitHub)
### Fixed
- ex_doc landing page uses HELLO_WORLD.md (tutorial-first approach, no multi-page split)
- Documentation properly links to all new guide documents
- Better first impression for Hex.pm users (logo, quick navigation)
## [0.6.3] - 2025-11-15
### Fixed
- Removed README.md from ex_doc extras to prevent multi-page split and broken landing page
- Documentation now correctly redirects to API reference page
## [0.6.2] - 2025-11-15
### Fixed
- ex_doc landing page configuration (`{main, "api-reference"}`) - resolved "readme.html not found" error
## [0.6.1] - 2025-11-15
### Added
- Professional documentation structure for Hex publication
- Architecture diagram in README.md (Mermaid format) showing mesh topology
- Organized documentation: moved 50+ files from root to docs/archive/, docs/development/, docs/planning/
- Created docs/README.md navigation index
- Logo and assets configuration for ex_doc
- Comprehensive Hex package file list (artwork/, docs/, architecture/)
### Fixed
- README.md badge rendering (moved badges outside `<div>` tag for proper GitHub display)
- ex_doc assets configuration (deprecated warning resolved)
- ex_doc landing page configuration (changed `{main, "readme"}` to `{main, "api-reference"}` to fix "readme.html not found" error)
- Hex package configuration to include all necessary assets and documentation
- Documentation organization for professional first impression
## [0.6.0] - 2025-11-15
### Changed
- **BREAKING**: Renamed environment variable from `GATEWAY_REALM` to `MACULA_REALM` for better API consistency
- All `MACULA_*` environment variables now follow consistent naming
- Applies to both gateway mode and edge peer mode
- Update your deployment configurations to use `MACULA_REALM` instead of `GATEWAY_REALM`
### Added
- Comprehensive Kademlia DHT architecture documentation (`docs/KADEMLIA_DHT_ARCHITECTURE.md`)
- XOR distance metric explanation
- K-bucket routing table details
- DHT operations (PING, STORE, FIND_NODE, FIND_VALUE)
- Iterative lookup algorithm
- Macula-specific adaptations (realm-scoped DHT, HTTP/3 transport)
- Performance characteristics and comparisons
### Fixed
- Updated documentation to reflect `MACULA_REALM` environment variable usage
- Updated `entrypoint.sh`, `Dockerfile.gateway`, and `config/sys.config` to use `MACULA_REALM`
### Upcoming in v0.7.0
- Architecture improvement: Separation of `macula_connection` into `macula_peer` (high-level mesh API) and `macula_connection` (low-level QUIC transport)
- See `docs/NOMENCLATURE_PROPOSAL_CONNECTION_TO_PEER.md` and `docs/PEER_CONNECTION_SEPARATION_PLAN.md` for details
- Expected timeline: 4-5 weeks after v0.6.0 release
### Migration Guide (0.5.0 → 0.6.0)
If you're using Macula in gateway mode or configuring realm multi-tenancy:
**Before (0.5.0):**
```bash
export GATEWAY_REALM=my-app
```
**After (0.6.0):**
```bash
export MACULA_REALM=my-app
```
**Elixir/Phoenix runtime.exs:**
```elixir
# Before (0.5.0)
System.put_env("GATEWAY_REALM", realm)
# After (0.6.0)
System.put_env("MACULA_REALM", realm)
```
## [0.5.0] - 2025-11-14
### Added
- Initial public release
- HTTP/3 (QUIC) mesh networking platform
- Gateway mode for accepting incoming connections
- Edge peer mode for mesh participation
- Multi-tenancy via realm isolation
- Pub/Sub messaging with wildcard support
- RPC (request/response) patterns
- Service discovery and advertisement
- mDNS local discovery support
- Process registry via gproc
- Comprehensive documentation
### Known Issues
- Gateway mode requires proper TLS certificate configuration
- Certificates must have Subject Alternative Name (SAN) extension
- Docker deployments require proper file ownership (`--chown=app:app`)
---
[0.7.0]: https://github.com/macula-io/macula/compare/v0.6.7...v0.7.0
[0.6.7]: https://github.com/macula-io/macula/compare/v0.6.6...v0.6.7
[0.6.0]: https://github.com/macula-io/macula/compare/v0.5.0...v0.6.0
[0.5.0]: https://github.com/macula-io/macula/releases/tag/v0.5.0