
<p align="center">
  <img src="assets/nsai_registry.svg" alt="NSAI Registry" width="200">
</p>

<h1 align="center">NSAI Registry</h1>

<p align="center">
  <a href="https://github.com/North-Shore-AI/nsai_registry/actions"><img src="https://github.com/North-Shore-AI/nsai_registry/workflows/CI/badge.svg" alt="CI Status"></a>
  <a href="https://hex.pm/packages/nsai_registry"><img src="https://img.shields.io/hexpm/v/nsai_registry.svg" alt="Hex.pm"></a>
  <a href="https://hexdocs.pm/nsai_registry"><img src="https://img.shields.io/badge/docs-hexdocs-blue.svg" alt="Documentation"></a>
  <img src="https://img.shields.io/badge/elixir-%3E%3D%201.14-purple.svg" alt="Elixir">
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License"></a>
</p>

<p align="center">
  Service discovery and registry for the NSAI ecosystem
</p>

---

A service registry and discovery system for the NSAI (North Shore AI) ecosystem, written in Elixir. It provides health checking, event broadcasting, pluggable storage backends, and optional distributed clustering.

## Features

- **Service Registration & Discovery**: Register services and discover them by name with support for multiple instances
- **Health Checking**: Automatic health monitoring with support for HTTP, HTTPS, TCP, and gRPC protocols
- **Circuit Breaker**: Prevent cascading failures with built-in circuit breaker pattern
- **Event Broadcasting**: Real-time PubSub events for service topology changes
- **Multiple Storage Backends**: In-memory (ETS) for development, PostgreSQL for production
- **Load Balancing**: Built-in client with round-robin and health-aware routing
- **Telemetry**: Comprehensive instrumentation for monitoring and observability
- **Distributed Ready**: Optional Horde integration for multi-node clustering
- **CLI Management**: Mix tasks for service management

## Installation

Add `nsai_registry` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:nsai_registry, "~> 0.1.0"},

    # Optional: For PostgreSQL backend
    {:postgrex, "~> 0.17"},
    {:ecto_sql, "~> 3.10"},

    # Optional: For distributed registry
    {:horde, "~> 0.9"}
  ]
end
```

## Quick Start

```elixir
# Register a service
{:ok, service} = NsaiRegistry.register(%{
  name: "work",
  host: "localhost",
  port: 4000,
  protocol: :http,
  health_check: "/health",
  metadata: %{version: "0.1.0"}
})

# Discover a service
{:ok, service} = NsaiRegistry.lookup("work")
url = NsaiRegistry.Service.url(service)

# Discover all instances (for load balancing)
{:ok, services} = NsaiRegistry.lookup_all("work")

# Subscribe to topology changes
NsaiRegistry.PubSub.subscribe()

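receive do
  {:service_registered, svc} ->
    IO.puts("New service: #{svc.name}")
  {:service_healthy, svc} ->
    IO.puts("Service #{svc.name} is healthy")
after
  # Avoid blocking forever if no event arrives
  5_000 ->
    IO.puts("No topology events received within 5 seconds")
end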
```

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      Client Applications                     │
└───────────────┬──────────────────────────────┬──────────────┘
                │                              │
                ▼                              ▼
┌───────────────────────────┐  ┌──────────────────────────────┐
│   NsaiRegistry.Client     │  │   NsaiRegistry (Main API)    │
│  - Load Balancing         │  │  - Register/Deregister       │
│  - Failover               │  │  - Lookup Services           │
│  - Health-Aware Routing   │  │  - Status Updates            │
└───────────────┬───────────┘  └───────────┬──────────────────┘
                │                          │
                └──────────┬───────────────┘
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                    NsaiRegistry.Registry                     │
│                      (GenServer Core)                        │
└───┬─────────────┬──────────────┬─────────────┬─────────────┘
    │             │              │             │
    ▼             ▼              ▼             ▼
┌─────────┐ ┌──────────┐ ┌────────────┐ ┌──────────────────┐
│ Storage │ │  PubSub  │ │ Telemetry  │ │ Health Checker   │
│ Backend │ │  Events  │ │ Metrics    │ │ - HTTP/TCP/gRPC  │
│ (ETS/PG)│ │          │ │            │ │ - Circuit Breaker│
└─────────┘ └──────────┘ └────────────┘ └──────────────────┘
```
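The diagram shows the registry core delegating persistence to a swappable storage backend (ETS or PostgreSQL). A common way to express that in Elixir is a behaviour that each backend implements. The sketch below is illustrative only: `MyApp.StorageBackend`, its callbacks, and `MyApp.EtsStorage` are hypothetical names, not NsaiRegistry's actual storage API.

```elixir
defmodule MyApp.StorageBackend do
  # Hypothetical behaviour: callbacks a pluggable backend might implement
  @callback init(opts :: keyword()) :: {:ok, state :: term()}
  @callback put(state :: term(), id :: String.t(), service :: map()) :: :ok
  @callback get(state :: term(), id :: String.t()) :: {:ok, map()} | {:error, :not_found}
  @callback delete(state :: term(), id :: String.t()) :: :ok
end

defmodule MyApp.EtsStorage do
  # Hypothetical ETS-backed implementation of the behaviour above
  @behaviour MyApp.StorageBackend

  @impl true
  def init(opts) do
    name = Keyword.get(opts, :table_name, :my_registry)
    # :set keeps one entry per key; :public lets other processes read it directly
    table = :ets.new(name, [:set, :public, read_concurrency: true])
    {:ok, table}
  end

  @impl true
  def put(table, id, service) do
    true = :ets.insert(table, {id, service})
    :ok
  end

  @impl true
  def get(table, id) do
    case :ets.lookup(table, id) do
      [{^id, service}] -> {:ok, service}
      [] -> {:error, :not_found}
    end
  end

  @impl true
  def delete(table, id) do
    true = :ets.delete(table, id)
    :ok
  end
end
```

In a design like this, the `storage_backend:` setting simply names the module the registry calls through these callbacks, so swapping ETS for PostgreSQL needs no changes in the core.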

## Configuration

### Basic Configuration

```elixir
# config/config.exs

# Registry configuration
config :nsai_registry, NsaiRegistry.Registry,
  storage_backend: NsaiRegistry.Storage.ETS,
  storage_opts: [table_name: :nsai_registry]

# Health checker configuration
config :nsai_registry, NsaiRegistry.HealthChecker,
  check_interval: 30_000,           # Check every 30 seconds
  timeout: 5_000,                   # 5 second timeout per check
  auto_deregister: false,           # Don't auto-remove unhealthy services
  unhealthy_threshold: 3            # Mark unhealthy after 3 consecutive failures

# Circuit breaker configuration
config :nsai_registry, NsaiRegistry.CircuitBreaker,
  failure_threshold: 5,             # Open circuit after 5 failures
  timeout: 60_000,                  # Wait 60s before testing recovery
  half_open_max_calls: 3            # Max calls in half-open state
```
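`unhealthy_threshold` counts *consecutive* failures, so a successful check should reset the count. A minimal sketch of that bookkeeping, assuming a per-service failure counter (the `MyApp.HealthState` module and its fields are hypothetical, not NsaiRegistry internals):

```elixir
defmodule MyApp.HealthState do
  # Hypothetical per-service health bookkeeping driven by check results
  defstruct status: :unknown, consecutive_failures: 0

  @doc "Folds one check result (:ok or :error) into the state."
  def record(%__MODULE__{} = state, :ok, _threshold) do
    # Any success resets the streak and marks the service healthy
    %{state | status: :healthy, consecutive_failures: 0}
  end

  def record(%__MODULE__{} = state, :error, threshold) do
    failures = state.consecutive_failures + 1
    # Only flip to :unhealthy once the streak reaches the threshold
    status = if failures >= threshold, do: :unhealthy, else: state.status
    %{state | status: status, consecutive_failures: failures}
  end
end
```

If the checker works this way, the defaults above (`check_interval: 30_000`, `unhealthy_threshold: 3`) would mark a service that stops responding as unhealthy after two to three check intervals, roughly 60 to 90 seconds plus check timeouts.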

### PostgreSQL Storage Backend

```elixir
# 1. Configure your Ecto repo
config :my_app, MyApp.Repo,
  database: "my_app_dev",
  username: "postgres",
  password: "postgres",
  hostname: "localhost",
  pool_size: 10

# 2. Use Postgres backend
config :nsai_registry, NsaiRegistry.Registry,
  storage_backend: NsaiRegistry.Storage.Postgres,
  storage_opts: [repo: MyApp.Repo]

# 3. Run the migration
# Copy priv/repo/migrations/create_services.exs.template
# to your app and run: mix ecto.migrate
```

## Usage Examples

### Load Balancing with Client

```elixir
# Get a healthy service instance with automatic failover
# (Req is a separate HTTP client dependency, used here only as an example)
{:ok, response} = NsaiRegistry.Client.call("work", fn service ->
  url = NsaiRegistry.Service.url(service)
  Req.post(url <> "/api/task", json: %{job: "process"})
end, max_retries: 3)

# Round-robin load balancing
{:ok, service} = NsaiRegistry.Client.round_robin("work")

# Get all healthy instances
{:ok, healthy_services} = NsaiRegistry.Client.get_all_healthy("work")
```
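Round-robin selection with retry-based failover is a well-known pattern. The sketch below shows one way to build it with only the standard library; `MyApp.Balancer` and its functions are hypothetical and work on a plain list of instances rather than NsaiRegistry's `Service` structs.

```elixir
defmodule MyApp.Balancer do
  # Hypothetical round-robin picker backed by a single unsigned atomic counter
  def new, do: :atomics.new(1, signed: false)

  # add_get/3 increments and reads in one atomic step, so concurrent
  # callers advance the rotation without races
  def next(ref, instances) when instances != [] do
    n = :atomics.add_get(ref, 1, 1)
    Enum.at(instances, rem(n - 1, length(instances)))
  end

  # Failover: tries instances in rotation until fun returns {:ok, _},
  # making at most 1 + max_retries attempts; returns the last error otherwise
  def call(_ref, [], _fun, _max_retries), do: {:error, :no_instances}

  def call(ref, instances, fun, max_retries) when max_retries >= 0 do
    Enum.reduce_while(0..max_retries, nil, fn _attempt, _last ->
      case fun.(next(ref, instances)) do
        {:ok, _} = ok -> {:halt, ok}
        error -> {:cont, error}
      end
    end)
  end
end
```

A production client would normally rotate only over instances that passed their most recent health check, which is the role `get_all_healthy/1` appears to play.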

### Health Checking

```elixir
# HTTP/HTTPS health check (default)
NsaiRegistry.register(%{
  name: "api",
  host: "api.example.com",
  port: 443,
  protocol: :https,
  health_check: "/health"
})

# TCP health check
NsaiRegistry.register(%{
  name: "database",
  host: "db.example.com",
  port: 5432,
  protocol: :tcp
})

# gRPC health check
NsaiRegistry.register(%{
  name: "grpc-service",
  host: "grpc.example.com",
  port: 9090,
  protocol: :grpc
})

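# Trigger health checks manually: all services, or a single service by ID
# (service IDs in these examples take the form name:host:port)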
NsaiRegistry.HealthChecker.check_now()
NsaiRegistry.HealthChecker.check_service("work:localhost:4000")
```
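A TCP health check typically only verifies that a connection opens within the timeout. A minimal sketch of that idea using Erlang's `:gen_tcp`, assuming "connect succeeded" is the whole check (`MyApp.TcpCheck` is hypothetical, not NsaiRegistry's checker):

```elixir
defmodule MyApp.TcpCheck do
  # Hypothetical TCP liveness probe: healthy if a connection opens in time
  @spec check(String.t(), :inet.port_number(), timeout()) :: :ok | {:error, term()}
  def check(host, port, timeout \\ 5_000) do
    # :gen_tcp expects the host as a charlist
    case :gen_tcp.connect(String.to_charlist(host), port, [:binary, active: false], timeout) do
      {:ok, socket} ->
        # Close immediately; only reachability matters for this probe
        :gen_tcp.close(socket)
        :ok

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

An HTTP or HTTPS check would go one step further, requesting the `health_check` path and inspecting the response status.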

### Event Subscriptions

```elixir
# Subscribe to all service events
NsaiRegistry.PubSub.subscribe()

# Subscribe to specific service events
NsaiRegistry.PubSub.subscribe("work")

# Logger macros must be required before use
require Logger

# Use the Client helper for callbacks
NsaiRegistry.Client.watch("work",
  on_healthy: fn service ->
    Logger.info("Service #{service.name} is healthy!")
  end,
  on_unhealthy: fn service ->
    Logger.warning("Service #{service.name} is unhealthy!")
  end
)
```
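Topic-scoped subscriptions such as `subscribe("work")` are a standard pub/sub pattern. NsaiRegistry builds on Phoenix PubSub; the sketch below illustrates the same idea with Elixir's built-in `Registry` in `:duplicate` mode instead, so it runs without extra dependencies. `MyApp.Events` and the message shapes are hypothetical.

```elixir
defmodule MyApp.Events do
  # Hypothetical topic-based pub/sub on Elixir's built-in Registry
  @registry MyApp.EventRegistry

  # :duplicate keys let many processes subscribe to the same topic
  def start_link, do: Registry.start_link(keys: :duplicate, name: @registry)

  # Subscribes the calling process to a topic (e.g. a service name)
  def subscribe(topic) do
    {:ok, _owner} = Registry.register(@registry, topic, nil)
    :ok
  end

  # Sends the event to every process subscribed to the topic
  def broadcast(topic, event) do
    Registry.dispatch(@registry, topic, fn entries ->
      for {pid, _value} <- entries, do: send(pid, event)
    end)
  end
end
```

Registry removes a subscriber's entries automatically when that process exits, so no explicit cleanup is needed.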

### Circuit Breaker

```elixir
# The circuit breaker automatically protects health checks
# You can also use it directly:

NsaiRegistry.CircuitBreaker.call("my-operation", fn ->
  # Expensive or failure-prone operation
  perform_external_api_call()
end)

# Check circuit state
state = NsaiRegistry.CircuitBreaker.get_state("my-operation")
# Returns: :closed | :open | :half_open

# Get statistics
stats = NsaiRegistry.CircuitBreaker.stats()
```
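The three states returned by `get_state/1` follow the classic circuit-breaker state machine, and the configuration keys map onto it: `failure_threshold` opens the circuit, `timeout` decides when an open circuit moves to half-open, and `half_open_max_calls` caps trial calls. The pure-function sketch below shows those transitions. `MyApp.Breaker` is hypothetical and is not NsaiRegistry's implementation; in particular, closing the circuit after a single success is an assumption.

```elixir
defmodule MyApp.Breaker do
  # Hypothetical, pure circuit-breaker state machine (no processes)
  defstruct state: :closed,
            failures: 0,
            opened_at: nil,
            half_open_calls: 0,
            failure_threshold: 5,
            timeout: 60_000,
            half_open_max_calls: 3

  # Returns {:allow, breaker} or {:reject, breaker} for a call at time `now` (ms)
  def before_call(%__MODULE__{state: :closed} = b, _now), do: {:allow, b}

  def before_call(%__MODULE__{state: :open} = b, now) do
    if now - b.opened_at >= b.timeout do
      # Timeout elapsed: go half-open and let this trial call through
      {:allow, %{b | state: :half_open, half_open_calls: 1}}
    else
      {:reject, b}
    end
  end

  def before_call(%__MODULE__{state: :half_open} = b, _now) do
    if b.half_open_calls < b.half_open_max_calls do
      {:allow, %{b | half_open_calls: b.half_open_calls + 1}}
    else
      {:reject, b}
    end
  end

  # Records the outcome of an allowed call
  def after_call(%__MODULE__{} = b, :ok, _now) do
    # Assumption: one success closes the circuit and resets all counters
    %{b | state: :closed, failures: 0, half_open_calls: 0, opened_at: nil}
  end

  def after_call(%__MODULE__{state: :half_open} = b, :error, now) do
    # A failed trial call reopens the circuit immediately
    %{b | state: :open, opened_at: now, half_open_calls: 0}
  end

  def after_call(%__MODULE__{} = b, :error, now) do
    failures = b.failures + 1

    if failures >= b.failure_threshold do
      %{b | state: :open, failures: failures, opened_at: now}
    else
      %{b | failures: failures}
    end
  end
end
```

Keeping the transitions pure makes them easy to unit-test with explicit timestamps; a GenServer or ETS table can then hold one such struct per circuit name.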

## CLI Management

```bash
# List all registered services
mix nsai_registry.list

# Register a service
mix nsai_registry.register work localhost 4000 --health-check /health

# Register with metadata
mix nsai_registry.register api api.example.com 443 \
  --protocol https \
  --metadata version=1.0.0 \
  --metadata region=us-east

# Deregister a service
mix nsai_registry.deregister work:localhost:4000

# Trigger health checks
mix nsai_registry.health_check                      # All services
mix nsai_registry.health_check work:localhost:4000  # Specific service
```
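Repeated `--metadata key=value` flags collapse naturally into a map. The sketch below shows one way a Mix task could parse such arguments with Elixir's `OptionParser`; it is illustrative only and not taken from the `nsai_registry.register` task's source. `MyApp.RegisterArgs` is hypothetical, and the `"http"` protocol default is an assumption.

```elixir
defmodule MyApp.RegisterArgs do
  # Hypothetical parser for: NAME HOST PORT [--protocol P] [--health-check PATH] [--metadata k=v]...
  # OptionParser maps --health-check to :health_check automatically
  @switches [protocol: :string, health_check: :string, metadata: [:string, :keep]]

  def parse(argv) do
    {opts, positional, _invalid} = OptionParser.parse(argv, strict: @switches)
    # Raises MatchError unless exactly three positional arguments are given
    [name, host, port] = positional

    metadata =
      opts
      |> Keyword.get_values(:metadata)
      |> Map.new(fn pair ->
        # Split on the first "=" only, so values may contain "="
        [key, value] = String.split(pair, "=", parts: 2)
        {key, value}
      end)

    %{
      name: name,
      host: host,
      port: String.to_integer(port),
      protocol: Keyword.get(opts, :protocol, "http"),
      health_check: Keyword.get(opts, :health_check),
      metadata: metadata
    }
  end
end
```

Metadata keys stay strings here on purpose: converting user input to atoms with `String.to_atom/1` can exhaust the atom table.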

## Telemetry Events

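NsaiRegistry emits `:telemetry` events for registration, deregistration, lookups, health checks, and status changes. Attach a handler to forward them to your logs or monitoring system: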

```elixir
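# Hypothetical handler module; a module function capture avoids the
# performance notice :telemetry logs for anonymous-function handlers
defmodule MyApp.RegistryTelemetry do
  require Logger

  def handle_event(event_name, measurements, metadata, _config) do
    # Log or send to monitoring system
    Logger.info("Event: #{inspect(event_name)}")
    Logger.info("Duration: #{measurements[:duration]}")
    Logger.info("Service: #{metadata[:service_name]}")
  end
end

:telemetry.attach_many(
  "nsai-registry-handler",
  [
    [:nsai_registry, :register, :stop],
    [:nsai_registry, :deregister, :stop],
    [:nsai_registry, :lookup, :stop],
    [:nsai_registry, :health_check, :stop],
    [:nsai_registry, :status_change]
  ],
  &MyApp.RegistryTelemetry.handle_event/4,
  nil
)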
```
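Handlers receive raw measurements. If the `:stop` events follow the `:telemetry.span/3` convention (an assumption; this README does not state the units), `:duration` arrives in native time units and should be converted before logging. A small standard-library helper (`MyApp.TelemetryUnits` is hypothetical):

```elixir
defmodule MyApp.TelemetryUnits do
  # Hypothetical helper: converts a native-unit :duration measurement to ms
  def duration_ms(%{duration: native}) when is_integer(native) do
    System.convert_time_unit(native, :native, :millisecond)
  end

  # Events such as [:nsai_registry, :status_change] may carry no duration
  def duration_ms(_measurements), do: nil
end
```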

## Testing

```bash
# Run all tests
mix test

# Run with coverage
mix test --cover

# Run property-based tests
mix test test/nsai_registry/property_test.exs

# Run quality checks
mix format --check-formatted
mix credo --strict
mix dialyzer
```

## Development

```bash
# Get dependencies
mix deps.get

# Compile
mix compile

# Format code
mix format

# Run linter
mix credo --strict

# Type checking
mix dialyzer

# Generate documentation
mix docs

# Start IEx with the application
iex -S mix
```

## Production Deployment

### Recommended Configuration

```elixir
# config/prod.exs

config :nsai_registry, NsaiRegistry.Registry,
  storage_backend: NsaiRegistry.Storage.Postgres,
  storage_opts: [repo: MyApp.Repo]

config :nsai_registry, NsaiRegistry.HealthChecker,
  check_interval: 15_000,           # More frequent checks
  timeout: 3_000,
  auto_deregister: true,            # Auto-remove unhealthy services
  unhealthy_threshold: 2

config :nsai_registry, NsaiRegistry.CircuitBreaker,
  failure_threshold: 3,
  timeout: 30_000,
  half_open_max_calls: 2

# Enable telemetry reporting
config :nsai_registry, :telemetry,
  enabled: true,
  reporters: [MyApp.TelemetryReporter]
```

### Distributed Clustering (Optional)

For multi-node deployments with Horde:

```elixir
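# config/config.exs
config :nsai_registry, :distributed, true

# In your application module (MyApp is a placeholder for your app)
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # ... other children
      # members: :auto lets same-named Horde processes on connected nodes join each other
      {Horde.Registry, [name: NsaiRegistry.HordeRegistry, keys: :unique, members: :auto]},
      {Horde.DynamicSupervisor,
       [name: NsaiRegistry.HordeSupervisor, strategy: :one_for_one, members: :auto]},
      NsaiRegistry.Application
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end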
```

## Performance Characteristics

### ETS Backend
- **Reads**: O(1) - Hash table lookups
- **Writes**: O(1) - Direct insertion
- **Persistence**: None; data lives in memory and is lost on restart
- **Throughput**: On the order of millions of raw ETS operations per second (hardware-dependent)
- **Best for**: Development, single-node deployments

### PostgreSQL Backend
- **Reads**: O(log n) with indexes
- **Writes**: O(log n) with B-tree
- **Persistence**: Durable, stored in PostgreSQL
- **Throughput**: On the order of thousands of operations per second (depends on pool size, network latency, and hardware)
- **Best for**: Production, multi-node clusters
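The throughput figures above are order-of-magnitude guidance. To sanity-check the raw ETS path on your own hardware, a rough micro-benchmark with `:timer.tc/1` is enough. Note that it times bare `:ets.lookup/2` calls, not NsaiRegistry's API, which may add its own overhead; `MyApp.EtsBench` is hypothetical.

```elixir
defmodule MyApp.EtsBench do
  # Rough micro-benchmark of raw ETS lookups; not a rigorous benchmark
  def lookups_per_second(n \\ 1_000_000) do
    table = :ets.new(:bench, [:set, :public, read_concurrency: true])
    :ets.insert(table, {"work:localhost:4000", %{name: "work"}})

    # :timer.tc/1 returns {elapsed_microseconds, result}
    {micros, _} =
      :timer.tc(fn ->
        Enum.each(1..n, fn _ -> :ets.lookup(table, "work:localhost:4000") end)
      end)

    :ets.delete(table)
    # Guard against a zero reading on very fast runs
    n * 1_000_000 / max(micros, 1)
  end
end
```

For anything beyond a quick check, a dedicated benchmarking tool with warm-up and multiple runs gives more trustworthy numbers.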

## Comparison with Alternatives

| Feature | NsaiRegistry | Consul | etcd | Eureka |
|---------|--------------|--------|------|--------|
| Language | Elixir | Go | Go | Java |
| Storage | ETS/Postgres | Raft-replicated KV | Raft-replicated KV | In-Memory |
| Health Checks | HTTP/TCP/gRPC | ✓ | ✗ | HTTP |
| Circuit Breaker | ✓ | ✗ | ✗ | ✗ |
| PubSub Events | ✓ | ✓ | ✓ | ✗ |
| Multi-Protocol | ✓ | Limited | Limited | HTTP Only |
| Elixir Native | ✓ | ✗ | ✗ | ✗ |

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass: `mix test`
5. Run quality checks: `mix format && mix credo --strict`
6. Submit a pull request

## License

Copyright (c) 2025 North Shore AI

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- Built on Phoenix PubSub for reliable event broadcasting
- Inspired by Consul, etcd, and Eureka
- Part of the North Shore AI ecosystem for ML reliability research

## Support

- Documentation: https://hexdocs.pm/nsai_registry
- Issues: https://github.com/North-Shore-AI/nsai_registry/issues
- Discussions: https://github.com/North-Shore-AI/nsai_registry/discussions