README.md

Select File:
# KittAgent

**KittAgent** is an advanced AI agent platform built with **Elixir** and **Phoenix LiveView**, designed to bring physical robots to life using Large Language Models (LLMs).

It currently specializes in controlling **mBot2** robots, enabling them to engage in natural language conversations while autonomously deciding on physical actions like moving and turning based on context.

## 🚀 Key Features

### 1. Real-time Interaction Dashboard
*   **Direct Communication**: Chat directly with your agents via the web interface to test personality and responsiveness.
*   **Multimedia Feedback**: 
    *   **Audio Playback**: Listen to generated voice responses directly in the feed (integrated with TTS engines like Zonos).
    *   **Visual Logs**: See immediate feedback on role, message content, and mood.
*   **Transaction Management**: View and clean up recent interaction logs.
*   **Action Inspection**: Inspect complex `SystemActions` in a formatted, scrollable code view to verify parameters.

### 2. Advanced Agent Management ("Kitts")
*   **Comprehensive Profiles**: Manage multiple agents with distinct profiles including Name, Model, Vendor, Birthday, and Hometown.
*   **Localization Support**:
    *   **Language Selection**: Choose from 18+ languages (Japanese, English, Swahili, etc.) for your agent's communication.
    *   **Timezone Awareness**: Selectable timezone support (via `Tzdata`) to ground the agent in a specific locale.
*   **Personality Engine**: Define complex biographies and behavioral traits ("Personality"). Long descriptions are easily managed via pop-up modals.
*   **Smart Defaults**: Automatically applies system-wide default language and timezone settings to new agents.

### 3. LLM-Driven Intelligence & Control
*   **Flexible Model Selection**:
    *   **Main Conversation Model**: Choose any model supported by your provider (e.g., Gemini 2.0 Flash) for agent dialogue.
    *   **Summarization Model**: Select a specialized model for high-quality memory consolidation.
*   **Custom LLM Providers**: Configure custom API endpoints and keys (e.g., OpenRouter) directly from the settings menu.
*   **Structured Outputs**: Enforces strict JSON Schema for all LLM responses to ensure reliable parsing.
*   **Physical Capabilities (SystemActions)**:
    *   **Direct Code Generation**: The agent generates **MicroPython code** dynamically to control the robot, allowing for complex and adaptive behaviors beyond simple preset commands.
    *   **Hardware Control**:
        *   **Core (CyberPi)**: Control LEDs, speaker, display, and read inputs (buttons, gyro, mic).
        *   **Chassis (mBot2)**: Precise movement control (speed, duration, turning).
        *   **Sensors (mBuild)**: Access external modules like Ultrasonic Sensor 2 and Quad RGB Sensor.
    *   **Flexible Logic**: Supports conditional logic and loops within the generated code (e.g., "Forward until obstacle < 10cm, then turn").

### 4. Comprehensive Activity Monitoring ("Activities")
*   **Live Audit Log**: A dedicated "Activities" dashboard to track all historical agent responses and decisions.
*   **Status Management**: Monitor and manually override action statuses (`pending`, `processing`, `completed`, `failed`).
    *   **Formatted Code Blocks**: Ensure long parameters are readable without breaking the layout.
    *   **Queue Maintenance**: Monitor real-time queue depths and manually clear "Talk" or "System Action" queues (globally or per-agent) to prevent stale task accumulation.
    *   **Advanced Filtering**: Filter logs by Kitt, Status, or Role to pinpoint specific events.

### 5. Dual-Layer Memory Architecture
*   **Short-term Memory (Events)**: Maintains a log of recent interactions for immediate context.
*   **Long-term Memory (Memories)**:
    *   **Auto-Summarization**: A background process condense new events into narrative summaries.
    *   **Persistent Context**: Summaries are injected into the system prompt, providing a persistent sense of history.

### 6. Centralized Configuration ("Settings")
*   **System Defaults**: Set global defaults for new agent creation.
*   **LLM Provider Setup**: Update API keys and Base URLs without touching environment variables or restarting the server.
*   **Model Management**: Switch between different LLM models for conversation and summarization on the fly.

## 🤖 Physical Action Architecture & Client Design

KittAgent employs a distributed client architecture designed to overcome the resource constraints of microcontroller-based robots like **mBot2**.

### Dual-Queue System
To handle different types of agent outputs efficiently, the system maintains two separate, independent queues:

1.  **Talk Queue (`Talks.Queue`)**: Stores audio response data (TTS-generated WAV files).
2.  **System Action Queue (`SystemActions.Queue`)**: Stores physical command data (MicroPython code).

### Client Roles
*   **mBot2 (Robot Client)**:
    *   **Primary Role**: Physical execution.
    *   **Mechanism**: Polls the **System Action Queue**, retrieves generated **MicroPython code**, and executes it locally to perform actions (move, turn, LED control).
    *   **Constraint**: Due to limited memory and processing power, it does *not* handle audio playback.
*   **Companion Device (Mobile/PC Client)**:
    *   **Primary Role**: Voice interaction.
    *   **Mechanism**: Polls the **Talk Queue** and plays back the audio responses.
    *   **Benefit**: This offloads the heavy lifting of audio streaming/decoding from the robot, ensuring smooth, uninterrupted movement and clear voice output.

## 🛠 Tech Stack
*   **Core**: Elixir, Phoenix Framework (LiveView)
*   **Database**: PostgreSQL (with `pgvector` support planned/ready)
*   **AI Provider**: OpenRouter (default), or any OpenAI-compatible API
*   **TTS Provider**: Zonos (Gradio) / Custom Audio Pipelines
*   **Styling**: Tailwind CSS + DaisyUI
*   **Infrastructure**: Docker & Docker Compose

## 🗄 Database Schema Overview

*   **kitts**: Core agent metadata.
*   **biographies**: Detailed "Personality" text.
*   **events**: Raw log of interactions.
*   **contents**: Structured data for each event (message, action, mood, status, audio paths).
*   **system_actions**: Specific parameters for physical movements.
*   **memories**: Narrative summaries generated by the agent.
*   **configs**: Global key-value settings (LLM models, API credentials, defaults).

## 📦 Installation & Setup

### Prerequisites
*   Docker & Docker Compose

### Quick Start

1.  **Clone & Setup:**
    ```bash
    git clone <repository_url>
    cd kitt_agent
    docker compose run --rm app mix setup
    ```

2.  **Initial Configuration:**
    1. Start the server: `docker compose up`
    2. Visit `http://localhost:4000/kitt-web/settings`
    3. Configure your **API Key** and **API Base URL** (OpenRouter default: `https://openrouter.ai/api/v1/chat/completions`).
    4. Select your preferred **Main** and **Summary** models.

3.  **Create your first Kitt:**
    Navigate to the **KITTs** page and click **New Kitt**.

## 📝 License

This project is licensed under the MIT License.