docs/guides/file_search_stores.md

# File Search Stores Guide

Complete guide to using File Search Stores for semantic search and retrieval-augmented generation (RAG) in the Gemini Elixir client.

## Table of Contents

- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Quick Start](#quick-start)
- [Creating Stores](#creating-stores)
- [Managing Documents](#managing-documents)
- [Querying Stores](#querying-stores)
- [Best Practices](#best-practices)
- [Advanced Usage](#advanced-usage)
- [Error Handling](#error-handling)
- [API Reference](#api-reference)

## Overview

File Search Stores enable semantic search over your documents using vector embeddings. They are part of Google's RAG (Retrieval-Augmented Generation) system and allow you to:

- **Store and index documents** for semantic search
- **Ground AI responses** with your own data
- **Build knowledge bases** from your document collections
- **Search across documents** using natural language queries

### Key Features

- **Automatic Indexing**: Documents are automatically chunked and indexed
- **Semantic Search**: Find relevant content using natural language
- **Vector Embeddings**: Powered by Google's text-embedding models
- **RAG Integration**: Use directly in generation requests for grounded responses
- **Document Management**: Full CRUD operations on stores and documents

### Important Notes

- **Vertex AI Only**: File Search Stores are only available through Vertex AI authentication
- **Asynchronous Processing**: Store creation and document indexing happen asynchronously
- **Automatic Chunking**: Documents are split into chunks optimized for retrieval

## Prerequisites

### Required Setup

1. **Google Cloud Project**: You need an active GCP project
2. **Vertex AI API**: Must be enabled in your project
3. **Authentication**: Valid service account credentials

### Environment Variables

```bash
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
```

### Elixir Configuration

```elixir
# config/config.exs
config :gemini_ex,
  auth: %{
    type: :vertex_ai,
    credentials: %{
      project_id: System.get_env("GOOGLE_CLOUD_PROJECT"),
      location: "us-central1"  # Choose your region
    }
  }
```

## Quick Start

Here's a complete example of creating a store, adding documents, and using it for search:

```elixir
alias Gemini.APIs.FileSearchStores
alias Gemini.Types.CreateFileSearchStoreConfig

# 1. Create a store
config = %CreateFileSearchStoreConfig{
  display_name: "Product Documentation",
  description: "Technical documentation for all our products"
}

{:ok, store} = FileSearchStores.create(config, auth: :vertex_ai)

# 2. Wait for the store to be ready
{:ok, ready_store} = FileSearchStores.wait_for_active(store.name)
IO.puts("Store ready: #{ready_store.name}")

# 3. Upload and import documents
{:ok, doc1} = FileSearchStores.upload_to_store(
  store.name,
  "/path/to/product-manual.pdf",
  display_name: "Product Manual v2.0"
)

{:ok, doc2} = FileSearchStores.upload_to_store(
  store.name,
  "/path/to/api-reference.md",
  display_name: "API Reference"
)

# 4. Wait for documents to be processed
{:ok, _} = FileSearchStores.wait_for_document(doc1.name)
{:ok, _} = FileSearchStores.wait_for_document(doc2.name)

# 5. Use in generation for grounded responses
{:ok, response} = Gemini.generate_content(
  "What are the safety features in the product?",
  tools: [
    %{file_search_stores: [store.name]}
  ]
)

IO.puts(Gemini.extract_text!(response))
```

## Creating Stores

### Basic Store Creation

Create a store with just a name:

```elixir
config = %CreateFileSearchStoreConfig{
  display_name: "My Knowledge Base"
}

{:ok, store} = FileSearchStores.create(config, auth: :vertex_ai)
```

### Store with Description

Add a description for better organization:

```elixir
config = %CreateFileSearchStoreConfig{
  display_name: "Customer Support KB",
  description: "Knowledge base for customer support team with FAQs and troubleshooting guides"
}

{:ok, store} = FileSearchStores.create(config, auth: :vertex_ai)
```

### Store with Custom Vector Config

Specify embedding model and dimensions:

```elixir
config = %CreateFileSearchStoreConfig{
  display_name: "Technical Docs",
  vector_config: %{
    embedding_model: "text-embedding-004",
    dimensions: 768
  }
}

{:ok, store} = FileSearchStores.create(config, auth: :vertex_ai)
```

### Waiting for Store Activation

Stores are created asynchronously. Always wait for activation before adding documents:

```elixir
{:ok, store} = FileSearchStores.create(config, auth: :vertex_ai)

# Wait with default settings (2 second intervals, 5 minute timeout)
{:ok, active_store} = FileSearchStores.wait_for_active(store.name)

# Or customize polling
{:ok, active_store} = FileSearchStores.wait_for_active(
  store.name,
  poll_interval: 5000,      # Check every 5 seconds
  timeout: 600_000,         # 10 minute timeout
  on_status: fn s ->
    IO.puts("Store state: #{s.state}")
  end
)
```

## Managing Documents

### Importing Already-Uploaded Files

If you've already uploaded a file using the Files API:

```elixir
# Upload a file first
{:ok, file} = Gemini.upload_file("/path/to/document.pdf")

# Import it into the store
{:ok, doc} = FileSearchStores.import_file(
  store.name,
  file.name,
  auth: :vertex_ai
)

# Wait for processing
{:ok, ready_doc} = FileSearchStores.wait_for_document(doc.name)
IO.puts("Document ready with #{ready_doc.chunk_count} chunks")
```

### Direct Upload to Store

Upload and import in one step:

```elixir
{:ok, doc} = FileSearchStores.upload_to_store(
  store.name,
  "/path/to/document.pdf",
  display_name: "Product Manual",
  mime_type: "application/pdf"  # Optional, auto-detected
)
```

### Batch Upload

Upload multiple documents efficiently:

```elixir
files = [
  "/path/to/doc1.pdf",
  "/path/to/doc2.md",
  "/path/to/doc3.txt"
]

# Upload all files
documents =
  Enum.map(files, fn file_path ->
    {:ok, doc} = FileSearchStores.upload_to_store(
      store.name,
      file_path,
      display_name: Path.basename(file_path)
    )
    doc
  end)

# Wait for all to be processed
Enum.each(documents, fn doc ->
  {:ok, _} = FileSearchStores.wait_for_document(doc.name)
end)

IO.puts("All #{length(documents)} documents are ready!")
```

### Checking Document Status

Get detailed document information:

```elixir
{:ok, doc} = FileSearchStores.get_document(
  "fileSearchStores/store123/documents/doc456"
)

case doc.state do
  :active ->
    IO.puts("✓ Document ready with #{doc.chunk_count} chunks")
    IO.puts("  Size: #{doc.size_bytes} bytes")
    IO.puts("  Type: #{doc.mime_type}")

  :processing ->
    IO.puts("⏳ Still processing...")

  :failed ->
    IO.puts("✗ Processing failed: #{inspect(doc.error)}")
end
```

## Querying Stores

### Using Stores in Generation

The primary way to use File Search Stores is through generation requests:

```elixir
{:ok, response} = Gemini.generate_content(
  "What are the main features of the product?",
  tools: [
    %{file_search_stores: [store.name]}
  ]
)

text = Gemini.extract_text!(response)
IO.puts(text)
```

### Multiple Stores

Query across multiple knowledge bases:

```elixir
{:ok, response} = Gemini.generate_content(
  "Compare the pricing models",
  tools: [
    %{file_search_stores: [
      "fileSearchStores/product-docs",
      "fileSearchStores/pricing-info"
    ]}
  ]
)
```

### With Generation Config

Combine with other generation options:

```elixir
{:ok, response} = Gemini.generate_content(
  "Summarize the safety guidelines",
  tools: [%{file_search_stores: [store.name]}],
  temperature: 0.3,
  max_output_tokens: 500,
  model: "gemini-1.5-pro-002"
)
```

### Accessing Source Citations

Check if the response includes grounding metadata:

```elixir
{:ok, response} = Gemini.generate_content(
  "What are the warranty terms?",
  tools: [%{file_search_stores: [store.name]}]
)

# The response may include grounding metadata showing
# which documents were used for the answer
IO.inspect(response, label: "Full Response")
```

## Best Practices

### 1. Descriptive Naming

Use clear, descriptive names for stores and documents:

```elixir
# Good
config = %CreateFileSearchStoreConfig{
  display_name: "Customer Support FAQ - 2024",
  description: "Frequently asked questions for customer support team"
}

# Less helpful
config = %CreateFileSearchStoreConfig{
  display_name: "Store 1"
}
```

### 2. Wait for Processing

Always wait for stores and documents to be active:

```elixir
# Create store
{:ok, store} = FileSearchStores.create(config, auth: :vertex_ai)

# Wait for store
{:ok, store} = FileSearchStores.wait_for_active(store.name)

# Upload document
{:ok, doc} = FileSearchStores.upload_to_store(store.name, path)

# Wait for document
{:ok, doc} = FileSearchStores.wait_for_document(doc.name)

# Now ready to use!
```

### 3. Batch Operations

Upload multiple documents before waiting:

```elixir
# Upload all documents
docs = Enum.map(file_paths, fn path ->
  {:ok, doc} = FileSearchStores.upload_to_store(store.name, path)
  doc
end)

# Then wait for all
Enum.each(docs, fn doc ->
  {:ok, _} = FileSearchStores.wait_for_document(doc.name)
end)
```

### 4. Monitor Store Size

Keep track of document count and total size:

```elixir
{:ok, store} = FileSearchStores.get(store_name)

IO.puts("Documents: #{store.document_count}")
IO.puts("Total size: #{store.total_size_bytes} bytes")

# Set alerts for size limits
if store.total_size_bytes > 10_000_000_000 do
  IO.warn("Store approaching size limit")
end
```

### 5. Organize by Purpose

Create separate stores for different use cases:

```elixir
# Product documentation
{:ok, product_store} = create_store("Product Documentation")

# Customer support
{:ok, support_store} = create_store("Support Knowledge Base")

# Internal policies
{:ok, policy_store} = create_store("Company Policies")
```

### 6. Clean Up Unused Stores

Delete stores you no longer need:

```elixir
# List all stores
{:ok, all_stores} = FileSearchStores.list_all()

# Find old or unused stores
old_stores = Enum.filter(all_stores, fn store ->
  store.document_count == 0 or
  is_older_than_90_days?(store.create_time)
end)

# Delete them
Enum.each(old_stores, fn store ->
  FileSearchStores.delete(store.name, force: true)
end)
```

## Advanced Usage

### Custom Polling Logic

Implement custom waiting logic with callbacks:

```elixir
defmodule StoreManager do
  def create_and_monitor(config) do
    {:ok, store} = FileSearchStores.create(config, auth: :vertex_ai)

    {:ok, ready_store} = FileSearchStores.wait_for_active(
      store.name,
      poll_interval: 3000,
      timeout: 600_000,
      on_status: fn s ->
        Logger.info("Store #{s.name} state: #{s.state}")

        if s.state == :creating do
          notify_slack("Store creation in progress...")
        end
      end
    )

    Logger.info("Store ready!")
    {:ok, ready_store}
  end
end
```

### Parallel Store Creation

Create multiple stores in parallel:

```elixir
store_configs = [
  %CreateFileSearchStoreConfig{display_name: "Store 1"},
  %CreateFileSearchStoreConfig{display_name: "Store 2"},
  %CreateFileSearchStoreConfig{display_name: "Store 3"}
]

# Create all in parallel
tasks = Enum.map(store_configs, fn config ->
  Task.async(fn ->
    {:ok, store} = FileSearchStores.create(config, auth: :vertex_ai)
    {:ok, ready} = FileSearchStores.wait_for_active(store.name)
    ready
  end)
end)

# Wait for all
stores = Enum.map(tasks, &Task.await(&1, 600_000))
IO.puts("Created #{length(stores)} stores!")
```

### Conditional Document Import

Only import documents that meet certain criteria:

```elixir
defmodule DocumentImporter do
  def import_if_valid(store_name, file_path) do
    cond do
      not File.exists?(file_path) ->
        {:error, :file_not_found}

      File.stat!(file_path).size > 50_000_000 ->
        {:error, :file_too_large}

      not valid_mime_type?(file_path) ->
        {:error, :unsupported_type}

      true ->
        FileSearchStores.upload_to_store(
          store_name,
          file_path,
          auth: :vertex_ai
        )
    end
  end

  defp valid_mime_type?(path) do
    ext = Path.extname(path)
    ext in [".pdf", ".txt", ".md", ".html"]
  end
end
```

### Pagination Helper

List all stores with automatic pagination:

```elixir
defmodule StoreUtils do
  def list_all_with_details do
    {:ok, stores} = FileSearchStores.list_all(auth: :vertex_ai)

    Enum.map(stores, fn store ->
      %{
        name: store.name,
        display_name: store.display_name,
        documents: store.document_count,
        size_mb: div(store.total_size_bytes || 0, 1_000_000),
        state: store.state
      }
    end)
  end
end
```

## Error Handling

### Common Errors

```elixir
case FileSearchStores.create(config, auth: :vertex_ai) do
  {:ok, store} ->
    IO.puts("Created: #{store.name}")

  {:error, %{status: 403}} ->
    IO.puts("Permission denied - check IAM roles")

  {:error, %{status: 429}} ->
    IO.puts("Rate limited - retry with backoff")

  {:error, %{status: 404}} ->
    IO.puts("Project not found - check configuration")

  {:error, reason} ->
    IO.puts("Error: #{inspect(reason)}")
end
```

### Timeout Handling

Handle timeouts gracefully:

```elixir
case FileSearchStores.wait_for_active(store.name, timeout: 60_000) do
  {:ok, store} ->
    IO.puts("Store ready!")

  {:error, :timeout} ->
    IO.puts("Store creation is taking longer than expected")
    IO.puts("Check status manually with FileSearchStores.get/2")

  {:error, :store_creation_failed} ->
    IO.puts("Store creation failed - check logs")
end
```

### Retry Logic

Implement retry with exponential backoff:

```elixir
defmodule RetryHelper do
  def create_store_with_retry(config, max_attempts \\ 3) do
    do_create(config, 1, max_attempts)
  end

  defp do_create(config, attempt, max_attempts) do
    case FileSearchStores.create(config, auth: :vertex_ai) do
      {:ok, store} ->
        {:ok, store}

      {:error, %{status: 429}} when attempt < max_attempts ->
        wait_ms = :math.pow(2, attempt) * 1000 |> round()
        IO.puts("Rate limited, waiting #{wait_ms}ms...")
        Process.sleep(wait_ms)
        do_create(config, attempt + 1, max_attempts)

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

## API Reference

### FileSearchStores Functions

#### `create/2`

```elixir
@spec create(CreateFileSearchStoreConfig.t(), create_opts()) ::
  {:ok, FileSearchStore.t()} | {:error, term()}
```

Create a new file search store.

#### `get/2`

```elixir
@spec get(String.t(), store_opts()) ::
  {:ok, FileSearchStore.t()} | {:error, term()}
```

Retrieve a store by name.

#### `delete/2`

```elixir
@spec delete(String.t(), delete_opts()) :: :ok | {:error, term()}
```

Delete a store. Use `force: true` to delete stores with documents.

#### `list/1`

```elixir
@spec list(list_opts()) ::
  {:ok, ListFileSearchStoresResponse.t()} | {:error, term()}
```

List stores with optional pagination.

#### `list_all/1`

```elixir
@spec list_all(list_opts()) :: {:ok, [FileSearchStore.t()]} | {:error, term()}
```

Retrieve all stores across all pages.

#### `import_file/3`

```elixir
@spec import_file(String.t(), String.t(), import_opts()) ::
  {:ok, FileSearchDocument.t()} | {:error, term()}
```

Import an already-uploaded file into a store.

#### `upload_to_store/3`

```elixir
@spec upload_to_store(String.t(), String.t(), upload_opts()) ::
  {:ok, FileSearchDocument.t()} | {:error, term()}
```

Upload a file and import it into a store in one operation.

#### `wait_for_active/2`

```elixir
@spec wait_for_active(String.t(), wait_opts()) ::
  {:ok, FileSearchStore.t()} | {:error, term()}
```

Poll until store reaches `:active` state.

#### `wait_for_document/2`

```elixir
@spec wait_for_document(String.t(), wait_doc_opts()) ::
  {:ok, FileSearchDocument.t()} | {:error, term()}
```

Poll until document reaches `:active` state.

#### `get_document/2`

```elixir
@spec get_document(String.t(), store_opts()) ::
  {:ok, FileSearchDocument.t()} | {:error, term()}
```

Retrieve document metadata.

### Type Specifications

#### FileSearchStore

```elixir
%FileSearchStore{
  name: String.t(),
  display_name: String.t(),
  description: String.t(),
  state: :state_unspecified | :creating | :active | :deleting | :failed,
  create_time: String.t(),
  update_time: String.t(),
  document_count: integer(),
  total_size_bytes: integer(),
  vector_config: map()
}
```

#### FileSearchDocument

```elixir
%FileSearchDocument{
  name: String.t(),
  display_name: String.t(),
  state: :state_unspecified | :processing | :active | :failed,
  create_time: String.t(),
  update_time: String.t(),
  size_bytes: integer(),
  mime_type: String.t(),
  chunk_count: integer(),
  error: map()
}
```

## See Also

- [Files API Guide](files.md) - For uploading files before importing
- [RAG Concepts](https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/retrieval-augmented-generation) - General retrieval-augmented generation patterns
- [Vertex AI Documentation](https://cloud.google.com/vertex-ai/docs) - Official Google Cloud docs