Implement KNNFewShot optimizer #11

@obie

Overview

Implement a KNNFewShot optimizer that dynamically selects demonstrations for each input using k-nearest-neighbor (KNN) search over input embeddings.

Description

Unlike static few-shot selection, which uses the same demonstrations for every input, KNNFewShot selects the most relevant examples for each input at inference time. This should yield more contextually appropriate demonstrations and, in turn, better task performance.

Key Features to Implement

  • Embedding-based similarity search
  • Dynamic demonstration selection
  • Multiple similarity metrics
  • Efficient nearest neighbor search
  • Demonstration pool management

Implementation Requirements

1. Core Architecture

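class KNNFewShot < Base
  def initialize(program:, embedder: nil, k: 5, similarity_metric: :cosine,
                 pool_size: nil, min_similarity: nil, diversity_penalty: 0.0)
    @program = program
    @embedder = embedder || DefaultEmbedder.new
    @k = k
    @similarity_metric = similarity_metric  # :cosine, :euclidean, :manhattan, or a custom callable
    @pool_size = pool_size                  # nil means no cap on the pool
    @min_similarity = min_similarity        # nil means no similarity threshold
    @diversity_penalty = diversity_penalty  # 0.0 means pure similarity ranking
    @demonstration_pool = []
    @embeddings_cache = {}
  end

  def compile(dataset, metric)
    # Build the demonstration pool (capped at pool_size), embedding each example's input once
    @demonstration_pool = build_pool(dataset, metric)

    # Create wrapped modules that select demonstrations dynamically per input
    create_knn_modules(@program)
  end

  def select_demonstrations(input)
    # Reuse cached embeddings for inputs that have already been seen
    text = serialize_input(input)
    input_embedding = @embeddings_cache[text] ||= @embedder.embed(text)

    # Find up to k nearest neighbors (fewer if min_similarity filters some out)
    neighbors = find_nearest_neighbors(input_embedding, @k)

    # Return only the demonstrations, dropping their similarity scores
    neighbors.map { |n| n[:demonstration] }
  end

  private

  # Still to be implemented: build_pool, create_knn_modules,
  # serialize_input, find_nearest_neighbors
end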
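The following is a minimal in-memory sketch of the pool and of `find_nearest_neighbors`, using brute-force cosine search. It makes three assumptions: embeddings are plain `Array<Float>`, each demonstration is a Hash whose `:input` key holds the program's keyword inputs, and the embedder responds to `embed(text)`. `DemonstrationPool`, `#add`, `#nearest` and this `serialize_input` are hypothetical names, not existing Desiru APIs.

```ruby
# Hypothetical sketch: an in-memory demonstration pool with brute-force k-NN.
# DemonstrationPool, #add, #nearest and #serialize_input are illustrative names.
class DemonstrationPool
  Entry = Struct.new(:embedding, :demonstration)

  def initialize(embedder:, pool_size: nil)
    @embedder = embedder
    @pool_size = pool_size # nil means unbounded
    @entries = []
  end

  def size
    @entries.size
  end

  # Embeds the demonstration's input once, at insertion time.
  # Returns false (and stores nothing) when the pool is already full.
  def add(demonstration)
    return false if @pool_size && @entries.size >= @pool_size

    text = serialize_input(demonstration[:input])
    @entries << Entry.new(@embedder.embed(text), demonstration)
    true
  end

  # Returns up to k hashes { demonstration:, score: }, highest cosine first.
  def nearest(input, k)
    query = @embedder.embed(serialize_input(input))
    @entries
      .map { |e| { demonstration: e.demonstration, score: cosine(query, e.embedding) } }
      .max_by(k) { |n| n[:score] }
  end

  private

  # Deterministic text form of keyword inputs: "key: value" lines, sorted by key.
  def serialize_input(input)
    input.sort_by { |key, _| key.to_s }.map { |key, value| "#{key}: #{value}" }.join("\n")
  end

  def cosine(a, b)
    dot = a.zip(b).sum { |x, y| x * y }
    norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
    norm.zero? ? 0.0 : dot / norm
  end
end
```

Brute-force search costs one similarity computation per pool entry per query. That is usually acceptable for small pools such as the `pool_size: 100` in the example usage.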

2. Embedding Support

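class DefaultEmbedder
  # Returns an Array<Float> embedding for the given text.
  # Candidate backends: sentence-transformers (via an API or a local model) or OpenAI embeddings.
  def embed(text)
    raise NotImplementedError, "#{self.class}#embed is not implemented yet"
  end
end

class CustomEmbedder
  # Support for custom embedding models, selected by name
  def initialize(model_name:)
    @model_name = model_name
  end

  def embed(text)
    raise NotImplementedError, "#{self.class}#embed is not implemented for #{@model_name}"
  end
end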
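Both embedders share one contract: `embed(text)` returns an `Array<Float>`. For tests and offline development, a deterministic stand-in can satisfy that contract without an API key. The sketch below is a hypothetical `HashingEmbedder` based on feature hashing of word tokens. It is not a semantic model and only captures lexical overlap.

```ruby
require "zlib"

# Hypothetical offline embedder for tests: hashes lowercase word tokens into a
# fixed-size bag-of-words vector, then L2-normalizes it. It captures lexical
# overlap only, but honors the same #embed(text) -> Array<Float> contract.
class HashingEmbedder
  def initialize(dimensions: 256)
    @dimensions = dimensions
  end

  def embed(text)
    vector = Array.new(@dimensions, 0.0)
    text.downcase.scan(/[a-z0-9]+/).each do |token|
      # CRC32 is stable across processes, unlike String#hash
      vector[Zlib.crc32(token) % @dimensions] += 1.0
    end
    norm = Math.sqrt(vector.sum { |x| x * x })
    norm.zero? ? vector : vector.map { |x| x / norm }
  end

  def embed_batch(texts)
    texts.map { |t| embed(t) }
  end
end
```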

3. Similarity Metrics

  • Cosine similarity (default)
  • Euclidean distance
  • Manhattan distance
  • Custom metrics support
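One way to expose these metrics through a single `similarity_metric:` option is to normalize them all to "higher means more similar". The neighbor search can then always take the maximum. In the hypothetical `SimilarityMetrics` module below, distances are negated, and any object that responds to `call(a, b)` is accepted as a custom metric.

```ruby
# Hypothetical sketch: every metric is a callable (a, b) -> Float where a
# higher value means "more similar"; distances are negated to fit that rule.
module SimilarityMetrics
  COSINE = lambda do |a, b|
    dot = a.zip(b).sum { |x, y| x * y }
    norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
    norm.zero? ? 0.0 : dot / norm
  end

  EUCLIDEAN = ->(a, b) { -Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 }) }
  MANHATTAN = ->(a, b) { -a.zip(b).sum { |x, y| (x - y).abs } }

  BUILT_IN = { cosine: COSINE, euclidean: EUCLIDEAN, manhattan: MANHATTAN }.freeze

  # Accepts :cosine, :euclidean, :manhattan, or any custom callable.
  def self.resolve(metric)
    return metric if metric.respond_to?(:call)

    BUILT_IN.fetch(metric) { raise ArgumentError, "unknown similarity metric: #{metric.inspect}" }
  end
end
```

For example, `SimilarityMetrics.resolve(:manhattan).call([0.0, 0.0], [3.0, 4.0])` returns `-7.0`.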

4. Efficient Search

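  • Use an approximate nearest-neighbor index (e.g., Faiss or Annoy) for large pools; brute-force search is sufficient for small ones
  • Cache embeddings so repeated inputs and pool entries are embedded only once
  • Batch embedding requests when building the pool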
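For caching and batching, a thin wrapper around any embedder is often enough. The hypothetical `CachingEmbedder` below memoizes vectors by exact input text and forwards only cache misses. It uses the wrapped embedder's `embed_batch` when one exists; that method name is an assumption for illustration, not a known Desiru API.

```ruby
# Hypothetical sketch: memoizes embeddings by exact text and forwards only
# cache misses to the wrapped embedder, batching them when it can.
class CachingEmbedder
  def initialize(embedder)
    @embedder = embedder
    @cache = {}
  end

  def embed(text)
    @cache[text] ||= @embedder.embed(text)
  end

  def embed_batch(texts)
    misses = texts.uniq.reject { |t| @cache.key?(t) }
    unless misses.empty?
      vectors = if @embedder.respond_to?(:embed_batch)
                  @embedder.embed_batch(misses)
                else
                  misses.map { |t| @embedder.embed(t) }
                end
      misses.zip(vectors).each { |t, v| @cache[t] = v }
    end
    texts.map { |t| @cache.fetch(t) }
  end

  def cache_size
    @cache.size
  end
end
```

This cache is unbounded. A long-running process would likely want an LRU limit, or a persistent store keyed by model name and text.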

Example Usage

# Create KNN optimizer
optimizer = Desiru::Optimizers::KNNFewShot.new(
  program: my_program,
  k: 5,
  embedder: Desiru::Embedders::OpenAI.new,
  similarity_metric: :cosine,
  pool_size: 100
)

# Compile with dataset
optimized_program = optimizer.compile(train_set, metric)

# At inference, demonstrations are selected dynamically
result = optimized_program.forward(
  question: "What is Ruby's creator's philosophy?"
)
# Selects the 5 pool examples whose inputs are most similar to this question (e.g., ones about Ruby or programming philosophy)

Configuration Options

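  • k: Number of neighbors (demonstrations) to retrieve per input
  • embedder: Embedding model to use
  • similarity_metric: How to measure similarity (:cosine, :euclidean, :manhattan, or a custom metric)
  • pool_size: Maximum demonstration pool size
  • min_similarity: Minimum similarity threshold; candidates below it are discarded, so fewer than k demonstrations may be returned
  • diversity_penalty: Penalty on demonstrations that closely resemble ones already selected, to encourage diversity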
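The interaction between `k` and `min_similarity` can be sketched as a small filter over candidates that have already been scored. `select_neighbors` is a hypothetical helper, and scores are assumed to follow the "higher is more similar" convention.

```ruby
# Hypothetical helper: keep candidates scoring at or above min_similarity,
# then return the top k by score (highest first). The result may contain
# fewer than k entries, or none, when nothing in the pool is close enough.
def select_neighbors(scored, k:, min_similarity: nil)
  raise ArgumentError, "k must be a positive Integer" unless k.is_a?(Integer) && k.positive?

  eligible = min_similarity ? scored.select { |c| c[:score] >= min_similarity } : scored
  eligible.max_by(k) { |c| c[:score] }
end
```

Returning fewer than `k` demonstrations, possibly zero, is the natural behavior for the "no similar examples" edge case. The caller could then fall back to zero-shot or to a static few-shot set.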

Advanced Features

  • Hybrid Selection: Combine similarity with diversity
  • Weighted Selection: Weight by similarity score
  • Clustering: Group similar demonstrations
  • Online Updates: Add new demonstrations dynamically
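Hybrid selection that trades similarity against diversity is commonly done with Maximal Marginal Relevance (MMR). The sketch below is an MMR-style variant in which `diversity_penalty` weights the redundancy term. `mmr_select` and the `{ embedding:, demonstration: }` candidate shape are assumptions for illustration.

```ruby
# Hypothetical MMR-style greedy selection. Each step picks the candidate that
# maximizes  sim(query, c) - diversity_penalty * max over selected s of sim(c, s),
# so near-duplicates of already chosen demonstrations are penalized.
# candidates: Array of { embedding:, demonstration: }; similarity: callable(a, b).
def mmr_select(query, candidates, k:, diversity_penalty:, similarity:)
  relevance = candidates.map { |c| similarity.call(query, c[:embedding]) }
  remaining = (0...candidates.size).to_a
  selected = []

  while selected.size < k && !remaining.empty?
    best = remaining.max_by do |i|
      # Highest similarity to anything already chosen (0.0 before the first pick)
      redundancy = selected.map { |j| similarity.call(candidates[i][:embedding], candidates[j][:embedding]) }.max || 0.0
      relevance[i] - diversity_penalty * redundancy
    end
    selected << best
    remaining.delete(best)
  end

  selected.map { |i| candidates[i][:demonstration] }
end
```

With `diversity_penalty: 0.0`, this reduces to plain top-k by similarity. Larger values push the selection toward demonstrations that differ from those already chosen. The greedy loop makes O(n·k²) pairwise similarity calls for n candidates, so it is typically run on a shortlist returned by the nearest-neighbor search.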

Testing Requirements

  • Test demonstration relevance
  • Performance benchmarks for search
  • Compare with static selection
  • Test different embedding models
  • Edge cases (e.g., no sufficiently similar examples, an empty pool, k larger than the pool)

Dependencies

Consider:

  • ruby-openai for embeddings
  • numo-narray for vector operations
  • annoy-rb or similar for efficient search

Expected Benefits

  • An estimated 10-20% improvement over static few-shot, to be confirmed by the comparison benchmarks
  • Better handling of diverse inputs
  • Reduced prompt engineering effort
  • Automatic adaptation to input distribution

Priority

Medium - Significant improvement over basic few-shot but requires embedding infrastructure
