Open
Labels
enhancement (New feature or request), medium-priority (Medium priority feature), optimizer (Optimizer implementation)
Description
Overview
Implement a KNNFewShot optimizer that dynamically selects demonstrations using k-nearest neighbor search based on input similarity.
Description
Unlike static few-shot selection, which reuses one fixed set of examples for every input, KNNFewShot selects the most relevant examples for each input at inference time. This should yield more contextually appropriate demonstrations and, in turn, better performance.
Key Features to Implement
- Embedding-based similarity search
- Dynamic demonstration selection
- Multiple similarity metrics
- Efficient nearest neighbor search
- Demonstration pool management
Implementation Requirements
1. Core Architecture
class KNNFewShot < Base
  # similarity_metric and pool_size are accepted here because the example usage passes them
  def initialize(program:, embedder: nil, k: 5, similarity_metric: :cosine, pool_size: nil)
    @program = program
    @embedder = embedder || DefaultEmbedder.new
    @k = k
    @similarity_metric = similarity_metric
    @pool_size = pool_size
    @demonstration_pool = []
    @embeddings_cache = {}
  end

  def compile(dataset, metric)
    # Build demonstration pool with embeddings
    @demonstration_pool = build_pool(dataset, metric)
    # Create wrapped module with dynamic selection
    create_knn_modules(@program)
  end

  def select_demonstrations(input)
    input_embedding = @embedder.embed(serialize_input(input))
    # Find k nearest neighbors
    neighbors = find_nearest_neighbors(input_embedding, @k)
    # Return demonstrations
    neighbors.map { |n| n[:demonstration] }
  end
end
2. Embedding Support
class DefaultEmbedder
  def embed(text)
    # Use sentence-transformers via API or local model
    # Alternative: Use OpenAI embeddings
  end
end

class CustomEmbedder
  def initialize(model_name:)
    # Support for custom embedding models
  end
end
3. Similarity Metrics
- Cosine similarity (default)
- Euclidean distance
- Manhattan distance
- Custom metrics support
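The metrics listed above could be sketched in plain Ruby as follows. This is only an illustration under assumptions: the `SimilarityMetrics` module, the `find_nearest_neighbors` signature, and the `{ embedding:, demonstration: }` pool entry shape are hypothetical names chosen here, not an existing Desiru API. Distances are negated so that a larger score always means "more similar", which lets one brute-force search work with any metric.

```ruby
# Hypothetical helper module; names are illustrative, not part of Desiru.
module SimilarityMetrics
  module_function

  # Cosine similarity in [-1, 1]; returns 0.0 if either vector has zero norm.
  def cosine(a, b)
    dot = a.zip(b).sum { |x, y| x * y }
    norm_a = Math.sqrt(a.sum { |x| x * x })
    norm_b = Math.sqrt(b.sum { |x| x * x })
    return 0.0 if norm_a.zero? || norm_b.zero?

    dot / (norm_a * norm_b)
  end

  # Negated Euclidean distance, so larger still means more similar.
  def euclidean(a, b)
    -Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
  end

  # Negated Manhattan distance.
  def manhattan(a, b)
    -a.zip(b).sum { |x, y| (x - y).abs }
  end

  # Accepts a Symbol naming a built-in metric, or any callable (custom metric).
  def resolve(metric)
    return metric if metric.respond_to?(:call)

    method(metric)
  end
end

# Brute-force search over the pool (assumed entry shape:
# { embedding: [...], demonstration: ... }). Returns the k best entries,
# most similar first.
def find_nearest_neighbors(pool, query, k, metric: :cosine)
  score = SimilarityMetrics.resolve(metric)
  pool.max_by(k) { |entry| score.call(query, entry[:embedding]) }
end
```

A custom metric can then be passed as a lambda, for example `metric: ->(a, b) { a.zip(b).sum { |x, y| x * y } }` for a raw dot product. A linear scan like this is likely adequate for small pools; larger pools are where an approximate index becomes worthwhile.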
4. Efficient Search
- Use Faiss or Annoy for large pools
- Caching strategies
- Batch processing support
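One possible caching strategy is to wrap whatever embedder is configured in a memoizing layer, so repeated inputs never trigger a second embedding call. The sketch below is an assumption-laden illustration: `CachingEmbedder` and `ToyEmbedder` are hypothetical names, the only interface assumed is `#embed(text)` from the classes above, and the toy embedder exists purely so the example runs without a network call. Faiss or Annoy integration is not shown.

```ruby
require "digest"

# Hypothetical wrapper: memoizes embeddings by a hash of the input text.
class CachingEmbedder
  attr_reader :misses

  def initialize(embedder)
    @embedder = embedder # any object responding to #embed(text)
    @cache = {}
    @misses = 0          # number of calls forwarded to the wrapped embedder
  end

  def embed(text)
    key = Digest::SHA256.hexdigest(text)
    @cache.fetch(key) do
      @misses += 1
      @cache[key] = @embedder.embed(text)
    end
  end

  # Convenience for many texts; each distinct text is embedded at most once.
  # A real batch API call would be provider-specific and is not shown here.
  def embed_batch(texts)
    texts.map { |t| embed(t) }
  end
end

# Toy embedder for demonstration only: letter-frequency vector over a-z.
class ToyEmbedder
  def embed(text)
    counts = Array.new(26, 0.0)
    text.downcase.each_char do |c|
      idx = c.ord - "a".ord
      counts[idx] += 1.0 if idx.between?(0, 25)
    end
    counts
  end
end
```

Keying on a digest rather than the raw text keeps cache keys a fixed size even for long serialized inputs.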
Example Usage
# Create KNN optimizer
optimizer = Desiru::Optimizers::KNNFewShot.new(
  program: my_program,
  k: 5,
  embedder: Desiru::Embedders::OpenAI.new,
  similarity_metric: :cosine,
  pool_size: 100
)
# Compile with dataset
optimized_program = optimizer.compile(train_set, metric)
# At inference, demonstrations are selected dynamically
result = optimized_program.forward(
  question: "What is Ruby's creator's philosophy?"
)
# Automatically selects 5 most relevant examples about Ruby/philosophy
Configuration Options
- k: Number of neighbors to retrieve
- embedder: Embedding model to use
- similarity_metric: How to measure similarity
- pool_size: Maximum demonstration pool size
- min_similarity: Minimum similarity threshold
- diversity_penalty: Encourage diverse demonstrations
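To make the interaction between `k` and `min_similarity` concrete, here is a minimal sketch. It assumes candidates have already been scored, and `apply_selection_options` is a hypothetical helper name. When nothing clears the threshold the result is empty, which is one plausible way to handle the "no similar examples" case by falling back to a zero-shot prompt.

```ruby
# Hypothetical helper: scored is an array of [demonstration, similarity] pairs.
# Drops candidates below min_similarity (if given), then keeps the top k,
# most similar first.
def apply_selection_options(scored, k:, min_similarity: nil)
  kept = min_similarity ? scored.select { |_, sim| sim >= min_similarity } : scored
  kept.max_by(k) { |_, sim| sim }.map(&:first)
end

scored = [["demo_a", 0.9], ["demo_b", 0.2], ["demo_c", 0.6]]
apply_selection_options(scored, k: 2)                       # two best by score
apply_selection_options(scored, k: 2, min_similarity: 0.95) # nothing qualifies
```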
Advanced Features
- Hybrid Selection: Combine similarity with diversity
- Weighted Selection: Weight by similarity score
- Clustering: Group similar demonstrations
- Online Updates: Add new demonstrations dynamically
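Hybrid Selection could be approached with a greedy, maximal-marginal-relevance-style loop. This is an assumed interpretation, not something the issue specifies. Each step picks the candidate whose similarity to the query, minus `diversity_penalty` times its highest similarity to anything already selected, is largest. `select_diverse` and the entry shape are hypothetical.

```ruby
# Cosine similarity; returns 0.0 if either vector has zero norm.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  na = Math.sqrt(a.sum { |x| x * x })
  nb = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if na.zero? || nb.zero?

  dot / (na * nb)
end

# Greedy MMR-style selection (illustrative sketch).
# candidates: array of { embedding: [...], demonstration: ... } hashes
# diversity_penalty: 0.0 means pure similarity ranking; larger values
# penalize candidates that resemble already selected ones more strongly.
def select_diverse(candidates, query, k, diversity_penalty: 0.5)
  remaining = candidates.dup
  selected = []
  while selected.size < k && !remaining.empty?
    best = remaining.max_by do |cand|
      relevance = cosine(query, cand[:embedding])
      redundancy = selected.map { |s| cosine(cand[:embedding], s[:embedding]) }.max || 0.0
      relevance - diversity_penalty * redundancy
    end
    selected << best
    # Remove only this exact object, not other entries with equal content.
    remaining.delete_if { |c| c.equal?(best) }
  end
  selected
end
```

With a penalty of 0.0 this reduces to plain top-k by cosine similarity, so the same code path could serve both the default and the hybrid mode.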
Testing Requirements
- Test demonstration relevance
- Performance benchmarks for search
- Compare with static selection
- Test different embedding models
- Edge cases (no similar examples)
Dependencies
Consider:
- ruby-openai for embeddings
- numo-narray for vector operations
- annoy-rb or similar for efficient search
Expected Benefits
- An estimated 10-20% improvement over static few-shot on the evaluation metric, to be verified by the comparison benchmarks
- Better handling of diverse inputs
- Reduced prompt engineering effort
- Automatic adaptation to input distribution
Priority
Medium - Significant improvement over basic few-shot but requires embedding infrastructure