forked from blevesearch/go-faiss
Open
Description
Summary
Implement community-based diversification using Leiden clustering to ensure search results span different semantic modules of the codebase.
Background
The Python REFRAG implementation uses the Leiden algorithm to cluster code into semantic communities, then enforces a diversity constraint (max 5 results per community). This prevents all results from coming from a single module.
Reference: refrag_ollama.py:2370-2400
Current State
We have basic file/folder diversity (max 2 per file, max 6 per folder), but this is structural, not semantic.
Features
Community Detection
- Use the Leiden algorithm to cluster code into semantic communities
- Communities are based on semantic similarity (embedding proximity)
- Provides better module boundaries than file/folder structure
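Leiden operates on a graph, so the embeddings first have to be turned into weighted edges. A minimal sketch of that step, assuming a brute-force cosine k-nearest-neighbor graph: the names `Edge`, `buildKNNGraph`, `k`, and `minSim` are hypothetical and not part of the existing code, and in practice the neighbor lookup would presumably come from the vector index rather than an O(n²) scan. The Leiden step itself is not shown.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Edge is a weighted, undirected link between two chunks (hypothetical type).
type Edge struct {
	From, To int
	Weight   float64
}

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if either vector has zero norm.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// buildKNNGraph links each chunk to at most k of its most similar chunks,
// keeping only neighbors with similarity >= minSim. Each undirected edge
// is emitted once, with From < To.
func buildKNNGraph(emb [][]float32, k int, minSim float64) []Edge {
	type cand struct {
		j   int
		sim float64
	}
	seen := make(map[[2]int]bool)
	var edges []Edge
	for i := range emb {
		var cands []cand
		for j := range emb {
			if j == i {
				continue
			}
			if s := cosine(emb[i], emb[j]); s >= minSim {
				cands = append(cands, cand{j, s})
			}
		}
		// Most similar neighbors first.
		sort.Slice(cands, func(x, y int) bool { return cands[x].sim > cands[y].sim })
		if len(cands) > k {
			cands = cands[:k]
		}
		for _, c := range cands {
			lo, hi := i, c.j
			if lo > hi {
				lo, hi = hi, lo
			}
			key := [2]int{lo, hi}
			if seen[key] {
				continue
			}
			seen[key] = true
			edges = append(edges, Edge{From: lo, To: hi, Weight: c.sim})
		}
	}
	return edges
}

func main() {
	// Two tight pairs of toy 2-D embeddings.
	emb := [][]float32{{1, 0}, {0.9, 0.1}, {0, 1}, {0.1, 0.9}}
	for _, e := range buildKNNGraph(emb, 1, 0.5) {
		fmt.Printf("%d-%d %.3f\n", e.From, e.To, e.Weight)
	}
}
```

The resulting edge list is what a Leiden implementation (or a binding to one) would take as input, with `Weight` as the edge weight.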
Diversity Enforcement
- Enforce max_per_community constraint (default: 5)
- Ensures results span multiple semantic modules
- More effective than file/folder diversity for large codebases
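The enforcement step could look roughly like the sketch below: a single greedy pass over results that are already sorted by score, skipping any result whose community has reached the cap. `Result` and `enforceCommunityDiversity` are hypothetical names used for illustration; only the default of 5 comes from this issue.

```go
package main

import "fmt"

// Result is a hypothetical stand-in for the real search result type.
type Result struct {
	ID          string
	Score       float32
	CommunityID int
}

// enforceCommunityDiversity walks ranked results in order and keeps a
// result only while its community has fewer than maxPerCommunity kept
// entries. It stops once limit results have been kept.
func enforceCommunityDiversity(ranked []Result, maxPerCommunity, limit int) []Result {
	if limit <= 0 {
		return nil
	}
	if maxPerCommunity <= 0 {
		maxPerCommunity = 5 // default proposed in this issue
	}
	kept := make([]Result, 0, limit)
	perCommunity := make(map[int]int)
	for _, r := range ranked {
		if len(kept) >= limit {
			break
		}
		if perCommunity[r.CommunityID] >= maxPerCommunity {
			continue // community already has its share of results
		}
		perCommunity[r.CommunityID]++
		kept = append(kept, r)
	}
	return kept
}

func main() {
	ranked := []Result{
		{"a", 0.95, 1}, {"b", 0.93, 1}, {"c", 0.91, 1},
		{"d", 0.90, 2}, {"e", 0.88, 1}, {"f", 0.85, 3},
	}
	for _, r := range enforceCommunityDiversity(ranked, 2, 4) {
		fmt.Println(r.ID, r.CommunityID)
	}
}
```

Because the pass is greedy, the search would need to over-fetch candidates (more than `limit`) so that skipped results can be replaced by lower-ranked ones from other communities.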
Community Visualization
- Print community map showing top modules
- Display community distribution in search results
- Help users understand codebase structure
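For the distribution display, something as simple as a per-community count over the returned results may be enough. The sketch below is one possible shape, with hypothetical names (`CommunityCount`, `communityDistribution`, `formatDistribution`) and an output format chosen only for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// CommunityCount pairs a community ID with how many results it holds
// (hypothetical type).
type CommunityCount struct {
	CommunityID int
	Count       int
}

// communityDistribution counts results per community and returns the
// counts sorted by size (largest first), then by community ID.
func communityDistribution(communityIDs []int) []CommunityCount {
	counts := make(map[int]int)
	for _, id := range communityIDs {
		counts[id]++
	}
	dist := make([]CommunityCount, 0, len(counts))
	for id, n := range counts {
		dist = append(dist, CommunityCount{CommunityID: id, Count: n})
	}
	sort.Slice(dist, func(i, j int) bool {
		if dist[i].Count != dist[j].Count {
			return dist[i].Count > dist[j].Count
		}
		return dist[i].CommunityID < dist[j].CommunityID
	})
	return dist
}

// formatDistribution renders one line per community with a small text bar.
func formatDistribution(dist []CommunityCount, total int) string {
	var sb strings.Builder
	for _, c := range dist {
		fmt.Fprintf(&sb, "community %d: %d/%d %s\n",
			c.CommunityID, c.Count, total, strings.Repeat("#", c.Count))
	}
	return sb.String()
}

func main() {
	ids := []int{3, 1, 3, 2, 1, 3} // community IDs of six search results
	fmt.Print(formatDistribution(communityDistribution(ids), len(ids)))
}
```

The same counting could be run over all indexed chunks instead of one result set to produce the top-modules map.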
Implementation Tasks
- Implement or integrate Leiden clustering algorithm
- Build community graph from chunk embeddings
- Add community metadata to chunks
- Implement diversity enforcement in search
- Add community visualization/summary
API Design
type SearchOptions struct {
    // ... existing fields ...

    // Community diversity
    UseCommunity    bool // Enable community-based diversity (default: true)
    MaxPerCommunity int  // Max results per community (default: 5)
}

type SearchResult struct {
    // ... existing fields ...

    // Community metadata
    CommunityID   int    // Semantic community ID
    CommunityInfo string // Community description
}
Benefits
- Better cross-module coverage in search results
- Diversity follows semantic boundaries rather than structural (file/folder) boundaries
- Helps users discover related code in different modules
- Proven effective in the Python implementation