Abstract, extensible framework for benchmarking vector databases and models across datasets for image search and caption generation.
Install just the core framework (no adapters):

```bash
pip install imsearch_eval
```

Triton adapters (for Triton Inference Server):

```bash
pip install imsearch_eval[triton]
```

Weaviate adapters (includes Triton, as WeaviateAdapter uses TritonModelUtils):

```bash
pip install imsearch_eval[weaviate]
```

All adapters:

```bash
pip install imsearch_eval[all]
```

Development dependencies:

```bash
pip install imsearch_eval[dev]
```

To install from source:

```bash
git clone https://github.com/waggle-sensor/imsearch_eval
cd imsearch_eval
pip install -e .  # Core only
# Or with extras:
pip install -e ".[triton]"
pip install -e ".[weaviate]"
pip install -e ".[all]"from imsearch_eval import BenchmarkEvaluator
from imsearch_eval.adapters import WeaviateAdapter, TritonModelProvider
import tritonclient.grpc as TritonClient
# Initialize clients
weaviate_client = WeaviateAdapter.init_client(host="127.0.0.1", port="8080")
triton_client = TritonClient.InferenceServerClient(url="triton:8001")
# Create adapters
vector_db = WeaviateAdapter(
weaviate_client=weaviate_client,
triton_client=triton_client
)
model_provider = TritonModelProvider(triton_client=triton_client)
# Use in evaluator (requires a DatasetLoader implementation)
evaluator = BenchmarkEvaluator(
vector_db=vector_db,
model_provider=model_provider,
dataset_loader=dataset_loader, # Your DatasetLoader implementation
collection_name="my_collection",
query_method="clip_hybrid_query"
)The framework is organized into two main components:
- Framework (`imsearch_eval/framework/`): Abstract interfaces and evaluation logic (dataset-agnostic)
- Adapters (`imsearch_eval/adapters/`): Shared concrete implementations for vector databases and models
```
imsearch_eval/
├── framework/           # Abstract interfaces and evaluation logic
│   ├── interfaces.py    # VectorDBAdapter, ModelProvider, Query, DatasetLoader, etc.
│   ├── model_utils.py   # ModelUtils abstract interface
│   └── evaluator.py     # BenchmarkEvaluator class
│
└── adapters/            # Shared concrete implementations
    ├── __init__.py      # Exports all adapters
    ├── triton.py        # TritonModelProvider, TritonModelUtils
    └── weaviate.py      # WeaviateAdapter, WeaviateQuery
```
- `VectorDBAdapter`: Abstract interface for vector databases
  - Methods: `init_client()`, `search()`, `create_collection()`, `delete_collection()`, `insert_data()`, `close()`
- `ModelProvider`: Abstract interface for model providers
  - Methods: `get_embedding()`, `generate_caption()`
- `Query`: Abstract interface for query classes (used by vector DB adapters)
  - Method: `query(near_text, collection_name, limit, query_method, **kwargs)` - Generic query method
  - Each vector DB implementation can define its own query types via the `query_method` parameter
- `ModelUtils`: Abstract interface for model utilities (in `imsearch_eval.framework.model_utils`)
  - Methods: `calculate_embedding()`, `generate_caption()`
- `DatasetLoader`: Abstract interface for dataset loaders
- `DataLoader`: Abstract interface for loading data into vector DBs
- `Config`: Abstract interface for configuration/hyperparameters
- `QueryResult`: Container for query results
- `TritonModelUtils`: Triton-based implementation of the `ModelUtils` interface
- `TritonModelProvider`: Triton inference server model provider
- Dependencies: `tritonclient[grpc]`
- `WeaviateQuery`: Implements the `Query` interface for Weaviate
  - Generic `query()` method routes to specific methods based on the `query_method` parameter
  - Also provides Weaviate-specific methods: `hybrid_query()`, `colbert_query()`, `clip_hybrid_query()`
- `WeaviateAdapter`: Implements the `VectorDBAdapter` interface for Weaviate
  - Uses `WeaviateQuery` internally for search operations
- Dependencies: `weaviate-client`, `tritonclient[grpc]` (for embedding generation)
- `BenchmarkEvaluator`: Main evaluation class that works with any combination of adapters and dataset loaders
  - Computes metrics: NDCG, precision, recall, accuracy (a minimal NDCG sketch follows this list)
  - Supports parallel query processing
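For reference, here is a minimal, illustrative sketch of binary NDCG@k computed from a ranked list of 0/1 relevance labels. The function name `ndcg_at_k` is hypothetical; the evaluator's internal implementation may differ.

```python
import numpy as np

def ndcg_at_k(relevances, k=25):
    """Binary NDCG@k over a ranked list of 0/1 relevance labels (illustrative only)."""
    rels = np.asarray(relevances, dtype=float)[:k]
    if rels.size == 0:
        return 0.0
    discounts = 1.0 / np.log2(np.arange(2, rels.size + 2))  # positions 1..k
    dcg = float((rels * discounts).sum())
    ideal = np.sort(rels)[::-1]                              # best possible ordering
    idcg = float((ideal * discounts).sum())
    return dcg / idcg if idcg > 0 else 0.0

# Example: top-5 results where ranks 1 and 3 are relevant
print(ndcg_at_k([1, 0, 1, 0, 0], k=5))  # ≈ 0.92
```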
Your `DatasetLoader.load()` must return a pandas DataFrame (see the example sketched after the field list below). Column names can differ, but the meaning of the fields below must stay constant because they're used to compute metrics.

`BenchmarkEvaluator` gets the required column names from your `DatasetLoader`:

- `get_query_column()` → query text
- `get_query_id_column()` → query/group id (a unique id for each unique query)
- `get_relevance_column()` → relevance label (1/0)
- `get_metadata_columns()` → optional metadata copied into the per-query stats output
- Query text: The text sent to `VectorDBAdapter.search(...)`.
- Query id: A stable identifier used to group rows belonging to the same query.
- Relevance label: Binary label for each row/item (1 = relevant, 0 = not relevant). Used for precision/recall/NDCG.
- Image: A file path/URL/bytes you use when building embeddings or generating captions (consumed by your `DataLoader` / adapter, not the core evaluator).
- Ranking score(s): If your search results include a score column like `rerank_score`, `clip_score`, `score`, or `distance`, the evaluator will use the first one it finds to compute NDCG.
- License / rights_holder: Useful when combining datasets, otherwise optional.
- Additional metadata: Any extra fields you want to use for result breakdowns (e.g., animal species category). These do not change the metrics; they're just copied into the results.
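As a concrete illustration (the column names here are arbitrary examples, not requirements), a loaded DataFrame might look like this:

```python
import pandas as pd

# Hypothetical example of what DatasetLoader.load() could return.
# The column names just need to match what your DatasetLoader's
# get_*_column() methods report.
dataset_df = pd.DataFrame({
    "query":    ["a deer in a field", "a deer in a field", "snow on a road"],
    "query_id": [0, 0, 1],                        # groups rows belonging to one query
    "relevant": [1, 0, 1],                        # binary relevance label
    "image":    ["imgs/0001.jpg", "imgs/0002.jpg", "imgs/0003.jpg"],
    "category": ["animal", "animal", "weather"],  # optional metadata
})
```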
- Import adapters:

  ```python
  from imsearch_eval.adapters import WeaviateAdapter, TritonModelProvider
  ```

- Initialize clients:

  ```python
  import tritonclient.grpc as TritonClient

  weaviate_client = WeaviateAdapter.init_client(host="127.0.0.1", port="8080")
  triton_client = TritonClient.InferenceServerClient(url="triton:8001")
  ```

- Create adapters:

  ```python
  vector_db = WeaviateAdapter(
      weaviate_client=weaviate_client,
      triton_client=triton_client
  )
  model_provider = TritonModelProvider(triton_client=triton_client)
  ```

- Create a dataset loader (you need to implement this):

  ```python
  from imsearch_eval import DatasetLoader
  import pandas as pd

  class MyDatasetLoader(DatasetLoader):
      def load(self, split="test", **kwargs) -> pd.DataFrame:
          # Load your dataset
          return dataset_df

      def get_query_column(self) -> str:
          return "query"

      def get_query_id_column(self) -> str:
          return "query_id"

      def get_relevance_column(self) -> str:
          return "relevant"

      def get_metadata_columns(self) -> list:
          return ["category", "type"]
  ```

- Create the evaluator and run:

  ```python
  from imsearch_eval import BenchmarkEvaluator

  dataset_loader = MyDatasetLoader()
  evaluator = BenchmarkEvaluator(
      vector_db=vector_db,
      model_provider=model_provider,
      dataset_loader=dataset_loader,
      collection_name="my_collection",
      query_method="clip_hybrid_query"  # Query type for WeaviateQuery
  )
  results, stats = evaluator.evaluate_queries(split="test")
  ```
The `query_method` parameter in `BenchmarkEvaluator` is passed to the `Query.query()` method:

- For Weaviate: `query_method` can be `"clip_hybrid_query"`, `"hybrid_query"`, or `"colbert_query"`
- For other vector DBs: Implement your own query types in your `Query` implementation
- The `Query.query()` method routes to the appropriate implementation based on `query_method` (see the sketch after this list)
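As an illustration of this routing pattern (a sketch, not the actual `WeaviateQuery` source; the method names `_vector_query`, `_keyword_query`, and `_hybrid_query` are hypothetical), a custom `Query` might dispatch on `query_method` like this:

```python
import pandas as pd
from imsearch_eval import Query

class MyVectorDBQuery(Query):
    """Sketch of a Query implementation that routes on query_method."""

    def query(self, near_text, collection_name, limit=25,
              query_method="vector", **kwargs):
        # Map each supported query type to a concrete search method.
        handlers = {
            "vector": self._vector_query,
            "keyword": self._keyword_query,
            "hybrid": self._hybrid_query,
        }
        if query_method not in handlers:
            raise ValueError(f"Unsupported query_method: {query_method!r}")
        return handlers[query_method](near_text, collection_name, limit, **kwargs)

    def _vector_query(self, near_text, collection_name, limit, **kwargs):
        # Pure vector similarity search against your DB; return a DataFrame of hits.
        return pd.DataFrame()

    def _keyword_query(self, near_text, collection_name, limit, **kwargs):
        # BM25 / keyword search.
        return pd.DataFrame()

    def _hybrid_query(self, near_text, collection_name, limit, **kwargs):
        # Combine vector and keyword scores.
        return pd.DataFrame()
```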
The `ModelProvider` and `ModelUtils` interfaces accept `model_name` parameters:

- Embedding models: `"clip"`, `"colbert"`, `"align"` (for `TritonModelProvider`)
- Caption models: `"gemma3"`, `"qwen2_5"` (for `TritonModelProvider`)
- Other implementations can define their own model names
- Create a Query class implementing the `Query` interface:

  ```python
  from imsearch_eval import Query
  import pandas as pd

  class MyVectorDBQuery(Query):
      def query(self, near_text, collection_name, limit=25,
                query_method="vector", **kwargs):
          # Implement your query logic
          # query_method can be "vector", "keyword", "hybrid", etc.
          return pd.DataFrame(results)
  ```

- Create an adapter implementing `VectorDBAdapter`:

  ```python
  from imsearch_eval import VectorDBAdapter, QueryResult

  class MyVectorDBAdapter(VectorDBAdapter):
      @classmethod
      def init_client(cls, **kwargs):
          # Initialize your vector DB client
          return client

      def __init__(self, client=None, **kwargs):
          if client is None:
              client = self.init_client(**kwargs)
          self.client = client
          self.query_instance = MyVectorDBQuery(client)

      def search(self, query, collection_name, limit=25,
                 query_method="vector", **kwargs):
          df = self.query_instance.query(query, collection_name, limit,
                                         query_method, **kwargs)
          return QueryResult(df.to_dict('records'))

      # Implement other required methods...
  ```

- Create a ModelUtils implementation (optional but recommended):

  ```python
  from imsearch_eval.framework.model_utils import ModelUtils

  class MyModelUtils(ModelUtils):
      def calculate_embedding(self, text, image=None, model_name="default"):
          # Your embedding implementation
          return embedding

      def generate_caption(self, image, model_name="default"):
          # Your caption generation
          return caption
  ```

- Create a ModelProvider:

  ```python
  from imsearch_eval import ModelProvider

  class MyModelProvider(ModelProvider):
      def __init__(self, **kwargs):
          self.model_utils = MyModelUtils(**kwargs)

      def get_embedding(self, text, image=None, model_name="default"):
          return self.model_utils.calculate_embedding(text, image, model_name)

      def generate_caption(self, image, model_name="default"):
          return self.model_utils.generate_caption(image, model_name)
  ```
Create a loader implementing `DatasetLoader`:

```python
from imsearch_eval import DatasetLoader
import pandas as pd

class MyDatasetLoader(DatasetLoader):
    def load(self, split="test", **kwargs) -> pd.DataFrame:
        # Load your dataset
        return dataset_df

    def get_query_column(self) -> str:
        return "query"

    def get_query_id_column(self) -> str:
        return "query_id"

    def get_relevance_column(self) -> str:
        return "relevant"

    def get_metadata_columns(self) -> list:
        return []
```

The framework uses abstract interfaces to ensure consistency and extensibility:
- Framework defines interfaces (`imsearch_eval.framework.interfaces`, `imsearch_eval.framework.model_utils`): `VectorDBAdapter`, `ModelProvider`, `Query`, `ModelUtils`, `DatasetLoader`, etc.
  - These define the contract that all implementations must follow
- Adapters implement interfaces (`imsearch_eval.adapters`):
  - `TritonModelUtils` implements `ModelUtils`
  - `TritonModelProvider` implements `ModelProvider` and uses `TritonModelUtils`
  - `WeaviateQuery` implements `Query`
  - `WeaviateAdapter` implements `VectorDBAdapter` and uses `WeaviateQuery`
- Users use adapters:
  - Import from `imsearch_eval.adapters` (e.g., `from imsearch_eval.adapters import WeaviateAdapter, TritonModelProvider`)
  - Use the abstract interfaces, not concrete implementations
  - Easy to swap implementations without changing benchmark code (see the sketch after this list)
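For example (a sketch assuming custom implementations such as the `MyVectorDBAdapter`, `MyModelProvider`, and `MyDatasetLoader` classes shown above), swapping backends only changes how the adapters are constructed; the evaluation call itself is unchanged:

```python
from imsearch_eval import BenchmarkEvaluator

# Swap the Weaviate/Triton adapters for your own implementations;
# the benchmark code below does not change.
vector_db = MyVectorDBAdapter(client=my_client)  # instead of WeaviateAdapter; my_client is your DB client
model_provider = MyModelProvider()               # instead of TritonModelProvider

evaluator = BenchmarkEvaluator(
    vector_db=vector_db,
    model_provider=model_provider,
    dataset_loader=MyDatasetLoader(),
    collection_name="my_collection",
    query_method="vector",  # a query type your Query implementation supports
)
results, stats = evaluator.evaluate_queries(split="test")
```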
Using `WeaviateQuery` directly:

```python
from imsearch_eval.adapters import WeaviateAdapter, WeaviateQuery
import tritonclient.grpc as TritonClient

# WeaviateQuery implements the Query interface
weaviate_client = WeaviateAdapter.init_client(host="127.0.0.1", port="8080")
triton_client = TritonClient.InferenceServerClient(url="triton:8001")
query_instance = WeaviateQuery(weaviate_client, triton_client)

# Use the generic query() method
results = query_instance.query(
    near_text="search query",
    collection_name="my_collection",
    limit=25,
    query_method="clip_hybrid_query"  # Weaviate-specific query type
)

# Or use Weaviate-specific methods directly
results = query_instance.clip_hybrid_query("search query", "my_collection", limit=25)
```

Using `TritonModelUtils` and `TritonModelProvider` directly:

```python
from imsearch_eval.adapters import TritonModelProvider, TritonModelUtils
import tritonclient.grpc as TritonClient

# TritonModelUtils implements the ModelUtils interface
triton_client = TritonClient.InferenceServerClient(url="triton:8001")
model_utils = TritonModelUtils(triton_client)

# Use the abstract methods
embedding = model_utils.calculate_embedding("text", image=None, model_name="clip")
caption = model_utils.generate_caption(image, model_name="gemma3")

# Or use via ModelProvider
model_provider = TritonModelProvider(triton_client)
embedding = model_provider.get_embedding("text", image=None, model_name="clip")
caption = model_provider.generate_caption(image, model_name="gemma3")
```

- Dataset-Agnostic: Works with any dataset by implementing `DatasetLoader`
- Extensible: Easy to add new vector databases, models, and datasets
- Abstract Interfaces: Clean separation between evaluation logic and implementations
- Reusable: Framework code can be shared across all benchmarks
- Consistent: Same evaluation metrics and methodology
- Type Safe: Abstract interfaces ensure all implementations provide required functionality
- Flexible: Each implementation can define its own query types and model names