This project provides a modular benchmark pipeline for experimenting with different vector databases (FAISS, Qdrant, …).
It runs end-to-end:
- Download → Hugging Face dataset (optionally export images + manifest)
- Embed → Generate CLIP embeddings for images
- Build → Construct indexes with multiple VectorDBs
- Search → Profile queries (latency + Recall@K vs exact baseline)
- Update → Test insertions & deletions (index maintenance)
All steps are run with uv as the package manager.
```shell
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Set up the environment
uv venv .venv && source .venv/bin/activate
uv sync

# Run an end-to-end benchmark (FAISS IVF+PQ vectordb) on the INQUIRE dataset.
uv run python scripts/run_benchmark.py configs/inquire_benchmark.yaml

# Spin up a 3-node Weaviate cluster (shared Docker network + RAFT) and run the benchmark.
uv run python scripts/run_benchmark.py configs/inquire_benchmark_weaviate_cluster.yaml

# Spin up a 3-node Qdrant cluster (HTTP + gRPC + p2p) and run the benchmark.
uv run python scripts/run_benchmark.py configs/inquire_benchmark_qdrant_cluster.yaml
```

- Benchmark-managed clusters. The `configs/inquire_benchmark_weaviate_cluster.yaml` and `configs/inquire_benchmark_qdrant_cluster.yaml` files include the container descriptions that `container_context` launches automatically before each run. Make sure no identically named containers are already running; otherwise Docker will raise a name-conflict error.
The benchmarking code will:
- Download the specified dataset from the Hugging Face Hub.
- Embed the images using a CLIP model.
- Build a vector database index.
- Search the index with the given queries, measuring query latency and computing Recall@K against the FAISS Flat (exact) baseline.
- Update the index.
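Recall@K, as used here, is the fraction of the exact top-K neighbors (from the FAISS Flat baseline) that the approximate index also returns. A minimal sketch of that metric (illustrative only, not the project's actual code):

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbor ids that the approximate
    index also returned, averaged over all queries."""
    hits = sum(len(set(a[:k]) & set(e[:k])) for a, e in zip(approx_ids, exact_ids))
    return hits / (len(exact_ids) * k)

# Toy example: two queries, k = 3
exact = [[1, 2, 3], [4, 5, 6]]    # ids from the FAISS Flat (exact) baseline
approx = [[1, 2, 9], [4, 5, 6]]   # ids from the approximate index under test
print(recall_at_k(approx, exact, k=3))  # -> 0.833... (5 of 6 neighbors found)
```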
After the download step, `data/raw/` contains the dataset files and (optionally) the exported images:

```
data/raw/
  dataset_info.json
  state.json
  data-00000-of-00001.arrow
  images/
    00000000.jpg
    00000001.jpg
    ...
    manifest.csv   # [index,filename,label]
```

Built-in FAISS index types: `faiss.flat` (exact) and `faiss.ivfpq` (IVF + OPQ + PQ).
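The `manifest.csv` columns (`index,filename,label`) can be read with the standard `csv` module. A minimal sketch; the sample rows and label values below are invented for illustration:

```python
import csv
import io

# In practice: open("data/raw/images/manifest.csv", newline="")
sample = (
    "index,filename,label\n"
    "0,00000000.jpg,Danaus plexippus\n"
    "1,00000001.jpg,Apis mellifera\n"
)

with io.StringIO(sample) as f:
    rows = list(csv.DictReader(f))

# Map each dataset row index to its label
labels = {int(r["index"]): r["label"] for r in rows}
print(labels[0])  # -> Danaus plexippus
```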
The search step reports:
- Latency statistics (avg, p50, p95)
- Recall@K vs baseline
- JSON metrics in `.results/`
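The latency statistics (avg, p50, p95) can be reproduced from raw per-query timings with the standard library alone. An illustrative sketch, not the project's implementation:

```python
import statistics

def latency_summary(latencies_ms):
    """Aggregate per-query latencies into the stats the benchmark reports."""
    # 99 cut points dividing the data into 100 groups; index 94 is the 95th percentile
    qs = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "avg": statistics.fmean(latencies_ms),
        "p50": statistics.median(latencies_ms),
        "p95": qs[94],
    }

print(latency_summary([12.0, 15.5, 11.2, 240.0, 13.3, 14.1, 12.8, 13.0]))
```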
Use py-spy to record flamegraphs during any step:
```shell
bash scripts/pyspy_run.sh search-faiss -- python src/inatinqperf/benchmark/benchmark.py search --vectordb faiss.ivfpq --hf_dir data/emb_hf --topk 10 --queries src/inatinqperf/benchmark/queries.txt
```

Outputs:
- `.results/search-faiss.svg` (flamegraph)
- `.results/search-faiss.speedscope.json`
| Installation Method | Command |
|---|---|
| Via uv | `uv add inatinqperf` |
| Via pip | `pip install inatinqperf` |
Please visit Contributing and Development for information on contributing to this project.
Additional information can be found at these locations.
| Title | Document | Description |
|---|---|---|
| Code of Conduct | CODE_OF_CONDUCT.md | Information about the norms, rules, and responsibilities we adhere to when participating in this open source community. |
| Contributing | CONTRIBUTING.md | Information about contributing to this project. |
| Development | DEVELOPMENT.md | Information about development activities involved in making changes to this project. |
| Governance | GOVERNANCE.md | Information about how this project is governed. |
| Maintainers | MAINTAINERS.md | Information about individuals who maintain this project. |
| Security | SECURITY.md | Information about how to privately report security issues associated with this project. |
iNatInqPerf is licensed under the MIT license.