This project implements the method proposed by Jégou et al. in their article "Product Quantization for Nearest Neighbor Search" [1]. It replicates the experimental results presented in the paper and explores alternative strategies that build upon the original approach.
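In [1], each D-dimensional vector is split into m subvectors of dimension D/m. Each subvector is quantized with its own k-means codebook of k* centroids, and a database vector is stored as the m centroid indices. At query time, the asymmetric distance computation (ADC) leaves the query unquantized and approximates the squared distance as a sum of m table lookups. The minimal NumPy sketch below illustrates that idea under these assumptions. It is not the code in `search_approaches.py`: the function names (`kmeans`, `train_pq`, `encode`, `adc_search`) and the plain Lloyd's k-means loop are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical helper names for illustration; not the API of search_approaches.py.

def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid: shape (n, k)
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster becomes empty
                centroids[j] = members.mean(axis=0)
    return centroids


def train_pq(x, m, k):
    """Learn one codebook of k centroids per subspace; returns shape (m, k, D // m)."""
    x = np.asarray(x, dtype=np.float64)
    n, dim = x.shape
    assert dim % m == 0, "D must be a multiple of m"
    ds = dim // m
    return np.stack([kmeans(x[:, i * ds:(i + 1) * ds], k) for i in range(m)])


def encode(x, codebooks):
    """Replace each subvector by the index of its nearest centroid: (n, m) codes."""
    m, k, ds = codebooks.shape
    codes = np.empty((len(x), m), dtype=np.int64)
    for i in range(m):
        sub = x[:, i * ds:(i + 1) * ds]
        d2 = ((sub[:, None, :] - codebooks[i][None, :, :]) ** 2).sum(axis=2)
        codes[:, i] = d2.argmin(axis=1)
    return codes


def adc_search(q, codes, codebooks, topk):
    """Asymmetric distance computation: the query itself is never quantized."""
    m, k, ds = codebooks.shape
    # table[i, j] = squared distance between query subvector i and centroid j of subspace i
    table = np.stack([((codebooks[i] - q[i * ds:(i + 1) * ds]) ** 2).sum(axis=1)
                      for i in range(m)])
    # Approximate distance to each database vector = sum of its m table lookups
    dists = table[np.arange(m), codes].sum(axis=1)
    return np.argsort(dists)[:topk]


rng = np.random.default_rng(42)
base = rng.standard_normal((1000, 16)).astype(np.float32)
codebooks = train_pq(base, m=4, k=16)
codes = encode(base, codebooks)
print(codes.shape)  # (1000, 4)
print(adc_search(base[0], codes, codebooks, topk=5))
```

Since the distance table is computed once per query, scanning the database costs only m lookups and additions per vector, which is what makes ADC cheap compared with exact distances.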
To replicate the experiments, install the dependencies by executing the following command:

```bash
pip install -r requirements.txt
```

The datasets can be downloaded by running the following commands:

```bash
# Download the siftsmall dataset
wget ftp://ftp.irisa.fr/local/texmex/corpus/siftsmall.tar.gz -O siftsmall.tar.gz
tar -xvzf siftsmall.tar.gz
# Download the sift dataset
wget ftp://ftp.irisa.fr/local/texmex/corpus/sift.tar.gz -O sift.tar.gz
tar -xvzf sift.tar.gz
# Download the glove dataset
wget https://huggingface.co/stanfordnlp/glove/resolve/main/glove.6B.zip -O glove.6B.zip
unzip glove.6B.zip -d glove
```

The project's directory structure includes the following main files and folders:
```
IR-proj
├── gist # stores the gist dataset
├── glove # stores the glove dataset
├── img # stores images used in the notebooks
├── results # stores the executed notebooks
├── sift # stores the sift dataset
├── siftsmall # stores the siftsmall dataset
├── faiss_comparison.ipynb # compares the performance of the faiss library with the implemented method
├── fuzzyPQ_experiments.ipynb # experiments with fuzzy product quantization
├── large_scale_experiments.ipynb # large-scale experiments
├── results_comparison.md # comparison of the obtained results with those reported in the original article
├── search_approaches.py # implementation of the search approaches
├── slides.pdf # slides presenting the main findings
├── small_scale_experiments.ipynb # small-scale experiments
└── utils.py # implementation of utility functions
```
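The `sift` and `siftsmall` folders hold files in the TEXMEX `.fvecs`/`.ivecs` format, where every vector is stored as a little-endian int32 dimension d followed by d components (float32 for `.fvecs`, int32 for `.ivecs`). The GloVe files are plain text, with one word per line followed by its space-separated components. The loaders below are a minimal sketch under those assumptions; `utils.py` may already provide equivalent functions, and the names `read_fvecs`, `read_ivecs` and `read_glove` are hypothetical.

```python
import numpy as np

# Hypothetical helper names for illustration; utils.py may already provide equivalent loaders.

def _read_vecs(path, dtype):
    """Read a TEXMEX *vecs file: each record is an int32 dimension d followed by d 4-byte values."""
    raw = np.fromfile(path, dtype="<i4")
    if raw.size == 0:
        return np.empty((0, 0), dtype=dtype)
    d = int(raw[0])
    # All records share the same d, so each one spans (1 + d) 4-byte words
    records = raw.reshape(-1, d + 1)
    # Drop the leading dimension column and reinterpret the remaining bytes as `dtype`
    return records[:, 1:].copy().view(dtype)


def read_fvecs(path):
    return _read_vecs(path, "<f4")  # float32 components (base, query and learn vectors)


def read_ivecs(path):
    return _read_vecs(path, "<i4")  # int32 components (ground-truth neighbor ids)


def read_glove(path):
    """Read a GloVe text file: a word followed by its space-separated components on each line."""
    words, vectors = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:  # skip blank or malformed lines
                continue
            words.append(parts[0])
            vectors.append(np.asarray(parts[1:], dtype=np.float32))
    return words, np.stack(vectors)
```

Assuming the archives extract as in the download commands, `read_fvecs("siftsmall/siftsmall_base.fvecs")` should return a 10,000 × 128 array, and `read_glove("glove/glove.6B.100d.txt")` the 100-dimensional GloVe vocabulary.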
[1] Hervé Jégou, Matthijs Douze, and Cordelia Schmid. "Product Quantization for Nearest Neighbor Search". IEEE Transactions on Pattern Analysis and Machine Intelligence 33.1 (2010): 117-128.
This project was developed for the “Information Retrieval” course at the University of Pisa (a.y. 2024/2025).