This repository contains the source code for our work accepted at CCS 2025: "Panther: Private Approximate Nearest Neighbor Search in the Single Server Setting".
The core source code for Panther is located in the experimental/panther directory.
The code is still under heavy development and should not be used in any security-sensitive product.
Our implementation is based on the SPU library, specifically on this commit, which uses Bazel as its build system. We have also integrated two additional libraries:
The setup follows the official build prerequisites for SPU on Linux:
Install `gcc>=11.2`, `cmake>=3.22`, `ninja`, `nasm>=2.15`, `python>=3.9`, `bazelisk`, `xxd`, `lld`, and `numpy`.
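As a hedged example, one common way to install these prerequisites on Ubuntu 22.04 is shown below; the package names are our assumption rather than an official recipe, so verify them against the SPU documentation:

```bash
# Assumed Ubuntu 22.04 package names; adjust for your distribution.
sudo apt update
sudo apt install -y gcc-11 g++-11 cmake ninja-build nasm xxd lld python3 python3-pip
pip3 install numpy
# bazelisk is installed separately; see the Bazel section below.
```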
We provide the necessary Bazel commands to build and run Panther in the sections below. These commands have been tested on Ubuntu 22.04 (Linux).
How to use bazel:
We prefer to use bazelisk to install bazel.
- Linux: You can download the Bazelisk binary from its Releases page and add it to your `PATH` manually (e.g., copy it to `/usr/local/bin/bazel`); a hedged example of this is shown after this list.
- Navigate to the `***/OpenPanther` directory and run:

```bash
bazel --version
```

Then you can `cd OpenPanther` and use Bazel to build the Panther unit tests and e2e tests. Run `bazel build //...` (or `bazel build ...`) to build all targets. SPU is a large, feature-rich library, and building everything will compile many irrelevant components; this process may take a long time.
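As mentioned above, here is one possible way to fetch the Bazelisk binary on Linux x86_64; the release URL and asset name are assumptions, so confirm them on the Bazelisk Releases page:

```bash
# Assumed download URL and asset name; confirm on the Bazelisk Releases page.
wget https://github.com/bazelbuild/bazelisk/releases/latest/download/bazelisk-linux-amd64
chmod +x bazelisk-linux-amd64
sudo mv bazelisk-linux-amd64 /usr/local/bin/bazel
bazel --version
```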
Troubleshooting: Ninja Not Found
If you encounter the following error:

```
CMake Error: CMake was unable to find a build program corresponding to "Ninja".
```

please install ninja using:

```bash
sudo apt install ninja-build
```

This section describes the structure of the Panther module, located at `experimental/panther`. The key components are:
- `protocols/`: Implementations of core cryptographic subprotocols:
  - `protocols/customize_pir/`: Customized multi-query PIR protocol
  - `protocols/*.cc`: Implementations of other subprotocols (top-$k$, truncation, distance computation, etc.)
- `demo/`: End-to-end demonstrations of Panther on the Deep10M and SIFT (1M) datasets.
- `dataset/`: The cluster information from the $k$-means model and the data required for KNN queries.
- `benchmark/`: Test files for hybrid building blocks implemented by combining multiple cryptographic subprotocols. Includes randomized end-to-end (e2e) tests.
- `k-means/`: $k$-means training, accuracy evaluation, and a script for converting the model into the required input format.
- `BUILD.bazel`: Bazel build configuration file for compiling the Panther framework.
- `throttle.sh`: Script for network bandwidth throttling, used to simulate different network conditions during performance evaluation (a rough sketch of this kind of throttling follows this list).
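For illustration only, throttling of this kind is typically done with `tc`; the snippet below is a minimal sketch under that assumption and is not the actual contents of `throttle.sh`:

```bash
# Rough sketch (assumption): cap bandwidth and add delay on the loopback interface.
DEV=lo
sudo tc qdisc add dev $DEV root handle 1: netem delay 40ms
sudo tc qdisc add dev $DEV parent 1: handle 2: tbf rate 100mbit burst 128kb latency 50ms
# Undo the throttling:
# sudo tc qdisc del dev $DEV root
```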
We provide a random data version and a real data version for end-to-end evaluation. The random version is used only for performance testing; it lets the user quickly reproduce the performance results without downloading the dataset or preparing the $k$-means model.
- A 1M dataset requires 64 GB of memory.
- A 10M dataset requires 256 GB of memory.
- If resources are limited, consider running unit tests.
To run the random data version:

```bash
# SIFT client
bazel run -c opt //experimental/panther:random_panther_client --copt=-DTEST_SIFT
# SIFT server
bazel run -c opt //experimental/panther:random_panther_server --copt=-DTEST_SIFT

# Amazon client
bazel run -c opt //experimental/panther:random_panther_client --copt=-DTEST_AMAZON
# Amazon server
bazel run -c opt //experimental/panther:random_panther_server --copt=-DTEST_AMAZON

# Deep1M client
bazel run -c opt //experimental/panther:random_panther_client --copt=-DTEST_DEEP1M
# Deep1M server
bazel run -c opt //experimental/panther:random_panther_server --copt=-DTEST_DEEP1M

# Deep10M client
bazel run -c opt //experimental/panther:random_panther_client_10M
# Deep10M server
bazel run -c opt //experimental/panther:random_panther_server_10M
```

If you want to run the real data version:
Step 1. Download the target dataset.
Step 2. Download the pretrained $k$-means model and run `convert_model_to_input.py` to convert either `sift.pth` or `deep10M.pth` into the Panther input format. The converted output will be saved in the directory `experimental/panther/dataset/`.
- ✅ Recommendation: Use the pretrained $k$-means model, which already includes clustering assignments and centroid information. This is the preferred option, as it saves time and ensures compatibility with Panther's expected input format.
- ⚠️ Alternative: You can also train the $k$-means model yourself using the provided training code. However, please note that:
  - The code may lack optimization and could be computationally expensive.
  - Training might take a long time, especially on large datasets like SIFT or Deep1B.
  - After training, you'll need to manually tune Panther's parameters (e.g., number of clusters, posting list configuration) based on your model's output (e.g., centroids, cluster sizes).
Step 3. Build and run the Panther demo code.
We provide two datasets, which are sourced from ann-benchmarks:
| Dataset | Dimensions | Train size | Test size | Neighbors | Distance | Download |
|---|---|---|---|---|---|---|
| DEEP1B | 96 | 9,990,000 | 10,000 | 100 | Angular | HDF5 (3.6GB) |
| SIFT | 128 | 1,000,000 | 10,000 | 100 | Euclidean | HDF5 (501MB) |
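The Download column refers to the ann-benchmarks HDF5 files. As an assumption about where they are hosted (verify against the ann-benchmarks project before use), they can typically be fetched like this:

```bash
# Assumed ann-benchmarks URLs and target directories; confirm before downloading.
mkdir -p experimental/panther/dataset/sift
wget -P experimental/panther/dataset/sift http://ann-benchmarks.com/sift-128-euclidean.hdf5
wget -P experimental/panther/dataset http://ann-benchmarks.com/deep-image-96-angular.hdf5
```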
Please download sift.pth and deep10M.pth into experimental/panther/dataset.
We have reproduced the $k$-means clustering algorithm from SANNS; the $k$-means clustering component relies on the FAISS library, which is commonly used in ANNS.
We provide the dependencies for dataset processing and the plaintext $k$-means algorithm here:
```bash
conda create -n panther python=3.12.2
conda activate panther
conda install numpy==1.26.4
conda install h5py==3.14.0
conda install pytorch==2.3.0
conda install pytorch::faiss-cpu
```

We recommend installing these versions via conda to ensure compatibility and reproducibility.
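A quick, optional sanity check (our own suggestion, not part of the repository) that the pinned packages import correctly inside the `panther` environment:

```bash
conda activate panther
python3 -c "import numpy, h5py, torch, faiss; print(numpy.__version__, h5py.__version__, torch.__version__)"
```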
Train model (Not required):

```bash
# <dataset>: sift or deep10M
python3 ./experimental/panther/k-means/sanns_kmeans.py <dataset>
```

Test model accuracy (Not required):

```bash
# <dataset>: sift or deep10M
python3 ./experimental/panther/k-means/accuracy_test.py <dataset>
```

Make sure that the `experimental/panther/dataset` directory contains the `*.pth` and `*.hdf5` files, then convert the model into the Panther input format:

```bash
# <dataset>: sift or deep10M
python3 ./experimental/panther/k-means/convert_model_to_input.py <dataset>
```

For the SIFT dataset, the output directory structure will be:
```
/experimental/panther/dataset/
├── sift/
│   ├── sift.pth
│   ├── sift-128-euclidean.hdf5
│   ├── sift_dataset.txt
│   ├── sift_test.txt
│   ├── sift_centroids.txt
│   ├── sift_neighbors.txt
│   ├── sift_ptoc.txt
│   └── sift_stash.txt
```
For SIFT:

```bash
# build the SIFT demo
# client
bazel build -c opt //experimental/panther:panther_client
# server
bazel build -c opt //experimental/panther:panther_server

# run the SIFT demo
# client
bazel run //experimental/panther:panther_client
# server
bazel run //experimental/panther:panther_server
```

For Deep1B_10M:

```bash
# build the Deep1B demo
# client
bazel build -c opt //experimental/panther:panther_client_deep10M
# server
bazel build -c opt //experimental/panther:panther_server_deep10M

# run the Deep1B demo
# client
bazel run //experimental/panther:panther_client_deep10M
# server
bazel run //experimental/panther:panther_server_deep10M
```

Our framework integrates multiple cryptographic subprotocols. We provide unit tests to verify the correctness of each component in isolation. We hope these subprotocols can be easily reused in other work.
In cases where the evaluator has constrained computational resources, it is feasible to test each subprotocol separately. The complete framework is composed of multiple subprotocols, connected via lightweight local computation for intermediate data processing.
Run the unit tests:

```bash
# Running Distance Computation Tests
bazel run -c opt //experimental/panther:dist_cmp_test
bazel run -c opt //experimental/panther:dist_cmp_ss_test

# Running Custom Multi-query PIR Test
bazel run -c opt //experimental/panther/protocol/customize_pir:seal_mpir_test

# Running SS-based Argmax Test
bazel run -c opt //experimental/panther:batch_min_test

# Running Bitwidth Adjustment Test
bazel run -c opt //experimental/panther:bitwidth_adjust_test

# Running GC-based Top-K Test
# follows emp-toolkit style
# requires launching server and client separately
# server
bazel run //experimental/panther:topk_test 1 1111
# client
bazel run //experimental/panther:topk_test 2 1111

# Running Mixed (SS and GC) Top-K Test
# requires launching server and client separately
# server
bazel run -c opt //experimental/panther:topk_benchmark -- -rank=1
# client
bazel run -c opt //experimental/panther:topk_benchmark -- -rank=0

# Running Distance Computation Tests with Truncation
# requires launching server and client separately
# server
bazel run -c opt //experimental/panther:distance_benchmark -- -rank=1
# client
bazel run -c opt //experimental/panther:distance_benchmark -- -rank=0
```