This README provides instructions for reproducing the experimental results in the paper "Suda: An Efficient and Secure Unbalanced Data Alignment Framework for Vertical Privacy-Preserving Machine Learning" (USENIX Security 2025).
Intel(R) Xeon(R) Platinum 8260
500 GB memory
Debian GNU/Linux 10 (buster)
gcc 11.5.0
clang 18.1.8
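Before building, it can help to compare the local toolchain with the versions listed above. The following is a minimal sketch, not part of the repository: the helper name report_tool is hypothetical, and it assumes each tool prints its version on the first line of its --version output.

```shell
# Hypothetical helper (not part of the Suda repository):
# print the first version line of a tool, or a note if it is not installed.
report_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    "$1" --version | head -n 1
  else
    echo "$1: not found"
  fi
}

# Tools used by the build steps in this README.
for tool in gcc clang cmake; do
  report_tool "$tool"
done
```

Different versions may still work; the listed ones are only those stated in this README.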
cd src/third_party
bash gmp.get
bash ntl.get
bash libsodium.get
bash seal.get
bash pailliercryptolib.get
bash volepsi.get
bash setup.sh
cmake -S . -B build
cmake --build build -j
When executing this script, you may encounter an error related to 'pip install sklearn'. To resolve it, set the following environment variable to True:
export SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True
After installing all the dependencies and third-party libraries, run the following command to perform a simple functionality test. In this command, 20 refers to the larger data size, and test_ps.txt and test_pc.txt are the file arguments passed to the two invocations.
./build/bin/psi_to_share_test 20 1024 100 0 test_ps.txt & ./build/bin/psi_to_share_test 20 1024 100 1 test_pc.txt
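The command above starts the first process in the background and the second in the foreground, so the shell prompt returns as soon as the second one finishes. If you want to wait for both processes and see whether either failed, a wrapper along these lines may help. This is a sketch under assumptions: the helper name run_two_parties is hypothetical, and it assumes the binary exits with a non-zero status on failure, which this README does not state.

```shell
# Hypothetical helper (not part of the Suda repository):
# run two command lines concurrently and wait for both.
# Returns 0 only if both exit with status 0 (assumed to mean success).
run_two_parties() {
  sh -c "$1" &
  pid_a=$!
  sh -c "$2" &
  pid_b=$!
  status=0
  wait "$pid_a" || status=1
  wait "$pid_b" || status=1
  return "$status"
}

# Usage with the functionality test above, only if the binary has been built.
if [ -x ./build/bin/psi_to_share_test ]; then
  run_two_parties \
    "./build/bin/psi_to_share_test 20 1024 100 0 test_ps.txt" \
    "./build/bin/psi_to_share_test 20 1024 100 1 test_pc.txt" \
    || echo "At least one process exited with a non-zero status."
fi
```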
Evaluate the efficiency of Suda under different data settings. The results are reported in Table 2, Table 3, Table 4, and part of Table 1 (secure data alignment part) in the paper.
For the data size experiment, run the following script:
bash run_psi_to_share_test_size.sh
The results are stored in result/data_size_ps.txt and result/data_size_pc.txt.
For the feature dimension experiment, run the following script:
bash run_psi_to_share_test_payload.sh
The results are stored in result/feature_dimension_ps.txt and result/feature_dimension_pc.txt.
For the intersection size experiment, run the following script:
bash run_psi_to_share_test_interratio.sh
The results are stored in result/intersection_size_ps.txt and result/intersection_size_pc.txt.
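Once the three scripts above have finished, you may want to confirm that every expected result file was written. The following is a minimal sketch, not part of the repository: the helper name check_results is hypothetical, and it assumes you run it from the directory that contains result/.

```shell
# Hypothetical helper (not part of the Suda repository):
# report whether each given file exists and is non-empty.
# Returns non-zero if any file is missing or empty.
check_results() {
  missing=0
  for f in "$@"; do
    if [ -s "$f" ]; then
      echo "ok: $f"
    else
      echo "missing or empty: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Result files named in this README for the three experiments above.
check_results \
  result/data_size_ps.txt result/data_size_pc.txt \
  result/feature_dimension_ps.txt result/feature_dimension_pc.txt \
  result/intersection_size_ps.txt result/intersection_size_pc.txt \
  || echo "Some result files are missing or empty."
```

The check only tests for existence and non-zero size; it does not validate the contents of the files.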
First, download and preprocess the datasets:
cd python
bash preprocess_dataset.sh
Then run the following script:
bash run_psi_to_share_using_files_test.sh
The results are stored in result/CFIX_ps.txt and result/CFIX_pc.txt for the Character Font Images dataset, and in result/SVHN_ps.txt and result/SVHN_pc.txt for the SVHN dataset.
Evaluate the performance of secure training using the outputs of secure data alignment. The results are reported in part of Table 1 (secure training part).
After running the scripts for the secure data alignment part of Table 1, run the following script:
bash run_mpclr.sh
Run the following script:
bash run_batchpir_test.sh
The results are stored in result/batch_pir.txt.