This project investigates Split Learning for Privacy-Preserving Record Linkage (PPRL): identifying records that refer to the same entity across different databases without compromising privacy. It relies on reference sets (publicly available data collections). The method has minimal impact on matching performance compared with a traditional centralized SVM-based approach.
SL_training_data_generator.py: Generates the data used to train and test Split Learning models.
SL_test_data_generatror.py: Creates test datasets for Split Learning models.
local_training_data_generator.py: Generates data for local training (without Split Learning or reference sets).
local_test_data_generatror.py: Creates test datasets for local models.
tester.py: Tests saved models on datasets.
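
For readers new to Split Learning, the general idea mentioned above can be sketched as follows. This is a minimal illustration and not the code in this repository: the class names `ClientModel` and `ServerModel`, the `train` helper, and the toy similarity features in `TOY_PAIRS` are all hypothetical and invented for the example. It assumes one data owner (client) that holds the bottom layer and a server that holds the top layer, so that only hidden activations and their gradients cross the boundary, never the raw features.

```python
import math
import random


class ClientModel:
    """Bottom layer; stays with the data owner (hypothetical name)."""

    def __init__(self, n_in, n_hidden, rng):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                  for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden

    def forward(self, x):
        self.x = x
        self.h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
                  for row, bj in zip(self.W, self.b)]
        return self.h  # only these activations leave the client

    def backward(self, grad_h, lr):
        # grad_h is the loss gradient w.r.t. the activations, sent by the server.
        for j, row in enumerate(self.W):
            g = grad_h[j] * (1.0 - self.h[j] ** 2)  # tanh derivative
            for i in range(len(row)):
                row[i] -= lr * g * self.x[i]
            self.b[j] -= lr * g


class ServerModel:
    """Top layer (logistic output); never sees raw features (hypothetical name)."""

    def __init__(self, n_hidden, rng):
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b = 0.0

    def predict(self, h):
        z = sum(vj * hj for vj, hj in zip(self.v, h)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def train_step(self, h, y, lr):
        err = self.predict(h) - y  # gradient of cross-entropy w.r.t. z
        grad_h = [err * vj for vj in self.v]  # computed before v is updated
        self.v = [vj - lr * err * hj for vj, hj in zip(self.v, h)]
        self.b -= lr * err
        return grad_h  # sent back to the client


def train(pairs, epochs=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    client = ClientModel(n_in=len(pairs[0][0]), n_hidden=3, rng=rng)
    server = ServerModel(n_hidden=3, rng=rng)
    for _ in range(epochs):
        for x, y in pairs:
            h = client.forward(x)                 # client -> server
            grad_h = server.train_step(h, y, lr)  # server -> client
            client.backward(grad_h, lr)
    return client, server


# Made-up similarity features for record pairs; label 1 = match, 0 = non-match.
TOY_PAIRS = [
    ([0.95, 0.90], 1), ([0.85, 0.80], 1), ([0.90, 0.70], 1),
    ([0.20, 0.10], 0), ([0.30, 0.25], 0), ([0.10, 0.35], 0),
]

if __name__ == "__main__":
    client, server = train(TOY_PAIRS)
    print(server.predict(client.forward([0.90, 0.85])))  # match-like pair
    print(server.predict(client.forward([0.15, 0.20])))  # non-match-like pair
```

In this sketch the server also sees the labels; other Split Learning configurations keep labels on the client side, and which variant this project uses is not stated here.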