This repository provides the code and dataset for our research on patch presence test through patch code localization in binaries.
The project is organized as follows:
- data_process/: Contains scripts for preprocessing, feature generation, and configuration settings.
- saved/: Stores experimental results and intermediate files generated during detection tasks.
- org_data/: Stores the dataset files, binaries, and patches.
- gen_cfg.py: Generates control flow graphs for the functions in the dataset.
- gen_sigs.py: Generates signatures for patches.
- main.py: The main entry point for experiments.
Before running PLocator, please ensure that your environment is correctly configured:
- Operating System: The experiments were conducted on Ubuntu 22.04. Other versions may work, but we recommend Ubuntu 18.04 or later to avoid unexpected issues.
- IDA Pro: PLocator uses IDA Pro 7.5 to extract binary features. Ensure that IDAPython is set to use Python 3, and install the additional packages required for IDA as follows:
    pip install -r requirements_ida.txt --target=<ida_path>/python/3
  Before running, update the paths in data_process/settings.py:
    IDA_PATH = "/path/to/ida"
    IDA64_PATH = "/path/to/ida64"
- Python Environment: We recommend using Python 3.8 with conda. Create a new environment and activate it:
    conda create --name plocator python=3.8
    conda activate plocator
  Install the necessary dependencies:
    pip install -r requirements.txt
- ctags: PLocator uses ctags to extract source code from repository patches. Make sure it has execute permission:
    chmod +x data_process/ctags
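The setup steps above can be sanity-checked before a long run. The following is a minimal sketch, not part of the PLocator code base: the helper name check_environment is hypothetical, and it only assumes that the two IDA paths from data_process/settings.py and the ctags file at data_process/ctags should exist on disk.

```python
import os
from pathlib import Path


def check_environment(ida_path, ida64_path, ctags_path="data_process/ctags"):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper for illustration only. Pass in the values you set
    for IDA_PATH and IDA64_PATH in data_process/settings.py.
    """
    problems = []
    # Both IDA paths must point at something that exists on disk.
    for label, path in (("IDA_PATH", ida_path), ("IDA64_PATH", ida64_path)):
        if not Path(path).exists():
            problems.append(f"{label} does not exist: {path}")
    # ctags must exist and carry the execute permission set by chmod +x.
    ctags = Path(ctags_path)
    if not ctags.is_file():
        problems.append(f"ctags not found: {ctags}")
    elif not os.access(ctags, os.X_OK):
        problems.append(f"ctags is not executable (run chmod +x): {ctags}")
    return problems


if __name__ == "__main__":
    # Placeholder paths copied from the settings example; replace with your own.
    for problem in check_environment("/path/to/ida", "/path/to/ida64"):
        print("WARNING:", problem)
```

Run it from the repository root so that the relative ctags path resolves correctly.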
The binary dataset is available for download via figshare. After downloading, uncompress the dataset in the root directory. The folder structure is as follows:
org_data/
    binaries/
        openssl/                              # Project directory
            X86/                              # Architecture directory
                O0/                           # Optimization level
                    OpenSSL_1_0_1f/           # Version
                        openssl-1.0.1f        # Original binary
                        openssl-1.0.1f.strip  # Stripped binary (our target)
    dataset_wo_irr.csv       # Dataset excluding irrelevant functions (1,090 test cases)
    dataset_with_irr.csv     # Dataset with irrelevant functions (27,250 test cases)
    detect_list_4.csv        # 730 test cases, excluding those used for threshold selection and those specific to certain compilation options
    detect_list_4_rq3.csv    # 15,750 test cases, excluding those used for threshold selection and those specific to certain compilation options
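To confirm that the dataset was uncompressed into the expected place, the layout above can be walked programmatically. This is a minimal sketch under the assumption that binaries are nested as project/architecture/optimization level/version, as in the listing; the function name list_stripped_binaries is hypothetical and not part of the repository.

```python
from pathlib import Path


def list_stripped_binaries(root="org_data/binaries"):
    """Yield (project, arch, opt_level, version, path) for every *.strip file.

    Assumes the nesting project/arch/opt_level/version/binary shown in the
    folder listing; files at any other depth are ignored.
    """
    root = Path(root)
    for path in sorted(root.glob("*/*/*/*/*.strip")):
        # The first four path components below the root identify the build.
        project, arch, opt_level, version = path.relative_to(root).parts[:4]
        yield project, arch, opt_level, version, path
```

Calling list(list_stripped_binaries()) from the repository root returns one tuple per stripped binary; an empty list suggests the archive was extracted somewhere else.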
Important Note: When testing with stripped binaries, it is essential to pre-compute the binary code similarities between the sub-functions called by the vulnerable/fixed functions and those called by the target functions, so that the CALL-type anchors can be matched properly. PLocator uses jTrans as the BCSD tool to perform this task. For more details, please refer to jTrans on GitHub.
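To illustrate what pre-computing these similarities involves, here is a minimal sketch that pairs callees by cosine similarity. It assumes the BCSD tool yields one embedding vector per function, which is how embedding-based tools such as jTrans are commonly used; the dictionaries, function names, and vectors are hypothetical, and this is not how PLocator itself consumes the jTrans output.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_matches(ref_callees, target_callees):
    """Map each reference callee to (most similar target callee, similarity).

    Both arguments are dicts of function name -> embedding vector. The
    embeddings are assumed to come from a BCSD tool; none are computed here.
    """
    matches = {}
    for ref_name, ref_vec in ref_callees.items():
        scored = [(cosine(ref_vec, vec), name) for name, vec in target_callees.items()]
        if scored:
            score, name = max(scored)
            matches[ref_name] = (name, score)
    return matches


if __name__ == "__main__":
    # Toy vectors for illustration; real embeddings have many more dimensions.
    ref = {"callee_in_fixed_function": [1.0, 0.0]}
    target = {"sub_401000": [0.9, 0.1], "sub_402000": [0.0, 1.0]}
    print(best_matches(ref, target))
```

Because names are unavailable in stripped binaries, a table of this kind is what allows a call in the reference function to be associated with a call in the target function.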
Follow these steps to generate signatures from patches and binaries:
- Clone the project repositories (e.g., OpenSSL):
    git clone https://github.com/openssl/openssl.git /data/Dataset/source_code/
- Use gen_sigs.py to generate signatures:
    python gen_sigs.py
This script performs three main tasks:
- Parse patch code from the original commit.
- Match patch code with reference source code versions.
- Identify and generate signatures from binaries.
All generated signatures will be saved in the directory corresponding to the patch.
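As an illustration of the first task, parsing patch code from a commit, the sketch below collects the added and removed lines per file from a git-style unified diff. It is a simplified stand-in and not the parser used by gen_sigs.py; the function name parse_unified_diff and the sample file name are hypothetical, and the input is assumed to contain "diff --git" headers as produced by git show or git diff.

```python
def parse_unified_diff(diff_text):
    """Return {file name: {"added": [...], "removed": [...]}} for a git diff."""
    changes = {}
    current = None    # change record of the file currently being read
    in_hunk = False   # True once the first "@@" header of a file was seen
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # A new file section starts; wait for its "+++" header.
            current, in_hunk = None, False
        elif line.startswith("+++ ") and not in_hunk:
            name = line[4:].strip()
            if name.startswith("b/"):
                name = name[2:]  # drop git's "b/" prefix
            current = changes.setdefault(name, {"added": [], "removed": []})
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and current is not None:
            if line.startswith("+"):
                current["added"].append(line[1:])
            elif line.startswith("-"):
                current["removed"].append(line[1:])
    return changes


if __name__ == "__main__":
    sample = "\n".join([
        "diff --git a/src/example.c b/src/example.c",
        "--- a/src/example.c",
        "+++ b/src/example.c",
        "@@ -1,3 +1,4 @@",
        " int f(void) {",
        "-    return 0;",
        "+    check();",
        "+    return 1;",
        " }",
    ])
    print(parse_unified_diff(sample))
```

The removed lines correspond to the vulnerable version of the code and the added lines to the fixed version, which is the distinction the later matching steps rely on.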
The Control Flow Graphs (CFGs) are generated based on the test cases in the dataset, including vulnerable, fixed, and irrelevant functions. Use the following command to generate CFGs:
    python gen_cfg.py -dataset org_data/detect_list_4_all.csv
This command generates CFGs for the test cases.
The main script for vulnerability detection performs the following tasks:
- Filtering irrelevant functions.
- Matching anchor paths.
- Verifying patch paths.
- Classifying functions.
Run the detection script for RQ1:
    python main.py -dataset "org_data/detect_list_4_all.csv" -save "saved/rq1/plocator/"
Run the detection script for RQ3:
    python main.py -dataset "org_data/detect_list_4_all_rq3.csv" -save "saved/rq3/plocator/"

If you find this work useful, please cite our paper:
@article{Dong2025PLocatorFP,
title={PLocator: Fine-Grained Patch Presence Test in Binaries via Patch Code Localization},
author={Chaopeng Dong and Jingdong Guo and Shouguo Yang and Yang Xiao and Yi Li and Hong Li and Zhi Li and Limin Sun},
journal={ACM Transactions on Software Engineering and Methodology},
year={2025},
}

For any questions or issues regarding the code or dataset, please feel free to contact:
- Chaopeng Dong: dongchaopeng@iie.ac.cn