The official repository of PLocator, accepted by ToSEM 2025.

PLocator

This repository provides the code and dataset for our research on patch presence testing in binaries via patch code localization.


📁 Project Structure

The project is organized as follows:

  • data_process/ Contains scripts for preprocessing, feature generation, and configuration.

  • saved/ Stores experimental results and intermediate files generated during detection tasks.

  • org_data/ Stores the dataset files, binaries, and patches.

  • gen_cfg.py Generates control flow graphs for the functions in the dataset.

  • gen_sigs.py Generates signatures for patches.

  • main.py The main entry point for the experiments.


⚙️ Environment Setup

Before running PLocator, please ensure that your environment is correctly configured:

  • Operating System: The experiments were conducted on Ubuntu 22.04. Although other versions may work, we recommend using Ubuntu 18.04 or later to avoid unexpected issues.

  • IDA Pro: PLocator uses IDA Pro 7.5 to extract binary features. Ensure that IDAPython is set to use Python 3, and install the additional packages required inside IDA as follows:

    pip install -r requirements_ida.txt --target=<ida_path>/python/3

    Before running, update the paths in data_process/settings.py:

    IDA_PATH = "/path/to/ida"
    IDA64_PATH = "/path/to/ida64"
  • Python Environment: We recommend using Python 3.8 with conda. Create a new environment and activate it:

    conda create --name plocator python=3.8
    conda activate plocator

    Install the necessary dependencies:

    pip install -r requirements.txt
  • ctags: PLocator uses ctags to extract source code from repository patches. Make sure the bundled binary has execute permission:

    chmod +x data_process/ctags
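
The setup steps above can be sanity-checked before a long run. The following is a minimal sketch, assuming the two IDA paths configured in data_process/settings.py and the bundled data_process/ctags binary; the helper name check_environment is hypothetical and not part of PLocator.

```python
import os
import sys


def check_environment(ida_path, ida64_path, ctags_path="data_process/ctags"):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration, not part of PLocator.
    """
    problems = []
    if sys.version_info[0] != 3:
        problems.append("Python 3 is required")
    # Both IDA executables configured in settings.py must exist.
    for label, path in (("IDA_PATH", ida_path), ("IDA64_PATH", ida64_path)):
        if not os.path.exists(path):
            problems.append(f"{label} does not exist: {path}")
    # The ctags binary must exist and be executable (see chmod +x above).
    if not os.path.isfile(ctags_path):
        problems.append(f"ctags not found: {ctags_path}")
    elif not os.access(ctags_path, os.X_OK):
        problems.append(f"ctags is not executable: {ctags_path}")
    return problems


if __name__ == "__main__":
    for problem in check_environment("/path/to/ida", "/path/to/ida64"):
        print("WARNING:", problem)
```

Replace the placeholder paths with the values you put in data_process/settings.py; no output means every check passed.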

🗃️ Dataset

The binary dataset is available for download via figshare. After downloading, extract it into the repository root. The folder structure is as follows:

org_data/
  binaries/
    openssl/   # Project directory
      X86/    # Architecture directory
        O0/    # Optimization level
          OpenSSL_1_0_1f    # Version
            openssl-1.0.1f   # Original binary
            openssl-1.0.1f.strip   # Stripped binary (our target)
  dataset_wo_irr.csv  # Dataset excluding irrelevant functions (1,090 test cases)
  dataset_with_irr.csv  # Dataset with irrelevant functions (27,250 test cases)
  detect_list_4.csv  # 730 test cases; excludes those used for threshold selection and those specific to certain compilation options
  detect_list_4_rq3.csv  # 15,750 test cases; excludes those used for threshold selection and those specific to certain compilation options
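
The layout above maps directly onto a path scheme. As a small illustration, the sketch below assembles the path of a stripped binary from its project, architecture, optimization level, and version; the function name stripped_binary_path is hypothetical and only mirrors the directory tree shown here.

```python
from pathlib import Path


def stripped_binary_path(root, project, arch, opt, version_tag, binary_name):
    """Build the path of a stripped binary following the layout shown above.

    Hypothetical helper for illustration, not part of PLocator.
    """
    return (
        Path(root) / "binaries" / project / arch / opt / version_tag
        / f"{binary_name}.strip"
    )


path = stripped_binary_path(
    "org_data", "openssl", "X86", "O0", "OpenSSL_1_0_1f", "openssl-1.0.1f"
)
print(path.as_posix())
# org_data/binaries/openssl/X86/O0/OpenSSL_1_0_1f/openssl-1.0.1f.strip
```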

Important Note: When testing with stripped binaries, it is essential to pre-compute the binary code similarities between the sub-functions called by the vulnerable/fixed functions and those called by the target functions to properly match the CALL type anchors. PLocator uses jTrans as the BCSD tool to perform this task. For more details, please refer to jTrans on GitHub.
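
To illustrate what matching callees by similarity involves, here is a minimal sketch. It assumes a BCSD tool such as jTrans has already produced one embedding vector per function, stored as plain lists of floats. The function names, the toy vectors, and the 0.8 threshold are all made up for the example, and this is not PLocator's actual matching code.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_callees(ref_embs, tgt_embs, threshold=0.8):
    """Map each reference callee to its most similar target callee.

    ref_embs / tgt_embs: dict of function name -> embedding vector.
    Reference callees whose best score is below `threshold` stay unmatched.
    """
    matches = {}
    for ref_name, ref_vec in ref_embs.items():
        best_name, best_score = None, threshold
        for tgt_name, tgt_vec in tgt_embs.items():
            score = cosine(ref_vec, tgt_vec)
            if score >= best_score:
                best_name, best_score = tgt_name, score
        if best_name is not None:
            matches[ref_name] = (best_name, best_score)
    return matches


# Toy embeddings: named callees from a reference binary versus
# unnamed functions from a stripped target binary.
ref = {"BN_free": [1.0, 0.0, 0.2], "memcpy": [0.0, 1.0, 0.0]}
tgt = {"sub_401000": [0.9, 0.1, 0.2], "sub_402000": [0.5, 0.5, 0.5]}
matches = match_callees(ref, tgt)
print(matches)
```

In this toy input only BN_free finds a counterpart above the threshold; memcpy stays unmatched, so a CALL anchor on it could not be resolved in the stripped target.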


🛠️ Signature Generation

Follow these steps to generate signatures from patches and binaries:

  1. Clone the project repositories (e.g., OpenSSL):

    git clone https://github.com/openssl/openssl.git /data/Dataset/source_code/
  2. Use gen_sigs.py to generate signatures:

    python gen_sigs.py

    This script performs three main tasks:

    • Parse patch code from the original commit.
    • Match patch code with reference source code versions.
    • Identify and generate signatures from binaries.

    All generated signatures will be saved in the directory corresponding to the patch.
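
The first of the three tasks, parsing patch code from a commit, can be pictured with the sketch below. It assumes a git-style unified diff and simply collects the removed and added lines of each hunk. It is an illustration, not the parser used by gen_sigs.py, and the sample diff is made up.

```python
def parse_patch(diff_text):
    """Collect removed and added lines per hunk of a git-style unified diff."""
    hunks = []
    current = None
    for line in diff_text.splitlines():
        if line.startswith("diff "):
            current = None  # new file section; its ---/+++ headers follow
        elif line.startswith("@@"):
            current = {"header": line, "removed": [], "added": []}
            hunks.append(current)
        elif current is None:
            continue  # file headers and anything before the first hunk
        elif line.startswith("-"):
            current["removed"].append(line[1:])
        elif line.startswith("+"):
            current["added"].append(line[1:])
    return hunks


# Made-up patch that adds a bounds check before a copy.
SAMPLE = """\
diff --git a/foo.c b/foo.c
--- a/foo.c
+++ b/foo.c
@@ -1,3 +1,4 @@
 int copy(char *dst, char *src, int n) {
-    memcpy(dst, src, n);
+    if (n > MAX_LEN) return -1;
+    memcpy(dst, src, n);
 }
"""

hunks = parse_patch(SAMPLE)
for hunk in hunks:
    print(hunk["header"], len(hunk["removed"]), "removed,", len(hunk["added"]), "added")
```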


🔄 Experiments

1. CFG Generation

The Control Flow Graphs (CFGs) are generated based on the test cases in the dataset, including vulnerable, fixed, and irrelevant functions. Use the following command to generate CFGs:

python gen_cfg.py -dataset org_data/detect_list_4_all.csv

This command generates CFGs for the test cases.
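
For readers unfamiliar with CFGs, the following sketch represents one as a mapping from each basic block to its successors and enumerates the loop-free paths between two blocks. The block names are invented, and the dictionary format is an assumption for illustration; it is not the format written by gen_cfg.py.

```python
def simple_paths(cfg, start, end, path=None):
    """Yield every loop-free path from `start` to `end` in a CFG.

    cfg: dict mapping a basic-block id to the list of its successor ids.
    """
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for succ in cfg.get(start, []):
        if succ not in path:  # skip back edges so paths stay loop-free
            yield from simple_paths(cfg, succ, end, path)


# Toy CFG of a function with a patched-in early exit.
cfg = {
    "entry": ["check"],
    "check": ["bail_out", "copy"],
    "bail_out": ["exit"],
    "copy": ["exit"],
    "exit": [],
}
paths = list(simple_paths(cfg, "entry", "exit"))
for p in paths:
    print(" -> ".join(p))
```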

2. Patch Presence Test

The main script, main.py, performs the patch presence test in four steps:

  1. Filtering irrelevant functions.
  2. Matching anchor paths.
  3. Verifying patch paths.
  4. Classifying functions.

Run the detection script for RQ1:

python main.py -dataset "org_data/detect_list_4_all.csv" -save "saved/rq1/plocator/"

Run the detection script for RQ3:

python main.py -dataset "org_data/detect_list_4_all_rq3.csv" -save "saved/rq3/plocator/"
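
If you want to launch both experiments in one go, a small wrapper such as the sketch below may help. It only reuses the two commands shown above; the wrapper itself, including the names RUNS, build_command, and run_all, is hypothetical and not shipped with PLocator.

```python
import subprocess
import sys

# Dataset file per research question, copied from the commands above.
RUNS = {
    "rq1": "org_data/detect_list_4_all.csv",
    "rq3": "org_data/detect_list_4_all_rq3.csv",
}


def build_command(rq, dataset):
    """Assemble the main.py invocation for one research question."""
    return [
        sys.executable, "main.py",
        "-dataset", dataset,
        "-save", f"saved/{rq}/plocator/",
    ]


def run_all(dry_run=True):
    """Print the commands, or execute them one after another."""
    for rq, dataset in RUNS.items():
        cmd = build_command(rq, dataset)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing run


if __name__ == "__main__":
    run_all(dry_run=True)
```

Call run_all(dry_run=False) from the repository root once the printed commands look right.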

📜 Citation

If you find this work useful, please cite our paper:

@article{Dong2025PLocatorFP,
  title={PLocator: Fine-Grained Patch Presence Test in Binaries via Patch Code Localization},
  author={Chaopeng Dong and Jingdong Guo and Shouguo Yang and Yang Xiao and Yi Li and Hong Li and Zhi Li and Limin Sun},
  journal={ACM Transactions on Software Engineering and Methodology},
  year={2025},
}

📬 Contact

For any questions or issues regarding the code or dataset, please feel free to contact:
