KEENHash: Hashing Programs into Function-Aware Embeddings for Large-Scale Binary Code Similarity Analysis (ISSTA'25)

Description

This repository provides the KEENHash-generated program embeddings for the binaries from coreutils, diffutils, and findutils, which are used both in our study and in DeepBinDiff. To generate embeddings for your own binaries, please refer to BinaryAI: log in, create an API key, and use the binaryai-sdk to generate the embeddings.

The directory structure is as follows:

.
├── data    # Program embeddings for the binaries
└── script  # Script for evaluating the similarity between binaries

Note: We also provide the long version of the KEENHash paper (KEENHash_Long_Version.pdf).

Environment

Use the following conda commands to install the required packages and activate the environment:

conda env create -f environment.yaml
conda activate keenhash

Evaluate Similarity between Binary Programs

You can use the following command to evaluate the similarity between two binaries:

python script/evaluate_similarity.py --query <path to the query program embedding> --repo <path to the repo program embedding> --embedding <KEENHash method> [--lsh <number of hash functions>]

For example:

python script/evaluate_similarity.py --query data/coreutils-5.93-O0_basename.json --repo data/coreutils-5.93-O3_ln.json --embedding stru

To list all available options, use the following command:

python script/evaluate_similarity.py -h
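
If you want to compare many pairs of embeddings rather than one, the command above can be wrapped in a small driver. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `compare_all` are hypothetical, the glob patterns are guesses based on the example file names above, and the script's output is captured as raw text because its format is not described here. Only the `--query`, `--repo`, `--embedding`, and `--lsh` flags shown above are used.

```python
import subprocess
import sys
from itertools import product
from pathlib import Path

SCRIPT = "script/evaluate_similarity.py"


def build_command(query, repo, embedding="stru", lsh=None):
    """Assemble the argument list for one run of evaluate_similarity.py.

    Hypothetical helper: it only mirrors the flags documented above.
    """
    cmd = [
        sys.executable, SCRIPT,
        "--query", str(query),
        "--repo", str(repo),
        "--embedding", embedding,
    ]
    if lsh is not None:
        # Optional number of hash functions, passed through unchanged.
        cmd += ["--lsh", str(lsh)]
    return cmd


def compare_all(queries, repos, embedding="stru", lsh=None):
    """Run the script for every (query, repo) pair and collect its stdout."""
    results = {}
    for query, repo in product(queries, repos):
        proc = subprocess.run(
            build_command(query, repo, embedding, lsh),
            capture_output=True, text=True, check=True,
        )
        # The output format is not assumed; keep it as plain text.
        results[(str(query), str(repo))] = proc.stdout.strip()
    return results


if __name__ == "__main__":
    data = Path("data")
    # Assumed naming scheme, inferred from the example file names above.
    queries = sorted(data.glob("coreutils-5.93-O0_*.json"))[:3]
    repos = sorted(data.glob("coreutils-5.93-O3_*.json"))[:3]
    for pair, output in compare_all(queries, repos).items():
        print(pair, output)
```

Run it from the repository root with the keenhash environment active so that the relative paths and dependencies resolve. The slices `[:3]` only keep the number of runs small; remove them to cover every pair.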
