Commit 82bc979

Merge pull request #9 from topcue/opti
Add compiler versions and optimization level option
2 parents e2859dd + e7c4c8c commit 82bc979

2 files changed

Lines changed: 113 additions & 23 deletions

README.md

Lines changed: 75 additions & 23 deletions
@@ -1,9 +1,31 @@
1-
# Description
1+
# BinKit 2.0
2+
23
BinKit is a binary code similarity analysis (BCSA) benchmark. BinKit provides
34
scripts for building a cross-compiling environment, as well as the compiled
4-
dataset. The original dataset includes 1,352 distinct combinations of compiler
5-
options of 8 architectures, 5 optimization levels, and 13 compilers. We
6-
currently tested this code in Ubuntu 16.04.
5+
dataset. The current dataset includes 1,904 distinct combinations of compiler
6+
options of 8 architectures, 6 optimization levels, and 23 compilers. It includes
7+
371,928 binaries.
8+
9+
The main improvements of BinKit 2.0 over the paper version (BinKit 1.0) are
10+
additional support for newer compiler versions under the major compilation
11+
options, and support for the Ofast
12+
optimization level.
13+
14+
In particular, BinKit now includes GCC and Clang versions up to 11 and 13,
15+
respectively. Currently, a total of 6 optimization options (O0, O1, O2, O3, Os,
16+
Ofast) are supported. See the [Currently supported compile
17+
options](https://github.com/SoftSec-KAIST/BinKit#currently-supported-compile-options)
18+
section below for more detailed options.
19+
20+
In the BinKit 2.0 dataset, the gsl package is missing 8 binaries for the Ofast option due
21+
to compiler bugs. See the [Missing binaries](https://github.com/SoftSec-KAIST/BinKit#Missing-binaries)
22+
part of the [Issues](https://github.com/topcue/tmp#issues) section for more
23+
information.
24+
25+
## BinKit 1.0 (paper version)
26+
The original dataset includes 1,352 distinct combinations of compiler options of
27+
8 architectures, 5 optimization levels, and 13 compilers. It includes 243,128
28+
binaries. We tested this code in Ubuntu 16.04.
729

830
For more details, please check [our
931
paper](https://0xdkay.me/pub/2020/kim-arxiv2020.pdf).
@@ -19,7 +41,13 @@ You can download our dataset and toolchain as below. The link will be changed to
1941
[//]: # (Cloning this repository also downloads below pre-compiled dataset and toolchain
2042
with `git-lfs`. Please use `GIT_LFS_SKIP_SMUDGE=1` to skip the download.)
2143

22-
### Dataset
44+
### Dataset (latest version)
45+
46+
- [BinKit 2.0 dataset](https://drive.google.com/file/d/1TrjFnv6BMpVEXYukVxrhlQ78S0NPKEXa/view?usp=share_link)
47+
48+
### Dataset (old)
49+
The datasets below are for reproducing the results of the paper.
50+
2351
- [Normal dataset](https://drive.google.com/file/d/1K9ef-OoRBr0X5u8g2mlnYqh9o1i6zFij/view?usp=sharing)
2452
- [SizeOpt dataset](https://drive.google.com/file/d/1QgwbEfd8vdzg5glNZFL7dg4l4hrkoWO3/view?usp=sharing)
2553
- [Noinline dataset](https://drive.google.com/file/d/1wt7GY-DDp8J_2zeBBVUrcfWIyerg_xLO/view?usp=sharing)
@@ -63,23 +91,35 @@ Below data is only used for our evaluation.
6391
- O2
6492
- O3
6593
- Os
94+
- Ofast
6695

6796
### Compilers
68-
- gcc-4.9.4
69-
- gcc-5.5.0
70-
- gcc-6.4.0
71-
- gcc-7.3.0
72-
- gcc-8.2.0
73-
- clang-4.0
74-
- clang-5.0
75-
- clang-6.0
76-
- clang-7.0
77-
- clang-8.0
78-
- clang-9.0
79-
- clang-obfus-fla (Obfuscator-LLVM - FLA)
80-
- clang-obfus-sub (Obfuscator-LLVM - SUB)
81-
- clang-obfus-bcf (Obfuscator-LLVM - BCF)
82-
- clang-obfus-all (Obfuscator-LLVM - FLA + SUB + BCF)
97+
- gcc
98+
- gcc-4.9.4
99+
- gcc-5.5.0
100+
- gcc-6.4.0
101+
- gcc-6.5.0
102+
- gcc-7.3.0
103+
- gcc-8.2.0
104+
- gcc-9.4.0
105+
- gcc-10.3.0
106+
- gcc-11.2.0
107+
- clang
108+
- clang-4.0.0
109+
- clang-5.0.2
110+
- clang-6.0.1
111+
- clang-7.0.1
112+
- clang-8.0.0
113+
- clang-9.0.1
114+
- clang-10.0.1
115+
- clang-11.0.1
116+
- clang-12.0.1
117+
- clang-13.0.0
118+
- clang-obfus
119+
- clang-obfus-fla (Obfuscator-LLVM - FLA)
120+
- clang-obfus-sub (Obfuscator-LLVM - SUB)
121+
- clang-obfus-bcf (Obfuscator-LLVM - BCF)
122+
- clang-obfus-all (Obfuscator-LLVM - FLA + SUB + BCF)
83123

84124
# How to use
85125
### 1. Configure the environment in `scripts/env.sh`
@@ -126,7 +166,7 @@ You can download the source code of GNU packages of your interest as below.
126166
- You must give *ABSOLUTE PATH* for `--base_dir`.
127167

128168
```bash
129-
$ source scripts/env
169+
$ source scripts/env.sh
130170
$ python gnu_compile_script.py \
131171
--base_dir "/home/dongkwan/binkit/dataset/gnu" \
132172
--num_jobs 8 \
@@ -137,7 +177,7 @@ $ python gnu_compile_script.py \
137177
You can compile only the packages or compiler options of your interest as below.
138178

139179
```bash
140-
$ source scripts/env
180+
$ source scripts/env.sh
141181
$ python gnu_compile_script.py \
142182
--base_dir "/home/dongkwan/binkit/dataset/gnu" \
143183
--num_jobs 8 \
@@ -148,7 +188,7 @@ $ python gnu_compile_script.py \
148188
You can check the compiled binaries as below.
149189

150190
```bash
151-
$ source scripts/env
191+
$ source scripts/env.sh
152192
$ python compile_checker.py \
153193
--base_dir "/home/dongkwan/binkit/dataset/gnu" \
154194
--num_jobs 8 \
@@ -194,6 +234,16 @@ $ python gnu_compile_script.py \
194234
If compilation fails, you may have to adjust the number of jobs for parallel
195235
processing in step 1, which is machine-dependent.
196236

237+
### Missing binaries
238+
239+
In the BinKit 2.0 dataset, the gsl package is missing 8 binaries for the Ofast option
240+
due to compiler bugs. Clang-8 and clang-9 hang when compiling
241+
the gsl package for 32-bit ARM with the Ofast option. We reported this issue to
242+
bug-gsl and llvm-project, respectively. However, bug-gsl did not reply, and the
243+
llvm-project replied that these versions are not currently supported. The bug
244+
reports are available at the following links:
245+
[bug-gsl](https://lists.gnu.org/archive/html/bug-gsl/2023-02/msg00000.html),
246+
[llvm-project](https://github.com/llvm/llvm-project/issues/60692)
197247

198248
# Authors
199249
This project has been conducted by the below authors at KAIST.
@@ -218,3 +268,5 @@ paper](https://ieeexplore.ieee.org/document/9813408) when using BinKit.
218268
doi={10.1109/TSE.2022.3187689}
219269
}
220270
```
271+
272+

config/extened_opti.yml

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
1+
opti:
2+
- O0
3+
- O1
4+
- O2
5+
- O3
6+
- Os
7+
- Ofast
8+
9+
arch:
10+
- x86_32
11+
- x86_64
12+
- arm_32
13+
- arm_64
14+
- mips_32
15+
- mips_64
16+
- mipseb_32
17+
- mipseb_64
18+
19+
compiler:
20+
- gcc-4.9.4
21+
- gcc-5.5.0
22+
- gcc-6.5.0
23+
- gcc-7.3.0
24+
- gcc-8.2.0
25+
- gcc-9.4.0
26+
- gcc-10.3.0
27+
- gcc-11.2.0
28+
- clang-4.0
29+
- clang-5.0
30+
- clang-6.0
31+
- clang-7.0
32+
- clang-8.0
33+
- clang-9.0
34+
- clang-10.0
35+
- clang-11.0
36+
- clang-12.0
37+
- clang-13.0
38+
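As a rough illustration of what this config describes, the sketch below enumerates the cross product of the three lists above. The lists are copied by hand from `config/extened_opti.yml` rather than parsed, and the `CONFIG` dictionary and `enumerate_combinations` helper are hypothetical names used only for this example; they are not part of the BinKit scripts. The resulting count covers only the three axes listed in this file, so it does not by itself reproduce the 1,904 figure quoted in the README.

```python
from itertools import product

# Values copied by hand from config/extened_opti.yml in this commit.
# CONFIG and enumerate_combinations are illustrative names, not BinKit APIs.
CONFIG = {
    "opti": ["O0", "O1", "O2", "O3", "Os", "Ofast"],
    "arch": [
        "x86_32", "x86_64", "arm_32", "arm_64",
        "mips_32", "mips_64", "mipseb_32", "mipseb_64",
    ],
    "compiler": [
        "gcc-4.9.4", "gcc-5.5.0", "gcc-6.5.0", "gcc-7.3.0",
        "gcc-8.2.0", "gcc-9.4.0", "gcc-10.3.0", "gcc-11.2.0",
        "clang-4.0", "clang-5.0", "clang-6.0", "clang-7.0", "clang-8.0",
        "clang-9.0", "clang-10.0", "clang-11.0", "clang-12.0", "clang-13.0",
    ],
}


def enumerate_combinations(config):
    """Return every (compiler, arch, opti) triple described by the config."""
    return list(product(config["compiler"], config["arch"], config["opti"]))


combos = enumerate_combinations(CONFIG)
# 18 compilers * 8 architectures * 6 optimization levels
print(len(combos))
```

Each element of `combos` is a plain tuple such as `("gcc-11.2.0", "arm_32", "Ofast")`, which makes it easy to filter by compiler, architecture, or optimization level.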
