Classification of WebAssembly Binaries Using Machine Learning

This repository contains the code for my Master's thesis at KTH Royal Institute of Technology, exploring machine learning approaches for classifying vulnerabilities in WebAssembly binaries.

Read the full thesis on DiVA

Overview

WebAssembly (Wasm) is increasingly used outside the browser, in server-side applications and services. Although Wasm runs code inside a sandbox, vulnerabilities inherited from source languages such as C and C++ (for example, buffer overflows) can still be exploited within that sandbox. This project develops ML-based classifiers that automatically detect vulnerability patterns in Wasm binaries.

The dataset is based on the Juliet Test Suite, a collection of C/C++ test cases organized by CWE vulnerability class, compiled to WebAssembly.

Approaches

ROCKET-inspired (Primary Approach)

Located in code/models/ROCKET-inspired/

Adapts ROCKET (RandOm Convolutional KErnel Transform), a time-series classification method, to binary analysis. Random convolutional kernels extract features from sequences of tokenized Wasm instructions, and a linear classifier is trained on those features.
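A minimal NumPy sketch of the ROCKET transform, to illustrate the idea. It assumes each binary has already been tokenized, embedded (for example with Word2Vec vectors), and padded to a fixed-length `(length, channels)` array. All function names are hypothetical. Applying every kernel across all embedding channels is one possible multivariate adaptation of the univariate original and is not necessarily what the thesis does. The closed-form ridge classifier stands in for scikit-learn's `RidgeClassifierCV`, which the original ROCKET paper uses.

```python
import numpy as np


def generate_kernels(num_kernels, input_length, num_channels, rng):
    """Sample random dilated kernels following the ROCKET recipe."""
    kernels = []
    for _ in range(num_kernels):
        length = int(rng.choice([7, 9, 11]))
        # Weights ~ N(0, 1), mean-centred per channel
        weights = rng.normal(0.0, 1.0, size=(length, num_channels))
        weights -= weights.mean(axis=0)
        bias = float(rng.uniform(-1.0, 1.0))
        # Dilation 2^x with x ~ U(0, A), so the dilated kernel fits inside the input
        max_exp = max(0.0, np.log2((input_length - 1) / (length - 1)))
        dilation = int(2 ** rng.uniform(0.0, max_exp))
        # Zero padding is applied to roughly half of the kernels
        padding = ((length - 1) * dilation) // 2 if rng.integers(2) == 1 else 0
        kernels.append((weights, bias, dilation, padding))
    return kernels


def apply_kernel(x, weights, bias, dilation, padding):
    """Dilated convolution of one (length, channels) sequence -> (PPV, max)."""
    k = weights.shape[0]
    if padding:
        x = np.pad(x, ((padding, padding), (0, 0)))
    n_out = x.shape[0] - (k - 1) * dilation
    if n_out <= 0:
        # Sequence shorter than the dilated kernel: no valid output positions
        return 0.0, 0.0
    out = np.full(n_out, bias)
    for i in range(k):
        # Tap i sees the input shifted by i * dilation positions
        out += x[i * dilation : i * dilation + n_out] @ weights[i]
    return float(np.mean(out > 0)), float(out.max())


def rocket_transform(X, kernels):
    """(n_samples, length, channels) -> (n_samples, 2 * n_kernels) feature matrix."""
    features = np.empty((len(X), 2 * len(kernels)))
    for n, x in enumerate(X):
        for j, (w, b, d, p) in enumerate(kernels):
            features[n, 2 * j : 2 * j + 2] = apply_kernel(x, w, b, d, p)
    return features


def fit_ridge(F, y, alpha=1.0):
    """Closed-form ridge regression on +/-1 targets, used as a binary linear classifier."""
    mu, sd = F.mean(axis=0), F.std(axis=0) + 1e-8
    Z = np.hstack([(F - mu) / sd, np.ones((len(F), 1))])
    targets = np.where(y == 1, 1.0, -1.0)
    reg = alpha * np.eye(Z.shape[1])
    reg[-1, -1] = 0.0  # leave the intercept unpenalised
    w = np.linalg.solve(Z.T @ Z + reg, Z.T @ targets)
    return mu, sd, w


def predict_ridge(model, F):
    """Predict 0/1 labels from ROCKET features with a fitted ridge model."""
    mu, sd, w = model
    Z = np.hstack([(F - mu) / sd, np.ones((len(F), 1))])
    return (Z @ w > 0).astype(int)
```

As in ROCKET, each kernel contributes two features: the proportion of positive values (PPV) and the maximum. `num_kernels` kernels therefore yield `2 * num_kernels` features. The kernels are never trained, so only the linear classifier is fitted, which keeps training cheap.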

Graph-based (GCN)

Located in code/models/graph-based/

Uses Graph Convolutional Networks on control-flow graphs extracted from Wasm binaries. Includes experimental variants:

dual-layered/ - Hierarchical graph approach
MLP-based/ - MLP aggregation variant
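The core GCN propagation rule (Kipf and Welling), H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), can be sketched in NumPy as below. Mean pooling then turns the node states into one vector per control-flow graph. This is a simplified sketch under stated assumptions, not the repository's code. The node features (for example, averaged instruction embeddings per basic block), the undirected treatment of CFG edges, and the function names are all hypothetical. The weights are random rather than trained, and the pooled vector would normally feed a classifier head. In torch-geometric, `GCNConv` implements essentially this normalised propagation (plus a learnable bias), and `global_mean_pool` implements the readout.

```python
import numpy as np


def normalized_adjacency(edges, num_nodes):
    """D^-1/2 (A + I) D^-1/2 for a control-flow graph given as (src, dst) block pairs."""
    A = np.zeros((num_nodes, num_nodes))
    for src, dst in edges:
        # Treat control-flow edges as undirected so information flows both ways
        A[src, dst] = 1.0
        A[dst, src] = 1.0
    A_hat = A + np.eye(num_nodes)  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A_norm, H, W):
    """One graph convolution: aggregate neighbour features, project, apply ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)


def graph_embedding(edges, node_features, weights):
    """Stack GCN layers, then mean-pool node states into one vector per graph."""
    A_norm = normalized_adjacency(edges, node_features.shape[0])
    H = node_features
    for W in weights:
        H = gcn_layer(A_norm, H, W)
    return H.mean(axis=0)
```

The symmetric normalisation keeps the scale of aggregated features roughly independent of node degree. This matters for CFGs, where a dispatch block can have many more successors than an ordinary basic block.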

Project Structure

code/
├── embedding/          # Word2Vec embedding for Wasm instructions
├── preprocessing/      # Data preprocessing and tokenization
└── models/
    ├── ROCKET-inspired/    # ROCKET-based classification
    └── graph-based/        # GCN-based classification
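The preprocessing/ and embedding/ stages are only named above. As a rough illustration, the sketch below tokenizes flat WebAssembly text format (WAT, as emitted by tools such as wasm2wat) into instruction tokens and maps them to integer ids. The WAT input format, the operand normalisation, the `<pad>`/`<unk>` vocabulary scheme, and all function names are assumptions, not the thesis's pipeline.

```python
import re
from collections import Counter

# Inline block comments such as "(;@1;)" that wasm2wat appends to branch instructions
BLOCK_COMMENT = re.compile(r"\(;.*?;\)")
NUMBER = re.compile(r"[-+]?(?:0x[0-9a-fA-F]+|\d+(?:\.\d*)?)")


def tokenize_wat(wat_text, keep_operands=False):
    """Turn flat WAT function bodies into a list of instruction tokens."""
    tokens = []
    for raw in wat_text.splitlines():
        # Drop ";;" line comments and inline "(; ... ;)" comments
        line = BLOCK_COMMENT.sub("", raw.split(";;", 1)[0]).strip()
        # Skip blank lines and S-expression headers such as "(func ...)" or "(local i32)"
        if not line or line.startswith("("):
            continue
        # The last instruction of a function carries the closing parenthesis
        line = line.rstrip(")").strip()
        if not line:
            continue
        opcode, *operands = line.split()
        if keep_operands and operands:
            # Replace numeric immediates with a placeholder to keep the vocabulary small
            operands = ["<num>" if NUMBER.fullmatch(op) else op for op in operands]
            tokens.append("_".join([opcode, *operands]))
        else:
            tokens.append(opcode)
    return tokens


def build_vocab(token_lists, min_count=1):
    """Map each token seen at least min_count times to an id; 0 = padding, 1 = unknown."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, count in sorted(counts.items()):
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, length):
    """Convert tokens to ids, truncating or padding to a fixed length."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens][:length]
    return ids + [vocab["<pad>"]] * (length - len(ids))
```

Token lists produced this way could be passed as sentences to Gensim's `Word2Vec`, for example `Word2Vec(sentences=token_lists, vector_size=64, window=5, min_count=1)`, yielding one vector per instruction token. The parameter values there are placeholders.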

Requirements

Python 3.12+
PyTorch
torch-geometric (for GCN models)
NumPy, Pandas, Matplotlib, Seaborn
scikit-learn
Gensim (for Word2Vec)
Numba
tqdm

Additional Resources

RESEARCH_NOTES.md - Literature review and research notes
project-description.md - Initial project description
images/ - Conceptual diagrams

Author

Wilhelm Jansson

License

This project was developed as part of a Master's thesis at KTH Royal Institute of Technology.
