This repository contains the code for my Master's thesis at KTH Royal Institute of Technology, exploring machine learning approaches for classifying vulnerabilities in WebAssembly binaries.
WebAssembly (Wasm) is increasingly used beyond the browser in server-side applications and services. While Wasm provides sandboxing, vulnerabilities from source languages like C/C++ can still be exploited within the sandbox. This project develops ML-based classifiers to automatically detect vulnerability patterns in Wasm binaries.
The dataset used is based on the Juliet Test Suite, compiled to WebAssembly.
Located in code/models/ROCKET-inspired/
Adapts the ROCKET (RandOm Convolutional KErnel Transform) time-series classification method for binary analysis. Random convolutional kernels extract features from tokenized Wasm instructions, which are then classified using a linear classifier.
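The idea can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. It assumes each binary has already been reduced to a fixed-length 1-D numeric sequence (for example token IDs), and it omits the random padding used by the original ROCKET method. The function names `make_kernels`, `apply_kernel`, and `transform` are hypothetical.

```python
import numpy as np

def make_kernels(num_kernels, seq_len, rng):
    """Sample random 1-D kernels (weights, bias, dilation), loosely following ROCKET."""
    kernels = []
    for _ in range(num_kernels):
        length = int(rng.choice([7, 9, 11]))
        weights = rng.normal(size=length)
        weights -= weights.mean()  # zero-centre the weights
        bias = rng.uniform(-1.0, 1.0)
        # Sample dilation on a log scale so kernels cover short and long ranges.
        max_exp = max(0.0, np.log2((seq_len - 1) / (length - 1)))
        dilation = max(1, int(2 ** rng.uniform(0.0, max_exp)))
        kernels.append((weights, bias, dilation))
    return kernels

def apply_kernel(x, weights, bias, dilation):
    """Dilated convolution without padding; returns (max, proportion of positive values)."""
    span = (len(weights) - 1) * dilation
    n_out = len(x) - span
    if n_out <= 0:  # sequence too short for this kernel
        return 0.0, 0.0
    out = np.full(n_out, bias, dtype=float)
    for j, w in enumerate(weights):
        out += w * x[j * dilation : j * dilation + n_out]
    return float(out.max()), float((out > 0).mean())

def transform(sequences, kernels):
    """Map each sequence to 2 features per kernel: max and PPV."""
    feats = np.zeros((len(sequences), 2 * len(kernels)))
    for i, seq in enumerate(sequences):
        x = np.asarray(seq, dtype=float)
        for k, (w, b, d) in enumerate(kernels):
            feats[i, 2 * k], feats[i, 2 * k + 1] = apply_kernel(x, w, b, d)
    return feats

# Toy data (assumption): 4 "binaries", each a sequence of 64 token IDs.
rng = np.random.default_rng(0)
token_ids = rng.integers(0, 50, size=(4, 64))
kernels = make_kernels(100, seq_len=64, rng=rng)
features = transform(token_ids, kernels)
print(features.shape)  # (4, 200)
```

The resulting feature matrix could then be passed to a linear model such as scikit-learn's `RidgeClassifierCV`, which is the classifier commonly paired with ROCKET features.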
Located in code/models/graph-based/
Uses Graph Convolutional Networks on control-flow graphs extracted from Wasm binaries. Includes experimental variants:
- dual-layered/: Hierarchical graph approach
- MLP-based/: MLP aggregation variant
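For readers unfamiliar with graph convolutions, the sketch below shows the standard GCN propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), followed by mean pooling to get one prediction per graph. It is an illustration in plain NumPy with random weights, not the repository's model, which depends on torch-geometric. The toy graph, the feature sizes, and the function names are assumptions, and the control-flow graph is symmetrized here for simplicity.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the symmetric normalization used by GCNs."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    deg = a_hat.sum(axis=1)             # >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, h, w):
    """One graph convolution: aggregate neighbour features, project, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)

def graph_logits(adj, x, w1, w2, w_out):
    """Two GCN layers, mean pooling over nodes, then a linear read-out."""
    a_norm = normalize_adjacency(adj)
    h = gcn_layer(a_norm, x, w1)
    h = gcn_layer(a_norm, h, w2)
    pooled = h.mean(axis=0)  # one vector for the whole graph
    return pooled @ w_out

# Toy CFG (assumption): 4 basic blocks with edges 0->1, 0->2, 1->3, 2->3.
adj = np.zeros((4, 4))
for src, dst in [(0, 1), (0, 2), (1, 3), (2, 3)]:
    adj[src, dst] = 1.0
adj = np.maximum(adj, adj.T)  # treat edges as undirected

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 8 features per basic block (hypothetical)
w1 = rng.normal(scale=0.1, size=(8, 16))
w2 = rng.normal(scale=0.1, size=(16, 16))
w_out = rng.normal(scale=0.1, size=(16, 2))  # 2 classes (hypothetical)

logits = graph_logits(adj, x, w1, w2, w_out)
print(logits.shape)  # (2,)
```

In a PyTorch setting, the same layer is available as `GCNConv` in torch-geometric, with the weights learned rather than sampled.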
code/
├── embedding/ # Word2Vec embedding for Wasm instructions
├── preprocessing/ # Data preprocessing and tokenization
└── models/
    ├── ROCKET-inspired/ # ROCKET-based classification
    └── graph-based/ # GCN-based classification
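The tree above only names the tokenization step, so the following is a hedged sketch of what such a step could look like, not the code in preprocessing/. It assumes the input is a list of disassembled instructions in WebAssembly text format, one per line, and keeps only the opcode mnemonic of each. The helper names and the `<pad>`/`<unk>` tokens are hypothetical.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"

def opcode_tokens(instruction_lines):
    """Keep the opcode mnemonic of each instruction line, dropping operands and comments."""
    tokens = []
    for line in instruction_lines:
        line = line.split(";;", 1)[0].strip()  # strip WAT line comments
        if line:
            tokens.append(line.split()[0])
    return tokens

def build_vocab(token_lists, min_count=1):
    """Assign an integer ID to every token seen at least min_count times."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {PAD: 0, UNK: 1}
    for tok, n in sorted(counts.items()):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab, max_len):
    """Map tokens to IDs, truncating or padding to exactly max_len entries."""
    ids = [vocab.get(t, vocab[UNK]) for t in tokens[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))

# Hypothetical instruction listing for one function body.
lines = ["local.get 0", "local.get 1", "i32.add  ;; sum", "i32.store offset=4", "end"]
tokens = opcode_tokens(lines)
vocab = build_vocab([tokens])
print(encode(tokens, vocab, max_len=8))  # [5, 5, 3, 4, 2, 0, 0, 0]
```

Token lists of this shape (one list of strings per function or binary) are also the input format Gensim's `Word2Vec` accepts, while the integer encoding suits models that need fixed-length numeric input.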
- Python 3.12+
- PyTorch
- torch-geometric (for GCN models)
- NumPy, Pandas, Matplotlib, Seaborn
- scikit-learn
- Gensim (for Word2Vec)
- Numba
- tqdm
- RESEARCH_NOTES.md: Literature review and research notes
- project-description.md: Initial project description
- images/: Conceptual diagrams
Wilhelm Jansson
This project was developed as part of a Master's thesis at KTH Royal Institute of Technology.