
WilleMahMille/Master-thesis


Classification of WebAssembly Binaries Using Machine Learning

This repository contains the code for my Master's thesis at KTH Royal Institute of Technology, exploring machine learning approaches for classifying vulnerabilities in WebAssembly binaries.

Read the full thesis on DiVA

Overview

WebAssembly (Wasm) is increasingly used beyond the browser, in server-side applications and services. Although Wasm provides sandboxing, vulnerabilities originating in source languages such as C and C++ can still be exploited within the sandbox. This project develops ML-based classifiers that automatically detect vulnerability patterns in Wasm binaries.

The dataset is built from the Juliet Test Suite (C/C++ test cases organized by CWE category), compiled to WebAssembly.

Approaches

ROCKET-inspired (Primary Approach)

Located in code/models/ROCKET-inspired/

Adapts the ROCKET (RandOm Convolutional KErnel Transform) time-series classification method to binary analysis. Random convolutional kernels extract features from sequences of tokenized Wasm instructions, and the resulting features are classified with a linear classifier.
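For readers unfamiliar with ROCKET, here is a minimal NumPy sketch of the standard transform: random dilated kernels, two features per kernel (proportion of positive values and maximum), and a ridge classifier on top. It assumes each binary has already been mapped to a fixed-length 1D numeric sequence; how the thesis adapts the kernels to tokenized Wasm instructions is not shown here. The names `generate_kernels`, `apply_kernel` and `transform`, and the synthetic data, are hypothetical.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV


def generate_kernels(input_length, num_kernels, rng):
    """Sample random kernels following the original ROCKET recipe (inputs must be >= 11 long)."""
    kernels = []
    for _ in range(num_kernels):
        length = int(rng.choice([7, 9, 11]))
        weights = rng.normal(0.0, 1.0, length)
        weights -= weights.mean()  # mean-centred weights
        bias = rng.uniform(-1.0, 1.0)
        # Dilation sampled on an exponential scale so the kernel's span fits the input
        max_exp = max(0.0, np.log2((input_length - 1) / (length - 1)))
        dilation = int(2 ** rng.uniform(0.0, max_exp))
        # Roughly half of the kernels are padded so their output keeps the input's length
        padding = ((length - 1) * dilation) // 2 if rng.integers(2) else 0
        kernels.append((weights, bias, dilation, padding))
    return kernels


def apply_kernel(x, weights, bias, dilation, padding):
    """Dilated convolution of one kernel over x; returns (PPV, max)."""
    x = np.pad(x, padding)
    span = (len(weights) - 1) * dilation
    # Row i holds the dilated window starting at position i
    idx = np.arange(len(x) - span)[:, None] + np.arange(len(weights))[None, :] * dilation
    conv = x[idx] @ weights + bias
    # PPV: proportion of positive values; max: strongest response
    return (conv > 0).mean(), conv.max()


def transform(X, kernels):
    """Map each sequence to 2 * len(kernels) features."""
    features = np.zeros((len(X), 2 * len(kernels)))
    for i, x in enumerate(X):
        for k, (w, b, d, p) in enumerate(kernels):
            features[i, 2 * k : 2 * k + 2] = apply_kernel(x, w, b, d, p)
    return features


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 64))  # 20 synthetic sequences of length 64
y = np.arange(20) % 2  # synthetic binary labels
kernels = generate_kernels(64, 100, rng)
F = transform(X, kernels)
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)).fit(F, y)
```

Because the kernels are never trained, the only learned component is the linear classifier, which is what makes ROCKET cheap to fit even with thousands of kernels.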

Graph-based (GCN)

Located in code/models/graph-based/

Uses Graph Convolutional Networks on control-flow graphs extracted from Wasm binaries. Includes experimental variants:

  • dual-layered/ - Hierarchical graph approach
  • MLP-based/ - MLP aggregation variant
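To illustrate what a GCN does with a control-flow graph, the sketch below applies the propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) over basic blocks and mean-pools the result into one graph-level prediction. It is an untrained NumPy forward pass for illustration only; the repository itself uses torch-geometric. The random node features (standing in for per-block instruction embeddings), the undirected treatment of edges, and the names `normalized_adjacency` and `gcn_graph_logits` are assumptions.

```python
import numpy as np


def normalized_adjacency(edges, num_nodes):
    """Symmetrically normalised adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    A = np.zeros((num_nodes, num_nodes))
    for src, dst in edges:
        # Assumption: control-flow edges are treated as undirected
        A[src, dst] = 1.0
        A[dst, src] = 1.0
    A += np.eye(num_nodes)  # self-loops keep each block's own features
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def gcn_graph_logits(X, edges, layer_weights, w_out):
    """Untrained forward pass: stacked GCN layers, mean pooling, linear head."""
    A_hat = normalized_adjacency(edges, X.shape[0])
    H = X
    for W in layer_weights:
        H = np.maximum(A_hat @ H @ W, 0.0)  # aggregate neighbours, transform, ReLU
    graph_vec = H.mean(axis=0)  # mean-pool basic-block embeddings into one vector
    return graph_vec @ w_out  # one logit per class


rng = np.random.default_rng(0)
# Toy CFG: block 0 branches to blocks 1 and 2, which both jump to block 3
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
X = rng.normal(size=(4, 16))  # stand-in node features, one row per basic block
layer_weights = [rng.normal(size=(16, 32)), rng.normal(size=(32, 32))]
w_out = rng.normal(size=(32, 2))
A_hat = normalized_adjacency(edges, 4)
logits = gcn_graph_logits(X, edges, layer_weights, w_out)
```

Mean pooling makes the graph vector independent of node ordering, so the same binary yields the same prediction however its basic blocks are numbered.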

Project Structure

code/
├── embedding/          # Word2Vec embedding for Wasm instructions
├── preprocessing/      # Data preprocessing and tokenization
└── models/
    ├── ROCKET-inspired/    # ROCKET-based classification
    └── graph-based/        # GCN-based classification
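The preprocessing step turns each binary into instruction tokens that the Word2Vec embedding can consume. The exact tokenization scheme used in the thesis is not described here, so the following is a hypothetical sketch: it assumes flat (unfolded) text-format output such as `wasm2wat` produces by default, and keeps only opcodes, dropping immediates and comments. `tokenize_function_body` is an illustrative name.

```python
def tokenize_function_body(wat_body: str) -> list[str]:
    """Extract opcode tokens from flat WAT text (immediates and comments dropped)."""
    tokens = []
    for line in wat_body.splitlines():
        line = line.split(";;", 1)[0].strip()  # drop line comments
        if not line or line.startswith(("(", ")")):
            continue  # skip blank lines and S-expression headers such as (func ...) or (local ...)
        # The first word is the opcode; strip closing parens that end the enclosing func
        opcode = line.split()[0].rstrip(")")
        if opcode:
            tokens.append(opcode)
    return tokens


# Example function in wasm2wat's default (unfolded) style
wat = """
  (func $f (type 0) (param i32) (result i32)
    local.get 0  ;; first parameter
    i32.const 1
    i32.add)
"""
tokens = tokenize_function_body(wat)
print(tokens)  # ['local.get', 'i32.const', 'i32.add']
```

Lists of tokens like this can be passed as the `sentences` argument of gensim's `Word2Vec` to learn one vector per opcode.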

Requirements

  • Python 3.12+
  • PyTorch
  • torch-geometric (for GCN models)
  • NumPy, Pandas, Matplotlib, Seaborn
  • scikit-learn
  • Gensim (for Word2Vec)
  • Numba
  • tqdm

Additional Resources

  • RESEARCH_NOTES.md - Literature review and research notes
  • project-description.md - Initial project description
  • images/ - Conceptual diagrams

Author

Wilhelm Jansson

License

This project was developed as part of a Master's thesis at KTH Royal Institute of Technology.
