Repository for the course project of "Algorithms for Massive Datasets".
The aim of this project is to implement, from scratch, a system that finds frequent itemsets in a movie database.
In particular, the chosen algorithm is the parallel version of the SON algorithm, combined with A-priori and implemented with Apache Spark.
The dataset used for this analysis is Letterboxd (available on Kaggle), a collection of movies and TV shows obtained from the website letterboxd.com.
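
To illustrate the idea behind SON with A-priori, here is a minimal pure-Python sketch. It is not the code in this repository: the function names `apriori` and `son`, the toy baskets, and the support value are all assumptions made for illustration. Each inner list stands in for one chunk of the data; in a Spark version, a chunk would typically correspond to one RDD partition (for example, processed with `mapPartitions`).

```python
from collections import Counter
from itertools import combinations


def apriori(baskets, min_count):
    """Return the set of itemsets (frozensets) appearing in >= min_count baskets."""
    baskets = [frozenset(b) for b in baskets]
    # Frequent single items.
    item_counts = Counter(item for b in baskets for item in b)
    current = {frozenset([i]) for i, n in item_counts.items() if n >= min_count}
    frequent = set(current)
    k = 2
    while current:
        # Build size-k candidates whose (k-1)-subsets are all frequent.
        candidates = set()
        for a in current:
            for b in current:
                union = a | b
                if len(union) == k and all(
                    frozenset(s) in current for s in combinations(union, k - 1)
                ):
                    candidates.add(union)
        counts = Counter(c for b in baskets for c in candidates if c <= b)
        current = {c for c, n in counts.items() if n >= min_count}
        frequent |= current
        k += 1
    return frequent


def son(chunks, support):
    """Two-pass SON: local A-priori per chunk, then a global count of candidates."""
    total = sum(len(chunk) for chunk in chunks)
    # Pass 1: run A-priori on each chunk with a proportionally scaled threshold.
    candidates = set()
    for chunk in chunks:
        # Integer ceiling of support * len(chunk) / total, at least 1.
        local_min = max(1, -(-support * len(chunk) // total))
        candidates |= apriori(chunk, local_min)
    # Pass 2: count every candidate over the whole dataset, drop false positives.
    counts = Counter(
        c for chunk in chunks for b in chunk for c in candidates if c <= frozenset(b)
    )
    return {c: n for c, n in counts.items() if n >= support}


# Toy data (hypothetical): two chunks of three baskets each.
chunks = [
    [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}],
    [{"b", "c"}, {"a", "b", "c"}, {"b"}],
]
result = son(chunks, support=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The first pass can produce false positives (itemsets frequent in one chunk but not overall) but no false negatives, since an itemset that is frequent globally must reach the scaled threshold in at least one chunk. The second pass removes the false positives by counting over the full dataset.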