Movie Recommendation System

This group project predicts missing entries in the user-movie ratings matrix of the MovieLens dataset. The repository implements and compares several matrix completion algorithms for this task. The codebase is organized into modular components for data preprocessing, matrix completion algorithms, evaluation metrics, and utilities.

Documents

For context and an overview of the work, see the following resources:

Short report presenting our main ideas and results (report.pdf)
Slides for the project presentation (slides.pdf)

Structure

src/
├── matrix_completion_methods/ # PCA, Kernel PCA, Factorization, Baseline
├── preprocessing/ # Data preprocessing utilities
├── metrics/ # RMSE and accuracy computation
├── parsing/ # CLI argument parsing
└── tuning.py # Hyperparameter tuning
sandbox/ # Experimental scripts
docs/ # Report, slides, and figures
notebooks/ # Exploratory analysis
generate.py # Main experiment launcher
requirements.txt

Preprocessing

Minimal preprocessing utilities are provided in src/preprocessing/data_preprocessor.py.

The DataPreprocessor supports splitting/fusing rating matrices, filtering sparse users/items, and per-user (or per-item) normalization/denormalization.
Use the --filter_tables and --normalize arguments in generate.py to enable these steps.
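
As an illustration of per-user normalization and its inverse, here is a minimal NumPy sketch. The function names normalize_per_user and denormalize_per_user, the rows-as-users layout, and the convention that mask is true for observed ratings are assumptions made for this example; the actual DataPreprocessor interface may differ.

```python
import numpy as np

def normalize_per_user(X, mask):
    """Center and scale each user's observed ratings (rows = users, by assumption)."""
    mask = mask.astype(bool)
    counts = np.maximum(mask.sum(axis=1), 1)           # avoid division by zero
    means = np.where(mask, X, 0.0).sum(axis=1) / counts
    centered = np.where(mask, X - means[:, None], 0.0)
    stds = np.sqrt((centered ** 2).sum(axis=1) / counts)
    stds[stds == 0] = 1.0                              # constant rows are only centered
    X_norm = np.where(mask, centered / stds[:, None], 0.0)
    return X_norm, means, stds

def denormalize_per_user(X_norm, means, stds):
    """Map normalized values (including predictions) back to the rating scale."""
    return X_norm * stds[:, None] + means[:, None]

# Toy example: 0 marks a missing rating (an assumption of this sketch).
X = np.array([[5.0, 3.0, 0.0],
              [0.0, 4.0, 4.0]])
mask = X > 0
X_norm, means, stds = normalize_per_user(X, mask)
X_back = denormalize_per_user(X_norm, means, stds)
```

Keeping means and stds around is what makes it possible to denormalize a completed matrix after a method has been fitted on the normalized data.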

Implemented Methods

The following matrix completion algorithms are implemented:

AverageCompletion: Simple baseline method using row/column averages to fill missing values.
MatrixFactorisation: Matrix factorization with a choice of two fitting algorithms (Alternating Least Squares and gradient-based optimization).
IterativePCA: Iterative PCA-based imputation that alternates between estimating missing entries and computing a low-rank PCA reconstruction until convergence.
IterativeKernelPCA: Extension of the previous method, using kernel techniques to try to capture nonlinear behavior in the data.
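
The alternating scheme described for IterativePCA can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the repository's implementation: the function name iterative_pca_impute, the global-mean initialisation, and the relative-change stopping rule are all choices made for this example.

```python
import numpy as np

def iterative_pca_impute(X, mask, k=2, n_iter=30, tol=1e-6):
    """Alternate between a rank-k PCA reconstruction and re-imputing missing entries."""
    mask = mask.astype(bool)
    # Start by filling missing entries with the mean of the observed ratings.
    X_filled = np.where(mask, X, X[mask].mean())
    for _ in range(n_iter):
        # PCA step: center the columns and keep the top-k singular directions.
        col_means = X_filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_filled - col_means, full_matrices=False)
        X_low = (U[:, :k] * s[:k]) @ Vt[:k] + col_means
        # Imputation step: keep observed ratings, replace only the missing ones.
        X_new = np.where(mask, X, X_low)
        change = np.linalg.norm(X_new - X_filled) / max(np.linalg.norm(X_filled), 1e-12)
        X_filled = X_new
        if change < tol:
            break
    return X_filled

# Toy example: a rank-1 matrix with roughly 30% of its entries hidden.
rng = np.random.default_rng(0)
truth = np.outer(rng.uniform(1, 5, size=6), rng.uniform(0.5, 1.0, size=5))
mask = rng.random(truth.shape) < 0.7
X = np.where(mask, truth, 0.0)
completed = iterative_pca_impute(X, mask, k=1)
```

Observed entries are never overwritten; only the missing positions change from one iteration to the next.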

All methods inherit from a shared MatrixCompletionMethod base class with a unified API:

fit(X_train, mask)
complete(X_train, mask)
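
To show how a method might plug into this API, the sketch below defines a stand-in base class and a row-mean baseline. Only the method names fit and complete and their (X_train, mask) arguments come from this README; the return values, the RowMeanCompletion class, and the fallback to the global mean for empty rows are assumptions of this example.

```python
from abc import ABC, abstractmethod
import numpy as np

class MatrixCompletionMethod(ABC):
    """Stand-in for the repository's base class; the real one may differ."""

    @abstractmethod
    def fit(self, X_train, mask):
        ...

    @abstractmethod
    def complete(self, X_train, mask):
        ...

class RowMeanCompletion(MatrixCompletionMethod):
    """Hypothetical baseline: fill missing entries with the row's observed mean."""

    def fit(self, X_train, mask):
        mask = mask.astype(bool)
        counts = mask.sum(axis=1)
        sums = np.where(mask, X_train, 0.0).sum(axis=1)
        global_mean = X_train[mask].mean()
        # Rows with no observed rating fall back to the global mean.
        self.row_means_ = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
        return self

    def complete(self, X_train, mask):
        return np.where(mask.astype(bool), X_train, self.row_means_[:, None])

X_train = np.array([[5.0, 3.0, 0.0],
                    [0.0, 0.0, 0.0],
                    [4.0, 0.0, 2.0]])
mask = X_train > 0
completed = RowMeanCompletion().fit(X_train, mask).complete(X_train, mask)
```

With a shared interface like this, an experiment launcher can swap methods without changing the evaluation code.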

Usage

Install dependencies

pip install -r requirements.txt

Run experiments

Matrix Factorization:

python generate.py \
    --method MatrixFactorisation \
    --fitting_algorithm gd \
    --k 20 \
    --n_iter 50 \
    --lambda_reg 0.1 \
    --mu_reg 0.1 \
    --learning_rate_U 0.005 \
    --learning_rate_I 0.005 \
    --filter_tables \
    --min_ratings_user 10 \
    --min_ratings_movies 10 \
    --test_size 0.2 \
    --normalize True \
    --verbose True
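
To make the flags in the command above more concrete, here is a minimal sketch of regularized matrix factorization fitted by full-batch gradient descent. It is an illustration only: the function factorize_gd is hypothetical, and the mapping of lambda_reg to the user factors, mu_reg to the item factors, and the two learning rates to the user and item updates is an assumption that the README does not confirm.

```python
import numpy as np

def factorize_gd(X, mask, k=2, n_iter=200, lambda_reg=0.1, mu_reg=0.1,
                 learning_rate_U=0.01, learning_rate_I=0.01, seed=0):
    """Approximate X by U @ V.T on observed entries, using gradient descent."""
    mask = mask.astype(float)
    rng = np.random.default_rng(seed)
    n_users, n_items = X.shape
    U = 0.1 * rng.standard_normal((n_users, k))   # user factors
    V = 0.1 * rng.standard_normal((n_items, k))   # item factors
    losses = []
    for _ in range(n_iter):
        R = mask * (U @ V.T - X)                  # residual on observed entries only
        losses.append(0.5 * (R ** 2).sum()
                      + 0.5 * lambda_reg * (U ** 2).sum()
                      + 0.5 * mu_reg * (V ** 2).sum())
        grad_U = R @ V + lambda_reg * U
        grad_V = R.T @ U + mu_reg * V
        U -= learning_rate_U * grad_U             # both gradients use the same iterate
        V -= learning_rate_I * grad_V
    return U, V, losses

# Toy example: 0 marks a missing rating (an assumption of this sketch).
X = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0],
              [0.0, 2.0, 4.0]])
mask = X > 0
U, V, losses = factorize_gd(X, mask)
X_hat = U @ V.T   # predicted ratings for every user-movie pair
```

The rank k bounds how much structure the factors can capture, while the two regularization weights penalize large factor values.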

Iterative PCA:

python generate.py \
    --method IterativePCA \
    --k 20 \
    --n_iter 30 \
    --filter_tables \
    --min_ratings_user 10 \
    --min_ratings_movies 10 \
    --test_size 0.2 \
    --normalize True \
    --verbose True

Iterative Kernel PCA:

python generate.py \
    --method IterativeKernelPCA \
    --k 20 \
    --gamma 0.1 \
    --alpha 0.1 \
    --n_iter 30 \
    --filter_tables \
    --min_ratings_user 10 \
    --min_ratings_movies 10 \
    --test_size 0.2 \
    --normalize True \
    --verbose True

The script trains on ratings_train.npy, saves the completed rating matrix to output.npy in the working directory and prints RMSE / accuracy on the provided ratings_test.npy.
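
For reference, the two reported metrics can be computed on the observed test entries roughly as follows. This is a sketch under assumptions: the helper names are made up for this example, missing test entries are assumed to be marked by an explicit mask, and "accuracy" is assumed to mean exact agreement after rounding predictions to the nearest integer rating, which may not match the definition used in src/metrics/.

```python
import numpy as np

def rmse(X_pred, X_test, mask):
    """Root mean squared error over the observed test entries."""
    mask = mask.astype(bool)
    diff = X_pred[mask] - X_test[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

def rounded_accuracy(X_pred, X_test, mask):
    """Share of observed test entries whose rounded prediction equals the rating."""
    mask = mask.astype(bool)
    return float(np.mean(np.rint(X_pred[mask]) == X_test[mask]))

# In practice X_pred would come from np.load("output.npy") and X_test from
# np.load("ratings_test.npy"); small arrays are used here to stay runnable.
X_test = np.array([[4.0, 0.0],
                   [0.0, 2.0]])
mask = X_test > 0                 # 0 marks a missing rating in this toy example
X_pred = np.array([[3.0, 1.0],
                   [5.0, 2.0]])
score = rmse(X_pred, X_test, mask)
acc = rounded_accuracy(X_pred, X_test, mask)
```

Predictions at positions without a test rating are ignored by both metrics.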

Tuning

The file src/tuning.py is a script for hyperparameter selection and model validation. It provides basic cross-validation and tuning tools such as K-fold splitting and grid search.
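
The idea of K-fold validation combined with grid search on a ratings matrix can be sketched as below. The helpers kfold_masks, grid_search, and constant_fill are hypothetical and are not taken from tuning.py; the sketch assumes that folds partition the observed entries and that a candidate method is any callable returning a completed matrix.

```python
import itertools
import numpy as np

def kfold_masks(mask, n_splits=3, seed=0):
    """Yield (train_mask, val_mask) pairs that partition the observed entries."""
    mask = mask.astype(bool)
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(mask)
    order = rng.permutation(len(rows))
    for fold in np.array_split(order, n_splits):
        val_mask = np.zeros(mask.shape, dtype=bool)
        val_mask[rows[fold], cols[fold]] = True
        yield mask & ~val_mask, val_mask

def grid_search(complete_fn, X, mask, param_grid, n_splits=3):
    """Return (best_params, best_rmse) over the Cartesian product of param_grid."""
    names = list(param_grid)
    best_params, best_rmse = None, np.inf
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for train_mask, val_mask in kfold_masks(mask, n_splits):
            # Hide the validation ratings, complete, then score on them.
            X_hat = complete_fn(np.where(train_mask, X, 0.0), train_mask, **params)
            scores.append(np.sqrt(np.mean((X_hat[val_mask] - X[val_mask]) ** 2)))
        mean_rmse = float(np.mean(scores))
        if mean_rmse < best_rmse:
            best_params, best_rmse = params, mean_rmse
    return best_params, best_rmse

def constant_fill(X_train, train_mask, value):
    """Toy method: fill every missing entry with a fixed value."""
    return np.where(train_mask, X_train, value)

X = np.array([[4.0, 4.0, 0.0],
              [4.0, 0.0, 4.0],
              [0.0, 4.0, 4.0]])
mask = X > 0
best_params, best_rmse = grid_search(constant_fill, X, mask, {"value": [1.0, 4.0, 5.0]})
```

Splitting at the level of individual ratings, rather than whole users or movies, keeps every row and column represented in most training folds.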
