This project demonstrates collaborative filtering for recommendation systems using three matrix decomposition techniques: Singular Value Decomposition (SVD), CUR decomposition, and PQ decomposition (Matrix Factorization). These techniques help in dimensionality reduction and latent feature extraction, improving the scalability and accuracy of recommendation systems.
- SVD Decomposition: Computes singular values and explores the loss of data with varying latent factors.
- CUR Decomposition: Approximates the original matrix using selected columns and rows, with tunable latent dimensions.
- PQ Matrix Factorization: Learns user and item latent vectors using gradient descent to minimize prediction error.
Before running the code, ensure you have the following dependencies installed:
- numpy
- pandas
- matplotlib
- seaborn
- scikit-learn
To install them, run:

    pip install numpy pandas matplotlib seaborn scikit-learn

The project uses the MovieLens dataset for movie ratings:
- Path: Place the dataset at data/ratings.csv.
- Structure: The dataset should include userId, movieId, and rating columns.
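The column layout described above maps naturally onto a user-by-movie matrix. The following is a minimal sketch of that step, not the project's actual loader: the helper name `build_rating_matrix`, the zero fill for missing ratings, and the tiny in-memory sample (standing in for the CSV file) are all assumptions made for illustration.

```python
import pandas as pd

def build_rating_matrix(ratings: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format ratings into a users x movies matrix."""
    return ratings.pivot_table(
        index="userId", columns="movieId", values="rating", aggfunc="mean"
    ).fillna(0.0)  # assumption: unrated movies are stored as 0

# Hypothetical stand-in for the ratings file; real data would come from pd.read_csv(...)
sample = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating": [4.0, 3.5, 5.0, 2.0],
})
matrix = build_rating_matrix(sample)
print(matrix)
```

Filling missing ratings with zero keeps the matrix dense enough for a plain SVD, but it treats "not rated" as a rating of zero, which may or may not match what the script does.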
- Clone this repository:

      git clone https://github.com/yourusername/collaborative-filtering.git
      cd collaborative-filtering

- Place the dataset file (ratings.csv) in the data/ml-latest-small/ directory.
- Run the script:

      python collaborative_filtering.py
- Displays the top 20 singular values.
- Plots the loss of data against the number of latent factors (k).
- Time taken for SVD decomposition is logged.
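The SVD outputs listed above can be sketched as follows. How the script defines "loss of data" is not stated here, so this example assumes the relative Frobenius error of the rank-k truncated SVD; the function name `svd_loss_by_rank` and the random demo matrix are hypothetical.

```python
import time
import numpy as np

def svd_loss_by_rank(matrix: np.ndarray, ks: list[int]) -> dict[int, float]:
    """Relative Frobenius reconstruction loss of the rank-k truncated SVD, per k."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    total = np.linalg.norm(matrix)  # Frobenius norm of the original matrix
    losses = {}
    for k in ks:
        # Keep only the k largest singular values and their vectors
        approx = (u[:, :k] * s[:k]) @ vt[:k, :]
        losses[k] = float(np.linalg.norm(matrix - approx) / total)
    return losses

rng = np.random.default_rng(0)
demo = rng.random((30, 20))  # hypothetical stand-in for the user-item matrix

start = time.perf_counter()
losses = svd_loss_by_rank(demo, [1, 5, 10, 20])
elapsed = time.perf_counter() - start
print(losses)
print(f"SVD took {elapsed:.4f} s")
```

The loss shrinks as k grows and reaches (numerically) zero once k equals the rank of the matrix, which is the shape the loss plot is expected to show.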
- Computes CUR approximation of the matrix.
- Plots the reconstruction loss for varying k (latent factors).
- Time taken for CUR decomposition is logged.
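One common way to build a CUR approximation is sketched below. It is an assumption that the script works this way: here k columns and k rows are sampled with probability proportional to their squared norms, and the linking matrix is taken as U = pinv(C) · A · pinv(R). Other CUR variants build U from the intersection of the sampled rows and columns instead. The names `cur_decomposition` and `relative_loss` are hypothetical.

```python
import numpy as np

def cur_decomposition(matrix, k, rng):
    """Return C (k sampled columns), U, and R (k sampled rows) so that matrix ~ C @ U @ R."""
    sq = matrix**2
    col_p = sq.sum(axis=0) / sq.sum()  # column sampling probabilities
    row_p = sq.sum(axis=1) / sq.sum()  # row sampling probabilities
    cols = rng.choice(matrix.shape[1], size=k, replace=False, p=col_p)
    rows = rng.choice(matrix.shape[0], size=k, replace=False, p=row_p)
    c = matrix[:, cols]
    r = matrix[rows, :]
    # Least-squares choice of U for the sampled C and R (an assumed variant)
    u = np.linalg.pinv(c) @ matrix @ np.linalg.pinv(r)
    return c, u, r

def relative_loss(matrix, approx):
    """Relative Frobenius error between a matrix and its approximation."""
    return float(np.linalg.norm(matrix - approx) / np.linalg.norm(matrix))

rng = np.random.default_rng(0)
demo = rng.random((20, 3)) @ rng.random((3, 15))  # hypothetical rank-3 test matrix

losses = {}
for k in (1, 3):
    c, u, r = cur_decomposition(demo, k, rng)
    losses[k] = relative_loss(demo, c @ u @ r)
print(losses)
```

Because the demo matrix has rank 3, sampling three columns and three rows is enough to recover it almost exactly, while k = 1 leaves a visible residual. Real rating matrices are not exactly low rank, so the loss there falls more gradually with k.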
- Performs matrix factorization using gradient descent.
- Logs training and test mean squared errors (MSE).
- Time taken for PQ decomposition is logged.
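The PQ factorization step can be illustrated with a small full-batch gradient descent loop. This is a sketch under assumptions, not the script's code: the hyperparameters (`k`, `lr`, `reg`, `epochs`), the L2 regularization term, the convention that a zero entry means "unrated", and the hand-picked held-out entries are all chosen for illustration.

```python
import numpy as np

def masked_mse(ratings, mask, p, q):
    """Mean squared error over the entries selected by mask."""
    err = mask * (ratings - p @ q.T)
    return float((err**2).sum() / mask.sum())

def factorize(ratings, train_mask, k=2, lr=0.01, reg=0.02, epochs=300, seed=0):
    """Fit ratings ~ P @ Q.T on the training entries with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    p = rng.normal(scale=0.1, size=(n_users, k))  # user latent vectors
    q = rng.normal(scale=0.1, size=(n_items, k))  # item latent vectors
    history = []
    for _ in range(epochs):
        history.append(masked_mse(ratings, train_mask, p, q))
        err = train_mask * (ratings - p @ q.T)
        # Both steps are computed from the current P and Q, then applied together
        p_step = err @ q - reg * p
        q_step = err.T @ p - reg * q
        p += lr * p_step
        q += lr * q_step
    return p, q, history

# Hypothetical toy ratings; 0 means "not rated" (an assumption)
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

observed = ratings > 0
test_mask = np.zeros_like(observed)
test_mask[0, 1] = True  # hold out two observed ratings for testing
test_mask[4, 3] = True
train_mask = observed & ~test_mask

p, q, history = factorize(ratings, train_mask)
print(f"train MSE: {history[0]:.3f} -> {masked_mse(ratings, train_mask, p, q):.3f}")
print(f"test MSE: {masked_mse(ratings, test_mask, p, q):.3f}")
```

With only two held-out ratings the test MSE is noisy. On the real dataset a proper random split (for example with scikit-learn's `train_test_split` on the rating rows) would give a more meaningful figure.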
Top 20 singular values are [9032.38102201 4265.13020478 2962.83432586 2856.37494764 2441.34461236 2269.55931732 2169.8992637 1848.47223494 1701.69413469 1528.15832014 1476.74413397 1449.77168211 1432.00119537 1413.20720491 1319.28764566 1281.82058619 1213.72797731 1203.99401803 1198.55552737 1135.24246017]
Two key plots are generated:
- Loss vs. Latent Factors (SVD): Visualizes data reconstruction loss as the number of latent factors (k) increases.
- Loss vs. Latent Factors (CUR): Visualizes CUR reconstruction loss for varying k.