Kriiiiss/ML_Recommendation_System

MovieLens Recommendation System

Python notebooks that clean the MovieLens ratings data, explore the catalog, and train recommender models ranging from bias-only baselines to a LightGBM regressor.

Generated artifacts:

  • ratings_without_timestamp.csv: ratings with the timestamp column dropped.
  • df_movies_final.csv: movie metadata with cleaned titles, extracted year, and genre one-hot columns.
  • df_movies_with_score.csv: merges df_movies_final with per-movie mean/median ratings and interaction counts.
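As a rough illustration of how these artifacts could be produced, here is a minimal pandas sketch. It assumes the standard MovieLens column layout (movieId, title, genres in the movies file; userId, movieId, rating, timestamp in the ratings file) and uses tiny in-memory frames in place of the real data. The function names and the output column names (mean_rating, median_rating, n_ratings) are hypothetical; the notebooks may do this differently.

```python
import pandas as pd

# Toy stand-ins for the MovieLens files (assumed column layout).
movies = pd.DataFrame({
    "movieId": [1, 2],
    "title": ["Toy Story (1995)", "Heat (1995)"],
    "genres": ["Animation|Comedy", "Action|Crime"],
})
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "movieId": [1, 2, 1, 1],
    "rating": [4.0, 3.0, 5.0, 3.0],
    "timestamp": [0, 1, 2, 3],
})


def build_movie_table(movies):
    """Clean titles, extract the release year, and one-hot encode genres."""
    out = movies.copy()
    # Pull a trailing "(YYYY)" out of the title; rows without one get NaN.
    out["year"] = out["title"].str.extract(r"\((\d{4})\)\s*$", expand=False).astype(float)
    out["title"] = out["title"].str.replace(r"\s*\(\d{4}\)\s*$", "", regex=True)
    # One indicator column per genre found in the pipe-separated string.
    genre_dummies = out["genres"].str.get_dummies(sep="|")
    return pd.concat([out.drop(columns="genres"), genre_dummies], axis=1)


def add_movie_scores(movie_table, ratings):
    """Attach per-movie mean/median rating and interaction count."""
    stats = (
        ratings.groupby("movieId")["rating"]
        .agg(mean_rating="mean", median_rating="median", n_ratings="count")
        .reset_index()
    )
    # Left merge keeps movies that never received a rating.
    return movie_table.merge(stats, on="movieId", how="left")


ratings_without_timestamp = ratings.drop(columns="timestamp")
df_movies_final = build_movie_table(movies)
df_movies_with_score = add_movie_scores(df_movies_final, ratings_without_timestamp)
# Each frame could then be saved with DataFrame.to_csv(..., index=False).
```

The left merge is a design choice in this sketch: unrated movies stay in the table with missing statistics instead of being dropped.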

Running the Project

  1. Preprocess metadata: run preprocess.ipynb to create df_movies_final.csv and ratings_without_timestamp.csv.
  2. Build movie-level stats: run Rating.ipynb to create df_movies_with_score.csv.
  3. Explore: run analysis.ipynb for plots on popularity, genre influence, and rating distributions.
  4. Train models: run training.ipynb. It shuffles ratings, splits 60/20/20 (train/valid/test), and fits:
    • Bias-only baseline (user/item biases over global mean).
    • Latent factor model with user/item biases plus K latent dimensions.
    • MLPRegressor using user/movie averages, encoded year, and genre indicators.
    • LightGBM regressor (LGBMRegressor) on the same feature set.
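The split and the first two models in step 4 can be sketched in NumPy as follows. This is an illustrative sketch, not the notebook's code: it assumes users and movies are already mapped to integer indices, predicts a rating as global mean + user bias + item bias + dot product of K-dimensional user and item factors, and fits by plain SGD with L2 regularization. Setting k=0 reduces it to the bias-only baseline. The function names and the hyperparameters (lr, reg, n_epochs) are hypothetical choices.

```python
import numpy as np


def split_60_20_20(n_rows, seed=0):
    """Shuffle row indices and split them 60/20/20 into train/valid/test."""
    idx = np.random.default_rng(seed).permutation(n_rows)
    n_train = n_rows * 6 // 10
    n_valid = n_rows * 2 // 10
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


def fit_factor_model(users, items, ratings, n_users, n_items,
                     k=0, lr=0.01, reg=0.02, n_epochs=20, seed=0):
    """SGD fit of r_hat = mu + b_u + b_i + p_u . q_i (k=0 gives bias-only)."""
    rng = np.random.default_rng(seed)
    mu = float(ratings.mean())
    b_u = np.zeros(n_users)
    b_i = np.zeros(n_items)
    p = rng.normal(scale=0.1, size=(n_users, k))
    q = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(n_epochs):
        for j in rng.permutation(len(ratings)):
            u, i = users[j], items[j]
            err = ratings[j] - (mu + b_u[u] + b_i[i] + p[u] @ q[i])
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            p_u_old = p[u].copy()  # use the pre-update user factors for q's step
            p[u] += lr * (err * q[i] - reg * p[u])
            q[i] += lr * (err * p_u_old - reg * q[i])
    return mu, b_u, b_i, p, q


def predict(model, users, items):
    """Vectorized predictions for arrays of user and item indices."""
    mu, b_u, b_i, p, q = model
    return mu + b_u[users] + b_i[items] + np.sum(p[users] * q[items], axis=1)


def mse(y_true, y_pred):
    """Mean squared error, the metric reported in this README."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
```

In this sketch the validation split would be used to choose k, lr, and reg, with the test split held back for the final MSE.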

Reported Metrics (MSE)

  • LightGBM on a 5k-user subset: ~0.685 MSE when including the year feature; ~0.688 without year.
  • LightGBM on a 2k-user subset without year: ~0.825 MSE.

Metrics for the bias-only and latent factor models are printed by training.ipynb when it runs.
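A with/without-year comparison on a user subset, like the one reported above, could be set up along these lines. This is a hedged sketch under several assumptions: the column names (userId, user_avg, year, rating) and helper names are hypothetical, and LeastSquaresStandIn is a plain linear least-squares model swapped in so the example runs without LightGBM installed. It is not the model behind the numbers above.

```python
import numpy as np
import pandas as pd


def sample_user_subset(ratings, n_users, seed=0):
    """Keep only the rows belonging to a random sample of n_users users."""
    rng = np.random.default_rng(seed)
    all_users = ratings["userId"].unique()
    chosen = rng.choice(all_users, size=min(n_users, len(all_users)), replace=False)
    return ratings[ratings["userId"].isin(chosen)]


class LeastSquaresStandIn:
    """Stand-in regressor with fit/predict; NOT LightGBM, just linear least squares."""

    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
        self.coef_, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
        return self

    def predict(self, X):
        A = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
        return A @ self.coef_


def ablation_mse(make_model, train, valid, feature_cols, target="rating", drop=()):
    """Fit on train using feature_cols minus drop, return MSE on valid."""
    cols = [c for c in feature_cols if c not in drop]
    model = make_model().fit(train[cols], train[target])
    pred = np.asarray(model.predict(valid[cols]))
    return float(np.mean((valid[target].to_numpy() - pred) ** 2))
```

Because ablation_mse only relies on fit and predict, a scikit-learn style regressor such as lightgbm.LGBMRegressor can be passed as make_model in place of the stand-in, for example ablation_mse(LGBMRegressor, train, valid, feature_cols, drop=("year",)).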

Experiment Report

For a concise narrative of the experiments and results, see the PDF report: Exploring User-Movie Interactions and Metadata for Rating Prediction.pdf.
