A machine learning project that predicts age ratings (e.g., PG, R, TV-MA) for movies and TV shows using metadata and content descriptions from major streaming platforms.
This project applies Multinomial Naive Bayes, a probabilistic classifier commonly used for text, to predict age ratings for shows and movies across Netflix, Disney+, Hulu, and Amazon Prime Video. By automating rating classification from content descriptions, the model can support viewer safety and content moderation.
Source: Kaggle
These datasets contain:
- `title`, `type`, `description`, `cast`, `genre`, `release_year`, `duration`, `rating`, and more
- The `rating` column is the target variable
Note: Raw data files are excluded due to size. See data/README.md for download instructions.
- Missing values handled
- Text normalization (lowercasing, punctuation removal, stopword removal)
- Feature selection focused on `description` and other key metadata
- `CountVectorizer` and TF-IDF used to transform descriptions into feature vectors
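
The text normalization and vectorization steps above could look roughly like the following. This is a minimal sketch, not the project's actual notebook code: the `normalize` helper and the sample descriptions are hypothetical, and stopword removal is delegated to scikit-learn's built-in English list as an assumption.

```python
import re
import string

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace (hypothetical helper)."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


# Made-up descriptions, standing in for the `description` column.
descriptions = [
    "A young wizard discovers his magical heritage!",
    "Detectives hunt a brutal serial killer.",
    "Friendly animals teach kids about sharing.",
]
cleaned = [normalize(d) for d in descriptions]

# Raw term counts; stop_words="english" drops common English stopwords.
count_vec = CountVectorizer(stop_words="english")
X_counts = count_vec.fit_transform(cleaned)

# TF-IDF weights over the same vocabulary.
tfidf_vec = TfidfVectorizer(stop_words="english")
X_tfidf = tfidf_vec.fit_transform(cleaned)

print(X_counts.shape, X_tfidf.shape)
```

Both vectorizers return sparse matrices with one row per description, which `MultinomialNB` accepts directly.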
- `MultinomialNB` (from `sklearn.naive_bayes`)
- Trained on processed description data
- Tested using multiple evaluation metrics
- Accuracy
- Precision
- Recall
- F1 Score
- Confusion Matrix
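
As an illustration of the training and evaluation steps listed above, here is a small sketch that fits `MultinomialNB` on TF-IDF features and computes each metric. The toy descriptions, the three rating labels, the stratified 75/25 split, and macro averaging are all assumptions made for the example; the project's actual data split and averaging strategy are not stated here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy data: 4 descriptions per rating class.
descriptions = [
    "friendly animals teach kids about sharing",
    "a puppy learns colors and counting songs",
    "cartoon friends sing songs about kindness",
    "kids explore a garden with talking animals",
    "teen heroes battle aliens to save the city",
    "a young spy faces danger on a secret mission",
    "superheroes fight a villain in an epic battle",
    "teen wizard faces dark forces at school",
    "detectives hunt a brutal serial killer",
    "a violent drug cartel war turns bloody",
    "graphic crime drama about a ruthless killer",
    "a brutal gang war leaves the city in blood",
]
ratings = ["G"] * 4 + ["PG-13"] * 4 + ["TV-MA"] * 4

# Stratify so every rating appears in both the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    descriptions, ratings, test_size=0.25, stratify=ratings, random_state=42
)

# Vectorizer and classifier chained so the test set reuses the training vocabulary.
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

labels = sorted(set(ratings))
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, labels=labels, average="macro", zero_division=0
)
cm = confusion_matrix(y_test, y_pred, labels=labels)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
print(cm)  # rows are true ratings, columns are predicted ratings, in `labels` order
```

Macro averaging weights every rating equally, which can be a reasonable choice when some ratings are much rarer than others; weighted averaging is an alternative if overall volume matters more.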
- Python
- Jupyter Notebook
- Pandas, NumPy
- Scikit-learn