🎬 Movie Recommender System

📖 Overview

This project is a Content-Based Movie Recommender System that analyzes movie metadata to suggest similar content. Using the TMDB 5000 Movies dataset, the system processes roughly 5,000 films and links them through shared genres, keywords, cast, crew, and plot descriptions.

🧠 Machine Learning & NLP Workflow

To build this engine, I implemented the following Natural Language Processing (NLP) pipeline:

1. Data Engineering

Feature Extraction: Combined overview, genres, keywords, cast (top 3 actors), and crew (director) into a single "tags" column.
Text Preprocessing: Applied lowercasing and handled special characters to ensure consistency.
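The tag-building step can be sketched in plain Python. This is a minimal sketch, not the notebook's actual code: it assumes the raw TMDB columns store genres, keywords, cast, and crew as JSON-encoded lists of objects with a "name" field (plus a "job" field for crew members). The space-collapsing helper is a hypothetical extra step that keeps multi-word names such as "Science Fiction" as single tokens, and the sample record is made-up data.

```python
import json

def names(json_list: str) -> list:
    """Extract the "name" field from a JSON-encoded list of objects."""
    return [item["name"] for item in json.loads(json_list)]

def director(crew_json: str) -> list:
    """Return the director(s) from a JSON-encoded crew list."""
    return [m["name"] for m in json.loads(crew_json) if m.get("job") == "Director"]

def collapse(tokens: list) -> list:
    """Hypothetical step: remove spaces inside names so each stays one token."""
    return [t.replace(" ", "") for t in tokens]

def build_tags(movie: dict) -> str:
    """Combine overview, genres, keywords, top-3 cast and director, then lowercase."""
    parts = (
        movie["overview"].split()
        + collapse(names(movie["genres"]))
        + collapse(names(movie["keywords"]))
        + collapse(names(movie["cast"])[:3])  # top 3 billed actors only
        + collapse(director(movie["crew"]))
    )
    return " ".join(parts).lower()

# Made-up record shaped like one row of the merged TMDB data.
movie = {
    "overview": "A marine explores an alien moon.",
    "genres": '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]',
    "keywords": '[{"id": 1, "name": "alien planet"}]',
    "cast": '[{"name": "Jane Doe"}, {"name": "John Roe"}, {"name": "Ann Poe"}, {"name": "Max Moe"}]',
    "crew": '[{"name": "Alex Smith", "job": "Director"}, {"name": "Sam Lee", "job": "Producer"}]',
}
print(build_tags(movie))
```

The fourth cast member and the producer are dropped, matching the "top 3 actors" and "director" selection described above.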

2. Vectorization (Bag of Words)

Technique: Used CountVectorizer from scikit-learn.
Strategy: Converted each movie's tags into a 5,000-dimensional count vector, removing standard English stop words so that the vectors reflect the terms that distinguish one movie from another rather than filler words.
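A minimal sketch of this step with scikit-learn's CountVectorizer, assuming the 5,000 dimensions come from max_features=5000 (which keeps the 5,000 most frequent terms across the corpus). The three tag strings are made-up stand-ins for the real "tags" column.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up tag strings standing in for the real "tags" column.
tags = [
    "a marine explores an alien moon action sciencefiction",
    "a wizard boy attends a school of magic fantasy adventure",
    "astronauts travel through a wormhole sciencefiction adventure",
]

# max_features caps the vocabulary size; stop_words="english" drops words
# such as "an" and "of" using scikit-learn's built-in English stop list.
cv = CountVectorizer(max_features=5000, stop_words="english")
vectors = cv.fit_transform(tags).toarray()  # dense array: (n_movies, vocab_size)

print(vectors.shape)
print(sorted(cv.vocabulary_))
```

With the full dataset the vocabulary is larger than 5,000 terms, so the cap applies and every movie becomes a 5,000-element count vector; on this toy input the vocabulary is simply every non-stop word.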

3. Similarity Measurement (Cosine Similarity)

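Instead of Euclidean distance, I used cosine similarity to measure how closely two movie vectors point in the same direction.
The Logic: Cosine similarity depends only on the angle between two vectors, not on their lengths. A movie with a long overview and many keywords produces larger counts than a sparsely described one, which inflates their Euclidean distance even when the mix of terms is the same. The cosine ignores that difference in magnitude, so in this high-dimensional, sparse space it reflects content similarity more faithfully than straight-line distance.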
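scikit-learn's cosine_similarity (in sklearn.metrics.pairwise) computes the full matrix in one call. The dependency-free sketch here spells out the formula, demonstrates the length-invariance argument with made-up vectors, and adds a hypothetical recommend helper that ranks movies by one row of the similarity matrix, the kind of lookup a recommender would run against similarity.pkl. It is an illustration, not the app's own code.

```python
import math

def cosine(a: list, b: list) -> float:
    """cos(theta) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_sq = sum(x * x for x in a) * sum(y * y for y in b)
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0

def euclidean(a: list, b: list) -> float:
    """Straight-line distance; it grows with any difference in vector length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recommend(similarity: list, titles: list, title: str, k: int = 5) -> list:
    """Hypothetical helper: the k titles most similar to `title`, excluding itself.
    similarity[i][j] holds the cosine similarity of movies i and j."""
    i = titles.index(title)
    ranked = sorted(range(len(titles)), key=lambda j: similarity[i][j], reverse=True)
    return [titles[j] for j in ranked if j != i][:k]

# Made-up count vectors: "Movie B" has the same term mix as "Movie A" but a
# longer tag string (every count doubled); "Movie C" uses mostly other terms.
titles = ["Movie A", "Movie B", "Movie C"]
vectors = [[1, 2, 0, 1], [2, 4, 0, 2], [0, 0, 3, 1]]

print(cosine(vectors[0], vectors[1]))     # 1.0: same direction
print(euclidean(vectors[0], vectors[1]))  # about 2.449 (sqrt(6)) despite the same mix

similarity = [[cosine(u, v) for v in vectors] for u in vectors]
print(recommend(similarity, titles, "Movie A", k=2))  # ['Movie B', 'Movie C']
```

Euclidean distance would treat "Movie B" as noticeably different from "Movie A" purely because its description is longer; the cosine ranks it as the closest match.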

🏗️ Engineering & Deployment Challenges

Git LFS Integration: The pre-computed similarity matrix (similarity.pkl) exceeded GitHub's file size limit, so I tracked it with Git LFS, which stores large binary files outside the regular Git history and keeps a lightweight pointer in the repository.
Optimization: Replaced downloading the assets from cloud storage at startup with loading pre-bundled local copies, reducing application boot time by 90%.
API Integration: Used the TMDB API to fetch movie posters at runtime from each movie's TMDB ID.
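A minimal sketch of the poster lookup using only the standard library. It assumes TMDB's v3 movie-details endpoint (which returns a poster_path field) and TMDB's image CDN with the w500 poster size; the API key is a placeholder you supply, and the function names are hypothetical rather than taken from app.py.

```python
import json
import urllib.request
from typing import Optional

# Assumed endpoints: TMDB v3 movie details and the public image CDN.
DETAILS_URL = "https://api.themoviedb.org/3/movie/{movie_id}?api_key={api_key}"
IMAGE_BASE = "https://image.tmdb.org/t/p/w500"

def poster_url(details: dict) -> Optional[str]:
    """Turn a movie-details JSON payload into a full poster URL, if it has one."""
    path = details.get("poster_path")
    return IMAGE_BASE + path if path else None

def fetch_poster(movie_id: int, api_key: str, timeout: float = 10.0) -> Optional[str]:
    """Request details for one TMDB movie ID and return its poster URL (network call)."""
    url = DETAILS_URL.format(movie_id=movie_id, api_key=api_key)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return poster_url(json.load(resp))

# Offline example with a hand-written payload (no request is made):
print(poster_url({"poster_path": "/example.jpg"}))
```

Keeping URL construction separate from the HTTP call makes the logic testable offline. In a Streamlit app you would probably also cache fetched URLs so that repeated recommendations do not re-query the API.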

📂 Project Structure

├── app.py                # Main Streamlit UI & Logic
├── model.ipynb           # Data Analysis, Preprocessing & Model Training
├── movie_list.pkl        # Processed Movie DataFrame
├── similarity.pkl        # Pre-computed Similarity Matrix (via Git LFS)
├── requirements.txt      # Python Dependencies
└── README.md             # Project Documentation
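A minimal sketch of how app.py might load the two pickled assets listed above; the helper names are hypothetical. It guards against one Git LFS failure mode: if a deployment clones the repository without fetching LFS objects, similarity.pkl is only a small text pointer file beginning with a fixed version line, and unpickling it fails with a confusing error. functools.lru_cache stands in for load-once caching; in a Streamlit app, st.cache_resource serves the same purpose.

```python
import pickle
from functools import lru_cache
from pathlib import Path

# Git LFS pointer files begin with this version line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path: Path) -> bool:
    """True if `path` is an un-fetched Git LFS pointer rather than real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX

@lru_cache(maxsize=None)
def load_asset(path: str):
    """Unpickle a bundled asset once per process; later calls reuse the object."""
    p = Path(path)
    if is_lfs_pointer(p):
        raise RuntimeError(f"{p} is a Git LFS pointer; run `git lfs pull` first.")
    with open(p, "rb") as f:
        return pickle.load(f)

# Usage inside app.py (hypothetical):
# movies = load_asset("movie_list.pkl")
# similarity = load_asset("similarity.pkl")
```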

🚀 How to Run Locally

Clone the repository:

git clone https://github.com/Resham011/Movie-Recommendation-System.git
cd Movie-Recommendation-System

Install Dependencies:

pip install -r requirements.txt

Run the Application:

streamlit run app.py

🌐 Live Demos

Platform	Link
🤗 Hugging Face Spaces	Movie-Recommendation-System
☁️ Streamlit Cloud	Live App

🛠️ Tech Stack

Core: Python, Pandas, NumPy
Machine Learning: scikit-learn (CountVectorizer, cosine_similarity)
Web Framework: Streamlit
Version Control: Git & Git LFS
Hosting: Hugging Face Spaces & Streamlit Cloud

👤 Author

Resham

💼 LinkedIn: linkedin.com/in/resham-3b438a281
🐙 GitHub: github.com/Resham011
