This project is a Content-Based Movie Recommender System that analyzes movie metadata to suggest similar content. Using the TMDB 5000 Movies dataset, the system processes nearly 5,000 films to find meaningful connections between genres, cast, crew, and plot descriptions.
To build this engine, I implemented a robust Natural Language Processing (NLP) pipeline:
- Feature Extraction: Combined `overview`, `genres`, `keywords`, `cast` (top 3 actors), and `crew` (director) into a single "tags" column.
- Text Preprocessing: Applied lowercasing and handled special characters to ensure consistency.
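As a rough illustration of the feature-extraction step, the sketch below builds one "tags" string from a single movie record using only the standard library. It assumes the metadata columns hold JSON-encoded lists of objects with a `name` field (and a `job` field for crew). The helper names `names`, `director`, and `build_tags` are hypothetical, and joining multi-word names into one token is an assumption rather than something this project is confirmed to do.

```python
import json


def names(json_text, limit=None):
    """Return the 'name' fields from a JSON-encoded list of objects."""
    items = json.loads(json_text)
    picked = items if limit is None else items[:limit]
    # Assumption: drop inner spaces so "Sam Worthington" becomes one token.
    return [item["name"].replace(" ", "") for item in picked]


def director(crew_json):
    """Return a one-element list with the director's name, or an empty list."""
    found = [
        member["name"].replace(" ", "")
        for member in json.loads(crew_json)
        if member.get("job") == "Director"
    ]
    return found[:1]


def build_tags(movie):
    """Combine overview, genres, keywords, top 3 cast, and director."""
    parts = (
        movie["overview"].split()
        + names(movie["genres"])
        + names(movie["keywords"])
        + names(movie["cast"], limit=3)
        + director(movie["crew"])
    )
    return " ".join(parts).lower()


# Illustrative record, not a row copied from the dataset.
example = {
    "overview": "A paraplegic Marine is sent to Pandora.",
    "genres": '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]',
    "keywords": '[{"id": 1463, "name": "culture clash"}]',
    "cast": '[{"name": "Sam Worthington"}, {"name": "Zoe Saldana"}, '
            '{"name": "Sigourney Weaver"}, {"name": "Stephen Lang"}]',
    "crew": '[{"job": "Editor", "name": "Stephen E. Rivkin"}, '
            '{"job": "Director", "name": "James Cameron"}]',
}

tags = build_tags(example)
print(tags)
```

In the notebook the same logic would more likely be applied column by column on a DataFrame; the per-record form is used here only to keep the example self-contained.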
- Technique: Used `CountVectorizer` from `scikit-learn`.
- Strategy: Converted text tags into 5,000-dimensional numerical vectors, removing standard English stop words to focus on unique movie identifiers.
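A minimal sketch of the vectorization step, assuming the settings described above map to `max_features=5000` and `stop_words="english"`. The three tag strings are made up for illustration, so the resulting vocabulary is far smaller than 5,000 terms.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical tag strings standing in for the real "tags" column.
tags = [
    "action adventure space marine jamescameron",
    "action adventure pirate ship the sea",
    "romance drama the ship jamescameron",
]

# Keep at most the 5,000 most frequent terms and drop English stop words.
vectorizer = CountVectorizer(max_features=5000, stop_words="english")
vectors = vectorizer.fit_transform(tags).toarray()

print(vectors.shape)                   # (number of movies, vocabulary size)
print(sorted(vectorizer.vocabulary_))  # terms that survived stop-word removal
```

Each row of `vectors` is one movie, and each column counts how often one vocabulary term appears in that movie's tags.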
- Instead of Euclidean distance, I used Cosine Similarity to compare movie vectors.
- The Logic: Cosine similarity depends only on the angle between two vectors, not on their lengths, so a movie with a longer tag string is not pushed away from a shorter one that uses the same words. For sparse, high-dimensional count vectors this tends to reflect content similarity better than straight-line distance.
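The difference between the two measures can be seen in a toy example. The sketch below computes cosine similarity with NumPy (scikit-learn's `cosine_similarity` should give the same values) and ranks neighbours with a hypothetical `recommend` helper. The count vectors are invented, and the helper is an assumption about how a lookup might work, not the app's actual code.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty rows
    unit = vectors / norms
    return unit @ unit.T


def recommend(index, similarity, top_n=5):
    """Indices of the `top_n` rows most similar to row `index`."""
    ranked = sorted(enumerate(similarity[index]), key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in ranked if i != index][:top_n]


counts = np.array([
    [1, 1, 0, 0],  # movie 0: short tag string
    [3, 3, 0, 0],  # movie 1: same words, three times as many
    [1, 0, 1, 1],  # movie 2: mostly different words
])

similarity = cosine_similarity_matrix(counts)

# Euclidean distance puts movie 2 closer to movie 0 than movie 1 is,
# even though movie 1 uses exactly the same words as movie 0.
euclid_0_1 = np.linalg.norm(counts[0] - counts[1])
euclid_0_2 = np.linalg.norm(counts[0] - counts[2])

print(similarity.round(3))
print(euclid_0_1, euclid_0_2)
print(recommend(0, similarity, top_n=2))
```

Cosine similarity ranks movie 1 first for movie 0 because the two vectors point in the same direction, while Euclidean distance is dominated by the difference in word counts.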
- Git LFS Integration: The similarity matrix (`similarity.pkl`) exceeded the standard file size limits for Git hosting. I set up Git LFS to track and version this large model artifact.
- Optimization: Migrated the app from dynamic cloud downloading to local pre-bundled assets, reducing the application boot time by 90%.
- API Integration: Integrated the TMDB API to dynamically fetch movie posters based on ID, enhancing the visual experience.
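A poster lookup along these lines could be sketched as follows, using only the standard library. It assumes the TMDB v3 movie-details endpoint with an `api_key` query parameter and the `w500` image size. The function names are hypothetical, and the API key is a placeholder the reader would supply.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.themoviedb.org/3/movie"
IMAGE_ROOT = "https://image.tmdb.org/t/p/w500"  # assumed poster size


def details_url(movie_id, api_key):
    """Build the movie-details request URL for one TMDB movie ID."""
    return f"{API_ROOT}/{movie_id}?{urlencode({'api_key': api_key})}"


def poster_url(details):
    """Turn a movie-details payload into a full poster URL, or None."""
    path = details.get("poster_path")
    return f"{IMAGE_ROOT}{path}" if path else None


def fetch_poster(movie_id, api_key, timeout=10):
    """Request the movie details and return the poster URL (network call)."""
    with urlopen(details_url(movie_id, api_key), timeout=timeout) as response:
        return poster_url(json.load(response))


# Offline check with a hand-written payload instead of a live request.
print(poster_url({"poster_path": "/example.jpg"}))
```

Returning `None` when `poster_path` is missing lets the UI fall back to a placeholder image instead of raising an error.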
├── app.py # Main Streamlit UI & Logic
├── model.ipynb # Data Analysis, Preprocessing & Model Training
├── movie_list.pkl # Processed Movie DataFrame
├── similarity.pkl # Pre-computed Similarity Matrix (via Git LFS)
├── requirements.txt # Python Dependencies
└── README.md # Project Documentation
- Clone the repository:
  `git clone https://github.com/Resham011/Movie-Recommendation-System.git`
  `cd Movie-Recommendation-System`
- Install Dependencies:
  `pip install -r requirements.txt`
- Run the Application:
  `streamlit run app.py`

| Platform | Link |
|---|---|
| Hugging Face Spaces | Movie-Recommendation-System |
| Streamlit Cloud | Live App |
- Core: Python, Pandas, NumPy
- Machine Learning: Scikit-Learn (CountVectorizer, Cosine Similarity)
- Web Framework: Streamlit
- Version Control: Git & Git LFS
- Hosting: Hugging Face Spaces & Streamlit Cloud
Resham
- LinkedIn: linkedin.com/in/resham-3b438a281
- GitHub: github.com/Resham011