This project is a content-based movie recommender built from The Movies Dataset on Kaggle (rounakbanik).
The system loads the raw CSVs, cleans and normalizes the metadata, stores everything in a structured SQLite database, and builds sparse feature vectors for each movie using TF-IDF (keywords), multilabel encodings (genres, companies), and one-hot encoding (collections).
Cosine similarity scores movie-to-movie relationships directly, or scores every movie against a user profile built from multiple selected movies; recommendations are then ranked by that score.
A FastAPI backend serves the recommender, handling all interaction through a frontend UI and API endpoints.
The goal of this project was to build a clean, modular recommendation engine that mirrors how real systems are designed. The data pipeline (data ingestion, preprocessing, database modeling, feature generation, and serving layer) was modeled to resemble what you’d see in an actual production setup. To show how these kinds of systems may be used in real business environments, the recommender is wrapped in a full-stack application and deployed via Render.
- Content-based recommendations built from movie metadata, including keywords, genres, collections, and production companies.
- TF-IDF keyword vectors combined with multilabel and one-hot encoded features to represent each movie in a sparse vector space.
- User profile generation that averages the feature vectors of multiple selected movies to create a personalized recommendation baseline.
- Cosine similarity scoring to measure how closely movies match the user profile or each other.
- Normalized relational (SQLite) database with purposeful many-to-many relationships for genres, companies, collections, and keywords.
- FastAPI backend that serves recommendations, handles movie search, and drives the frontend UI.
- Similarity matrix caching in the validation pipeline to speed up repeated evaluation.
- Precision@5 validation tools to evaluate the quality of recommendations.
- Deployed via Render as a full-stack application.
https://content-based-recommendation-system-ofm5.onrender.com/
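As a rough illustration of the profile-averaging and cosine-scoring features listed above, here is a minimal sketch. The `recommend` function, the toy 4x4 feature matrix, and the choice to exclude already-selected movies from the results are all assumptions made for this example, not the project's actual code.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity


def recommend(features, selected, top_n=5):
    """Rank movies by cosine similarity to the mean vector of the selected rows.

    `features` is a sparse (n_movies x n_features) matrix and `selected`
    is a list of row indices chosen by the user (hypothetical interface).
    """
    # Average the selected movies' feature rows into one profile vector.
    profile = np.asarray(features[selected].mean(axis=0)).reshape(1, -1)
    # Score every movie against the profile.
    scores = cosine_similarity(sparse.csr_matrix(profile), features).ravel()
    # Assumption: movies the user already picked are not recommended back.
    scores[selected] = -np.inf
    top = np.argsort(-scores)[:top_n]
    return [(int(i), float(scores[i])) for i in top]


# Toy feature matrix: one row per movie, one column per feature.
features = sparse.csr_matrix(np.array([
    [1.0, 1.0, 0.0, 0.0],  # movie 0
    [1.0, 0.0, 1.0, 0.0],  # movie 1
    [1.0, 1.0, 1.0, 0.0],  # movie 2 shares features with both 0 and 1
    [0.0, 0.0, 0.0, 1.0],  # movie 3 shares nothing with the others
]))

results = recommend(features, selected=[0, 1], top_n=2)
print(results)  # movie 2 ranks first, movie 3 second with a score of 0
```

Passing a single index in `selected` reduces this to plain movie-to-movie similarity, which is one way the two use cases could share a code path.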
The system follows a standard pipeline used in metadata-driven similarity models. The workflow moves from raw metadata, through preprocessing and structured storage, into feature generation and vector similarity computation, and finally into the serving layer that presents similarity rankings to users in an easily digestible format.
Raw CSV Data
↓
Preprocessing and Normalization
↓
SQLite Database (movies, genres, collections, companies, keywords)
↓
Feature Engineering (TF-IDF, multilabel, one-hot encoding)
↓
Combined Sparse Feature Matrix
↓
Recommender Engine (cosine similarity and user profiling)
↓
FastAPI Backend (search and recommend endpoints)
↓
Frontend UI (movie selection and results)
↓
Render Deployment
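The feature engineering and combined sparse matrix stages in the pipeline above might look roughly like the sketch below. The sample movies, the dictionary field names, the `(no collection)` placeholder, and the `build_feature_matrix` helper are hypothetical; the real project may weight or order the feature blocks differently.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MultiLabelBinarizer, OneHotEncoder

# Made-up records standing in for rows read from the database.
movies = [
    {"keywords": "space rescue astronaut", "genres": ["Science Fiction", "Drama"],
     "companies": ["Studio A"], "collection": "(no collection)"},
    {"keywords": "space alien invasion", "genres": ["Science Fiction", "Action"],
     "companies": ["Studio B"], "collection": "Alien Saga"},
    {"keywords": "wedding romance comedy", "genres": ["Comedy", "Romance"],
     "companies": ["Studio A"], "collection": "(no collection)"},
]


def build_feature_matrix(movies):
    """Stack TF-IDF, multilabel, and one-hot blocks into one sparse matrix."""
    # TF-IDF over each movie's keyword string.
    keywords = TfidfVectorizer().fit_transform([m["keywords"] for m in movies])
    # Multilabel encoding: one column per genre / company, several may be set.
    genres = MultiLabelBinarizer(sparse_output=True).fit_transform(
        [m["genres"] for m in movies])
    companies = MultiLabelBinarizer(sparse_output=True).fit_transform(
        [m["companies"] for m in movies])
    # One-hot encoding: each movie belongs to exactly one collection value.
    collections = OneHotEncoder().fit_transform(
        [[m["collection"]] for m in movies])
    # Concatenate the blocks column-wise, keeping the result sparse.
    return hstack([keywords, genres, companies, collections], format="csr")


matrix = build_feature_matrix(movies)
print(matrix.shape)  # (number of movies, total number of feature columns)
```

Keeping every block sparse matters at dataset scale, since most movies use only a handful of the available keyword, genre, and company columns.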
- Python 3.12 – Core language used for data preprocessing, feature engineering, similarity computation, and application logic.
- FastAPI – Backend framework that exposes search and recommendation endpoints and serves the frontend interface.
- SQLite – Structured relational database used to store cleaned movie metadata and many-to-many relationships.
- Pandas / NumPy – Used for data loading, cleaning, numerical operations, and DataFrame manipulation.
- Scikit-learn – Provides TF-IDF vectorization, multilabel encoders, one-hot encoding, and cosine similarity utilities.
- SciPy – Powers sparse matrix operations and efficient vector space handling.
- Jinja2 Templates – Renders UI pages for interacting with the recommender.
- Render – Deployment platform hosting the full-stack application.
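To make the many-to-many storage mentioned for SQLite above more concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names, column names, sample rows, and the `genres_for` helper are illustrative assumptions; the actual schema lives in the `db/` package and may differ.

```python
import sqlite3

# In-memory database so the example leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Junction table: one row per (movie, genre) pair.
CREATE TABLE movie_genres (
    movie_id INTEGER NOT NULL REFERENCES movies(id),
    genre_id INTEGER NOT NULL REFERENCES genres(id),
    PRIMARY KEY (movie_id, genre_id)
);
""")
conn.executemany("INSERT INTO movies VALUES (?, ?)",
                 [(1, "Movie A"), (2, "Movie B")])
conn.executemany("INSERT INTO genres VALUES (?, ?)",
                 [(1, "Drama"), (2, "Comedy")])
conn.executemany("INSERT INTO movie_genres VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])


def genres_for(conn, movie_id):
    """Return the genre names linked to a movie, sorted alphabetically."""
    rows = conn.execute(
        """SELECT g.name
           FROM genres AS g
           JOIN movie_genres AS mg ON mg.genre_id = g.id
           WHERE mg.movie_id = ?
           ORDER BY g.name""",
        (movie_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(genres_for(conn, 1))
```

The same junction-table pattern would apply to companies and keywords, with one link table per relationship.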
git clone https://github.com/swish0621/Content-Based-Recommendation-System.git
cd Content-Based-Recommendation-System
python3 -m venv venv
source venv/bin/activate # macOS / Linux
# or
venv\Scripts\activate # Windows
pip install -r requirements.txt
python -m db
python -m validation.validation
uvicorn main:app --reload
http://127.0.0.1:8000
- Designed a modular data pipeline from ingestion to serving layer.
- Implemented sparse vector similarity search with TF-IDF and cosine similarity.
- Modeled normalized relational data with many-to-many relationships.
- Built a full-stack deployed application using FastAPI and Render.
- Developed validation tools (Precision@5) to evaluate model quality.
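For readers unfamiliar with the Precision@5 metric mentioned above, a minimal sketch of the general definition follows. The `precision_at_k` function and the example IDs are hypothetical, and how the project decides which movies count as relevant is not stated here, so treat the `relevant` set as an assumption.

```python
def precision_at_k(recommended, relevant, k=5):
    """Fraction of the top-k recommended items that are in the relevant set.

    Divides by k even if fewer than k items were recommended, which is one
    common convention; other implementations divide by the list length.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical movie IDs: two of the first five recommendations are relevant.
recommended_ids = [1, 2, 3, 4, 5, 6]
relevant_ids = {2, 4, 9}
score = precision_at_k(recommended_ids, relevant_ids, k=5)
print(score)
```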
Content-Based-Recommendation-System/
│── data_processing/ # Ingestion, cleaning, feature generation
│── db/ # Database schema, load scripts, and CRUD
│── feature/ # Recommender engine logic
│── validation/ # Precision@5 evaluation tools
│── templates/ # Jinja2 frontend pages
│── static/ # CSS
│── main.py # FastAPI app entry point
│── movies.db # SQLite database
│── requirements.txt