An intelligent, scalable Recommendation System built with FastAPI, LightGBM, and Annoy, designed to serve personalized movie suggestions in real time.
This project demonstrates how to combine content-based filtering (via embeddings) with learning-to-rank models for fine-tuned personalization.
✅ Hybrid Recommendation System
- Uses Annoy (Approximate Nearest Neighbors) for fast candidate generation based on movie metadata (genres, actors, keywords).
- Uses LightGBM (LambdaRank) to rank the candidates based on user-specific behavioral patterns.
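
The two stages above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code: exact cosine search over toy genre vectors stands in for the Annoy index, and a plain dictionary of per-user scores stands in for the LightGBM ranker's predictions. `MOVIE_VECTORS`, `generate_candidates`, and `rank_candidates` are hypothetical names.

```python
import math

# Hypothetical toy catalogue: movie id -> multi-hot genre vector
# (columns: action, comedy, drama, sci-fi).
MOVIE_VECTORS = {
    1: [1, 0, 0, 1],
    2: [1, 0, 0, 0],
    3: [0, 1, 1, 0],
    4: [0, 0, 1, 0],
    5: [1, 0, 1, 1],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def generate_candidates(seed_id, k):
    """Stage 1: the k movies most similar to the seed movie.

    Exact brute-force search, used here in place of an approximate index.
    Ties are broken by ascending movie id.
    """
    seed = MOVIE_VECTORS[seed_id]
    scored = [
        (cosine(seed, vec), movie_id)
        for movie_id, vec in MOVIE_VECTORS.items()
        if movie_id != seed_id
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [movie_id for _, movie_id in scored[:k]]


def rank_candidates(candidates, user_scores):
    """Stage 2: reorder candidates by a per-user score, highest first.

    user_scores stands in for the predictions of a trained ranking model.
    """
    return sorted(candidates, key=lambda movie_id: -user_scores.get(movie_id, 0.0))


candidates = generate_candidates(seed_id=1, k=3)
ranked = rank_candidates(candidates, {2: 0.9, 5: 0.4, 3: 0.1})
```

The split keeps the expensive per-user model away from the full catalogue: the similarity stage narrows thousands of movies to a short list, and only that list is scored per user.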
✅ Event-based Data Pipeline
- Collects user interactions (`viewed`, `liked`, `rated`, etc.) in the `events` table.
- Generates training labels automatically from `ratings` (or event scores).
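
As a rough sketch of how labels might be derived from events: the snippet below prefers an explicit rating when one exists and otherwise falls back to a per-event score. The weights in `EVENT_SCORES`, the field names, and the helper names are assumptions made for illustration; the project's actual mapping may differ.

```python
from collections import defaultdict

# Hypothetical event weights; not taken from the project.
EVENT_SCORES = {"viewed": 1, "liked": 3}


def build_labels(events):
    """Collapse raw events into one relevance label per (user, movie) pair.

    Each event is a dict with user_id, movie_id, event_type and, for
    "rated" events, a rating. The strongest signal seen for a pair wins.
    """
    labels = {}
    for event in events:
        key = (event["user_id"], event["movie_id"])
        if event["event_type"] == "rated":
            score = int(event["rating"])
        else:
            score = EVENT_SCORES.get(event["event_type"], 0)
        labels[key] = max(labels.get(key, 0), score)
    return labels


def group_sizes(labels):
    """Number of labelled movies per user, in ascending user-id order.

    Ranking objectives such as LambdaRank train on per-query groups;
    here each user is treated as one query.
    """
    counts = defaultdict(int)
    for user_id, _movie_id in labels:
        counts[user_id] += 1
    return [counts[user_id] for user_id in sorted(counts)]


events = [
    {"user_id": 1, "movie_id": 10, "event_type": "viewed"},
    {"user_id": 1, "movie_id": 10, "event_type": "liked"},
    {"user_id": 1, "movie_id": 11, "event_type": "rated", "rating": 5},
    {"user_id": 2, "movie_id": 10, "event_type": "viewed"},
]
labels = build_labels(events)
```

When feeding such labels to a ranker, the training rows would need to be sorted by user so that the group sizes line up with contiguous blocks of rows.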
✅ Fast Model Training
- Annoy index for scalable similarity search.
- LightGBM LambdaRank model for optimized ranking.
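
LambdaRank tunes a model toward list-level ranking quality, typically measured by NDCG. The sketch below computes NDCG@k for one user's list so the target metric is concrete. It uses the common exponential gain `2^label - 1`; the function names are illustrative, and this is not the library's internal implementation.

```python
import math


def dcg_at_k(labels, k):
    """Discounted cumulative gain of the first k labels, in the given order."""
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)
        for position, rel in enumerate(labels[:k])
    )


def ndcg_at_k(labels_in_predicted_order, k):
    """DCG of the predicted order divided by the DCG of the ideal order.

    Returns 0.0 when every label is zero, since no ordering can score.
    """
    ideal = dcg_at_k(sorted(labels_in_predicted_order, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(labels_in_predicted_order, k) / ideal


perfect = ndcg_at_k([3, 2, 1], k=3)   # most relevant movie ranked first
reversed_ = ndcg_at_k([1, 2, 3], k=3)  # most relevant movie ranked last
```

A perfectly ordered list scores 1.0, and any ordering that pushes relevant movies down scores lower, which is the behaviour a ranking objective rewards.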
✅ Modular Codebase
- DAO layer for all DB interactions.
- Service layer for model logic and ML pipeline.
- REST endpoints for training and recommendation.
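
The layering above might look roughly like the following. This is a hedged sketch: `RatingsReader`, `InMemoryRatingsDAO`, and `RecommendationService` are hypothetical names, the in-memory DAO stands in for a real database-backed one, and the "recommend unseen movies" logic is a placeholder for the actual model pipeline.

```python
from typing import Protocol


class RatingsReader(Protocol):
    """What the service needs from the data layer (hypothetical interface)."""

    def ratings_for_user(self, user_id: int) -> dict[int, float]: ...


class InMemoryRatingsDAO:
    """Stand-in DAO backed by a list of (user_id, movie_id, rating) tuples."""

    def __init__(self, rows):
        self._rows = list(rows)

    def ratings_for_user(self, user_id):
        return {movie: rating for user, movie, rating in self._rows if user == user_id}


class RecommendationService:
    """Service layer: owns the recommendation logic, knows nothing about SQL."""

    def __init__(self, dao: RatingsReader, catalogue: list[int]):
        self._dao = dao
        self._catalogue = catalogue

    def recommend(self, user_id: int, limit: int) -> list[int]:
        # Placeholder logic: return catalogue movies the user has not rated.
        seen = self._dao.ratings_for_user(user_id)
        unseen = [movie for movie in self._catalogue if movie not in seen]
        return unseen[:limit]


dao = InMemoryRatingsDAO([(1, 10, 4.0), (1, 11, 2.0), (2, 12, 5.0)])
service = RecommendationService(dao, catalogue=[10, 11, 12, 13])
```

Because the service depends only on the small reader interface, a REST handler can call `service.recommend(...)` while tests swap in the in-memory DAO without touching a database.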
✅ Extensible Architecture
- Easily add new metadata fields (e.g., directors, tags).
- Switch between test and production databases.
- Compatible with AWS / Docker deployment.
| Component | Technology |
|---|---|
| Backend Framework | FastAPI (recommender service) / Express (customer-facing API) |
| Database | PostgreSQL |
| ORM / Data Access | SQLAlchemy |
| ML Model | LightGBM (LambdaRank Objective) |
| Approximate Nearest Neighbors | Annoy |
| Feature Engineering | Pandas, NumPy |
| Environment | Python 3.10+ |
| Model Serving | FastAPI service layer |
| Visualization / Docs | Swagger UI (FastAPI built-in) |
| Deployment Ready For | Docker / Uvicorn / Gunicorn |
- Clone the repository: `git clone https://github.com/your-username/movie-recommender.git`, then `cd movie-recommender`
- Create a virtual environment: `python -m venv venv`
- Activate it: `source venv/bin/activate` (Linux / macOS) or `venv\Scripts\activate` (Windows)
- Install the dependencies: `pip install -r requirements.txt`
- Create a `.env` file in the project root with the following content: `DATABASE_URL=postgresql://username:password@localhost:5432/movies_db`
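
For illustration, a service could validate `DATABASE_URL` at startup along these lines. This is a sketch, not the project's loader: `load_database_settings` is a hypothetical helper, and it reads from the process environment, so it assumes the `.env` values have already been exported or loaded by some other means.

```python
import os
from urllib.parse import urlsplit


def load_database_settings(env=None):
    """Read DATABASE_URL and split it into its parts, failing fast if unusable.

    env defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")

    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unsupported database scheme: {parts.scheme!r}")

    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # PostgreSQL's default port
        "user": parts.username,
        "database": parts.path.lstrip("/"),
    }


settings = load_database_settings(
    {"DATABASE_URL": "postgresql://username:password@localhost:5432/movies_db"}
)
```

Failing at startup with a clear message is usually easier to debug than a connection error surfacing on the first request.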
- Install and start the Node.js backend: `cd backend`, then `npm install`, then `npm run dev`
- Ensure PostgreSQL is running and create the database: `psql -U postgres -c "CREATE DATABASE movies_db;"`
- Create the tables: `npx drizzle-kit generate --name=db_init`, then `npx drizzle-kit migrate`
- Seed dummy data using the cron files:
  - create some dummy users
  - import movie data from https://www.themoviedb.org/
  - create some dummy ratings
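
If the seeding scripts are not to hand, dummy ratings can be generated with a few lines of Python. The sketch below is an assumption-laden example rather than the project's cron code: `make_dummy_ratings` is a hypothetical helper, the 1 to 5 rating scale is assumed, and the rows are plain dicts that would still need to be inserted into the database.

```python
import random


def make_dummy_ratings(user_ids, movie_ids, per_user, seed=42):
    """Give each user `per_user` distinct movies with a random 1-5 rating.

    A fixed seed keeps the generated data reproducible between runs.
    """
    rng = random.Random(seed)
    rows = []
    for user_id in user_ids:
        for movie_id in rng.sample(movie_ids, per_user):
            rows.append(
                {"user_id": user_id, "movie_id": movie_id, "rating": rng.randint(1, 5)}
            )
    return rows


rows = make_dummy_ratings(user_ids=[1, 2], movie_ids=[10, 11, 12, 13], per_user=3)
```

Sampling without replacement avoids duplicate (user, movie) pairs, which would otherwise produce conflicting labels for the same movie.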
uvicorn main:app --reload
Access the Swagger UI at: http://localhost:8000/docs
✅ User Embeddings: Incorporate matrix factorization or neural embeddings to better capture user–movie relationships.
✅ Hybrid Re-Ranking: Blend collaborative signals with content-based attributes (genre, keywords, cast).
✅ Context-Aware Recommendations: Integrate time-based and situational relevance (recent activity, seasonality).
✅ Feedback Loop: Implement reinforcement learning or Bayesian updates to fine-tune model weights.
✅ Scalability: Move to distributed training (LightGBM on GPU or Dask) and serve via model registry (e.g., MLflow).
✅ CI/CD Pipeline: Add automated retraining and deployment workflows (Cron Jobs + Docker + AWS ECS).
flowchart TD
A[User Interactions] -->|events, ratings| B[(PostgreSQL DB)]
B -->|fetch metadata| C[MoviesDAO and EventsDAO]
C --> D[Feature Preparation Layer]
D --> E[Annoy Index Builder]
E --> F[Annoy Index File]
D --> G[LightGBM Trainer]
G --> H[Ranker Model File]
F & H --> I[Recommendation Service]
I --> J[FastAPI Endpoints]
J --> K[Client or Frontend]