GitHub - Magicherry/Movies-Recommendation: StreamX — A personalized recommender system with Matrix Factorization (SGD, ALS), SVD, and TextCNN, featuring end-to-end preprocessing, evaluation (MAE, RMSE, NDCG), and a full-stack web app.

StreamX - Movies Recommender System

A full-stack application implementing a custom Recommender System with a modern web interface.

Features

Robust Recommendation Engine
- Matrix Factorization: Custom implementations trained with Numba-accelerated Stochastic Gradient Descent (SGD, Option 1) and sparse CSR-block Alternating Least Squares (MF-ALS, Option 4). Both use heavy loop parallelization and dynamic early stopping to keep training fast on the 27M+ rating dataset.
- Deep Neural CF (Option 2): Hybrid deep learning model that uses a TextCNN to extract features from movie titles.
- Matrix SVD (Option 3): Closed-form SVD latent factors calibrated with Ridge or Lasso regression (option3_ridge / option3_lasso).
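For readers new to the technique, SGD-trained matrix factorization can be sketched in pure Python. This is a minimal illustration of biased MF (global mean plus user and item biases plus a dot product of latent factors), not the repository's Numba code: it omits parallelization and early stopping, and the name `train_mf_sgd` and all hyperparameter defaults are hypothetical.

```python
import random

def train_mf_sgd(ratings, n_users, n_items, k=8, lr=0.01, reg=0.05, epochs=50, seed=0):
    """Biased MF: r_hat(u, i) = mu + b_u + b_i + p_u . q_i, fitted by plain SGD (illustrative sketch)."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = [0.0] * n_users
    bi = [0.0] * n_items
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit ratings in a new random order each epoch
        for u, i, r in data:
            pred = mu + bu[u] + bi[i] + sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]  # use the old values in both updates
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(P[u][f] * Q[i][f] for f in range(k))

    return predict
```

The returned `predict(u, i)` closure scores any user–item pair. ALS differs in that it alternates closed-form least-squares solves for all user factors and then all item factors instead of taking per-rating gradient steps.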
Automated Data Processing
- Per-user random 80/20 data split for reliable training and testing.
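A per-user split holds out roughly 20% of each user's ratings rather than 20% of the global pool, so every user with enough ratings appears in both sets. The sketch below assumes ratings are `(user, item, rating)` tuples and adds a hypothetical rule that users with two or more ratings keep at least one rating on each side; the repository's actual edge-case handling may differ.

```python
import random
from collections import defaultdict

def per_user_split(ratings, test_frac=0.2, seed=42):
    """Split (user, item, rating) tuples so each user sends ~test_frac of their ratings to test."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)
    train, test = [], []
    for user in sorted(by_user):  # deterministic user order for reproducibility
        rows = by_user[user][:]
        rng.shuffle(rows)
        n_test = int(round(len(rows) * test_frac))
        if len(rows) > 1:
            # Assumption: keep at least one rating in each set.
            n_test = min(max(n_test, 1), len(rows) - 1)
        else:
            n_test = 0  # a user with a single rating stays entirely in train
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, test
```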
Comprehensive Evaluation Metrics
- Rating Prediction: MAE, RMSE
- Top-K Recommendations: Precision@10, Recall@10, F-measure@10, NDCG@10
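The metrics listed above can be computed as follows for a single user. This sketch assumes binary relevance, where a recommended item counts as relevant if it is in the `relevant` set passed in (for example, held-out items rated at or above some threshold); how the repository defines relevance and averages across users is not stated here, so treat these definitions as illustrative.

```python
import math

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def topk_metrics(recommended, relevant, k=10):
    """Precision@k, Recall@k, F-measure@k and binary-relevance NDCG@k for one user."""
    top = recommended[:k]
    hits = [1 if item in relevant else 0 for item in top]
    n_hits = sum(hits)
    precision = n_hits / k
    recall = n_hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if n_hits else 0.0
    # DCG discounts a hit at 0-based rank r by log2(r + 2).
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    ideal = sum(1 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    ndcg = dcg / ideal if ideal else 0.0
    return precision, recall, f1, ndcg
```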
Structural Data Analysis
- Distribution profiling, feature influence analysis, latent factor interpretation, and synthetic data generation.
Modern Web Interface (Next.js + Django)
- Browse library with multi-genre filtering and sorting.
- Detailed movie pages with metadata and similar movie suggestions.
- Personalized user profiles showcasing rating history and top recommendations.
- Dynamic TMDB API integration for rich image enrichment (posters and backdrops).

UX Preview

Screenshots: Home Page, Top Picks, Library, Movie Detail, Community, User Profile, Actor Detail, Settings.

Project Structure

dataset/          # Raw MovieLens data (e.g. ml-latest/)
backend/          # Django REST API
frontend/         # Next.js web application
models/           # ML model code and generated artifacts (option1, option2, option3_ridge, option3_lasso, option4, splits)
scripts/          # Training, evaluation, enrichment, and report generation
analysis/         # Final report (final_report.md), figures, and JSON/CSV artifacts

Getting Started

Quick start (after environment setup)

Once the Python venv and dependencies are installed (Step 1 below) and the model is trained (Step 2), you can start both backend and frontend with one command:

```
# Windows (PowerShell)
.\start.ps1

# macOS / Linux
./start.sh
```

This starts the Django API on port 8001 and the Next.js app on port 3001, and opens the app in your browser.

1. Python Environment Setup

Requires Python 3.11 (or a compatible 3.x version). Create and activate a virtual environment, then install dependencies:

```
# macOS / Linux
python -m venv .venv
source .venv/bin/activate

# Windows
python -m venv .venv
.venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

2. Train the Model

Train on MovieLens data under dataset/ml-latest/ (or pass --dataset-dir). Model files are written to models/artifacts/<model-type>/, and split metadata is shared in models/artifacts/.

Quick default run:

```
python -m scripts.train_and_evaluate --dataset-dir dataset/ml-latest --top-k 10
```

Model-specific quick runs:

```
python -m scripts.train_and_evaluate --model-type option1 --dataset-dir dataset/ml-latest
python -m scripts.train_and_evaluate --model-type option2 --dataset-dir dataset/ml-latest
python -m scripts.train_and_evaluate --model-type option3_ridge --dataset-dir dataset/ml-latest
python -m scripts.train_and_evaluate --model-type option3_lasso --dataset-dir dataset/ml-latest
python -m scripts.train_and_evaluate --model-type option4 --dataset-dir dataset/ml-latest
```

Detailed parameter presets and copy-ready commands are documented in TRAINING_PARAMETERS.md.

GPU note: Option 2 is PyTorch-based and automatically uses CUDA (NVIDIA) or MPS (Apple Silicon) when available. The deep-model and plotting dependencies are already listed in requirements.txt, so the install from Step 1 covers them.

The script caches a shared train/test split in models/artifacts/splits/ so all models are evaluated on the same holdout split. Use --force-resplit to regenerate.
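The caching behavior can be sketched as a load-or-create helper: reuse the split file when present, and rebuild it only when it is missing or when a flag like `--force-resplit` is passed. The JSON format, the name `load_or_create_split`, and the `make_split` callback are assumptions; the repository may store its splits differently.

```python
import json
from pathlib import Path

def load_or_create_split(cache_path, make_split, force_resplit=False):
    """Return a cached (train, test) split; call make_split() only when missing or forced.

    Rows must be JSON-serializable (e.g. lists of [user, item, rating]).
    """
    path = Path(cache_path)
    if path.exists() and not force_resplit:
        with path.open() as fh:
            cached = json.load(fh)
        return cached["train"], cached["test"]
    train, test = make_split()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as fh:
        json.dump({"train": train, "test": test}, fh)
    return train, test
```

Because every model reads the same cached file, their metrics are computed on an identical holdout set.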

3. Fetch Movie Information (Optional but Recommended)

Enrich movie records with posters, backdrops, overviews, and cast/director data from TMDB. The script reads movies.csv from dataset/ml-latest (or the directory given by --dataset-dir) and writes movies_enriched.csv beside it; if movies.csv is not found there, it falls back to the copy in models/artifacts. Run this after training (Step 2).

Get a free API key from TMDB and add it to a .env file in the project root (the repository includes a .env.example file):
```
TMDB_API_KEY=your_api_key_here
```
Run the scraper:
```
python -m scripts.scrape_tmdb
```
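A minimal sketch of the kind of request such a scraper makes, using TMDB's v3 movie-details endpoint with `append_to_response=credits` and TMDB's image CDN. It assumes you already have a TMDB id (MovieLens ships one per movie in links.csv) and uses only the standard library; the helper names are hypothetical and this is not the repository's `scripts.scrape_tmdb` code.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

TMDB_API = "https://api.themoviedb.org/3"
TMDB_IMAGES = "https://image.tmdb.org/t/p"

def read_env_file(path=".env"):
    """Tiny KEY=VALUE parser without quoting rules (python-dotenv is the usual choice)."""
    env = {}
    p = Path(path)
    if not p.exists():
        return env
    for line in p.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            env[key.strip()] = value.strip()
    return env

def movie_details_url(tmdb_id, api_key):
    # Credits are appended so cast and crew arrive in the same response.
    query = urllib.parse.urlencode({"api_key": api_key, "append_to_response": "credits"})
    return f"{TMDB_API}/movie/{tmdb_id}?{query}"

def image_url(file_path, size="w500"):
    return f"{TMDB_IMAGES}/{size}{file_path}" if file_path else None

def fetch_movie(tmdb_id, api_key, timeout=10):
    with urllib.request.urlopen(movie_details_url(tmdb_id, api_key), timeout=timeout) as resp:
        data = json.load(resp)
    credits = data.get("credits", {})
    return {
        "overview": data.get("overview"),
        "poster_url": image_url(data.get("poster_path")),
        "backdrop_url": image_url(data.get("backdrop_path"), "w1280"),
        "cast": [c["name"] for c in credits.get("cast", [])[:5]],
        "directors": [c["name"] for c in credits.get("crew", []) if c.get("job") == "Director"],
    }
```

With a key in place, `fetch_movie(862, read_env_file()["TMDB_API_KEY"])` would return the enrichment fields for one movie; only `fetch_movie` touches the network.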

4. Start the Backend API

Start the Django development server:

```
cd backend
python manage.py runserver 8001
```

⚡ Performance architecture (lazy loading): The backend loads models lazily, which keeps startup fast with the full MovieLens 27M dataset. On startup, the server only scans which model files exist on disk (near-instant) instead of loading all 5 model pickle files into memory (which would take 30+ seconds).

First request after startup: The active model (~100–200 MB) is loaded when the first API request arrives, so expect a ~15–25 second wait on that request.

All subsequent requests: Served from memory in under 5 ms. Loaded models stay cached until the server restarts.

Switching models: When you select a different model on the frontend Settings page, it is loaded on demand in the same way: the first request that uses it takes ~15–25 seconds, and later requests are served from memory.
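The lazy-loading strategy can be sketched as a small registry: scan the artifacts directory at construction time, unpickle a model only on its first `get`, and keep it cached afterwards. The `<artifacts>/<model-type>/model.pkl` layout and the class name are hypothetical, and the lock is one possible way to stop two concurrent first requests from loading the same model twice.

```python
import pickle
import threading
from pathlib import Path

class LazyModelRegistry:
    """Scan for model files at startup; unpickle a model only when it is first requested."""

    def __init__(self, artifacts_dir, filename="model.pkl"):
        root = Path(artifacts_dir)
        # Startup cost is a directory scan only: model type -> pickle path.
        self.available = {p.parent.name: p for p in sorted(root.glob(f"*/{filename}"))}
        self._cache = {}
        self._lock = threading.Lock()

    def get(self, model_type):
        if model_type in self._cache:  # fast path: already in memory
            return self._cache[model_type]
        with self._lock:  # one thread loads; concurrent callers wait, then reuse it
            if model_type not in self._cache:
                path = self.available.get(model_type)
                if path is None:
                    raise KeyError(f"no artifact for {model_type!r}")
                with path.open("rb") as fh:
                    self._cache[model_type] = pickle.load(fh)
            return self._cache[model_type]
```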

User history and rating statistics are computed on demand from pandas DataFrames rather than from pre-built Python dictionaries, which reduces memory usage for the full dataset from ~5 GB to ~500 MB.
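One way to compute history on demand is to keep a single ratings DataFrame indexed by user and slice it per request, instead of materializing a dict of per-user lists. The sketch assumes MovieLens column names (`userId`, `movieId`, `rating`, `timestamp`); the function names are hypothetical, and the memory figures quoted above are the project's, not something this snippet demonstrates.

```python
import pandas as pd

def build_ratings_frame(ratings: pd.DataFrame) -> pd.DataFrame:
    # Sort once so each user's rows are contiguous and time-ordered, then index by user.
    return ratings.sort_values(["userId", "timestamp"]).set_index("userId")

def user_history(frame: pd.DataFrame, user_id: int) -> pd.DataFrame:
    if user_id not in frame.index:
        return frame.iloc[0:0].reset_index()  # empty frame with the same columns
    return frame.loc[[user_id]].reset_index()  # list selector always yields a DataFrame

def user_stats(frame: pd.DataFrame, user_id: int) -> dict:
    hist = user_history(frame, user_id)
    return {
        "count": int(len(hist)),
        "mean": float(hist["rating"].mean()) if len(hist) else None,
    }
```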

Warnings you can safely ignore (they do not affect functionality):

```
Pandas requires version '2.10.2' or newer of 'numexpr' ...
nopython is set for njit and is ignored ...
You have 2 unapplied migration(s) ...
```

Key Endpoints:

GET /api/health — API health check
GET /api/movies — Paginated movies with search and genre filters
GET /api/movie/<id> — Movie detail and metadata
GET /api/recommend/<user_id> — Top-K recommendations for a user
GET /api/users — User list
GET /api/user/<user_id>/history — User rating history
GET /api/predict/<user_id>/<item_id> — Predicted rating for a user–item pair
GET /api/search — Full-text movie search
GET /api/stats — Database statistics
GET /api/model-config — Loaded model configuration
TMDB and scrape endpoints for image enrichment (see api/urls.py in the backend for the full list)
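A minimal standard-library client for these endpoints might look like the following. It assumes the default local base URL from step 4 and the paths exactly as listed; query-parameter names for search and filtering are not documented here, so the helper simply forwards whatever keyword arguments you pass. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8001/api"  # matches the port used when starting the backend

def endpoint(path, base_url=BASE_URL, **params):
    """Join an API path such as 'recommend/42' onto the base URL, adding any query params."""
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    return f"{url}?{urllib.parse.urlencode(params)}" if params else url

def get_json(path, base_url=BASE_URL, timeout=30, **params):
    # Generous timeout: the first request after startup may wait ~15-25 s for the model to load.
    with urllib.request.urlopen(endpoint(path, base_url, **params), timeout=timeout) as resp:
        return json.load(resp)
```

Against a running backend, `get_json("health")`, `get_json("recommend/1")`, or `get_json("predict/1/318")` map directly onto the endpoints above.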

5. Start the Frontend Application

In a new terminal, start the Next.js application:

```
cd frontend
npm install
npm run dev -- -p 3001
```

(Optional) If you need to specify a custom backend URL:

```
NEXT_PUBLIC_API_BASE_URL="http://localhost:8001/api" npm run dev -- -p 3001
```

Access the application at: http://localhost:3001

Deploy on Render

This repository includes a render.yaml blueprint for a two-service deployment:

streamx-backend (Django API, Python)
streamx-frontend (Next.js UI, Node.js)

1) Connect the repository

In Render, create a new Blueprint and point it at this repository. Render detects render.yaml and proposes both services.

2) Configure frontend API URL

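Set the frontend environment variable NEXT_PUBLIC_API_BASE_URL to your backend's public URL (Next.js inlines NEXT_PUBLIC_ variables at build time, so rebuild the frontend after changing it):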

https://<your-backend-service>.onrender.com/api

3) Model/data storage on Render

For the free-tier blueprint, the backend uses STREAMX_DATA_DIR=/tmp/streamx.

On first boot, backend/start_render.sh seeds that directory from models/artifacts/.
Runtime updates (for example active_model.txt, movies_enriched.csv, and scrape_state.json) are written there while the instance is alive.
Note: /tmp is ephemeral on free tier, so data may reset after restart/redeploy.
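The seeding step itself is done by backend/start_render.sh; the Python sketch below only illustrates a "copy if missing" seeding rule, which keeps files changed at runtime (such as active_model.txt) from being overwritten by a later seed. That rule is an assumption, not a translation of the script, and `seed_data_dir` is a hypothetical name.

```python
import os
import shutil
from pathlib import Path

def seed_data_dir(source="models/artifacts", target=None):
    """Copy artifacts into the writable data dir without overwriting files already there."""
    target = Path(target or os.environ.get("STREAMX_DATA_DIR", "/tmp/streamx"))
    src = Path(source)
    copied = []
    for item in src.rglob("*"):
        if item.is_file():
            dest = target / item.relative_to(src)
            if not dest.exists():  # preserve runtime edits from earlier in this instance's life
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(item, dest)
                copied.append(dest)
    return copied
```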

4) Required backend environment variables

SECRET_KEY (generated in blueprint by default)
DEBUG=False
ALLOWED_HOSTS=.onrender.com (or your custom domain list)
TMDB_API_KEY (optional; required only for the TMDB scraping endpoints)
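A sketch of how a Django settings module might read these variables. The parsing rules (which strings count as true, comma-separated hosts) are assumptions rather than the repository's settings. In Django, a host value with a leading dot, as in `.onrender.com`, matches that domain and all of its subdomains.

```python
import os

def env_bool(name, default=False, environ=os.environ):
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}

def env_list(name, default="", environ=os.environ):
    """Split a comma-separated variable such as ALLOWED_HOSTS into a clean list."""
    return [part.strip() for part in environ.get(name, default).split(",") if part.strip()]

def load_backend_settings(environ=os.environ):
    secret = environ.get("SECRET_KEY")
    if not secret:
        raise RuntimeError("SECRET_KEY must be set")
    return {
        "SECRET_KEY": secret,
        "DEBUG": env_bool("DEBUG", False, environ),
        "ALLOWED_HOSTS": env_list("ALLOWED_HOSTS", "", environ),
        "TMDB_API_KEY": environ.get("TMDB_API_KEY"),  # optional; None disables scraping
    }
```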

Technical Notes

The data loader supports both the CSV (e.g. ml-latest) and .dat (e.g. ml-1m) MovieLens formats.
The recommender algorithms are built from scratch and do not rely on black-box recommendation libraries.
The analysis pipeline is designed to support course-style interpretation questions, not only predictive metrics.
The UI features a responsive design, glassmorphism effects, and dynamic filtering components.
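The two MovieLens formats differ in delimiter and header: ml-latest's ratings.csv has a `userId,movieId,rating,timestamp` header, while older releases such as ml-1m use header-less, `::`-separated ratings.dat files. A minimal stdlib sketch of a loader that dispatches on the file extension follows; the function name and tuple output are assumptions, and the repository's loader may return DataFrames instead.

```python
import csv
from pathlib import Path

def load_ratings(path):
    """Read ratings.csv (header, commas) or ratings.dat ('::' separators, no header)."""
    path = Path(path)
    rows = []
    if path.suffix == ".dat":
        with path.open(encoding="latin-1") as fh:
            for line in fh:
                if line.strip():
                    user, item, rating, ts = line.strip().split("::")
                    rows.append((int(user), int(item), float(rating), int(ts)))
    else:
        with path.open(newline="", encoding="utf-8") as fh:
            for rec in csv.DictReader(fh):
                rows.append((int(rec["userId"]), int(rec["movieId"]),
                             float(rec["rating"]), int(rec["timestamp"])))
    return rows
```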

Acknowledgements

Special thanks to the open-source projects and communities that made this possible:

MovieLens for the core datasets used in model training and evaluation.
TMDB API for providing rich movie metadata and high-quality image assets.
Next.js & Django for powering the frontend and backend architectures respectively.
pandas & NumPy for efficient data manipulation and computation.

Repository contents:

FinalReport
backend
docs
frontend
models
preview
reference/movie_recommender
scripts
.env.example
.gitignore
README.md
render.yaml
requirements.txt
start.ps1
start.sh
