A content-based movie recommendation system, built as follows:
- First, I extracted tags from the dataset: director's name, actors' names, genre, and overview.
- Then I transformed the data and removed the spaces inside multi-word names, so that each name is treated as a single token.
- Next, I combined all the extracted features into a single column called "Tags".
- Then I took the 5,000 most common words and counted how often each one appears in every movie's tags, which gives a matrix of roughly 4800 x 5000 (one row per movie, one column per word).
- This text vectorization turns each movie's tags into a vector (a set of coordinates), so every movie becomes a point in a 5000-dimensional space.
- To measure how similar two movies are, we calculate the distance between their vectors.
- I used cosine distance, which is based on the angle between two vectors.
- I did not use Euclidean distance because it is not a good measure of distance between points in a high-dimensional space.
- The result is a matrix of roughly 4800 x 4800 that holds the similarity between every pair of movies.
- After enumerating and sorting the values in a movie's row of this matrix, we get the most similar movies.
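The steps above can be sketched on a toy dataset. This is a minimal illustration, not the project's notebook code: the project lists Scikit-learn for vectorization and similarity, while the sketch below uses only the Python standard library as a stand-in. The movie records and the helper names `build_tags`, `vectorize`, and `cosine_similarity` are made up for the example.

```python
import math
from collections import Counter

# Toy records standing in for rows of the dataset (hypothetical values).
movies = [
    {"title": "Space Saga", "genres": ["Science Fiction", "Action"],
     "cast": ["Ann Lee"], "director": "Sam Ray",
     "overview": "A crew travels across the galaxy"},
    {"title": "Galaxy Run", "genres": ["Science Fiction"],
     "cast": ["Ann Lee"], "director": "Joe Kim",
     "overview": "A pilot races across the galaxy"},
    {"title": "Love Letters", "genres": ["Romance"],
     "cast": ["Mia Cruz"], "director": "Sam Ray",
     "overview": "Two strangers exchange letters"},
]

def build_tags(movie):
    """Join all features into one lowercase string of tags."""
    names = movie["genres"] + movie["cast"] + [movie["director"]]
    # Remove spaces inside names so "Ann Lee" becomes the single token "AnnLee".
    collapsed = [name.replace(" ", "") for name in names]
    return " ".join(collapsed + movie["overview"].split()).lower()

def vectorize(tag_strings, max_features=5000):
    """Bag of words: one count per vocabulary word for each tag string."""
    totals = Counter(word for tags in tag_strings for word in tags.split())
    vocab = [word for word, _ in totals.most_common(max_features)]
    return [[tags.split().count(word) for word in vocab] for tags in tag_strings]

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vectors = vectorize([build_tags(m) for m in movies])
# similarity[i][j] is the similarity between movie i and movie j.
similarity = [[cosine_similarity(u, v) for v in vectors] for u in vectors]
```

In this toy data, "Space Saga" scores higher against "Galaxy Run" (shared genre, actor, and overview words) than against "Love Letters" (shared director only). Cosine similarity depends on the direction of the vectors rather than their length, so a long overview does not by itself push two movies apart.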
- ✅ Content-based movie recommendations using cosine similarity
- ✅ Movie posters from TMDB API
- ✅ YouTube trailer links
- ✅ Interactive Streamlit web interface
- ✅ Dockerized for easy deployment
- ✅ Database of 4800+ movies
- Backend: Python, Pandas, NumPy, Scikit-learn, NLTK
- Frontend: Streamlit
- API: TMDB (The Movie Database)
- Containerization: Docker, Docker Compose
- Machine Learning: Bag of Words, Cosine Similarity
- Docker and Docker Compose installed
- OR Python 3.11+ (for local development)
# Build and start the container
docker-compose up --build
# Run in detached mode
docker-compose up -d
# View logs
docker-compose logs -f
# Stop the container
docker-compose down
The app will be available at: http://localhost:8501
# Build the image
docker build -t movie-recommender .
# Run the container
docker run -p 8501:8501 movie-recommender
# Run in detached mode
docker run -d -p 8501:8501 --name movie-app movie-recommender
# Stop the container
docker stop movie-app
docker rm movie-app
# Install dependencies (local development, without Docker)
pip install -r requirements.txt
# Start the app
streamlit run app.py
The app will open automatically in your browser at http://localhost:8501
recomender_system/
├── app.py # Main Streamlit application
├── movies.pkl # Preprocessed movie data
├── similarity.pkl # Similarity matrix
├── tmdb_5000_movies.csv # Raw movie dataset
├── tmdb_5000_credits.csv # Raw credits dataset
├── recomender_system.ipynb # Model training notebook
├── Dockerfile # Docker configuration
├── docker-compose.yml # Docker Compose configuration
├── requirements.txt # Python dependencies
├── .dockerignore # Docker ignore file
└── README.md # This file
- Data Processing: Movies are processed using NLP techniques (stemming, bag of words)
- Feature Extraction: Extracts genres, keywords, cast, crew, and overview
- Vectorization: Converts text features into numerical vectors
- Similarity Calculation: Uses cosine similarity to find similar movies
- Recommendation: Returns top 5 most similar movies with posters and trailers
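The recommendation step can be sketched as a lookup in a precomputed similarity matrix. This is a minimal example under assumptions: the function name `recommend`, its parameters, and the 4 x 4 matrix are illustrative and are not taken from app.py.

```python
def recommend(title, titles, similarity, top_n=5):
    """Return the top_n titles most similar to the given title."""
    index = titles.index(title)  # raises ValueError for an unknown title
    # Pair every movie index with its similarity score to the selected movie.
    scored = list(enumerate(similarity[index]))
    # Highest score first.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    # Skip the selected movie itself, then keep the best matches.
    return [titles[i] for i, _ in ranked if i != index][:top_n]

# Hypothetical titles and similarity scores for demonstration only.
titles = ["A", "B", "C", "D"]
similarity = [
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
]

print(recommend("A", titles, similarity, top_n=2))  # ['C', 'D']
```

Filtering by index rather than dropping the first sorted entry keeps the result correct even when another movie ties with the selected one at a score of 1.0.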
- Open the app in your browser
- Select a movie from the dropdown
- Click "Show Recommendation"
- View 5 similar movies with:
  - Movie posters
  - Watch Trailer buttons (opens YouTube)
You can customize the Streamlit configuration by setting these environment variables:
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
STREAMLIT_SERVER_HEADLESS=true
STREAMLIT_BROWSER_GATHER_USAGE_STATS=false
The app uses TMDB API for fetching posters and trailers. The API key is currently hardcoded in app.py. For production:
- Get your API key from TMDB
- Replace the API key in app.py, or use environment variables
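One way to move the key out of the source code is to read it from an environment variable. The sketch below is an assumption about how that could look, not the current app.py: the variable name `TMDB_API_KEY` and the helpers `get_api_key` and `movie_details_url` are hypothetical. The URL follows the TMDB v3 movie details endpoint.

```python
import os
from urllib.parse import urlencode

def get_api_key(env_var="TMDB_API_KEY"):
    """Read the TMDB API key from the environment (variable name is an assumption)."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your TMDB API key")
    return key

def movie_details_url(movie_id, api_key):
    """Build the TMDB v3 movie details URL for one movie."""
    query = urlencode({"api_key": api_key})
    return f"https://api.themoviedb.org/3/movie/{movie_id}?{query}"
```

With Docker, the variable could be passed at run time, for example with `docker run -e TMDB_API_KEY=... -p 8501:8501 movie-recommender`, so the key never lands in the image or the repository.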
Container won't start:
# Check logs
docker-compose logs
# Rebuild without cache
docker-compose build --no-cache
docker-compose up
Port already in use:
# Change port in docker-compose.yml
ports:
  - "8502:8501" # Use 8502 instead
Some movies might not have posters or trailers due to:
- Movie ID mismatch with TMDB database
- Movie removed from TMDB
- Network issues
- API rate limits
The recommendations will still work correctly.
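A fallback like the following is one way to keep the page rendering when a poster lookup fails. It is a sketch under assumptions rather than the code in app.py: `poster_url`, the `fetch_details` callable, and the placeholder image address are hypothetical. The `poster_path` field and the `image.tmdb.org` base URL follow TMDB's conventions.

```python
# Hypothetical placeholder image; replace with any image the app can serve.
PLACEHOLDER_POSTER = "https://example.com/no-poster.png"
# TMDB image base URL with the w500 size; poster_path values start with "/".
POSTER_BASE_URL = "https://image.tmdb.org/t/p/w500"

def poster_url(movie_id, fetch_details):
    """Return a poster URL, or the placeholder if the lookup fails.

    fetch_details is any callable that takes a movie ID and returns the
    decoded JSON of a TMDB movie details response as a dict.
    """
    try:
        details = fetch_details(movie_id)
    except Exception:
        # Network errors, rate limits, unknown IDs: fall back quietly.
        return PLACEHOLDER_POSTER
    poster_path = (details or {}).get("poster_path")
    if not poster_path:
        # The movie exists on TMDB but has no poster.
        return PLACEHOLDER_POSTER
    return POSTER_BASE_URL + poster_path
```

Passing the fetcher in as a parameter keeps the fallback logic testable without network access. Catching every exception is acceptable for a sketch, but a real app would likely log the error as well.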
- Source: TMDB 5000 Movie Dataset
- Movies: ~4800
- Features: genres, keywords, cast, crew, overview
- User authentication
- Save favorite movies
- Collaborative filtering
- Deploy to cloud (AWS/Azure/GCP)
- Add movie ratings and reviews
- Real-time search with autocomplete
- Mobile responsive design
Created with passion by Shrish Mishra
⭐ If you find this project useful, please consider giving it a star!