A music recommendation engine that combines collaborative filtering with content-based filtering to suggest songs, using playlist co-occurrence and lyrical content.
- Hybrid Recommendation Engine: Combines user behavior patterns with song content analysis
- Memory Efficient: Stores similarity matrices in sparse form and caches them on disk, so they are computed once and loaded on later runs
- Spotify Dataset Integration: Built on a sample of the Spotify Million Playlist Dataset
- Lyrics Analysis: Uses TF-IDF vectorization to find songs with similar themes
- Customizable Parameters: Adjustable alpha values to balance different recommendation approaches
- Analyzes patterns in the Spotify Million Playlist Dataset
- Finds songs that frequently appear together in the same playlists
- Uses cosine similarity between tracks based on playlist co-occurrence
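As a rough illustration of the co-occurrence idea above, the sketch below treats each track as a binary vector over playlists and computes cosine similarity between pairs of tracks. It is dependency-free on purpose and is not the code in hybrid_recommender.py; the function name `cooccurrence_cosine` and the toy track names are assumptions made for this example.

```python
import math
from collections import defaultdict
from itertools import combinations


def cooccurrence_cosine(playlists):
    """Cosine similarity between tracks, where each track is a binary
    vector over playlists (1 if the playlist contains the track)."""
    track_counts = defaultdict(int)  # playlists containing each track
    pair_counts = defaultdict(int)   # playlists containing both tracks of a pair

    for playlist in playlists:
        tracks = sorted(set(playlist))  # ignore duplicates within a playlist
        for track in tracks:
            track_counts[track] += 1
        for a, b in combinations(tracks, 2):
            pair_counts[(a, b)] += 1

    # For binary vectors: dot product = shared playlists,
    # vector norm = sqrt(number of playlists containing the track).
    return {
        pair: shared / math.sqrt(track_counts[pair[0]] * track_counts[pair[1]])
        for pair, shared in pair_counts.items()
    }


# Hypothetical toy data: three small playlists.
playlists = [
    ["track_a", "track_b", "track_c"],
    ["track_a", "track_b"],
    ["track_c", "track_d"],
]
similarities = cooccurrence_cosine(playlists)
```

Here `track_a` and `track_b` always appear together, so their similarity is 1.0, while tracks that never share a playlist get no entry at all.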
- Analyzes song lyrics using TF-IDF vectorization
- Finds songs with similar lyrical themes and emotional content
- Uses cosine similarity between TF-IDF vectors
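To make the lyrics comparison concrete, here is a minimal, dependency-free sketch of TF-IDF weighting followed by cosine similarity. It uses one common smoothed IDF variant and whitespace tokenization; the project itself may use a library vectorizer with different settings. The helper names `tfidf_vectors` and `cosine` and the sample lyrics are invented for this example.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document (term count x smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Invented toy lyrics, not taken from the dataset.
lyrics = [
    "tears fall down my broken heart",
    "my heart is broken and the tears fall",
    "dance all night under neon lights",
]
vectors = tfidf_vectors(lyrics)
sad_vs_sad = cosine(vectors[0], vectors[1])
sad_vs_dance = cosine(vectors[0], vectors[2])
```

The two breakup-style lines share several terms and score above zero, while the first and third lines share no terms and score exactly 0.0.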
- Combines both approaches using a weighted average
- Alpha parameter controls the balance:
  - α = 1.0: Pure collaborative filtering
  - α = 0.0: Pure content-based filtering
  - α = 0.6: 60% CF + 40% content-based
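The weighted average described above can be sketched as a single function. This assumes both scores are on a comparable scale (for example, cosine similarities between 0 and 1); `hybrid_score` is a hypothetical name used for illustration and is not necessarily how hybrid_recommender.py organizes the calculation.

```python
def hybrid_score(cf_score, content_score, alpha=0.6):
    """Blend a collaborative-filtering score with a content-based score.

    alpha is the weight of the collaborative-filtering score;
    (1 - alpha) is the weight of the content-based score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return alpha * cf_score + (1.0 - alpha) * content_score


# Hypothetical scores for one candidate track.
cf_score, content_score = 0.8, 0.3

pure_cf = hybrid_score(cf_score, content_score, alpha=1.0)       # CF score only
pure_content = hybrid_score(cf_score, content_score, alpha=0.0)  # content score only
blended = hybrid_score(cf_score, content_score, alpha=0.6)       # 60% CF + 40% content
```

With these inputs the blended score is roughly 0.6 (0.6 × 0.8 + 0.4 × 0.3), which sits between the two pure scores as expected.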
# Required Python packages
pip install -r requirements.txt
# First run (computes and saves similarity matrices)
python hybrid_recommender.py
# Force recomputation if needed
python hybrid_recommender.py --recompute
spotify_million_playlist_dataset_challenge/
├── hybrid_recommender.py # Main recommendation system (STANDALONE!)
├── geniusAPI.py # Optional API integration (NOT NEEDED!)
├── similarity_matrices_*.pkl # Cached similarity matrices (auto-generated)
├── requirements.txt # Python dependencies
├── playlists.json # Playlist data
└── README.md # This file
from hybrid_recommender import main, get_recommendations_by_uris, interactive_demo
# Load the system
cf_sim, content_sim, unique_tracks = main()
# Your playlist URIs
my_playlist_uris = [
"spotify:track:1234567890abcdef",
"spotify:track:abcdef1234567890"
]
# Get recommendations
recommendations = get_recommendations_by_uris(
my_playlist_uris, cf_sim, content_sim, unique_tracks,
n_recommendations=20, alpha=0.6
)
# Display results
for idx, row in recommendations.iterrows():
    print(f"{idx+1}. {row['track_name']} by {row['artist_name']} (Score: {row['score']:.4f})")
# Run the interactive demo
cf_sim, content_sim, unique_tracks = main()
interactive_demo(cf_sim, content_sim, unique_tracks)
The system includes a built-in interactive demo that lets you:
- Try sample playlists (rock, pop, mixed)
- Create playlists from track names (e.g., "Bohemian Rhapsody|Queen")
- Use custom Spotify track URIs
- Adjust recommendation parameters (alpha, number of recommendations)
- Music Discovery: Find new songs similar to your favorites
- Playlist Creation: Generate themed playlists automatically
- Mood Matching: Find songs with similar lyrical themes
- Genre Exploration: Discover music in your preferred styles
- Research: Analyze music recommendation algorithms
- "I want sad breakup songs" → Find lyrically similar emotional tracks
- "I love this indie playlist" → Discover more indie artists
- "I want workout music" → Find high-energy tracks with similar themes
pip install -r requirements.txt
- Reduce the n_nearest_neighbors parameter (default: 100)
- Use --recompute to clear cached matrices
- Ensure you have at least 4GB RAM available
- First run is always slow (computing matrices)
- Subsequent runs should be fast (loading from disk)
- Check if similarity matrices exist in your directory
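The slow-first-run, fast-later-runs behavior follows from a compute-once cache. The sketch below shows that general pattern with `pickle`: load the cached file if it exists, otherwise compute and save. The function `load_or_compute`, its `recompute` argument, and the file name are assumptions for illustration; they mirror the `--recompute` flag and the `.pkl` files mentioned in this README but are not the project's actual code.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(cache_path, compute_fn, recompute=False):
    """Return the cached result if present; otherwise compute and cache it."""
    cache_path = Path(cache_path)
    if cache_path.exists() and not recompute:
        with cache_path.open("rb") as handle:
            return pickle.load(handle)  # fast path: load from disk
    result = compute_fn()               # slow path: compute from scratch
    with cache_path.open("wb") as handle:
        pickle.dump(result, handle)
    return result


calls = []  # records how many times the expensive step actually runs


def expensive_similarity():
    """Stand-in for the real similarity computation."""
    calls.append(1)
    return {("track_a", "track_b"): 1.0}


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "similarity_matrices_demo.pkl"
    first = load_or_compute(cache_file, expensive_similarity)    # computes and saves
    second = load_or_compute(cache_file, expensive_similarity)   # loads from cache
    forced = load_or_compute(cache_file, expensive_similarity, recompute=True)
```

In this demo the expensive function runs twice, once for the first call and once for the forced recomputation, while the second call is served from disk. Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.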
- Nearest Neighbors: KNN with cosine similarity
- TF-IDF Vectorization: Text processing for lyrics
- Sparse Matrices: Memory-efficient similarity storage
- Hybrid Scoring: Weighted combination of CF and content scores
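One way to read the nearest-neighbors and sparse-storage items together is that only the strongest similarities per track are kept, which bounds memory use. The following is a small sketch of that idea using plain dictionaries; `top_k_neighbors` and the sample scores are hypothetical, and the project may instead rely on a library KNN implementation and sparse matrix types.

```python
import heapq


def top_k_neighbors(similarity_rows, k):
    """Keep only the k most similar tracks for each track."""
    return {
        track: dict(heapq.nlargest(k, neighbors.items(), key=lambda item: item[1]))
        for track, neighbors in similarity_rows.items()
    }


# Hypothetical dense similarity rows for two tracks.
similarity_rows = {
    "track_a": {"track_b": 0.9, "track_c": 0.4, "track_d": 0.1},
    "track_b": {"track_a": 0.9, "track_c": 0.2, "track_d": 0.7},
}
sparse = top_k_neighbors(similarity_rows, k=2)
```

Lowering `k` (analogous to reducing n_nearest_neighbors) stores fewer entries per track at the cost of discarding weaker similarities.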
- Lyrics Cleaning: Removes metadata, contributors, translations
- Text Normalization: Converts to lowercase, removes stop words
- Similarity Computation: Efficient sparse matrix operations
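The cleaning and normalization steps above might look roughly like the sketch below. The specific rules are assumptions: it drops lines that mention contributors or start with "translations", removes bracketed section tags such as [Chorus], lowercases the text, and filters a deliberately tiny stop-word list. The real rules, and the stop-word list the project uses, may differ.

```python
import re

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "i", "you", "my", "is", "in", "of", "to"}


def clean_lyrics(raw):
    """Return a list of normalized lyric tokens with metadata removed."""
    kept_lines = []
    for line in raw.splitlines():
        lowered = line.strip().lower()
        # Assumed metadata patterns: contributor counts and translation headers.
        if "contributors" in lowered or lowered.startswith("translations"):
            continue
        kept_lines.append(lowered)
    # Remove bracketed section tags such as [verse 1] or [chorus].
    text = re.sub(r"\[[^\]]*\]", " ", "\n".join(kept_lines))
    tokens = re.findall(r"[a-z']+", text)
    return [token for token in tokens if token not in STOP_WORDS]


# Invented sample input in the style of scraped lyrics.
raw = (
    "12 Contributors\n"
    "Translations\n"
    "[Verse 1]\n"
    "The tears fall in the rain\n"
    "[Chorus]\n"
    "And I miss you"
)
tokens = clean_lyrics(raw)
```

For this sample, the metadata lines and section tags are dropped and the remaining tokens are `tears`, `fall`, `rain`, and `miss`.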
- Source: Spotify Million Playlist Dataset
- Tracks: 66,000+ unique songs
- Coverage: Multiple genres, languages, and time periods
Happy recommending! 🎵
Built with ❤️ for music discovery and machine learning enthusiasts.