Spotify Web Scraper crawls through the Spotify web interface and extracts artist logo information.
- Beautiful Soup - Parse HTML code
- PyMongo - Python MongoDB Driver
- Selenium - Browser Automation Tool
Install Dependencies with pip
pip install beautifulsoup4
pip install pymongo
pip install selenium
Alternatively, you can install all of the dependencies at once from requirement.txt
pip install -r requirement.txt
The scraper looks for a config.py file at the project root with the following parameters for connecting to your database
# config.py
DATABASE_CONFIG = {
    'host': 'mongodb://{0}:{1}@MONGODB_HOST/DB_COLLECTION_NAME',
    'dbuser': '',  # Database user
    'dbuserpassword': '',  # Database password
}

Since Spotify webpages are generated dynamically, a headless browser is needed to render them before the HTML can be parsed. I have chosen the Chrome WebDriver, which works well for this purpose.
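Going back to config.py: the {0} and {1} placeholders in the host string suggest that the user and password are substituted in with str.format. The sketch below shows one way that could be done. It is an assumption, not the scraper's actual code: build_mongo_uri, connect, and the sample credentials are hypothetical. The credentials are percent-escaped with quote_plus because characters such as @ or / would otherwise break the connection string.

```python
from urllib.parse import quote_plus

# Same shape as DATABASE_CONFIG in config.py; the values are made-up samples.
DATABASE_CONFIG = {
    'host': 'mongodb://{0}:{1}@MONGODB_HOST/DB_COLLECTION_NAME',
    'dbuser': 'scraper',
    'dbuserpassword': 'p@ss/word',
}


def build_mongo_uri(cfg):
    """Fill the {0}/{1} placeholders with the escaped user and password."""
    user = quote_plus(cfg['dbuser'])
    password = quote_plus(cfg['dbuserpassword'])
    return cfg['host'].format(user, password)


def connect(cfg):
    """Return a MongoClient for the configured database (hypothetical helper)."""
    # Imported here so the URI helper above works without PyMongo installed.
    from pymongo import MongoClient
    return MongoClient(build_mongo_uri(cfg))
```

In a real project the dictionary would be imported with "from config import DATABASE_CONFIG" rather than defined inline, and MONGODB_HOST and DB_COLLECTION_NAME would be replaced with your own values.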
Download the Chrome WebDriver and save it at the root level of the project.
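As a rough illustration of how the pieces fit together, the sketch below renders a page in headless Chrome with Selenium and then hands the resulting HTML to Beautiful Soup. Several things here are assumptions rather than the scraper's real code: the function names are hypothetical, the driver is assumed to be saved as ./chromedriver, the Service class assumes Selenium 4, and collecting every img tag is only a stand-in for whatever elements the scraper actually targets.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, driver_path="./chromedriver"):
    """Load url in headless Chrome and return the HTML after scripts have run."""
    # Imported here so the parsing helper below can be used without a browser.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.add_argument("--headless")  # run Chrome without a visible window
    driver = webdriver.Chrome(
        service=Service(executable_path=driver_path), options=options
    )
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process


def extract_image_urls(html):
    """Return the src attribute of every <img> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [img["src"] for img in soup.find_all("img") if img.get("src")]
```

A call such as extract_image_urls(fetch_rendered_html(url)) would then return the image URLs found on the rendered page. Content that loads late may not be in page_source immediately after driver.get returns, so a real scraper may need an explicit wait before reading it.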