A Python project for scraping song lyrics from AZLyrics and performing basic text analysis on the collected data. This project includes data collection, cleaning, tokenization, and simple statistics about the lyrics of different artists.
Note: This repository supported the research and writing of two personal blog posts analyzing differences in lyrical style between artists from different genres:
- Exploratory analysis of lyrics: https://novamind.github.io/exploratory-analysis-of-lyrics.html
- Eminem vs Miley Cyrus. NLP: https://novamind.github.io/eminem-vs-miley-cyrus-nlp.html
Song lyrics are rich in emotion, storytelling, and linguistic patterns. This project demonstrates how to:
- Scrape song lyrics from AZLyrics using BeautifulSoup.
- Collect metadata like album info and song details.
- Tokenize and analyze lyrics to understand patterns in language.
- Compare lyrics across different genres (pop vs rap).
We focus on Eminem (rap) and Miley Cyrus (pop) as case studies.
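The scraping step can be sketched roughly as follows. This is a minimal illustration, not the repository's actual scraper: the markup in `SAMPLE_HTML`, the class names (`song-title`, `album`, `lyrics`), and the `parse_song_page` helper are all hypothetical and do not reflect the real AZLyrics page structure. The example parses an inline HTML string so it runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical page markup (NOT the real AZLyrics layout), inlined so the
# example runs offline.
SAMPLE_HTML = """
<html><body>
  <h1 class="song-title">Example Song</h1>
  <div class="album">Example Album (2010)</div>
  <div class="lyrics">First line of the song<br>Second line of the song</div>
</body></html>
"""


def parse_song_page(html):
    """Extract title, album, and lyrics from one song page (assumed markup)."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1", class_="song-title").get_text(strip=True)
    album = soup.find("div", class_="album").get_text(strip=True)
    # Join the text fragments around <br> tags with newlines so each
    # lyric line stays on its own line.
    lyrics = soup.find("div", class_="lyrics").get_text(separator="\n", strip=True)
    return {"title": title, "album": album, "lyrics": lyrics}


song = parse_song_page(SAMPLE_HTML)
print(song["title"])
print(song["lyrics"])
```

In a real run, the HTML would come from an HTTP request to the song page, and the selectors would need to match whatever markup the site serves at that time.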
- Web scraping: Pull song lyrics and metadata (title, album, featured artists).
- Text preprocessing: Clean lyrics and tokenize words.
- Basic statistics:
  - Total number of songs scraped
  - Vocabulary size (unique words)
  - Token frequency
  - Most common and rare words
- Sentiment analysis
- Topic modeling
- Visualizations (word clouds, token frequency plots)
- Batch scraping of multiple artists
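
The text preprocessing and basic statistics listed above (song count, vocabulary size, token frequency, most common and rare words) can be sketched with the standard library alone. The function names `tokenize` and `basic_stats`, the regex-based tokenizer, and the sample lyrics are assumptions made for illustration; the project itself may use a different tokenizer or data layout.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters, allowing inner apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def basic_stats(lyrics_by_song):
    """Compute simple corpus statistics from a list of lyric strings."""
    tokens = [tok for lyrics in lyrics_by_song for tok in tokenize(lyrics)]
    freq = Counter(tokens)
    return {
        "songs": len(lyrics_by_song),
        "tokens": len(tokens),
        "vocabulary": len(freq),  # number of unique words
        "most_common": freq.most_common(3),
        # "Rare" is taken here to mean words that occur exactly once.
        "rare": sorted(w for w, c in freq.items() if c == 1),
    }


# Made-up sample lyrics, one string per song.
sample = ["Hello hello, it's me", "It's me again, hello"]
stats = basic_stats(sample)
print(stats)
```

Treating words that occur exactly once as "rare" is one simple convention; a frequency threshold relative to corpus size would work as well for larger collections.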