A project where I scraped data from an online book website and analyzed it using Python, SQL, and data visualization tools to uncover key trends and insights.
This project demonstrates the complete data pipeline, from scraping raw book data to extracting insights using SQL and Python. It includes:
- Scraping book titles, prices, availability, ratings, and categories
- Cleaning and structuring data using Python (Pandas)
- Loading data into a SQLite/PostgreSQL database
- Performing SQL queries to extract meaningful insights
- Visualizing data patterns with Matplotlib/Seaborn/Plotly
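
The cleaning and loading steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses only the standard library (`re`, `sqlite3`) so it runs without extra installs, whereas the project itself uses Pandas for cleaning. The raw field formats (a currency-prefixed price string, a rating spelled out as a word), the `books` table schema, and the helper names `clean_record` and `load_books` are all assumptions made for the example.

```python
import re
import sqlite3

# Assumption: ratings are scraped as words ("Three") and need mapping to integers.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def clean_record(raw):
    """Convert one raw scraped record (all strings) into typed fields."""
    price_match = re.search(r"\d+(?:\.\d+)?", raw["price"])
    return {
        "title": raw["title"].strip(),
        # Strip the currency symbol and keep the numeric part.
        "price": float(price_match.group()) if price_match else None,
        # 1 if the availability text says "in stock", else 0.
        "in_stock": int("in stock" in raw["availability"].lower()),
        # Unknown rating words become None rather than raising.
        "rating": RATING_WORDS.get(raw["rating"].strip().title()),
        "category": raw["category"].strip(),
    }


def load_books(conn, records):
    """Create the (hypothetical) books table and insert cleaned records."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS books (
               title TEXT, price REAL, in_stock INTEGER,
               rating INTEGER, category TEXT)"""
    )
    conn.executemany(
        "INSERT INTO books VALUES (:title, :price, :in_stock, :rating, :category)",
        records,
    )
    conn.commit()


# Made-up sample rows standing in for scraper output.
raw_rows = [
    {"title": " Sample Book A ", "price": "£51.77",
     "availability": "In stock (22 available)", "rating": "Three",
     "category": "Poetry"},
    {"title": "Sample Book B", "price": "£13.99",
     "availability": "Out of stock", "rating": "five",
     "category": "Fiction"},
]

conn = sqlite3.connect(":memory:")  # swap for a file path to persist the data
load_books(conn, [clean_record(r) for r in raw_rows])
```

Storing availability as a 0/1 integer and ratings as numbers keeps the later SQL aggregations simple; a PostgreSQL setup would follow the same shape with a different driver.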
| Tool | Purpose |
|---|---|
| Python | Core scripting and data analysis |
| BeautifulSoup / Requests | Web scraping |
| Pandas | Data cleaning & manipulation |
| SQLite or PostgreSQL | Data storage & SQL queries |
| Matplotlib / Seaborn | Data visualization |
| Jupyter Notebook | Project documentation |
Here are a few insights extracted:
- Average book price across all categories
- Most common book categories
- Distribution of ratings
- Out-of-stock vs in-stock books
- Category-wise pricing trends
(More insights are available in the analysis notebook.)
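
For reference, queries along these lines could produce the insights listed above. This is a hedged sketch rather than the notebook's actual SQL: the `books` table, its column names, and the three sample rows are invented for illustration and are not results from the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and made-up rows, only so the queries have data to run on.
conn.execute(
    "CREATE TABLE books (title TEXT, price REAL, in_stock INTEGER, "
    "rating INTEGER, category TEXT)"
)
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?, ?)",
    [
        ("Book A", 10.0, 1, 3, "Fiction"),
        ("Book B", 20.0, 0, 5, "Fiction"),
        ("Book C", 30.0, 1, 3, "Poetry"),
    ],
)

# One query per insight in the list above.
QUERIES = {
    "average_price":
        "SELECT ROUND(AVG(price), 2) FROM books",
    "top_categories":
        "SELECT category, COUNT(*) AS n FROM books "
        "GROUP BY category ORDER BY n DESC, category",
    "rating_distribution":
        "SELECT rating, COUNT(*) FROM books GROUP BY rating ORDER BY rating",
    "stock_status":
        "SELECT in_stock, COUNT(*) FROM books GROUP BY in_stock ORDER BY in_stock",
    "price_by_category":
        "SELECT category, ROUND(AVG(price), 2) FROM books "
        "GROUP BY category ORDER BY category",
}

results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

Each result is a plain list of tuples, so it can be passed straight to `pandas.DataFrame(...)` for plotting with Matplotlib or Seaborn.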