Role: Team Member (1 of 3)
Focus Areas: Web Scraping · SQL Insights · Exploratory Data Analysis (EDA)
This build-week project (Masai School) focused on scraping book data from the web, analyzing it using SQL, and deriving insights through Exploratory Data Analysis (EDA).
- Python (web scraping, data cleaning, EDA via Jupyter Notebook)
- SQL (data storage and querying: `.sql` scripts)
- CSV (intermediate storage)
- Jupyter Notebook (`.ipynb`)
- Optional: PowerPoint (`.pptx`) for presentations
- Web Scraping: Used Python to extract book details (title, price, ratings, etc.) into `book_data.csv`.
- Data Cleaning: Handled missing values, duplicates, and formatting errors.
- SQL Analysis: Used `.sql` scripts (`BooksData_Insights.sql`) to derive insights (e.g., availability, expensive books, etc.).
- EDA & Visualization: Conducted in a notebook (`EDA.ipynb`), which includes charts such as histograms and boxplots, plus summary tables.
- Presentation: Summarized findings in a PowerPoint (`Web-Scraping-SQL-Insights-and-EDA.pptx`).
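
The cleaning and SQL steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the inline rows, the column names (`title`, `price`, `rating`, `availability`), and the helper names `clean_rows` and `load_and_query` are all assumptions, and an in-memory SQLite database stands in for the MySQL setup used with `BooksData_Insights.sql`.

```python
import csv
import io
import sqlite3

# Hypothetical raw rows standing in for book_data.csv (column names are assumed).
RAW_CSV = """title,price,rating,availability
Book A,£51.77,3,In stock
Book B,£12.50,1,In stock
Book B,£12.50,1,In stock
Book C,,5,In stock
Book D,£35.02,4,Out of stock
"""


def clean_rows(text):
    """Drop exact duplicates and rows with a missing price; convert types."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["title"], row["price"], row["rating"], row["availability"])
        if key in seen or not row["price"]:
            continue  # skip duplicates and missing prices
        seen.add(key)
        cleaned.append((
            row["title"].strip(),
            float(row["price"].lstrip("£")),  # "£51.77" -> 51.77
            int(row["rating"]),
            row["availability"].strip(),
        ))
    return cleaned


def load_and_query(rows, min_price):
    """Load cleaned rows into SQLite and list books priced above min_price."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE books "
        "(title TEXT, price REAL, rating INTEGER, availability TEXT)"
    )
    conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT title, price FROM books WHERE price > ? ORDER BY price DESC",
        (min_price,),
    ).fetchall()
    conn.close()
    return result


rows = clean_rows(RAW_CSV)
expensive = load_and_query(rows, 30.0)
print(expensive)
```

With the real data, the same pattern would read from `book_data.csv` instead of the inline string, and the query would run against whatever database the SQL scripts target.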
- Price Distribution: Majority of books are priced under £30.
- Ratings: Most books have 1–3 star ratings.
- Availability: Most books are fully available.
- Price vs Rating: Higher-rated books (4–5 stars) generally have higher prices, suggesting a positive association between rating and price.
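
One way to check a price-versus-rating claim like the one above is to compare mean price per rating and compute a Pearson correlation. The sketch below uses toy values that are purely illustrative and are not the project's data; the function names `mean_price_by_rating` and `pearson` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Toy (rating, price) pairs for illustration only; NOT the project's data.
SAMPLE = [(1, 12.0), (2, 18.5), (3, 22.0), (3, 25.0),
          (4, 41.0), (5, 47.5), (5, 52.5)]


def mean_price_by_rating(pairs):
    """Group prices by star rating and average each group."""
    groups = defaultdict(list)
    for rating, price in pairs:
        groups[rating].append(price)
    return {rating: mean(prices) for rating, prices in sorted(groups.items())}


def pearson(pairs):
    """Pearson correlation coefficient between rating and price."""
    xs = [rating for rating, _ in pairs]
    ys = [price for _, price in pairs]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


by_rating = mean_price_by_rating(SAMPLE)
r = pearson(SAMPLE)
print(by_rating)
print(round(r, 3))
```

A coefficient near 1 would support a positive association between rating and price; on its own it says nothing about why the two move together.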
| Name | Role |
|---|---|
| Tanmay Manna | Team Lead — Web Scraping |
| Diya Shah | SQL Insights |
| Prince Raj Gupta | Exploratory Data Analysis (EDA) |
# Clone repo
git clone https://github.com/princerg/webscraping-sql-eda.git
cd webscraping-sql-eda
# Install dependencies
pip install -r requirements.txt
# Run Jupyter Notebook
jupyter notebook notebooks/EDA.ipynb
# Execute SQL insights
# Open BooksData_Insights.sql in MySQL Workbench and run queries