🕷️ WebScraper

A simple full-stack web scraping application built with Node.js, Express, and Puppeteer.
It scrapes data from JavaScript-rendered websites and displays it in a clean frontend UI.

🚀 Features

Scrapes JavaScript-rendered (SPA / React / Vue) websites using Puppeteer
Backend API built with Express.js
Frontend fetches scraped data and displays it
Adapts to other websites by changing the target URL and selectors
CORS enabled for frontend-backend communication

📁 Project Structure

webscraper/
│
├── index.js        # Backend scraper API
├── package.json
│
├── index.html      # Frontend UI
├── app.js          # Frontend fetch logic
└── README.md

⚙️ Installation

Clone the repository:

git clone https://github.com/tanmoyPaul12/webscraper.git
cd webscraper

Install dependencies:

npm install

▶️ Run Backend

node index.js

The backend serves the scraped results at:

http://localhost:8000/results
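
As a rough illustration of how a `/results` route like this could be put together, here is a minimal sketch of an Express-style handler. The names `makeResultsHandler` and `scrape` are hypothetical and not taken from this repository's `index.js`; the wiring in the trailing comment assumes the `express` and `cors` packages are installed.

```javascript
// Builds an Express-style handler for GET /results.
// `scrape` is a hypothetical async function that returns the scraped items.
function makeResultsHandler(scrape) {
  return async (req, res) => {
    try {
      // Send whatever the scraper returned as a JSON response.
      res.json(await scrape());
    } catch (err) {
      // Report scraping failures (timeouts, missing selectors) as HTTP 500.
      res.status(500).json({ error: err.message });
    }
  };
}

// Possible wiring (assumption, not the repository's actual code):
//   const app = require('express')();
//   app.use(require('cors')());          // allow requests from the frontend origin
//   app.get('/results', makeResultsHandler(scrape));
//   app.listen(8000);
```

Keeping the scraper behind a function argument means the route can be exercised with a stub, without launching a browser.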

🌐 Run Frontend

Open index.html using Live Server or any local server:

http://127.0.0.1:5500/index.html

The frontend fetches data from the backend and displays it.
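
The fetch-and-display step could look roughly like the sketch below. It is not the repository's actual `app.js`: the function names, the `#results` element id, and the assumption that the API returns an array of strings are all illustrative.

```javascript
// Escapes text so scraped content cannot inject markup into the page.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Turns an array of scraped strings into <li> markup.
function renderItems(items) {
  return items.map((item) => `<li>${escapeHtml(item)}</li>`).join('');
}

// Fetches the results from the backend and writes them into a list element.
// Assumes the page contains an element with id "results" (hypothetical).
async function loadResults(apiUrl = 'http://localhost:8000/results') {
  const response = await fetch(apiUrl);
  if (!response.ok) {
    throw new Error(`Request failed: ${response.status}`);
  }
  const items = await response.json();
  document.querySelector('#results').innerHTML = renderItems(items);
}
```

Escaping matters here because the text comes from third-party pages and is inserted through `innerHTML`.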

📊 What It Scrapes

The scraper is not tied to a single site: it can extract data from other JavaScript-rendered websites as well.

In this project, my own portfolio website was used only for testing and demonstration, but the scraper can easily be adapted to scrape:

Blogs
News sites
Product pages
Profile pages
Any SPA / React / Vue / JS-rendered site

By changing the target URL and selectors, you can scrape different websites without changing the core logic.
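
One way to keep the core logic independent of the site is to pass the URL and selector in as data. The sketch below assumes a hypothetical `scrapeTarget` helper and a `{ url, selector }` target shape; neither is taken from this repository. It only uses Puppeteer calls from the public API (`newPage`, `goto`, `waitForSelector`, `$$eval`, `close`).

```javascript
// Hypothetical helper: `browser` is a Puppeteer Browser instance, e.g. from
// `await require('puppeteer').launch()`. The target shape is an assumption.
async function scrapeTarget(browser, target) {
  const page = await browser.newPage();
  try {
    // Wait until network activity settles so client-side rendering can finish.
    await page.goto(target.url, { waitUntil: 'networkidle2' });
    // Make sure at least one matching element exists before reading it.
    await page.waitForSelector(target.selector);
    // Runs inside the page: collect the trimmed text of every matching element.
    return await page.$$eval(target.selector, (elements) =>
      elements.map((el) => el.textContent.trim())
    );
  } finally {
    // Always release the tab, even if navigation or the selector wait fails.
    await page.close();
  }
}

// Example targets (illustrative values only):
//   { url: 'https://example.com/blog', selector: 'article h2' }
//   { url: 'https://example.com/shop', selector: '.product-title' }
```

Switching sites then means changing only the target object passed in.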

🧠 Why Puppeteer?

The target websites are JavaScript-rendered, so HTML-parsing libraries like Cheerio, which do not execute JavaScript, cannot see content that is generated in the browser.
Puppeteer loads the page in a real headless browser and extracts data after rendering.

🔐 Legal Notice

This project is for educational purposes only.
Always check a website's terms of service and robots.txt before scraping.
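
A robots.txt check can be partly automated. The sketch below is a deliberately simplified illustration and not part of this project: the function names are hypothetical, and it only honours `Disallow` prefix rules in the `User-agent: *` group. `Allow` rules, wildcards, and `Crawl-delay` are ignored, so treat it as a first filter rather than a full parser.

```javascript
// Collects the Disallow prefixes that apply to all crawlers ("User-agent: *").
function disallowedPaths(robotsTxt) {
  const rules = [];
  let agents = [];
  let inRules = false; // true once the current group has started listing rules
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split('#')[0].trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      // A User-agent line after rules starts a new group.
      if (inRules) {
        agents = [];
        inRules = false;
      }
      agents.push(value.toLowerCase());
    } else if (field === 'disallow') {
      inRules = true;
      // An empty Disallow value means "nothing is disallowed".
      if (agents.includes('*') && value !== '') rules.push(value);
    } else if (field === 'allow') {
      inRules = true; // recognised as a rule line but otherwise ignored here
    }
  }
  return rules;
}

// Returns false when the path starts with any disallowed prefix.
function isPathAllowed(robotsTxt, path) {
  return !disallowedPaths(robotsTxt).some((prefix) => path.startsWith(prefix));
}
```

For example, with a robots.txt that contains `Disallow: /admin` under `User-agent: *`, `isPathAllowed(text, '/admin/users')` returns `false` while `isPathAllowed(text, '/blog')` returns `true`.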

📌 Future Improvements

Add multi-page crawling
Save scraped data to a JSON file or a database
Add refresh button
Add loading animation
Add export feature
