WebScraper Pro 🕸️


A configurable Python web scraping tool that extracts structured data from multiple webpages and exports the results to CSV.
Built for automation, data collection, and Upwork-style client projects.


✨ Features

  • Scrapes multiple pages using a URL pattern with {page}
  • Fully configurable via JSON (no code changes needed)
  • Extracts data using CSS selectors (quotes, authors, tags, or any other fields)
  • Saves clean structured data to CSV
  • Logs scraping progress to logs/scraper.log
  • Easy CLI interface for clients and non-technical users

🧱 Project Structure

webscraper_pro/
├─ README.md
├─ LICENSE
├─ requirements.txt
├─ .gitignore
├─ data/
│  ├─ sample_urls.txt
│  └─ output/
├─ logs/
├─ webscraper/
│  ├─ __init__.py
│  ├─ config_example.json
│  ├─ cli.py
│  ├─ scraper.py
│  ├─ parser.py
│  └─ storage.py

⚙️ Configuration

Example config file: webscraper/config_example.json

{
    "base_url": "https://quotes.toscrape.com/page/{page}/",
    "start_page": 1,
    "end_page": 3,
    "selectors": {
        "quote": ".quote .text",
        "author": ".quote .author",
        "tags": ".quote .tags .tag"
    }
}

Fields explained:

  • base_url — must contain the {page} placeholder so the scraper can substitute page numbers
  • start_page / end_page — inclusive page range to scrape
  • selectors — CSS selectors for each extracted field

You can modify this JSON to scrape any website, not just quotes.
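Internally, a scraper driven by this config only needs to expand base_url into one URL per page before fetching. A minimal sketch of that step (load_config and page_urls are illustrative names, not the actual functions in webscraper/scraper.py):

```python
import json

def load_config(path):
    """Load the JSON config described above."""
    with open(path) as f:
        return json.load(f)

def page_urls(cfg):
    """Expand base_url into one URL per page; the page range is inclusive."""
    return [
        cfg["base_url"].format(page=page)
        for page in range(cfg["start_page"], cfg["end_page"] + 1)
    ]
```

With the example config this yields .../page/1/ through .../page/3/; fetching and CSS-selector extraction (via Requests and BeautifulSoup) then run once per URL.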


▶️ How to Run

Create and activate a virtual environment:

python3 -m venv .venv
source .venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Run the scraper:

python -m webscraper.cli --config webscraper/config_example.json --output data/output/quotes.csv

Result:

  • Fetches pages 1–3
  • Extracts quotes, authors, and tags
  • Saves them to data/output/quotes.csv

📜 License

This project is licensed under the MIT License.
You are free to use, modify, distribute, and incorporate the code into your own projects.

See the full license in the included LICENSE file.


📝 Notes

  • This project is for demonstration and educational purposes.
  • Always respect website terms of service and robots.txt when scraping real websites.
  • The scraper is modular and easy to extend for more complex automation.
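The robots.txt point can also be enforced in code. This README doesn't document a built-in check, but one can be added with just the standard library; allowed_by_robots below is a hypothetical helper, not part of the current module:

```python
from urllib.robotparser import RobotFileParser

def allowed_by_robots(robots_txt, url, user_agent="WebScraperPro"):
    """Check whether user_agent may fetch url, given the site's robots.txt text.

    The caller fetches robots.txt itself (e.g. with Requests), so this
    helper stays pure and easy to test offline.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)
```

Calling this before each page fetch (and skipping disallowed URLs) keeps the scraper polite by default.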
