WebScraper Pro 🕸️


A configurable Python web scraping tool that extracts structured data from multiple webpages and exports the results to CSV.
Built for automation, data collection, and Upwork-style client projects.


✨ Features

  • Scrapes multiple pages using a URL pattern with {page}
  • Fully configurable via JSON (no code changes needed)
  • Extracts data using CSS selectors (quotes, authors, tags, or any other fields)
  • Saves clean structured data to CSV
  • Logs scraping progress to logs/scraper.log
  • Easy CLI interface for clients and non-technical users

🧱 Project Structure

webscraper_pro/
├─ README.md
├─ LICENSE
├─ requirements.txt
├─ .gitignore
├─ data/
│  ├─ sample_urls.txt
│  └─ output/
├─ logs/
├─ webscraper/
│  ├─ __init__.py
│  ├─ config_example.json
│  ├─ cli.py
│  ├─ scraper.py
│  ├─ parser.py
│  └─ storage.py

⚙️ Configuration

Example config file: webscraper/config_example.json

{
    "base_url": "https://quotes.toscrape.com/page/{page}/",
    "start_page": 1,
    "end_page": 3,
    "selectors": {
        "quote": ".quote .text",
        "author": ".quote .author",
        "tags": ".quote .tags .tag"
    }
}

Fields explained:

  • base_url — must contain the {page} placeholder so the scraper can substitute page numbers
  • start_page / end_page — inclusive page range to scrape
  • selectors — CSS selectors for each extracted field

You can modify this JSON to scrape any website, not just quotes.
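Internally, a scraper driven by this config only needs to expand base_url into one URL per page before fetching. A minimal sketch of that step (load_config and page_urls are illustrative names, not the actual functions in webscraper/scraper.py):

```python
import json

def load_config(path):
    """Load the JSON config described above."""
    with open(path) as f:
        return json.load(f)

def page_urls(cfg):
    """Expand base_url into one URL per page; the page range is inclusive."""
    return [
        cfg["base_url"].format(page=page)
        for page in range(cfg["start_page"], cfg["end_page"] + 1)
    ]
```

With the example config this yields .../page/1/ through .../page/3/; fetching and CSS-selector extraction (via Requests and BeautifulSoup) then run once per URL.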


▶️ How to Run

Create and activate a virtual environment:

python3 -m venv .venv
source .venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Run the scraper:

python -m webscraper.cli --config webscraper/config_example.json --output data/output/quotes.csv

Result:

  • Fetches pages 1–3
  • Extracts quotes, authors, and tags
  • Saves them to data/output/quotes.csv

📜 License

This project is licensed under the MIT License.
You are free to use, modify, distribute, and incorporate the code into your own projects.

See the full license in the included LICENSE file.


📝 Notes

  • This project is for demonstration and educational purposes.
  • Always respect website terms of service and robots.txt when scraping real websites.
  • The scraper is modular and easy to extend for more complex automation.
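The robots.txt point can also be enforced in code. This README doesn't document a built-in check, but one can be added with just the standard library; allowed_by_robots below is a hypothetical helper, not part of the current module:

```python
from urllib.robotparser import RobotFileParser

def allowed_by_robots(robots_txt, url, user_agent="WebScraperPro"):
    """Check whether user_agent may fetch url, given the site's robots.txt text.

    The caller fetches robots.txt itself (e.g. with Requests), so this
    helper stays pure and easy to test offline.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)
```

Calling this before each page fetch (and skipping disallowed URLs) keeps the scraper polite by default.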
