Simple web-scraping utility that extracts table data from a target page and downloads linked documents.
Files
- `app.py` — main scraper script (requests + BeautifulSoup + pandas).
- `requirements.txt` — Python dependencies.
- `scraped_data.csv` — sample output CSV.
- `downloads/` — directory where linked documents are saved.
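A minimal `requirements.txt` for this stack might look like the following (package names match the libraries named above; the extra parser backend is an assumption, and no versions are pinned by this repo):

```text
requests
beautifulsoup4
pandas
lxml  # common parser backend for pandas.read_html / BeautifulSoup
```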
Quickstart (Windows)
- Create and activate a virtual env:
  - `python -m venv .venv`
  - `.venv\Scripts\activate`
- Install deps:
  - `pip install -r requirements.txt`
- Run the scraper:
  - `python app.py`
Behavior
- Scrapes the URL specified in `app.py`, extracts the first HTML table (or the first `wikitable`) into `scraped_data.csv`, and saves documents (PDF/DOC/TXT) linked from the page into `downloads/`.
- Sends a descriptive User-Agent header (modify it in `app.py`); respect the site's robots.txt and rate limits.
- Handles HTTP errors via `response.raise_for_status()` and prints simple progress/error messages.
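The flow above can be sketched roughly as follows. The URL, User-Agent string, and function names here are illustrative placeholders, not what `app.py` actually contains:

```python
import os
from io import StringIO
from urllib.parse import urljoin

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Illustrative values -- the real URL and headers live in app.py.
URL = "https://example.org/data"
HEADERS = {"User-Agent": "my-scraper/1.0 (contact: you@example.org)"}
DOC_EXTS = (".pdf", ".doc", ".txt")


def extract_doc_links(html, base_url):
    """Return absolute URLs of PDF/DOC/TXT documents linked from the page."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        urljoin(base_url, a["href"])
        for a in soup.find_all("a", href=True)
        if a["href"].lower().endswith(DOC_EXTS)
    ]


def scrape(url=URL):
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()  # fail fast on HTTP errors

    # First "wikitable" if the page has one, else the first table of any kind.
    try:
        table = pd.read_html(StringIO(resp.text), attrs={"class": "wikitable"})[0]
    except ValueError:
        table = pd.read_html(StringIO(resp.text))[0]
    table.to_csv("scraped_data.csv", index=False)

    os.makedirs("downloads", exist_ok=True)
    for link in extract_doc_links(resp.text, url):
        doc = requests.get(link, headers=HEADERS, timeout=30)
        doc.raise_for_status()
        name = link.rsplit("/", 1)[-1]
        with open(os.path.join("downloads", name), "wb") as f:
            f.write(doc.content)
        print("saved", name)
```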
Configuration
- Edit the target URL and headers directly in `app.py`.
- Increase request timeout or add retry logic for robustness.
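One way to add that robustness is a `requests.Session` backed by urllib3's `Retry`; the function name and default values below are a sketch, not something `app.py` currently provides:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries=3, backoff=1.0):
    """Build a Session that retries transient failures with exponential backoff."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff,                      # sleeps ~1s, 2s, 4s ...
        status_forcelist=(429, 500, 502, 503, 504),  # retry these responses
        allowed_methods=frozenset(["GET", "HEAD"]),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

Pass the timeout per call, e.g. `make_session().get(url, timeout=30)`; timeouts are not a Session-level setting in requests.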
Notes
- Wikimedia and many sites block default UAs; keep a descriptive UA with contact info.
- Respect robots.txt and site terms. Add exponential backoff / 429 retry handling for production use.
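The 429 handling mentioned above could look roughly like this, honoring the server's `Retry-After` header when present (the function name and defaults are hypothetical):

```python
import time

import requests


def get_with_backoff(url, headers=None, max_tries=5, base_delay=1.0):
    """GET a URL, sleeping with exponential backoff when rate-limited (429)."""
    delay = base_delay
    for attempt in range(max_tries):
        resp = requests.get(url, headers=headers, timeout=30)
        if resp.status_code != 429:
            resp.raise_for_status()  # non-429 HTTP errors are fatal
            return resp
        # Prefer the server's Retry-After hint; otherwise back off exponentially.
        wait = float(resp.headers.get("Retry-After", delay))
        print(f"429 from {url}; retrying in {wait:.0f}s")
        time.sleep(wait)
        delay *= 2
    raise RuntimeError(f"gave up on {url} after {max_tries} attempts")
```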
License
- MIT