Deprecation Notice: This project is deprecated. Please use the improved version of the scraper, WebScraper, instead.
A Python-based web scraping tool designed to extract and convert HTML content into LaTeX format for seamless integration into documents.
- Clone the repository:
git clone https://github.com/kgruiz/WebScraper-Old.git
- Navigate to the project directory:
cd WebScraper-Old
- Install the required dependencies:
pip install requests beautifulsoup4 tqdm pypandoc weasyprint
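After installing, it can help to confirm that each dependency is importable. The sketch below is not part of the repository; the helper name `missing_packages` is hypothetical. It maps each pip package name from the command above to its import name (note that `beautifulsoup4` is imported as `bs4`) and reports any that Python cannot find. It checks Python packages only; `pypandoc` also needs a pandoc executable on the system, which this check does not cover.

```python
from importlib.util import find_spec

# pip package name -> module name it is imported as.
IMPORT_NAMES = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "tqdm": "tqdm",
    "pypandoc": "pypandoc",
    "weasyprint": "weasyprint",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the pip names of packages whose modules cannot be located."""
    return [
        package
        for package, module in import_names.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Any package listed as missing can be installed again with `pip install <name>`.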
- Convert a single HTML file to LaTeX:
python HTMLtoLatex.py path/to/input.html
- Download web pages as PDFs:
python Downloader.py urlList.json
- Flatten directory structure:
python Scraper.py
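For readers unsure what flattening a directory structure involves, here is a minimal sketch of one common interpretation: every file in a nested subdirectory is moved up into the top-level directory, and the emptied subdirectories are removed. This is an assumption about the general technique, not a description of what `Scraper.py` actually does. The function name `flatten_directory` and the numeric-suffix rule for name clashes are hypothetical.

```python
import shutil
from pathlib import Path


def flatten_directory(root):
    """Move every file nested under `root` directly into `root`.

    A name clash is resolved by appending _1, _2, ... to the file stem.
    Returns the list of new file paths.
    """
    root = Path(root)
    moved = []

    # Collect the nested files first so the moves do not disturb the walk.
    nested_files = sorted(
        p for p in root.rglob("*") if p.is_file() and p.parent != root
    )
    for src in nested_files:
        dest = root / src.name
        counter = 1
        while dest.exists():
            dest = root / f"{src.stem}_{counter}{src.suffix}"
            counter += 1
        shutil.move(str(src), str(dest))
        moved.append(dest)

    # Remove the now-empty subdirectories, deepest first.
    subdirs = sorted(
        (p for p in root.rglob("*") if p.is_dir()),
        key=lambda p: len(p.parts),
        reverse=True,
    )
    for directory in subdirs:
        if not any(directory.iterdir()):
            directory.rmdir()

    return moved
```

Calling `flatten_directory("output")` on a hypothetical `output` folder would leave all of its files at the top level. Files are moved, not copied, so try it on a backup first.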