WebScraper-Old

Deprecation Notice: This project is deprecated. Use the improved version of the scraper, WebScraper, instead.

A Python-based web scraping tool designed to extract and convert HTML content into LaTeX format for seamless integration into documents.

Installation

Clone the repository:

git clone https://github.com/kgruiz/WebScraper-Old.git

Install the required dependencies (pypandoc also needs a Pandoc installation available on the system):

pip install requests beautifulsoup4 tqdm pypandoc weasyprint

Usage

Convert a single HTML file to LaTeX:

python HTMLtoLatex.py path/to/input.html
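
To give a rough idea of what an HTML-to-LaTeX conversion involves, here is a minimal sketch that uses only the Python standard library. It is not the project's implementation: the dependency list includes pypandoc, which suggests the real conversion may be delegated to Pandoc. The names `TinyHtmlToLatex`, `html_to_latex`, and `escape_latex` are hypothetical and chosen for illustration, and only a handful of tags are handled.

```python
from html.parser import HTMLParser

# Characters that LaTeX treats specially, mapped to their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


class TinyHtmlToLatex(HTMLParser):
    """Hypothetical converter covering a very small subset of HTML."""

    # Opening LaTeX command for each supported tag; each closes with "}".
    OPEN = {
        "h1": r"\section{", "h2": r"\subsection{",
        "em": r"\emph{", "i": r"\emph{",
        "strong": r"\textbf{", "b": r"\textbf{",
        "code": r"\texttt{",
    }

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.OPEN:
            self.parts.append(self.OPEN[tag])

    def handle_endtag(self, tag):
        if tag in self.OPEN:
            self.parts.append("}")
        # Block-level elements end with a blank line (a LaTeX paragraph break).
        if tag in ("p", "h1", "h2"):
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(escape_latex(data))


def html_to_latex(html):
    """Convert an HTML string to a LaTeX fragment (no preamble)."""
    parser = TinyHtmlToLatex()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    print(html_to_latex("<h1>Intro</h1><p>50% of <em>cases</em> &amp; more</p>"))
```

Unsupported tags are dropped while their text is kept, so the output is a plain LaTeX fragment rather than a complete document. A production converter would also need to handle lists, tables, links, and images.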
