Tool for parsing the webpage at a URL into JSON + RDF.
- Python: 3.10
- geckodriver or chromedriver
- Install urlscrub with pip:
  python3.10 -m pip install urlscrub
- Install geckodriver
  - Download Firefox and install.
    - Linux (Ubuntu): sudo apt-get install firefox
  - Unzip the geckodriver / geckodriver.exe file into a preferred directory.
  - Append the directory containing geckodriver to your PATH variable (see the guide below).
- Install chromedriver
  - Download Google Chrome and install.
  - Find the version of Google Chrome you have installed.
  - Download the chromedriver.zip with the closest matching version number.
    - Exact version number not required (Ex: chromedriver 102.0.5005.61 w/ Google Chrome 102.0.5005.115)
  - Unzip the chromedriver / chromedriver.exe file into a preferred directory.
  - Append the directory containing chromedriver to your PATH variable (see the guide below).
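The version-matching step can be sketched in a few lines of Python. This is only an illustration: it assumes "closest matching" means the first three dotted components (major.minor.build) agree, which is how the example pair above lines up. The helper name `same_build` is hypothetical and is not part of urlscrub or chromedriver.

```python
def same_build(driver_version: str, browser_version: str) -> bool:
    """Return True when the first three dotted components are identical.

    Assumption: agreeing on major.minor.build is "close enough";
    the final patch component is allowed to differ.
    """
    return driver_version.split(".")[:3] == browser_version.split(".")[:3]


# The pair given in the example above: only the last component differs.
print(same_build("102.0.5005.61", "102.0.5005.115"))
```

If the check fails for the chromedriver you downloaded, picking a release whose leading components match your installed Chrome is the likely fix.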
- Command:
  urlscrub --skip-cookies --driver "chrome" -l "https://www.amazon.com/All-new-Kindle-Oasis-now-with-adjustable-warm-light/dp/B07GRSK3HC"
- Response:
  {
    "results": [
      {
        "type": "product",
        "productTitle": "Kindle Oasis \u2013 With adjustable warm light",
        "availability": "In Stock.",
        "rating": "19,734 ratings",
        "imageURL": "https://m.media-amazon.com/images/I/614TlIaYBvL._AC_SX679_.jpg"
      }
    ]
  }
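For use from a script, the response can be read with Python's standard `json` module. The sketch below is a minimal example, not urlscrub's own API: `product_titles` and `run_urlscrub` are hypothetical helpers, and `run_urlscrub` assumes the CLI writes the JSON response to standard output, which the source does not confirm. The flags are the ones from the command above.

```python
import json
import subprocess

# The sample response shown above, kept as raw text so the \u2013 escape
# is decoded by the JSON parser rather than by Python.
SAMPLE = r'''{"results": [{"type": "product",
  "productTitle": "Kindle Oasis \u2013 With adjustable warm light",
  "availability": "In Stock.",
  "rating": "19,734 ratings",
  "imageURL": "https://m.media-amazon.com/images/I/614TlIaYBvL._AC_SX679_.jpg"}]}'''


def product_titles(response_text: str) -> list[str]:
    """Collect productTitle from every entry in "results" that has one."""
    data = json.loads(response_text)
    return [item["productTitle"] for item in data["results"] if "productTitle" in item]


def run_urlscrub(url: str, driver: str = "chrome") -> str:
    """Run the command shown above and return its stdout.

    Assumption: urlscrub prints the JSON response to stdout.
    """
    completed = subprocess.run(
        ["urlscrub", "--skip-cookies", "--driver", driver, "-l", url],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


# Parse the sample without invoking the CLI.
print(product_titles(SAMPLE))
```

With urlscrub and a driver installed, `product_titles(run_urlscrub(url))` would chain the two steps, provided the stdout assumption holds.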
- Appending directories to your PATH environment variable.
  - Windows Guide
  - Linux:
    - Append path to your .bashrc/.zshrc:
      export PATH="<geckodriver_dir>/:$PATH"
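After editing PATH, a quick way to confirm the drivers are discoverable is Python's standard `shutil.which`, which searches PATH the same way the shell does (and honors PATHEXT on Windows, so `geckodriver.exe` is found as `geckodriver`). The helper name `find_drivers` is hypothetical; this is a small sketch, not part of urlscrub.

```python
import shutil


def find_drivers(names=("geckodriver", "chromedriver")) -> dict:
    """Map each driver name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_drivers().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

Only one of the two drivers needs to resolve, matching whichever browser you pass to `--driver`. Open a new terminal first so the updated `.bashrc`/`.zshrc` is loaded.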