I started this because I don't want to pay the expensive fee to see the certification dumps lol.
https://webscraping.fyi/ ^ https://webscraping.fyi/overview/browser-automation/ ^ I am currently using browser automation instead of HTTP clients ^ https://webscraping.fyi/overview/languages/#http-clients
- Install Node Version Manager (nvm):
- curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
^^^^^^ This won't work if ~/.profile exists. In that case, manually add the nvm lines to .bashrc:
- echo 'export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh" # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && . "$NVM_DIR/bash_completion" # This loads nvm bash_completion' >> ~/.bashrc
- Make .bashrc take effect immediately by sourcing:
- source ~/.bashrc
- Install Node 20:
- nvm install 20
- Verify node:
- node -v
- Install TypeScript:
- npm install -g typescript
- Install Dependencies:
- npm install
- Compile TypeScript and launch with node, with sample arguments:
- tsc && node ./build/index.js
- Rescrape pages that result in dirty data; need to update / merge existing data. ^ Partially added for answers, never needed for questions & discussions
- Error handling for missing src images
- Column lineage with Airflow & dbt
- Answer cannot be scraped: https://www.examtopics.com/discussions/oracle/view/92435-exam-1z0-071-topic-1-question-24-discussion/ ^ The answer was already scraped, but it is contained within the questions
- Need to handle
- https://www.examtopics.com/assets/media/exam-media/04351/0002400002.jpg
- Apparently there was nothing wrong with my scraping code; the image's src just did not appear, meaning the resource did not load.
- Need to rescrape images from 103, 119, 120, 127, 128, 131, 133, 146, 166, 228, 236, 245, 256
- Replace pngMost with png
- If pngMost exists, then replace 'pngMost' with 'png' & replace 'Voted' with ''
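The pngMost cleanup rule could look roughly like the sketch below. This is only an illustration: `cleanScrapedValue` is a hypothetical helper name, and it assumes the stray text is a "Most Voted" label that got glued onto the end of a ".png" value during scraping. The final `trim()` is an extra step not in the notes, added to drop the leftover space.

```typescript
// Hypothetical helper (not from the actual scraper code).
// Assumption: a "Most Voted" label was concatenated right after ".png",
// producing values that contain "pngMost".
function cleanScrapedValue(raw: string): string {
  // Only touch values that show the "pngMost" symptom.
  if (!raw.includes("pngMost")) {
    return raw;
  }
  return raw
    .replace(/pngMost/g, "png") // restore the file extension
    .replace(/Voted/g, "")      // drop the rest of the label
    .trim();                    // assumption: leftover whitespace is unwanted
}

// Made-up input for illustration only.
const cleaned = cleanScrapedValue("image-01.pngMost Voted");
```

Values without "pngMost" are returned unchanged, so the word "Voted" in ordinary text is left alone.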
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-profile
Apparently if you edit class="popup-overlay show" to "popup-overla show", the popup will break
/html/body/div[2]/div/div[4]/div/div[1]/div[2]/p
/html/body/div[2]/div/div[4]/div/div[1]/div[2]/p/img
https://www.examtopics.com/assets/media/exam-media/04351/0000200001.png
/html/body/div[2]/div/div[4]/div/div[1]/div[2]/div[2]/ul/li[1]/text()
/html/body/div[2]/div/div[4]/div/div[1]/div[2]/div[2]/ul/li[2]/text()
/html/body/div[2]/div/div[4]/div/div[1]/div[2]/div[2]/ul/li[3]/text()
/html/body/div[2]/div/div[4]/div/div[1]/div[2]/div[2]/ul/li[4]/text()
/html/body/div[2]/div/div[4]/div/div[1]/div[2]/div[2]/ul/li[5]/text()
/html/body/div[2]/div/div[4]/div/div[2]/div[2]/div/div/div[2]/div[1]/div/div[2]/div[2]
/html/body/div[2]/div/div[4]/div/div[2]/div[2]/div/div/div[2]/div[2]/div/div[2]/div[2]
/html/body/div[2]/div/div[4]/div/div[2]/div[2]/div/div/div[2]/div[1]/div/div[2]/div[3]/span[2]/span
/html/body/div[2]/div/div[4]/div/div[2]/div[2]/div/div/div[2]/div[2]/div/div[2]/div[3]/span[2]/span
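The XPaths above differ only in one index, so they could be generated instead of listed by hand. A rough sketch with hypothetical function names; the notes don't say what each node holds, so the names stay generic:

```typescript
// Shared prefixes copied from the XPaths above.
const LIST_BASE = "/html/body/div[2]/div/div[4]/div/div[1]/div[2]/div[2]/ul";
const ENTRIES_BASE = "/html/body/div[2]/div/div[4]/div/div[2]/div[2]/div/div/div[2]";

// XPath positions are 1-based, so reject anything below 1 or non-integer.
function assertPosition(n: number): void {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError(`XPath position must be a positive integer, got ${n}`);
  }
}

// Text node of the n-th list item (li[1] .. li[5] in the notes).
function listItemTextXPath(n: number): string {
  assertPosition(n);
  return `${LIST_BASE}/li[${n}]/text()`;
}

// The .../div[n]/div/div[2]/div[2] node of the n-th entry.
function entryBodyXPath(n: number): string {
  assertPosition(n);
  return `${ENTRIES_BASE}/div[${n}]/div/div[2]/div[2]`;
}

// The .../div[n]/div/div[2]/div[3]/span[2]/span node of the n-th entry.
function entrySpanXPath(n: number): string {
  assertPosition(n);
  return `${ENTRIES_BASE}/div[${n}]/div/div[2]/div[3]/span[2]/span`;
}
```

Absolute paths like these break whenever the page layout shifts, so this only saves typing; it doesn't make the selectors more robust.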