jonathan-kee/examTopicScraper

EXAMTOPICSCRAPER

I built this because I don't want to pay the expensive fee to see the certification dumps lol.

Theory on webscraping

  • https://webscraping.fyi/
  • https://webscraping.fyi/overview/browser-automation/ (I am currently using browser automation instead of HTTP clients)
  • https://webscraping.fyi/overview/languages/#http-clients

Project Setup

  1. Install Node Version Manager (nvm):

Note: this won't work if ~/.profile exists. In that case, manually add the nvm setup to .bashrc as follows:

  • echo 'export NVM_DIR="$HOME/.nvm"
    [ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh" # This loads nvm
    [ -s "$NVM_DIR/bash_completion" ] && . "$NVM_DIR/bash_completion" # This loads nvm bash_completion' >> ~/.bashrc
  2. Make .bashrc take effect immediately by sourcing it:
  • source ~/.bashrc
  3. Install Node 20:
  • nvm install 20
  4. Verify Node:
  • node -v
  5. Install TypeScript:
  • npm install -g typescript
  6. Install dependencies:
  • npm install
  7. Compile TypeScript and launch with Node, with sample arguments:
  • tsc && node ./build/index.js
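
The "Verify Node" step could also be checked from inside the scraper at startup. The snippet below is only a sketch: the helper names `nodeMajor` and `isSupportedNode` are made up for illustration and are not part of this repo, and the minimum of 20 is taken from the setup steps above.

```typescript
// Hypothetical helpers (not part of this repo): parse a version string
// such as "v20.11.1" (the format printed by `node -v`).
function nodeMajor(version: string): number {
  const match = /^v?(\d+)\./.exec(version);
  if (!match) {
    throw new Error("Unrecognised Node version string: " + version);
  }
  return parseInt(match[1], 10);
}

// Assumption: anything at or above the major version installed in the
// setup steps (20) is acceptable.
function isSupportedNode(version: string, minMajor: number = 20): boolean {
  return nodeMajor(version) >= minMajor;
}

// Possible use at the top of index.ts:
//   if (!isSupportedNode(process.version)) { throw new Error("Use Node 20"); }
```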

Features to add

  • Rescrape pages that result in dirty data, then update / merge the existing data. Partially added for answers; never needed for questions & discussions.

  • Error handling for missing src images

  • Column lineage with airflow & dbt
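
For the rescrape / merge feature above, one possible approach is to key records by an id and let clean rescraped fields overwrite dirty ones. This is a minimal sketch, not the repo's actual implementation: the `QuestionRecord` shape, the "dirty means empty" rule, and the function names are all assumptions made for illustration.

```typescript
// Hypothetical record shape; the real scraper's fields may differ.
interface QuestionRecord {
  id: string; // assumed unique key, e.g. the question URL
  question: string;
  answers: string[];
}

// Assumption: a text field is "dirty" when it is empty after trimming.
function isDirtyText(value: string): boolean {
  return value.trim().length === 0;
}

// Assumption: an answer list is dirty when it is empty or has a dirty entry.
function isDirtyAnswers(answers: string[]): boolean {
  return answers.length === 0 || answers.some((a) => isDirtyText(a));
}

// Merge rescraped records into the existing ones, keyed by id.
// Clean rescraped fields replace existing ones; dirty rescraped fields are ignored.
function mergeRescrape(
  existing: QuestionRecord[],
  rescraped: QuestionRecord[]
): QuestionRecord[] {
  const byId: Record<string, QuestionRecord> = {};
  for (const rec of existing) {
    byId[rec.id] = rec;
  }
  for (const fresh of rescraped) {
    const old = byId[fresh.id];
    if (!old) {
      byId[fresh.id] = fresh; // new page, nothing to merge with
      continue;
    }
    byId[fresh.id] = {
      id: old.id,
      question: isDirtyText(fresh.question) ? old.question : fresh.question,
      answers: isDirtyAnswers(fresh.answers) ? old.answers : fresh.answers,
    };
  }
  return Object.keys(byId).map((id) => byId[id]);
}
```

Merging per field rather than per record means a rescrape that fixes the answers but fails on the question text still improves the stored data.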

List of bugs to fix

Launch a browser that Google does not CAPTCHA

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-profile
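
Once Chrome is running with the remote debugging port open, its `/json/version` endpoint reports a `webSocketDebuggerUrl` that a browser automation library can attach to. The sketch below only looks that URL up. The function names are made up for illustration, it assumes Node's built-in `fetch` (Node 18+), and how the URL is then passed to the automation library depends on which one the project uses.

```typescript
// The two fields we read from Chrome's /json/version response.
interface VersionInfo {
  Browser: string;
  webSocketDebuggerUrl: string;
}

// Hypothetical helper: validate and extract the fields from the response body.
function parseVersionInfo(body: string): VersionInfo {
  const data = JSON.parse(body);
  if (
    data === null ||
    typeof data !== "object" ||
    typeof data.webSocketDebuggerUrl !== "string"
  ) {
    throw new Error("No webSocketDebuggerUrl in /json/version response");
  }
  return {
    Browser: String(data.Browser),
    webSocketDebuggerUrl: data.webSocketDebuggerUrl,
  };
}

// Hypothetical helper: ask the already-running Chrome for its debugger URL.
// The default port matches --remote-debugging-port=9222 above.
async function getDebuggerUrl(port: number = 9222): Promise<string> {
  const res = await fetch(`http://127.0.0.1:${port}/json/version`);
  if (!res.ok) {
    throw new Error(`Chrome debug endpoint returned ${res.status}`);
  }
  return parseVersionInfo(await res.text()).webSocketDebuggerUrl;
}
```

Attaching to a manually launched browser reuses the profile in /tmp/chrome-profile instead of starting a fresh one on each run.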

How to remove popup block

Apparently, if you edit class="popup-overlay show" to class="popup-overla show", the popup breaks and no longer blocks the page.
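
The class rename above could be automated roughly as follows. This is a sketch under assumptions: the helper names are made up, and `QueryRoot` is a minimal stand-in for `document` so the logic can run outside a browser. In practice it would be executed in the page context through whatever evaluate hook the automation library provides.

```typescript
// Minimal structural types so the helper works with `document` in a page
// or with a plain object in a test.
interface ClassHolder {
  className: string;
}
interface QueryRoot {
  querySelectorAll(selector: string): ArrayLike<ClassHolder>;
}

// Rename the "popup-overlay" token to "popup-overla", leaving other classes as-is.
function breakPopupClass(className: string): string {
  return className
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) => (token === "popup-overlay" ? "popup-overla" : token))
    .join(" ");
}

// Apply the rename to every overlay under `root`; returns how many were changed.
function disablePopups(root: QueryRoot): number {
  const overlays = root.querySelectorAll(".popup-overlay");
  for (let i = 0; i < overlays.length; i++) {
    overlays[i].className = breakPopupClass(overlays[i].className);
  }
  return overlays.length;
}
```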

Question with Screenshot (Unsure how to deal with images in question)

/html/body/div[2]/div/div[4]/div/div[1]/div[2]/p

Screenshot

/html/body/div[2]/div/div[4]/div/div[1]/div[2]/p/img

Question Screenshot full link

https://www.examtopics.com/assets/media/exam-media/04351/0000200001.png
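
A missing or empty `src` on the screenshot `<img>` can be handled before building the full link. The sketch below assumes the attribute holds a site-relative path such as `/assets/media/exam-media/04351/0000200001.png` (not confirmed by the notes above); `resolveImageSrc` and `BASE_URL` are hypothetical names.

```typescript
// Assumed site origin, taken from the full screenshot link above.
const BASE_URL = "https://www.examtopics.com";

// Hypothetical helper: turn an <img> src attribute into an absolute URL.
// Returns null when the attribute is missing, empty, or not a valid URL,
// so the caller can record "no image" instead of crashing.
function resolveImageSrc(
  src: string | null | undefined,
  base: string = BASE_URL
): string | null {
  if (!src || src.trim() === "") {
    return null;
  }
  try {
    return new URL(src.trim(), base).href;
  } catch (err) {
    return null;
  }
}
```

Returning `null` instead of throwing keeps a single broken image from aborting the scrape of the whole page.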

Answers

/html/body/div[2]/div/div[4]/div/div[1]/div[2]/div[2]/ul/li[1]/text()

/html/body/div[2]/div/div[4]/div/div[1]/div[2]/div[2]/ul/li[2]/text()

/html/body/div[2]/div/div[4]/div/div[1]/div[2]/div[2]/ul/li[3]/text()

/html/body/div[2]/div/div[4]/div/div[1]/div[2]/div[2]/ul/li[4]/text()

/html/body/div[2]/div/div[4]/div/div[1]/div[2]/div[2]/ul/li[5]/text()
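
The five answer XPaths above differ only in the `li` index, so they can be generated instead of hard-coded. A minimal sketch, with hypothetical helper names; it assumes the page layout stays as recorded above.

```typescript
// Common prefix of the answer XPaths listed above.
const ANSWER_LIST_XPATH =
  "/html/body/div[2]/div/div[4]/div/div[1]/div[2]/div[2]/ul";

// XPath for the text of the answer at a 1-based position.
function answerXPath(index: number): string {
  if (Math.floor(index) !== index || index < 1) {
    throw new RangeError("Answer index must be a positive integer");
  }
  return `${ANSWER_LIST_XPATH}/li[${index}]/text()`;
}

// XPaths for the first `count` answers.
function answerXPaths(count: number): string[] {
  const paths: string[] = [];
  for (let i = 1; i <= count; i++) {
    paths.push(answerXPath(i));
  }
  return paths;
}
```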

Discussion texts

/html/body/div[2]/div/div[4]/div/div[2]/div[2]/div/div/div[2]/div[1]/div/div[2]/div[2]

/html/body/div[2]/div/div[4]/div/div[2]/div[2]/div/div/div[2]/div[2]/div/div[2]/div[2]

Discussion upvotes

/html/body/div[2]/div/div[4]/div/div[2]/div[2]/div/div/div[2]/div[1]/div/div[2]/div[3]/span[2]/span

/html/body/div[2]/div/div[4]/div/div[2]/div[2]/div/div/div[2]/div[2]/div/div[2]/div[3]/span[2]/span
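
The discussion text and upvote XPaths share a prefix and the same comment index, so both can be derived from one position. The sketch below is an illustration only: `discussionXPaths` is a hypothetical helper, and it assumes every comment follows the layout of the two examples recorded above.

```typescript
// Common prefix of the discussion XPaths listed above.
const DISCUSSION_ROOT_XPATH =
  "/html/body/div[2]/div/div[4]/div/div[2]/div[2]/div/div/div[2]";

interface DiscussionXPaths {
  text: string;
  upvotes: string;
}

// XPaths for the text and upvote count of the comment at a 1-based position.
function discussionXPaths(index: number): DiscussionXPaths {
  if (Math.floor(index) !== index || index < 1) {
    throw new RangeError("Discussion index must be a positive integer");
  }
  const comment = `${DISCUSSION_ROOT_XPATH}/div[${index}]/div/div[2]`;
  return {
    text: `${comment}/div[2]`,
    upvotes: `${comment}/div[3]/span[2]/span`,
  };
}
```

Building both paths from the same index keeps each comment's text paired with its own upvote count.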
