This project contains a Node.js script (dothething.js) that scrapes the Kobo blog to find the latest weekly book deals and extracts details for each daily deal, including the ISBN.
Prerequisites:
- Node.js (version compatible with ES Modules and async/await)
- npm (usually comes with Node.js)
- Google Chrome browser installed (as the script uses ChromeDriver)
- Clone the repository:
git clone https://github.com/samlam369/Kobo-Crawler.git
cd Kobo-Crawler
- Install the dependencies:
npm install
Run the script from the project's root directory:
node dothething.js
The script will:
- Launch a Chrome browser instance.
- Navigate to the Kobo blog (https://www.kobo.com/zh/blog).
- Find the link to the latest "【一週99書單】" (Weekly 99 Deals) post.
- Navigate to that post.
- Extract the daily book deals (date, title, author, sales copy, link, cover image).
- Navigate to each book's page to retrieve its ISBN (Book ID).
- Print the collected daily deals (including ISBNs) as a JSON array to the console.
- Exit with code 0 on success or code 1 on error.
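
The link-finding step can be illustrated without a browser. The sketch below assumes the blog's links have already been collected into plain `{ text, href }` objects in page order, newest first. The helper name `findLatestWeeklyPost`, that ordering, and the sample URLs are assumptions for illustration and are not taken from dothething.js.

```javascript
// Marker that identifies a weekly-deals post title (taken from the step list above).
const WEEKLY_MARKER = "【一週99書單】";

// Hypothetical helper: returns the href of the first link whose text contains the marker.
// Assumes `links` is in page order with the newest post first.
function findLatestWeeklyPost(links) {
  const match = links.find(
    (link) => typeof link.text === "string" && link.text.includes(WEEKLY_MARKER)
  );
  if (!match) {
    // Fail loudly, in line with the documented behavior when the page structure changes.
    throw new Error(`No link containing ${WEEKLY_MARKER} was found`);
  }
  return match.href;
}

// Example with made-up links (the URLs are placeholders, not real Kobo posts).
const sampleLinks = [
  { text: "About Kobo", href: "https://example.com/about" },
  { text: "【一週99書單】Week A", href: "https://example.com/weekly-a" },
  { text: "【一週99書單】Week B", href: "https://example.com/weekly-b" },
];
console.log(findLatestWeeklyPost(sampleLinks)); // prints https://example.com/weekly-a
```

Throwing when no match is found keeps a changed page layout from silently producing an empty result.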
To potentially speed up the script and reduce bandwidth, you can disable image loading by setting the LOAD_IMAGES environment variable to false:
Windows (cmd.exe):
set LOAD_IMAGES=false && node dothething.js
Windows (PowerShell):
$env:LOAD_IMAGES="false"; node dothething.js
Linux/macOS:
LOAD_IMAGES=false node dothething.js
The script includes error handling. If any step fails (e.g., page structure changes, elements not found, ISBN missing), it will log an error message to the console and exit with a status code of 1.
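
As a rough illustration of the LOAD_IMAGES toggle described above, the sketch below shows one way a script might interpret the variable. The helper names `shouldLoadImages` and `chromePrefsFor` are hypothetical, and the Chrome preference key is an assumption about how images could be blocked; the actual logic in dothething.js may differ. The value is trimmed before comparison because the cmd.exe form can leave a trailing space in it.

```javascript
// Hypothetical helper: decides whether images should load, given an env-like object.
// Images stay enabled unless LOAD_IMAGES equals "false" after trimming and lowercasing.
function shouldLoadImages(env) {
  const raw = env.LOAD_IMAGES;
  if (raw === undefined) return true;
  return String(raw).trim().toLowerCase() !== "false";
}

// Assumed Chrome preference for blocking images (2 = block). How this object is
// handed to the browser depends on the driver library in use.
function chromePrefsFor(loadImages) {
  return loadImages ? {} : { "profile.managed_default_content_settings.images": 2 };
}

// Read the real environment and show the resulting preferences.
const loadImages = shouldLoadImages(process.env);
console.log(JSON.stringify(chromePrefsFor(loadImages)));
```

Defaulting to loading images keeps the behavior unchanged when the variable is unset or holds any value other than "false".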
This script is designed to be run in CI/CD environments like Jenkins.
- It exits with a non-zero status code (1) on failure, which Jenkins can use to determine build status.
- A Jenkinsfile is included in the repository as a starting point for pipeline configuration.
- Ensure the Jenkins execution environment has Node.js, npm, and Google Chrome installed.
- Consider using the LOAD_IMAGES=false environment variable in Jenkins jobs for efficiency.
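
The exit-code contract that Jenkins relies on could be structured roughly as follows. This is a minimal sketch, not the code in dothething.js: `runForCi` and `demoTask` are hypothetical names, and the placeholder record only stands in for real scraped data.

```javascript
// Hypothetical wrapper: runs an async task, prints its result as JSON,
// and returns the exit code a CI server would see.
async function runForCi(task, out = console.log, err = console.error) {
  try {
    const deals = await task();
    out(JSON.stringify(deals, null, 2));
    return 0; // success
  } catch (e) {
    err(`Scrape failed: ${e && e.message ? e.message : e}`);
    return 1; // failure
  }
}

// Stand-in task; a real run would pass the scraping function instead.
const demoTask = async () => [
  { date: "placeholder", title: "placeholder", isbn: "placeholder" },
];

runForCi(demoTask).then((code) => {
  // Setting exitCode (rather than calling process.exit) lets pending output flush first.
  process.exitCode = code;
});
```

Keeping the JSON on standard output and the error message on standard error lets a pipeline step capture the deals separately from diagnostics.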
This project is licensed under the ISC License. See the LICENSE file for details.