
OLX Scraper

DISCLAIMER: THIS PROJECT IS FOR EDUCATIONAL PURPOSES ONLY

A Node.js application that scrapes OLX.pl listings using Puppeteer, providing a convenient way to search and monitor offers with a command-line interface.

Features

  • πŸ” Scrapes OLX.pl listings with customizable search parameters
  • πŸ“± Supports both desktop and mobile view configurations
  • πŸ“Š Displays results in a clean, formatted table
  • πŸ’Ύ Optional JSON export of search results
  • πŸ”„ Sorting options (newest first, oldest first, or no sorting)
  • ⚑ Fast and efficient scraping with Puppeteer
  • πŸ›‘οΈ Built-in error handling and validation

Prerequisites

  • Node.js (latest LTS version recommended)
  • npm or yarn

Installation

  1. Clone the repository
  2. Install dependencies:
npm install
  3. Build the project:
npm run build

Usage

Run the application:

npm run start

The application will prompt you to:

  1. Enter an OLX.pl URL to scrape
  2. Choose sorting options (if the URL doesn't already contain sorting parameters)
  3. Decide whether to save results to a JSON file

Configuration

The scraper can be configured through the constants.ts file:

  • Viewport settings
  • Timeout duration
  • Headless browser mode
  • Custom CSS selectors
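A hypothetical shape for `constants.ts` covering the four settings above. The property names and selector strings here are illustrative, not the project's actual exports:

```typescript
// Illustrative sketch of src/config/constants.ts — names and selectors
// are assumptions, not the project's real values.
export const SCRAPER_CONFIG = {
  // Viewport settings (desktop or mobile view)
  viewport: { width: 1920, height: 1080, isMobile: false },
  // Timeout duration for navigation and selector waits
  timeoutMs: 30_000,
  // Headless browser mode
  headless: true,
  // Custom CSS selectors used to extract listing data
  selectors: {
    listing: '[data-cy="l-card"]',
    title: "h6",
    price: '[data-testid="ad-price"]',
  },
} as const;
```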

TODO

  • Add support for more websites
  • Implement advanced filtering options
  • Support for proxy rotation
  • Implement CAPTCHA solving
  • Add a Next.js frontend for better UX
  • Implement a Hono backend for API access

Project Structure

src/
β”œβ”€β”€ config/          # Configuration files
β”œβ”€β”€ errors/         # Custom error definitions
β”œβ”€β”€ scrapers/       # Scraping logic
β”œβ”€β”€ types/          # TypeScript type definitions
└── utils/          # Utility functions

Technologies Used

  • TypeScript
  • Puppeteer
  • Zod (for validation)
  • cli-table3 (for display formatting)
  • ora (for loading spinners)
  • prompts (for user input)

Error Handling

The application includes robust error handling for:

  • Invalid URLs
  • Network issues
  • Scraping failures
  • Data validation errors