[FEATURE] OLX Native API Integration #13

@LXSCA7

[Feature] OLX Native API Integration (JSON Scraper)

Is your feature request related to a problem? Please describe.
Currently, GOrimpo relies on HTML scraping to fetch data from OLX. While functional, this approach is brittle because it depends on CSS selectors that change frequently. Parsing full HTML documents is also resource-heavy for the VPS and slower than fetching the data directly.

Describe the solution you'd like
I want to implement a new Scraper Adapter that communicates directly with OLX's internal API (REST or GraphQL). By fetching JSON data directly:

  1. We eliminate the need for heavy HTML parsing.
  2. We get more precise data (structured fields for price, location, and condition).
  3. We significantly reduce the chance of the scraper breaking due to UI updates.

Describe alternatives you've considered

  • Continuing with the current Playwright/Goquery approach (Stable but resource-intensive).
  • Using a hybrid approach (HTML for discovery, API for details), which is still sub-optimal.

Technical Details (Optional)
Since we are using Hexagonal Architecture, we can implement this as a new adapter within the internal/adapters/scraper/olx/ package. We should create a struct that satisfies the Scraper interface but uses http.Client to hit the API endpoints instead of parsing HTML.
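A minimal sketch of what such an adapter could look like. The Scraper interface, the Listing type, the BaseURL field, the "q" query parameter, and the {"ads": [...]} payload shape are all assumptions made for illustration; the real port definition and the real OLX response format need to be confirmed first.

```go
// Sketch only. In GOrimpo this would live in internal/adapters/scraper/olx
// (package olx); package main just keeps the example self-contained.
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// Listing is a hypothetical domain type; the real one may differ.
type Listing struct {
	Title    string `json:"title"`
	Price    string `json:"price"`
	Location string `json:"location"`
	URL      string `json:"url"`
}

// Scraper is an assumed shape for the existing port.
type Scraper interface {
	Search(ctx context.Context, query string) ([]Listing, error)
}

// APIScraper fetches JSON over HTTP instead of parsing HTML.
type APIScraper struct {
	Client  *http.Client
	BaseURL string // assumed endpoint; the real OLX URL is not known here
}

// Compile-time check that APIScraper satisfies Scraper.
var _ Scraper = (*APIScraper)(nil)

// NewAPIScraper returns an adapter with a bounded request timeout.
func NewAPIScraper(baseURL string) *APIScraper {
	return &APIScraper{
		Client:  &http.Client{Timeout: 10 * time.Second},
		BaseURL: baseURL,
	}
}

// Search sends a GET request and decodes the JSON response.
func (s *APIScraper) Search(ctx context.Context, query string) ([]Listing, error) {
	endpoint := s.BaseURL + "?q=" + url.QueryEscape(query) // "q" is a placeholder parameter
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Accept", "application/json")

	resp, err := s.Client.Do(req)
	if err != nil {
		return nil, fmt.Errorf("do request: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	// Cap the body size so a misbehaving response cannot exhaust memory.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 4<<20))
	if err != nil {
		return nil, fmt.Errorf("read body: %w", err)
	}
	return decodeListings(body)
}

// decodeListings parses the assumed payload {"ads":[{...}]}.
func decodeListings(body []byte) ([]Listing, error) {
	var payload struct {
		Ads []Listing `json:"ads"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return nil, fmt.Errorf("decode listings: %w", err)
	}
	return payload.Ads, nil
}
```

Keeping the decoding step in its own function means it can be unit-tested against recorded JSON fixtures without any network access, and the core domain only ever sees the Scraper port.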

Additional context
Initial research suggests that OLX uses a GraphQL endpoint for its modern frontend. Identifying the required headers (like X-Device-Id or specific User-Agents) will be key to avoiding 403 Forbidden errors. This move is a prerequisite for scaling GOrimpo to a SaaS model with high concurrency.
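As a rough sketch of the request side, assuming the endpoint accepts the common GraphQL-over-HTTP JSON body ({"query": ..., "variables": ...}). The function names, the endpoint argument, and the header values are hypothetical; which headers OLX actually requires still has to be established by inspecting real traffic.

```go
// Sketch only; package main keeps the example self-contained.
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

// ErrForbidden signals that the API rejected the request (HTTP 403),
// which may indicate a missing or unexpected header.
var ErrForbidden = errors.New("olx api: 403 forbidden")

// graphQLBody encodes a GraphQL-over-HTTP JSON payload.
func graphQLBody(query string, vars map[string]any) ([]byte, error) {
	return json.Marshal(struct {
		Query     string         `json:"query"`
		Variables map[string]any `json:"variables,omitempty"`
	}{Query: query, Variables: vars})
}

// newGraphQLRequest builds a POST request carrying the headers the issue
// suspects are needed. Both header values are supplied by the caller.
func newGraphQLRequest(ctx context.Context, endpoint, deviceID, userAgent, query string, vars map[string]any) (*http.Request, error) {
	body, err := graphQLBody(query, vars)
	if err != nil {
		return nil, fmt.Errorf("encode body: %w", err)
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", "application/json")
	req.Header.Set("User-Agent", userAgent)
	req.Header.Set("X-Device-Id", deviceID)
	return req, nil
}

// classifyStatus maps an HTTP status code to an error so callers can
// distinguish a 403 from other failures with errors.Is.
func classifyStatus(code int) error {
	switch {
	case code == http.StatusForbidden:
		return ErrForbidden
	case code < 200 || code > 299:
		return fmt.Errorf("olx api: unexpected status %d", code)
	default:
		return nil
	}
}
```

Returning a sentinel error for 403 lets the adapter react differently to a blocked request (for example, by backing off or alerting) than to a transient server error.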

Labels: enhancement (new feature or request), internal (reserved for project maintainers; please do not open PRs for this issue)
