[Feature] OLX Native API Integration (JSON Scraper)
Is your feature request related to a problem? Please describe.
Currently, GOrimpo relies on HTML scraping to fetch data from OLX. While functional, this approach is brittle because it depends on CSS selectors that change frequently. Parsing full HTML documents is also resource-heavy for the VPS and slower than fetching the data directly.
Describe the solution you'd like
I want to implement a new Scraper Adapter that communicates directly with OLX's internal API (REST or GraphQL). By fetching JSON data directly:
- We eliminate the need for heavy HTML parsing.
- We get more precise data (structured fields for price, location, and condition).
- We significantly reduce the chance of the scraper breaking due to UI updates.
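To illustrate the "structured fields" point above, here is a minimal sketch of decoding a JSON payload into a typed Go struct. The `Listing` type, the `parseListings` helper, and the JSON keys are all assumptions made for illustration; the real OLX response shape still has to be inspected.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Listing mirrors the structured fields mentioned above. The JSON keys are
// hypothetical; the real OLX payload must be inspected first.
type Listing struct {
	Title     string  `json:"title"`
	Price     float64 `json:"price"`
	Location  string  `json:"location"`
	Condition string  `json:"condition"`
}

// parseListings decodes a JSON array of listings into typed values.
// No CSS selectors or DOM traversal are involved.
func parseListings(raw string) ([]Listing, error) {
	var out []Listing
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		return nil, fmt.Errorf("decode listings: %w", err)
	}
	return out, nil
}

func main() {
	// Made-up sample payload, not captured from OLX.
	raw := `[{"title":"Bike","price":450.5,"location":"Lisboa","condition":"used"}]`
	items, err := parseListings(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, it := range items {
		fmt.Printf("%s | %.2f | %s | %s\n", it.Title, it.Price, it.Location, it.Condition)
	}
}
```

If the payload stops being valid JSON (for example, an HTML error page is returned), `parseListings` fails loudly instead of silently yielding empty fields, which is usually easier to monitor than a selector that quietly stops matching.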
Describe alternatives you've considered
- Continuing with the current Playwright/goquery approach (works today, but resource-intensive).
- Using a hybrid approach (HTML for discovery, API for details), which still depends on HTML selectors for the discovery step.
Technical Details (Optional)
Since we are using Hexagonal Architecture, we can implement this as a new adapter within the internal/adapters/scraper/olx/ package. We should create a struct that satisfies the Scraper interface but uses http.Client to hit the API endpoints instead of parsing HTML.
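A rough sketch of what such an adapter could look like follows. The `Scraper` interface and `Listing` type shown here are stand-ins, since GOrimpo's real definitions are not part of this issue, and the `/search?q=` path is a placeholder rather than a known OLX endpoint. The `searchFakeServer` helper exists only so the sketch can run against a local stub.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"time"
)

// Listing and Scraper stand in for GOrimpo's real domain type and port,
// whose exact definitions are not shown in this issue.
type Listing struct {
	Title string  `json:"title"`
	Price float64 `json:"price"`
}

type Scraper interface {
	Search(ctx context.Context, query string) ([]Listing, error)
}

// APIScraper is the proposed adapter: it fetches JSON instead of parsing HTML.
type APIScraper struct {
	client  *http.Client
	baseURL string // hypothetical; the real endpoint still has to be identified
}

// Compile-time check that the adapter satisfies the port.
var _ Scraper = (*APIScraper)(nil)

func NewAPIScraper(baseURL string) *APIScraper {
	return &APIScraper{
		client:  &http.Client{Timeout: 10 * time.Second},
		baseURL: baseURL,
	}
}

func (s *APIScraper) Search(ctx context.Context, query string) ([]Listing, error) {
	u := s.baseURL + "/search?q=" + url.QueryEscape(query) // placeholder path
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Accept", "application/json")

	resp, err := s.client.Do(req)
	if err != nil {
		return nil, fmt.Errorf("call api: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var items []Listing
	if err := json.NewDecoder(resp.Body).Decode(&items); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	return items, nil
}

// searchFakeServer runs the adapter against a local stub so the sketch can be
// exercised without touching OLX.
func searchFakeServer(query string) ([]Listing, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		// Echo the query back as the title of a single fake listing.
		json.NewEncoder(w).Encode([]Listing{{Title: r.URL.Query().Get("q"), Price: 100}})
	}))
	defer srv.Close()
	return NewAPIScraper(srv.URL).Search(context.Background(), query)
}

func main() {
	items, err := searchFakeServer("bike")
	fmt.Println(items, err)
}
```

Because the core only depends on the `Scraper` port, the existing HTML adapter could stay in place as a fallback while this one is being validated.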
Additional context
Initial research suggests that OLX uses a GraphQL endpoint for their modern frontend. Identifying the required headers (like X-Device-Id or specific User-Agents) will be key to avoiding 403 Forbidden errors. This move is a prerequisite for scaling GOrimpo to a SaaS model with high concurrency.