A focused data extraction tool that collects customer and brand information from Go10.co.uk into clean, structured datasets. It helps teams centralize partner details, visual assets, and metadata for research, monitoring, and catalog building.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
This project extracts structured customer and brand data from Go10.co.uk pages, turning unstructured listings into usable datasets. It solves the problem of manually collecting partner and brand information spread across pages and formats. It’s built for analysts, marketers, and product teams who need reliable partner data at scale.
- Parses customer and brand entities from Go10 pages
- Normalizes names, images, descriptions, and links
- Preserves contextual metadata like dates and identifiers
- Produces consistent JSON outputs ready for integration
| Feature | Description |
|---|---|
| Entity detection | Identifies whether a record represents a customer or a brand. |
| Rich metadata capture | Extracts names, images, descriptions, dates, and links. |
| Structured output | Delivers clean JSON arrays for easy parsing and storage. |
| Scalable crawling | Handles multiple URLs in a single run efficiently. |
| Stable access | Supports proxy configuration for reliable data retrieval. |
| Field Name | Field Description |
|---|---|
| type | Entity type such as customer or brand. |
| name | Display name of the customer or brand. |
| post_id | Internal identifier associated with the listing. |
| date | Publication or listing date when available. |
| image_url | URL of the associated image or logo. |
| description | Textual description of the entity. |
| search_query | Query term used to locate the entity. |
| first_link | Primary external link related to the entity. |
[
  {
    "type": "customer",
    "name": "firstclass",
    "post_id": "15527",
    "date": "2021-02-18T10:18:49+00:00",
    "image_url": "https://www.go10.co.uk/wp-content/uploads/2019/04/firstclass.jpg",
    "description": null,
    "search_query": "firstclass",
    "first_link": "https://firstclass.com"
  },
  {
    "type": "brand",
    "name": "Hover-1",
    "post_id": null,
    "date": null,
    "image_url": "https://www.go10.co.uk/wp-content/uploads/2024/08/hover-1-square.png",
    "description": "Number 1 brand of hoverboards and e-scooters.",
    "search_query": "Hover-1",
    "first_link": "https://hover-1.com"
  }
]
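The sample output above can be loaded into typed records before further processing. The following is a minimal sketch, assuming the output is a JSON array with exactly the eight documented fields; the `Go10Record` class and `load_records` helper are hypothetical names for illustration and are not part of the project.

```python
import json
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Go10Record:
    # Hypothetical container mirroring the documented output fields.
    type: str
    name: str
    post_id: Optional[str] = None
    date: Optional[str] = None
    image_url: Optional[str] = None
    description: Optional[str] = None
    search_query: Optional[str] = None
    first_link: Optional[str] = None


def load_records(raw_json: str) -> List[Go10Record]:
    """Parse a JSON array of entities, rejecting keys outside the schema."""
    allowed = {f.name for f in fields(Go10Record)}
    records = []
    for item in json.loads(raw_json):
        unknown = set(item) - allowed
        if unknown:
            raise ValueError(f"unexpected fields: {sorted(unknown)}")
        records.append(Go10Record(**item))
    return records


# Trimmed copy of the sample output, used only to exercise the loader.
SAMPLE = """
[
  {"type": "customer", "name": "firstclass", "post_id": "15527",
   "date": "2021-02-18T10:18:49+00:00", "description": null,
   "search_query": "firstclass", "first_link": "https://firstclass.com"},
  {"type": "brand", "name": "Hover-1", "post_id": null, "date": null,
   "description": "Number 1 brand of hoverboards and e-scooters.",
   "search_query": "Hover-1", "first_link": "https://hover-1.com"}
]
"""

records = load_records(SAMPLE)
brands = [r.name for r in records if r.type == "brand"]
```

Rejecting unknown keys is a design choice for this sketch; a more lenient loader could ignore them instead.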
g10 parser spider/
├── src/
│   ├── runner.py
│   ├── extractors/
│   │   ├── go10_entity_parser.py
│   │   └── html_utils.py
│   ├── outputs/
│   │   └── json_exporter.py
│   └── config/
│       └── settings.example.json
├── data/
│   ├── inputs.sample.txt
│   └── sample_output.json
├── requirements.txt
└── README.md
- Business development teams use it to track Go10 partners so they can identify outreach opportunities faster.
- Market researchers use it to analyze e-mobility brands so they can map competitors and positioning.
- Content teams use it to build brand directories so they can enrich websites with accurate assets.
- E-commerce analysts use it to monitor partner listings so they can spot updates and changes early.
Does it support multiple Go10 URLs in one run? Yes, you can provide an array of URLs, and the scraper will process each sequentially into a single dataset.
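Sequential processing into a single dataset can be pictured roughly as follows. This is an illustrative sketch only: `collect` and `fake_parse_page` are hypothetical names, the URLs are placeholders, and the stub parser stands in for the real page extraction, which is not shown in this README.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def collect(urls: Iterable[str],
            parse_page: Callable[[str], List[Record]]) -> List[Record]:
    """Visit each URL in order and merge all parsed records into one list."""
    dataset: List[Record] = []
    for url in urls:
        dataset.extend(parse_page(url))
    return dataset


def fake_parse_page(url: str) -> List[Record]:
    # Stub parser for illustration: derives one record from the URL itself
    # instead of fetching and parsing a real page.
    return [{"type": "brand", "name": url.rsplit("/", 1)[-1]}]


# Placeholder URLs, not real Go10 pages.
urls = ["https://example.invalid/alpha", "https://example.invalid/beta"]
dataset = collect(urls, fake_parse_page)
```

Passing the parser in as a function keeps the loop testable without network access.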
What happens if some fields are missing on a page? Missing values are returned as null, keeping the output schema consistent and predictable.
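One way to get this behavior is to project every partial record onto the full field list. The sketch below assumes the eight documented field names; `with_nulls` is a hypothetical helper, not the project's own code.

```python
from typing import Dict, Optional

# Field names taken from the output schema documented in this README.
FIELDS = (
    "type", "name", "post_id", "date",
    "image_url", "description", "search_query", "first_link",
)


def with_nulls(partial: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Return a record with every schema field present, None where missing."""
    return {key: partial.get(key) for key in FIELDS}


record = with_nulls({"type": "brand", "name": "Hover-1"})
```

When serialized with `json.dumps`, the `None` values become `null`, so every record carries the same keys in the same order.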
Can the output be integrated into databases or pipelines? Yes, the structured JSON format is suitable for direct ingestion into databases, dashboards, or ETL workflows.
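As an example of database ingestion, the records can be written to SQLite using only the Python standard library. This is a sketch under assumptions: the table name `go10_entities` and the `ingest` helper are hypothetical, and every column is stored as text for simplicity.

```python
import sqlite3
from typing import Dict, List, Optional

COLUMNS = (
    "type", "name", "post_id", "date",
    "image_url", "description", "search_query", "first_link",
)


def ingest(conn: sqlite3.Connection,
           records: List[Dict[str, Optional[str]]]) -> None:
    """Create the table if needed and insert one row per record."""
    column_defs = ", ".join(f"{col} TEXT" for col in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS go10_entities ({column_defs})")
    placeholders = ", ".join(f":{col}" for col in COLUMNS)
    # Named placeholders read values from each record dict by key.
    conn.executemany(
        f"INSERT INTO go10_entities VALUES ({placeholders})", records
    )
    conn.commit()


# Records shaped like the sample output; null fields are None in Python.
sample_records = [
    {"type": "customer", "name": "firstclass", "post_id": "15527",
     "date": "2021-02-18T10:18:49+00:00", "image_url": None,
     "description": None, "search_query": "firstclass",
     "first_link": "https://firstclass.com"},
    {"type": "brand", "name": "Hover-1", "post_id": None, "date": None,
     "image_url": None,
     "description": "Number 1 brand of hoverboards and e-scooters.",
     "search_query": "Hover-1", "first_link": "https://hover-1.com"},
]

conn = sqlite3.connect(":memory:")
ingest(conn, sample_records)
brand_count = conn.execute(
    "SELECT COUNT(*) FROM go10_entities WHERE type = 'brand'"
).fetchone()[0]
```

Because each record always contains all eight keys, the same insert statement works for customers and brands alike.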
Is proxy usage required? Proxy configuration is optional but recommended for stable access when running at scale.
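The project's actual proxy settings format is not documented here, so the following only illustrates the idea of an optional proxy using Python's standard `urllib`. The `make_opener` helper and the proxy address are hypothetical.

```python
from typing import Optional
from urllib.request import OpenerDirector, ProxyHandler, build_opener


def make_opener(proxy_url: Optional[str] = None) -> OpenerDirector:
    """Build a URL opener that routes traffic through a proxy when one is given."""
    if proxy_url:
        # Use the same proxy for both plain and TLS requests.
        return build_opener(ProxyHandler({"http": proxy_url, "https": proxy_url}))
    # Without an explicit proxy, urllib falls back to environment settings, if any.
    return build_opener()


# Placeholder address; no request is sent in this sketch.
opener = make_opener("http://127.0.0.1:8080")
```

Requests would then go through `opener.open(url)`, leaving the rest of the crawl logic unaware of whether a proxy is in use.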
Primary Metric: Processes an average Go10 page in under 2 seconds with full entity extraction.
Reliability Metric: Maintains a successful extraction rate above 98% across mixed customer and brand pages.
Efficiency Metric: Handles dozens of URLs per run with minimal memory overhead.
Quality Metric: Delivers consistently structured records with high field completeness for downstream use.
