A set of Python classes that scrape business listing websites.
- Alimarket web scraper: This Python class scrapes the specified page of alimarket.es for business entries, visits each entry's page to scrape further information, and saves the results in a `.csv` file. It takes a business area and a number of pages as input; the latter is the number of business-listing pages to scrape.
- ElEconomista web scraper: This Python class takes in a `.csv` of business names (which should be in a single column, with no header and no other information in the `.csv`) and searches the ElEconomista website for each business. If a possible page is found, the link to that page is saved. The scraper then visits that page and attempts to scrape contact information. Any business for which no search result or no contact information is found is not included in the output file. Results are saved in a `.csv` file.
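The input format expected by the ElEconomista scraper (one business name per row, single column, no header) can be illustrated with a minimal sketch. The function name `load_business_names` and the file name `businesses.csv` are hypothetical and are not part of the scraper classes; this only shows one way such a file might be read and checked before passing it on.

```python
import csv
from pathlib import Path


def load_business_names(csv_path):
    """Return business names from a single-column, headerless CSV file.

    Hypothetical helper for illustration; not part of the scraper classes.
    """
    names = []
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            # Skip blank lines and keep only the first (and only expected) column.
            if row and row[0].strip():
                names.append(row[0].strip())
    return names


if __name__ == "__main__":
    # "businesses.csv" is an assumed file name.
    for name in load_business_names("businesses.csv"):
        print(name)
```

Reading the file this way before running the scraper makes it easy to spot stray header rows or extra columns, which the scraper's input format does not allow.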
- Verify you have the packages listed in `requirements.txt`.
- See the `.ipynb` files for sample code on usage. For example, `alimarket-scraper-class.ipynb` contains sample usage code in its lower cells.
- The `.json` files contain headers for use with the `requests` library. These are indexed `'0'` to `'49'`.
- The `.gitignore` only ignores `.csv` files in the directory. A `.csv` file is necessary as input for the ElEconomista web scraper.
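
As a rough sketch of how the indexed header sets in the `.json` files might be used, the snippet below loads one file and picks a header set at random. It assumes each file is a JSON object mapping the string indices `'0'` to `'49'` to header dictionaries; the function names and the random selection strategy are illustrative assumptions, not necessarily how the scraper classes do it.

```python
import json
import random


def load_header_sets(json_path):
    """Load header sets from a JSON file.

    Assumption: the file maps string indices ('0'..'49') to header dicts.
    """
    with open(json_path, encoding="utf-8") as handle:
        return json.load(handle)


def pick_headers(header_sets, rng=random):
    """Return one header dict chosen at random from the indexed sets."""
    # Sort numerically so the choice is reproducible for a seeded rng.
    index = rng.choice(sorted(header_sets, key=int))
    return header_sets[index]
```

The returned dictionary can then be passed to the `requests` library, for example `requests.get(url, headers=pick_headers(header_sets))`.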