webscraping

A set of Python classes that scrape business listing websites.

Alimarket web scraper: This Python class scrapes business entries from listing pages on alimarket.es, visits each entry's own page to collect further information, and saves the results in a .csv file. It takes a business area and the number of listing pages to scrape as input.
ElEconomista web scraper: This Python class takes a .csv of business names (a single column, with no header and no other data) and searches the ElEconomista website for each business. If a candidate page is found, its link is saved; the scraper then visits that page and attempts to scrape contact information. Businesses with no search result or no contact information are omitted from the output file. Results are saved in a .csv file.
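
The ElEconomista input and output handling described above can be sketched roughly as follows. This is a minimal illustration and not the code of the class in this repository: the function names, the `lookup` callable (standing in for the search-and-scrape step), and the output columns `name`, `link`, and `contact` are all assumptions made for the example.

```python
import csv
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical output columns; the real scraper may use different ones.
FIELDNAMES = ["name", "link", "contact"]


def read_business_names(path: str) -> List[str]:
    """Read names from a single-column .csv with no header, skipping blank rows."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]


def collect_contacts(
    names: Iterable[str],
    lookup: Callable[[str], Optional[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Keep only businesses for which the lookup returned contact information.

    `lookup` is a placeholder for the search + scrape step; it is assumed to
    return None (or an empty dict) when no page or no contact info is found.
    """
    rows = []
    for name in names:
        result = lookup(name)
        if result:
            rows.append({
                "name": name,
                "link": result.get("link", ""),
                "contact": result.get("contact", ""),
            })
    return rows


def write_results(path: str, rows: List[Dict[str, str]]) -> None:
    """Write the collected rows to a .csv file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
```

Passing the lookup step in as a callable keeps the file handling separate from the network code, so the filtering behavior can be checked without visiting the website.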

Verify that you have the packages listed in requirements.txt installed.
See the .ipynb files for sample usage. For example, the lower cells of alimarket-scraper-class.ipynb contain sample code.

The .json files contain request headers for use with the requests library. The header sets are indexed '0' to '49'.
The .gitignore only excludes .csv files in the directory. A .csv file is required as input for the ElEconomista web scraper.
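
As an illustration of how the .json header files might be used, the sketch below loads one file and picks a header set at random for each request. It assumes each file is a JSON object mapping the string keys '0' to '49' to dictionaries of header names and values; the helper names `load_headers` and `pick_headers` are hypothetical and do not come from the scraper classes.

```python
import json
import random
from typing import Dict

HeaderSets = Dict[str, Dict[str, str]]


def load_headers(path: str) -> HeaderSets:
    """Load the header sets from a .json file such as headers.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def pick_headers(header_sets: HeaderSets) -> Dict[str, str]:
    """Return one header set chosen at random from the loaded sets."""
    key = random.choice(sorted(header_sets))
    return header_sets[key]


# Possible usage with the requests library (not executed here):
#   import requests
#   header_sets = load_headers("headers.json")
#   response = requests.get(url, headers=pick_headers(header_sets))
```

Choosing a different header set per request is a common way to vary the User-Agent and related fields between requests; whether the scraper classes rotate headers this way is an assumption.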
