GitHub - JassielMG/web-scraper-ML

This project consist on extrating data from differents categoryes on the web site Mercado Libre, to performace project it build a spider to get the data necessary to download data and resctructred the data in file with csv format.

A navigation with selenium is simulated through the links to the pages of product categories, the spider extracts all the links of each product browsing through the first 10 pages.

the description, price, discount price, brand and an image link are extracted from the links of all the products by category as if the spider ate its collection; this data is extracted with BeutifulSoup implementing css selectors and regular expressions.

Once the data has been extracted, it is given a structure to be saved as files in csv format in the path of our file system.

Intructions:

Download the project from the repository on your computer.
Install all dependeces necessary with pip executing the requeriments.txt file.
Run the main.py file.
Wait the data completed the extracting and downloaded of the files csv.
once completed the download check the files in folder Files_Downloaded.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Files_Downloaded		Files_Downloaded
expected_files		expected_files
README.md		README.md
__init__.py		__init__.py
crawler.py		crawler.py
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages