Binary file added .DS_Store
Binary file not shown.
20 changes: 20 additions & 0 deletions Data/Airbnb_Cancun.csv
@@ -0,0 +1,20 @@
Titles,Price,Amenities,Rating,Evaluation,Links
"Céntrico estudio hasta para 3 personas, Carey #3 🌴",$420 MXN por noche,"['3 huéspedes', 'estudio', '2 camas', '1 baño', 'Wifi', 'Aire acondicionado', 'Cocina']",5.0,12 evaluaciones,https://www.airbnb.mx/rooms/51815427?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
Beautiful Flat downtown,$800 MXN por noche,"['3 huéspedes', '1 habitación', '2 camas', '1 baño', 'Wifi', 'Aire acondicionado', 'Cocina']",4.47,49 evaluaciones,https://www.airbnb.mx/rooms/45796417?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
Cozy Room by the Lagoon,$624 MXN por noche,"['2 huéspedes', '1 habitación', '1 cama', '1 baño', 'Wifi', 'Aire acondicionado', 'Alberca', 'Cocina']",4.89,194 evaluaciones,https://www.airbnb.mx/rooms/26236425?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
Departamento nuevo en la zona hotelera,"$1,123 MXN por noche","['2 huéspedes', '1 habitación', '1 cama', '1 baño', 'Wifi', 'Aire acondicionado', 'Alberca']",4.79,121 evaluaciones,https://www.airbnb.mx/rooms/40843862?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
Beautiful suite Cancun,$450 MXN por noche,"['3 huéspedes', '1 habitación', '1 cama', '1 baño', 'Wifi', 'Aire acondicionado']",4.77,13 evaluaciones,https://www.airbnb.mx/rooms/51082606?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
"Céntrico Estudio con hermosa terraza, #6.",$470 MXN por noche,"['4 huéspedes', 'estudio', '2 camas', '1 baño', 'Wifi', 'Aire acondicionado', 'Cocina']",4.98,81 evaluaciones,https://www.airbnb.mx/rooms/40371464?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
Bright modern loft#1 w/rooftop pool by Pto Juarez,$915 MXN por noche,"['3 huéspedes', '1 habitación', '2 camas', '1 baño', 'Wifi', 'Aire acondicionado', 'Alberca', 'Cocina']",4.89,18 evaluaciones,https://www.airbnb.mx/rooms/51488526?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
Seafront ocean view apartment. Great location,"$1,534 MXN por noche","['2 huéspedes', '1 habitación', '1 cama', '1 baño completo y uno de tocador', 'Wifi', 'Aire acondicionado', 'Alberca', 'Cocina']",4.87,15 evaluaciones,https://www.airbnb.mx/rooms/41067627?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
S3❤StudiosRubia-near🏖+AC❄+WiFi + ♛Bd+Work ✔,$437 MXN por noche,"['2 huéspedes', '1 habitación', '1 cama', '1 baño', 'Wifi', 'Aire acondicionado', 'Cocina']",4.76,49 evaluaciones,https://www.airbnb.mx/rooms/44595987?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
Beautiful house with private pool,$776 MXN por noche,"['4 huéspedes', '1 habitación', '1 cama', '1 baño', 'Wifi', 'Aire acondicionado', 'Alberca', 'Cocina']",4.41,46 evaluaciones,https://www.airbnb.mx/rooms/47349263?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
"Frente al mar, hermosa playa, nuevo 03","$1,746 MXN por noche","['2 huéspedes', '1 habitación', '1 cama', '1 baño', 'Wifi', 'Aire acondicionado', 'Alberca']",4.98,47 evaluaciones,https://www.airbnb.mx/rooms/44938840?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
Beautiful suite,$450 MXN por noche,"['2 huéspedes', '1 habitación', '1 baño', 'Wifi', 'Aire acondicionado']",4.60,5 evaluaciones,https://www.airbnb.mx/rooms/51001198?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
Beautiful Suite Cancun,$450 MXN por noche,"['2 huéspedes', '1 habitación', '1 cama', '1 baño', 'Wifi', 'Aire acondicionado']",4.47,49 evaluaciones,https://www.airbnb.mx/rooms/39774536?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
Beautiful Double suite Cancun,$420 MXN por noche,"['4 huéspedes', '1 habitación', '2 camas', '1 baño', 'Wifi', 'Aire acondicionado']",4.51,43 evaluaciones,https://www.airbnb.mx/rooms/39693741?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
"Nueva suite privada #1,Acceso independiente,Centro",$560 MXN por noche,"['2 huéspedes', '1 habitación', '1 cama', '1 baño', 'Wifi', 'Aire acondicionado']",4.88,259 evaluaciones,https://www.airbnb.mx/rooms/30952852?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
Beautiful suite downtown Cancun,$559 MXN por noche,"['2 huéspedes', 'estudio', '1 cama', '1 baño', 'Wifi', 'Aire acondicionado']",4.56,85 evaluaciones,https://www.airbnb.mx/rooms/39692908?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
Jardines de la Costa una experiencia magnifica,$623 MXN por noche,"['4 huéspedes', '1 habitación', '2 camas', '1 baño', 'Wifi', 'Aire acondicionado', 'Alberca', 'Cocina']",4.86,106 evaluaciones,https://www.airbnb.mx/rooms/34721797?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
Departamento en el corazón de la zona hotelera.,$933 MXN por noche,"['2 huéspedes', '1 habitación', '1 baño completo y uno de tocador', 'Wifi', 'Aire acondicionado', 'Cocina']",4.71,14 evaluaciones,https://www.airbnb.mx/rooms/51624590?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
Great beach studio! only two guests,"$1,265 MXN por noche","['2 huéspedes', '1 habitación', '1 cama', '1 baño', 'Wifi', 'Aire acondicionado', 'Alberca', 'Cocina']",4.85,157 evaluaciones,https://www.airbnb.mx/rooms/30849389?adults=2&previous_page_section_name=1000&federated_search_id=484f82f9-6347-4013-b34b-745377d47091
16 changes: 16 additions & 0 deletions Data/Código_Postal.csv
@@ -0,0 +1,16 @@
id,d_codigo,d_asenta,d_tipo_asenta,d_mnpio,d_estado,d_ciudad,d_cp,c_estado,c_oficina,c_cp,c_tipo_asenta,c_mnpio,id_asenta_cpcons,d_zona,c_cve_ciudad
54367,42950,Centro,Colonia,Tlaxcoapan,Hidalgo,Tlaxcoapan,42951,13,42951,,09,074,0872,Urbano,09
54368,42950,La Cruz,Colonia,Tlaxcoapan,Hidalgo,Tlaxcoapan,42951,13,42951,,09,074,0921,Urbano,09
54369,42952,Educación,Colonia,Tlaxcoapan,Hidalgo,Tlaxcoapan,42951,13,42951,,09,074,2083,Urbano,09
54370,42952,Villa Jardín,Colonia,Tlaxcoapan,Hidalgo,Tlaxcoapan,42951,13,42951,,09,074,5539,Urbano,09
54371,42952,Valle San Pedro,Colonia,Tlaxcoapan,Hidalgo,Tlaxcoapan,42951,13,42951,,09,074,5540,Urbano,09
54372,42952,Tlaxcoapan,Colonia,Tlaxcoapan,Hidalgo,Tlaxcoapan,42951,13,42951,,09,074,5546,Urbano,09
54373,42952,Industrial,Colonia,Tlaxcoapan,Hidalgo,Tlaxcoapan,42951,13,42951,,09,074,2084,Urbano,09
54374,42952,La Vega,Colonia,Tlaxcoapan,Hidalgo,Tlaxcoapan,42951,13,42951,,09,074,1082,Urbano,09
54375,42952,Carrizos,Colonia,Tlaxcoapan,Hidalgo,Tlaxcoapan,42951,13,42951,,09,074,5545,Urbano,09
54376,42953,Magisterial,Colonia,Tlaxcoapan,Hidalgo,Tlaxcoapan,42951,13,42951,,09,074,5541,Urbano,09
54377,42953,Tlaxcoapan,Barrio,Tlaxcoapan,Hidalgo,Tlaxcoapan,42951,13,42951,,02,074,5542,Urbano,09
54378,42953,Lomas de Tlaxcoapan,Colonia,Tlaxcoapan,Hidalgo,Tlaxcoapan,42951,13,42951,,09,074,1005,Urbano,09
54379,42954,Ciudadela,Colonia,Tlaxcoapan,Hidalgo,Tlaxcoapan,42951,13,42951,,09,074,2086,Urbano,09
54380,42954,Morelos,Fraccionamiento,Tlaxcoapan,Hidalgo,Tlaxcoapan,42951,13,42951,,21,074,5543,Urbano,09
54381,42954,Morelos,Colonia,Tlaxcoapan,Hidalgo,Tlaxcoapan,42951,13,42951,,09,074,2087,Urbano,09
73 changes: 15 additions & 58 deletions README.md
@@ -1,67 +1,24 @@
![IronHack Logo](https://s3-eu-west-1.amazonaws.com/ih-materials/uploads/upload_d5c5793015fec3be28a63c4fa3dd4d55.png)
# Web-Project

# Project: API & Web Data Scraping and Web Data Pipeline
Web Scraping.

## Overview
For the web scraping part, I used Airbnb: I scraped the first page of results for Cancún, Mexico, with the Superhost filter applied, and saved the data to a file.

The goal of this project is for you to practice what you have learned in the APIs and Web Scraping chapter of this program. For this project, you will choose both an API to obtain data from and a web page to scrape. For the API portion of the project, you will need to make calls to your chosen API, successfully obtain a response, request data, convert it into a Pandas data frame, and export it as a CSV file. For the web scraping portion of the project, you will need to scrape the HTML from your chosen page, parse the HTML to extract the necessary information, and either save the results to a text (txt) file if it is text or into a CSV file if it is tabular data.
I inspected the structure of the page's HTML to see which data could be extracted. The fields were:

Additionally, after you obtain both CSV files you will practice what you have learned in the Intermediate Python and Data Engineering chapter of this program. You will need to import the CSV files and use your newly acquired skills to build a data pipeline that processes the data and produces a result. You should demonstrate your proficiency with the tools we covered (functions, list comprehensions, string operations, and error handling) in your pipeline.
- Title
- Price
- Rating
- Reviews
- Amenities
- Link
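Because these fields sit inside nested elements, one way to pull each of them out is to track when the parser enters an element carrying the target class and collect all text beneath it until that element closes. The sketch below uses the standard library's `html.parser` in place of BeautifulSoup so it runs without extra packages. The class names (`listing-title`, `listing-price`) and the sample markup are hypothetical placeholders, not Airbnb's real markup; the actual classes have to be read from the page source.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth count
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every element whose class list contains target_class.

    Text from nested children is included, so a value split across several
    <span> elements comes back as one string.
    """

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.depth = 0               # > 0 while inside a matching element
        self.buffer: list[str] = []  # text pieces of the current match
        self.results: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1          # a child of the matching element
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1           # entering a matching element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:          # just closed the matching element
            # Collapse runs of whitespace left over from the markup
            self.results.append(" ".join("".join(self.buffer).split()))
            self.buffer = []

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_by_class(html: str, target_class: str) -> list[str]:
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup; the real class names must be read from the page source
sample = """
<div class="card">
  <div class="listing-title">Cozy Room by the Lagoon</div>
  <div class="listing-price"><span>$624 MXN</span> <span>por noche</span></div>
</div>
"""
print(extract_by_class(sample, "listing-title"))  # ['Cozy Room by the Lagoon']
print(extract_by_class(sample, "listing-price"))  # ['$624 MXN por noche']
```

Unclosed tags in sloppy real-world HTML can throw the depth count off, which is one reason BeautifulSoup (listed in the resources below) is usually the more forgiving choice for production scraping.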

**You will be working individually for this project**, but we'll be guiding you along the process and helping you as you go. Show us what you've got!
Finding the class of each field was laborious because the elements were nested. Once I had the information, I cleaned each field so that it would be as clear as possible. The most challenging part for me was writing the functions, but by reviewing and researching the topics covered earlier in the course I was able to do it.
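The values saved in `Data/Airbnb_Cancun.csv` still carry their display formatting: prices such as `$1,123 MXN por noche`, review counts such as `121 evaluaciones`, and amenities stored as a stringified Python list. A minimal sketch of helpers that convert those strings into numbers and lists, assuming the formats shown in that file; the function names are hypothetical, not the project's own code.

```python
from __future__ import annotations

import ast
import re


def parse_price(text: str) -> int:
    """'$1,123 MXN por noche' -> 1123 (nightly price in MXN)."""
    match = re.search(r"\$([\d,]+)", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return int(match.group(1).replace(",", ""))


def parse_review_count(text: str) -> int:
    """'121 evaluaciones' -> 121."""
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError(f"no review count found in {text!r}")
    return int(match.group())


def parse_amenities(text: str) -> list[str]:
    """Turn the stringified list stored in the CSV back into a Python list."""
    # literal_eval only evaluates Python literals, unlike eval()
    value = ast.literal_eval(text)
    if not isinstance(value, list):
        raise ValueError(f"expected a list, got {type(value).__name__}")
    return [str(item).strip() for item in value]


print(parse_price("$1,123 MXN por noche"))                   # 1123
print(parse_review_count("121 evaluaciones"))                # 121
print(parse_amenities("['2 huéspedes', 'Wifi', 'Cocina']"))  # ['2 huéspedes', 'Wifi', 'Cocina']
```

Raising `ValueError` on unparseable input, rather than returning a default, makes a change in the page layout show up immediately instead of silently producing empty columns.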

---
API Scraping.

## Technical Requirements
One obstacle here was finding a site that gives open access to its data. I found a free site with SEPOMEX (Mexican Postal Service) data listing the country's postal codes, and decided to build a DataFrame for one municipality in the state of Hidalgo.
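`your-code/API_Scraping.py` queries the endpoint with a hand-built query string through `requests`. A standard-library sketch of the same call (`urllib` swapped in for `requests`), with the query parameters encoded by `urlencode` and network or JSON failures caught explicitly. It assumes, as the script does, that the response is a JSON object whose records live under the `zip_codes` key; `build_url` and `fetch_zip_codes` are hypothetical names.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://sepomex.icalialabs.com/api/v1/zip_codes"


def build_url(city: str) -> str:
    """Encode the query string so spaces and accents in city names are escaped."""
    return f"{BASE_URL}?{urlencode({'city': city})}"


def fetch_zip_codes(city: str, timeout: float = 30) -> list[dict]:
    """Return the records under 'zip_codes', raising RuntimeError on any failure."""
    # HTTPError, URLError and socket timeouts are all OSError subclasses;
    # a body that is not valid JSON raises ValueError (JSONDecodeError).
    try:
        with urlopen(build_url(city), timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError) as exc:
        raise RuntimeError(f"could not fetch postal codes for {city!r}") from exc
    if not isinstance(payload, dict) or "zip_codes" not in payload:
        raise RuntimeError("unexpected response: no 'zip_codes' key")
    return payload["zip_codes"]


print(build_url("tlaxcoapan"))
# http://sepomex.icalialabs.com/api/v1/zip_codes?city=tlaxcoapan
```

Calling `fetch_zip_codes("tlaxcoapan")` needs network access; the list it returns can be passed straight to `pd.DataFrame`, as the script does.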

The technical requirements for this project are as follows:
Getting the data this way was simpler than web scraping. In the end, I only had to clean it so it could be loaded into a DataFrame and then exported to a .csv file.
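In `Data/Código_Postal.csv` the `c_cp` column is empty in every row. A minimal sketch of a cleaning step that drops such all-blank fields from the API records before they go into a DataFrame, using plain dictionaries; `drop_empty_fields` and `is_blank` are hypothetical helpers. With pandas, `df.dropna(axis=1, how="all")` covers columns that are entirely `None`/NaN, but not columns of empty strings.

```python
from __future__ import annotations


def is_blank(value) -> bool:
    """True for None or a whitespace-only string (0 and False are kept)."""
    return value is None or (isinstance(value, str) and not value.strip())


def drop_empty_fields(records: list[dict]) -> list[dict]:
    """Remove keys that are blank in every record, e.g. c_cp in Código_Postal.csv."""
    all_keys = {key for record in records for key in record}
    empty = {key for key in all_keys if all(is_blank(record.get(key)) for record in records)}
    return [{k: v for k, v in record.items() if k not in empty} for record in records]


# Sample records shaped like rows of Data/Código_Postal.csv (fields abbreviated)
records = [
    {"d_codigo": "42950", "d_asenta": "Centro", "c_cp": None},
    {"d_codigo": "42952", "d_asenta": "Educación", "c_cp": ""},
]
print(drop_empty_fields(records))
# [{'d_codigo': '42950', 'd_asenta': 'Centro'}, {'d_codigo': '42952', 'd_asenta': 'Educación'}]
```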

* You must obtain data from an API using Python.
* You must scrape and clean HTML from a web page using Python.
* The results should be two files - one containing the tabular results of your API request and the other containing the results of your web page scrape.
* Your code should be saved in a Jupyter Notebook and your results should be saved in a folder named output.

* You must construct a data pipeline with the majority of your code wrapped in functions.
* Each data pipeline stage should be covered: acquisition, wrangling, analysis, and reporting.
* You must demonstrate all the topics we covered in the chapter (functions, list comprehensions, string operations, and error handling) in your processing of the data.
* There should be some data set that gets imported and some result that gets exported.
* Your code should be saved in a Python executable file (.py), your data should be saved in a folder named data, and your results should be saved in a folder named output.

* You should include a README.md file that describes the steps you took and your thought process for obtaining data from the API and web page.
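These requirements ask for a pipeline split into acquisition, wrangling, analysis, and reporting stages that uses functions, list comprehensions, string operations, and error handling. A minimal sketch of that shape, using the `Titles` and `Price` columns of `Data/Airbnb_Cancun.csv`; the stage function names are hypothetical, and the standard `csv` module stands in for the pandas workflow the project uses.

```python
from __future__ import annotations

import csv
import io
import re
from typing import Optional


def acquire(source) -> list[dict]:
    """Acquisition: read CSV rows from any text file object."""
    return list(csv.DictReader(source))


def to_price(text: str) -> Optional[int]:
    """String operations with error handling: '$1,123 MXN por noche' -> 1123."""
    try:
        return int(re.search(r"\$([\d,]+)", text).group(1).replace(",", ""))
    except (AttributeError, ValueError):  # no match (None.group) or no digits
        return None


def wrangle(rows: list[dict]) -> list[dict]:
    """Wrangling: keep the title and a numeric price; drop rows whose price fails to parse."""
    cleaned = [{"title": row["Titles"].strip(), "price": to_price(row["Price"])} for row in rows]
    return [row for row in cleaned if row["price"] is not None]


def analyze(rows: list[dict]) -> dict:
    """Analysis: count listings and average the nightly price."""
    prices = [row["price"] for row in rows]
    mean = sum(prices) / len(prices) if prices else 0.0
    return {"listings": len(prices), "mean_price": mean}


def report(summary: dict) -> str:
    """Reporting: format the summary as one line of text."""
    return f"{summary['listings']} listings, mean price {summary['mean_price']:.2f} MXN/night"


# Three rows from Data/Airbnb_Cancun.csv, reduced to the two columns used here
SAMPLE = '''Titles,Price
Beautiful Flat downtown,$800 MXN por noche
Departamento nuevo en la zona hotelera,"$1,123 MXN por noche"
Beautiful suite Cancun,$450 MXN por noche
'''

print(report(analyze(wrangle(acquire(io.StringIO(SAMPLE))))))
# 3 listings, mean price 791.00 MXN/night
```

For the real file, `acquire` would receive `open("Data/Airbnb_Cancun.csv", newline="", encoding="utf-8")`, and the report string would be written into the `output` folder.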


## Necessary Deliverables

The following deliverables should be pushed to your Github repo for this chapter.

* **A Jupyter Notebook (.ipynb) file** that contains the code used to work with your API and scrape your web page.
* **An output folder** containing the outputs of your API and scraping efforts.
* **A Python (.py) code file** that contains the code for your data pipeline.
* **A data folder** containing your data set.
* **An output folder** containing the output of your data pipeline.
* **A ``README.md`` file** containing a detailed explanation of your approach and code for retrieving data from the API and scraping the web page as well as your results, obstacles encountered, and lessons learned.

## Suggested Ways to Get Started

* **Find an API to work with** - a great place to start looking would be [API List](https://apilist.fun/) and [Public APIs](https://github.com/toddmotto/public-apis). If you need authorization for your chosen API, make sure to give yourself enough time for the service to review and accept your application. Have a couple back-up APIs chosen just in case!
* **Find a web page to scrape** and determine the content you would like to scrape from it - blogs and news sites are typically good candidates for scraping text content, and [Wikipedia](https://www.wikipedia.org/) is usually a good source for HTML tables (search for "list of...").
* **Examine the data and come up with a deliverable** before diving in and applying any methods to it.
* **Break the project down into different steps** - note the steps covered in the API and web scraping lessons, try to follow them, and make adjustments as you encounter the obstacles that are inevitable due to all APIs and web pages being different.
* **Use the tools in your tool kit** - your knowledge of intermediate Python as well as some of the things you've learned in previous chapters. This is a great way to start tying everything you've learned together!
* **Work through the lessons in class** & ask questions when you need to! Think about adding relevant code to your project each night, instead of, you know... _procrastinating_.
* **Commit early, commit often**, don’t be afraid of doing something incorrectly because you can always roll back to a previous version.
* **Consult documentation and resources provided** to better understand the tools you are using and how to accomplish what you want.

## Useful Resources API and Web Data Scraping

* [Requests Library Documentation: Quickstart](http://docs.python-requests.org/en/master/user/quickstart/)
* [BeautifulSoup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
* [Stack Overflow Python Requests Questions](https://stackoverflow.com/questions/tagged/python-requests)
* [StackOverflow BeautifulSoup Questions](https://stackoverflow.com/questions/tagged/beautifulsoup)

## Useful Resources Web Data Pipeline

* [Python Functional Programming How To Documentation](https://docs.python.org/3.7/howto/functional.html)
* [Python List Comprehensions Documentation](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions)
* [Python Errors and Exceptions Documentation](https://docs.python.org/3/tutorial/errors.html)
* [StackOverflow String Operation Questions](https://stackoverflow.com/questions/tagged/string+python)
Finishing this project gave me a clearer understanding of the topics it covered, as well as of the methods I used along the way.
31 changes: 31 additions & 0 deletions your-code/API_Scraping.py
@@ -0,0 +1,31 @@
import requests
from pathlib import Path

import pandas as pd

# SEPOMEX postal-code API, filtered to the city of Tlaxcoapan (Hidalgo)
url = 'http://sepomex.icalialabs.com/api/v1/zip_codes?city=tlaxcoapan'

# One request is enough: repeating the same GET returns the same records.
res = requests.get(url, timeout=30)
res.raise_for_status()  # stop early on HTTP errors
ciudad = res.json()

# The response is a dict; the postal-code records live under 'zip_codes'
df = pd.DataFrame(ciudad['zip_codes'])

filename = Path("Data") / "Código_Postal.csv"

# Don't overwrite an existing export
if not filename.is_file():
    df.to_csv(filename, index=False)
    print(f"{filename} saved.")
else:
    print('File already exists.')

print("Finished")