4 changes: 4 additions & 0 deletions .ipynb_checkpoints/README-checkpoint.md

7,274 changes: 7,274 additions & 0 deletions .ipynb_checkpoints/main-checkpoint.ipynb


48 changes: 3 additions & 45 deletions README.md
@@ -1,46 +1,4 @@
![IronHack Logo](https://s3-eu-west-1.amazonaws.com/ih-materials/uploads/upload_d5c5793015fec3be28a63c4fa3dd4d55.png)
This project was a great challenge and an achievement for me, since it was an individual project and I pushed myself to the next level by integrating Selenium.
I decided to retrieve data by scraping an open educational resources website, OER Commons (https://www.oercommons.org/).
My biggest challenge was interacting with the web page through Selenium. I tried, without success, to apply the filters available in the site's search engine. However, it was very interesting to learn different techniques for interacting with the page more reliably, such as WebDriverWait, XPath locators, and ActionChains, among others. The project also made me practice time management and realize the importance of delivering a result that may not be my ideal outcome, but that solves the problem and meets quality standards. Finally, refactoring the code into functions was a challenge for me, since I am still building my understanding of functions; doing it well requires close attention to detail, a solid understanding of your own code, and the ability to read error messages.
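The reflection mentions WebDriverWait. In Selenium, a typical call is `WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, xpath)))`, which re-checks the condition until it succeeds or raises a `TimeoutException`. The sketch below reproduces only that polling idea with the standard library, so it runs without a browser. `wait_until` and `WaitTimeout` are hypothetical names, not part of Selenium's API; the 0.5 s default poll interval is chosen to match WebDriverWait's default.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy within the timeout."""


def wait_until(condition, timeout=10.0, poll_frequency=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that first truthy result, or raises WaitTimeout once
    `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Usage: simulate an element that only "appears" on the third check.
attempts = {"count": 0}


def element_present():
    attempts["count"] += 1
    return "search-results" if attempts["count"] >= 3 else None


print(wait_until(element_present, timeout=2.0, poll_frequency=0.01))
```

Waiting on an explicit condition like this is usually more dependable than a fixed `time.sleep`, because the script continues as soon as the page is ready instead of guessing a delay.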

# Project: API and Web Data Scraping

## Overview

The goal of this project is for you to practice what you have learned in the APIs and Web Scraping chapter of this program. For this project, you will choose both an API to obtain data from and a web page to scrape. For the API portion of the project, you will need to make calls to your chosen API, successfully obtain a response, request data, convert it into a pandas DataFrame, and export it as a CSV file. For the web scraping portion of the project, you will need to scrape the HTML from your chosen page, parse the HTML to extract the necessary information, and save the results either to a text (txt) file if they are text or to a CSV file if they are tabular data.
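A minimal sketch of the API half of this workflow, using only the standard library so it runs as-is: a hypothetical JSON body is parsed and written to CSV. The `results`, `id`, `title`, and `subject` keys are placeholders, not any real API's schema. In a notebook, `requests.get(url).json()` would usually supply the parsed data, and `pandas.DataFrame(records).to_csv(path, index=False)` would replace the hypothetical `records_to_csv` helper.

```python
import csv
import json

# Hypothetical response body; a real API will use its own field names.
raw_body = """
{"results": [
    {"id": 1, "title": "Example A", "subject": "Biology"},
    {"id": 2, "title": "Example B", "subject": "Mathematics"}
]}
"""


def records_to_csv(records, path):
    """Write a list of flat dicts to CSV, using the first record's keys as columns."""
    if not records:
        raise ValueError("no records to write")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)


records = json.loads(raw_body)["results"]  # step 1: parse the JSON body
records_to_csv(records, "api_results.csv")  # step 2: export as CSV
```

Nested JSON would need flattening first (for example, pulling the inner fields you need into each dict) before it fits the one-column-per-key layout.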

**You will be working individually for this project**, but we'll be guiding you along the process and helping you as you go. Show us what you've got!

---

## Technical Requirements

The technical requirements for this project are as follows:

* You must obtain data from an API using Python.
* You must scrape and clean HTML from a web page using Python.
* The results should be two files - one containing the tabular results of your API request and the other containing the results of your web page scrape.
* Your code should be saved in a Jupyter Notebook and your results should be saved in a folder named output.
* You should include a README.md file that describes the steps you took and your thought process for obtaining data from the API and web page.
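One way to meet the two-file and `output` folder requirements is a small helper that picks the file type from the shape of the result, following the overview's rule: text goes to a .txt file, tabular data to a .csv file. This is a sketch; `save_scrape_result` and the file stems are hypothetical names.

```python
import csv
from pathlib import Path

OUTPUT_DIR = Path("output")  # folder name required by the brief


def save_scrape_result(result, stem):
    """Save scraped text to <stem>.txt, or rows (a list of lists) to <stem>.csv.

    Returns the path that was written.
    """
    OUTPUT_DIR.mkdir(exist_ok=True)
    if isinstance(result, str):
        path = OUTPUT_DIR / f"{stem}.txt"
        path.write_text(result, encoding="utf-8")
    else:
        path = OUTPUT_DIR / f"{stem}.csv"
        with path.open("w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(result)
    return path


text_path = save_scrape_result("First paragraph.\nSecond paragraph.\n", "article")
table_path = save_scrape_result([["Title", "Subject"], ["Example", "Biology"]], "table")
print(text_path, table_path)
```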

## Necessary Deliverables

The following deliverables should be pushed to your GitHub repo for this chapter.

* **A Jupyter Notebook (.ipynb) file** that contains the code used to work with your API and scrape your web page.
* **An output folder** containing the outputs of your API and scraping efforts.
* **A ``README.md`` file** containing a detailed explanation of your approach and code for retrieving data from the API and scraping the web page as well as your results, obstacles encountered, and lessons learned.

## Suggested Ways to Get Started

* **Find an API to work with** - a great place to start looking would be [API List](https://apilist.fun/) and [Public APIs](https://github.com/toddmotto/public-apis). If you need authorization for your chosen API, make sure to give yourself enough time for the service to review and accept your application. Have a couple of backup APIs chosen, just in case!
* **Find a web page to scrape** and determine the content you would like to scrape from it - blogs and news sites are typically good candidates for scraping text content, and [Wikipedia](https://www.wikipedia.org/) is usually a good source for HTML tables (search for "list of...").
* **Break the project down into different steps** - note the steps covered in the API and web scraping lessons, try to follow them, and adjust as you run into the obstacles that are inevitable, since every API and web page is different.
* **Use the tools in your tool kit** - your knowledge of intermediate Python as well as some of the things you've learned in previous chapters. This is a great way to start tying everything you've learned together!
* **Work through the lessons in class** & ask questions when you need to! Think about adding relevant code to your project each night, instead of, you know... _procrastinating_.
* **Commit early, commit often**, don’t be afraid of doing something incorrectly because you can always roll back to a previous version.
* **Consult documentation and resources provided** to better understand the tools you are using and how to accomplish what you want.
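For the Wikipedia "list of..." tables suggested above, the core step is turning `<tr>`/`<th>`/`<td>` markup into rows. BeautifulSoup is the library listed in the resources; this sketch uses the standard library's `html.parser` instead so it runs without extra installs. The `TableParser` class and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> or <a> still belongs to the cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


html = """
<table class="wikitable">
  <tr><th>Name</th><th>Year</th></tr>
  <tr><td>Example <b>one</b></td><td>2001</td></tr>
  <tr><td>Example two</td><td>2002</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)
print(parser.rows)
```

It assumes every cell has an explicit closing tag and the table is not nested; pages that omit closing tags or nest tables need a more forgiving parser such as BeautifulSoup. The resulting list of rows can be passed directly to `csv.writer(f).writerows(...)`.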

## Useful Resources

* [Requests Library Documentation: Quickstart](http://docs.python-requests.org/en/master/user/quickstart/)
* [BeautifulSoup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
* [Stack Overflow Python Requests Questions](https://stackoverflow.com/questions/tagged/python-requests)
* [Stack Overflow BeautifulSoup Questions](https://stackoverflow.com/questions/tagged/beautifulsoup)
61 changes: 61 additions & 0 deletions educational_resources_oer_df.csv
@@ -0,0 +1,61 @@
,Title,Subject,Material Type,Date Added,Authors
0,01.04.2020.pdf,Information Science,Unit of Study,05/29/2020,['Dr. Bharat Singh Meena']
1,#01 Java Tutorial: Unser Hello World Programm,Career and Technical Education,Lesson,06/16/2015,['Jörg Amelunxen']
2,#02 Java Tutorial: Methoden / Funktionen,Career and Technical Education,Unit of Study,06/18/2015,['Jörg Amelunxen']
3,#03 Java Tutorial: Variablen,Career and Technical Education,Lesson,06/18/2015,['Jörg Amelunxen']
4,#04 Java Tutorial: Schleifen / Loops,Career and Technical Education,Lesson,06/18/2015,[]
5,#05 Java Tutorial: Fallunterscheidung / if,Career and Technical Education,Lesson,06/18/2015,['Jörg Amelunxen']
6,#06 Java Tutorial: Klassen,Career and Technical Education,Lesson,06/18/2015,['Jörg Amelunxen']
7,#07 Java Tutorial: Vererbung,Career and Technical Education,Lecture,01/01/2010,['JavaWeb and more (Jörg Amelunxen)']
8,#08 Java Tutorial: Dynamische Datenstrukturen,Career and Technical Education,Lecture,01/01/2010,['JavaWeb and more (Jörg Amelunxen)']
9,#09 Java Tutorial: Rekursion,Career and Technical Education,Lecture,01/01/2010,['JavaWeb and more (Jörg Amelunxen)']
10,0-Kindergarten Eureka Math,Mathematics,Activity/LabAssessment,04/07/2021,['Liberty Public Schools']
11,100 Free Web Tools for Elementary Teachers,Education,Reading,10/17/2014,['John Costilla']
12,100 People: A World Portrait,World CulturesWorld HistorySocial ScienceCultural Geography,Activity/LabDiagram/IllustrationInteractiveLessonReadingTeaching/Learning Strategy,01/31/2018,[]
13,100 Word Memoir (OER Commons Version),English Language ArtsComposition and RhetoricReading Literature,Homework/Assignment,05/11/2021,['Sarah Lyons']
14,100th Day of School,Mathematics,Interactive,10/11/2020,['Drew Penn']
15,100th Day of School Activities,Mathematics,Activity/Lab,02/16/2011,['Terry Kawas']
16,101 Ways To Kickstart Your Day In A Positive Way,"Health, Medicine and NursingCommunicationEducationPsychology",Teaching/Learning Strategy,07/31/2020,['Susan Spellman CannErin Luong']
17,10.2 SQ 3. What points of view did Enlightenment Thinkers have about government?,World History,Primary Source,08/29/2018,[]
18,10 Amazing Science Tricks Using Liquid,Applied Science,Lesson,02/01/2016,[]
19,10 FRED Activities in 10 Minutes,Economics,Activity/LabLessonLesson Plan,09/11/2019,['Mark Bayles']
20,#10 Java Tutorial: Interfaces #neue Version,Career and Technical Education,Lecture,01/01/2010,['JavaWeb and more (Jörg Amelunxen)']
21,#10 Randomized Synthesis Project,Computer Science,Activity/LabLesson,09/23/2019,['Boot up PD']
22,10 Steps to Start Your Business,Business and Communication,Full Course,10/09/2018,[]
23,10 Things You Can Do with ArcGIS Online and Story Maps,Physical Geography,Activity/LabData Set,10/30/2017,['Joseph J. Kerski']
24,10X Bigg,Mathematics,Activity/LabTeaching/Learning Strategy,08/08/2019,['Konrad Z']
25,10X Bigg,Mathematics,Activity/LabTeaching/Learning Strategy,08/08/2019,['Nancy Edwards']
26,10X Bigg,Mathematics,Activity/LabTeaching/Learning Strategy,08/08/2019,['Damien Toh']
27,10X Bigg,Mathematics,Activity/LabTeaching/Learning Strategy,08/08/2019,['JR Dingwall']
28,10X Bigg,Mathematics,Activity/LabTeaching/Learning Strategy,08/08/2019,['Jody Bauer']
29,10X Bigg,Mathematics,Activity/LabTeaching/Learning Strategy,08/08/2019,['Sayak Bhattacharyya']
30,10X Bigg,Mathematics,Activity/LabTeaching/Learning Strategy,08/08/2019,['Robyn Vsetecka']
31,10X Bigg,Mathematics,Activity/LabTeaching/Learning Strategy,08/08/2019,['Raquel Vazquez']
32,10X Bigg,Mathematics,Activity/LabTeaching/Learning Strategy,08/08/2019,['Keith Mann']
33,10X Bigg,Mathematics,Activity/LabLesson PlanTeaching/Learning Strategy,08/08/2019,['Anne Collier']
34,10X Bigger!,Mathematics,Activity/LabTeaching/Learning Strategy,01/28/2016,['Admin']
35,10X Bigger! - Remix2,Mathematics,Activity/LabTeaching/Learning Strategy,08/08/2019,['Laurie Wyatt']
36,10 for the Win!,Mathematics,Lesson Plan,11/28/2017,['Carmen Blackley']
37,10th Grade CK-12 Biology Text,Life Science,Full Course,01/28/2016,['John Kinney']
38,10th Grade ELA: Information Fluency,Education,Activity/LabAssessmentHomework/AssignmentLessonLesson PlanTeaching/Learning StrategyUnit of Study,08/08/2019,['Crystal HurtBeth Kabes']
39,10th Grade English,Arts and Humanities,Activity/LabDiagram/IllustrationHomework/AssignmentLesson PlanReading,10/15/2014,['Christopher Arnett']
40,10th's and Decimals,Mathematics,Lecture,07/24/2008,['U.S. Department of EducationWNET']
41,1.10 Classroom Culture: Routines & Procedures,Education,Primary Source,07/23/2018,['Beth Kabes']
42,1.1.1 Introduction to mineral resources,Biology,Lesson Plan,02/19/2020,['NAJMUDDDEEN ALHASSAN']
43,1.1 Anat,Life Science,Diagram/IllustrationLecture NotesLessonReading,08/08/2019,['Douglas Hathaway']
44,1.1 Anatomia e Fisiologia do sistema respiratorio,Life Science,Diagram/IllustrationLecture NotesLessonReading,08/08/2019,['Antonio Archetti']
45,1.1 Anatomy and Physiology of Respiratory System,Life Science,Diagram/IllustrationLecture NotesLessonReading,02/03/2018,['Paul Hudson']
46,#11 Dance Fever,Computer Science,Activity/LabLesson,09/23/2019,['Boot up PD']
47,1.1 - Introduction to Earth Science_May,Applied Science,Activity/Lab,05/10/2019,['Chris Omasits']
48,1.1: Introduction to Microbiology,Biology,Lecture Notes,08/22/2019,[]
49,#11 Java Tutorial: Abstrakte Klassen #neue Version,Career and Technical Education,Lecture,01/01/2010,['JavaWeb and more (Jörg Amelunxen)']
50,1.1 Study of Life,Biology,Module,08/05/2019,['Urbi Ghosh']
51,11th Grade Resume Writing and Mock Interview,Education,Lesson,09/24/2020,['Rachael HaverstickElizabeth Kline']
52,12.1 Patterns of Inhertance Mendelian Genetics,Life Science,Module,08/05/2019,['Urbi Ghosh']
53,12.2 Mendelian Genetics (dominance / recessive traits),Biology,Module,08/06/2019,['Urbi Ghosh']
54,1234: What is a Rube Goldberg Machine?,Architecture and DesignEngineeringReading Informational Text,Homework/Assignment,08/27/2019,['Wendee Mullikin']
55,12.4.1 Non-Mendelian Genetics (video) 3 types of dominance,Biology,Module,08/05/2019,['Urbi Ghosh']
56,"12.4.2 Non-Mendelian Genetics (video) Genetic recombination, X-linked traits",Biology,Module,08/05/2019,['Urbi Ghosh']
57,"12.4 Epistasis, Y linked patterns of inheritance, multiple alleles, ABO blood group (part 3)",Biology,Module,08/05/2019,['Urbi Ghosh']
58,#12 Animated Card,Computer Science,Activity/LabLesson,09/23/2019,['Boot Up PD']
59,#12 Java Tutorial: GUI - Unser erstes Fenster #neue Version,Career and Technical Education,Lecture,01/01/2010,['JavaWeb and more (Jörg Amelunxen)']
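The CSV above stores the DataFrame index as an unnamed first column and each Authors value as the text of a Python list (for example `['Jörg Amelunxen']`). A minimal sketch for loading it back with the standard library, using two rows copied from the file; with pandas, `pd.read_csv(path, index_col=0)` followed by `df["Authors"].apply(ast.literal_eval)` should do the same, and writing with `to_csv(path, index=False)` would avoid the unnamed column in the first place.

```python
import ast
import csv
import io

# Two rows copied from educational_resources_oer_df.csv. The empty first
# header is the DataFrame index that to_csv writes by default.
sample = """,Title,Subject,Material Type,Date Added,Authors
0,01.04.2020.pdf,Information Science,Unit of Study,05/29/2020,['Dr. Bharat Singh Meena']
4,#04 Java Tutorial: Schleifen / Loops,Career and Technical Education,Lesson,06/18/2015,[]
"""

rows = []
for row in csv.DictReader(io.StringIO(sample)):
    row.pop("")  # drop the saved index column
    # Authors holds the repr of a list; literal_eval parses it without running code.
    row["Authors"] = ast.literal_eval(row["Authors"])
    rows.append(row)

print(rows[0]["Authors"], rows[1]["Authors"])
```

Some fields, such as `Activity/LabAssessment` or `['Susan Spellman CannErin Luong']`, appear to be several values whose texts were concatenated without a separator during scraping. Those cannot be split reliably after the fact, so a separator (for example `"; ".join(...)`) would need to be added at scrape time.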