-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathweb_scraping.py
More file actions
24 lines (17 loc) · 904 Bytes
/
web_scraping.py
File metadata and controls
24 lines (17 loc) · 904 Bytes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# Web scraping is the process of gathering data from the internet automatically
# what we do in web scraping is we grab a html section from a website
# as every websit is unique, so is it's web scraping script.
# A web-site consists of 3 parts: HTML CSS and JS
# HTML is used to create the basic layout and structure of the website
# CSS is used to pprovide designs
# JavaScript is used to control the interactive elements of the website.
# for web scraping we need to install 3 libraries: requests, lxml and bs4
import requests
import lxml
import bs4
# now let us open a webpage. Example.com--->>> It is a demo webpage
web_page=requests.get("http://example.com/")
#print(web_page.text) # This line actually prints out the raw html text
# To make the text more readable, we will use the bs4 or the Beautiful Soup
soup=bs4.BeautifulSoup(web_page.text,"lxml")
print(soup)