Data_scrapping

Web scraping projects.

Garfield comic strip scraper

Requirements:

The code targets Python 3 and was tested with the following library versions (other versions may also work):

beautifulsoup4==4.6.3
requests==2.20.1
urllib3==1.24.1

Library installation process:

pip install beautifulsoup4==4.6.3 requests==2.20.1 urllib3==1.24.1

Tests and checks

A small test verifies that the code still works. The website occasionally changes its layout, which makes the scraper stop working.

To launch the test:

python comic_strips/test/functionality_check.py

If no errors are raised, the complete code should work as explained below.
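As a rough illustration of what a layout check of this kind can look like, the sketch below parses a page and fails when no image is found. It is not the content of functionality_check.py: it swaps in the standard library's html.parser in place of beautifulsoup4 so that it runs without extra packages, and the names ImageCollector, find_images and SAMPLE_PAGE are hypothetical.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_images(html):
    """Return the image URLs found in an HTML string."""
    collector = ImageCollector()
    collector.feed(html)
    return collector.sources


# Hypothetical page fragment; a real check would download the live page.
SAMPLE_PAGE = '<div class="comic"><img src="https://example.com/strip.png"></div>'

images = find_images(SAMPLE_PAGE)
assert images, "no image found: the page layout may have changed"
```

A check like this fails quickly when the site layout changes, before a long download run is started.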

Execution:

There are two versions of the code:

New daily comic strip grabber:

The file comic_strip_grabber.py gathers the daily Spanish comic strips.

Usage:

Configure the date parameters inside the file and execute it:

python comic_strips/comic_strip_grabber.py

or pass the start date and the end date as arguments, in the format '%Y/%m/%d':

python comic_strips/comic_strip_grabber.py "2017/01/01" "2017/01/02"
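The date handling described above can be sketched as follows. This is a minimal illustration and not the script's actual code: the helper names parse_date, date_range and read_dates are hypothetical, and treating the end date as inclusive is an assumption.

```python
from datetime import date, datetime, timedelta

DATE_FORMAT = "%Y/%m/%d"  # format stated in the usage text


def parse_date(text):
    """Convert a 'YYYY/MM/DD' string into a date object."""
    return datetime.strptime(text, DATE_FORMAT).date()


def date_range(start, end):
    """Yield every date from start to end (end assumed inclusive)."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def read_dates(argv, default_start, default_end):
    """Use the two command-line dates if given, else the in-file defaults."""
    if len(argv) >= 3:
        return parse_date(argv[1]), parse_date(argv[2])
    return default_start, default_end


# Example with the same arguments as the call shown above.
argv = ["comic_strip_grabber.py", "2017/01/01", "2017/01/02"]
start, end = read_dates(argv, date(2017, 1, 1), date(2017, 1, 1))
for day in date_range(start, end):
    print(day.strftime(DATE_FORMAT))
```

In a real script, sys.argv would be passed in place of the hard-coded argv list. A malformed date makes datetime.strptime raise ValueError, which gives an early and explicit failure.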

First old English comic strips:

The file classic_emg_first_strips.py gathers the older English comic strips. It is launched the same way as the previous script:

Configure the date parameters inside the file and execute it:

python comic_strips/classic_emg_first_strips.py

or pass the start date and the end date as arguments, in the format '%Y/%m/%d':

python comic_strips/classic_emg_first_strips.py "2017/01/01" "2017/01/02"

Output

Comic strips are stored in the folder extracted_images as PNG images named in the format "garf_YYYY_MM_DD.png".
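The naming scheme can be expressed with strftime. The snippet below is a small sketch under that assumption; the function name strip_path is hypothetical, and only the folder name and the file name pattern come from the text above.

```python
from datetime import date
from pathlib import Path

OUTPUT_DIR = Path("extracted_images")  # folder named in the text


def strip_path(day):
    """Build the output path for the strip published on the given date."""
    # Literal text in the pattern is kept as-is; only %Y, %m, %d are replaced.
    return OUTPUT_DIR / day.strftime("garf_%Y_%m_%d.png")


print(strip_path(date(2017, 1, 1)).name)
```

Zero-padded month and day keep the files in chronological order when sorted by name.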

All logs and run information are also stored in a CSV file.
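The layout of that log file is not described here, so the following is only a sketch of how one row per downloaded strip could be appended with the standard csv module. The column names in LOG_FIELDS, the function append_log and the file name used in the example are all hypothetical.

```python
import csv
import os

LOG_FIELDS = ["date", "filename", "status"]  # hypothetical columns


def append_log(log_path, day, filename, status):
    """Append one row to the CSV log, writing the header for a new file."""
    is_new = not os.path.exists(log_path) or os.path.getsize(log_path) == 0
    with open(log_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({"date": day, "filename": filename, "status": status})
```

A call such as append_log("log.csv", "2017/01/01", "garf_2017_01_01.png", "ok") would add one line to the file. Opening the file in append mode keeps the rows of earlier runs.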

Manga chapter scraper

(WIP) Currently there is a notebook in the folder online_manga that extracts full chapters from online viewers and saves them locally as images.
