This repository is dedicated to Yuri Ng.
In this project, I have built a robust news scraper that automatically extracts news headlines and related information. It handles the following challenges:
- What to do when some websites cannot be accessed via a simple GET request?
- Some news pages contain several `h1` tags, and only one of them holds the actual headline. How can it be extracted automatically?
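
The two problems above can be illustrated with a minimal sketch. It is not the project's actual implementation: it assumes that sending browser-like headers is enough to get past sites that reject a bare GET request, and that the real headline is the `h1` whose text is most similar to the page `<title>`. The names `BROWSER_HEADERS`, `build_request`, `fetch_html`, and `pick_headline` are hypothetical and use only the Python standard library.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Assumption: some sites reject the default urllib User-Agent, so we send
# browser-like headers instead. The values below are illustrative only.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url):
    """Build a GET request that carries the browser-like headers."""
    return Request(url, headers=BROWSER_HEADERS)


def fetch_html(url, timeout=10.0):
    """Download a page and decode it, falling back to UTF-8."""
    with urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class _TitleAndH1Parser(HTMLParser):
    """Collect the text of <title> and of every <h1> on the page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1_texts = []
        self._current = None  # tag whose text is currently being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1") and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.title = text
            elif text:
                self.h1_texts.append(text)
            self._current = None


def pick_headline(html):
    """Return the <h1> text closest to the page title, or None if no <h1>."""
    parser = _TitleAndH1Parser()
    parser.feed(html)
    parser.close()
    if not parser.h1_texts:
        return None
    if not parser.title:
        # Heuristic fallback: without a title, prefer the longest <h1>.
        return max(parser.h1_texts, key=len)
    title = parser.title.lower()
    return max(
        parser.h1_texts,
        key=lambda text: SequenceMatcher(None, text.lower(), title).ratio(),
    )
```

A typical call would be `pick_headline(fetch_html(url))`. Sites that render content with JavaScript or block automated clients more aggressively would need a different approach, such as a headless browser, which this sketch does not cover.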
Read the documentation here.