Web scraper for text-mining raw data from the internet for further processing to make useful information.

aalaydrus/JS-Scraper

JS-Scraper

There have been many cases showing that teenagers are less inclined to read newspaper articles. As a teenager myself, I am trying to tackle this by making the news easier and more convenient to read: this project extracts news summaries and classifies them by how positive or negative they are. I hope this offers a more fun and engaging way to raise teenagers' awareness of current affairs.

NOTE: The URL of the news website has been removed from the source code (don't ask me why).

The project is built with Node.js, the Puppeteer library, and more.

Web scraping because #WhoNeedsAPIsBoiiiii

Development Goals

Interface & Front-End:

  • Decide on front-end framework to implement initial GUI
  • Create a basic GUI

Goals for each article scrape:

  • Scrape URL
  • Scrape Date & Author's name
  • Scrape Images (Check legal rights)
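
The scrape goals above could be approached roughly like this with Puppeteer. This is only a sketch, not the project's actual code: the CSS selectors, the `buildRecord` and `scrapeArticle` names, and the record shape are all assumptions, and the selectors will need adjusting to the markup of whichever news site is targeted.

```javascript
// Hypothetical selectors: adjust them to the markup of the target site.
const DEFAULT_SELECTORS = {
  author: ".article-author",
  date: "time",
  images: "article img",
};

// Tidy the raw scraped strings into one record (pure function, no browser needed).
function buildRecord(url, raw) {
  return {
    url,
    author: (raw.author || "").trim(),
    date: (raw.date || "").trim(),
    // Drop duplicate image URLs while keeping their original order.
    images: [...new Set(raw.images || [])],
  };
}

// Open the article in a headless browser and collect URL, author, date and images.
async function scrapeArticle(url, selectors = DEFAULT_SELECTORS) {
  // Loaded lazily so buildRecord can be used without Puppeteer installed.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "domcontentloaded" });
    // Note: page.$eval rejects if nothing matches the selector.
    const raw = {
      author: await page.$eval(selectors.author, (el) => el.textContent),
      date: await page.$eval(selectors.date, (el) => el.textContent),
      images: await page.$$eval(selectors.images, (els) => els.map((el) => el.src)),
    };
    return buildRecord(page.url(), raw);
  } finally {
    // Always close the browser, even when a selector is missing.
    await browser.close();
  }
}
```

Keeping the clean-up step in a separate pure function means it can be tested without launching a browser. Image URLs are only collected here, not downloaded, which leaves the legal-rights check open.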

Output format and testing:

  • CSV
  • JSON
  • SQL
  • XQuery
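
For the CSV and JSON outputs, a minimal sketch could look like the following. The function names (`csvField`, `toCsv`, `toJson`) and the sample record are made up for illustration; the quoting follows the usual RFC 4180 convention of wrapping fields that contain commas, quotes, or line breaks.

```javascript
// Quote a CSV field when it contains a comma, a double quote, or a line break.
function csvField(value) {
  const text = String(value ?? "");
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build a CSV document: one header row, then one row per record.
function toCsv(records, columns) {
  const header = columns.map(csvField).join(",");
  const rows = records.map((r) => columns.map((c) => csvField(r[c])).join(","));
  return [header, ...rows].join("\r\n");
}

// Pretty-printed JSON, convenient for inspection and for importing into a database.
function toJson(records) {
  return JSON.stringify(records, null, 2);
}

// Hypothetical sample record, only for demonstration.
const sample = [
  { url: "https://example.com/a", author: "Doe, Jane", date: "2019-03-05" },
];
console.log(toCsv(sample, ["url", "author", "date"]));
console.log(toJson(sample));
```

Passing the column list explicitly keeps the CSV column order stable even when records have missing or extra fields.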

Sentiment Analysis:

  • Identify positive and negative tokens (Basic)
  • Indonesian-language sentiment analysis?
  • Identify positive and negative tokens (More Consistent)
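
The basic token approach could be sketched as a lexicon lookup: count the words that appear in a positive list and in a negative list, then compare the counts. The word lists below are tiny placeholders and the function names are assumptions; Indonesian text would need its own word lists, and the more consistent version would need something better than plain counting (negation handling, for example).

```javascript
// Tiny illustrative lexicons; a real word list would be far larger.
const POSITIVE = new Set(["good", "great", "win", "success", "happy"]);
const NEGATIVE = new Set(["bad", "loss", "crisis", "fail", "sad"]);

// Lowercase the text and split it into alphabetic tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z]+/g) || [];
}

// Count positive and negative tokens and derive a simple label from the difference.
function scoreSentiment(text) {
  let positive = 0;
  let negative = 0;
  for (const token of tokenize(text)) {
    if (POSITIVE.has(token)) positive += 1;
    else if (NEGATIVE.has(token)) negative += 1;
  }
  const score = positive - negative;
  const label = score > 0 ? "positive" : score < 0 ? "negative" : "neutral";
  return { positive, negative, score, label };
}

console.log(scoreSentiment("Great win despite the crisis"));
```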

Database Construction:

  • MongoDB to accept JSON
  • Test Queries on MongoDB
  • Automate entries to MongoDB with Back-end script
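
Automating entries from a back-end script might look like the sketch below, which uses the official `mongodb` Node.js driver. The connection URI, the `js_scraper` database name, the `articles` collection name, the `scrapedAt` field, and both function names are assumptions made for illustration.

```javascript
// Parse a JSON export and stamp each record with the time it was loaded.
function prepareDocuments(jsonText, now = new Date()) {
  const records = JSON.parse(jsonText);
  if (!Array.isArray(records)) {
    throw new TypeError("expected a JSON array of articles");
  }
  return records.map((record) => ({ ...record, scrapedAt: now }));
}

// Insert the prepared documents and return how many were stored.
async function saveArticles(jsonText, uri = "mongodb://localhost:27017") {
  // Loaded lazily so prepareDocuments works without the driver installed.
  const { MongoClient } = require("mongodb");
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const collection = client.db("js_scraper").collection("articles");
    // Note: insertMany rejects when given an empty array.
    const result = await collection.insertMany(prepareDocuments(jsonText));
    return result.insertedCount;
  } finally {
    await client.close();
  }
}
```

Test queries can then be run against the same collection, for example with `collection.find({ author: "..." }).toArray()`.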

Credits & References:

optikalefx - https://plus.google.com/+optikalefxx
