team-ir/web-article-crawler

Web Article Crawler

Description

Crawls articles from any subdomain of Kompas. The crawler strips the HTML tags from each page and extracts the main content (the news text). The extracted news is written to a .doc file, and the computed TF-IDF values are written to an .xls file.
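The two text-processing steps described above (stripping HTML tags and computing TF-IDF) can be illustrated with a minimal Python sketch. This is not the project's actual code: the names `TextExtractor`, `strip_tags`, `tokenize`, and `tf_idf` are hypothetical, the TF-IDF variant shown (relative term frequency times `log(N / df)`) is an assumed, common formulation, and the sketch does not cover main-content detection or writing the .doc and .xls output.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_tags(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(documents):
    """Assumed variant: tf = count / len(doc), idf = log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores
```

With this formulation, a term that appears in every document gets a score of zero, so only terms that distinguish one article from the others carry weight. A real crawler would likely also need a stop-word list for Indonesian text, which is omitted here.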

License

Copyright © 2015 Rudy & Stenly (rudolf_bast@live.com, 535120063@fti.untar.ac.id)

This work is free. You can redistribute it and/or modify it under the terms of the Do What The Fuck You Want To Public License, Version 2, as published by Sam Hocevar. See the LICENSE file for more details.

About

Crawls http://www.kompas.com/ and extracts the articles.
