Crawls articles from any subdomain of Kompas, strips all HTML tags from each page, and extracts the main content (the news text). The extracted news is written to a .doc file, and the computed TF-IDF values are written to an .xls file.
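The strip-and-score steps described above can be illustrated with a minimal sketch. It is not the project's actual code: the names `TextExtractor`, `strip_html`, `tokenize`, and `tf_idf` are hypothetical. The TF-IDF variant is also an assumption, since the description does not say which one is used. Here, term frequency is the count divided by document length, and inverse document frequency is the natural log of N / df. Crawling and the .doc/.xls output are left out.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth counter for script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_html(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores
```

With this variant, a term that appears in every document scores zero, so only terms that distinguish one article from the others receive weight.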
Copyright © 2015 Rudy & Stenly rudolf_bast@live.com 535120063@fti.untar.ac.id

This work is free. You can redistribute it and/or modify it under the terms of the Do What The Fuck You Want To Public License, Version 2, as published by Sam Hocevar. See the LICENSE file for more details.