Home
ali edited this page Aug 2, 2019 · 2 revisions
Modules:
- commons (common modules)
- crawler
- es_page_processor (processes pages for ElasticSearch)
- page_processor (processes pages for HBase)
- search api
Technologies:
- Spark - Used to run MapReduce jobs
- Kafka - A distributed queue with three main topics (links, pages for HBase, pages for ElasticSearch)
- ElasticSearch - Used to store data and run search queries
- Redis - Used to enforce per-domain politeness and to avoid redundant page updates in the page processors
- HBase - Used to store each page's links and their anchor text
- DropWizard - Used to monitor the Java programs
- JSoup - Used to parse the pages
- Jackson - Used to serialize and deserialize the page class
- Maven - Used for dependency management
- Zookeeper - Used to coordinate HBase and Kafka
- Hadoop - Provides the underlying distributed file system (HDFS)
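The Redis entry covers two separate checks: per-domain politeness in the crawler, and skipping pages that have not changed in the page processors. The sketch below is a minimal illustration under stated assumptions, not the project's code: it assumes politeness is a fixed per-domain cooldown and that "unchanged" means an identical content hash. In-memory `HashMap`s stand in for Redis so the example runs on its own; the class `CrawlGuard`, its methods `tryFetch` and `needsUpdate`, and the cooldown value are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the two Redis checks (not the project's implementation).
public class CrawlGuard {
    private final long cooldownMillis;
    // domain -> earliest time (ms) at which the domain may be fetched again
    private final Map<String, Long> nextAllowed = new HashMap<>();
    // url -> SHA-256 digest of the last content that was processed
    private final Map<String, byte[]> lastDigest = new HashMap<>();

    public CrawlGuard(long cooldownMillis) {
        this.cooldownMillis = cooldownMillis;
    }

    /** Politeness: allow a fetch only if the domain's cooldown has expired, then restart the cooldown. */
    public synchronized boolean tryFetch(String domain, long nowMillis) {
        Long allowedAt = nextAllowed.get(domain);
        if (allowedAt != null && nowMillis < allowedAt) {
            return false; // still inside the cooldown window for this domain
        }
        nextAllowed.put(domain, nowMillis + cooldownMillis);
        return true;
    }

    /** Update check: true only if the content differs from the last processed version of this URL. */
    public synchronized boolean needsUpdate(String url, String content) {
        byte[] digest = sha256(content);
        byte[] previous = lastDigest.put(url, digest); // put returns the old value, or null
        return previous == null || !Arrays.equals(previous, digest);
    }

    private static byte[] sha256(String s) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // every Java platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }
}
```

With a real Redis instance shared by several crawler processes, the politeness check could be a single atomic `SET <key> 1 NX PX <cooldown>`: it succeeds only if the key is absent, and the key expires on its own after the cooldown. The key naming scheme would be an assumption.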
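The three Kafka topics decouple the crawler from the two page processors. One plausible flow, assumed here rather than taken from the project, is that each crawled page publishes its outgoing links to the links topic and the serialized page to both page topics, so `page_processor` and `es_page_processor` can consume the same page independently. The sketch below uses an in-memory map of lists in place of Kafka; the class `TopicRouter`, its methods, and the topic names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the three-topic fan-out; an in-memory map stands in for Kafka.
public class TopicRouter {
    // Assumed topic names, not the project's actual ones.
    public static final String LINKS = "links";
    public static final String PAGES_HBASE = "pages-hbase";
    public static final String PAGES_ES = "pages-elasticsearch";

    private final Map<String, List<String>> topics = new LinkedHashMap<>();

    public TopicRouter() {
        topics.put(LINKS, new ArrayList<>());
        topics.put(PAGES_HBASE, new ArrayList<>());
        topics.put(PAGES_ES, new ArrayList<>());
    }

    /** Publishes one crawled page: its links feed further crawling, the page goes to both processors. */
    public void publishCrawledPage(String serializedPage, List<String> outLinks) {
        topics.get(LINKS).addAll(outLinks);
        topics.get(PAGES_HBASE).add(serializedPage);
        topics.get(PAGES_ES).add(serializedPage);
    }

    /** Read-only view of everything published to a topic so far. */
    public List<String> read(String topic) {
        return Collections.unmodifiableList(topics.get(topic));
    }
}
```

Keeping a separate page topic per storage backend lets each processor keep its own consumer offset, so a slow HBase writer does not hold back ElasticSearch indexing. With real Kafka, the same effect could also come from a single page topic read by two consumer groups; which design the project uses is not stated here.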