ali edited this page Aug 2, 2019 · 2 revisions

Joojle search engine

Collaborators:

Project modules:

  • commons (common modules)
  • crawler
  • es_page_processor (processes pages for Elasticsearch)
  • page_processor (processes pages for HBase)
  • search api

Built with:

  • Spark - Used to run MapReduce jobs
  • Kafka - A distributed queue with 3 main topics (links, pages for HBase, pages for Elasticsearch)
  • Elasticsearch - Used to store data and run search queries
  • Redis - Used to check politeness for domains and to reduce redundant page updates in the page_processors
  • HBase - Used to store data about the links of a page and their anchors
  • Dropwizard - Used to monitor the Java programs
  • jsoup - Used to parse pages
  • Jackson - Used to serialize and deserialize the page class
  • Maven - Dependency management
  • ZooKeeper - Used to manage HBase and Kafka
  • Hadoop - Used to provide the distributed file system
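
The politeness check that Redis is used for can be illustrated with a minimal sketch. It assumes the rule is "do not fetch the same domain again within a fixed delay"; the class name `PolitenessChecker`, the method `tryAcquire`, and the delay value are hypothetical, and an in-memory map stands in for Redis so the example runs without a server. The project's actual keys and timings may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Redis-backed politeness check.
// A HashMap replaces Redis here so the sketch is self-contained.
final class PolitenessChecker {
    private final long delayMillis;
    // domain -> timestamp (ms) of the last permitted fetch
    private final Map<String, Long> lastFetch = new HashMap<>();

    PolitenessChecker(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Returns true and records the visit if the domain may be fetched
    // at nowMillis; returns false if the last fetch was too recent.
    synchronized boolean tryAcquire(String domain, long nowMillis) {
        Long last = lastFetch.get(domain);
        if (last != null && nowMillis - last < delayMillis) {
            return false; // still inside the politeness window
        }
        lastFetch.put(domain, nowMillis);
        return true;
    }
}
```

A crawler thread would call `tryAcquire(domain, System.currentTimeMillis())` before fetching a link and, on `false`, put the link back on the links topic for a later attempt. With a real Redis instance, the same effect is commonly achieved with a key per domain that expires after the delay.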
