Skip to content

ramkishore07s/WikiSearchEngine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

WikiSearchEngine

1. HOW TOs

To create index run

./index.sh <path to wikiDump>

Then to search, run

./search.sh

You will see something like this:

---------------------------------------------------------------------------------
➜  wikiSearch ./search.sh
'text:;cat:<...>;ref:<...>;title:<...>;link:<...>'

> 
---------------------------------------------------------------------------------

The prompt is where you type your query.

NOTE: QUOTATIONS AROUND QUERY ARE MUST.

2. QUERY SYNTAX:

'\<fieldname1\>: \<word1\> \<word2\>;\<fieldname2\>: \<word3\> \<word4\>;'

NOTE:

  1. QUOTATIONS AROUND QUERY ARE MUST.
  2. SEMICOLONS AFTER EACH FIELD IS A MUST.
  3. NOT ALL FIELDS ARE MUST.

Fields:

  • text -> text of the corpus
  • title -> title of each document in the corpus
  • ref -> reference
  • cat -> category
  • link -> links in each page

Sample query:

> 'text: android lollipop;title: google;'

3. STATISTICS

  • The search module was coded in PYTHON 2.7.
  • 90% of the time results will be given be under 0.03 seconds for 4 words.

4. IMPLEMENTATION

  • The indexer is implemented in JAVA and the search is written in PYTHON.
  • It is an implementation of BSBI algorithm.
  • In total six index files are constructed, three for the corpus, three for Document name retrieval.
  • Two small tertiary indexes in memory, two secondary indexes, one inverted index and one file of document names to be retrieved are created.
  • Can lookup any posting in just two disk accesses. This makes the search really fast.

About

Index and Search wikiDump

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published