A search engine for searching Wikipedia XML dumps.
Python 3 and the PyStemmer library are required to run the search engine.
To create the inverted index, run the following command.
python3 wiki_indexer.py <path_to_wiki_dump_folder> <path_to_index_folder>
The first argument is the path to the folder that contains all the Wikipedia XML dump files. The second argument is the path to the folder where the inverted index will be created and stored.
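For readers new to the term, an inverted index maps each term to the documents that contain it. The snippet below is a minimal sketch of that idea, not the code in wiki_indexer.py. The names tokenize and build_inverted_index, the sample documents, and the simple regex tokenizer are all assumptions made for illustration, and it does no stemming.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    # Map each term to {doc_id: number of occurrences in that document}.
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


# Made-up sample documents, keyed by a numeric document id.
docs = {
    0: "Wikipedia is a free encyclopedia",
    1: "A search engine indexes Wikipedia",
}
index = build_inverted_index(docs)
print(sorted(index["wikipedia"]))  # [0, 1]
```

A real indexer would typically also stem terms, which is presumably what PyStemmer is used for here, and would write the postings to disk instead of keeping them in memory.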
To search, run the following command.
python3 wiki_search.py <path_to_query_file>
The index is split into many files. Each file is named with a number from 0 to 3076 and has the .txt extension.
There is also a titles folder. Each file in it is named with a number from 0 to 4914 and has the .txt extension.
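The numbered layout makes it easy to check that no file is missing after indexing. The following is a rough sketch of such a check and is not part of the project. The helper names expected_files and missing_files are hypothetical, and the folder paths you pass in depend on where you created the index.

```python
from pathlib import Path


def expected_files(folder, last_number):
    # File names are assumed to be 0.txt, 1.txt, ..., <last_number>.txt.
    return [Path(folder) / f"{n}.txt" for n in range(last_number + 1)]


def missing_files(folder, last_number):
    # Return the expected paths that do not exist as regular files.
    return [p for p in expected_files(folder, last_number) if not p.is_file()]
```

For example, missing_files("<path_to_index_folder>", 3076) would list any index files that are absent, and the same call with the titles folder and 4914 would check the title files.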