GitHub - DanielGorsia/SearchEngine: A Java project that involved parsing and processing a large corpus of files, developing core logic to retrieve documents with specific rank by using multiple parameters such as semantic modules etc and developing a GUI.

"# SearchEngine" Readme - Search Engine :

INTRODUCTION

This is our Search Engine Project. This project divided into two parts: Part A - read, parse and index It includes the following process:

Reading files from a given corpus, segmenting them into documents Parsing the corpus in batches of 50000 documents, one by one. The parsing could be executed with or without stemming
Indexing the terms of each batch: creating posting files and writing information about the terms into them. In addition, we create a file containing information about all the parsed documents and a united dictionary for the entire corpus. Part B - search and rank
Loading the posting files and dictionaries generated in part A The dictionaries are restored and loaded into memory.
After entering a query:

The query is parsed.
Its term objects are reconstructed from the dictionary
If semantics is enabled the query terms are sent to the semantics model.
The query terms and semantics terms (if enabled) are sent to the ranker who finds the 50 most relevant documents.

Project name: SearchEngine java version: java 1.8

Part A-

Part B-

Press browse button to select an index path
If the stemming option was selected on the corpus in part A,than select the stemming option again.
Press 'load Dictionary' button to load the dictionary to the memory
Type query in the Query text area, you can enable semantic option and press search or press browse button to select an text file path to enter some queries together, you can enable semantic optionan and press search button. -Now, wait until the searching process is finish, it will show you the results.
If you would like to choose document to see its entities. close the results window
and choose a query from choice menu, choose a document (according its id) from choice menu and press 'Get Entities' button.
If you would like to save the results, press browse button near the text field of 'Save the results at:' to select a path to save it, and press 'save' button.

Reset button : clicking this button will delete all content in the selected posting files path
Load dictionary : will load the term dictionary to memory
Show dictionary : shows all the unique terms in the corpus with this total tf

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
Desktop/GIT		Desktop/GIT
out		out
src		src
README.md		README.md
SearchEngine.iml		SearchEngine.iml
common-lang3.jar		common-lang3.jar
medallia-word2vec-0.10.2-jar-with-dependencies.jar		medallia-word2vec-0.10.2-jar-with-dependencies.jar
word2vec.c.output.model.txt		word2vec.c.output.model.txt