This is a minimal search engine that retrieves the 10 most relevant documents from the Los Angeles Times Collection for a given query. The collection holds more than 136,000 documents, yet the engine returns its results within a few milliseconds. It is made up of the following components:
- Index Engine
- Lexicon
- Query Interpreter
- Snippet Engine
- Ranking Engine
Note: This project does not include a web crawler.
The Index Engine builds an inverted index. As it indexes each document, it tokenizes the text, assigns each distinct word an integer id, and maps that id to a postings list. Each entry in a postings list holds a document id and the number of times the word appears in that document. An inverted index takes significantly less space than a full term-document matrix because it stores only the nonzero counts.
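
The indexing step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `tokenize` and `build_index`, the regex-based tokenizer, and the use of list positions as document ids are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build a lexicon and an inverted index from a list of document strings.

    Returns (lexicon, postings), where lexicon maps term -> term id and
    postings maps term id -> list of (doc_id, count) pairs.
    """
    lexicon = {}
    postings = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Count each term once per document, then append one posting per term.
        counts = Counter(tokenize(text))
        for term, count in counts.items():
            # Assign the next free integer id the first time a term is seen.
            term_id = lexicon.setdefault(term, len(lexicon))
            postings[term_id].append((doc_id, count))
    return lexicon, postings


docs = ["The quake shook the city", "City council votes"]
lexicon, postings = build_index(docs)
print(postings[lexicon["city"]])  # [(0, 1), (1, 1)]
print(postings[lexicon["the"]])   # [(0, 2)]
```

Because documents are processed in order, each postings list ends up sorted by document id, which keeps later merging of lists straightforward.
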
The snippet shown beneath each result is biased towards the given query and is therefore dynamic. In other words, the same document produces a different snippet depending on the query the user enters.
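
One simple way to get this query-biased behavior is to score each sentence by how many query terms it contains and keep the best ones. The sketch below assumes that approach; the function names, the sentence splitter, and the scoring rule are illustrative choices and may differ from what the Snippet Engine actually does.

```python
import re


def tokenize(text):
    # Assumed tokenizer: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def snippet(text, query, max_sentences=2):
    """Return the sentences of `text` that best match `query`."""
    query_terms = set(tokenize(query))
    sentences = split_sentences(text)
    scored = []
    for pos, sentence in enumerate(sentences):
        # Score = number of tokens in the sentence that are query terms.
        score = sum(1 for tok in tokenize(sentence) if tok in query_terms)
        scored.append((score, pos))
    # Highest score first; ties go to the sentence that appears earlier.
    best = sorted(scored, key=lambda sp: (-sp[0], sp[1]))[:max_sentences]
    # Emit the chosen sentences in their original document order.
    chosen = sorted(pos for _, pos in best)
    return " ".join(sentences[pos] for pos in chosen)


text = ("The council met on Tuesday. "
        "A quake shook the city overnight. "
        "Officials reported no damage.")
print(snippet(text, "council", max_sentences=1))
# The council met on Tuesday.
print(snippet(text, "quake damage", max_sentences=2))
# A quake shook the city overnight. Officials reported no damage.
```

The same document yields two different snippets here because only the query changes, which is the dynamic behavior described above.
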


