Tiny Search Engine

Eddie Bae (GitHub username: 20eddibae)

This repository implements the three components of CS50’s Tiny Search Engine:

  1. crawler — web crawler that pulls pages from a seed URL
  2. indexer — builds an inverted index from the crawled pages
  3. querier — answers search queries against the index

Prerequisites

  • A UNIX‐compatible shell (macOS / Linux)
  • make, gcc, standard build tools
  • Internet connection (for crawling)

Build

From the top‐level directory:

# build libcs50 and all three tools
make all

Usage

  1. Crawl
# <pagedir> must either not exist yet or be empty
./crawler/crawler <seedURL> <pagedir> <maxDepth>

Example:

./crawler/crawler http://cs50tse.cs.dartmouth.edu/tse/letters pages 2
  2. Indexer
# pages is the <pagedir> written by the crawl above
mkdir indexdir
./indexer/indexer pages indexdir
  3. Querier
./querier/querier indexdir
  4. Clean
make clean
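
The Crawl step's requirement on <pagedir> can be checked before launching the crawler. The following is a minimal sketch, assuming a POSIX shell; pagedir_ok is a hypothetical helper and is not part of this repository.

```shell
#!/bin/sh
# pagedir_ok DIR  (hypothetical helper, not shipped with this repository)
# Succeeds when DIR is missing or is a directory with no entries,
# hidden files included; fails otherwise.
pagedir_ok() {
    dir=$1
    # A path that does not exist satisfies the requirement.
    [ -e "$dir" ] || return 0
    # Something that exists but is not a directory does not.
    [ -d "$dir" ] || return 1
    # ls -A lists every entry except . and ..; empty output means empty directory.
    [ -z "$(ls -A "$dir")" ]
}

# Possible use with the example crawl from this README:
# pagedir_ok pages && ./crawler/crawler http://cs50tse.cs.dartmouth.edu/tse/letters pages 2
```

Guarding the crawl this way keeps it from running against a directory that still holds pages from an earlier run.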