CS121 Search Engine

Peter Young, Angela Xiang, David Nguyen, Michael Gearhart

Running the Search Engine:

1. Install prerequisites

pip install bs4
pip install nltk
pip install requests
Backend for the GUI:
- pip install fastapi
- pip install uvicorn
Running the GUI (see step 5):
- cd into the web directory and start the backend with: python -m uvicorn gui_main:app --reload
- open ./web/index.html with the VS Code Live Server extension
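
To confirm the packages above are importable before indexing, a small check like the one below can help. It is a minimal sketch, not part of the repository, and the helper name missing_modules is hypothetical.

```python
import importlib.util

# Import names for the packages installed above. `pip install bs4`
# provides the `bs4` module (BeautifulSoup).
REQUIRED_MODULES = ["bs4", "nltk", "requests", "fastapi", "uvicorn"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found on this interpreter."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All prerequisites are installed.")
```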

2. Create the auxiliary directory

Make a directory called databases inside the Search-Engine root directory (i.e. Search-Engine/databases).
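
The directory can also be created from Python. This sketch assumes it is run from the Search-Engine root; ensure_databases_dir is a hypothetical helper, not something pipeline.py provides.

```python
from pathlib import Path


def ensure_databases_dir(repo_root="."):
    """Create <repo_root>/databases if it does not already exist and return its path."""
    databases = Path(repo_root) / "databases"
    # exist_ok=True makes this safe to re-run; the repo root itself must already exist.
    databases.mkdir(exist_ok=True)
    return databases


if __name__ == "__main__":
    # Assumes the current working directory is the Search-Engine root.
    print("Using", ensure_databases_dir().resolve())
```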

3. Run `pipeline.py`

Enter '12345678' at the command line to fully index everything. You can also run it in batches, e.g. enter '1' first,
then run it again with '2', but make sure the batches are run in order.
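
The ordering rule can be made concrete with a small check. This is an illustrative sketch only: it assumes batches are numbered 1 to 8 and that each run must pick up right after the highest batch already indexed. parse_batches and is_in_order are hypothetical helpers, not code from pipeline.py.

```python
def parse_batches(user_input):
    """Turn input such as '12345678' or '2' into a list of batch numbers."""
    if not user_input.isdigit():
        raise ValueError("enter batch numbers as digits, e.g. '12345678'")
    return [int(ch) for ch in user_input]


def is_in_order(completed, requested):
    """True if `requested` continues directly from the highest batch in `completed`."""
    expected = max(completed, default=0) + 1
    return requested == list(range(expected, expected + len(requested)))
```

For example, after a run with '1', a second run with '2' passes the check, while jumping straight to '3' does not.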

4. Running the Non-GUI search

Run `ranked_search.py`, which prompts for search queries and outputs the URLs relevant to each query.
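
For readers unfamiliar with ranked retrieval, the sketch below scores URLs with a basic tf-idf sum over an in-memory inverted index. It is a generic illustration under stated assumptions (the term -> {url: frequency} index shape and the rank helper are hypothetical); it is not necessarily how ranked_search.py scores results, which this README does not describe.

```python
import math


def rank(query, index, num_docs, top_k=5):
    """Rank URLs for `query` with a basic tf-idf score.

    `index` maps term -> {url: term frequency}. This is a hypothetical
    stand-in for the index built by pipeline.py, whose real format may differ.
    """
    scores = {}
    for term in query.lower().split():
        postings = index.get(term)
        if not postings:
            continue  # a term missing from the index contributes nothing
        idf = math.log(num_docs / len(postings))
        for url, tf in postings.items():
            # Log-scaled term frequency times inverse document frequency.
            scores[url] = scores.get(url, 0.0) + (1 + math.log(tf)) * idf
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

With a three-document toy index where "uci" appears in A (3 times) and B (once) and "search" appears in A (once) and C (twice), the query "uci search" ranks A first, then C, then B.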

5. Running the GUI search

Open a terminal and, from the root directory (Search-Engine), cd into web.
Run the backend with python -m uvicorn gui_main:app --reload, which launches it on localhost. Wait for the backend to finish loading.
Once the backend has loaded, open the web folder and use the VS Code Live Server extension (or any other way of viewing an HTML page in a browser) to launch index.html, the home page.
Type your query into the search box to get your results, which are displayed on results.html.
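
Once the backend is running, it can also be queried from a script. This sketch relies on uvicorn's default address (http://127.0.0.1:8000); the /search route and the q parameter are assumptions, so check gui_main.py for the real endpoint before using it.

```python
import json
import urllib.parse
import urllib.request

# uvicorn serves on 127.0.0.1:8000 by default. The "/search" route and the
# "q" parameter below are hypothetical placeholders.
BACKEND = "http://127.0.0.1:8000"


def search_url(query, base=BACKEND, route="/search"):
    """Build a GET URL carrying `query` as a URL-encoded parameter."""
    return f"{base}{route}?{urllib.parse.urlencode({'q': query})}"


def fetch_results(query):
    """Send the query to the running backend and decode its JSON response."""
    with urllib.request.urlopen(search_url(query), timeout=30) as resp:
        return json.load(resp)
```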

GUI SUMMARY:

The GUI uses 4 JSON files named cleaned_id_to_summary_part[1234].json, which map URLs to summaries generated by a local LLM (phi3).
The LLM was obtained through the Ollama application; more info here: https://ollama.com/
GitHub repo for documentation: https://github.com/ollama/ollama
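
For reference, Ollama also exposes a local REST API (port 11434 by default), which is one way a script can request summaries from phi3. The sketch below is an assumption about how such a call could look, not a copy of llm.py; the prompt wording and helper names are hypothetical.

```python
import json
import urllib.request

# Ollama's local REST API listens on port 11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_summary_request(page_text, model="phi3"):
    """Build a non-streaming /api/generate request asking `model` to summarize `page_text`."""
    payload = {
        "model": model,
        # Hypothetical prompt wording; llm.py's actual prompt may differ.
        "prompt": "Summarize the following web page in 2-3 sentences:\n\n" + page_text,
        "stream": False,  # return one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(page_text, model="phi3"):
    """Return the generated text, which Ollama places in the "response" field."""
    with urllib.request.urlopen(build_summary_request(page_text, model), timeout=600) as resp:
        return json.load(resp)["response"]
```

The long timeout reflects that local generation can be slow on modest hardware.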
To generate the summary files, go to the web directory and follow the steps below. MAKE SURE your system has the hardware to run the LLM.
- After installing Ollama, open a terminal and run `ollama run phi3` to download the model. Once it has downloaded, terminate it.
- Prepare id_to_url.json by creating a new file called urlChunks in the databases directory
  - Go to web/scripts and run chunkurl.py to split the JSON file
- In the same directory, run llm.py once per chunk, changing the JSON file path each time, then wait for the summaries to generate (this takes a long time)
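
The split-then-merge flow of these steps can be sketched as follows. This is not chunkurl.py itself: the four-way split mirrors the four cleaned_id_to_summary_part files, while the helper names and the id_to_url_partN.json file names are hypothetical.

```python
import json
from pathlib import Path


def split_into_chunks(mapping, num_chunks=4):
    """Split a dict into exactly `num_chunks` dicts of near-equal size, preserving key order."""
    items = list(mapping.items())
    base, extra = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        end = start + base + (1 if i < extra else 0)  # the first `extra` chunks get one more item
        chunks.append(dict(items[start:end]))
        start = end
    return chunks


def write_chunks(mapping, out_dir, num_chunks=4):
    """Write each chunk to out_dir/id_to_url_partN.json (hypothetical names) and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, chunk in enumerate(split_into_chunks(mapping, num_chunks), start=1):
        path = out_dir / f"id_to_url_part{n}.json"
        path.write_text(json.dumps(chunk), encoding="utf-8")
        paths.append(path)
    return paths


def merge_summary_parts(paths):
    """Merge per-chunk JSON files (e.g. cleaned_id_to_summary_part1..4.json) into one dict."""
    merged = {}
    for path in paths:
        merged.update(json.loads(Path(path).read_text(encoding="utf-8")))
    return merged
```

Splitting into a fixed number of near-equal chunks keeps each llm.py run a similar length, which helps when every summary takes a while to generate.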
