
Snowsearch is an automated systematic literature review tool chain that finds rounds of related papers and ranks them by relevance using an LLM of your choice.


dlg1206/snowsearch


SnowSearch


✨ Features

  • 🔍 Automated paper search
  • 🧠 LLM natural language → structured query generation
  • 📊 LLM-based abstract ranking
  • 🚀 GPU support


Quickstart

  1. Build the image
docker build -t snowsearch .
  2. Create a .env file. Other environment variables can be set, but a minimal setup only requires NEO4J_AUTH:
NEO4J_AUTH=neo4j/<your password here>

If you are using OpenAI, also add your API key as OPENAI_API_KEY.

  3. Launch the compose stack
docker compose up

Append -d to launch the stack in the background. Wait a few moments before launching the CLI so that all services are running.

Warning

The Grobid and Ollama images are several gigabytes in size, so they will take a while to download on first-time setup.

  4. Launch the CLI
docker run --rm -it --env-file=.env --network=snowsearch-net snowsearch slr "Your search here"

...and SnowSearch will do the rest! For the full list of options, see the commands README.
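The minimal .env file described above can also be written from the shell. This sketch uses a placeholder password (change-me) that you should replace, keeps OPENAI_API_KEY commented out for setups that do not use OpenAI, and overwrites any existing .env in the current directory:

```shell
# Write a minimal .env for the compose stack and the CLI container.
# Replace the placeholder password before running `docker compose up`.
cat > .env <<'EOF'
NEO4J_AUTH=neo4j/change-me
# Only needed when using OpenAI:
# OPENAI_API_KEY=<your OpenAI API key>
EOF
```

Lines starting with # are treated as comments by both Docker Compose and docker run --env-file, so the commented key has no effect until you uncomment it.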

Note

When using Ollama, the CLI downloads the model automatically (the default is llama3:latest). To download the model ahead of time instead, run docker exec ollama ollama pull <desired model>.
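Rather than guessing how long "a few moments" is after docker compose up -d, you can poll a readiness check before launching the CLI. Below is a generic retry helper; the name wait_for is hypothetical and not part of SnowSearch:

```shell
# Hypothetical helper (not part of SnowSearch): retry a command once per second
# until it succeeds, giving up after max_attempts tries.
# Usage: wait_for <max_attempts> <command> [args...]
wait_for() {
  local max_attempts="$1"
  shift
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "wait_for: gave up after $max_attempts attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
}
```

For example, `wait_for 60 docker exec ollama ollama list >/dev/null` waits until the Ollama server answers. This assumes the container is named ollama, as in the note above, and it only checks Ollama, not Grobid or Neo4j.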

Accessing the Database

[Image: example output from the Neo4j GUI]

SnowSearch uses Neo4j, a graph database, to store data about the collected papers. Neo4j also provides a GUI for viewing that data directly. To access it, launch the compose stack with the following command:

docker compose -f compose.yaml -f override/database.yaml up

This opens the database ports; with the default settings, the GUI is available at localhost:7474. Log in with the username and password set in NEO4J_AUTH in your .env file.
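NEO4J_AUTH packs both credentials into one value of the form <user>/<password>, so the GUI username is the part before the slash (neo4j in the quickstart example) and the password is the part after it. A minimal sketch of that split using POSIX parameter expansion; the function names are hypothetical:

```shell
# Hypothetical helpers: split a NEO4J_AUTH value ("<user>/<password>") at the
# first "/". A password that itself contains "/" is kept intact.
neo4j_user() {
  printf '%s\n' "${1%%/*}"   # drop everything from the first "/" onward
}

neo4j_password() {
  printf '%s\n' "${1#*/}"    # drop everything up to and including the first "/"
}

neo4j_user "neo4j/change-me"       # prints: neo4j
neo4j_password "neo4j/change-me"   # prints: change-me
```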

Enabling GPU Access

If your machine supports CUDA, you can give the Grobid and Ollama containers access to your GPUs to greatly increase their performance. To enable it, launch the compose stack with the following command:

docker compose -f compose.yaml -f override/gpu.yaml up

Note

Overrides can be combined. The only requirement is that the first -f argument MUST be compose.yaml; after that, overrides can be listed in any order. For example, to run with database access and GPU support, run docker compose -f compose.yaml -f override/database.yaml -f override/gpu.yaml up
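Because compose.yaml must always come first, a small wrapper can build the -f list so the ordering rule cannot be broken by accident. This is a hypothetical helper, not part of SnowSearch, and it assumes every override lives at override/<name>.yaml, which matches the overrides used in this README. Setting DOCKER=echo prints the command instead of running it:

```shell
# Hypothetical wrapper: run `docker compose` with compose.yaml as the first -f
# argument, followed by override/<name>.yaml for each name in the first argument.
# Usage: snowsearch_compose "<override names>" <compose subcommand> [args...]
# Set DOCKER=echo to print the command instead of running it.
snowsearch_compose() {
  local overrides="$1"
  shift
  local files="-f compose.yaml"
  local name
  for name in $overrides; do
    files="$files -f override/$name.yaml"
  done
  # $files is left unquoted on purpose so it splits into separate arguments.
  ${DOCKER:-docker} compose $files "$@"
}
```

With this helper, `snowsearch_compose "database gpu" up` runs the same command as the example above, and `snowsearch_compose "" up -d` starts the base stack in the background.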

Local Deployment and Development

  1. Set up a virtual Python environment
python3 -m venv venv

Then activate it:

Windows:

. venv/Scripts/activate

Linux:

. venv/bin/activate
  2. Install dependencies
pip install -r requirements.txt
  3. Test the script
python3 snowsearch -h

The help menu should show.
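The activation commands for the two platforms differ only in where the script lives: Windows virtual environments keep it under Scripts/, Linux (and macOS) ones under bin/. A small hypothetical helper can pick whichever exists, so the same line works in a POSIX-style shell on either platform (for example Git Bash on Windows):

```shell
# Hypothetical helper: print the path of a venv's activation script.
# Windows venvs put it in Scripts/, Linux and macOS venvs in bin/.
venv_activate_script() {
  local venv_dir="${1:-venv}"
  if [ -f "$venv_dir/Scripts/activate" ]; then
    printf '%s\n' "$venv_dir/Scripts/activate"
  else
    printf '%s\n' "$venv_dir/bin/activate"
  fi
}
```

Usage: `. "$(venv_activate_script venv)"` activates the environment created in step 1.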

The compose services also need to be running. Launch them like so:

docker compose -f compose.yaml -f override/dev.yaml up

This opens all service ports, so scripts running on your machine can reach each service at localhost on its respective port. For working with the database, see the database doc for a rundown of how objects are structured.

Acknowledgments

This toolchain would not be possible without these amazing projects, be sure to check them out!

  • Grobid: Machine learning software for extracting information from scholarly documents
  • OpenAlex: A free database of over 240 million scholarly works (articles, books, datasets, theses)
  • findpapers (Honorable Mention): An application that helps researchers find references for their work by searching several databases. I did not end up using this tool, but it served as inspiration for many aspects of SnowSearch.

Questions or Issues?

If you encounter a bug, have a question, or want to suggest a feature, feel free to open a GitHub issue! For contributing, see CONTRIBUTING.md.
