An automated systematic literature review tool chain that finds rounds of related papers and ranks them by relevance using an LLM of your choice.
- 🔍 Automated paper search
- 🧠 LLM natural language → structured query generation
- 📊 LLM-based abstract ranking
- 🚀 GPU support
- Quickstart
- Accessing the Database
- Enabling GPU Access
- Local Deployment and Development
- Acknowledgments
- Questions or Issues?
## Quickstart

- Build the image

  ```shell
  docker build -t snowsearch .
  ```

- Create a `.env` file

  Other env vars can be set, but for a minimal setup `NEO4J_AUTH` must be set:

  ```
  NEO4J_AUTH=neo4j/<your password here>
  ```

  If using OpenAI, also add your API key under `OPENAI_API_KEY`.
- Launch the compose stack

  ```shell
  docker compose up
  ```

  Append the `-d` flag to launch in the background. Make sure to wait a few moments before launching the CLI to ensure all services are running.
> [!WARNING]
> The grobid and ollama images are several gigabytes large; they will take a while to download on first-time setup.
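The advice above to wait a few moments can also be scripted. Below is a minimal sketch of a generic retry helper (`wait_for` is a hypothetical name, not something snowsearch ships) that could wrap a real health check, such as a `curl` against an exposed service port once the stack is up:

```shell
#!/bin/sh
# Minimal sketch: retry a command until it succeeds, or give up after 10 tries.
# `wait_for` is a hypothetical helper, not part of snowsearch.
wait_for() {
  tries=0
  until "$@"; do
    tries=$((tries + 1))
    if [ "$tries" -ge 10 ]; then
      return 1
    fi
    sleep 2
  done
}

# Example (replace `true` with a real check against one of the services):
wait_for true && echo "services ready"
```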
- Launch the CLI

  ```shell
  docker run --rm -it --env-file=.env --network=snowsearch-net snowsearch slr "Your search here"
  ```

  and snowsearch will do the rest! For the list of all options, see the commands readme.
> [!NOTE]
> The CLI will download a model from ollama automatically (the default is `llama3:latest`), but you can download the
> model ahead of time by running `docker exec ollama ollama pull <desired model>`. This only applies if using ollama.
## Accessing the Database

SnowSearch uses Neo4j, a graph-based database, to store data about the papers collected. Neo4j provides a browser GUI for viewing the data directly. To access it, launch the compose stack with the following command:

```shell
docker compose -f compose.yaml -f override/database.yaml up
```

This will open the database ports; with the default settings, the browser will be available at localhost:7474. The username and password are the same ones set in your `.env` file.
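With the database ports open, you can also reach Neo4j's HTTP endpoint from the command line. The sketch below only shows how to turn the `user/password` value from `NEO4J_AUTH` into an HTTP Basic Auth header; the credential value and the commented `curl` line are illustrative, not part of snowsearch:

```shell
#!/bin/sh
# Sketch: build an HTTP Basic Auth token from the NEO4J_AUTH "user/password" format.
# The credential value here is illustrative — use the one from your .env file.
NEO4J_AUTH="neo4j/example-password"
user="${NEO4J_AUTH%%/*}"   # part before the first "/"
pass="${NEO4J_AUTH#*/}"    # part after the first "/"
token=$(printf '%s:%s' "$user" "$pass" | base64)

# With the database override active, Neo4j's HTTP API answers on localhost:7474:
# curl -H "Authorization: Basic $token" http://localhost:7474/
echo "$user"
```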
## Enabling GPU Access

If your machine supports CUDA, there are options available to give the containers access to your GPUs to greatly increase Grobid and ollama performance. To enable it, launch the compose stack with the following command:

```shell
docker compose -f compose.yaml -f override/gpu.yaml up
```

> [!NOTE]
> Overrides are not exclusive. The only requirement is that the first `-f` argument MUST be `compose.yaml`; after that,
> overrides can be used in any order. For example, to run with both database access and GPU support, run
>
> ```shell
> docker compose -f compose.yaml -f override/database.yaml -f override/gpu.yaml up
> ```
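For reference, a GPU override file typically looks something like the following Compose snippet, which uses Compose's standard NVIDIA device-reservation syntax. This is an illustrative sketch only; the actual `override/gpu.yaml` in the repo may differ:

```yaml
# Illustrative sketch only — the repo's override/gpu.yaml may differ.
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```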
## Local Deployment and Development

- Set up a Python virtual environment

  ```shell
  python -m venv venv
  ```

  Then activate it:

  Windows:

  ```shell
  . venv/Scripts/activate
  ```

  Linux:

  ```shell
  . venv/bin/activate
  ```

- Install dependencies

  ```shell
  pip install -r requirements.txt
  ```

- Test the script

  ```shell
  python3 snowsearch -h
  ```

  The help menu should appear.
The compose services will also need to be running, which you can launch like so:

```shell
docker compose -f compose.yaml -f override/dev.yaml up
```

This will open up all service ports, so scripts run on your machine will be able to access the services through localhost and their respective ports. For working with the database, see the database doc for a rundown of how objects have been structured.
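When running scripts locally against the dev stack, the variables from `.env` need to be in your shell's environment. A minimal sketch (the `slr` invocation is commented out and assumes the dev override stack is already up):

```shell
#!/bin/sh
# Sketch: load .env into the environment, then run the CLI from the local checkout.
if [ -f .env ]; then
  set -a      # auto-export every variable the file defines
  . ./.env
  set +a
fi
# python3 snowsearch slr "Your search here"   # uncomment to actually run a search
echo "NEO4J_AUTH is ${NEO4J_AUTH:-unset}"
```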
## Acknowledgments

This toolchain would not be possible without these amazing projects, so be sure to check them out!
- Grobid: A machine learning software for extracting information from scholarly documents
- OpenAlex: A free database of over 240 million scholarly works (articles, books, datasets, theses)
- findpapers (Honorable Mention): An application that helps researchers who are looking for references for their work by performing searches across several databases. I did not use this tool in the end, but it served as inspiration for many aspects of snowsearch.
## Questions or Issues?

If you encounter a bug, have a question, or want to suggest a feature, feel free to open a GitHub issue! For contributing, see CONTRIBUTING.md.
