Make sure you have Git, Python 3.12+ and Docker installed on your machine.
- Clone the repository

  ```bash
  git clone git@github.com:lebe1/simple-scientific-RAG.git
  cd simple-scientific-RAG
  ```
- Create a virtual environment (optional but recommended)

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```
- Install dependencies

  Before running the project, install all required dependencies using pip:

  ```bash
  pip install -r requirements.txt
  ```
- Build the Docker image

  Copy the example environment file to `.env`, then build:

  ```bash
  cp .env.example .env
  docker compose build
  ```
- Run the Docker containers

  Run docker-compose.yml to pull the required images:

  ```bash
  docker compose --profile cpu up -d
  ```

  If you have a GPU available, run it with the GPU profile enabled:

  ```bash
  docker compose --profile gpu up -d
  ```
- Install the Ollama models

  Pull the required models inside the container; choose either `ollama-cpu` or `ollama-gpu`:

  ```bash
  docker exec ollama-cpu ollama run llama3-chatqa:8b
  docker exec ollama-cpu ollama run gemma3:27b
  ```
- Install the model for chunking

  ```bash
  python -m spacy download de_core_news_lg
  ```
- Create the embedding vectors and their index

  You have several arguments to pass here. You can choose between three splitting criteria (a chunking sketch follows after this list):

  - `create-embeddings` splits the text until the maximum chunk size is reached
  - `create-embeddings-by-article` splits the text at each article
  - `create-embeddings-by-subarticle` splits the text at each subarticle
  - `--chunk-size` defines the chunk size (in KB)
  - `--model` defines the embedding model

  Several sample invocations are provided below:

  ```bash
  python app/workflow.py create-embeddings --chunk-size 0.5 --model sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  python app/workflow.py create-embeddings-by-article --chunk-size 4.0 --model jinaai/jina-embeddings-v2-base-de
  python app/workflow.py create-embeddings-by-subarticle --model sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  ```

  For the experiments, you can also create several indices with one of the following scripts:

  ```bash
  ./run_workflow_create_index_small
  ./run_workflow_create_index
  ```
Note: If you want to improve the runtime and you have access to a GPU, uncomment the commented-out lines in docker-compose.yml.
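Chunk sizes in this project are specified in KB. To make the idea of splitting "until the maximum chunk size is reached" concrete, here is a minimal, hypothetical sketch of KB-based chunking with the spaCy model installed above. The actual logic lives in app/workflow.py and may differ; the function name below is made up for illustration.

```python
# Illustrative sketch only (not the code from app/workflow.py):
# greedily pack spaCy sentences into chunks of at most `chunk_size_in_kb` kilobytes.
import spacy

nlp = spacy.load("de_core_news_lg")

def chunk_by_size(text: str, chunk_size_in_kb: float) -> list[str]:
    max_bytes = int(chunk_size_in_kb * 1024)
    chunks, current = [], ""
    for sent in nlp(text).sents:
        candidate = (current + " " + sent.text).strip()
        if current and len(candidate.encode("utf-8")) > max_bytes:
            chunks.append(current)  # current chunk is full, start a new one
            current = sent.text
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```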
Run the FastAPI server locally:

```bash
cd app;
uvicorn main:app --reload
```

There are two ways to test the API. Either send one of the following POST requests using curl:

```bash
curl -X POST "http://127.0.0.1:8000/api/rag" -H "Content-Type: application/json" -d '{"question": "Wie hoch darf ein Gebäude in Bauklasse I gemäß Artikel IV in Wien sein?", "model":"jinaai/jina-embeddings-v2-base-de", "spacy_model":"de_core_news_lg", "chunk_size_in_kb":4}'

curl -X POST "http://127.0.0.1:8000/api/search" -H "Content-Type: application/json" -d '{"query": "Wie hoch darf ein Gebäude in Bauklasse I gemäß Artikel IV in Wien sein?", "model":"jinaai/jina-embeddings-v2-base-de", "spacy_model":"de_core_news_lg", "chunk_size_in_kb":4}'
```

Or open the user interface via http://127.0.0.1:8000.
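The same request can also be sent from Python. The snippet below assumes the requests package is installed (it may not be part of requirements.txt); the /api/search endpoint works analogously but expects a "query" field instead of "question".

```python
# Same POST request as the curl example above, sent with the `requests` package.
import requests

payload = {
    "question": "Wie hoch darf ein Gebäude in Bauklasse I gemäß Artikel IV in Wien sein?",
    "model": "jinaai/jina-embeddings-v2-base-de",
    "spacy_model": "de_core_news_lg",
    "chunk_size_in_kb": 4,
}

response = requests.post("http://127.0.0.1:8000/api/rag", json=payload, timeout=120)
print(response.json())
```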
Assuming again that you executed the uvicorn command above, execute:

```bash
cd app/evaluation;
python benchmark.py --questions ../../data/sample_questions.txt --references ../../data/sample_answers.txt --output-dir ../../data/benchmark_results_final
./cleanup_empty_json.sh -d data/benchmark_results_final/
```

This will execute the following combinations of LLM model, chunk size, and embedding model:
```python
CONFIGURATIONS = {
    "llm_models": ["llama3-chatqa:8b", "gemma3:27b"],  # Add other models you have in Ollama
    "embedding_models": [
        "jinaai/jina-embeddings-v2-base-de",
        "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"  # Add other embedding models
    ],
    "chunk_sizes": [0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128],  # Chunk sizes in KB
    "spacy_models": ["de_core_news_lg"]  # You could add more if needed
}
```

The cleanup_empty_json.sh script removes empty JSON files that get created during the benchmark workflow.
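With the defaults above this amounts to 2 × 2 × 11 × 1 = 44 runs. Conceptually, the benchmark sweeps the Cartesian product of these lists; the loop below is only a sketch of that idea, not the actual code in benchmark.py.

```python
# Sketch only: enumerate every combination defined in CONFIGURATIONS.
from itertools import product

CONFIGURATIONS = {
    "llm_models": ["llama3-chatqa:8b", "gemma3:27b"],
    "embedding_models": [
        "jinaai/jina-embeddings-v2-base-de",
        "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    ],
    "chunk_sizes": [0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128],
    "spacy_models": ["de_core_news_lg"],
}

for llm, embedding, chunk_size, spacy_model in product(
    CONFIGURATIONS["llm_models"],
    CONFIGURATIONS["embedding_models"],
    CONFIGURATIONS["chunk_sizes"],
    CONFIGURATIONS["spacy_models"],
):
    print(f"{llm} | {embedding} | {chunk_size} KB | {spacy_model}")
```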
To run the quantitative evaluation using the LLM-as-a-judge approach, you can execute the evaluation script:
```bash
python evaluate_benchmarks.py --benchmark-dir ../data/benchmark_results --output-dir ../data/evaluation_results --eval-model gemma3:12b --max-retries 2
```

Please note that you can use any LLM as the judge; you just need to run it in the Ollama container first. The `--max-retries` argument was added to retry multiple times in case the LLM does not return a proper JSON structure.
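The retry logic boils down to re-asking the judge until its reply parses as JSON. Below is a minimal sketch of that pattern; ask_judge is a hypothetical callable standing in for the actual Ollama call in evaluate_benchmarks.py.

```python
# Sketch only: retry until the judge LLM returns parseable JSON.
import json
from typing import Callable, Optional

def judge_with_retries(ask_judge: Callable[[str], str], prompt: str, max_retries: int = 2) -> Optional[dict]:
    # One initial attempt plus `max_retries` additional attempts.
    for _ in range(max_retries + 1):
        raw = ask_judge(prompt)  # hypothetical: returns the judge model's raw text answer
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            continue
    return None  # give up; the caller can log or skip this entry
```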
Finally, you can run the visualization pipeline:
```bash
python visualize_results.py --eval-dir ../data/evaluation_results --output-dir ../data/visualizations
```

To qualitatively evaluate the benchmark results, collect them into a single CSV file:

```bash
python prepare_qualitative_eval.py --input ../data/benchmark_results_final/ --output ../data/evaluation_results_final/ --mode combine
```

Next, you need to assign a score to each answer for every combination of LLM, chunk size, embedding model, and spaCy model. Either fill it out manually, or let other LLMs such as ChatGPT or Claude do it. Finally, run the following visualization pipeline:

```bash
python visualize_qualitative_eval.py
```

This will store the evaluation results under data/evaluation_results_final.
IMPORTANT: Make sure all Docker containers as well as the FastAPI app are running in a terminal via uvicorn main:app --reload.
Only the arguments --llm-model and --select-top-k will affect the experiment.
The other two arguments are there to record the embedding chunking method that has been executed beforehand.
```bash
cd app/evaluation;
python question_query.py --embedding-model sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 --llm-model llama3.2 --select-top-k 3 5 --splitting-method SUBARTICLE
```

Make sure the PATH variable is set correctly.
```bash
python visualize_keyword_eval.py
```

The data of the legal basis can be found at https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=LrW&Gesetzesnummer=20000006