Make sure you have Git, Python 3.12+ and Docker installed on your machine.
- Clone the repository

  ```bash
  git clone git@github.com:lebe1/simple-scientific-RAG.git
  cd simple-scientific-RAG
  ```
- Create a virtual environment (optional but recommended)

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```
- Install dependencies

  Before running the project, install all required dependencies using pip:

  ```bash
  pip install -r requirements.txt
  ```
- Build the Docker image

  Copy the example environment file to `.env`, then build:

  ```bash
  cp .env.example .env
  docker compose build
  ```
- Run the Docker containers

  Run docker-compose.yml to pull the required images:

  ```bash
  docker compose --profile cpu up -d
  ```

  If you have a GPU available, run it with the GPU profile enabled:

  ```bash
  docker compose --profile gpu up -d
  ```
- Install the Ollama models

  Pull the required models inside the container; choose either `ollama-cpu` or `ollama-gpu`:

  ```bash
  docker exec ollama-cpu ollama run llama3-chatqa:8b
  docker exec ollama-cpu ollama run gemma3:27b
  ```
- Install the model for chunking

  ```bash
  python -m spacy download de_core_news_lg
  ```
- Create the embedding vectors and their index

  You have several arguments to pass here. You can choose between three splitting criteria (a chunking sketch follows after this list):

  - `create-embeddings` splits the text until the maximum chunk size is reached
  - `create-embeddings-by-article` splits the text at each article
  - `create-embeddings-by-subarticle` splits the text at each subarticle
  - `--chunk-size` defines the chunk size (in KB)
  - `--model` defines the embedding model

  Several sample invocations are provided below:

  ```bash
  python app/workflow.py create-embeddings --chunk-size 0.5 --model sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  python app/workflow.py create-embeddings-by-article --chunk-size 4.0 --model jinaai/jina-embeddings-v2-base-de
  python app/workflow.py create-embeddings-by-subarticle --model sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  ```

  For the experiments, you can also create several indices with one of the following scripts:

  ```bash
  ./run_workflow_create_index_small
  ./run_workflow_create_index
  ```
Note: If you want to improve the runtime and you have access to a GPU, uncomment the commented-out lines in docker-compose.yml.
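Chunk sizes in this project are specified in KB. To make the idea of splitting "until the maximum chunk size is reached" concrete, here is a minimal, hypothetical sketch of KB-based chunking with the spaCy model installed above. The actual logic lives in app/workflow.py and may differ; the function name below is made up for illustration.

```python
# Illustrative sketch only (not the code from app/workflow.py):
# greedily pack spaCy sentences into chunks of at most `chunk_size_in_kb` kilobytes.
import spacy

nlp = spacy.load("de_core_news_lg")

def chunk_by_size(text: str, chunk_size_in_kb: float) -> list[str]:
    max_bytes = int(chunk_size_in_kb * 1024)
    chunks, current = [], ""
    for sent in nlp(text).sents:
        candidate = (current + " " + sent.text).strip()
        if current and len(candidate.encode("utf-8")) > max_bytes:
            chunks.append(current)  # current chunk is full, start a new one
            current = sent.text
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```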
Run the FastAPI server locally:

```bash
cd app;
uvicorn main:app --reload
```

There are two ways to test the API. Either send one of the following POST requests using curl:

```bash
curl -X POST "http://127.0.0.1:8000/api/rag" -H "Content-Type: application/json" -d '{"question": "Wie hoch darf ein Gebäude in Bauklasse I gemäß Artikel IV in Wien sein?", "model":"jinaai/jina-embeddings-v2-base-de", "spacy_model":"de_core_news_lg", "chunk_size_in_kb":4}'

curl -X POST "http://127.0.0.1:8000/api/search" -H "Content-Type: application/json" -d '{"query": "Wie hoch darf ein Gebäude in Bauklasse I gemäß Artikel IV in Wien sein?", "model":"jinaai/jina-embeddings-v2-base-de", "spacy_model":"de_core_news_lg", "chunk_size_in_kb":4}'
```

Or open the user interface via http://127.0.0.1:8000.
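The same request can also be sent from Python. The snippet below assumes the requests package is installed (it may not be part of requirements.txt); the /api/search endpoint works analogously but expects a "query" field instead of "question".

```python
# Same POST request as the curl example above, sent with the `requests` package.
import requests

payload = {
    "question": "Wie hoch darf ein Gebäude in Bauklasse I gemäß Artikel IV in Wien sein?",
    "model": "jinaai/jina-embeddings-v2-base-de",
    "spacy_model": "de_core_news_lg",
    "chunk_size_in_kb": 4,
}

response = requests.post("http://127.0.0.1:8000/api/rag", json=payload, timeout=120)
print(response.json())
```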
Assuming again that you executed the uvicorn command above, execute:

```bash
cd app/evaluation;
python benchmark.py --questions ../../data/sample_questions.txt --references ../../data/sample_answers.txt --output-dir ../../data/benchmark_results_final
./cleanup_empty_json.sh -d data/benchmark_results_final/
```

This will execute the following combinations of LLM model, chunk size, and embedding model:
```python
CONFIGURATIONS = {
    "llm_models": ["llama3-chatqa:8b", "gemma3:27b"],  # Add other models you have in Ollama
    "embedding_models": [
        "jinaai/jina-embeddings-v2-base-de",
        "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"  # Add other embedding models
    ],
    "chunk_sizes": [0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128],  # Chunk sizes in KB
    "spacy_models": ["de_core_news_lg"]  # You could add more if needed
}
```

The cleanup_empty_json.sh script removes empty JSON files that get created during the benchmark workflow.
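With the defaults above this amounts to 2 × 2 × 11 × 1 = 44 runs. Conceptually, the benchmark sweeps the Cartesian product of these lists; the loop below is only a sketch of that idea, not the actual code in benchmark.py.

```python
# Sketch only: enumerate every combination defined in CONFIGURATIONS.
from itertools import product

CONFIGURATIONS = {
    "llm_models": ["llama3-chatqa:8b", "gemma3:27b"],
    "embedding_models": [
        "jinaai/jina-embeddings-v2-base-de",
        "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    ],
    "chunk_sizes": [0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128],
    "spacy_models": ["de_core_news_lg"],
}

for llm, embedding, chunk_size, spacy_model in product(
    CONFIGURATIONS["llm_models"],
    CONFIGURATIONS["embedding_models"],
    CONFIGURATIONS["chunk_sizes"],
    CONFIGURATIONS["spacy_models"],
):
    print(f"{llm} | {embedding} | {chunk_size} KB | {spacy_model}")
```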
To run the quantitative evaluation using the LLM-as-a-judge approach, you can execute the evaluation script:
```bash
python evaluate_benchmarks.py --benchmark-dir ../data/benchmark_results --output-dir ../data/evaluation_results --eval-model gemma3:12b --max-retries 2
```

Please note that you can use any LLM as the judge; you just need to run it in the Ollama container first. The `--max-retries` argument was added to retry multiple times in case the LLM does not return a proper JSON structure.
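The retry logic boils down to re-asking the judge until its reply parses as JSON. Below is a minimal sketch of that pattern; ask_judge is a hypothetical callable standing in for the actual Ollama call in evaluate_benchmarks.py.

```python
# Sketch only: retry until the judge LLM returns parseable JSON.
import json
from typing import Callable, Optional

def judge_with_retries(ask_judge: Callable[[str], str], prompt: str, max_retries: int = 2) -> Optional[dict]:
    # One initial attempt plus `max_retries` additional attempts.
    for _ in range(max_retries + 1):
        raw = ask_judge(prompt)  # hypothetical: returns the judge model's raw text answer
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            continue
    return None  # give up; the caller can log or skip this entry
```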
Finally, you can run the visualization pipeline:
```bash
python visualize_results.py --eval-dir ../data/evaluation_results --output-dir ../data/visualizations
```

To qualitatively evaluate the benchmark results, collect them into a single CSV file:

```bash
python prepare_qualitative_eval.py --input ../data/benchmark_results_final/ --output ../data/evaluation_results_final/ --mode combine
```

Next, you need to assign a score to each answer for every combination of LLM, chunk size, embedding model, and spaCy model. Either fill it out manually, or let other LLMs such as ChatGPT or Claude do it. Finally, run the following visualization pipeline:

```bash
python visualize_qualitative_eval.py
```

This will store the evaluation results under data/evaluation_results_final.
IMPORTANT: Make sure all Docker containers as well as the FastAPI app are running in a terminal via uvicorn main:app --reload.
Only the arguments --llm-model and --select-top-k will affect the experiment.
The other two arguments are there to record the embedding chunking method that has been executed beforehand.
```bash
cd app/evaluation;
python question_query.py --embedding-model sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 --llm-model llama3.2 --select-top-k 3 5 --splitting-method SUBARTICLE
```

Make sure the PATH variable is set correctly.
```bash
python visualize_keyword_eval.py
```

The data of the legal basis can be found at https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=LrW&Gesetzesnummer=20000006