NaiveRAG is a simple Retrieval-Augmented Generation (RAG) pipeline created for educational purposes.
Future plans include:
- Support for additional chunking strategies, such as recursive token splitting.
- CLI args and a runnable bash script.
Using conda is recommended:
conda create -n naiverag python=3.12
conda activate naiverag
Install in editable mode:
pip install -e .
Create a .env file in the project root:
API_KEY=...
Adjust the settings in config.yaml!
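For readers curious how a key stored in .env can reach the code, here is a minimal sketch using only the standard library. The helper names `load_env_file` and `get_api_key` are hypothetical and are not taken from NaiveRAG; the project may well load the file differently (for example with a dedicated dotenv library).

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines (hypothetical helper, not part of NaiveRAG)."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path: str = ".env") -> str:
    """Prefer an exported API_KEY, then fall back to the .env file."""
    api_key = os.environ.get("API_KEY") or load_env_file(path).get("API_KEY")
    if not api_key:
        raise RuntimeError("API_KEY is not set; add it to .env in the project root")
    return api_key
```

Checking the process environment first means a key exported in the shell overrides the file, which is a common convention but an assumption here.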
After setting everything up, you can run the pipeline using the following command:
python run.py
The actual RAG pipeline is located in the /rag_pipeline directory, which contains the following subdirectories:
- /api – Implements API endpoints for language and embedding model APIs
- /chunking – Contains strategies for chunking documents
- /db – Contains vector store implementations
Each of these directories includes a base.py file that defines an abstract base class. These classes are then implemented in specific files (e.g., gemini.py for the Gemini API).
The correct implementation is automatically selected based on your settings in config.yaml.
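The base-class pattern and config-driven selection described above can be sketched roughly as follows. This is an illustration, not the project's actual code: the names `BaseChunker`, `FixedSizeChunker`, `CHUNKERS`, and `build_chunker`, as well as the `chunking`, `strategy`, and `params` config keys, are all assumptions made for the example.

```python
from abc import ABC, abstractmethod


class BaseChunker(ABC):
    """Abstract interface, similar in spirit to what a base.py might define."""

    @abstractmethod
    def chunk(self, text: str) -> list[str]:
        """Split a document into chunks."""


class FixedSizeChunker(BaseChunker):
    """Hypothetical strategy: fixed-size character windows with optional overlap."""

    def __init__(self, chunk_size: int = 200, overlap: int = 0) -> None:
        if chunk_size <= 0 or not 0 <= overlap < chunk_size:
            raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
        self.chunk_size = chunk_size
        self.overlap = overlap

    def chunk(self, text: str) -> list[str]:
        # Each window starts `step` characters after the previous one.
        step = self.chunk_size - self.overlap
        return [text[i:i + self.chunk_size] for i in range(0, len(text), step)]


# Maps the strategy name used in the config to its implementation class.
CHUNKERS: dict[str, type[BaseChunker]] = {"fixed_size": FixedSizeChunker}


def build_chunker(config: dict) -> BaseChunker:
    """Select and construct a chunker from an already-parsed config mapping."""
    section = config.get("chunking", {})
    name = section.get("strategy")
    if name not in CHUNKERS:
        raise ValueError(f"unknown chunking strategy: {name!r}")
    return CHUNKERS[name](**section.get("params", {}))


# Stand-in for the parsed contents of config.yaml (keys are assumed).
config = {"chunking": {"strategy": "fixed_size", "params": {"chunk_size": 4, "overlap": 1}}}
chunker = build_chunker(config)
print(chunker.chunk("abcdefgh"))  # ['abcd', 'defg', 'gh']
```

With a registry like this, adding a new strategy would only require a new subclass and one dictionary entry; the calling code stays unchanged.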
Utility functions (e.g., for loading the config) are located in /util.