- Create:
python -m venv .venv
- Activate:
.\.venv\Scripts\activate
- Using pip:
pip install -r requirements.txt
- Using uv:
uv pip install -r requirements.txt
This will install the necessary libraries including:
- ruff==0.11.4
- faiss-cpu>=1.7.2
- numpy>=1.23.0
- openai>=0.27.0
- tqdm>=4.62.3
- pathlib>=1.0.0
Before sending queries, make sure to:
- Set OPENAI_API_KEY as an environment variable.
- Load the target repository:
python scripts/run_load.py <repo_url> [--destination <destination_path>]
example:
python scripts/run_load.py https://github.com/viarotel-org/escrcpy
- Make chunks:
python scripts/run_chunk.py <repo_path> [--chunk_size <size>] [--overlap <overlap>] [--output <output_path>]
example:
python scripts/run_chunk.py data/escrcpy
- Embed chunks into an index file:
python scripts/run_embed.py [--chunks_path <chunks_file>] [--index_path <index_file>]
example:
python scripts/run_embed.py
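The --chunk_size and --overlap options of run_chunk.py suggest a sliding-window split, where consecutive chunks share some content so that context is not lost at chunk boundaries. The sketch below illustrates that idea only. It is not taken from the repository: the function name chunk_text is hypothetical, and measuring sizes in characters is an assumption (the script may count lines or tokens instead).

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters.

    Each window starts (chunk_size - overlap) characters after the previous
    one, so neighbouring chunks share `overlap` characters.
    Assumption: sizes are counted in characters, not tokens or lines.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text,
        # otherwise the tail would be emitted again as a shorter chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


# Small illustration with a 10-character string.
example_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
```

A larger overlap produces more chunks and a larger index, but makes it less likely that a relevant passage is cut in half.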
Now you can run your queries from the CLI as follows:
python scripts/run_answer.py <query> [--top_k <k>] [--index_path <index_file>] [--chunk_path <chunk_file>] [--rerank]
example:
python scripts/run_answer.py "How does the SelectDisplay component handle the device options when retrieving display IDs?" --rerank
--rerank: Reranks the retrieved chunks after they are fetched. Pass --rerank to enable it.
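To make the role of --top_k and --rerank concrete, here is a minimal two-stage sketch: a first stage picks the top_k chunks by vector similarity, and a second stage reorders only those candidates. Everything in it is illustrative. The function names are hypothetical, the similarity search is written in plain Python instead of the FAISS index the project uses, and the keyword-overlap scorer is a stand-in, since the reranking method used by run_answer.py is not described here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], chunk_vecs: list[list[float]], top_k: int) -> list[int]:
    """First stage: indices of the top_k chunks by cosine similarity."""
    order = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return order[:top_k]


def rerank(query: str, chunks: list[str], candidate_ids: list[int]) -> list[int]:
    """Second stage: reorder candidates by shared words with the query.

    Stand-in scorer (assumption). sorted() is stable, so ties keep
    their first-stage order.
    """
    query_terms = set(query.lower().split())

    def overlap(i: int) -> int:
        return len(query_terms & set(chunks[i].lower().split()))

    return sorted(candidate_ids, key=overlap, reverse=True)


# Toy data: three chunks with made-up 2-dimensional embeddings.
chunks = [
    "display id parsing helper",
    "select display component options",
    "unrelated logging code",
]
chunk_vecs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]

candidates = retrieve([1.0, 0.1], chunk_vecs, top_k=2)
reranked = rerank("select display options", chunks, candidates)
```

Reranking can only reorder what the first stage returned, so a chunk that misses the top_k cut cannot be recovered by it.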
To see the retriever's metrics, use the script run_metrics.py:
python scripts/run_metrics.py [--ground_truth_path <gt_file>] [--top_k <k>] [--index_path <index_file>] [--chunks_path <json_file>] [--rerank]
example:
python scripts/run_metrics.py --rerank
The current pipeline achieved the following values:
Recall@10: 0.6471
MRR@10: 0.4281
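For readers unfamiliar with these metrics, the sketch below shows one common way to compute Recall@k and MRR@k. It assumes exactly one ground-truth chunk per query, in which case Recall@k is the fraction of queries whose correct chunk appears in the top k results, and MRR@k is the mean of 1/rank of that chunk (0 if it is not in the top k). The function names and data layout are hypothetical, and run_metrics.py may define or compute the metrics differently.

```python
def recall_at_k(ranked_ids: list[str], relevant_id: str, k: int) -> float:
    """1.0 if the relevant chunk is among the first k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def reciprocal_rank_at_k(ranked_ids: list[str], relevant_id: str, k: int) -> float:
    """1/rank of the relevant chunk within the first k results, else 0.0."""
    for rank, chunk_id in enumerate(ranked_ids[:k], start=1):
        if chunk_id == relevant_id:
            return 1.0 / rank
    return 0.0


def evaluate(results: dict[str, list[str]], ground_truth: dict[str, str], k: int = 10) -> tuple[float, float]:
    """Average both metrics over all queries in ground_truth.

    Assumption: one relevant chunk id per query. A query missing from
    `results` counts as a miss.
    """
    if not ground_truth:
        raise ValueError("ground_truth must contain at least one query")
    n = len(ground_truth)
    recall = sum(recall_at_k(results.get(q, []), gt, k) for q, gt in ground_truth.items()) / n
    mrr = sum(reciprocal_rank_at_k(results.get(q, []), gt, k) for q, gt in ground_truth.items()) / n
    return recall, mrr


# Toy data: q1 is found at rank 2, q2 at rank 1, q3 is not found.
results = {"q1": ["c3", "c1", "c7"], "q2": ["c2", "c9"], "q3": ["c5", "c6"]}
ground_truth = {"q1": "c1", "q2": "c2", "q3": "c4"}

recall, mrr = evaluate(results, ground_truth, k=10)
# recall is 2/3 and mrr is (1/2 + 1 + 0) / 3 = 0.5
```

Under this definition, a Recall@10 noticeably higher than MRR@10 means the correct chunk is usually retrieved but often not at the first position.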