This guide explains how to run the analysis in two modes:
- File-level retrieval: predict which files contain the bug
- Function-level retrieval: predict which changed functions were involved
It also describes inputs/outputs, metrics, and tips for running locally or on clusters.
Quick start: run the analysis notebook on Colab via the "Open in Colab" badge.
- Chunks and embeddings generated for all projects/bugs:
  - `dataset/{project}/{bug}/code_chunks.json`
  - `dataset/{project}/{bug}/embedding.npy`
- If not yet generated, see `CHUNK_EMBEDDING.md` (driven by `generate_dataset_from_bugsinpy.ipynb`).
- Installed dependencies: `pip install -r requirements.txt`
- Optional `.env` for configuration:
  - `MODEL_NAME=regularpooria/blaze_code_embedding`
  - `BATCH_SIZE=128`
- K (top-K): controls how many candidates to evaluate. Default in notebooks: `K = 60`.
- `MODEL_NAME` and `BATCH_SIZE`: read by `scripts/embedding.py` from the environment, with defaults.
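As a sketch, the env-with-fallback pattern described above looks like this (the defaults mirror the values shown in this guide; the actual `scripts/embedding.py` may differ):

```python
import os

# Read configuration from the environment, falling back to the documented
# defaults. These names/values are the ones this guide mentions; the real
# script may define additional settings.
MODEL_NAME = os.environ.get("MODEL_NAME", "regularpooria/blaze_code_embedding")
BATCH_SIZE = int(os.environ.get("BATCH_SIZE", "128"))
K = int(os.environ.get("K", "60"))
```

Exporting `MODEL_NAME=...` or putting it in a loaded `.env` file overrides the default without touching the code.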
You can run this via the script or the notebook.
- Using the notebook: `run_analysis_file.ipynb`
- What it does:
  - For each project/bug, loads `code_chunks.json` and `embedding.npy`
  - Embeds extracted error tracebacks in batch
  - Builds a FAISS index per bug and retrieves the top-K code chunks
  - Writes per-project results to `tmp/ast/results/bug_results_{project}.json`
  - Aggregates metrics and writes `results_{K}.json` (for K in {5, 10, 15, 20})
- Sections:
  - Setup: imports, `K` definition, folder creation
  - Inference: batch-embeds errors, retrieves top-K per bug, writes `bug_results_{project}.json`
  - Aggregation: computes MAP, MRR, and overall and per-project success rates -> writes `results_{K}.json`
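The per-bug retrieval step can be sketched as follows. A plain NumPy inner-product search stands in for the FAISS index here (`faiss.IndexFlatIP` over L2-normalized vectors behaves the same way); the chunk fields are illustrative:

```python
import numpy as np

def top_k_chunks(query_emb: np.ndarray, chunk_embs: np.ndarray,
                 chunks: list, k: int) -> list:
    """Return the k chunks most similar to the query embedding.

    Equivalent in spirit to adding L2-normalized rows to a
    faiss.IndexFlatIP and calling index.search(query[None, :], k).
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    # Cosine similarity between the query and every chunk embedding.
    scores = normalize(chunk_embs) @ normalize(query_emb)
    order = np.argsort(-scores)[:k]  # indices of the k best scores
    return [chunks[i] for i in order]
```

In the notebook this runs once per bug, with the traceback embedding as the query and that bug's `embedding.npy` rows as the index.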
Outputs and formats:
- `tmp/ast/results/bug_results_{project}.json` (list per project), entries like `{ "index": 42, "files": [ {"file": "pkg/module.py", "function": "foo"}, ... ] }`
- `results_{K}.json` (aggregated), with keys: `K`, `model_name`, `MAP`, `MRR`, `searches_passed`, `searches_failed`, `success_rate`, `success_projects`
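For reference, the MAP and MRR values in the aggregated file can be computed from per-bug rankings as below; the function names are illustrative, not the notebook's actual helpers:

```python
def reciprocal_rank(ranked: list, relevant: set) -> float:
    """1/rank of the first relevant item (ranks are 1-based), 0 if none found."""
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / i
    return 0.0

def average_precision(ranked: list, relevant: set) -> float:
    """Mean of precision@i over the positions of relevant items."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / max(len(relevant), 1)
```

MAP and MRR are then the means of `average_precision` and `reciprocal_rank` over all evaluated bugs.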
This evaluates whether the top-K predictions include the actual changed functions.
- Using the notebook: `run_analysis_function.ipynb`
- Steps mirror the file-level analysis, but predicted function names are compared against the set of changed functions for each bug
- Uses `scripts.bugsinpy_utils.parse_changed_function_names_2` to infer which functions changed from diffs
- Writes `tmp/ast/results/bug_results_{project}.json` and aggregated `results_{K}.json` (K in {1, 5, 10} by default)
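The real extraction logic lives in `parse_changed_function_names_2`; purely as an illustration of the idea (not that function's actual implementation), changed function names can often be read from unified-diff hunk headers, which git fills with the enclosing `def`:

```python
import re

# Hunk headers in a Python diff typically look like:
#   @@ -10,7 +10,8 @@ def foo(self, x):
HUNK_RE = re.compile(r"^@@ .* @@\s*(?:async\s+)?def\s+(\w+)", re.MULTILINE)

def changed_function_names(diff_text: str) -> set:
    """Collect function names from the context portion of hunk headers."""
    return set(HUNK_RE.findall(diff_text))
```

This misses functions defined inside a hunk rather than named in its header, which is why a dedicated utility is used in practice.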
Notes:
- The function-level notebook sets `model_name` to a static string and (as written) does not compute MAP/MRR; it reports success rates only.
- To run it end-to-end, ensure `embed` and `index_embeddings` are imported (uncomment the import cell): `from scripts.embedding import model, MODEL_NAME, BATCH_SIZE, embed, index_embeddings`
- To avoid overwriting the file-level `results_{K}.json`, rename the outputs or run in a clean workspace.
- File-level
  - Success if the true changed file name appears in the top-K predicted file list
  - Reports mean average precision (MAP) and mean reciprocal rank (MRR) across evaluated bugs
- Function-level
  - Success if all changed functions appear in the top-K predicted function list
  - Reports success rates per project and overall
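Expressed as code, the two success criteria amount to a membership check and a subset check (names are illustrative):

```python
def file_level_success(predicted_files: list, true_file: str) -> bool:
    """File-level: the changed file appears anywhere in the top-K list."""
    return true_file in predicted_files

def function_level_success(predicted_functions: list,
                           changed_functions: set) -> bool:
    """Function-level: every changed function is in the top-K list."""
    return set(changed_functions) <= set(predicted_functions)
```

Note the asymmetry: file-level needs one hit, while function-level requires all changed functions to be retrieved, so its success rates are typically lower at the same K.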
- Per-project retrievals: `tmp/ast/results/bug_results_{project}.json`
- Aggregated metrics: `results_{K}.json` in the repo root
- For simple printing of success rates from an aggregated file-level results JSON, you can use:
  `python generate_report.py`
- Adjust the script or adapt it for function-level outputs as needed.
- Missing embeddings: make sure `embedding.npy` exists for each `{project, bug}` (run chunking/embedding first)
- Model download on clusters: pre-download the model and set `HF_HOME`/`TRANSFORMERS_CACHE` to a shared path; jobs do not have Internet access
- Function-level notebook error `embed is not defined`: import from `scripts.embedding` as shown above