MAGMA v2 evaluates large language models (LLMs) on five classical graph algorithms (BFS, DFS, Dijkstra's, Floyd–Warshall, and Prim's MST), with an emphasis on multistep reasoning and intermediate-step accuracy. Despite rapid progress in LLMs, structured tasks such as executing graph algorithms remain challenging; this benchmark highlights where LLMs excel and where they fall short.
- Five Graph Algorithms: BFS, DFS, Dijkstra’s, Floyd–Warshall, and MST (Prim).
- Intermediate Steps: Measure chain-of-thought accuracy, not just final outputs (see the sketch after this list).
- Multiple Sources: Synthetic and real-world graph data.
- Flexible Prompting: Chain-of-thought or instruction-based queries.
- Modular & Extensible: Easy to add new tasks or adapt data generation.
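As a concrete illustration of what "intermediate steps" means here, a BFS run can be recorded as a sequence of per-step states that a model's chain-of-thought is scored against. The snippet below is a minimal sketch of that idea, not MAGMA v2's actual data format:

```python
from collections import deque

def bfs_trace(adj, source):
    """Run BFS and record one intermediate state (queue + visited set) per step."""
    visited, queue, trace = {source}, deque([source]), []
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in visited:
                visited.add(nbr)
                queue.append(nbr)
        trace.append({"dequeued": node, "queue": list(queue), "visited": sorted(visited)})
    return trace

# Each dict in the trace is an "intermediate step" a model could be scored on.
adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
for step in bfs_trace(adj, 0):
    print(step)
```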
- Clone the Repo:

```bash
git clone https://github.com/ataylor24/MAGMA_v2.git
cd MAGMA_v2
```

- Install (editable mode recommended):

```bash
pip install -e .
```
Conda users:

```bash
conda env create --file environment.yml
conda activate nar2
pip install -e .
```

Standard usage for all algorithms:

```bash
python magma_generation.py all
```

This runs BFS, DFS, Dijkstra's, Floyd–Warshall, and Prim's MST with the default settings in `globals.py`. Adjust via CLI flags as needed.
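By analogy with the `sample_data.py` interface shown later, a single algorithm can presumably be generated by replacing `all` with that algorithm's name (an assumption; check `--help` for the authoritative flags):

```bash
# Assumed single-algorithm invocation, mirroring the sample_data.py interface below
python magma_generation.py bfs
```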
Key settings in `globals.py` (sketched below):

- `TRAIN_TEST_SPLIT`: maps graph sizes to train/test counts.
- `_OOD_TRAIN_LENGTHS` & `_OOD_EVAL_LENGTHS`: out-of-distribution sizes.
- `OUTPUT_FORMATS`: e.g. `cot_analysis`, `magma`, etc.
- `FORMATTED_ALGORITHMS`: instructions/output formatting for each algorithm.
- `COT_PROMPT`: chain-of-thought prompt template.
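For orientation, these settings might be shaped roughly like the following (illustrative values only; consult `globals.py` in the repo for the real ones):

```python
# Hypothetical illustration of globals.py settings; actual values live in the repo.
TRAIN_TEST_SPLIT = {5: (1000, 200), 7: (1000, 200), 10: (1000, 200)}  # size -> (train, test)
_OOD_TRAIN_LENGTHS = [5, 7, 10]   # sizes seen during training
_OOD_EVAL_LENGTHS = [15, 20]      # larger, out-of-distribution sizes for evaluation
OUTPUT_FORMATS = ["cot_analysis", "magma"]
COT_PROMPT = "Let's trace the algorithm step by step:\n{instructions}"
```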
```bash
python sample_data.py bfs \
    --graph_sizes 5 7 10 \
    --seed 1234 \
    --ood_generation False \
    --output_dir /path/to/output \
    --output_formats cot_analysis
```

- `bfs` can be replaced with `dfs`, `dijkstra`, `floyd_warshall`, `mst_prim`, or `all`.
- `graph_sizes` sets node counts.
- `ood_generation` enables out-of-distribution sampling.
- `output_formats` toggles the data format (chain-of-thought, etc.).
- Exact Match Accuracy: Matches the final solution exactly (see the scoring sketch after this list).
- F1 Score: Gives partial credit for partially correct answers.
- Intermediate Steps: Evaluates step-by-step reasoning.
- Final Step: Scores only the final result.
- Trajectory: Scores the entire chain-of-thought.
- Independent: Treats each step independently.
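To make the metrics concrete, here is a minimal, hypothetical scoring sketch; MAGMA v2's actual evaluation code may differ:

```python
def exact_match(pred, gold):
    """1.0 if the predicted answer equals the gold answer exactly, else 0.0."""
    return float(pred == gold)

def f1(pred, gold):
    """Set-based F1: partial credit for overlapping elements (e.g., visited nodes)."""
    pred, gold = set(pred), set(gold)
    if not pred or not gold:
        return float(pred == gold)
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def trajectory_f1(pred_steps, gold_steps):
    """Trajectory-style evaluation: average step F1 over the whole chain-of-thought."""
    n = max(len(pred_steps), len(gold_steps))
    # zip truncates to the shorter sequence, so missing steps implicitly score 0
    return sum(f1(p, g) for p, g in zip(pred_steps, gold_steps)) / n if n else 1.0
```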
- Fork the repo.
- Create a branch.
- Commit changes.
- Push the branch.
- Open a Pull Request.
Please run pytest before submitting.
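A typical flow might look like this (the fork URL and branch name are placeholders):

```bash
git clone https://github.com/<your-fork>/MAGMA_v2.git
cd MAGMA_v2
git checkout -b my-feature        # example branch name
# ...make your changes...
pytest                            # run the test suite before submitting
git commit -am "Describe the change"
git push origin my-feature
# then open a Pull Request on GitHub
```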
- Seed: `100898` ensures consistent data generation (see the snippet below).
- Other defaults live in the code or config.
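If you need to reproduce a run outside the provided scripts, seeding would presumably look like this (a sketch; the repo's scripts handle seeding internally):

```python
import random
import numpy as np

SEED = 100898  # the benchmark's default seed
random.seed(SEED)
np.random.seed(SEED)
```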
MIT License © 2025 Alexander Taylor
- Data adapted from the CLRS benchmark
- Model training adapted from the Hugging Face Alignment Handbook
Thank you for using MAGMA v2! We hope it accelerates research into LLM-based graph-algorithmic reasoning.