This project is a Python framework to analyze the logical reasoning capabilities of Large Language Models (LLMs) on LSAT (Law School Admission Test) problems.
It feeds problems to an LLM (like Google's Gemini), parses the model's step-by-step reasoning, and generates "reasoning maps" to visualize the logical chain. Finally, it runs a batch analysis to find and categorize recurring patterns of error.
This framework operates in three main stages:
- Loads 50+ problems from the `tasksource/lsat-lr` dataset and queries the Gemini API for a step-by-step analysis of each.
- Parses the LLM's text response. It then builds a `networkx` graph to create a "reasoning map" (e.g., `Context -> Argument Breakdown -> Final Conclusion`) and saves it as a `.png` (see the sketch after this list).
- Reads the `results.csv` file, checks the LLM's answer against the ground truth, and categorizes each question by type (e.g., "Flaw", "Assumption") to find and chart recurring error patterns. The full list of categories can be seen in the `categorize_question` function.
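To make the mapping stage concrete, here is a minimal sketch of how a parsed response could be turned into a reasoning map with `re`, `networkx`, and `matplotlib`. The section headings, regex, and `build_reasoning_map` helper below are illustrative assumptions, not the project's actual implementation:

```python
# Hypothetical sketch: extract labeled sections from an LLM response with `re`
# and draw them as a linear reasoning map. Not the project's actual code.
import os
import re

import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import networkx as nx

def build_reasoning_map(response_text: str, out_path: str) -> None:
    # Assume the prompt asks the LLM to label its steps with these headings.
    sections = ["Context", "Argument Breakdown", "Final Conclusion"]
    found = [s for s in sections if re.search(rf"{s}\s*:", response_text)]

    graph = nx.DiGraph()
    graph.add_nodes_from(found)
    graph.add_edges_from(zip(found, found[1:]))  # chain the steps in order

    # Highlight the conclusion node, as in the saved maps.
    colors = ["red" if n == "Final Conclusion" else "lightblue" for n in graph]
    nx.draw(graph, pos=nx.spring_layout(graph, seed=42), with_labels=True,
            node_color=colors, node_size=2500, font_size=8)
    plt.savefig(out_path, bbox_inches="tight")
    plt.close()

os.makedirs("reasoning_maps", exist_ok=True)
build_reasoning_map(
    "Context: ...\nArgument Breakdown: ...\nFinal Conclusion: the answer is (B).",
    "reasoning_maps/example.png",
)
```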
The project relies on the following key technologies and libraries:
- Language Model (LLM): Google Gemini API (`gemini-2.5-flash-preview-09-2025`)
- Core Language: Python 3
- Data Handling: pandas, HuggingFace datasets (for LSAT data)
- Networking: httpx (asynchronous API calls; see the sketch after this list)
- Graph/Visualization: networkx, matplotlib (for Reasoning Maps and Bar Charts)
- Utilities: python-dotenv, asyncio, re (for robust text parsing)
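As an illustration of the httpx-based querying, below is a minimal, hypothetical sketch of an asynchronous Gemini call. The endpoint, payload shape, pacing delay, and `query_llm` helper are assumptions made for this sketch; consult `main.py` for the real request logic:

```python
# Hypothetical sketch of an async Gemini request with httpx.
# Endpoint and payload shape are assumptions, not verified project code.
import asyncio
import os

import httpx
from dotenv import load_dotenv

load_dotenv()
API_KEY = os.getenv("LLM_KEY")
MODEL = "gemini-2.5-flash-preview-09-2025"
URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

async def query_llm(client: httpx.AsyncClient, prompt: str) -> str:
    # Send one prompt and return the model's text reply.
    response = await client.post(
        URL,
        params={"key": API_KEY},
        json={"contents": [{"parts": [{"text": prompt}]}]},
        timeout=60,
    )
    response.raise_for_status()
    data = response.json()
    return data["candidates"][0]["content"]["parts"][0]["text"]

async def main() -> None:
    async with httpx.AsyncClient() as client:
        answer = await query_llm(client, "Analyze this LSAT problem step by step: ...")
        print(answer)
        await asyncio.sleep(6)  # simple pacing to stay under free-tier rate limits

asyncio.run(main())
```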
- Clone the repository and enter the project directory:

```bash
git clone [your-repo-url]
cd ReasoningMaps-LLM
```

- Make sure you have all the required Python libraries:

```bash
pip install -r requirements.txt
```

- This project requires a Google Gemini API key:
  - Create a file named `.env` in the main folder.
  - Add your API key to it like this:

```
LLM_KEY="YOUR_API_KEY"
```
The analysis is a two-step process:

- Run `main.py` to fetch the problems, query the LLM, and generate the maps. This will analyze 50 problems and may take 5-6 minutes to complete due to API rate limits.
```bash
python main.py
```

- After running `main.py`, the script will produce two outputs:
  - `results.csv`: A spreadsheet containing the detailed results for each problem.
  - `reasoning_maps/`: A folder containing 50 visualization maps, one for each problem.
- Run `analyze_results.py` to read the `results.csv` file and print a summary report of the LLM's performance and error patterns.
```bash
python analyze_results.py
```

Running `python analyze_results.py` will produce a report like this:
```
--- Analyzing Results from results.csv ---

--- API & Parsing Health ---
Total Problems Processed: 50
API Errors (e.g., 429 Limit): 1

--- Overall Performance (on successful requests) ---
Total Successful Requests: 49
Correct Answers: 48
Incorrect Answers: 1
LLM Accuracy: 97.96%

--- Recurring Patterns of Error ---
The LLM struggled most with the following question types:
- Method of Reasoning (Failed 1 time(s))
```

The report shows one failure. By checking `results.csv` for the failed problem, we can find its corresponding map in the `reasoning_maps/` folder. This map visualizes the exact logical chain where the LLM failed (note the red "Final Conclusion" node).
The script also generates a bar chart showing the error counts for all tested categories, providing a clear picture of the LLM's strengths and weaknesses.
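For illustration, here is a minimal sketch of the kind of pandas/matplotlib analysis `analyze_results.py` could perform. The column names (`question`, `is_correct`, `error`), the keyword rules, and the output filename are assumptions made for this sketch; the real `categorize_question` function contains the full category list:

```python
# Hypothetical sketch of the results analysis; column names and category
# keywords are assumptions, not the project's actual schema.
import pandas as pd
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def categorize_question(question: str) -> str:
    # Simplified keyword-based categorization (illustrative only).
    rules = {
        "flaw": "Flaw",
        "assumption": "Assumption",
        "method of reasoning": "Method of Reasoning",
    }
    q = question.lower()
    for keyword, label in rules.items():
        if keyword in q:
            return label
    return "Other"

df = pd.read_csv("results.csv")
ok = df[df["error"].isna()]                # drop rows with API errors
accuracy = ok["is_correct"].mean() * 100
print(f"LLM Accuracy: {accuracy:.2f}%")

# Count failures per question type and chart them.
failures = ok.loc[~ok["is_correct"], "question"].map(categorize_question)
failures.value_counts().plot(kind="bar", title="Errors by question type")
plt.tight_layout()
plt.savefig("error_patterns.png")
```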
- Data Source: The `tasksource/lsat-lr` dataset from Hugging Face, for providing a robust benchmark of logical reasoning problems.
- LLM API Service: The Google Gemini API, for providing a high-performance, accessible service for LLM querying.
This project is licensed under the MIT License.
See the LICENSE file for details.

