
Cognitive Flow: An LLM-Automated Framework for Quantifying Reasoning Distillation

This repository contains the official implementation, data, and experiment materials for the paper “Cognitive Flow: An LLM-Automated Framework for Quantifying Reasoning Distillation.”

Please refer to the full paper for theoretical background, experimental methodology, and complete visual results.


Table of Contents

  1. Framework Overview
  2. Repository Structure
  3. Installation Notes
  4. How to Replicate
  5. Data
  6. Results Summary

Framework Overview

The Cognitive Flow framework converts unstructured CoT text into a structured, quantifiable representation of reasoning style through four main stages:

1. Step Segmentation

The reasoning trace between <think> and </think> tags is extracted and divided into discrete reasoning steps using double-newline delimiters (\n\n).
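The segmentation described above can be sketched in a few lines. This is a minimal illustration, not the repository's own implementation (see the `to_steps.ipynb` notebooks for that); the helper name `segment_cot` is hypothetical.

```python
import re

def segment_cot(completion: str) -> list[str]:
    """Extract the reasoning trace between <think> tags and split it
    into discrete steps on double-newline delimiters."""
    match = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    if match is None:
        return []
    trace = match.group(1)
    # Drop empty fragments left by leading/trailing blank lines
    return [step.strip() for step in trace.split("\n\n") if step.strip()]

example = "<think>Read the problem.\n\nCompute 2+2 = 4.\n\nCheck the result.</think>The answer is 4."
print(segment_cot(example))
# → ['Read the problem.', 'Compute 2+2 = 4.', 'Check the result.']
```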

2. Label Set Definition

A Label Extractor LLM analyzes a large random sample of reasoning steps (≈1000) to define a concise set of cognitive state labels (e.g., Interpretation, Calculation, Verification).
Prompt templates for this stage are available in cognitive_flow_utils/prompt_templates.py.

3. Step Classification

A Step Classifier LLM assigns one cognitive label to each reasoning step in a few-shot classification setup. The corresponding prompts are also defined in prompt_templates.py.

4. Flow Aggregation and Representation

Labeled sequences are aggregated into an N×N state transition matrix, capturing the conditional probability of transitions between cognitive states — effectively a “fingerprint” of a model’s reasoning style.

This matrix is used for quantitative comparisons across models through metrics such as Cosine Similarity (CS) and Kullback-Leibler Divergence (KLD).
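The aggregation step can be illustrated as follows. This is a sketch under the assumption that transition probabilities are estimated as row-normalized bigram counts over labeled step sequences; `transition_matrix` is a hypothetical helper name, not the repository's API.

```python
import numpy as np

def transition_matrix(sequences, labels):
    """Aggregate labeled step sequences into a row-stochastic N x N matrix
    where entry (i, j) estimates P(next state = j | current state = i)."""
    idx = {label: i for i, label in enumerate(labels)}
    counts = np.zeros((len(labels), len(labels)))
    for seq in sequences:
        # Count each consecutive pair of cognitive states
        for a, b in zip(seq, seq[1:]):
            counts[idx[a], idx[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalize each row; leave all-zero rows (unobserved states) as zeros
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

labels = ["Interpretation", "Calculation", "Verification"]
seqs = [["Interpretation", "Calculation", "Verification"],
        ["Interpretation", "Calculation", "Calculation"]]
M = transition_matrix(seqs, labels)
```

Here `M[0, 1]` is 1.0 (Interpretation always transitions to Calculation in the toy data), and the Calculation row splits evenly between Calculation and Verification.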


Repository Structure

├── cognitive_flow_utils/
│   ├── dataset_utils.py         # Helper functions for handling prompt datasets
│   ├── llm_methods.py           # Core functions for querying LLMs
│   ├── models_and_clients.py    # Definitions and clients for models (DeepSeek, Gemma, etc.)
│   ├── prompt_templates.py      # System prompts for Label Extractor and Step Annotator
│   ├── step_annotation.py       # StepAnnotator class for batch annotation
│   └── ...
│
├── mmlu-elementary-maths/
│   ├── elementary_labels.txt
│   ├── elementary_maths_prompts.csv
│   ├── ..._steps.csv
│   └── to_steps.ipynb
│
├── mmlu-high-school-maths/
│   ├── high_school_labels.txt
│   ├── hs_maths_prompts.csv
│   ├── ..._steps.csv
│   └── to_steps.ipynb
│
├── mmlu-college-maths/
│   ├── college_labels.txt
│   ├── college_maths_prompts.csv
│   ├── ..._steps.csv
│   └── to_steps.ipynb
│
├── get_completions_from_prompts.py     # Generate model reasoning completions
├── annotate_steps_dataset.py           # Label reasoning steps
├── flow_analysis.ipynb                 # Cognitive Flow (matrix/graph) analysis
├── state_distribution_analysis.ipynb   # Cognitive state frequency analysis
├── token_distribution_analysis.ipynb   # Token effort distribution analysis
└── README.md

Installation Notes

If using API-served models, configure API keys (e.g., DeepSeek, Groq) as environment variables for use in
cognitive_flow_utils/models_and_clients.py.
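For example, the keys can be exported before running the scripts. The variable names below are illustrative assumptions; check `models_and_clients.py` for the exact names the clients read.

```shell
# Hypothetical variable names -- verify against models_and_clients.py
export DEEPSEEK_API_KEY="sk-..."
export GROQ_API_KEY="gsk_..."
```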

How to Replicate

The experimental pipeline consists of four main stages:

1. Generate Model Completions

Run the reasoning generation script:

python get_completions_from_prompts.py

This script produces raw reasoning outputs (CoTs) for a target model and dataset.
Parameters like dataset path, model, and temperature are configured within the script.

2. Segment CoTs into Steps

Use the notebooks (to_steps.ipynb) in each dataset folder to:

  • Extract <think> text segments
  • Split reasoning into individual steps (\n\n)
  • Save as ..._steps.csv

3. Annotate Reasoning Steps

Classify reasoning steps using:

python annotate_steps_dataset.py

This script assigns cognitive labels to each step and can optionally generate a new label set.

4. Analyze Cognitive Flows

With annotated datasets, use the Jupyter notebooks to:

  • Build state transition matrices
  • Compute CS and KLD between models
  • Generate visualizations
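The two comparison metrics can be sketched as below. This assumes matrices are flattened and compared as vectors (with additive smoothing for KLD); the paper's exact normalization may differ.

```python
import numpy as np

def cosine_similarity(A, B):
    """Cosine similarity between two transition matrices, flattened to vectors."""
    a, b = A.ravel(), B.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def kl_divergence(P, Q, eps=1e-12):
    """KL divergence between flattened transition matrices, with additive
    smoothing so zero entries do not produce log(0) or division by zero."""
    p = P.ravel() + eps
    q = Q.ravel() + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Toy 3x3 row-stochastic matrices standing in for teacher/student flows
teacher = np.array([[0.1, 0.8, 0.1],
                    [0.2, 0.3, 0.5],
                    [0.6, 0.2, 0.2]])
student = np.array([[0.2, 0.7, 0.1],
                    [0.3, 0.4, 0.3],
                    [0.5, 0.3, 0.2]])
print(cosine_similarity(teacher, student), kl_divergence(teacher, student))
```

Identical matrices give a cosine similarity of 1 and a KL divergence of 0; larger KLD values indicate greater divergence in reasoning style.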

Data

Experiments are based on three subsets of the MMLU benchmark, representing increasing task complexity:

  • Elementary Maths
  • High School Maths
  • College Maths

Each subset includes:

  • The original MMLU prompts
  • Cognitive label sets generated via the Label Extractor LLM
  • Annotated reasoning steps for each evaluated model

All data are available within their corresponding /mmlu-* directories.

Results Summary

The Cognitive Flow framework provides a quantitative lens on reasoning transfer.
Analysis of the DeepSeek-R1 model family shows that:

  • High similarity is observed between teacher and student reasoning on medium-complexity tasks.
  • Divergence increases significantly on both simple and highly complex tasks.
  • Distilled models tend to underperform in “Verification”-related reasoning, neglecting cognitive self-checking.
  • Independently trained RL-based models (e.g., Qwen's QwQ-32B) display more balanced and adaptable reasoning flows.

These findings suggest that while knowledge distillation (KD) effectively transmits surface reasoning structure, it may not capture deeper, flexible cognitive strategies.
