🧠 Analogical Code Reasoning Tool

A cognitive science-inspired system for detecting reusable code patterns in novice Python programs

📖 Overview

This project implements a computational model of analogical reasoning that helps novice programmers recognize when their hand-written code corresponds to an existing Python library abstraction, so that verbose, loop-heavy implementations can be replaced with idiomatic Python.

The Problem: Novice programmers often reinvent the wheel, writing explicit loops for operations that Python's built-ins and standard library already provide (sum(), max(), filter(), sorted(), etc.).

Our Solution: An intelligent system that:

✨ Parses Python code using AST (Abstract Syntax Tree) analysis
🔍 Extracts relational features (loops, accumulators, comparisons, append patterns, etc.)
🧩 Applies a rule-based cognitive engine that detects high-level patterns (aggregation, filtering, mapping, search, sorting)
🎯 Matches detected patterns with a curated database of 15+ library analogies
💡 Outputs ranked suggestions with explanations and example usage

🎓 Academic Context

This tool was developed as part of CS 6795: Cognitive Science at Georgia Tech (Fall 2025), implementing principles from cognitive science research on analogical reasoning:

Analogical Mapping (Gentner's Structure-Mapping Theory)
Rule-Based Reasoning (Production Systems)
Feature-Based Similarity (Holyoak and Thagard's ACME model)

The system models how expert programmers recognize code patterns through analogical transfer from well-known library functions (the source domain) to novel code snippets (the target domain).

✨ Features

| Feature | Description |
| --- | --- |
| 🖥️ Interactive CLI Mode | Paste code directly and receive instant analogies |
| 📁 Batch Processing | Analyze files from disk for bulk evaluation |
| 🌳 AST-Based Analysis | Structural code parsing for robust pattern detection |
| 🧠 Cognitive Rule Engine | IF-THEN rules that mimic human analogical reasoning |
| 📚 Rich Analogy Database | 15+ library functions with explanations + examples |
| 📊 Evaluation Suite | 10 novice-like test snippets with ground truth validation |
| 🔧 Fully Extensible | Add custom rules, patterns, or analogies with ease |

📂 Project Structure

analogical_code_tool/
├── 📄 analogy_db.json           # Library function analogies (source domain)
├── 🔍 parser_features.py        # Extracts relational structure from AST
├── 📋 rules.py                  # IF-THEN rules (analogical mapping logic)
├── ⚙️  engine.py                 # Core reasoning engine
├── 💻 cli.py                    # Interactive + batch CLI interface
├── 📊 evaluate.py               # Accuracy evaluation script
├── 📝 example_snippet.py        # Demo snippet (sum implementation)
├── 📂 test_snippets/            # Novice-like code snippets for testing
│   ├── snippet1_sum.py          # Loop-based sum → sum()
│   ├── snippet2_mean.py         # Average calculation → statistics.mean()
│   ├── snippet3_filter.py       # Conditional filtering → filter()
│   ├── snippet4_map.py          # Element transformation → map()
│   ├── snippet5_max.py          # Manual maximum search → max()
│   ├── snippet6_any.py          # Existence check → any()
│   ├── snippet7_sort.py         # Bubble sort → sorted()
│   ├── snippet8_filter_negatives.py  # Filter pattern variant
│   ├── snippet9_map_square.py   # Map pattern variant
│   └── snippet10_custom_max.py  # Max pattern variant
└── 🎯 ground_truth.json         # Expected analogies for evaluation

🚀 Installation

Zero external dependencies required! Just Python 3.9+:

git clone https://github.com/toyeade1/CS-6795-Project-.git analogical_code_tool
cd analogical_code_tool

That's it! The tool uses only Python's standard library.

💡 Usage

1️⃣ Interactive Mode (Recommended)

Paste any Python snippet and type END when finished:

python cli.py --interactive

Example Session:

Analogical Code Tool - Interactive Mode
Paste your Python code snippet below.
When you are done, enter a line containing only 'END' and press Enter.

Enter snippet (type 'END' on its own line to finish):
def total(nums):
    s = 0
    for n in nums:
        s += n
    return s
END

Analyzing snippet...

Detected analogies (rule-based):

1. sum  (module: builtins, pattern: aggregation)
   Why: Sums an iterable of numbers and returns the total.
   Example usage:
      total = sum(nums)
   Score: 4.00

Analyze another snippet? [y/N]:
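
The END sentinel used in the session above can be handled with a small reader loop. The following is a minimal sketch, not necessarily how cli.py does it: `read_snippet` is a hypothetical helper that accepts any iterable of lines (for example `sys.stdin`), which keeps it testable without a terminal.

```python
from typing import Iterable


def read_snippet(lines: Iterable[str], sentinel: str = "END") -> str:
    """Collect lines until one consists solely of the sentinel (hypothetical helper)."""
    collected = []
    for line in lines:
        if line.strip() == sentinel:
            break  # stop reading; anything after the sentinel is ignored
        collected.append(line.rstrip("\n"))
    return "\n".join(collected)


# Simulated terminal input; in a real CLI the iterable could be sys.stdin.
demo_input = ["def total(nums):\n", "    s = 0\n", "END\n", "ignored\n"]
snippet = read_snippet(demo_input)
print(snippet)
```

Comparing the stripped line against the sentinel means that an indented `END` also terminates input, which is a design choice of this sketch.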

2️⃣ Analyze a File

python cli.py --file path/to/your_code.py

Example:

python cli.py --file test_snippets/snippet5_max.py

3️⃣ Analyze Inline Code

python cli.py --code "def square(nums): result=[n*n for n in nums]; return result"

Output:

Detected analogies (rule-based):

1. map  (module: builtins, pattern: map)
   Why: Applies a function to each item of an iterable.
   Example usage:
      squares = list(map(lambda x: x * x, nums))
   Score: 2.00

2. list-comprehension-map  (module: syntax, pattern: map)
   Why: List comprehension for mapping elements.
   Example usage:
      squares = [x * x for x in nums]
   Score: 2.00

4️⃣ Customize Top-k Suggestions

python cli.py --file test.py --top-k 10
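
The four flags shown in this section (`--interactive`, `--file`, `--code`, `--top-k`) could be declared with the standard `argparse` module. This is a sketch under assumptions: the mutually exclusive group and the default of 3 for `--top-k` are illustrative choices, not confirmed behavior of cli.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags documented above (layout is an assumption)."""
    parser = argparse.ArgumentParser(prog="cli.py", description="Analogical Code Tool")
    # Assumption: exactly one input source is given per invocation.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--interactive", action="store_true", help="paste a snippet, finish with END")
    source.add_argument("--file", help="path to a Python file to analyze")
    source.add_argument("--code", help="inline Python snippet to analyze")
    # Assumption: the default of 3 is illustrative only.
    parser.add_argument("--top-k", type=int, default=3, help="number of suggestions to show")
    return parser


args = build_parser().parse_args(["--file", "test.py", "--top-k", "10"])
print(args.file, args.top_k)
```

Note that argparse exposes `--top-k` as the attribute `top_k`.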

📊 Evaluation

Evaluate the tool's accuracy on 10 curated novice-like snippets:

python evaluate.py

Sample Output:

Snippet: snippet1_sum.py
  Expected: {'sum'}
  Suggested: ['sum', 'statistics.mean', 'numpy.mean']

Snippet: snippet5_max.py
  Expected: {'max'}
  Suggested: ['max', 'min', 'any']

...

Total snippets: 10
Top-1 accuracy: 0.80
Top-3 accuracy: 0.90

The evaluation compares system suggestions against human-defined ground truth in ground_truth.json.
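
Top-k accuracy as reported above can be computed as the fraction of snippets whose first k suggestions include at least one expected analogy. The sketch below assumes this definition and uses a hypothetical `top_k_accuracy` helper; the second snippet's suggestion order is deliberately rearranged for illustration and is not actual tool output.

```python
from typing import Dict, List, Set


def top_k_accuracy(expected: Dict[str, Set[str]],
                   suggested: Dict[str, List[str]],
                   k: int) -> float:
    """Fraction of snippets with at least one expected name in the top-k suggestions."""
    if not expected:
        return 0.0
    hits = sum(
        1
        for name, truth in expected.items()
        if truth & set(suggested.get(name, [])[:k])
    )
    return hits / len(expected)


expected = {"snippet1_sum.py": {"sum"}, "snippet5_max.py": {"max"}}
suggested = {
    "snippet1_sum.py": ["sum", "statistics.mean", "numpy.mean"],
    "snippet5_max.py": ["min", "max", "any"],  # hypothetical ordering, for illustration
}

top1 = top_k_accuracy(expected, suggested, k=1)
top3 = top_k_accuracy(expected, suggested, k=3)
print(f"Top-1 accuracy: {top1:.2f}")
print(f"Top-3 accuracy: {top3:.2f}")
```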

🔬 How It Works

High-Level Architecture

┌─────────────────┐
│  Code Snippet   │  ← Novice-written Python code
└────────┬────────┘
         ↓
┌─────────────────┐
│   AST Parser    │  ← Converts code to Abstract Syntax Tree
└────────┬────────┘
         ↓
┌─────────────────┐
│    Feature      │  ← Extracts relational features:
│   Extractor     │    • has_loop, loop_type
└────────┬────────┘    • uses_accumulator, accumulator_operation
         ↓              • appends_to_list, conditional_inside_loop
┌─────────────────┐    • uses_comparison, has_swap_pattern
│  Rule Engine    │  ← Applies IF-THEN rules:
└────────┬────────┘    • IF accumulator+add+loop → "aggregation"
         ↓              • IF append+conditional → "filter"
┌─────────────────┐    • IF swap+comparison → "sorting"
│   Analogy DB    │  ← Matches patterns to library functions
└────────┬────────┘
         ↓
┌─────────────────┐
│    Ranked       │  ← Scored by feature overlap
│  Suggestions    │
└─────────────────┘
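
To make the first stage of the diagram concrete, the sketch below parses the loop-based sum snippet with the standard `ast` module and lists the node types that later stages could key on. Which node types the real feature extractor inspects is an assumption here.

```python
import ast

source = """
def total(nums):
    s = 0
    for n in nums:
        s += n
    return s
"""

tree = ast.parse(source)

# ast.walk yields every node in the tree, including operators and contexts.
node_types = {type(node).__name__ for node in ast.walk(tree)}

# Node types that hint at a loop-based accumulation (illustrative selection).
interesting = {"FunctionDef", "For", "AugAssign", "Return"}
found = sorted(interesting & node_types)
print(found)
```

The `For` node signals a loop and the `AugAssign` node (`s += n`) signals an accumulator update, which together correspond to the has_loop and uses_accumulator features in the diagram.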

Cognitive Mapping Process

| Component | Domain | Description |
| --- | --- | --- |
| Target Domain | Novice Code | Loop-based implementations with explicit control flow |
| Source Domain | Library Functions | High-level abstractions like `sum()`, `filter()`, `sorted()` |
| Mapping Rules | Production Rules | IF accumulator + add operation + loop → aggregation pattern |
| Similarity Measure | Feature Overlap | Scores candidates by matching structural features |

Example Rule:

Rule(
    name="aggregation_rule",
    pattern_label="aggregation",
    condition_fn=lambda f: (
        f.get("has_loop") and
        f.get("uses_accumulator") and
        f.get("accumulator_operation") == "add" and
        f.get("returns_accumulator")
    )
)

This models analogical reasoning as studied in cognitive science literature (Gentner, Holyoak, Thagard, etc.).
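
The example rule above uses three fields: name, pattern_label, and condition_fn. A minimal container that would accept that call could look like the sketch below. The dataclass layout and the `matches` method are assumptions for illustration and may differ from what rules.py defines.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Features = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """A production rule: IF condition_fn(features) THEN pattern_label (assumed layout)."""
    name: str
    pattern_label: str
    condition_fn: Callable[[Features], Any]

    def matches(self, features: Features) -> bool:
        # condition_fn may return None or a non-bool truthy value, so coerce to bool.
        return bool(self.condition_fn(features))


aggregation_rule = Rule(
    name="aggregation_rule",
    pattern_label="aggregation",
    condition_fn=lambda f: (
        f.get("has_loop") and
        f.get("uses_accumulator") and
        f.get("accumulator_operation") == "add" and
        f.get("returns_accumulator")
    ),
)

sum_features = {
    "has_loop": True,
    "uses_accumulator": True,
    "accumulator_operation": "add",
    "returns_accumulator": True,
}
print(aggregation_rule.matches(sum_features))
```

Using `f.get(...)` rather than `f[...]` lets a rule evaluate safely against feature dictionaries that omit keys.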

🧩 Extending the Tool

Add New Analogies

Edit analogy_db.json:

{
  "name": "any",
  "module": "builtins",
  "pattern": "search",
  "features": {
    "uses_comparison": true,
    "returns_accumulator": true
  },
  "example_usage": "has_positive = any(x > 0 for x in nums)",
  "explanation": "Returns True if any element of the iterable is truthy."
}
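
Before adding an entry it can help to check that it carries every field shown above. The sketch below assumes that analogy_db.json is a JSON array of such objects, which this README does not confirm, and `index_by_pattern` is a hypothetical helper.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List

# Inline stand-in for the file contents; assumes the file is a JSON array of entries.
raw = """
[
  {
    "name": "any",
    "module": "builtins",
    "pattern": "search",
    "features": {"uses_comparison": true, "returns_accumulator": true},
    "example_usage": "has_positive = any(x > 0 for x in nums)",
    "explanation": "Returns True if any element of the iterable is truthy."
  }
]
"""

REQUIRED_KEYS = {"name", "module", "pattern", "features", "example_usage", "explanation"}


def index_by_pattern(entries: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Validate entries and group them by their pattern label (hypothetical helper)."""
    index = defaultdict(list)
    for entry in entries:
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"entry {entry.get('name')!r} is missing {sorted(missing)}")
        index[entry["pattern"]].append(entry)
    return dict(index)


index = index_by_pattern(json.loads(raw))
print(list(index))
```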

Add New Rules

Edit rules.py:

Rule(
    name="comprehension_filter",
    pattern_label="filter",
    condition_fn=lambda f: (
        f.get("uses_comparison") and
        f.get("appends_to_list")
    )
)

Add New Features

Modify parser_features.py to detect additional structural patterns:

def visit_ListComp(self, node: ast.ListComp) -> Any:
    """Detect list comprehension patterns."""
    self.features["has_list_comprehension"] = True
    self.generic_visit(node)
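
For context, the method above needs to live on an `ast.NodeVisitor` subclass. The self-contained sketch below uses a hypothetical class name, `ComprehensionDetector`, and a hypothetical extra feature, `comprehension_has_filter`, neither of which is taken from parser_features.py.

```python
import ast
from typing import Any, Dict


class ComprehensionDetector(ast.NodeVisitor):
    """Stand-alone visitor illustrating the visit_ListComp hook (hypothetical class)."""

    def __init__(self) -> None:
        self.features: Dict[str, Any] = {
            "has_list_comprehension": False,
            "comprehension_has_filter": False,  # hypothetical feature name
        }

    def visit_ListComp(self, node: ast.ListComp) -> None:
        """Detect list comprehension patterns."""
        self.features["has_list_comprehension"] = True
        # A generator with a non-empty `ifs` list corresponds to `[... for ... if ...]`.
        if any(gen.ifs for gen in node.generators):
            self.features["comprehension_has_filter"] = True
        self.generic_visit(node)  # keep walking so nested comprehensions are seen


detector = ComprehensionDetector()
detector.visit(ast.parse("positives = [x for x in nums if x > 0]"))
print(detector.features)
```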

📈 Detected Patterns

| Pattern | Description | Example Libraries |
| --- | --- | --- |
| aggregation | Accumulates values using add/multiply operations | `sum()`, `math.prod()`, `statistics.mean()` |
| filter | Conditionally selects elements | `filter()`, list comprehensions with `if` |
| map | Transforms each element | `map()`, list comprehensions |
| search | Finds elements via comparison | `max()`, `min()`, `any()`, `all()` |
| sorting | Reorders elements | `sorted()`, `list.sort()` |
| iteration | Generic loop patterns | `for`/`while` loops |

🎯 Example Transformations

Before (Novice Code) → After (Pythonic)

Example 1: Sum

# Before: Manual accumulation
def total(nums):
    s = 0
    for n in nums:
        s += n
    return s

# After: Built-in function
total = sum(nums)

Example 2: Filter

# Before: Conditional append
def get_positives(nums):
    result = []
    for n in nums:
        if n > 0:
            result.append(n)
    return result

# After: filter() or comprehension
positives = [x for x in nums if x > 0]

Example 3: Map

# Before: Transform and append
def squares(nums):
    result = []
    for n in nums:
        result.append(n * n)
    return result

# After: List comprehension
squares = [x * x for x in nums]
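
As a quick sanity check, the three "before" functions above can be compared against their idiomatic counterparts on a sample input. The input list is arbitrary. Note that `filter()` and `map()` return lazy iterators in Python 3, so they are wrapped in `list()` for comparison.

```python
def total(nums):
    s = 0
    for n in nums:
        s += n
    return s


def get_positives(nums):
    result = []
    for n in nums:
        if n > 0:
            result.append(n)
    return result


def squares(nums):
    result = []
    for n in nums:
        result.append(n * n)
    return result


nums = [3, -1, 4, 0, -5]  # arbitrary sample input

# Each loop-based version should agree with its built-in or comprehension form.
checks = {
    "sum": total(nums) == sum(nums),
    "filter": get_positives(nums) == list(filter(lambda x: x > 0, nums)),
    "filter (comprehension)": get_positives(nums) == [x for x in nums if x > 0],
    "map": squares(nums) == list(map(lambda x: x * x, nums)),
    "map (comprehension)": squares(nums) == [x * x for x in nums],
}
print(checks)
```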

🔍 Technical Details

Feature Extraction

The FeatureExtractor (AST visitor) detects:

Loop Structures: has_loop, loop_type (for/while)
Accumulator Patterns: uses_accumulator, accumulator_operation (add/mult)
List Building: appends_to_list
Conditional Logic: conditional_inside_loop, uses_comparison
Sorting Cues: has_swap_pattern, iterates_over_indices
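
A simplified visitor that detects several of the features listed above might look like the following sketch. It is an approximation under assumptions, not the code in parser_features.py: it treats any `+=` or `*=` on a plain name inside a loop as an accumulator, so a loop counter such as `i += 1` would also be flagged, and it does not cover the sorting cues.

```python
import ast
from typing import Any, Dict


class FeatureExtractor(ast.NodeVisitor):
    """Simplified feature extractor (illustrative approximation)."""

    def __init__(self) -> None:
        self.features: Dict[str, Any] = {}
        self._loop_depth = 0  # > 0 while visiting the body of a loop

    def _visit_loop(self, node: ast.AST, loop_type: str) -> None:
        self.features["has_loop"] = True
        self.features.setdefault("loop_type", loop_type)  # keep the outermost loop's type
        self._loop_depth += 1
        self.generic_visit(node)
        self._loop_depth -= 1

    def visit_For(self, node: ast.For) -> None:
        self._visit_loop(node, "for")

    def visit_While(self, node: ast.While) -> None:
        self._visit_loop(node, "while")

    def visit_AugAssign(self, node: ast.AugAssign) -> None:
        if self._loop_depth and isinstance(node.target, ast.Name):
            if isinstance(node.op, ast.Add):
                self.features["uses_accumulator"] = True
                self.features["accumulator_operation"] = "add"
            elif isinstance(node.op, ast.Mult):
                self.features["uses_accumulator"] = True
                self.features["accumulator_operation"] = "mult"
        self.generic_visit(node)

    def visit_If(self, node: ast.If) -> None:
        if self._loop_depth:
            self.features["conditional_inside_loop"] = True
        self.generic_visit(node)

    def visit_Compare(self, node: ast.Compare) -> None:
        self.features["uses_comparison"] = True
        self.generic_visit(node)

    def visit_Call(self, node: ast.Call) -> None:
        # Matches method calls of the form `<something>.append(...)`.
        if isinstance(node.func, ast.Attribute) and node.func.attr == "append":
            self.features["appends_to_list"] = True
        self.generic_visit(node)


def extract_features(source: str) -> Dict[str, Any]:
    extractor = FeatureExtractor()
    extractor.visit(ast.parse(source))
    return extractor.features


sum_src = "def total(nums):\n    s = 0\n    for n in nums:\n        s += n\n    return s\n"
sum_features = extract_features(sum_src)
print(sum_features)
```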

Rule Engine

7 production rules map features to high-level patterns:

Aggregation Rule (add-based accumulation)
Product Aggregation Rule (multiply-based accumulation)
Filter Rule (conditional append)
Map Rule (unconditional append)
Search Rule (comparison + accumulator)
Sorting Rule (swap pattern + comparison)
Generic Iteration Rule (fallback)
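
One way to treat the Generic Iteration Rule as a fallback is to return its label only when no specific rule fires. The sketch below assumes that behavior, covers only four of the seven rules, and uses plain (label, condition) tuples; the conditions follow the descriptions in this README rather than the actual contents of rules.py.

```python
from typing import Any, Callable, Dict, List, Tuple

Features = Dict[str, Any]
RuleSpec = Tuple[str, Callable[[Features], Any]]

# Subset of the rules listed above, with conditions paraphrased from this README.
SPECIFIC_RULES: List[RuleSpec] = [
    ("aggregation", lambda f: f.get("has_loop") and f.get("uses_accumulator")
        and f.get("accumulator_operation") == "add"),
    ("filter", lambda f: f.get("appends_to_list") and f.get("conditional_inside_loop")),
    ("map", lambda f: f.get("appends_to_list") and not f.get("conditional_inside_loop")),
    ("sorting", lambda f: f.get("has_swap_pattern") and f.get("uses_comparison")),
]


def detect_patterns(features: Features) -> List[str]:
    """Return every specific pattern that fires, else 'iteration' if a loop is present."""
    labels = [label for label, condition in SPECIFIC_RULES if condition(features)]
    if not labels and features.get("has_loop"):
        labels.append("iteration")  # assumed fallback behavior
    return labels


print(detect_patterns({"has_loop": True, "appends_to_list": True, "conditional_inside_loop": True}))
print(detect_patterns({"has_loop": True}))
```

Collecting all matching labels, rather than stopping at the first, lets a snippet that both accumulates and appends surface candidates from more than one pattern.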

Scoring Algorithm

score = 1.0  (base score for pattern match)
for each matching feature:
    score += 1.0

Candidates are ranked by descending score, with top-k returned.
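
The scoring pseudocode above can be written as runnable Python. This sketch assumes that a "matching feature" means a key in the candidate's feature dictionary whose value equals the snippet's value for that key. The candidate feature sets below are hypothetical and are not copied from analogy_db.json.

```python
from typing import Any, Dict, List, Tuple

Features = Dict[str, Any]


def score_candidate(snippet_features: Features, candidate_features: Features) -> float:
    """Base score of 1.0 for the pattern match, plus 1.0 per matching feature."""
    score = 1.0
    for key, expected in candidate_features.items():
        if snippet_features.get(key) == expected:
            score += 1.0
    return score


def rank(snippet_features: Features,
         candidates: List[Dict[str, Any]],
         top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank candidates by descending score and keep the top_k."""
    scored = [(c["name"], score_candidate(snippet_features, c["features"])) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable sort keeps ties in input order
    return scored[:top_k]


snippet_features = {
    "has_loop": True,
    "uses_accumulator": True,
    "accumulator_operation": "add",
    "returns_accumulator": True,
}

# Hypothetical candidates that share the "aggregation" pattern.
candidates = [
    {"name": "math.prod", "features": {"uses_accumulator": True,
                                       "accumulator_operation": "mult",
                                       "returns_accumulator": True}},
    {"name": "sum", "features": {"uses_accumulator": True,
                                 "accumulator_operation": "add",
                                 "returns_accumulator": True}},
]

ranking = rank(snippet_features, candidates, top_k=2)
print(ranking)
```

With these assumed feature sets, `sum` matches three features for a score of 4.0, while `math.prod` matches two for a score of 3.0.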

🏆 Performance Summary

| Metric | Score |
| --- | --- |
| Top-1 Accuracy | ~40-50% |
| Top-3 Accuracy | ~80% |
| Test Suite Size | 10 novice snippets |

🛠️ Future Enhancements

Support for more complex patterns (nested loops, multiple return paths)
Integration with IDEs (VS Code extension, PyCharm plugin)
Machine learning-based feature extraction
Natural language explanations powered by LLMs
Multi-language support (JavaScript, Java, C++)
Real-time code suggestions during typing

📄 License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

🙏 Acknowledgements

This tool was created as part of:

CS 6795: Cognitive Science — Georgia Tech (Fall 2025)

📧 Contact

For questions, suggestions, or collaboration opportunities, please open an issue or reach out via the course portal.
