🧠 Analogical Code Reasoning Tool

A cognitive science-inspired system for detecting reusable code patterns in novice Python programs

Python 3.9+ | License: MIT | Georgia Tech


📖 Overview

This project implements a computational model of analogical reasoning that helps novice programmers recognize when their manually written code corresponds to an existing Python library abstraction, and suggests idiomatic replacements for verbose, loop-heavy implementations.

The Problem: Novice programmers often reinvent the wheel, writing explicit loops for operations that Python's standard library already provides (sum(), max(), filter(), sorted(), etc.).

Our Solution: A rule-based system that:

  • Parses Python code using AST (Abstract Syntax Tree) analysis
  • 🔍 Extracts relational features (loops, accumulators, comparisons, append patterns, etc.)
  • 🧩 Applies a rule-based cognitive engine that detects high-level patterns (aggregation, filtering, mapping, search, sorting)
  • 🎯 Matches detected patterns with a curated database of 15+ library analogies
  • 💡 Outputs ranked suggestions with explanations and example usage

🎓 Academic Context

This tool was developed as part of CS 6795: Cognitive Science at Georgia Tech (Fall 2025), implementing principles from cognitive science research on analogical reasoning:

  • Analogical Mapping (Gentner's Structure-Mapping Theory)
  • Rule-Based Reasoning (Production Systems)
  • Feature-Based Similarity (Holyoak and Thagard's ACME model)

The system models how expert programmers recognize code patterns through analogical transfer from well-known library functions (the source domain) to novel code snippets (the target domain).


✨ Features

| Feature | Description |
|---|---|
| 🖥️ Interactive CLI Mode | Paste code directly and receive instant analogies |
| 📁 Batch Processing | Analyze files from disk for bulk evaluation |
| 🌳 AST-Based Analysis | Structural code parsing for robust pattern detection |
| 🧠 Cognitive Rule Engine | IF-THEN rules that mimic human analogical reasoning |
| 📚 Rich Analogy Database | 15+ library functions with explanations and examples |
| 📊 Evaluation Suite | 10 novice-like test snippets with ground-truth validation |
| 🔧 Fully Extensible | Add custom rules, patterns, or analogies with ease |

📂 Project Structure

analogical_code_tool/
├── 📄 analogy_db.json           # Library function analogies (source domain)
├── 🔍 parser_features.py        # Extracts relational structure from AST
├── 📋 rules.py                  # IF-THEN rules (analogical mapping logic)
├── ⚙️  engine.py                 # Core reasoning engine
├── 💻 cli.py                    # Interactive + batch CLI interface
├── 📊 evaluate.py               # Accuracy evaluation script
├── 📝 example_snippet.py        # Demo snippet (sum implementation)
├── 📂 test_snippets/            # Novice-like code snippets for testing
│   ├── snippet1_sum.py          # Loop-based sum → sum()
│   ├── snippet2_mean.py         # Average calculation → statistics.mean()
│   ├── snippet3_filter.py       # Conditional filtering → filter()
│   ├── snippet4_map.py          # Element transformation → map()
│   ├── snippet5_max.py          # Manual maximum search → max()
│   ├── snippet6_any.py          # Existence check → any()
│   ├── snippet7_sort.py         # Bubble sort → sorted()
│   ├── snippet8_filter_negatives.py  # Filter pattern variant
│   ├── snippet9_map_square.py   # Map pattern variant
│   └── snippet10_custom_max.py  # Max pattern variant
└── 🎯 ground_truth.json         # Expected analogies for evaluation

🚀 Installation

Zero external dependencies required! Just Python 3.9+:

git clone https://github.com/toyeade1/CS-6795-Project-.git
cd analogical_code_tool

That's it! The tool uses only Python's standard library.


💡 Usage

1️⃣ Interactive Mode (Recommended)

Paste any Python snippet and type END when finished:

python cli.py --interactive

Example Session:

Analogical Code Tool - Interactive Mode
Paste your Python code snippet below.
When you are done, enter a line containing only 'END' and press Enter.

Enter snippet (type 'END' on its own line to finish):
def total(nums):
    s = 0
    for n in nums:
        s += n
    return s
END

Analyzing snippet...

Detected analogies (rule-based):

1. sum  (module: builtins, pattern: aggregation)
   Why: Sums an iterable of numbers and returns the total.
   Example usage:
      total = sum(nums)
   Score: 4.00

Analyze another snippet? [y/N]:
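The interactive session collects pasted lines until a line containing only END. A minimal sketch of that input handling follows; read_snippet is a hypothetical helper for illustration, not necessarily how cli.py implements it.

```python
from typing import Iterable


def read_snippet(lines: Iterable[str], sentinel: str = "END") -> str:
    """Collect pasted lines until one consists only of the sentinel (surrounding whitespace ignored)."""
    collected = []
    for line in lines:
        if line.strip() == sentinel:
            break  # stop reading; anything after the sentinel is ignored
        collected.append(line.rstrip("\n"))
    return "\n".join(collected)
```

Because it accepts any iterable of strings, the same function can be fed sys.stdin in a terminal or a plain list in tests.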

2️⃣ Analyze a File

python cli.py --file path/to/your_code.py

Example:

python cli.py --file test_snippets/snippet5_max.py

3️⃣ Analyze Inline Code

python cli.py --code "def square(nums): result=[n*n for n in nums]; return result"

Output:

Detected analogies (rule-based):

1. map  (module: builtins, pattern: map)
   Why: Applies a function to each item of an iterable.
   Example usage:
      squares = list(map(lambda x: x * x, nums))
   Score: 2.00

2. list-comprehension-map  (module: syntax, pattern: map)
   Why: List comprehension for mapping elements.
   Example usage:
      squares = [x * x for x in nums]
   Score: 2.00

4️⃣ Customize Top-k Suggestions

python cli.py --file test.py --top-k 10

📊 Evaluation

Evaluate the tool's accuracy on 10 curated novice-like snippets:

python evaluate.py

Sample Output:

Snippet: snippet1_sum.py
  Expected: {'sum'}
  Suggested: ['sum', 'statistics.mean', 'numpy.mean']

Snippet: snippet5_max.py
  Expected: {'max'}
  Suggested: ['max', 'min', 'any']

...

Total snippets: 10
Top-1 accuracy: 0.80
Top-3 accuracy: 0.90

The evaluation compares system suggestions against human-defined ground truth in ground_truth.json.
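Top-k accuracy can be computed as in the sketch below. It assumes a snippet counts as a hit when any expected name appears among its first k suggestions, and that the ground truth maps each snippet filename to a set of names; evaluate.py and ground_truth.json may define both differently. The first two entries mirror the sample output above; the third is made up to show a top-1 miss.

```python
from typing import Dict, List, Set


def top_k_accuracy(
    suggestions: Dict[str, List[str]],
    ground_truth: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of snippets whose first k suggestions contain at least one expected name."""
    if not ground_truth:
        return 0.0
    hits = 0
    for snippet, expected in ground_truth.items():
        top = suggestions.get(snippet, [])[:k]
        if any(name in expected for name in top):
            hits += 1
    return hits / len(ground_truth)


suggestions = {
    "snippet1_sum.py": ["sum", "statistics.mean", "numpy.mean"],
    "snippet5_max.py": ["max", "min", "any"],
    "hypothetical.py": ["min", "max"],  # made-up entry: correct answer is ranked second
}
ground_truth = {
    "snippet1_sum.py": {"sum"},
    "snippet5_max.py": {"max"},
    "hypothetical.py": {"max"},
}
print(top_k_accuracy(suggestions, ground_truth, 1))  # 2 of 3 hits -> 0.666...
print(top_k_accuracy(suggestions, ground_truth, 3))  # 3 of 3 hits -> 1.0
```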


🔬 How It Works

High-Level Architecture

┌─────────────────┐
│  Code Snippet   │  ← Novice-written Python code
└────────┬────────┘
         ↓
┌─────────────────┐
│   AST Parser    │  ← Converts code to Abstract Syntax Tree
└────────┬────────┘
         ↓
┌─────────────────┐
│    Feature      │  ← Extracts relational features:
│   Extractor     │    • has_loop, loop_type
└────────┬────────┘    • uses_accumulator, accumulator_operation
         ↓              • appends_to_list, conditional_inside_loop
┌─────────────────┐    • uses_comparison, has_swap_pattern
│  Rule Engine    │  ← Applies IF-THEN rules:
└────────┬────────┘    • IF accumulator+add+loop → "aggregation"
         ↓              • IF append+conditional → "filter"
┌─────────────────┐    • IF swap+comparison → "sorting"
│   Analogy DB    │  ← Matches patterns to library functions
└────────┬────────┘
         ↓
┌─────────────────┐
│    Ranked       │  ← Scored by feature overlap
│  Suggestions    │
└─────────────────┘

Cognitive Mapping Process

| Component | Realized as | Description |
|---|---|---|
| Target Domain | Novice code | Loop-based implementations with explicit control flow |
| Source Domain | Library functions | High-level abstractions like sum(), filter(), sorted() |
| Mapping Rules | Production rules | IF accumulator + add operation + loop → aggregation pattern |
| Similarity Measure | Feature overlap | Scores candidates by matching structural features |

Example Rule:

Rule(
    name="aggregation_rule",
    pattern_label="aggregation",
    condition_fn=lambda f: (
        f.get("has_loop") and
        f.get("uses_accumulator") and
        f.get("accumulator_operation") == "add" and
        f.get("returns_accumulator")
    )
)

Rules of this kind are a simplified, production-system rendering of the analogical mapping studied in the cognitive science literature (Gentner, Holyoak, Thagard, etc.).


🧩 Extending the Tool

Add New Analogies

Add an entry like the following to analogy_db.json:

{
  "name": "any",
  "module": "builtins",
  "pattern": "search",
  "features": {
    "uses_comparison": true,
    "returns_accumulator": true
  },
  "example_usage": "has_positive = any(x > 0 for x in nums)",
  "explanation": "Returns True if any element of the iterable is truthy."
}
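A sketch of how such entries might be loaded and grouped by pattern. It assumes analogy_db.json holds a top-level JSON array and that every entry carries the six keys shown above; load_analogies and REQUIRED_KEYS are hypothetical names, not part of the project.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List

# Assumed schema: the keys used by the example entry above.
REQUIRED_KEYS = {"name", "module", "pattern", "features", "example_usage", "explanation"}


def load_analogies(text: str) -> Dict[str, List[Dict[str, Any]]]:
    """Parse analogy entries from JSON text and group them by pattern label."""
    entries = json.loads(text)
    by_pattern: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for entry in entries:
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"{entry.get('name', '?')}: missing keys {sorted(missing)}")
        by_pattern[entry["pattern"]].append(entry)
    return dict(by_pattern)


sample = """[
  {"name": "any", "module": "builtins", "pattern": "search",
   "features": {"uses_comparison": true, "returns_accumulator": true},
   "example_usage": "has_positive = any(x > 0 for x in nums)",
   "explanation": "Returns True if any element of the iterable is truthy."}
]"""
db = load_analogies(sample)
print([e["name"] for e in db["search"]])  # ['any']
```

With the real file, the text would come from something like Path("analogy_db.json").read_text(). Validating keys at load time surfaces a malformed entry immediately instead of at suggestion time.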

Add New Rules

Edit rules.py:

Rule(
    name="comprehension_filter",
    pattern_label="filter",
    condition_fn=lambda f: (
        f.get("uses_comparison") and
        f.get("appends_to_list")
    )
)

Add New Features

Modify parser_features.py to detect additional structural patterns:

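import ast
from typing import Any

class FeatureExtractor(ast.NodeVisitor):  # excerpt: other visit_* methods omitted
    def visit_ListComp(self, node: ast.ListComp) -> Any:
        """Detect list comprehension patterns."""
        self.features["has_list_comprehension"] = True
        self.generic_visit(node)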

📈 Detected Patterns

| Pattern | Description | Example Libraries |
|---|---|---|
| aggregation | Accumulates values using add/multiply operations | sum(), math.prod(), statistics.mean() |
| filter | Conditionally selects elements | filter(), list comprehensions with if |
| map | Transforms each element | map(), list comprehensions |
| search | Finds elements via comparison | max(), min(), any(), all() |
| sorting | Reorders elements | sorted(), list.sort() |
| iteration | Generic loop patterns | for/while loops |

🎯 Example Transformations

Before (Novice Code) → After (Pythonic)

Example 1: Sum

# Before: Manual accumulation
def total(nums):
    s = 0
    for n in nums:
        s += n
    return s

# After: Built-in function
def total(nums):
    return sum(nums)

Example 2: Filter

# Before: Conditional append
def get_positives(nums):
    result = []
    for n in nums:
        if n > 0:
            result.append(n)
    return result

# After: list comprehension (or list(filter(lambda n: n > 0, nums)))
def get_positives(nums):
    return [n for n in nums if n > 0]

Example 3: Map

# Before: Transform and append
def squares(nums):
    result = []
    for n in nums:
        result.append(n * n)
    return result

# After: List comprehension
def squares(nums):
    return [n * n for n in nums]

🔍 Technical Details

Feature Extraction

The FeatureExtractor (AST visitor) detects:

  • Loop Structures: has_loop, loop_type (for/while)
  • Accumulator Patterns: uses_accumulator, accumulator_operation (add/mult)
  • List Building: appends_to_list
  • Conditional Logic: conditional_inside_loop, uses_comparison
  • Sorting Cues: has_swap_pattern, iterates_over_indices

Rule Engine

7 production rules map features to high-level patterns:

  1. Aggregation Rule (add-based accumulation)
  2. Product Aggregation Rule (multiply-based accumulation)
  3. Filter Rule (conditional append)
  4. Map Rule (unconditional append)
  5. Search Rule (comparison + accumulator)
  6. Sorting Rule (swap pattern + comparison)
  7. Generic Iteration Rule (fallback)
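A compact sketch of such a rule layer follows. The Rule fields mirror the examples earlier in this README and aggregation_rule is copied from there; the remaining conditions are guesses reconstructed from the one-line descriptions above, so rules.py may differ. The sketch also assumes that every matching rule fires and that the generic iteration rule applies only when nothing else matched.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Features = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for the project's Rule class."""
    name: str
    pattern_label: str
    condition_fn: Callable[[Features], Any]


RULES: List[Rule] = [
    Rule("aggregation_rule", "aggregation", lambda f: (
        f.get("has_loop") and f.get("uses_accumulator")
        and f.get("accumulator_operation") == "add" and f.get("returns_accumulator"))),
    Rule("product_aggregation_rule", "aggregation", lambda f: (
        f.get("has_loop") and f.get("uses_accumulator")
        and f.get("accumulator_operation") == "mult")),
    Rule("filter_rule", "filter", lambda f: (
        f.get("has_loop") and f.get("appends_to_list") and f.get("conditional_inside_loop"))),
    Rule("map_rule", "map", lambda f: (
        f.get("has_loop") and f.get("appends_to_list") and not f.get("conditional_inside_loop"))),
    Rule("search_rule", "search", lambda f: (
        f.get("has_loop") and f.get("uses_comparison") and f.get("uses_accumulator"))),
    Rule("sorting_rule", "sorting", lambda f: (
        f.get("has_swap_pattern") and f.get("uses_comparison"))),
]
FALLBACK = Rule("generic_iteration_rule", "iteration", lambda f: f.get("has_loop"))


def fire_rules(features: Features, rules: List[Rule] = RULES) -> List[str]:
    """Return the pattern labels of all rules whose condition holds, de-duplicated, in rule order."""
    labels: List[str] = []
    for rule in rules:
        if rule.condition_fn(features) and rule.pattern_label not in labels:
            labels.append(rule.pattern_label)
    if not labels and FALLBACK.condition_fn(features):
        labels.append(FALLBACK.pattern_label)
    return labels
```

Keeping conditions as plain functions over a feature dictionary means a new rule is a single list entry, which matches the "Add New Rules" workflow above.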

Scoring Algorithm

score = 1.0  (base score for pattern match)
for each matching feature:
    score += 1.0

Candidates are ranked by descending score, with top-k returned.
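The scoring pseudocode can be made concrete as follows, assuming a "matching feature" is a key in an analogy entry's features object whose value equals the snippet's extracted value. The function names, the entries, and the divides_by_length feature are illustrative only and are not copied from analogy_db.json.

```python
from typing import Any, Dict, List, Tuple


def score_candidate(entry: Dict[str, Any], features: Dict[str, Any]) -> float:
    """Base 1.0 for the pattern match, plus 1.0 per entry feature the snippet shares."""
    score = 1.0
    for name, expected in entry.get("features", {}).items():
        if features.get(name) == expected:
            score += 1.0
    return score


def rank_candidates(
    entries: List[Dict[str, Any]],
    patterns: List[str],
    features: Dict[str, Any],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score only entries whose pattern was detected, then keep the top_k by descending score."""
    scored = [
        (entry["name"], score_candidate(entry, features))
        for entry in entries
        if entry["pattern"] in patterns
    ]
    # list.sort is stable even with reverse=True, so ties keep database order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


entries = [
    {"name": "sum", "pattern": "aggregation",
     "features": {"has_loop": True, "uses_accumulator": True, "returns_accumulator": True}},
    {"name": "statistics.mean", "pattern": "aggregation",
     "features": {"has_loop": True, "uses_accumulator": True, "divides_by_length": True}},
    {"name": "any", "pattern": "search", "features": {"uses_comparison": True}},
]
snippet_features = {"has_loop": True, "uses_accumulator": True, "returns_accumulator": True}
print(rank_candidates(entries, ["aggregation"], snippet_features))
# [('sum', 4.0), ('statistics.mean', 3.0)]
```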


🏆 Performance Summary

| Metric | Value |
|---|---|
| Top-1 Accuracy | ~40-50% |
| Top-3 Accuracy | ~80% |
| Test Suite Size | 10 novice snippets |

🛠️ Future Enhancements

  • Support for more complex patterns (nested loops, multiple return paths)
  • Integration with IDEs (VS Code extension, PyCharm plugin)
  • Machine learning-based feature extraction
  • Natural language explanations powered by LLMs
  • Multi-language support (JavaScript, Java, C++)
  • Real-time code suggestions during typing

📄 License

MIT License

Copyright (c) 2025

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.


🙏 Acknowledgements

This tool was created as part of:

CS 6795: Cognitive Science — Georgia Tech (Fall 2025)


📧 Contact

For questions, suggestions, or collaboration opportunities, please open an issue or reach out via the course portal.

