Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
114 changes: 114 additions & 0 deletions ONTOLOGY_ANALYSIS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,114 @@
# Ontological Analysis of TBP Monty Repository

This directory contains the results of an automated ontological exploration of the TBP Monty codebase.

## Overview

The ontological analysis systematically extracts and catalogs all concepts from:
- **Python source files** (105 files in `src/`): Classes, Functions, Methods, Constants, Enums, Protocols, TypedDicts, etc.
- **YAML configuration files** (280 files in `conf/`): Configuration keys, experiment types, model configurations, parameter groups

## Files

### `extract_ontology.py`
Python script that performs the automated concept extraction. It uses:
- **AST parsing** for Python files to extract classes, functions, methods, constants, and their docstrings
- **YAML parsing** for configuration files to extract configuration hierarchies and parameters

### `ontology_concepts.json`
The output JSON file containing all extracted concepts. Each concept has:
- `concept`: The name/identifier of the concept
- `description`: A brief description of what it is and what it does
- `source`: The file path where the concept was found

## Statistics

Total concepts extracted: **3,352**

### Breakdown by Type:
- **Config**: 2,227 (YAML configuration concepts)
- **Method**: 437 (instance methods in classes)
- **Method (with params)**: 157 (methods with explicit parameter lists)
- **Function**: 147 (module-level functions)
- **Class**: 130 (standard Python classes)
- **Function (with params)**: 67 (functions with explicit parameter lists)
- **Protocol**: 35+ (runtime-checkable protocols and interfaces)
- **Constant**: 19 (module-level constants)
- **Dataclass**: 15 (dataclasses for data structures)
- **Other**: TypedDict, Enum, ABC (Abstract Base Classes)

## Usage

### Running the Extraction
```bash
python extract_ontology.py
```

This will:
1. Scan all Python files in `src/` directory
2. Scan all YAML files in `conf/` directory
3. Extract concepts from both
4. Save results to `ontology_concepts.json`

### Analyzing the Results

The JSON file can be used for:

1. **Identifying ontological overload**: Find concepts with similar names or descriptions that might represent the same thing
```bash
jq '.[] | select(.concept | contains("Match"))' ontology_concepts.json
```

2. **Finding duplicated concepts**: Search for concepts that appear in multiple places
```bash
jq 'group_by(.concept) | .[] | select(length > 1)' ontology_concepts.json
```

3. **Analyzing by module**: Filter concepts by source file or directory
```bash
jq '.[] | select(.source | contains("evidence_matching"))' ontology_concepts.json
```

4. **Searching by description**: Find concepts related to specific functionality
```bash
jq '.[] | select(.description | contains("graph"))' ontology_concepts.json
```

## Next Steps

Use this ontological inventory to:

1. **Identify redundant concepts**: Look for multiple concepts that serve the same purpose
2. **Find naming inconsistencies**: Spot similar concepts with different naming conventions
3. **Detect architectural issues**: Identify concepts that might be in the wrong module
4. **Create refactoring tasks**: Generate individual tasks to consolidate or rename concepts
5. **Build a data model**: Create a comprehensive data model showing relationships between concepts

## Example Queries

### Find all learning module related concepts:
```bash
jq '.[] | select(.description | contains("learning") or .concept | contains("LM"))' ontology_concepts.json
```

### Find all motor system concepts:
```bash
jq '.[] | select(.concept | contains("Motor") or .concept | contains("motor"))' ontology_concepts.json
```

### Find configuration concepts for a specific experiment:
```bash
jq '.[] | select(.source | contains("conf/experiment"))' ontology_concepts.json
```

### Count concepts by source directory:
```bash
jq -r '.[].source | split("/")[0:3] | join("/")' ontology_concepts.json | sort | uniq -c | sort -rn
```

## Notes

- The extraction is designed to be conservative and may miss some concepts (better to miss than to create false positives)
- Private methods (starting with `_`) are excluded unless they are special methods (`__init__`, `__call__`, etc.)
- YAML concepts are organized hierarchically using path notation (e.g., `experiment/config/motor_system/policy`)
- Some descriptions are truncated at 200 characters for readability
213 changes: 213 additions & 0 deletions ONTOLOGY_QUICKSTART.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,213 @@
# Quick Reference: Using the Ontological Analysis

This guide provides practical examples for using the ontology analysis tools.

## Files Overview

| File | Purpose |
|------|---------|
| `extract_ontology.py` | Extracts all concepts from Python and YAML files |
| `ontology_concepts.json` | JSON output with 3,352 cataloged concepts |
| `analyze_ontology.py` | Helper script to analyze the concepts |
| `ONTOLOGY_ANALYSIS.md` | Complete documentation |

## Quick Start

### 1. Re-run the extraction (if code changes)
```bash
python extract_ontology.py
```

### 2. Run the analysis
```bash
python analyze_ontology.py
```

### 3. Use jq for custom queries
```bash
# Install jq if needed: sudo apt-get install jq

# Count total concepts
jq 'length' ontology_concepts.json

# Find all concepts related to "motor"
jq '.[] | select(.concept | contains("motor") or contains("Motor"))' ontology_concepts.json

# Count concepts by type
jq -r '.[].description | split(":")[0]' ontology_concepts.json | sort | uniq -c | sort -rn

# Find all classes
jq '.[] | select(.description | startswith("Class:"))' ontology_concepts.json

# Search in specific module
jq '.[] | select(.source | contains("evidence_matching"))' ontology_concepts.json
```

## Common Analysis Tasks

### Find Potential Duplicates

Look for concepts with similar names that might be redundant:

```bash
# Find all "match" related concepts
jq '.[] | select(.concept | test("match"; "i"))' ontology_concepts.json | jq -s 'group_by(.concept) | .[] | {concept: .[0].concept, count: length, sources: [.[].source]}'

# Find concepts that appear in multiple files
jq 'group_by(.concept) | map(select(length > 1)) | .[] | {concept: .[0].concept, count: length, locations: [.[].source]}' ontology_concepts.json
```

### Identify Ontological Overload

Find areas with too many concepts:

```bash
# Count concepts per module
jq -r '.[].source | split("/")[0:3] | join("/")' ontology_concepts.json | sort | uniq -c | sort -rn | head -20

# Find modules with most classes
jq '.[] | select(.description | startswith("Class:")) | .source' ontology_concepts.json | sed 's|/[^/]*$||' | sort | uniq -c | sort -rn | head -10
```

### Explore Configuration Structure

Understand YAML configuration hierarchy:

```bash
# All configuration concepts
jq '.[] | select(.description | startswith("Config:"))' ontology_concepts.json | jq -s 'length'

# Group configs by file
jq '.[] | select(.description | startswith("Config:")) | .source' ontology_concepts.json | sort | uniq -c | sort -rn | head -20

# Find experiment configurations
jq '.[] | select(.source | contains("conf/experiment"))' ontology_concepts.json | head -20
```

### Find Specific Patterns

```bash
# All learning module concepts
jq '.[] | select(.concept | contains("LM") or .concept | contains("Learning"))' ontology_concepts.json

# All motor policies
jq '.[] | select(.concept | contains("Policy") or .concept | contains("policy"))' ontology_concepts.json

# All protocols (interfaces)
jq '.[] | select(.description | contains("Protocol"))' ontology_concepts.json

# All dataclasses
jq '.[] | select(.description | startswith("Dataclass:"))' ontology_concepts.json

# All enums
jq '.[] | select(.description | contains("Enum"))' ontology_concepts.json
```

## Python API Examples

### Load and analyze in Python

```python
import json

# Load concepts
with open('ontology_concepts.json', 'r') as f:
concepts = json.load(f)

# Find concepts by name
motor_concepts = [c for c in concepts if 'motor' in c['concept'].lower()]
print(f"Found {len(motor_concepts)} motor-related concepts")

# Group by source file
from collections import defaultdict
by_file = defaultdict(list)
for concept in concepts:
by_file[concept.get('source', 'unknown')].append(concept['concept'])

# Files with most concepts
sorted_files = sorted(by_file.items(), key=lambda x: -len(x[1]))
for file, concept_list in sorted_files[:10]:
print(f"{len(concept_list):3d} concepts in {file}")

# Find duplicate concept names
from collections import Counter
concept_names = [c['concept'] for c in concepts]
duplicates = {name: count for name, count in Counter(concept_names).items() if count > 1}
print(f"Found {len(duplicates)} concept names used in multiple places")
```

### Custom analysis script

```python
import json

def find_related_concepts(keyword, concepts):
"""Find all concepts related to a keyword"""
return [
c for c in concepts
if keyword.lower() in c['concept'].lower()
or keyword.lower() in c['description'].lower()
]

# Load concepts
with open('ontology_concepts.json', 'r') as f:
concepts = json.load(f)

# Find related concepts
evidence_concepts = find_related_concepts('evidence', concepts)
print(f"\nFound {len(evidence_concepts)} evidence-related concepts:")
for c in evidence_concepts[:10]:
print(f" - {c['concept']}: {c['description'][:80]}")
```

## Identifying Refactoring Opportunities

### 1. Find Similar Names
```bash
# Look for variations of the same concept
jq '.[] | .concept' ontology_concepts.json | grep -i "match" | sort | uniq
```

### 2. Find Scattered Implementations
```bash
# Find concepts that appear in multiple unrelated modules
jq 'group_by(.concept) | map(select(length > 3)) | .[] | {name: .[0].concept, locations: [.[].source]}' ontology_concepts.json
```

### 3. Identify Naming Inconsistencies
```bash
# Compare naming patterns (e.g., snake_case vs camelCase)
jq -r '.[].concept' ontology_concepts.json | grep -E "^[A-Z]" | head -20 # CamelCase
jq -r '.[].concept' ontology_concepts.json | grep -E "^[a-z_]" | head -20 # snake_case
```

## Output Format

Each concept in `ontology_concepts.json` has:

```json
{
"concept": "ConceptName",
"description": "Type: Brief description",
"source": "relative/path/to/file.py"
}
```

## Tips

1. **Start broad**: Use `analyze_ontology.py` for overview
2. **Drill down**: Use `jq` for specific queries
3. **Look for patterns**: Similar names, duplicate locations, scattered implementations
4. **Document findings**: Create a separate document listing refactoring opportunities
5. **Prioritize**: Focus on high-impact areas (frequently used modules, public APIs)

## Next Steps

1. Review the analysis output from `analyze_ontology.py`
2. Identify top 10 areas with potential issues
3. Create individual refactoring tasks for each issue
4. Use the JSON data to track which concepts have been addressed

---

**Questions?** See `ONTOLOGY_ANALYSIS.md` for complete documentation.
Loading