Merged
Implements comprehensive markdown structure processing functionality that converts markdown elements (headings, sections, lists, tables, code blocks, and blockquotes) into RDF graph entities.

Changes:
- Add KB entity models for markdown structure elements (KbHeading, KbSection, KbList, KbListItem, KbTable, KbCodeBlock, KbBlockquote)
- Create MarkdownStructureProcessor to convert markdown elements to KB entities with proper relationships
- Integrate MarkdownStructureProcessor into EntityProcessor pipeline
- Add generate_markdown_element_id method to EntityIdGenerator
- Add comprehensive test coverage for all markdown structure types

All markdown structure elements are now processed into the RDF graph with proper metadata including position information, nesting levels, and parent-child relationships.

Tests: All 9 tests pass
Converts markdown structure processing tests to follow the project's specification-driven testing methodology instead of unit tests.

Changes:
- Remove unit test file from tests/processor directory
- Create 5 new specification test cases for markdown structure:
  - markdown_structure_01_single_heading
  - markdown_structure_02_code_block
  - markdown_structure_03_list
  - markdown_structure_04_table
  - markdown_structure_05_blockquote
- Update markdown structure processor to use deterministic IDs based on position instead of random UUIDs for sections, lists, tables, and code blocks
- Regenerate all 60 spec test expected outputs to include new markdown structure entities in RDF graphs
- Add regenerate_spec_outputs.py script for batch updating test expectations when processor output changes

Test Results: All 61 specification tests pass

This aligns with the project's specification-driven testing approach, where behavior is captured in declarative artifacts (input.md and expected_output.ttl files) rather than imperative Python test code.
feat: Add markdown structure processing to graph
Description
Summary
Implements comprehensive markdown structure processing functionality that converts markdown elements (headings, sections, lists, tables, code blocks, and blockquotes) into RDF graph entities with proper relationships and metadata.
Key Changes
Added 7 new Pydantic models for markdown structure:
KbHeading - Markdown headings (h1-h6) with level and hierarchy
KbSection - Content sections with heading relationships
KbList - Ordered/unordered lists with item counts
KbListItem - Individual list items with parent relationships
KbTable - Tables with row/column counts and headers
KbCodeBlock - Code blocks with language and line count
KbBlockquote - Blockquotes with nesting levels
All models include RDF property mappings, position tracking, and Schema.org types.
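To make the shape of these models concrete, here is a minimal sketch of what a heading entity with position tracking might look like. It is not the code from this PR: it uses stdlib dataclasses as a stand-in for Pydantic so it runs without dependencies, and the class name `KbHeadingSketch` and every field name are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KbHeadingSketch:
    """Illustrative stand-in for KbHeading; all field names are assumptions."""

    kb_id: str
    text: str
    level: int                        # 1-6, matching h1-h6
    start_line: int                   # 1-based position in the source document
    end_line: int
    parent_id: Optional[str] = None   # enclosing heading, if any

    def __post_init__(self) -> None:
        # Reject values that could not describe a real markdown heading.
        if not 1 <= self.level <= 6:
            raise ValueError(f"heading level must be 1-6, got {self.level}")
        if self.end_line < self.start_line:
            raise ValueError("end_line must not precede start_line")


heading = KbHeadingSketch(
    kb_id="doc/heading/1", text="Intro", level=1, start_line=1, end_line=1
)
```

A real Pydantic model would express the same constraints with field validators and would also carry the RDF property mappings mentioned above, which this sketch omits.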
Created MarkdownStructureProcessor:
Converts markdown elements to KB entities
Maintains parent-child relationships (heading↔section, list↔items)
Tracks position information (start/end line numbers)
Uses deterministic ID generation based on position for reproducibility
Provides statistics on extracted structure
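The behaviour listed above (extracting elements, tracking start/end lines, and reporting statistics) can be sketched roughly as follows. This is a simplified illustration covering only headings and fenced code blocks. The names `Element`, `extract_structure`, and `structure_stats` are hypothetical, and the line-by-line regex scan is an assumption that stands in for whatever markdown parser the project actually uses.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List

FENCE = "`" * 3  # code fence marker, built this way to keep the example readable
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Element:
    kind: str        # "heading" or "code_block"
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    detail: str      # heading text, or the code block's language tag


def extract_structure(markdown: str) -> List[Element]:
    """Scan the document once and record each element with its position."""
    elements: List[Element] = []
    fence_start = None
    fence_lang = ""
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if fence_start is None:
                # Opening fence: remember where the block began.
                fence_start, fence_lang = lineno, stripped[len(FENCE):].strip()
            else:
                # Closing fence: emit the finished block.
                elements.append(Element("code_block", fence_start, lineno, fence_lang))
                fence_start = None
            continue
        if fence_start is not None:
            continue  # inside a code block, so '#' lines are not headings
        match = HEADING_RE.match(line)
        if match:
            elements.append(Element("heading", lineno, lineno, match.group(2)))
    return elements


def structure_stats(elements: List[Element]) -> Counter:
    """Count extracted elements per kind."""
    return Counter(element.kind for element in elements)
```

Recording positions during the scan is what makes position-based IDs possible later: every element already knows its type and start line by the time an ID is needed.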
Integrated MarkdownStructureProcessor into the main EntityProcessor pipeline
Automatically extracts structure from all documents
Processes alongside todos, wikilinks, and named entities
Added generate_markdown_element_id() method to EntityIdGenerator
Deterministic URIs based on element type and position
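As a rough illustration of why position-based IDs are reproducible where random UUIDs are not, consider the sketch below. The function name, its parameters, and the URI layout are all assumptions made for this example; the actual signature of `generate_markdown_element_id()` is not shown in this description.

```python
def generate_markdown_element_id_sketch(
    document_uri: str, element_type: str, start_line: int
) -> str:
    """Build an ID from the element's type and position only (hypothetical scheme).

    The same document, element type, and line number always give the same ID,
    so regenerated expected outputs stay byte-for-byte stable between runs.
    """
    return f"{document_uri.rstrip('/')}/{element_type}/{start_line}"


# Hypothetical document URI, used only for illustration.
first = generate_markdown_element_id_sketch("http://example.org/kb/notes", "table", 12)
second = generate_markdown_element_id_sketch("http://example.org/kb/notes", "table", 12)
```

The trade-off of such a scheme is that inserting lines above an element changes its ID, which is acceptable for snapshot-style spec tests but worth keeping in mind for long-lived graphs.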
Created 5 new test cases in specs/test_cases/:
markdown_structure_01_single_heading
markdown_structure_02_code_block
markdown_structure_03_list
markdown_structure_04_table
markdown_structure_05_blockquote
Regenerated all 60 existing spec test outputs to include new entities
Added scripts/regenerate_spec_outputs.py utility for batch updates
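A batch regeneration step of this kind could look roughly like the sketch below. It is not the contents of scripts/regenerate_spec_outputs.py: the function name and the `render` callback are hypothetical, and the only details taken from this PR are the per-case `input.md` and `expected_output.ttl` file names.

```python
from pathlib import Path
from typing import Callable, List


def regenerate_expected_outputs(
    cases_dir: Path, render: Callable[[str], str]
) -> List[str]:
    """Rewrite expected_output.ttl for every case whose rendered output changed.

    `render` stands in for the real markdown-to-Turtle pipeline (an assumption).
    Returns the names of the test cases that were updated.
    """
    updated: List[str] = []
    for case_dir in sorted(p for p in cases_dir.iterdir() if p.is_dir()):
        input_file = case_dir / "input.md"
        if not input_file.exists():
            continue  # not a spec test case
        new_output = render(input_file.read_text(encoding="utf-8"))
        expected_file = case_dir / "expected_output.ttl"
        old_output = (
            expected_file.read_text(encoding="utf-8")
            if expected_file.exists()
            else None
        )
        if new_output != old_output:
            expected_file.write_text(new_output, encoding="utf-8")
            updated.append(case_dir.name)
    return updated
```

Returning the list of changed cases makes it easy to review which expectations moved before committing them, and a second run over unchanged output should report nothing.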
Impact
All markdown structure elements are now fully represented in the knowledge graph with:
✅ Proper RDF types and Schema.org mappings
✅ Position metadata (start/end line numbers)
✅ Parent-child relationships
✅ Queryable via SPARQL
✅ Deterministic, reproducible entity IDs
Test Plan

All 61 specification tests pass
RDF converter handles all new entity types
Deterministic ID generation ensures test reproducibility
Integration tests verify end-to-end processing
Spec tests use declarative approach per project standards
Testing Results
============================= test session starts ==============================
collected 61 items
tests/test_specifications.py::test_specifications PASSED x60
tests/test_specifications.py::test_test_cases_directory_exists PASSED
===================== 61 passed, 31 warnings in 1.51s =========================