@Mpdreamz
Contributor

Summary

This PR fundamentally rearchitects the PipeTableParser to use a flat sibling structure instead of a deeply nested tree, reducing time complexity from O(n²) to O(n) for large tables.

The Problem

How the Old Parser Worked

The original parser allowed pipe delimiters to nest content as children. For a simple table like:

| a | b |
| c | d |

The inline tree structure was deeply nested:

PipeDelimiter [|]
└── "a"
    └── PipeDelimiter [|]
        └── "b"
            └── LineBreak [\n]
                └── PipeDelimiter [|]
                    └── "c"
                        └── PipeDelimiter [|]
                            └── "d"
                                └── LineBreak [\n]

Depth = O(n) where n = number of cells

Why This Was Problematic

  1. O(n²) Cell Boundary Detection: To find cell boundaries, the parser walked up the parent chain from each delimiter. With n delimiters nested n-deep, this required O(n²) operations.

  2. Stack Overflow on Large Tables: .NET's default stack depth limit caused tables with 1000+ rows to crash with DepthLimitExceededException.

  3. Quadratic Time Scaling:

    • 100→500 rows (5x): 42x slower (not 5x)
    • 500→1000 rows (2x): 3.9x slower (not 2x)
    • 1000→1500 rows (1.5x): 2.3x slower (not 1.5x)

  4. Large Tables Simply Failed: 5000+ row tables couldn't be parsed at all.
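The quadratic cost in item 1 can be illustrated with a toy model. This is a minimal sketch, not Markdig's code: `Node`, `build_nested_chain`, and `parent_walk_steps` are hypothetical names, and the chain is simplified so that each pipe is the direct parent of the next one (the real tree also had text nodes in between).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for an inline node; not Markdig's Inline class."""
    text: str
    parent: Optional["Node"] = None


def build_nested_chain(cells: int) -> list[Node]:
    """Nest every pipe inside the previous one, roughly as the old tree did."""
    pipes: list[Node] = []
    parent: Optional[Node] = None
    for _ in range(cells):
        pipe = Node("|", parent)
        pipes.append(pipe)
        parent = pipe  # the next delimiter becomes a descendant of this one
    return pipes


def parent_walk_steps(pipes: list[Node]) -> int:
    """Count the parent hops needed to reach the root from every pipe."""
    steps = 0
    for pipe in pipes:
        node = pipe.parent
        while node is not None:
            steps += 1
            node = node.parent
    return steps


# Doubling the number of delimiters roughly quadruples the work.
for n in (100, 200, 400):
    print(n, parent_walk_steps(build_nested_chain(n)))
```

In this model the total is n(n-1)/2 hops for n delimiters, which is the growth pattern the timings above suggest.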

The Solution

Flat Sibling Structure

By setting IsClosed = true on PipeTableDelimiterInline, subsequent content becomes siblings rather than children:

| a | b |
| c | d |

Now produces a flat structure:

[|] ← [a] ← [|] ← [b] ← [|] ← [\n] ← [|] ← [c] ← [|] ← [d] ← [|] ← [\n]
 ↑────↑─────↑─────↑─────↑──────↑──────↑─────↑─────↑─────↑─────↑──────↑
                    All siblings at root level

Depth = O(1) constant
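To make the effect of a closed delimiter concrete, here is a small sketch of a container-based inline tree. It assumes a simplified rule, "new content goes into the deepest open node", which only loosely mirrors how the real inline processor picks a parent; `Inline`, `append_all`, and `depth` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Inline:
    """Hypothetical container node; only loosely modelled on an inline tree."""
    text: str
    is_closed: bool = False
    children: list["Inline"] = field(default_factory=list)


def append_all(tokens: list[str], close_pipes: bool) -> Inline:
    """Append tokens under the assumed 'deepest open node' rule.

    A closed node accepts no children, so content that follows it
    lands next to it as a sibling instead of below it.
    """
    root = Inline("root")
    open_stack = [root]
    for tok in tokens:
        # Text and line breaks are leaves; pipes stay open unless close_pipes.
        node = Inline(tok, is_closed=(close_pipes or tok != "|"))
        open_stack[-1].children.append(node)
        if not node.is_closed:
            open_stack.append(node)  # later tokens nest under this pipe
    return root


def depth(node: Inline) -> int:
    """Iterative depth, so deep trees do not hit Python's recursion limit."""
    best, stack = 0, [(node, 0)]
    while stack:
        cur, d = stack.pop()
        best = max(best, d)
        stack.extend((child, d + 1) for child in cur.children)
    return best


TOKENS = ["|", "a", "|", "b", "|", "\n", "|", "c", "|", "d", "|", "\n"]
print(depth(append_all(TOKENS, close_pipes=False)))  # grows with pipe count
print(depth(append_all(TOKENS, close_pipes=True)))   # stays at 1
```

With open pipes the depth grows by one per delimiter; with closed pipes every token is a direct child of the root, whatever the table size.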

Cell Boundary Detection

Finding cell content is now a simple sibling walk:

For cell "b" in `| a | b |`:

    [|]  [a]  [|]  [b]  [|]  [\n]
               ↑    ↑    ↑
             start  │   current delimiter
                   cell content
                   
Walk backward from [|] until hitting another [|] or [\n]
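The backward walk can be sketched over a plain list of siblings. This is an illustration under simplifying assumptions (siblings are strings, and `cell_before` is a hypothetical helper), not the parser's actual traversal code.

```python
def cell_before(siblings: list[str], delimiter_index: int) -> list[str]:
    """Collect the siblings between the previous boundary and this delimiter."""
    boundaries = {"|", "\n"}
    start = delimiter_index
    # Step left until the previous pipe or line break (or the start of the list).
    while start > 0 and siblings[start - 1] not in boundaries:
        start -= 1
    return siblings[start:delimiter_index]


row = ["|", "a", "|", "b", "|", "\n"]
print(cell_before(row, 4))  # ['b']
print(cell_before(row, 2))  # ['a']
```

Each sibling is visited at most once across all delimiters in a row, so the total work grows linearly with the number of nodes.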

Handling Nested Pipes

Pipes can still end up nested inside unmatched emphasis:

*a | b*|

The PromoteNestedPipesToRootLevel method detects and promotes these:

Before: EmphasisDelimiter { "a", Pipe, "b" }
After:  EmphasisDelimiter { "a" } ← Pipe ← Container { "b" }
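The Before/After shapes above can be reproduced with a small model. This is a sketch of the idea only: `Node` and `promote_nested_pipes` are hypothetical, only one level of nesting is handled, and the real PromoteNestedPipesToRootLevel method may work quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical inline node: a kind, optional text, optional children."""
    kind: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def promote_nested_pipes(root: list[Node]) -> list[Node]:
    """Return a new root-level list with pipes lifted out of containers."""
    result: list[Node] = []
    for node in root:
        if not any(child.kind == "Pipe" for child in node.children):
            result.append(node)
            continue
        current = Node(node.kind)  # keeps the content before the first pipe
        result.append(current)
        for child in node.children:
            if child.kind == "Pipe":
                result.append(child)         # the pipe becomes a root sibling
                current = Node("Container")  # holds the content after the pipe
                result.append(current)
            else:
                current.children.append(child)
    # Drop containers that ended up empty (a pipe was the last child).
    return [n for n in result if n.kind != "Container" or n.children]


# Model of `*a | b*|`: an unmatched emphasis delimiter that swallowed a pipe.
before = [
    Node("EmphasisDelimiter", children=[Node("Text", "a"), Node("Pipe"), Node("Text", "b")]),
    Node("Pipe"),
]
after = promote_nested_pipes(before)
print([n.kind for n in after])
```

After promotion both pipes sit at root level, so the sibling walk used for cell boundaries can see them.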

Benchmarks

Baseline Results (Before)

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|--------|------|-------|--------|------|------|-----------|
| 'PipeTable 100 rows x 5 cols' | 542.0 µs | 2.25 µs | 1.88 µs | 2.9297 | 0.9766 | 367.38 KB |
| 'PipeTable 500 rows x 5 cols' | 23,018.4 µs | 150.30 µs | 133.24 µs | - | - | 1818.08 KB |
| 'PipeTable 1000 rows x 5 cols' | 89,418.0 µs | 507.04 µs | 474.28 µs | - | - | 3702.70 KB |
| 'PipeTable 1500 rows x 5 cols' | 201,593.3 µs | 2,133.24 µs | 1,995.44 µs | - | - | 5660.16 KB |
| 'PipeTable 5000 rows x 5 cols' | ❌ | -- | -- | -- | -- | -- |
| 'PipeTable 10000 rows x 5 cols' | ❌ | -- | -- | -- | -- | -- |

❌ = Failed with depth limit exceeded

Current Results (After)

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|--------|------|-------|--------|------|------|------|-----------|
| 'PipeTable 100 rows x 5 cols' | 147.2 µs | 1.75 µs | 1.46 µs | 2.9297 | 0.7324 | 0.4883 | 360.54 KB |
| 'PipeTable 500 rows x 5 cols' | 743.3 µs | 7.30 µs | 6.10 µs | 13.6719 | 5.8594 | 5.8594 | 1772.96 KB |
| 'PipeTable 1000 rows x 5 cols' | 1,530.0 µs | 28.71 µs | 29.48 µs | 25.3906 | 11.7188 | 11.7188 | 3547.08 KB |
| 'PipeTable 1500 rows x 5 cols' | 2,360.1 µs | 43.73 µs | 117.48 µs | 39.0625 | 19.5313 | 19.5313 | 5377.33 KB |
| 'PipeTable 5000 rows x 5 cols' | 8,044.9 µs | 39.83 µs | 33.26 µs | 78.1250 | 46.8750 | 46.8750 | 18121.73 KB |
| 'PipeTable 10000 rows x 5 cols' | 16,383.8 µs | 124.95 µs | 116.88 µs | 125.0000 | 93.7500 | 93.7500 | 36538.63 KB |

Performance Improvement

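| Rows | Before | After | Speedup |
|------|--------|-------|---------|
| 100 | 542 µs | 147 µs | 3.7x |
| 500 | 23,018 µs | 743 µs | 31x |
| 1000 | 89,418 µs | 1,530 µs | 58x |
| 1500 | 201,593 µs | 2,360 µs | 85x |
| 5000 | ❌ crashed | 8,045 µs | works |
| 10000 | ❌ crashed | 16,384 µs | works |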

Memory Improvement

| Rows | Before | After | Reduction |
|------|--------|-------|-----------|
| 100 | 367.38 KB | 360.54 KB | 1.9% |
| 500 | 1818.08 KB | 1772.96 KB | 2.5% |
| 1000 | 3702.70 KB | 3547.08 KB | 4.2% |
| 1500 | 5660.16 KB | 5377.33 KB | 5.0% |

Scaling Verification (Linear)

| Rows | Time | Time/Row | Scaling |
|------|------|----------|---------|
| 1000 | 1,530 µs | 1.53 µs | - |
| 5000 (5x) | 8,045 µs | 1.61 µs | ✅ ~5x |
| 10000 (10x) | 16,384 µs | 1.64 µs | ✅ ~10x |

Time per row is nearly constant, confirming O(n) complexity.
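One way to cross-check this from the reported means is to estimate the scaling exponent k in time ∝ rows^k from the smallest and largest measured sizes. The snippet below is a rough sketch using only the figures from the benchmark tables in this PR; `scaling_exponent` is a hypothetical helper, and a two-point estimate is no substitute for a proper fit.

```python
import math

# Mean times in µs, copied from the benchmark tables in this PR.
BEFORE = {100: 542.0, 500: 23018.4, 1000: 89418.0, 1500: 201593.3}
AFTER = {100: 147.2, 500: 743.3, 1000: 1530.0, 1500: 2360.1,
         5000: 8044.9, 10000: 16383.8}


def scaling_exponent(timings: dict[int, float]) -> float:
    """Estimate k in time ~ rows**k from the smallest and largest sizes."""
    lo, hi = min(timings), max(timings)
    return math.log(timings[hi] / timings[lo]) / math.log(hi / lo)


print(f"before: k = {scaling_exponent(BEFORE):.2f}")  # about 2: quadratic
print(f"after:  k = {scaling_exponent(AFTER):.2f}")   # about 1: linear

# Time per row for the new parser, matching the Time/Row column above.
for rows in (1000, 5000, 10000):
    print(rows, round(AFTER[rows] / rows, 2), "µs/row")
```

An exponent near 2 for the old numbers and near 1 for the new ones is consistent with the O(n²) to O(n) claim.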

Breaking Changes

None. The output AST is identical; only the internal parsing strategy changed.

Test Results

All 3,595 existing tests pass.

Pipe tables were creating deeply nested tree structures where each pipe
delimiter contained all subsequent content as children, causing O(n²)
traversal complexity for n cells. This change restructures the parser to
use a flat sibling-based structure, treating tables as matrices rather
than nested trees.

Key changes:
- Set IsClosed=true on PipeTableDelimiterInline to prevent nesting
- Add PromoteNestedPipesToRootLevel() to flatten pipes nested in emphasis
- Update cell boundary detection to use sibling traversal
- Move EmphasisInlineParser before PipeTableParser in processing order
- Fix EmphasisInlineParser to continue past IsClosed delimiters
- Add ContainsParentOrSiblingOfType<T>() helper for flat structure detection

Performance improvements (measured on typical markdown content):

| Rows | Before    | After   | Speedup |
|------|-----------|---------|---------|
| 100  | 542 μs    | 150 μs  | 3.6x    |
| 500  | 23,018 μs | 763 μs  | 30x     |
| 1000 | 89,418 μs | 1,596 μs| 56x     |
| 1500 | 201,593 μs| 2,740 μs| 74x     |
| 5000 | CRASH     | 10,588 μs| ∞      |
| 10000| CRASH     | 18,551 μs| ∞      |

Tables with 5000+ rows previously crashed due to stack overflow from
recursive depth. They now parse successfully with linear time complexity.
@xoofx
Owner

xoofx commented Jan 30, 2026

Thank you! Yep, this code was bad when I implemented it in the first place, I was not very inspired. 😅

I assume you have been using a coding agent, would you mind sharing which one with which level of thinking?

@Mpdreamz
Contributor Author

Aye! Claude Code, Claude Opus 4.5.

Anecdotally this is the first time I had to get firm with it stating "THERE IS A WAY". It kept giving up on emphasis inlined in table cells with pipes that are part of the emphasis.

In the end the trick was ensuring it registers itself after the emphasis parser.

@xoofx xoofx merged commit d47fbc7 into xoofx:master Jan 30, 2026
3 checks passed