New OutlineTextSplitter class

One of the things that drives me batty about RAG is that it often retrieves chunks that contain partial lists. It'll retrieve a chunk that contains Steps 2 and 3 of a task but completely drop Steps 1 and 4. My `Document Sections` algorithm is designed to improve that situation by at least keeping the order of steps correct, but it doesn't solve for dropped steps and list items. The core issue is that many times these missing steps are in adjacent chunks that aren't semantically relevant to the query.

In an effort to solve this issue, I'm working with my tool (GPT-5.2) to design a new `OutlineTextSplitter` class. The idea is to break text splitting into a two-phase problem. You first create an outline that identifies the structure of the document you want to split, ignoring any token counts, and then you split the document based on the outline, not delimiters. This should give a sequence of steps a better chance of landing in the same chunk. The `Document Sections` algorithm will then do the rest of the work.
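To make the two-phase idea concrete, here is a minimal Python sketch. It is not the actual `OutlineTextSplitter` design, which is still being worked out. The names `OutlineNode`, `build_outline`, `split_by_outline`, and `count_tokens` are hypothetical, the outline is a flat list of headings, lists, and paragraphs detected with simple Markdown-style patterns, and the token counter is a whitespace word count standing in for a real tokenizer.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns: Markdown-style headings and numbered/bulleted items.
HEADING = re.compile(r"^#{1,6}\s+")
LIST_ITEM = re.compile(r"^\s*(?:\d+[.)]|[-*+])\s+")


@dataclass
class OutlineNode:
    kind: str  # "heading", "list", or "paragraph"
    text: str


def count_tokens(text: str) -> int:
    # Assumption: word count as a stand-in for a real tokenizer.
    return len(text.split())


def build_outline(text: str) -> list[OutlineNode]:
    """Phase 1: identify structural units, ignoring token counts."""
    nodes: list[OutlineNode] = []
    for block in re.split(r"\n\s*\n", text.strip()):
        first_line = block.splitlines()[0]
        if HEADING.match(first_line):
            kind = "heading"
        elif LIST_ITEM.match(first_line):
            kind = "list"
        else:
            kind = "paragraph"
        # Merge adjacent list blocks so a sequence of steps stays one unit,
        # even when blank lines separate some of the items.
        if kind == "list" and nodes and nodes[-1].kind == "list":
            nodes[-1].text += "\n" + block
        else:
            nodes.append(OutlineNode(kind, block))
    return nodes


def split_by_outline(text: str, max_tokens: int) -> list[str]:
    """Phase 2: pack outline nodes into chunks without cutting through a node."""
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for node in build_outline(text):
        size = count_tokens(node.text)
        # Start a new chunk rather than split a node across two chunks.
        # A node larger than max_tokens becomes its own oversized chunk.
        if current and used + size > max_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(node.text)
        used += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = (
    "# Setup\n\nIntro paragraph here.\n\n"
    "1. Install the tool\n2. Configure it\n\n"
    "3. Run it\n4. Verify output\n\nClosing note."
)
chunks = split_by_outline(sample, max_tokens=12)
```

In this sketch the token budget only decides where a chunk ends between nodes, so the four steps in `sample` land in one chunk even though that chunk runs over the budget. A real implementation would need a policy for oversized nodes and for keeping a heading with the content that follows it.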

Below is a partial discussion with my tool thinking through the problem:

New OutlineTextSplitter class #88
