Improved Document Sections Algorithm #87

@Stevenic

One of the less obvious features of Vectra is the Document Sections algorithm I created some 3 years ago. One of the main things I don't like about vector databases is that they return chunks out of order. This can lead to the model being presented with a partial sequence of instructions, or a set of instructions that is simply in the wrong order. The idea behind Document Sections is to present relevant information to the model, as best as possible, in the order it was written.

Think about the way humans consume information: we take a book and first find the chapter we're interested in, then we read that chapter in the order it was written. My goal with Document Sections was to mimic that by using semantic relevance to create a heatmap of sorts over the corpus, identifying the chapters of information most likely to contain the user's answer. I then wanted to reconstruct that information in its entirety, in the order it was written. I wrote this algorithm 3 years ago without the help of AI, and it was super tricky to write.
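
To make the heatmap idea concrete, here is a minimal sketch of the general technique. It is not Vectra's actual implementation. The `ScoredChunk` and `Section` shapes, the `buildSections` function, and the `topK` and `maxGap` parameters are all hypothetical names chosen for illustration. It assumes every chunk already carries its position in the source document and a similarity score against the query.

```typescript
// Hypothetical shapes for illustration only; these are not Vectra's API.
interface ScoredChunk {
  index: number; // position of the chunk in the original document
  text: string;
  score: number; // semantic similarity to the query, higher is better
}

interface Section {
  chunks: ScoredChunk[]; // contiguous chunks, in document order
  score: number;         // best chunk score inside the section
}

// Group the most relevant chunks into contiguous sections.
// maxGap is the number of non-hit chunks allowed between two hits
// before they are treated as separate sections (an assumed heuristic).
function buildSections(chunks: ScoredChunk[], topK: number, maxGap = 1): Section[] {
  // 1. Take the topK hottest chunks, then restore document order.
  const hot = [...chunks]
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .sort((a, b) => a.index - b.index);

  // 2. Merge hits that sit close together into [start, end] index ranges.
  const ranges: Array<[number, number]> = [];
  for (const c of hot) {
    const last = ranges[ranges.length - 1];
    if (last && c.index - last[1] <= maxGap + 1) {
      last[1] = c.index; // extend the current range
    } else {
      ranges.push([c.index, c.index]); // start a new range
    }
  }

  // 3. Fill each range with every chunk it spans, including the
  //    lower-scoring ones in between, so the text reads as written.
  const byIndex = new Map<number, ScoredChunk>();
  for (const c of chunks) byIndex.set(c.index, c);

  const sections: Section[] = ranges.map(([start, end]) => {
    const members: ScoredChunk[] = [];
    for (let i = start; i <= end; i++) {
      const c = byIndex.get(i);
      if (c) members.push(c);
    }
    return { chunks: members, score: Math.max(...members.map((m) => m.score)) };
  });

  // 4. Rank sections by relevance; order inside each section is untouched.
  return sections.sort((a, b) => b.score - a.score);
}
```

The key design choice in this sketch is that relevance only decides which regions of the document are selected and how sections are ranked against each other. Within a section, chunks always stay in their original order, and gap chunks are pulled in even if they scored poorly on their own.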

Today we have AI and I have a super powerful tool that can analyze code bases of any size. So what follows is a conversation I had with my tool (powered by GPT-5.2):

Labels: enhancement (New feature or request)
