Description
One of the less obvious features of Vectra is the Document Section Algorithm I created some 3 years ago. One of the main things I don't like about vector databases is that they return chunks out of order. This can lead to the model being presented with a partial sequence of instructions, or a set of instructions that is simply in the wrong order. The idea behind Document Sections is to present relevant information to the model, as far as possible, in the order it was written.
Think about the way humans consume information: we take a book and first find the chapter we're interested in, then we read that chapter in the order it was written. My goal with Document Sections was to mimic that by using semantic relevance to create a heatmap of sorts over the corpus, identifying the chapters of information most likely to contain the user's answer. I then wanted to reconstruct that information in its entirety, in the order it was written. I wrote this algorithm 3 years ago without the help of AI, and it was super tricky to write.
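To make the idea concrete, here is a minimal TypeScript sketch of the general technique: group scored chunks by document, restore their original order, and merge neighboring chunks into sections. This is not Vectra's actual implementation. The `ScoredChunk` and `Section` types, the `buildSections` function, and the `maxGap` parameter are all hypothetical names chosen for illustration, and the sketch only joins the chunks that were retrieved rather than filling in the text between them.

```typescript
// Hypothetical shape of a retrieved chunk; not Vectra's actual types.
interface ScoredChunk {
  docId: string;
  index: number; // position of the chunk within its source document
  text: string;
  score: number; // semantic similarity to the query (the "heat")
}

// A run of nearby chunks from one document, in written order.
interface Section {
  docId: string;
  text: string;
  score: number; // average score of the chunks in the section
}

// Assumption: chunks whose indexes differ by at most `maxGap`
// belong to the same section.
function buildSections(chunks: ScoredChunk[], maxGap = 1): Section[] {
  // 1. Group the retrieved chunks by the document they came from.
  const byDoc = new Map<string, ScoredChunk[]>();
  for (const chunk of chunks) {
    const list = byDoc.get(chunk.docId) ?? [];
    list.push(chunk);
    byDoc.set(chunk.docId, list);
  }

  const sections: Section[] = [];
  byDoc.forEach((list, docId) => {
    // 2. Restore the order in which the chunks were written.
    list.sort((a, b) => a.index - b.index);

    // 3. Merge neighboring chunks into contiguous sections.
    let run: ScoredChunk[] = [];
    const flush = (): void => {
      if (run.length === 0) return;
      sections.push({
        docId,
        text: run.map((c) => c.text).join("\n"),
        score: run.reduce((sum, c) => sum + c.score, 0) / run.length,
      });
      run = [];
    };
    for (const chunk of list) {
      const last = run[run.length - 1];
      if (last !== undefined && chunk.index - last.index > maxGap) flush();
      run.push(chunk);
    }
    flush();
  });

  // 4. Rank sections by relevance; text inside each stays in written order.
  return sections.sort((a, b) => b.score - a.score);
}
```

Averaging the scores is just one way to rank sections. Summing them instead would favor longer sections, so the choice depends on whether breadth or density of relevance matters more.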
Today we have AI, and I have a super powerful tool that can analyze codebases of any size. What follows is a conversation I had with my tool (powered by GPT-5.2):