giulio-leone/vectorless

Vectorless

Reasoning-based document indexing without vector databases

License: MIT

Vectorless extracts hierarchical document structure from PDFs and other documents using AI agents, enabling intelligent document navigation and retrieval without vector databases or chunking.

Features

  • No Vector DB - Uses document structure and LLM reasoning instead of vector similarity
  • No Chunking - Documents organized into natural sections, not artificial chunks
  • Human-like Retrieval - Simulates how experts navigate complex documents
  • Multi-Document Search - Search across multiple documents with cross-document relations
  • Efficient Tree Search - Hybrid approach: greedy search for simple queries, MCTS for complex ones
  • Expert Knowledge - Inject domain expertise without fine-tuning
  • Multi-Node Extraction - Extract and aggregate content from multiple nodes
  • Streaming SSE - Real-time progress events during processing
  • Provider Agnostic - Works with any AI SDK v6 compatible model

Installation

npm install vectorless
# or
pnpm add vectorless
# or
yarn add vectorless

Quick Start

import { createTreeSearchOrchestrator } from "vectorless";
import { openai } from "@ai-sdk/openai";

const model = openai("gpt-4o-mini");
const orchestrator = createTreeSearchOrchestrator(model);

// `knowledgeBase` is assumed to be a knowledge base built from a document beforehand.
// Search with automatic algorithm selection
for await (const { event, result } of orchestrator.search(
  "What are the key findings?",
  knowledgeBase.tree,
)) {
  console.log(event.type, event.data);
  if (result) {
    console.log("Answer:", result.extractedContent);
  }
}
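
The loop above streams progress events and yields a result once one is available. As a minimal sketch, that pattern can be wrapped in a helper that returns only the final result. The `SearchUpdate` shape is inferred from the destructuring above and may not match the library's real types; `fakeSearch` and its event names are hypothetical stand-ins for `orchestrator.search`, so the snippet runs without a model.

```typescript
// Shape inferred from `{ event, result }` in the Quick Start; the real types may differ.
interface SearchUpdate<R> {
  event: { type: string; data?: unknown };
  result?: R;
}

// Drain a stream of updates, reporting each event type and keeping the last result seen.
async function finalResult<R>(
  stream: AsyncIterable<SearchUpdate<R>>,
  onEvent: (type: string) => void = () => {},
): Promise<R | undefined> {
  let last: R | undefined;
  for await (const update of stream) {
    onEvent(update.event.type);
    if (update.result !== undefined) last = update.result;
  }
  return last;
}

// Hypothetical stand-in for orchestrator.search(...): two progress events, then a result.
async function* fakeSearch(): AsyncGenerator<SearchUpdate<{ extractedContent: string }>> {
  yield { event: { type: "started" } };
  yield { event: { type: "node-visited" } };
  yield { event: { type: "completed" }, result: { extractedContent: "42 pages" } };
}
```

Called as `await finalResult(orchestrator.search(query, knowledgeBase.tree), console.log)`, the helper would log each event type and resolve with the last result, assuming the real stream has the shape sketched here.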

Multi-Document Search

import { createMultiDocSearchAgent } from "vectorless";

const multiSearch = createMultiDocSearchAgent(model);

for await (const { result } of multiSearch.searchAcross(
  "Compare conclusions across documents",
  [kb1, kb2, kb3],
  { aggregationStrategy: "merge" },
)) {
  if (result) {
    console.log("Merged answer:", result.mergedContent);
    console.log("Cross-doc relations:", result.crossDocRelations);
  }
}

Domain Templates

import { createMemoryPreferenceStore } from "vectorless";

const store = createMemoryPreferenceStore();
const legalTemplate = await store.getTemplate("legal");
// Templates: legal, medical, technical, academic, financial

Architecture

Vectorless follows hexagonal architecture (ports & adapters pattern):

src/
├── domain/           # Pure business logic (schemas, types)
├── ports/            # Interface definitions (contracts)
├── agents/           # AI agents (20+ specialized agents)
│   ├── query-classifier.ts    # Query complexity detection
│   ├── greedy-search.ts       # Fast O(log n) search
│   ├── mcts-search.ts         # Monte Carlo Tree Search
│   ├── multi-doc-search.ts    # Cross-document search
│   ├── content-extractor.ts   # Fragment extraction
│   └── multi-node-aggregator.ts # Content aggregation
├── use-cases/        # Application logic orchestrators
├── templates/        # Domain-specific templates
└── infrastructure/   # Default adapters (memory, file)

Core Agents

| Agent | Purpose |
| --- | --- |
| QueryClassifierAgent | Classifies query as simple/complex/multi-doc |
| GreedySearchAgent | BM25-based O(log n) tree traversal |
| MCTSSearchAgent | UCB1-based Monte Carlo Tree Search |
| MultiDocSearchAgent | Parallel search across knowledge bases |
| ContentExtractorAgent | Extracts relevant fragments from nodes |
| MultiNodeAggregatorAgent | Deduplicates and merges content |
| TocDetectorAgent | Detects document table of contents |
| StructureExtractorAgent | Builds hierarchical tree from TOC |
| SummarizerAgent | Generates section summaries |
| EntityExtractorAgent | Extracts named entities |
| RelationExtractorAgent | Finds entity relationships |
| QuestionAnswerAgent | Answers questions from KB |

Tree Search Algorithms

Query Classification

const classifier = createQueryClassifierAgent(model);
const classification = await classifier.classify("What year was the company founded?", 1);
// { type: "simple", confidence: 0.9, suggestedAlgorithm: "greedy" }
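
The classification result can then drive algorithm selection. The sketch below shows one way to route on it. The field names come from the example output above and the `type` values from the Core Agents table; the confidence threshold and the fallback to MCTS are assumptions for illustration, not the orchestrator's documented behavior.

```typescript
// Shape taken from the example output above; the union members are assumptions.
type Algorithm = "greedy" | "mcts";

interface Classification {
  type: "simple" | "complex" | "multi-doc";
  confidence: number; // 0..1
  suggestedAlgorithm: Algorithm;
}

// Hypothetical routing rule: trust the suggestion only above a confidence threshold,
// otherwise fall back to the more exhaustive MCTS search.
function pickAlgorithm(c: Classification, minConfidence: number = 0.6): Algorithm {
  return c.confidence >= minConfidence ? c.suggestedAlgorithm : "mcts";
}

const confident: Classification = { type: "simple", confidence: 0.9, suggestedAlgorithm: "greedy" };
const unsure: Classification = { type: "simple", confidence: 0.4, suggestedAlgorithm: "greedy" };

console.log(pickAlgorithm(confident)); // "greedy"
console.log(pickAlgorithm(unsure)); // "mcts"
```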

Greedy Search (Simple Queries)

  • Complexity: O(log n) levels visited on a balanced tree (one root-to-leaf path)
  • Scoring: BM25/TF-IDF + LLM refinement
  • Best for: Single-fact lookups, direct answers
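
The idea behind greedy descent can be sketched as follows: score the children of the current node, follow only the best one, and repeat until a leaf is reached. The `TreeNode` shape and the term-overlap score are assumptions made for this sketch; the overlap count is a deliberately simple stand-in for BM25/TF-IDF, and the LLM refinement step is omitted.

```typescript
// Hypothetical node shape; the library's real tree type is not shown in this README.
interface TreeNode {
  title: string;
  summary: string;
  children: TreeNode[];
}

// Term-overlap score: counts query terms that occur in the node's title or summary.
function overlapScore(query: string, node: TreeNode): number {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  const text = `${node.title} ${node.summary}`.toLowerCase();
  return terms.filter((t) => text.includes(t)).length;
}

// Greedy descent: at each level follow only the best-scoring child.
// One root-to-leaf path is visited, so tree depth bounds the number of steps.
function greedySearch(query: string, root: TreeNode): TreeNode[] {
  const path: TreeNode[] = [root];
  let current = root;
  while (current.children.length > 0) {
    current = current.children.reduce((best, child) =>
      overlapScore(query, child) > overlapScore(query, best) ? child : best,
    );
    path.push(current);
  }
  return path;
}

const reportTree: TreeNode = {
  title: "Annual Report",
  summary: "",
  children: [
    { title: "Company History", summary: "founding year and early milestones", children: [] },
    {
      title: "Financials",
      summary: "revenue and costs",
      children: [{ title: "Revenue", summary: "revenue by region", children: [] }],
    },
  ],
};

console.log(greedySearch("founding year", reportTree).map((n) => n.title).join(" > "));
// Annual Report > Company History
```

Because only one branch is explored per level, a wrong choice near the root cannot be undone, which is why this strategy suits single-fact lookups better than multi-step questions.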

MCTS Search (Complex Queries)

  • Algorithm: UCB1 with beam width
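  • Formula: score = Q/N + C * sqrt(ln(T)/N), where Q is the node's total reward, N its visit count, T the parent's visit count, and C the exploration constant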
  • Best for: Multi-step reasoning, comparisons
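
A small worked example of the UCB1 formula above may help. This is a generic sketch of UCB1 selection, not the library's `MCTSSearchAgent`; the default exploration constant of sqrt(2) and the rule that unvisited children score infinity are common conventions assumed here, and beam width is not modeled.

```typescript
// UCB1 score for one child node.
//   q: total reward accumulated by the child (Q)
//   n: number of times the child was visited (N)
//   t: number of times the parent was visited (T)
//   c: exploration constant (C); sqrt(2) is a common default, assumed here
function ucb1(q: number, n: number, t: number, c: number = Math.SQRT2): number {
  if (n === 0) return Infinity; // unvisited children are tried first
  return q / n + c * Math.sqrt(Math.log(t) / n);
}

// Return the index of the child with the highest UCB1 score.
// T is taken as the sum of the children's visit counts.
function selectChild(stats: { q: number; n: number }[], c: number = Math.SQRT2): number {
  const t = stats.reduce((sum, s) => sum + s.n, 0);
  const scores = stats.map((s) => ucb1(s.q, s.n, t, c));
  return scores.indexOf(Math.max(...scores));
}

const childStats = [
  { q: 8, n: 10 }, // mean reward 0.80, visited often
  { q: 3, n: 4 }, // mean reward 0.75, visited rarely
];

console.log(selectChild(childStats)); // 1
console.log(selectChild(childStats, 0)); // 0
```

With the exploration term enabled, the rarely visited child wins despite its lower mean reward; with C = 0 the choice reduces to the highest mean.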

Multi-Document MCTS

  • Parallel expansion across documents
  • Aggregation strategies: merge, rank, cluster
  • Cross-document relation detection
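
To make the aggregation strategies concrete, here is a rough sketch of what `rank` and `merge` could look like over fragments gathered from several documents. The `ScoredFragment` shape and both functions are hypothetical illustrations, not the library's implementation, and `cluster` is left out.

```typescript
// Hypothetical fragment shape: one extracted passage plus its source and relevance score.
interface ScoredFragment {
  docId: string;
  text: string;
  score: number;
}

// "rank": order fragments from all documents by score, best first.
function rank(fragments: ScoredFragment[]): ScoredFragment[] {
  return [...fragments].sort((a, b) => b.score - a.score);
}

// "merge": rank, then drop fragments whose normalized text was already seen,
// so the highest-scoring copy of each duplicate survives.
function merge(fragments: ScoredFragment[]): ScoredFragment[] {
  const seen = new Set<string>();
  return rank(fragments).filter((f) => {
    const key = f.text.trim().toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const hits: ScoredFragment[] = [
  { docId: "kb1", text: "Revenue grew 12%.", score: 0.7 },
  { docId: "kb2", text: "revenue grew 12%. ", score: 0.9 },
  { docId: "kb3", text: "Costs fell.", score: 0.5 },
];

console.log(rank(hits).map((f) => f.docId).join(",")); // kb2,kb1,kb3
console.log(merge(hits).map((f) => f.docId).join(",")); // kb2,kb3
```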

Domain Templates

Built-in templates for specialized domains:

| Template | Focus |
| --- | --- |
| legal | Contracts, regulations, liability clauses |
| medical | Clinical trials, diagnoses, drug interactions |
| technical | API docs, specifications, code examples |
| academic | Research papers, citations, methodology |
| financial | Reports, metrics, risk factors |

import { LEGAL_TEMPLATE, FINANCIAL_TEMPLATE } from "vectorless";

console.log(LEGAL_TEMPLATE.priorityKeywords);
// ["shall", "hereby", "pursuant", "liability", ...]

console.log(FINANCIAL_TEMPLATE.sectionWeights);
// { "financial highlights": 1.5, "risk factors": 1.3, ... }
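
One plausible way such a template could influence retrieval is by boosting sections whose title carries a weight and whose text mentions priority keywords. The sketch below is an assumption about how the two fields might be combined, not the library's scoring; only the field names `priorityKeywords` and `sectionWeights` and the two weight values come from the examples above, and the keywords listed are made up.

```typescript
// Minimal template shape based on the two fields shown above.
interface DomainTemplateSketch {
  priorityKeywords: string[];
  sectionWeights: Record<string, number>;
}

// Hypothetical boost: section weight (default 1) times one plus the number of
// priority keywords that occur in the section text.
function templateBoost(t: DomainTemplateSketch, sectionTitle: string, text: string): number {
  const weight = t.sectionWeights[sectionTitle.toLowerCase()] ?? 1;
  const lower = text.toLowerCase();
  const keywordHits = t.priorityKeywords.filter((k) => lower.includes(k.toLowerCase())).length;
  return weight * (1 + keywordHits);
}

const financial: DomainTemplateSketch = {
  priorityKeywords: ["revenue", "risk"], // illustrative keywords, not the real list
  sectionWeights: { "financial highlights": 1.5, "risk factors": 1.3 },
};

console.log(templateBoost(financial, "Financial Highlights", "Revenue rose despite currency risk."));
// 4.5
console.log(templateBoost(financial, "Appendix", "Glossary of terms."));
// 1
```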

Ports (Interfaces)

Vectorless defines clean interfaces for all external dependencies:

| Port | Purpose |
| --- | --- |
| PdfParserPort | PDF text extraction |
| CachePort | Caching layer |
| KnowledgeBaseRepositoryPort | KB persistence |
| GreedySearchPort | Greedy search algorithm |
| MCTSSearchPort | MCTS search algorithm |
| QueryClassifierPort | Query classification |
| ContentExtractorPort | Content extraction |
| MultiNodeAggregatorPort | Content aggregation |
| PreferenceStorePort | User preferences & templates |
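
To illustrate the ports and adapters idea, the sketch below defines a cache port and an in-memory adapter, then a use case that depends only on the port. The real `CachePort` signature is not shown in this README, so `CachePortSketch`, `MemoryCacheAdapter`, and `cachedSummary` are hypothetical names and shapes chosen for the example.

```typescript
// Hypothetical port: the real CachePort signature is not shown in this README.
interface CachePortSketch<V> {
  get(key: string): Promise<V | undefined>;
  set(key: string, value: V, ttlMs?: number): Promise<void>;
}

// In-memory adapter with optional time-to-live; the clock is injectable for testing.
class MemoryCacheAdapter<V> implements CachePortSketch<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  async get(key: string): Promise<V | undefined> {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  async set(key: string, value: V, ttlMs: number = Infinity): Promise<void> {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}

// A use case depends only on the port, so adapters can be swapped without touching it.
async function cachedSummary(
  cache: CachePortSketch<string>,
  nodeId: string,
  summarize: () => Promise<string>,
): Promise<string> {
  const hit = await cache.get(nodeId);
  if (hit !== undefined) return hit;
  const fresh = await summarize();
  await cache.set(nodeId, fresh);
  return fresh;
}
```

Swapping in a file-based or Redis-backed adapter would only require another class that satisfies the same interface; the use case stays unchanged.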

Integration with OneGenUI

For UI components and full integration:

pnpm add @onegenui/vectorless

import { useTreeSearch, useMultiDocSearch } from "@onegenui/vectorless";

function SearchPanel() {
  const { search, results, isSearching } = useTreeSearch(model);
  // ...
}

License

MIT

Credits

About

Vectorless 2.0 - Knowledge Extraction & Interactive Exploration for Documents
