Reasoning-based document indexing without vector databases
Vectorless extracts hierarchical document structure from PDFs and other documents using AI agents, enabling intelligent document navigation and retrieval without vector databases or chunking.
- No Vector DB - Uses document structure and LLM reasoning instead of vector similarity
- No Chunking - Documents organized into natural sections, not artificial chunks
- Human-like Retrieval - Simulates how experts navigate complex documents
- Multi-Document Search - Search across multiple documents with cross-document relations
- Efficient Tree Search - Hybrid MCTS/greedy search, with the algorithm selected per query
- Expert Knowledge - Inject domain expertise without fine-tuning
- Multi-Node Extraction - Extract and aggregate content from multiple nodes
- Streaming SSE - Real-time progress events during processing
- Provider Agnostic - Works with any AI SDK v6 compatible model
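
To make the "no chunking" idea concrete, here is a minimal sketch of a hierarchical document tree. The `DocNode` shape, its field names, and the sample data are assumptions for illustration, not the schema Vectorless actually produces.

```typescript
// Hypothetical node shape; the library's real schema may differ.
interface DocNode {
  title: string;
  summary: string;
  pageRange: [number, number];
  children: DocNode[];
}

// Made-up sample document, organized by its natural sections.
const sampleTree: DocNode = {
  title: "Annual Report",
  summary: "Company performance for the fiscal year.",
  pageRange: [1, 40],
  children: [
    {
      title: "Financial Highlights",
      summary: "Revenue, margins, and cash flow.",
      pageRange: [3, 12],
      children: [
        { title: "Revenue", summary: "Revenue by segment.", pageRange: [4, 7], children: [] },
      ],
    },
    { title: "Risk Factors", summary: "Market and regulatory risks.", pageRange: [13, 20], children: [] },
  ],
};

// Render the tree as an indented outline, one line per section.
function outline(node: DocNode, depth: number = 0): string[] {
  const indent = "  ".repeat(depth);
  const lines = [`${indent}${node.title} (pp. ${node.pageRange[0]}-${node.pageRange[1]})`];
  for (const child of node.children) {
    for (const line of outline(child, depth + 1)) {
      lines.push(line);
    }
  }
  return lines;
}

console.log(outline(sampleTree).join("\n"));
```

An outline like this is the kind of view a reasoning step can navigate section by section, instead of comparing embeddings of fixed-size chunks.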
npm install vectorless
# or
pnpm add vectorless
# or
yarn add vectorless

import { createTreeSearchOrchestrator, getDomainTemplate } from "vectorless";
import { openai } from "@ai-sdk/openai";

const model = openai("gpt-4o-mini");
const orchestrator = createTreeSearchOrchestrator(model);

// Search with automatic algorithm selection.
// `knowledgeBase` is assumed to be a knowledge base you have already built.
for await (const { event, result } of orchestrator.search(
  "What are the key findings?",
  knowledgeBase.tree,
)) {
  console.log(event.type, event.data);
  if (result) {
    console.log("Answer:", result.extractedContent);
  }
}

import { createMultiDocSearchAgent } from "vectorless";
// Multi-document search; kb1, kb2, and kb3 are assumed to be previously built knowledge bases.
const multiSearch = createMultiDocSearchAgent(model);
for await (const { result } of multiSearch.searchAcross(
  "Compare conclusions across documents",
  [kb1, kb2, kb3],
  { aggregationStrategy: "merge" },
)) {
  if (result) {
    console.log("Merged answer:", result.mergedContent);
    console.log("Cross-doc relations:", result.crossDocRelations);
  }
}

import { getDomainTemplate, createMemoryPreferenceStore } from "vectorless";
const store = createMemoryPreferenceStore();
const legalTemplate = await store.getTemplate("legal");
// Templates: legal, medical, technical, academic, financial

Vectorless follows hexagonal architecture (ports & adapters pattern):
src/
├── domain/ # Pure business logic (schemas, types)
├── ports/ # Interface definitions (contracts)
├── agents/ # AI agents (20+ specialized agents)
│ ├── query-classifier.ts # Query complexity detection
│ ├── greedy-search.ts # Fast O(log n) search
│ ├── mcts-search.ts # Monte Carlo Tree Search
│ ├── multi-doc-search.ts # Cross-document search
│ ├── content-extractor.ts # Fragment extraction
│ └── multi-node-aggregator.ts # Content aggregation
├── use-cases/ # Application logic orchestrators
├── templates/ # Domain-specific templates
└── infrastructure/ # Default adapters (memory, file)
| Agent | Purpose |
|---|---|
| QueryClassifierAgent | Classifies query as simple/complex/multi-doc |
| GreedySearchAgent | BM25-based O(log n) tree traversal |
| MCTSSearchAgent | UCB1-based Monte Carlo Tree Search |
| MultiDocSearchAgent | Parallel search across knowledge bases |
| ContentExtractorAgent | Extracts relevant fragments from nodes |
| MultiNodeAggregatorAgent | Deduplicates and merges content |
| TocDetectorAgent | Detects document table of contents |
| StructureExtractorAgent | Builds hierarchical tree from TOC |
| SummarizerAgent | Generates section summaries |
| EntityExtractorAgent | Extracts named entities |
| RelationExtractorAgent | Finds entity relationships |
| QuestionAnswerAgent | Answers questions from KB |
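
The classifier's output decides which search agent runs. A minimal sketch of such routing is shown here; the `type` and `confidence` fields follow the classifier result shown in this README, while `chooseAlgorithm`, the confidence threshold, and the fallback rule are assumptions for illustration rather than the library's actual logic.

```typescript
// Hypothetical types that mirror the classification result shown in this README.
type QueryType = "simple" | "complex" | "multi-doc";
type SearchAlgorithm = "greedy" | "mcts" | "multi-doc";

interface Classification {
  type: QueryType;
  confidence: number;
}

// Assumed routing rule: use the fast greedy search only when the classifier
// is confident the query is simple; otherwise fall back to MCTS.
function chooseAlgorithm(c: Classification, minConfidence: number = 0.7): SearchAlgorithm {
  if (c.type === "multi-doc") return "multi-doc";
  if (c.type === "simple" && c.confidence >= minConfidence) return "greedy";
  return "mcts";
}

console.log(chooseAlgorithm({ type: "simple", confidence: 0.9 }));  // greedy
console.log(chooseAlgorithm({ type: "simple", confidence: 0.4 }));  // mcts
console.log(chooseAlgorithm({ type: "complex", confidence: 0.8 })); // mcts
```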
import { createQueryClassifierAgent } from "vectorless";

const classifier = createQueryClassifierAgent(model);
const classification = await classifier.classify("What year was founded?", 1);
// { type: "simple", confidence: 0.9, suggestedAlgorithm: "greedy" }

Greedy search (GreedySearchAgent):
- Complexity: O(log n)
- Scoring: BM25/TF-IDF + LLM refinement
- Best for: Single-fact lookups, direct answers
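
The greedy strategy can be sketched as a single root-to-leaf descent that scores only the children of the current node, which is where the O(log n) behavior comes from on a reasonably balanced tree. The sketch below uses plain BM25 over each node's title and summary and leaves out the LLM refinement step; `TreeNode`, `bm25Scores`, and `greedyPath` are hypothetical names, not part of the Vectorless API.

```typescript
// Hypothetical node shape for illustration; not the library's schema.
interface TreeNode {
  title: string;
  summary: string;
  children: TreeNode[];
}

const tokenize = (text: string): string[] =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);

// Plain BM25 over a small "corpus" of sibling nodes (k1 and b are common defaults).
function bm25Scores(query: string, docs: string[][], k1: number = 1.2, b: number = 0.75): number[] {
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / Math.max(docs.length, 1);
  const terms = Array.from(new Set(tokenize(query)));
  return docs.map((doc) => {
    let score = 0;
    for (const term of terms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      const df = docs.filter((d) => d.indexOf(term) >= 0).length;
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
    }
    return score;
  });
}

// Follow the best-scoring child at each level; stop at a leaf or when nothing matches.
function greedyPath(root: TreeNode, query: string): TreeNode[] {
  const path: TreeNode[] = [root];
  let current = root;
  while (current.children.length > 0) {
    const docs = current.children.map((c) => tokenize(`${c.title} ${c.summary}`));
    const scores = bm25Scores(query, docs);
    let bestIndex = -1;
    let bestScore = 0;
    for (let i = 0; i < scores.length; i++) {
      if (scores[i] > bestScore) {
        bestScore = scores[i];
        bestIndex = i;
      }
    }
    if (bestIndex < 0) break;
    current = current.children[bestIndex];
    path.push(current);
  }
  return path;
}

// Made-up sample tree.
const reportTree: TreeNode = {
  title: "Annual Report",
  summary: "Company performance for the fiscal year.",
  children: [
    {
      title: "Financial Highlights",
      summary: "Revenue, margins, and cash flow.",
      children: [
        { title: "Revenue", summary: "Revenue by segment.", children: [] },
        { title: "Cash Flow", summary: "Operating and free cash flow.", children: [] },
      ],
    },
    { title: "Risk Factors", summary: "Market and regulatory risks.", children: [] },
  ],
};

console.log(greedyPath(reportTree, "revenue by segment").map((n) => n.title).join(" > "));
// Annual Report > Financial Highlights > Revenue
```

Because the descent never backtracks, a misleading summary near the root can send it down the wrong branch, which is the usual argument for reserving it for single-fact lookups.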

Monte Carlo Tree Search (MCTSSearchAgent):
- Algorithm: UCB1 with beam width
- Formula: score = Q/N + C * sqrt(ln(T)/N)
- Best for: Multi-step reasoning, comparisons
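
As a worked illustration of the UCB1 formula, the sketch below reads the symbols the way UCB1 is usually defined: Q is the node's accumulated reward, N its visit count, T the parent's visit count, and C an exploration constant. That reading, the default C of sqrt(2), and the names `NodeStats`, `ucb1`, and `selectChild` are assumptions; beam-width handling is not shown.

```typescript
// Hypothetical per-node statistics kept during tree search.
interface NodeStats {
  totalReward: number; // Q: sum of rewards backed up through this node
  visits: number;      // N: number of times this node was visited
}

// score = Q/N + C * sqrt(ln(T)/N), with unvisited nodes always tried first.
function ucb1(stats: NodeStats, parentVisits: number, c: number = Math.SQRT2): number {
  if (stats.visits === 0) return Number.POSITIVE_INFINITY;
  const exploitation = stats.totalReward / stats.visits;
  const exploration = c * Math.sqrt(Math.log(parentVisits) / stats.visits);
  return exploitation + exploration;
}

// Return the index of the child with the highest UCB1 score, or -1 if there are none.
function selectChild(children: NodeStats[], parentVisits: number): number {
  let bestIndex = -1;
  let bestScore = Number.NEGATIVE_INFINITY;
  for (let i = 0; i < children.length; i++) {
    const score = ucb1(children[i], parentVisits);
    if (score > bestScore) {
      bestScore = score;
      bestIndex = i;
    }
  }
  return bestIndex;
}

const siblings: NodeStats[] = [
  { totalReward: 8, visits: 10 }, // high average reward, well explored
  { totalReward: 1, visits: 2 },  // lower average reward, barely explored
];
console.log(selectChild(siblings, 12)); // 1
```

The second child wins despite its lower average because the exploration term grows as N shrinks; raising or lowering C shifts that balance.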

Multi-document search (MultiDocSearchAgent):
- Parallel expansion across documents
- Aggregation strategies: merge, rank, cluster
- Cross-document relation detection

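
To illustrate what the merge and rank strategies could look like, here is a small sketch over fragments collected from several knowledge bases. The `ContentFragment` shape, the normalization rule, and both function names are assumptions made for this example; the cluster strategy is not covered, and the library's real aggregation may work differently.

```typescript
// Hypothetical fragment extracted from one document.
interface ContentFragment {
  docId: string;
  text: string;
  score: number;
}

// Treat fragments as duplicates when they match after case and whitespace normalization.
const normalize = (text: string): string => text.toLowerCase().replace(/\s+/g, " ").trim();

// "merge": drop duplicate fragments, keeping the best-scoring copy of each.
function mergeFragments(fragments: ContentFragment[]): ContentFragment[] {
  const best = new Map<string, ContentFragment>();
  for (const fragment of fragments) {
    const key = normalize(fragment.text);
    const seen = best.get(key);
    if (!seen || fragment.score > seen.score) best.set(key, fragment);
  }
  return Array.from(best.values());
}

// "rank": order fragments from most to least relevant without mutating the input.
function rankFragments(fragments: ContentFragment[]): ContentFragment[] {
  return fragments.slice().sort((a, b) => b.score - a.score);
}

// Made-up sample fragments from three knowledge bases.
const collected: ContentFragment[] = [
  { docId: "kb1", text: "Latency improved after the cache was added.", score: 0.6 },
  { docId: "kb2", text: "latency improved  after the cache was added.", score: 0.9 },
  { docId: "kb3", text: "Error rates were unchanged.", score: 0.7 },
];

console.log(rankFragments(mergeFragments(collected)).map((f) => f.docId)); // [ 'kb2', 'kb3' ]
```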
Built-in templates for specialized domains:
| Template | Focus |
|---|---|
| legal | Contracts, regulations, liability clauses |
| medical | Clinical trials, diagnoses, drug interactions |
| technical | API docs, specifications, code examples |
| academic | Research papers, citations, methodology |
| financial | Reports, metrics, risk factors |
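
One way a template could bias retrieval is by boosting the score of sections it cares about. The sketch below borrows the field names `priorityKeywords` and `sectionWeights` from the template constants shown in this README, but the `DomainTemplate` type, the `boostScore` function, the sample values, and the scoring rule itself are assumptions; this README does not document how the library applies these fields.

```typescript
// Hypothetical template shape using the two field names that appear in this README.
interface DomainTemplate {
  priorityKeywords: string[];
  sectionWeights: Record<string, number>;
}

// Made-up sample values for illustration.
const sampleTemplate: DomainTemplate = {
  priorityKeywords: ["liability", "pursuant"],
  sectionWeights: { "risk factors": 1.3 },
};

// Assumed rule: multiply by the section weight, then add a small bonus per matched keyword.
function boostScore(
  baseScore: number,
  sectionTitle: string,
  text: string,
  template: DomainTemplate,
  keywordBonus: number = 0.1,
): number {
  const weight = template.sectionWeights[sectionTitle.toLowerCase()] ?? 1;
  const lowered = text.toLowerCase();
  const hits = template.priorityKeywords.filter((k) => lowered.indexOf(k.toLowerCase()) >= 0).length;
  return baseScore * weight * (1 + keywordBonus * hits);
}

console.log(boostScore(1, "Risk Factors", "Liability is limited pursuant to section 4.", sampleTemplate));
console.log(boostScore(2, "Overview", "General introduction.", sampleTemplate));
```

The point of the sketch is that domain expertise enters as data (keywords and weights) rather than as model fine-tuning.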
import { LEGAL_TEMPLATE, FINANCIAL_TEMPLATE } from "vectorless";
console.log(LEGAL_TEMPLATE.priorityKeywords);
// ["shall", "hereby", "pursuant", "liability", ...]
console.log(FINANCIAL_TEMPLATE.sectionWeights);
// { "financial highlights": 1.5, "risk factors": 1.3, ... }

Vectorless defines clean interfaces for all external dependencies:
| Port | Purpose |
|---|---|
| PdfParserPort | PDF text extraction |
| CachePort | Caching layer |
| KnowledgeBaseRepositoryPort | KB persistence |
| GreedySearchPort | Greedy search algorithm |
| MCTSSearchPort | MCTS search algorithm |
| QueryClassifierPort | Query classification |
| ContentExtractorPort | Content extraction |
| MultiNodeAggregatorPort | Content aggregation |
| PreferenceStorePort | User preferences & templates |
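
The ports and adapters idea can be sketched with a cache: application code depends on an interface, and an in-memory adapter satisfies it. The interface below is a simplified, synchronous stand-in; the real `CachePort` contract is not shown in this README and may well be asynchronous. `MemoryCacheAdapter` and `cachedSummary` are hypothetical names.

```typescript
// Hypothetical port: the real CachePort contract may differ (for example, it may be async).
interface CachePort<V> {
  get(key: string): V | undefined;
  set(key: string, value: V, ttlMs?: number): void;
}

// In-memory adapter with an optional time-to-live; the clock is injectable for tests.
class MemoryCacheAdapter<V> implements CachePort<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, ttlMs: number = Number.POSITIVE_INFINITY): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}

// Application code depends only on the port, so adapters can be swapped freely.
function cachedSummary(
  cache: CachePort<string>,
  sectionId: string,
  compute: () => string,
  ttlMs?: number,
): string {
  const hit = cache.get(sectionId);
  if (hit !== undefined) return hit;
  const value = compute();
  cache.set(sectionId, value, ttlMs);
  return value;
}

const summaryCache = new MemoryCacheAdapter<string>();
console.log(cachedSummary(summaryCache, "sec-1", () => "Summary of section 1"));
```

Swapping the in-memory adapter for a file-backed or remote one would not require touching `cachedSummary`, which is the main benefit of routing dependencies through ports.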
For UI components and full integration:
pnpm add @onegenui/vectorless

import { useTreeSearch, useMultiDocSearch } from "@onegenui/vectorless";

function SearchPanel() {
  const { search, results, isSearching } = useTreeSearch(model);
  // ...
}