Vectorless 2.0 extends document processing from simple PDF indexing to a knowledge extraction and interactive exploration system. It extracts structured knowledge from documents, builds a queryable knowledge base, and answers questions about the documents with AI.
- Multi-Format Support: PDF, Word (.docx), Excel (.xlsx), Markdown, and plain text
- Entity Extraction: People, places, dates, organizations, concepts, events
- Relation Discovery: Find connections between document sections
- Quote Extraction: Identify significant quotes with attribution
- Knowledge Base: Build a structured, queryable knowledge graph
- AI Q&A: Ask questions and get sourced answers from documents
- MCP Integration: Use as MCP tools in AI chat applications
import fs from "node:fs";
import { generatePageIndex } from "@onegenui/vectorless";
import { createGoogleGenerativeAI } from "@ai-sdk/google";
const google = createGoogleGenerativeAI({ apiKey: process.env.GOOGLE_API_KEY });
const model = google("gemini-3-flash-preview-exp");
const pdfBuffer = fs.readFileSync("document.pdf");
const result = await generatePageIndex(pdfBuffer, { model });
console.log(result.tree); // Document structure tree
console.log(result.title); // Document title
console.log(result.hasToc); // Whether a table of contents was detected
import { generateKnowledgeBase } from "@onegenui/vectorless";
const buffer = fs.readFileSync("document.pdf");
const result = await generateKnowledgeBase(buffer, "document.pdf", "application/pdf", {
model,
extractEntities: true,
extractRelations: true,
extractQuotes: true,
extractKeywords: true,
extractCitations: true,
generateSummaries: true,
generateKeyInsights: true,
});
const kb = result.knowledgeBase;
console.log(kb.entities); // Extracted entities
console.log(kb.relations); // Relations between document sections
console.log(kb.quotes); // Significant quotes
console.log(kb.keywords); // Document keywords
console.log(kb.keyInsights); // AI-generated insights
Vectorless 2.0 follows a hexagonal architecture pattern:
┌─────────────────────────────────────────────────────┐
│ Application │
│ ┌─────────────────────────────────────────────┐ │
│ │ Use Cases │ │
│ │ • GenerateIndexUseCase │ │
│ │ • GenerateKnowledgeBaseUseCase │ │
│ │ • AnswerQuestionUseCase │ │
│ │ • DeepDiveUseCase │ │
│ └─────────────────────────────────────────────┘ │
│ ↑ │
│ Ports │
│ ↓ │
│ ┌─────────────────────────────────────────────┐ │
│ │ Adapters │ │
│ │ • PdfParseAdapter │ │
│ │ • MammothAdapter (Word) │ │
│ │ • XlsxAdapter (Excel) │ │
│ │ • MarkdownAdapter │ │
│ │ • MemoryCacheAdapter │ │
│ │ • MemoryKnowledgeBaseRepository │ │
│ └─────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
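The diagram above can be read as: use cases depend only on port interfaces, and adapters implement those ports. The following is a minimal sketch of that pattern, not the library's actual code. The names `DocumentParserPort`, `PlainTextAdapter`, `HeadingStrippingAdapter`, and `ExtractTextUseCase` are hypothetical, and the methods are synchronous only to keep the example short.

```typescript
// Hypothetical port: the only contract the application layer knows about.
interface DocumentParserPort {
  readonly mimeTypes: readonly string[];
  parse(content: Uint8Array): string;
}

// Hypothetical adapter for plain text: decodes the bytes as UTF-8.
class PlainTextAdapter implements DocumentParserPort {
  readonly mimeTypes = ["text/plain"];
  parse(content: Uint8Array): string {
    return new TextDecoder().decode(content);
  }
}

// Hypothetical adapter standing in for a real Markdown parser:
// it only strips leading "#" heading markers from each line.
class HeadingStrippingAdapter implements DocumentParserPort {
  readonly mimeTypes = ["text/markdown"];
  parse(content: Uint8Array): string {
    return new TextDecoder().decode(content).replace(/^#+[ \t]*/gm, "");
  }
}

// Hypothetical use case: picks an adapter by MIME type through the port,
// so adding a format means registering another adapter.
class ExtractTextUseCase {
  private readonly byMime = new Map<string, DocumentParserPort>();

  constructor(parsers: DocumentParserPort[]) {
    parsers.forEach((parser) => {
      parser.mimeTypes.forEach((mime) => this.byMime.set(mime, parser));
    });
  }

  execute(content: Uint8Array, mimeType: string): string {
    const parser = this.byMime.get(mimeType);
    if (!parser) {
      throw new Error(`No adapter registered for ${mimeType}`);
    }
    return parser.parse(content);
  }
}

const extractText = new ExtractTextUseCase([
  new PlainTextAdapter(),
  new HeadingStrippingAdapter(),
]);
const markdownBytes = new TextEncoder().encode("# Title\nBody");
console.log(extractText.execute(markdownBytes, "text/markdown"));
```

Because the use case never names a concrete adapter, a test can pass in a stub parser, and an in-memory cache or repository can later be swapped for a persistent one without touching the application layer.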
interface Entity {
id: string;
type: "person" | "date" | "place" | "concept" | "organization" | "event" | "number" | "term";
value: string;
normalized?: string;
description?: string;
occurrences: Array<{
nodeId: string;
pageNumber: number;
position?: number;
context?: string;
}>;
confidence?: number;
}
interface Relation {
id: string;
sourceNodeId: string;
targetNodeId: string;
type: "references" | "contradicts" | "supports" | "elaborates" | "precedes" | "follows" | "summarizes" | "defines" | "examples";
confidence: number;
evidence?: string;
}
interface Quote {
id: string;
text: string;
pageNumber: number;
nodeId: string;
significance: "key" | "supporting" | "notable";
speaker?: string;
context?: string;
}
interface KnowledgeNode {
id: string;
title: string;
level: number;
pageStart: number;
pageEnd: number;
summary: string;
detailedSummary?: string;
keyPoints: string[];
entities: Array<{ entityId: string; relevance?: number }>;
keywords: string[];
quotes: Array<{ quoteId: string }>;
internalRefs: string[];
externalRefs: Array<{ citationId: string }>;
metrics?: {
wordCount: number;
complexity: "low" | "medium" | "high";
importance: number;
readingTimeMinutes: number;
depth: number;
};
rawText?: string;
children: KnowledgeNode[];
}
interface DocumentKnowledgeBase {
id: string;
filename: string;
mimeType: string;
hash: string;
processedAt: string;
totalPages: number;
totalTokens: number;
tree: KnowledgeNode;
entities: Entity[];
relations: Relation[];
keywords: Keyword[];
quotes: Quote[];
citations: Citation[];
metrics: DocumentMetrics;
description: string;
keyInsights: string[];
}
Vectorless provides MCP (Model Context Protocol) tools for integration with AI chat:
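Each tool below is described by a `name` and an `arguments` object. In the Model Context Protocol, a client sends such a call as a JSON-RPC 2.0 request with the method `tools/call`. The sketch below shows how that envelope could be built by hand; the helper `toToolsCallRequest` and its types are hypothetical, and most MCP client libraries build this request for you.

```typescript
// Shape of a tool invocation as shown in this document's examples.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// JSON-RPC 2.0 envelope used by MCP for tool invocations.
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: ToolCall;
}

// Hypothetical helper: wraps a tool call in the JSON-RPC envelope.
// The caller supplies the request id so responses can be matched to requests.
function toToolsCallRequest(id: number, call: ToolCall): ToolsCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: call };
}

const request = toToolsCallRequest(1, {
  name: "answer-question",
  arguments: {
    knowledgeBaseId: "kb-123",
    question: "What are the main findings?",
  },
});
console.log(JSON.stringify(request, null, 2));
```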
Generate a structured index from a PDF document.
{
"name": "pdf-index",
"arguments": {
"url": "https://example.com/document.pdf",
"addSummaries": true,
"addDescription": true
}
}
Extract a full knowledge base from a document.
{
"name": "generate-knowledge-base",
"arguments": {
"url": "https://example.com/document.pdf",
"extractEntities": true,
"extractRelations": true,
"extractQuotes": true
}
}
Ask questions about a processed knowledge base.
{
"name": "answer-question",
"arguments": {
"knowledgeBaseId": "kb-123",
"question": "What are the main findings?"
}
}
List all processed knowledge bases.
{
"name": "list-knowledge-bases",
"arguments": {}
}
Vectorless 2.0 includes React components for visualization:
Navigate the document tree structure.
import { DocumentExplorer } from "@onegenui/components";
<DocumentExplorer
tree={knowledgeBase.tree}
onNodeSelect={(node) => console.log(node)}
expandedByDefault={true}
/>
Browse and filter extracted entities.
import { EntityExplorer } from "@onegenui/components";
<EntityExplorer
entities={knowledgeBase.entities}
onEntityClick={(entity) => console.log(entity)}
filterTypes={["person", "organization"]}
/>
Visualize entity relationships.
import { KnowledgeGraph } from "@onegenui/components";
<KnowledgeGraph
entities={knowledgeBase.entities}
relations={knowledgeBase.relations}
onNodeClick={(entity) => console.log(entity)}
/>
Display date/event entities on a timeline.
import { DocumentTimeline } from "@onegenui/components";
<DocumentTimeline
entities={knowledgeBase.entities}
onEntityClick={(entity) => console.log(entity)}
/>
Display document citations.
import { CitationViewer } from "@onegenui/components";
<CitationViewer
citations={knowledgeBase.citations}
onCitationClick={(citation) => console.log(citation)}
/>
Show detailed analysis for a selected node.
import { DeepAnalysisPanel } from "@onegenui/components";
<DeepAnalysisPanel
node={selectedNode}
entities={knowledgeBase.entities}
/>
Manage knowledge base state and queries.
import { useKnowledgeBase } from "@onegenui/components";
const {
knowledgeBase,
setKnowledgeBase,
entities,
relations,
quotes,
getEntityById,
getRelationsByNode,
searchEntities,
filterEntitiesByType,
} = useKnowledgeBase({ initialKnowledgeBase: kb });
Manage document tree navigation state.
import { useDocumentExplorer } from "@onegenui/components";
const {
tree,
selectedNode,
expandedNodes,
selectNode,
toggleNode,
expandAll,
collapseAll,
searchNodes,
getNodePath,
} = useDocumentExplorer({ initialTree: kb.tree });
Manage Q&A interaction state.
import { useQuestionAnswer } from "@onegenui/components";
const {
question,
setQuestion,
answer,
isLoading,
error,
askQuestion,
clearAnswer,
history,
} = useQuestionAnswer();
The following AI agents are available for customized extraction:
- EntityExtractorAgent: Extract named entities from text
- RelationExtractorAgent: Discover relations between sections
- QuoteExtractorAgent: Identify significant quotes
- KeywordExtractorAgent: Extract keywords with TF-IDF scoring
- MetricsCalculatorAgent: Calculate document metrics
- CitationResolverAgent: Parse and structure citations
- QuestionAnswerAgent: Answer questions from knowledge base
- DeepDiveAgent: Provide detailed analysis of topics
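The list above mentions TF-IDF scoring for keyword extraction. As an illustration of the technique only (this is not the KeywordExtractorAgent's code, and the tokenizer and function names are assumptions), the sketch below scores each term by its frequency within a section multiplied by the log of how rare it is across sections.

```typescript
// Simplistic tokenizer (an assumption): lowercase ASCII letters and digits only.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Returns one term -> score map per section, using
// tf = count / sectionLength and idf = ln(sectionCount / sectionsContainingTerm).
function tfIdf(sections: string[]): Map<string, number>[] {
  const tokenized = sections.map(tokenize);

  // Document frequency: in how many sections each term appears.
  const docFreq = new Map<string, number>();
  tokenized.forEach((tokens) => {
    Array.from(new Set(tokens)).forEach((term) => {
      docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
    });
  });

  return tokenized.map((tokens) => {
    const counts = new Map<string, number>();
    tokens.forEach((term) => counts.set(term, (counts.get(term) ?? 0) + 1));

    const scores = new Map<string, number>();
    counts.forEach((count, term) => {
      const tf = count / tokens.length;
      const idf = Math.log(sections.length / (docFreq.get(term) ?? 1));
      scores.set(term, tf * idf);
    });
    return scores;
  });
}

// Picks the k highest-scoring terms of one section.
function topKeywords(scores: Map<string, number>, k: number): string[] {
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    .map(([term]) => term);
}

const sectionScores = tfIdf(["the cat sat", "the dog ran", "the cat ran"]);
console.log(topKeywords(sectionScores[0], 1));
```

A term that appears in every section gets an inverse document frequency of zero, so common words such as "the" drop out of the ranking without a stop-word list.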
Generate a page index from a PDF document.
Parameters:
- buffer: Buffer | ArrayBuffer - PDF file contents
- options: PageIndexOptions - Configuration options
Returns: Promise<PageIndexResult>
Generate a full knowledge base from a document.
Parameters:
- buffer: Buffer | ArrayBuffer - Document contents
- filename: string - Original filename
- mimeType: string - Document MIME type
- options: KnowledgeBaseOptions - Configuration options
Returns: Promise<KnowledgeBaseResult>
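Since the call above takes both a filename and a MIME type, a small lookup helper can derive one from the other. The sketch below uses the standard registered MIME types for the supported formats. Only `application/pdf` appears in this document, so whether the library expects exactly these strings for the other formats is an assumption, and `mimeTypeFor` is a hypothetical helper.

```typescript
// Standard MIME types for the formats listed as supported.
const MIME_BY_EXTENSION: Record<string, string> = {
  ".pdf": "application/pdf",
  ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
  ".md": "text/markdown",
  ".txt": "text/plain",
};

// Hypothetical helper: returns the MIME type for a filename,
// or undefined when the extension is missing or not in the table.
function mimeTypeFor(filename: string): string | undefined {
  const dot = filename.lastIndexOf(".");
  if (dot < 0) {
    return undefined;
  }
  return MIME_BY_EXTENSION[filename.slice(dot).toLowerCase()];
}

console.log(mimeTypeFor("document.pdf"));
```

Returning `undefined` for unknown extensions lets the caller reject unsupported files before any processing starts.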
Apache 2.0