MindooDB provides an incremental change processing mechanism via iterateChangesSince() that enables efficient scanning of documents that have changed since a given cursor. This document explores various strategies for building dynamic indexes and query systems on top of MindooDB, inspired by proven approaches from other database systems.
MindooDB maintains an internal index that tracks documents sorted by (lastModified, docId), enabling efficient incremental processing. The iterateChangesSince() method:
- Accepts a cursor `{ lastModified: number, docId: string }` to resume from a specific position
- Processes documents in modification order (oldest first)
- Supports consumer-driven batching: as an async generator, it can be stopped at any time and resumed later from the last cursor observed
- Returns (or exposes) the cursor of each processed document for continuation
- Handles deleted documents (they remain in the index and are marked with `MindooDoc.isDeleted() === true`, which is useful for removing them from external indexes)
This mechanism provides the foundation for building persistent indexes that can be incrementally updated as documents change, without requiring full database scans.
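The consumption pattern can be sketched as follows. The change feed below is an in-memory stand-in, and the cursor and change shapes merely mirror the description above; none of this is the actual MindooDB API.

```typescript
// Cursor and change shapes mirroring the description above (illustrative only)
type Cursor = { lastModified: number; docId: string };
type Change = { docId: string; lastModified: number; deleted: boolean };

// Stand-in for iterateChangesSince(): yields changes strictly after the cursor,
// using (lastModified, docId) ordering
async function* changesSince(all: Change[], cursor: Cursor | null): AsyncGenerator<Change> {
  for (const change of all) {
    if (cursor && (change.lastModified < cursor.lastModified ||
        (change.lastModified === cursor.lastModified && change.docId <= cursor.docId))) {
      continue; // already processed in an earlier run
    }
    yield change;
  }
}

// Consumer-driven batching: stop after batchSize items, return the resume cursor
async function processBatch(all: Change[], cursor: Cursor | null, batchSize: number) {
  const processed: string[] = [];
  for await (const change of changesSince(all, cursor)) {
    processed.push(change.docId);
    cursor = { lastModified: change.lastModified, docId: change.docId };
    if (processed.length >= batchSize) break; // resume later from cursor
  }
  return { processed, cursor };
}

const feed: Change[] = [
  { docId: "a", lastModified: 1, deleted: false },
  { docId: "b", lastModified: 2, deleted: false },
  { docId: "c", lastModified: 3, deleted: true },
];
// A first batch of 2 processes "a" and "b"; resuming from the returned
// cursor yields only "c".
```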
The following sections discuss various indexing approaches for NoSQL databases. Of these, we have currently implemented the Virtual Views API.
Concept: CouchDB uses JavaScript map/reduce functions to build views (indexes) that extract and compute values from documents. The map function emits key-value pairs, and reduce functions aggregate results.
Application to MindooDB:
- Map Function: User-defined function that takes a `MindooDoc` and emits `(key, value)` pairs
- Reduce Function (optional): Aggregates values for the same key (e.g., count, sum, average)
- Incremental Updates: Use `iterateChangesSince()` to process only documents changed since the last index update
- Persistent Index: Store the emitted key-value pairs in a separate index structure
Example Use Case:
```typescript
// Map function: extract category and date from documents.
// `emit` is provided by the view engine and collects (key, value) pairs.
function mapFunction(doc: MindooDoc) {
  const data = doc.getData();
  if (data.category && data.createdAt) {
    emit([data.category, data.createdAt], {
      docId: doc.getId(),
      title: data.title,
      lastModified: doc.getLastModified()
    });
  }
}

// Reduce function: count documents per category
function reduceFunction(keys, values) {
  return values.length;
}
```

Benefits:
- Flexible: Users define what to index via map functions
- Incremental: Only changed documents need reprocessing
- Proven pattern: Widely understood from CouchDB
- Supports complex queries: Multiple keys, ranges, aggregations
Incremental Update Mechanism:
The key to efficient incremental map/reduce is caching map outputs in the index. Here's how it works:
- Store Map Outputs: For each document, store the key-value pairs emitted by the map function. Index structure:
  - `Map<docId, Array<{key, value}>>` - tracks what each document emitted
  - `Map<key, Array<{docId, value}>>` (view index) - organizes entries by emitted keys
- Document Updates: When a document changes via `iterateChangesSince()`:
  - Remove old entries: Look up cached map outputs for the document and remove those key-value pairs from the view index
  - Add new entries: Run the map function on the updated document and add the new key-value pairs to the view index
  - Update cache: Replace the cached map outputs with the new ones
  - Incremental reduce: Update reduce results by subtracting old values and adding new values
- Document Deletions: When a document is deleted:
  - Remove entries: Look up cached map outputs and remove all key-value pairs from the view index
  - Clear cache: Remove the document from the map output cache
  - Incremental reduce: Subtract the deleted document's contributions from the reduce results
Example Implementation:
```typescript
class MapReduceView {
  // Cache of map outputs per document
  private mapOutputs: Map<string, Array<{key: any, value: any}>> = new Map();
  // View index: entries organized by (serialized) emitted key
  private viewIndex: Map<string, Array<{docId: string, value: any}>> = new Map();
  // Reduce cache: pre-computed reduce results per key
  private reduceCache: Map<string, any> = new Map();

  constructor(
    private mapFunction: (doc: MindooDoc) => Array<{key: any, value: any}>,
    private reduceFunction: (key: any, values: any[]) => any
  ) {}

  async updateDocument(doc: MindooDoc) {
    const docId = doc.getId();
    // Get old map outputs (if the document existed before)
    const oldOutputs = this.mapOutputs.get(docId) || [];
    // Compute new map outputs
    const newOutputs = this.mapFunction(doc);
    // Remove old entries from the view index
    for (const {key} of oldOutputs) {
      this.removeFromViewIndex(key, docId);
      this.updateReduce(key);
    }
    // Add new entries to the view index
    for (const {key, value} of newOutputs) {
      this.addToViewIndex(key, docId, value);
      this.updateReduce(key);
    }
    // Update cache
    this.mapOutputs.set(docId, newOutputs);
  }

  async deleteDocument(docId: string) {
    const oldOutputs = this.mapOutputs.get(docId);
    if (!oldOutputs) return;
    // Remove all entries emitted by the deleted document
    for (const {key} of oldOutputs) {
      this.removeFromViewIndex(key, docId);
      this.updateReduce(key);
    }
    // Clear cache
    this.mapOutputs.delete(docId);
  }

  private addToViewIndex(key: any, docId: string, value: any) {
    const k = JSON.stringify(key);
    const entries = this.viewIndex.get(k) || [];
    entries.push({docId, value});
    this.viewIndex.set(k, entries);
  }

  private removeFromViewIndex(key: any, docId: string) {
    const k = JSON.stringify(key);
    const entries = (this.viewIndex.get(k) || []).filter(e => e.docId !== docId);
    if (entries.length > 0) {
      this.viewIndex.set(k, entries);
    } else {
      this.viewIndex.delete(k);
    }
  }

  // Note: this sketch recomputes the reduce result from all current values
  // for the key. A truly incremental reduce would adjust the cached result
  // by subtracting the removed value and adding the new one, which requires
  // the reduce function to be associative and commutative (see below).
  private updateReduce(key: any) {
    const k = JSON.stringify(key);
    const currentValues = (this.viewIndex.get(k) || []).map(e => e.value);
    this.reduceCache.set(k, this.reduceFunction(key, currentValues));
  }
}
```

Reduce Function Requirements:
Critical Limitation: NOT all reduce functions work incrementally!
For incremental updates to work correctly, reduce functions must be:
- Associative: `reduce(a, reduce(b, c)) === reduce(reduce(a, b), c)` - order of grouping doesn't matter
- Commutative: `reduce(a, b) === reduce(b, a)` - order of operands doesn't matter
This allows partial reductions to be combined in any order, which is essential for incremental updates where we subtract old values and add new values.
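For example, because addition is associative and commutative, a cached sum can be patched in place when one document's contribution changes, and the result agrees with a full recomputation:

```typescript
// A key's cached reduce result (sum of all emitted values)
const cachedSum = [10, 20, 30].reduce((a, b) => a + b, 0); // 60

// One document's emitted value changes from 20 to 25: subtract old, add new
const patchedSum = cachedSum - 20 + 25;

// Full recomputation over the updated value set agrees with the patch
const recomputedSum = [10, 25, 30].reduce((a, b) => a + b, 0);
// patchedSum === recomputedSum === 65
```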
Reduce Functions That Work Incrementally:
✅ Count: `(keys, values) => values.length`
// Incremental: count = oldCount - removedCount + newCount

✅ Sum: `(keys, values) => values.reduce((a, b) => a + b, 0)`
// Incremental: sum = oldSum - removedSum + newSum

✅ Min/Max: `(keys, values) => Math.min(...values)` or `Math.max(...values)`
// Incremental: need to track all values or recompute, but it can be done
// More efficient: store a sorted list and update it incrementally

✅ Average (with count): Store both sum and count, compute the average on query
// Incremental: sum and count updated separately, average computed on demand
reduce: `(keys, values) => ({ sum: values.reduce((a, b) => a + b, 0), count: values.length })`
query: `(result) => result.sum / result.count`

✅ Set Union: `(keys, values) => [...new Set(values.flat())]`
// Incremental: can add/remove individual values

✅ Boolean OR/AND: `(keys, values) => values.some(v => v)` or `values.every(v => v)`
// Incremental: can update based on individual value changes

Reduce Functions That DON'T Work Incrementally:
❌ Last Value: `(keys, values) => values[values.length - 1]`
// Problem: order-dependent, not commutative
// If we remove a value, we don't know which was "last" without full recomputation

❌ Median: `(keys, values) => { const sorted = [...values].sort((a, b) => a - b); return sorted[Math.floor(sorted.length / 2)]; }`
// Problem: requires all values to compute, can't be updated incrementally
// Workaround: store all values, recompute the median on query (defeats the purpose)

❌ Standard Deviation: requires the mean and all values
// Problem: can't be computed as a plain reduce without storing all values
// (a running variance over count, sum, and sum of squares is possible,
// but is numerically fragile)
// Workaround: store all values, recompute on query

❌ Percentiles: requires a sorted list of all values
// Problem: order-dependent, requires the full value set

❌ Distinct Count with Details: `(keys, values) => ({ count: new Set(values).size, items: [...new Set(values)] })`
// Problem: while the count looks incremental, maintaining the full list of distinct
// items requires tracking all values, which defeats the incremental benefits

Why Some Functions Don't Work:
The fundamental issue is that incremental updates require:
- Subtracting old contributions: When a document changes, we need to remove its old map outputs
- Adding new contributions: Then add the new map outputs
For functions like "last value" or "median", removing a value changes the result in ways that can't be computed without knowing all remaining values. The result isn't just oldResult - removedValue + newValue; it depends on the entire set.
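A small numeric sketch illustrates this for the median: removing a value that is not itself the median still changes the result, so the new median cannot be derived from the old median and the removed value alone.

```typescript
// Median must be recomputed from the full value set: the old median plus
// the removed value is not enough information to derive the new median.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

const before = median([1, 2, 100]); // 2
const after = median([1, 2]);       // 1.5 -- removing 100 changed the result,
                                    // even though 100 was not the median itself
```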
Workarounds for Non-Incremental Reduce Functions:
- Store All Values: Keep all values in the reduce result and recompute on query
  - `reduce: (keys, values) => values` // store all values
  - `query: (result) => computeMedian(result)` // recompute on query
- ✅ Works for any function
- ❌ Defeats storage benefits of reduction
- ❌ Slower queries (full recomputation)
- Approximate Algorithms: Use algorithms that can be updated incrementally
- Approximate Median: Use streaming algorithms (e.g., T-Digest)
- Approximate Percentiles: Use quantile sketches
- ✅ Incremental updates possible
- ❌ Approximate results (may be acceptable)
- Hybrid Approach: Cache map outputs, but recompute reduce on query
- Still get efficient updates/deletions (via cached map outputs)
- Accept slower queries (recompute reduce)
- ✅ Works for any reduce function
- ❌ Slower queries
- Periodic Full Rebuild: For complex reduce functions, rebuild the index periodically
- Incremental updates for simple cases
- Full rebuild during maintenance window
- ✅ Works for any function
- ❌ Not real-time
Recommendation for MindooDB:
- Support incremental reduce for associative/commutative functions (count, sum, min, max, average)
- Support "store all values" mode for complex functions (median, percentiles) with clear documentation that queries will be slower
- Provide approximate algorithms where possible (e.g., approximate percentiles)
- Allow users to choose between incremental (fast queries) and recompute-on-query (works for any function)
Alternative: Recompute Reduce On-The-Fly
Instead of caching reduce results, you could:
- Cache only map outputs
- Recompute reduce results when queried by aggregating all values for a key
This trades slower queries for faster updates and lower storage. For most use cases, caching reduce results is preferred.
Challenges:
- Need to handle document deletions (remove from index) - solved by caching map outputs
- Map functions must be deterministic - required for consistency
- Reduce functions need to be associative and commutative - required for incremental updates
- Index storage and querying infrastructure required
- Storage overhead for caching map outputs and reduce results
CouchDB's Implementation (Reference):
CouchDB's incremental map/reduce is well-documented and provides a proven reference implementation:
Key Documentation:
- CouchDB Guide: Views: Comprehensive guide to how views work
- CouchDB Documentation: Design Documents: Official documentation on views and reduce functions
- CouchDB Performance Guide: Performance considerations for views
How CouchDB Does It:
- B-Tree Index Structure: CouchDB stores view indexes as B-trees where:
- Leaf nodes: Contain the actual map outputs (key-value pairs)
- Internal nodes: Contain pre-computed reduce values
- Each node stores reduce results, enabling efficient incremental updates
- Document Revision Tracking: CouchDB tracks document revisions (sequence numbers) to identify:
- Which documents have changed since last view update
- Which map outputs need to be removed (old document state)
- Which map outputs need to be added (new document state)
- Incremental Update Process: When a document changes, CouchDB:
  - Identifies the affected keys in the B-tree (from the old map outputs)
  - Removes old map outputs from leaf nodes
  - Adds new map outputs to leaf nodes
  - Recalculates reduce values only for the affected internal nodes (bottom-up)
  - Updates the B-tree structure as needed
- Reduce Function Requirements: CouchDB requires reduce functions to be:
  - Associative: `reduce(a, reduce(b, c)) === reduce(reduce(a, b), c)`
  - Commutative: `reduce(a, b) === reduce(b, a)`
  - This allows partial reductions to be combined in any order
- Built-in Reduce Functions: CouchDB provides optimized built-in reduce functions:
  - `_count`: Count documents
  - `_sum`: Sum numeric values
  - `_stats`: Statistical aggregations (min, max, sum, count)
  - These are implemented in Erlang (not JavaScript) for better performance
Key Insight from CouchDB:
CouchDB doesn't explicitly "cache" map outputs in a separate structure. Instead, the B-tree index itself serves as the cache:
- Map outputs are stored in leaf nodes of the B-tree
- The B-tree is organized by emitted keys
- Document IDs are stored alongside map outputs, allowing efficient lookup
- When a document changes, CouchDB can find and remove old entries by traversing the B-tree
For MindooDB Implementation:
We can learn from CouchDB's approach:
- Use B-tree or similar structure for the view index (organized by keys)
- Store document ID with map outputs to enable efficient removal
- Track document sequence numbers (or use `iterateChangesSince()` cursors) to identify changes
- Pre-compute reduce values in internal nodes for fast queries
- Require associative/commutative reduce functions for incremental updates
Open Source Options:
- PouchDB: Client-side CouchDB implementation with map/reduce views - uses this incremental approach
- PouchDB's source code is a good reference: GitHub - pouchdb/pouchdb
- Look at the `mapreduce` module for view implementation details
- CouchDB Source Code: GitHub - apache/couchdb
- The view index implementation is in Erlang
- Can study the B-tree structure and update algorithms
- Hoodie: Uses PouchDB, could provide inspiration for view management
- Custom Implementation: Build a lightweight map/reduce engine tailored to MindooDB, inspired by CouchDB's approach
📚 See VirtualView Documentation for comprehensive documentation, API reference, and examples.
Concept: HCL Notes/Domino uses categorized views that group documents into hierarchical categories based on field values. Categories can be nested, and documents appear under their category paths.
Implementation: MindooDB includes a full VirtualView implementation, inspired by and adapted from Karsten Lehmann's Domino JNA project (klehmann/domino-jna). This provides:
- VirtualViewColumn: Define columns for categorization, sorting, display, and totals
- VirtualViewEntryData: Tree nodes representing categories or documents
- VirtualViewNavigator: Hierarchical navigation with expand/collapse and selection
- MindooDBVirtualViewDataProvider: Incremental updates via `iterateChangesSince()`
- Multi-Database Views: Combine documents from multiple MindooDB instances
- Multi-Tenant Views: Span views across different MindooTenants
- Category Totals: Built-in SUM and AVERAGE aggregations
Quick Example:
```typescript
import { VirtualViewFactory, ColumnSorting, TotalMode } from "./indexing/virtualviews";

// Create a categorized view with totals
const view = await VirtualViewFactory.createView()
  .addCategoryColumn("department", { sorting: ColumnSorting.ASCENDING })
  .addCategoryColumn("year", { sorting: ColumnSorting.DESCENDING })
  .addSortedColumn("lastName")
  .addTotalColumn("salary", TotalMode.SUM)
  .withDB("employees", employeeDB)
  .buildAndUpdate();

// Navigate the view
const nav = VirtualViewFactory.createNavigator(view).expandAll().build();
for await (const entry of nav.entriesForward()) {
  if (entry.isCategory()) {
    console.log(`${entry.getCategoryValue()}: Total $${entry.getColumnValue("salary")}`);
  }
}
```

Benefits:
- ✅ Intuitive: Natural way to organize documents
- ✅ Efficient: Direct category lookups with sorted navigation
- ✅ Hierarchical: Supports nested categories (use backslash: `"2024\\Q1\\January"`)
- ✅ Multi-Database: Combine documents from multiple sources
- ✅ Incremental: Efficient updates via `iterateChangesSince()`
- ✅ Totals: Built-in SUM and AVERAGE for category rows
Source:
- `src/indexing/virtualviews/` - Full TypeScript implementation
- Inspired by Domino JNA VirtualView
Concept: FlexSearch is a high-performance, memory-efficient full-text search library that supports incremental indexing and real-time search.
Application to MindooDB:
- Index Creation: Create FlexSearch indexes for text fields in documents
- Incremental Updates: Use `iterateChangesSince()` to add/update/remove documents from indexes
- Multiple Indexes: Support separate indexes for different fields or document types
- Query Interface: Provide search API that queries FlexSearch indexes
Example Use Case:
```typescript
import { Index } from "flexsearch";

// Create an index for document titles and content
const searchIndex = new Index({
  tokenize: "forward",
  cache: 100
});

// Index a document
function indexDocument(doc: MindooDoc) {
  const data = doc.getData();
  searchIndex.add(doc.getId(), `${data.title} ${data.content}`);
}

// Search
const results = searchIndex.search("query text");
```

Benefits:
- High Performance: Fast search with low memory footprint
- Incremental: Supports add/update/delete operations
- Flexible: Configurable tokenization, stemming, etc.
- Real-time: Updates immediately queryable
- Open Source: Well-maintained library
Challenges:
- Need to extract text from documents (which fields to index?)
- Handling document deletions
- Multiple language support
- Index persistence (save/load indexes)
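The wiring between the change feed and a text index can be sketched with a minimal interface; FlexSearch's `Index` offers the same `add`/`update`/`remove` operations, so it could back this interface directly. The in-memory implementation and the `Change` shape below are illustrative stand-ins, not the FlexSearch or MindooDB API.

```typescript
// Minimal abstraction over a text index (FlexSearch could implement this)
interface TextIndex {
  add(id: string, text: string): void;
  update(id: string, text: string): void;
  remove(id: string): void;
  search(term: string): string[];
}

// Naive in-memory stand-in, for illustration only
class NaiveTextIndex implements TextIndex {
  private texts = new Map<string, string>();
  add(id: string, text: string) { this.texts.set(id, text); }
  update(id: string, text: string) { this.texts.set(id, text); }
  remove(id: string) { this.texts.delete(id); }
  search(term: string): string[] {
    return [...this.texts.entries()]
      .filter(([, text]) => text.includes(term))
      .map(([id]) => id);
  }
}

// Hypothetical change-feed entry, mirroring what iterateChangesSince() yields
interface Change { id: string; deleted: boolean; text: string; }

// Keep the text index in sync with a batch of changes
function applyChanges(index: TextIndex, changes: Change[]) {
  for (const change of changes) {
    if (change.deleted) index.remove(change.id);
    else index.update(change.id, change.text); // update() also covers new ids here
  }
}

const idx = new NaiveTextIndex();
applyChanges(idx, [
  { id: "a", deleted: false, text: "hello world" },
  { id: "b", deleted: false, text: "hello there" },
  { id: "a", deleted: true, text: "" }
]);
// idx.search("hello") now returns only ["b"]
```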
Open Source Options:
- FlexSearch: Primary option, actively maintained
- MiniSearch: Alternative lightweight option
- Lunr.js: Client-side full-text search
Concept: Create indexes on specific document fields, similar to traditional database indexes (B-tree, hash indexes, etc.).
Application to MindooDB:
- Index Definition: Define indexes on specific fields (e.g., `data.email`, `data.createdAt`)
- Index Types: Support different index types:
  - B-Tree: For sorted queries, ranges, comparisons
  - Hash: For exact lookups
  - Composite: Multiple fields (e.g., `[data.category, data.priority]`)
- Incremental Updates: Update indexes as documents change via `iterateChangesSince()`
Example Use Case:
```typescript
// Define indexes (pseudocode: a real index manager would turn these
// definitions into index objects exposing get()/range())
const indexes = {
  email: { type: "hash", field: "email" },
  createdAt: { type: "btree", field: "createdAt" },
  categoryPriority: { type: "btree", fields: ["category", "priority"] }
};

// Query
const results = indexes.email.get("user@example.com");
const rangeResults = indexes.createdAt.range(startDate, endDate);
```

Benefits:
- Familiar: Well-understood indexing model
- Efficient: Optimized for specific query patterns
- Flexible: Support multiple index types
- Standard: Similar to SQL indexes
Challenges:
- Need to define indexes upfront
- Storage overhead for multiple indexes
- Index maintenance complexity
- Handling schema changes
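As a minimal sketch, a hash index on a single field can be maintained incrementally by remembering the last indexed value per document; removing the stale entry first also handles field changes and deletions. The `Doc` shape is illustrative, not the MindooDB API.

```typescript
// Illustrative document shape (not the MindooDB API)
type Doc = { id: string; deleted: boolean; data: Record<string, any> };

class HashFieldIndex {
  private byValue = new Map<any, Set<string>>(); // field value -> doc ids
  private byDoc = new Map<string, any>();        // doc id -> last indexed value

  constructor(private field: string) {}

  update(doc: Doc) {
    // Remove the previous entry for this document, if any
    const oldValue = this.byDoc.get(doc.id);
    if (oldValue !== undefined) {
      this.byValue.get(oldValue)?.delete(doc.id);
      this.byDoc.delete(doc.id);
    }
    if (doc.deleted) return;
    const value = doc.data[this.field];
    if (value === undefined) return;
    if (!this.byValue.has(value)) this.byValue.set(value, new Set());
    this.byValue.get(value)!.add(doc.id);
    this.byDoc.set(doc.id, value);
  }

  get(value: any): string[] {
    return [...(this.byValue.get(value) ?? [])];
  }
}

const emailIndex = new HashFieldIndex("email");
emailIndex.update({ id: "u1", deleted: false, data: { email: "a@example.com" } });
emailIndex.update({ id: "u2", deleted: false, data: { email: "a@example.com" } });
// u1 changes its email: the stale "a@example.com" entry is removed first
emailIndex.update({ id: "u1", deleted: false, data: { email: "b@example.com" } });
// emailIndex.get("a@example.com") -> ["u2"]; get("b@example.com") -> ["u1"]
```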
Open Source Options:
- LevelDB/RocksDB: Key-value stores with range queries
- LMDB: Lightning Memory-Mapped Database
- BTree-JS: JavaScript B-tree implementation
- Sorted Set Libraries: For sorted indexes
Concept: Materialized views are pre-computed query results stored as tables/indexes that are updated when underlying data changes.
Application to MindooDB:
- View Definition: Define views as queries over documents (e.g., "all active tasks grouped by project")
- Incremental Updates: Use `iterateChangesSince()` to update views incrementally
- Query Interface: Query the materialized view instead of scanning all documents
- Multiple Views: Support multiple views for different query patterns
Example Use Case:
```typescript
// Define a materialized view (pseudocode: the view engine would build
// the aggregates and expose get())
const view = {
  name: "active-tasks-by-project",
  query: (doc: MindooDoc) => {
    const data = doc.getData();
    return data.status === "active" && data.project;
  },
  groupBy: "project",
  aggregate: {
    count: true,
    sum: "effort",
    avg: "priority"
  }
};

// Query the view
const results = view.get("ProjectA"); // Returns aggregated data for ProjectA
```

Benefits:
- Fast Queries: Pre-computed results
- Complex Aggregations: Support sum, count, average, etc.
- Incremental: Only changed documents need reprocessing
- Flexible: Can define multiple views
Challenges:
- View maintenance overhead
- Storage for materialized results
- Handling deletions and updates
- View refresh strategies
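The update logic for such a view can be sketched as follows: each document's previous contribution is subtracted before its new one is added, which also cleanly handles documents that stop matching the view. The `Task` shape is illustrative, not the MindooDB API.

```typescript
// Illustrative task document shape (not the MindooDB API)
type Task = { id: string; deleted: boolean; status: string; project: string; effort: number };

class ActiveTasksByProject {
  // What each document currently contributes to the view
  private contributions = new Map<string, { project: string; effort: number }>();
  // Aggregates per project
  private totals = new Map<string, { count: number; effortSum: number }>();

  update(doc: Task) {
    // Subtract the document's previous contribution, if any
    const old = this.contributions.get(doc.id);
    if (old) {
      const t = this.totals.get(old.project)!;
      t.count -= 1;
      t.effortSum -= old.effort;
      this.contributions.delete(doc.id);
    }
    // Add the new contribution only if the document still matches the view
    if (!doc.deleted && doc.status === "active") {
      const t = this.totals.get(doc.project) ?? { count: 0, effortSum: 0 };
      t.count += 1;
      t.effortSum += doc.effort;
      this.totals.set(doc.project, t);
      this.contributions.set(doc.id, { project: doc.project, effort: doc.effort });
    }
  }

  get(project: string) {
    return this.totals.get(project) ?? { count: 0, effortSum: 0 };
  }
}

const mv = new ActiveTasksByProject();
mv.update({ id: "t1", deleted: false, status: "active", project: "A", effort: 3 });
mv.update({ id: "t2", deleted: false, status: "active", project: "A", effort: 5 });
// t1 is completed: its old contribution is subtracted, nothing new is added
mv.update({ id: "t1", deleted: false, status: "done", project: "A", effort: 3 });
// mv.get("A") -> { count: 1, effortSum: 5 }
```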
Open Source Options:
- Custom Implementation: Build view engine tailored to MindooDB
- In-Memory Aggregation Libraries: For computing aggregations
Concept: If documents contain relationships (references to other documents), build graph indexes to enable efficient relationship traversal.
Application to MindooDB:
- Relationship Extraction: Extract document references from fields (e.g., `data.parentId`, `data.relatedIds`)
- Graph Structure: Build adjacency lists or graph databases
- Incremental Updates: Update graph as documents change
- Query Interface: Support graph queries (e.g., "find all descendants", "find shortest path")
Example Use Case:
```typescript
// Extract relationships from a document
function extractRelationships(doc: MindooDoc) {
  const data = doc.getData();
  return {
    docId: doc.getId(),
    parentId: data.parentId,
    childrenIds: data.childrenIds || [],
    relatedIds: data.relatedIds || []
  };
}

// Graph queries
const descendants = graphIndex.getDescendants("doc123");
const path = graphIndex.findPath("doc1", "doc2");
```

Benefits:
- Relationship Queries: Efficient traversal of document relationships
- Hierarchical Data: Natural for tree structures
- Complex Queries: Support graph algorithms
Challenges:
- Need to identify relationships in documents
- Graph maintenance complexity
- Storage overhead
- Query language design
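As a sketch, an adjacency list plus a breadth-first traversal covers the "find all descendants" query; a real index would be fed its edges from `iterateChangesSince()`, which is omitted here for brevity.

```typescript
// Adjacency-list graph index with descendant traversal (illustrative sketch)
class GraphIndex {
  private children = new Map<string, Set<string>>();

  addEdge(parentId: string, childId: string) {
    if (!this.children.has(parentId)) this.children.set(parentId, new Set());
    this.children.get(parentId)!.add(childId);
  }

  // Breadth-first traversal collecting all transitive children;
  // the `seen` set guards against cycles
  getDescendants(docId: string): string[] {
    const result: string[] = [];
    const queue = [docId];
    const seen = new Set<string>([docId]);
    while (queue.length) {
      const current = queue.shift()!;
      for (const child of this.children.get(current) ?? []) {
        if (!seen.has(child)) {
          seen.add(child);
          result.push(child);
          queue.push(child);
        }
      }
    }
    return result;
  }
}

const graph = new GraphIndex();
graph.addEdge("root", "a");
graph.addEdge("root", "b");
graph.addEdge("a", "c");
// graph.getDescendants("root") -> ["a", "b", "c"]
```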
Open Source Options:
- Gun.js: Graph database (may be overkill)
- Neo4j: Full graph database (server-side)
- Custom Implementation: Lightweight graph structures
Concept: If documents have temporal aspects, build time-series indexes optimized for time-range queries.
Application to MindooDB:
- Time Extraction: Extract timestamps from documents (`createdAt`, `lastModified`, or custom time fields)
- Time-Series Structure: Use specialized data structures (e.g., time-ordered indexes, time buckets)
- Incremental Updates: Append new time points as documents change
- Query Interface: Support time-range queries, aggregations over time
Example Use Case:
```typescript
// Time-series index definition
const timeSeries = {
  field: "timestamp",
  granularity: "hour", // or "day", "month"
  aggregations: ["count", "sum", "avg"]
};

// Query
const hourlyData = timeSeries.query({
  start: startTime,
  end: endTime,
  granularity: "hour"
});
```

Benefits:
- Time Queries: Optimized for temporal queries
- Aggregations: Efficient time-based aggregations
- Visualization: Good for charts and analytics
Challenges:
- Need temporal data in documents
- Storage for time-series data
- Aggregation computation
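A minimal time-bucket sketch, assuming millisecond timestamps and hourly granularity (illustrative shape, not a MindooDB API):

```typescript
// Hourly count buckets keyed by the start of the hour (in ms)
class HourlyCounts {
  private buckets = new Map<number, number>();

  private bucketOf(timestampMs: number): number {
    const HOUR = 3600 * 1000;
    return Math.floor(timestampMs / HOUR) * HOUR;
  }

  record(timestampMs: number) {
    const bucket = this.bucketOf(timestampMs);
    this.buckets.set(bucket, (this.buckets.get(bucket) ?? 0) + 1);
  }

  // Count events whose bucket falls in [startMs, endMs)
  query(startMs: number, endMs: number): number {
    let total = 0;
    for (const [bucket, count] of this.buckets) {
      if (bucket >= startMs && bucket < endMs) total += count;
    }
    return total;
  }
}

const counts = new HourlyCounts();
const HOUR = 3600 * 1000;
counts.record(0);        // hour 0
counts.record(HOUR + 1); // hour 1
counts.record(HOUR + 2); // hour 1
// counts.query(0, 2 * HOUR) -> 3; counts.query(HOUR, 2 * HOUR) -> 2
```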
Open Source Options:
- TimescaleDB: PostgreSQL extension (server-side)
- InfluxDB: Time-series database (server-side)
- Custom Implementation: Lightweight time-series structures
Different indexing strategies can be combined:
- Full-Text + Field Indexes: Use FlexSearch for text search, B-tree indexes for structured queries
- Map/Reduce + Categorized Views: Use map/reduce to build categorized views
- Materialized Views + Time-Series: Pre-compute time-based aggregations
- Graph + Full-Text: Index document relationships and content
An index manager could coordinate multiple indexes:
```typescript
class IndexManager {
  private indexes: Map<string, Index> = new Map();

  constructor(private db: MindooDB) {}

  async processChanges(cursor: ProcessChangesCursor | null) {
    // Process the change feed once, fanning each document out to all indexes
    for await (const { doc, cursor: currentCursor } of this.db.iterateChangesSince(cursor)) {
      for (const [name, index] of this.indexes) {
        await index.update(doc);
      }
      cursor = currentCursor;
    }
    return cursor; // resume point for the next run
  }
}
```

When implementing incremental map/reduce views, there are two main approaches:
Approach 1: Cache Map Values (Recommended)
This is how CouchDB and similar systems work:
- Store map outputs: For each document, cache the `(key, value)` pairs emitted by the map function
- Index structure: Maintain both:
  - `docId → [map outputs]` - tracks what each document emitted
  - `key → [docId, value]` - organizes by emitted keys for efficient querying
- Incremental updates: When a document changes:
- Look up cached map outputs for the document
  - Remove old `(key, value)` pairs from the view index
  - Run the map function on the updated document
  - Add new `(key, value)` pairs to the view index
  - Update cached map outputs
- Incrementally update reduce results (subtract old, add new)
Benefits:
- ✅ Efficient deletions: Can remove document from view without re-running map function
- ✅ Efficient updates: Only need to diff old vs new map outputs
- ✅ Fast reduce updates: Can incrementally update reduce results
- ✅ Handles document state changes: If document no longer matches map criteria, old entries are cleanly removed
Storage Cost:
- Stores map outputs for every document (even if document is deleted from view)
- Typically acceptable: map outputs are usually small (just keys and values)
Approach 2: Recompute Reduce On-The-Fly
Alternative approach that doesn't cache map outputs:
- No map cache: Don't store what each document emitted
- On update: Re-run map function on document, but don't know what it emitted before
- Problem: Can't efficiently remove old entries without knowing what they were
- Solution: Mark document as "dirty" and rebuild reduce for affected keys
Benefits:
- ✅ Lower storage: No map output cache
- ✅ Simpler: Less state to manage
Drawbacks:
- ❌ Inefficient deletions: Need to rebuild reduce for potentially many keys
- ❌ Can't efficiently remove old entries: Don't know what document emitted before
- ❌ Slower updates: May need to recompute reduce for multiple keys
Hybrid Approach: Cache Map, Recompute Reduce
Middle ground:
- Cache map outputs: Store what each document emitted
- Don't cache reduce results: Recompute reduce when queried
Benefits:
- ✅ Efficient updates/deletions: Can remove old entries using cached map outputs
- ✅ Lower storage: No reduce result cache
- ✅ Always fresh: Reduce results computed from current view state
Drawbacks:
- ❌ Slower queries: Reduce computed on-demand
- ❌ Still need map output cache: So storage savings are limited
For MindooDB, use Approach 1 (Cache Map Values):
- Cache map outputs per document: `Map<docId, Array<{key, value}>>`
- Maintain a view index: `Map<key, Array<{docId, value}>>` for efficient key lookups
- Cache reduce results: `Map<key, reducedValue>` for fast queries
- Incremental updates: Use cached map outputs to efficiently add/remove entries
Why this works well with iterateChangesSince():
```typescript
async function updateView(db: MindooDB, view: MapReduceView, cursor: ProcessChangesCursor | null) {
  for await (const { doc, cursor: currentCursor } of db.iterateChangesSince(cursor)) {
    cursor = currentCursor;
    if (doc.isDeleted()) {
      // Efficiently remove using cached map outputs
      await view.deleteDocument(doc.getId());
    } else {
      // Efficiently update using cached map outputs
      await view.updateDocument(doc);
    }
  }
}
```

The cached map outputs enable efficient incremental updates because:
- We know exactly what to remove (old map outputs)
- We know exactly what to add (new map outputs)
- We can incrementally update reduce results (subtract old, add new)
- No need to scan or recompute anything
Indexes need to be persisted to survive restarts:
- File-Based: Save indexes to disk (JSON, binary formats)
- Database-Backed: Store indexes in a separate database (LevelDB, SQLite)
- Incremental Save: Save only changes, not full index
- Checkpointing: Periodic full saves for recovery
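A file-based checkpoint can be sketched as the index contents serialized together with the last processed cursor, so a restart resumes indexing from where it left off. The shapes and file path below are illustrative.

```typescript
import { writeFileSync, readFileSync, existsSync } from "node:fs";

// Checkpoint = index contents + the change cursor they reflect
interface Checkpoint {
  cursor: { lastModified: number; docId: string } | null;
  entries: Array<[string, string[]]>; // serialized index contents
}

function saveCheckpoint(path: string, cursor: Checkpoint["cursor"], index: Map<string, string[]>) {
  const checkpoint: Checkpoint = { cursor, entries: [...index.entries()] };
  writeFileSync(path, JSON.stringify(checkpoint));
}

function loadCheckpoint(path: string): { cursor: Checkpoint["cursor"]; index: Map<string, string[]> } {
  // No checkpoint yet: start from scratch with a null cursor (full scan)
  if (!existsSync(path)) return { cursor: null, index: new Map() };
  const checkpoint: Checkpoint = JSON.parse(readFileSync(path, "utf8"));
  return { cursor: checkpoint.cursor, index: new Map(checkpoint.entries) };
}

// Round-trip example
const index = new Map([["active", ["doc1", "doc2"]]]);
saveCheckpoint("/tmp/index-checkpoint.json", { lastModified: 42, docId: "doc2" }, index);
const restored = loadCheckpoint("/tmp/index-checkpoint.json");
// restored.cursor.lastModified === 42; restored.index.get("active") -> ["doc1", "doc2"]
```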
When document schemas or index definitions change:
- Version Indexes: Track index versions
- Migration: Support index migration/rebuild
- Backward Compatibility: Handle old index formats
- Batch Updates: Process multiple documents before updating indexes
- Lazy Indexing: Defer index updates for non-critical indexes
- Index Partitioning: Split large indexes into smaller ones
- Caching: Cache frequently accessed index data
- Partial Failures: Handle errors for individual documents without stopping entire index update
- Recovery: Ability to rebuild indexes from scratch if corrupted
- Validation: Verify index consistency
- Encryption: Indexes may contain sensitive data (consider encryption at rest)
- Access Control: Who can create/modify indexes?
- Audit Trail: Track index changes for compliance
- FlexSearch: High-performance, incremental full-text search
- MiniSearch: Lightweight alternative
- Lunr.js: Client-side full-text search
- LevelDB: Key-value store with range queries
- RocksDB: LevelDB fork with additional features
- LMDB: Lightning Memory-Mapped Database
- SQLite: Embedded SQL database for index storage
- PouchDB: Client-side CouchDB with map/reduce views
- Custom Implementation: Lightweight map/reduce engine
- BTree-JS: B-tree implementation
- Trie Libraries: For category hierarchies
- Sorted Set Libraries: For sorted indexes
- Gun.js: Graph database (may be overkill for simple relationships)
- Custom Implementation: Lightweight graph structures
Based on the analysis, a hybrid approach combining multiple strategies is recommended:
- FlexSearch for Full-Text Search: Primary choice for text search capabilities
- Map/Reduce for Flexible Indexing: Allow users to define custom indexes via map functions
- Field-Based Indexes for Common Queries: B-tree indexes for structured queries (dates, numbers, etc.)
- Index Manager: Orchestrate multiple indexes, handle persistence, incremental updates
Phase 1: Foundation
- Build index manager infrastructure
- Implement FlexSearch integration
- Support incremental updates via `iterateChangesSince()`
- Basic index persistence
Phase 2: Map/Reduce
- Implement map/reduce engine
- Support user-defined map functions
- Basic reduce functions (count, sum)
Phase 3: Field Indexes
- B-tree indexes for common field types
- Composite indexes
- Range queries
Phase 4: Advanced Features
- Categorized views
- Materialized views
- Graph indexes (if needed)
MindooDB's iterateChangesSince() mechanism provides an excellent foundation for building dynamic, incremental indexes. The recommended approach is to:
- Start with FlexSearch for full-text search (proven, incremental, performant)
- Add map/reduce capabilities for flexible, user-defined indexes
- Complement with field-based indexes for structured queries
- Use an index manager to orchestrate multiple indexes
This hybrid approach provides the flexibility of map/reduce, the performance of specialized indexes, and the search capabilities of full-text indexing, all while leveraging MindooDB's incremental change processing.