Local-first SurrealDB + Node.js data intelligence app for large file collections.
Use it to turn mixed documents/spreadsheets/logs into durable structured memory that AI can query for patterns, correlations, and probability-style insights.
Current AI provider path: OpenRouter (single endpoint, multi-model routing for easy model comparisons). More providers (OpenAI/Anthropic/Together/local) are planned.
OpenCLAW operator skill bundle is included under openclaw/ for agent-driven operation of this project.
When fully running, you can:
- Create a cache (dataset workspace)
- Upload files (or load sample fixture)
- Generate data model + indexing strategy plans (domain extraction mapping + lexicon + table intents)
- Index into SurrealDB
- Ask in either:
  - Surreal only (no LLM cost)
  - Surreal + AI (OpenRouter)
`npm install` + `npm start` on `http://localhost:3000`

- Cache CRUD (create/list)
- File upload queue with file preview support
- Supported extraction formats:
  `.txt .md .csv .tsv .json .eml .sql .pdf .docx .xlsx .xls .doc .epub`
- Unknown extensions: best-effort raw-text fallback extraction
- Surreal indexing endpoint
- Index logs + readiness state
- Real index progress endpoint with ETA estimates (`/api/index-progress/:cacheId`)
- Load Existing Index path to reuse a prior index without re-scanning files
- Query endpoint with two modes:
- Surreal search mode (term-driven retrieval)
- Surreal retrieval + OpenRouter synthesis
- Query strategy controls (preset + custom notes)
- Convert natural-language questions to SurrealQL against current cache schema
- View current cache schema (table fields) in-app
- Index strategy controls (preset + custom notes)
- Data model + indexing strategy plan generation (main-topic/comprehensive/all-inclusive) with extraction mapping + lexicon + suppressions + priority relationships
- AI request preview for planning and outbound provider payload visibility
- Chat persistence per cache in `chat-logs/<cache-id>/<chat-id>.json`
- Continue conversations with context from prior turns in each chat
- OpenRouter key/model local storage + env-credential mode
- OpenRouter ping health check (green/red)
- Query cookbook UI (vector / relationship / time-window / anomaly patterns)
- SurrealQL conversion output panel + copy button
- Schema viewer + indexed-table export download
- OpenCLAW configuration export button + bundled operator skill docs (`openclaw/`)
- Manifest persistence:
  `indexes/<cache-id>/manifest.json` and `indexes/<cache-id>/snapshots/<timestamp>.json`
- Sample fixture for flow testing: `fixtures/sample-case-500w.txt`
This project now uses an investigation-first indexing strategy by default:
- Extract text from each file
- Store document metadata
- Chunk document text for retrieval
- Derive structured investigation signals per chunk:
- entities (people, orgs, emails)
- events (e.g., money/transfer-style amounts)
- activities (actions people took: met/called/sent/requested/etc.)
- intent signals (request, urgency, concealment, authorization-style language)
- location/time context (captured where detectable from text patterns)
- anomalies (concealment, threshold splitting, integrity mismatch indicators)
- relations (co-occurrence links between entities)
- At query time, combine:
- chunk retrieval
- structured table retrieval
- optional AI synthesis grounded in Surreal evidence
This is the current baseline strategy and will be iterated over time (better extraction quality, richer graph logic, stronger anomaly detection).
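To make the per-chunk signal derivation concrete, here is a minimal sketch in plain JavaScript. The function name and regex patterns are hypothetical illustrations, not the actual implementation in `lib/investigate.js`, which is richer:

```javascript
// Hypothetical sketch of per-chunk signal derivation.
// The real extractor in lib/investigate.js covers far more patterns.
function deriveSignals(chunkText) {
  return {
    // entities: bare email addresses found in the chunk
    entities: chunkText.match(/[\w.+-]+@[\w.-]+\.\w+/g) ?? [],
    // events: money/transfer-style amounts like "$9,500"
    events: chunkText.match(/\$\s?\d[\d,]*(?:\.\d{2})?/g) ?? [],
    // activities: simple action-verb hits (met/called/sent/requested)
    activities: (chunkText.match(/\b(met|called|sent|requested)\b/gi) ?? [])
      .map((v) => v.toLowerCase()),
    // intent signals: urgency/concealment-style language
    intents: (chunkText.match(/\b(urgent|off the books|do not tell)\b/gi) ?? [])
      .map((v) => v.toLowerCase()),
  };
}

const signals = deriveSignals(
  'Urgent: Alice (alice@harbor.example) requested that Bob be sent $9,500 off the books.'
);
console.log(signals.entities); // → ['alice@harbor.example']
```

Each derived signal array would then be written to its own Surreal table, keyed back to the source chunk, so that structured retrieval at query time can join signals to evidence.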
- Node 20+
- SurrealDB server running locally (or remote)
```sh
./scripts-start-surreal.sh
```

Alternative direct command:

```sh
surreal start --user root --pass root --bind 127.0.0.1:8000 file:./data/surreal.db
```

If you see datastore load errors on Mac, run this exact sequence:
```sh
cd /path/to/surreal-investigate
mkdir -p data
pkill -f "surreal start" || true
surreal start --user root --pass root --bind 127.0.0.1:8000 "surrealkv://$(pwd)/data/surreal.db"
```

Why this works:
- uses an absolute path via `$(pwd)`
- ensures the `data/` folder exists
- clears stale Surreal processes
- uses the `surrealkv://` engine explicitly
If your `surreal` binary is in `~/.local/bin/surreal`, use that full path.
```sh
npm install
cp .env.example .env   # optional but recommended
npm start
```

Open: http://localhost:3000
```sh
PORT=3000
SURREAL_URL=ws://127.0.0.1:8000/rpc
SURREAL_NS=surreal_investigate
SURREAL_DB=main
SURREAL_USER=root
SURREAL_PASS=root
INDEX_JOB_TIMEOUT_MS=1200000
AI_PROVIDER_URL=https://openrouter.ai/api/v1/chat/completions
AI_API_KEY=sk-or-v1-...
AI_MODEL=qwen/qwen3-32b
```

`INDEX_JOB_TIMEOUT_MS` defaults to 20 minutes.
`AI_PROVIDER_URL`, `AI_API_KEY`, and `AI_MODEL` enable secure `.env`-based credential loading in the UI.
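A sketch of how these variables might be consumed with the documented defaults. The `loadConfig` helper is illustrative, not the project's actual loader; only the variable names and defaults come from the settings above:

```javascript
// Hypothetical config loader: reads the environment variables documented
// above and falls back to the defaults this README lists.
function loadConfig(env = process.env) {
  return {
    port: Number(env.PORT ?? 3000),
    surrealUrl: env.SURREAL_URL ?? 'ws://127.0.0.1:8000/rpc',
    surrealNs: env.SURREAL_NS ?? 'surreal_investigate',
    surrealDb: env.SURREAL_DB ?? 'main',
    // 20 minutes by default, matching INDEX_JOB_TIMEOUT_MS=1200000
    indexJobTimeoutMs: Number(env.INDEX_JOB_TIMEOUT_MS ?? 20 * 60 * 1000),
    aiProviderUrl:
      env.AI_PROVIDER_URL ?? 'https://openrouter.ai/api/v1/chat/completions',
    aiModel: env.AI_MODEL ?? null,
  };
}

console.log(loadConfig({}).indexJobTimeoutMs); // 1200000
```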
- Create cache (e.g. `harbor-case`)
- Click Load Sample File
- Click Create / Refresh Index
- Confirm UI shows `Ready for questions ✅`
- Ask in Surreal only mode: `who moved money and through which entities?`
- (Optional) set OpenRouter key/model and click Ping
- Switch to Surreal + AI mode and ask the same question.
- `GET /api/health`
- `GET /api/config`
- `GET /api/credentials/env-load`
- `GET /api/caches`
- `POST /api/caches`
- `DELETE /api/cache-file/:cacheId/:fileId`
- `GET /api/chats/:cacheId`
- `POST /api/chats/:cacheId`
- `GET /api/chats/:cacheId/:chatId`
- `POST /api/upload`
- `POST /api/cache-quick-summary/:cacheId`
- `POST /api/index/feature-plans/:cacheId`
- `GET /api/index-recommendation/:cacheId`
- `POST /api/index/:cacheId`
- `GET /api/index-progress/:cacheId`
- `GET /api/index-manifest/:cacheId`
- `GET /api/index-data-export/:cacheId`
- `POST /api/index-diagnose/:cacheId`
- `GET /api/schema/:cacheId`
- `POST /api/convert-surrealql/:cacheId`
- `GET /api/openclaw-skill/:cacheId`
- `GET /api/index-profile/:cacheId`
- `POST /api/index-profile/:cacheId`
- `DELETE /api/index-profile/:cacheId`
- `POST /api/query/:cacheId`
- `POST /api/openrouter/ping`
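These endpoints can be driven programmatically. Below is a minimal sketch; the base URL assumes a default local run, and the request body shape for `/api/query/:cacheId` (`{ question, mode }`) is an assumption, not a documented contract — check `server.js` for the actual payload:

```javascript
// Build URLs for the API routes listed above (local base URL assumed).
const BASE = 'http://localhost:3000';
const api = (path) => `${BASE}${path}`;

// Example: ask a question against a cache.
// NOTE: the body shape ({ question, mode }) is an assumption;
// see server.js for the real contract.
async function askCache(cacheId, question) {
  const res = await fetch(api(`/api/query/${encodeURIComponent(cacheId)}`), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question, mode: 'surreal' }),
  });
  return res.json();
}

console.log(api('/api/health')); // http://localhost:3000/api/health
```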
```
surreal-investigate/
  lib/
    surreal.js
    extract.js
    investigate.js
  fixtures/
    sample-case-500w.txt
  public/
    index.html
    app.js
    styles.css
  uploads/
  indexes/
  data/
  chat-logs/
  server.js
```

- Recipe-specific relationship builders by dataset class (narrative/report/structured/geo/communications)
- Strong extraction-quality gates before "ready for questions"
- Read-only execution mode for generated SurrealQL
- Query timeline visualization and relationship map
- Additional provider adapters (OpenAI/Anthropic/Together/local)
