Local-first SurrealDB + Node.js data intelligence app for large file collections.
Use it to turn mixed documents/spreadsheets/logs into durable structured memory that AI can query for patterns, correlations, and probability-style insights.
Current AI provider path: OpenRouter (single endpoint, multi-model routing for easy model comparisons). More providers (OpenAI/Anthropic/Together/local) are planned.
OpenCLAW operator skill bundle is included under openclaw/ for agent-driven operation of this project.
When fully running, you can:
- Create a cache (dataset workspace)
- Upload files (or load sample fixture)
- Generate data model + indexing strategy plans (domain extraction mapping + lexicon + table intents)
- Index into SurrealDB
- Ask in either:
  - Surreal only (no LLM cost)
  - Surreal + AI (OpenRouter)
`npm install` + `npm start` on `http://localhost:3000`

- Cache CRUD (create/list)
- File upload queue with file preview support
- Supported extraction formats:
  `.txt .md .csv .tsv .json .eml .sql .pdf .docx .xlsx .xls .doc .epub`
- Unknown extensions: best-effort raw-text fallback extraction
- Surreal indexing endpoint
- Index logs + readiness state
- Real index progress endpoint with ETA estimates (`/api/index-progress/:cacheId`)
- Load Existing Index path to reuse a prior index without re-scanning files
- Query endpoint with two modes:
- Surreal search mode (term-driven retrieval)
- Surreal retrieval + OpenRouter synthesis
- Query strategy controls (preset + custom notes)
- Convert natural-language questions to SurrealQL against current cache schema
- View current cache schema (table fields) in-app
- Index strategy controls (preset + custom notes)
- Data model + indexing strategy plan generation (main-topic/comprehensive/all-inclusive) with extraction mapping + lexicon + suppressions + priority relationships
- AI request preview for planning and outbound provider payload visibility
- Chat persistence per cache in `chat-logs/<cache-id>/<chat-id>.json`
- Continue conversations with context from prior turns in each chat
- OpenRouter key/model local storage + env-credential mode
- OpenRouter ping health check (green/red)
- Query cookbook UI (vector / relationship / time-window / anomaly patterns)
- SurrealQL conversion output panel + copy button
- Schema viewer + indexed-table export download
- OpenCLAW configuration export button + bundled operator skill docs (`openclaw/`)
- Manifest persistence:
  `indexes/<cache-id>/manifest.json` and `indexes/<cache-id>/snapshots/<timestamp>.json`
- Sample fixture for flow testing: `fixtures/sample-case-500w.txt`
This project now uses an investigation-first indexing strategy by default:
- Extract text from each file
- Store document metadata
- Chunk document text for retrieval
- Derive structured investigation signals per chunk:
- entities (people, orgs, emails)
- events (e.g., money/transfer-style amounts)
- activities (actions people took: met/called/sent/requested/etc.)
- intent signals (request, urgency, concealment, authorization-style language)
- location/time context (captured where detectable from text patterns)
- anomalies (concealment, threshold splitting, integrity mismatch indicators)
- relations (co-occurrence links between entities)
- At query time, combine:
- chunk retrieval
- structured table retrieval
- optional AI synthesis grounded in Surreal evidence
This is the current baseline strategy and will be iterated over time (better extraction quality, richer graph logic, stronger anomaly detection).
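To make the per-chunk signal derivation concrete, here is a minimal sketch in plain JavaScript. The function name and regex patterns are hypothetical illustrations, not the actual implementation in `lib/investigate.js`, which is richer:

```javascript
// Hypothetical sketch of per-chunk signal derivation.
// The real extractor in lib/investigate.js covers far more patterns.
function deriveSignals(chunkText) {
  return {
    // entities: bare email addresses found in the chunk
    entities: chunkText.match(/[\w.+-]+@[\w.-]+\.\w+/g) ?? [],
    // events: money/transfer-style amounts like "$9,500"
    events: chunkText.match(/\$\s?\d[\d,]*(?:\.\d{2})?/g) ?? [],
    // activities: simple action-verb hits (met/called/sent/requested)
    activities: (chunkText.match(/\b(met|called|sent|requested)\b/gi) ?? [])
      .map((v) => v.toLowerCase()),
    // intent signals: urgency/concealment-style language
    intents: (chunkText.match(/\b(urgent|off the books|do not tell)\b/gi) ?? [])
      .map((v) => v.toLowerCase()),
  };
}

const signals = deriveSignals(
  'Urgent: Alice (alice@harbor.example) requested that Bob be sent $9,500 off the books.'
);
console.log(signals.entities); // → ['alice@harbor.example']
```

Each derived signal array would then be written to its own Surreal table, keyed back to the source chunk, so that structured retrieval at query time can join signals to evidence.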
- Node 20+
- SurrealDB server running locally (or remote)
```sh
./scripts-start-surreal.sh
```

Alternative direct command:

```sh
surreal start --user root --pass root --bind 127.0.0.1:8000 file:./data/surreal.db
```

If you see datastore load errors on Mac, run this exact sequence:
```sh
cd /path/to/surreal-investigate
mkdir -p data
pkill -f "surreal start" || true
surreal start --user root --pass root --bind 127.0.0.1:8000 "surrealkv://$(pwd)/data/surreal.db"
```

Why this works:
- uses an absolute path via `$(pwd)`
- ensures the `data/` folder exists
- clears stale Surreal processes
- uses the `surrealkv://` engine explicitly
If your `surreal` binary is in `~/.local/bin/surreal`, use that full path.
```sh
npm install
cp .env.example .env   # optional but recommended
npm start
```

Open: http://localhost:3000
```sh
PORT=3000
SURREAL_URL=ws://127.0.0.1:8000/rpc
SURREAL_NS=surreal_investigate
SURREAL_DB=main
SURREAL_USER=root
SURREAL_PASS=root
INDEX_JOB_TIMEOUT_MS=1200000
AI_PROVIDER_URL=https://openrouter.ai/api/v1/chat/completions
AI_API_KEY=sk-or-v1-...
AI_MODEL=qwen/qwen3-32b
```

`INDEX_JOB_TIMEOUT_MS` defaults to 20 minutes.
`AI_PROVIDER_URL`, `AI_API_KEY`, and `AI_MODEL` enable secure `.env`-based credential loading in the UI.
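A sketch of how these variables might be consumed with the documented defaults. The `loadConfig` helper is illustrative, not the project's actual loader; only the variable names and defaults come from the settings above:

```javascript
// Hypothetical config loader: reads the environment variables documented
// above and falls back to the defaults this README lists.
function loadConfig(env = process.env) {
  return {
    port: Number(env.PORT ?? 3000),
    surrealUrl: env.SURREAL_URL ?? 'ws://127.0.0.1:8000/rpc',
    surrealNs: env.SURREAL_NS ?? 'surreal_investigate',
    surrealDb: env.SURREAL_DB ?? 'main',
    // 20 minutes by default, matching INDEX_JOB_TIMEOUT_MS=1200000
    indexJobTimeoutMs: Number(env.INDEX_JOB_TIMEOUT_MS ?? 20 * 60 * 1000),
    aiProviderUrl:
      env.AI_PROVIDER_URL ?? 'https://openrouter.ai/api/v1/chat/completions',
    aiModel: env.AI_MODEL ?? null,
  };
}

console.log(loadConfig({}).indexJobTimeoutMs); // 1200000
```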
- Create cache (e.g. `harbor-case`)
- Click Load Sample File
- Click Create / Refresh Index
- Confirm UI shows `Ready for questions ✅`
- Ask in Surreal only mode: `who moved money and through which entities?`
- (Optional) set OpenRouter key/model and click Ping
- Switch to Surreal + AI mode and ask the same question.
- `GET /api/health`
- `GET /api/config`
- `GET /api/credentials/env-load`
- `GET /api/caches`
- `POST /api/caches`
- `DELETE /api/cache-file/:cacheId/:fileId`
- `GET /api/chats/:cacheId`
- `POST /api/chats/:cacheId`
- `GET /api/chats/:cacheId/:chatId`
- `POST /api/upload`
- `POST /api/cache-quick-summary/:cacheId`
- `POST /api/index/feature-plans/:cacheId`
- `GET /api/index-recommendation/:cacheId`
- `POST /api/index/:cacheId`
- `GET /api/index-progress/:cacheId`
- `GET /api/index-manifest/:cacheId`
- `GET /api/index-data-export/:cacheId`
- `POST /api/index-diagnose/:cacheId`
- `GET /api/schema/:cacheId`
- `POST /api/convert-surrealql/:cacheId`
- `GET /api/openclaw-skill/:cacheId`
- `GET /api/index-profile/:cacheId`
- `POST /api/index-profile/:cacheId`
- `DELETE /api/index-profile/:cacheId`
- `POST /api/query/:cacheId`
- `POST /api/openrouter/ping`
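These endpoints can be driven programmatically. Below is a minimal sketch; the base URL assumes a default local run, and the request body shape for `/api/query/:cacheId` (`{ question, mode }`) is an assumption, not a documented contract — check `server.js` for the actual payload:

```javascript
// Build URLs for the API routes listed above (local base URL assumed).
const BASE = 'http://localhost:3000';
const api = (path) => `${BASE}${path}`;

// Example: ask a question against a cache.
// NOTE: the body shape ({ question, mode }) is an assumption;
// see server.js for the real contract.
async function askCache(cacheId, question) {
  const res = await fetch(api(`/api/query/${encodeURIComponent(cacheId)}`), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question, mode: 'surreal' }),
  });
  return res.json();
}

console.log(api('/api/health')); // http://localhost:3000/api/health
```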
```
surreal-investigate/
  lib/
    surreal.js
    extract.js
    investigate.js
  fixtures/
    sample-case-500w.txt
  public/
    index.html
    app.js
    styles.css
  uploads/
  indexes/
  data/
  chat-logs/
  server.js
```

- Recipe-specific relationship builders by dataset class (narrative/report/structured/geo/communications)
- Strong extraction-quality gates before "ready for questions"
- Read-only execution mode for generated SurrealQL
- Query timeline visualization and relationship map
- Additional provider adapters (OpenAI/Anthropic/Together/local)
