Universal hierarchical chunk database with AI-powered search agents.
Early stage: functional but lightly tested. The core DB, search engine, and agent pipeline work end-to-end, verified with LFM2-350M on vLLM. Fine-tuning is expected to bring the agent layer to production quality.
Store anything — movies, papers, codebases, products — as hierarchical nodes with multi-modal embeddings, AI analyses, and cross-references. Query with keyword search, vector similarity, or let AI agents explore the data for you.
```sh
# Build
cargo build --release

# Run (no embeddings, no LLM — pure search mode)
./target/release/formica-core

# Run with LLM agent (vLLM/Ollama)
./target/release/formica-core --llm-url http://localhost:8000/v1 --llm-model your-model

# Run with local embeddings (fastembed)
./target/release/formica-core --embed-pool 2
```

Cortex listens on http://localhost:8200 by default.
Without LLM — pure search engine:
You call /api/search, /api/query, /api/aggregate directly.
The DB does keyword matching, vector similarity, filtering, stats.
No model needed. Sub-millisecond responses.
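The keyword side of this search mode is an inverted index. A minimal sketch in Python (illustrative only, not the actual Rust implementation; AND semantics across query tokens is an assumption here):

```python
from collections import defaultdict

# Minimal inverted index: token -> set of node IDs containing it.
index: dict[str, set] = defaultdict(set)

def add_doc(doc_id: str, text: str) -> None:
    for token in text.lower().split():
        index[token].add(doc_id)

def keyword_search(query: str) -> set:
    # Return docs matching ALL query tokens (AND semantics, assumed).
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = index[tokens[0]].copy()
    for t in tokens[1:]:
        result &= index[t]
    return result

add_doc("scene-1", "rooftop chase at night")
add_doc("scene-2", "quiet dinner scene")
matches = keyword_search("rooftop chase")
```

Lookups are set intersections over pre-built postings, which is why no model is needed and responses stay sub-millisecond.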
With small LLM (350M-3B) — strategy-based agent:
1. Question comes in (/api/ask)
2. Heuristic classifies: is this a search? aggregate? comparison? deep read?
3. Deterministic tool chain runs (search → read → get analyses → aggregate)
4. LLM gets the gathered data + question → writes a 2-4 sentence answer
5. The model never picks tools — it only summarizes data (its strong suit)
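The deterministic chain in steps 3-4 can be sketched as follows (function names are hypothetical stand-ins; the real tool set lives behind /api/tools):

```python
# Sketch of the strategy agent's deterministic tool chain.
# The tool functions are injected, so the flow itself never branches on
# model output -- the LLM only summarizes what was gathered.
def run_keyword_search_strategy(question, search, read_node, get_analyses):
    hits = search(question)                     # 1. search
    top = hits[:3]                              # keep top results
    docs = [read_node(h) for h in top]          # 2. read each result
    notes = [get_analyses(h) for h in top]      # 3. fetch AI analyses
    # 4. This context + the question goes to the LLM for a short answer.
    return {"question": question, "docs": docs, "analyses": notes}

# Toy stand-ins for the real tools:
ctx = run_keyword_search_strategy(
    "what happens on the rooftop?",
    search=lambda q: ["scene-1", "scene-2"],
    read_node=lambda i: f"content of {i}",
    get_analyses=lambda i: {"mood": "tense"},
)
```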
With small LLM + orchestration — parallel decomposition:
1. Big question comes in (/api/pipeline)
2. LLM splits into 3-4 sub-questions
3. Each sub-question runs step 2-4 above IN PARALLEL
4. LLM checks: are there gaps? → spawns more searches
5. Tournament merge: pairs of results merged → merged → final report
With big LLM (7B+, Claude, GPT) — full agentic:
The big model gets 16 tools and picks what to call freely.
It can chain search → read_node → get_analyses → find_similar → respond.
Works through the standard /api/ask endpoint with tool_choice=required.
All modes coexist. Switch per request with "no_llm": true or "model": "big-model".
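Because mode selection is per request, the same endpoint serves every tier. Request bodies using the documented override fields might look like:

```python
# Per-request mode switching via documented override fields
# ("no_llm", "model"); other fields are illustrative.
pure_search = {"question": "list all scenes", "no_llm": True}
big_model = {"question": "compare act 1 and act 2", "model": "big-model"}
```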
```
┌──────────────────────────────────────────────────────────┐
│                      Cortex Server                       │
│                                                          │
│  ┌───────────┐   ┌───────────┐   ┌──────────────┐        │
│  │   Nodes   │   │  Analyses │   │  Embeddings  │        │
│  │ (DashMap) │   │ (DashMap) │   │ (VectorStore │        │
│  │           │   │           │   │  SIMD flat)  │        │
│  └─────┬─────┘   └─────┬─────┘   └──────┬───────┘        │
│        │               │                │                │
│  ┌─────┴───────────────┴────────────────┴────────┐       │
│  │                 Search Engine                 │       │
│  │    Keyword (inverted index)                   │       │
│  │    + Vector (SIMD dot product)                │       │
│  │    + RRF fusion                               │       │
│  │    + Domain-partitioned pre-filtering         │       │
│  └───────────────────────┬───────────────────────┘       │
│                          │                               │
│  ┌───────────────────────┴───────────────────────┐       │
│  │                  Agent Layer                  │       │
│  │    Strategy agent (small models)              │       │
│  │    Full agentic (big models)                  │       │
│  │    Orchestrator (tree decomposition)          │       │
│  │    Pipeline (recursive gather+merge)          │       │
│  └───────────────────────────────────────────────┘       │
└──────────────────────────────────────────────────────────┘
```
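The RRF fusion step in the search engine combines the keyword and vector rankings by summed reciprocal ranks. A sketch (the constant k=60 is the common default from the RRF literature, assumed here, not confirmed by this codebase):

```python
# Reciprocal Rank Fusion: score each doc by sum of 1 / (k + rank) over
# every ranking it appears in, then sort by fused score.
def rrf_fuse(keyword_ranked, vector_ranked, k=60):
    scores = {}
    for ranking in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" tops both lists' upper ranks, so it wins the fused ranking.
fused = rrf_fuse(["a", "b", "c"], ["b", "c", "d"])
```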
Everything is a Node in a tree:
```
Movie                    Paper                 Codebase
├── Act 1                ├── Abstract          ├── src/
│   ├── Scene 1          │   └── chunk         │   ├── auth/
│   │   ├── Shot 1       ├── Methods           │   │   ├── login.ts
│   │   └── Dialogue 1   │   └── Equation 1    │   │   └── middleware.ts
│   └── Scene 2          └── Results           │   └── api/
└── Act 2                                      └── tests/
```
Each node can have:
- Multiple embeddings (semantic, visual, audio, code — any modality)
- AI analyses (mood, sentiment, complexity, topics)
- Edges to other nodes (cites, implements, similar_to)
- Tags, metadata, temporal ranges (t_start/t_end for media)
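A hierarchical ingest payload for POST /api/nodes could look like the sketch below. Field names here are illustrative assumptions; query /api/schema for the actual shape.

```python
# Hypothetical ingest body: a movie with one act and one scene.
# "t_start"/"t_end" are the temporal ranges mentioned above.
movie = {
    "id": "movie-1",
    "kind": "movie",
    "children": [
        {
            "id": "act-1",
            "kind": "act",
            "children": [
                {
                    "id": "scene-1",
                    "kind": "scene",
                    "t_start": 0.0,
                    "t_end": 92.5,
                    "tags": ["opening", "chase"],
                },
            ],
        },
    ],
}

def count_nodes(node):
    # Recursively count a node plus all its descendants.
    return 1 + sum(count_nodes(c) for c in node.get("children", []))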
| Endpoint | Method | Description |
|---|---|---|
| `/api/search` | POST | Hybrid keyword + vector search |
| `/api/query` | POST | Filter by kind, domain, tags, time, metadata |
| `/api/count` | POST | Count matching nodes |
| `/api/aggregate` | POST | Stats across analyses (avg, min, max, distribution) |
| `/api/distinct` | POST | Unique values for a metadata key |
| `/api/nodes` | POST | Ingest hierarchical data |
| `/api/nodes/{id}` | GET | Read a node |
| `/api/nodes/{id}/children` | GET | Direct children |
| `/api/nodes/{id}/subtree` | GET | Full tree below |
| `/api/nodes/{id}/ancestors` | GET | Path to root |
| `/api/nodes/{id}/siblings` | GET | Same-level neighbors |
| `/api/nodes/{id}/similar` | POST | Vector similarity |
| `/api/nodes/{id}/analyses` | GET | AI analyses |
| `/api/nodes/{id}/edges` | GET | Cross-references |
| `/api/edges` | POST | Add edge between nodes |
| `/api/analyses` | POST | Add analysis to a node |
| `/api/embeddings` | POST | Store embedding vector |
| `/api/embeddings/batch` | POST | Batch store embeddings |
| `/api/schema` | GET | Database structure overview |
| `/api/tools` | GET | Agent tool definitions (JSON Schema) |
| `/api/config` | GET | Server config + all endpoints |
| `/api/models` | GET | Available LLM models |
| `/api/stats` | GET | Database statistics |
| Endpoint | Method | Description |
|---|---|---|
| `/api/ask` | POST | Ask a question; the agent searches and synthesizes |
| `/api/orchestrate` | POST | Tree decomposition (quick/medium/deep/hyper) |
| `/api/pipeline` | POST | Full recursive: decompose → gather → gap fill → merge |
| Endpoint | Method | Description |
|---|---|---|
| `/api/tasks` | POST | Create a task for the agent swarm |
| `/api/tasks/pending` | GET | List pending tasks |
| `/api/tasks/{id}/claim` | POST | Claim a task |
| `/api/tasks/{id}/complete` | POST | Complete a task with a result |
| `/api/tasks/{id}/tool-call` | POST | Record tool usage |
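The lifecycle these endpoints imply (create → claim → complete) can be modeled with a toy in-memory queue. This is purely illustrative; the real task store and its fields are not shown here.

```python
# Toy model of the task lifecycle behind /api/tasks.
tasks = {}

def create_task(task_id, goal):
    # POST /api/tasks: task starts out pending.
    tasks[task_id] = {"goal": goal, "status": "pending", "result": None}

def claim_task(task_id, worker):
    # POST /api/tasks/{id}/claim: only a pending task may be claimed.
    t = tasks[task_id]
    assert t["status"] == "pending", "only pending tasks can be claimed"
    t["status"] = "claimed"
    t["worker"] = worker

def complete_task(task_id, result):
    # POST /api/tasks/{id}/complete: attach the result.
    t = tasks[task_id]
    t["status"] = "done"
    t["result"] = result

create_task("t1", "summarize act 1")
claim_task("t1", "worker-a")
complete_task("t1", "Act 1 sets up the heist.")
```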
Every AI endpoint accepts optional overrides (host assumed to be the default localhost:8200):

```sh
# Switch model per request
curl http://localhost:8200/api/ask -d '{"question": "...", "model": "qwen3.5-9b"}'

# Disable LLM (pure deterministic)
curl http://localhost:8200/api/ask -d '{"question": "...", "no_llm": true}'

# Force strategy
curl http://localhost:8200/api/ask -d '{"question": "...", "strategy": "aggregate"}'

# Different LLM server
curl http://localhost:8200/api/ask -d '{"question": "...", "llm_url": "http://other:8000/v1"}'

# Pipeline with raw mode (no synthesis)
curl http://localhost:8200/api/pipeline -d '{"question": "...", "raw_mode": true}'

# Orchestrate with hyper intensity
curl http://localhost:8200/api/orchestrate -d '{"question": "...", "intensity": "hyper"}'
```

The `/api/ask` endpoint uses six strategies with heuristic routing:
| Strategy | When | What it does |
|---|---|---|
| `keyword_search` | Default | Search → read top results → get analyses |
| `filter_query` | "list all", "every" | Filter by exact criteria → get analyses |
| `aggregate` | "most", "highest", "average" | Compute stats → rank items |
| `explore_tree` | "structure", "parts of" | Navigate hierarchy → show children |
| `compare` | "compare", "vs" | Search each item → aggregate per item |
| `deep_read` | "what happens", "describe" | Full detail: content, siblings, analyses, edges |
The /api/pipeline endpoint runs a recursive analysis:
- Decompose: LLM splits question into independent sub-questions
- Gather: Parallel leaf agents search for each sub-question
- Evaluate: LLM identifies gaps in evidence → spawns more searches
- Merge: Tournament-style pairwise merge into final report
Options: raw_mode (skip synthesis), skip_gap_fill, merge_style (tournament/direct).
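The tournament merge step combines partial results pairwise, round by round, until one report remains. A sketch (the merge function here is a toy concatenation standing in for an LLM merge call):

```python
# Tournament-style pairwise merge: halve the result list each round.
def tournament_merge(results, merge):
    while len(results) > 1:
        nxt = []
        for i in range(0, len(results), 2):
            pair = results[i:i + 2]
            # Odd one out advances unmerged to the next round.
            nxt.append(merge(pair[0], pair[1]) if len(pair) == 2 else pair[0])
        results = nxt
    return results[0]

report = tournament_merge(["A", "B", "C", "D"], merge=lambda x, y: f"({x}+{y})")
```

Compared with a direct merge of all results at once, pairwise rounds keep each merge prompt small, which matters for small-context models.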
| Mode | Depth | Max Tasks | Tool Calls/Leaf | Use Case |
|---|---|---|---|---|
| `quick` | 0 | 1 | 8 | Simple fact lookup |
| `medium` | 1 | 15 | 20 | Most questions |
| `deep` | 3 | 50 | 35 | Thorough analysis |
| `hyper` | 4 | 200 | 50 | Exhaustive research |
- Pool-based concurrent fastembed (multilingual-e5-small, 384-dim)
- Background auto-embedding of new nodes
- External API backend support (OpenAI, Ollama, custom)
- Batch store endpoint for any modality (image CLIP, audio Whisper, code embeddings)
- SIMD-optimized vector search (50-100M comparisons/sec)
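The flat vector scan being SIMD-accelerated is, at its core, brute-force dot-product scoring over every stored vector. A pure-Python sketch of the same logic (no SIMD here, and the store's real API is not shown):

```python
# Brute-force top-k over a flat vector store using dot-product scores.
def top_k(query, vectors, k=2):
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    # Score every stored vector against the query, highest first.
    scored = sorted(vectors.items(), key=lambda kv: dot(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
best = top_k([1.0, 0.0], vecs, k=1)
```

A flat scan avoids index-build cost and recall loss; SIMD widens each dot product so the linear pass stays fast at the scales quoted above.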
```sh
# Start cortex
./target/release/formica-core --llm-url http://localhost:8000/v1 --llm-model your-model

# Run test suite (seeds data + tests all endpoints)
python tests/test_all.py
```

| Flag | Description |
|---|---|
| `--listen` | Listen address (default: `0.0.0.0:8200`) |
| `--data-dir` | Snapshot directory (default: `./formica-data`) |
| `--snapshot-interval` | Seconds between snapshots (default: 300, 0 = disabled) |
| `--restore` | Load snapshot on startup |
| `--embed-pool` | Embedding pool size (default: 2, 0 = disabled) |
| `--embed-batch` | Embedding batch size (default: 64) |
| `--embed-api-url` | External embedding API URL |
| `--llm-url` | LLM endpoint (vLLM/Ollama, OpenAI-compatible) |
| `--llm-model` | LLM model name |
Apache-2.0