Universal hierarchical chunk database with AI-powered search agents.
Early stage: functional but lightly tested. The core DB, search engine, and agent pipeline work end-to-end, verified with LFM2-350M on vLLM. Fine-tuning is expected to bring the agent layer to production quality.
Store anything — movies, papers, codebases, products — as hierarchical nodes with multi-modal embeddings, AI analyses, and cross-references. Query with keyword search, vector similarity, or let AI agents explore the data for you.
```sh
# Build
cargo build --release

# Run (no embeddings, no LLM — pure search mode)
./target/release/formica-core

# Run with LLM agent (vLLM/Ollama)
./target/release/formica-core --llm-url http://localhost:8000/v1 --llm-model your-model

# Run with local embeddings (fastembed)
./target/release/formica-core --embed-pool 2
```

Cortex listens on http://localhost:8200 by default.
Without LLM — pure search engine:
You call /api/search, /api/query, /api/aggregate directly.
The DB does keyword matching, vector similarity, filtering, stats.
No model needed. Sub-millisecond responses.
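The keyword side of this search mode is an inverted index. A minimal sketch in Python (illustrative only, not the actual Rust implementation; AND semantics across query tokens is an assumption here):

```python
from collections import defaultdict

# Minimal inverted index: token -> set of node IDs containing it.
index: dict[str, set] = defaultdict(set)

def add_doc(doc_id: str, text: str) -> None:
    for token in text.lower().split():
        index[token].add(doc_id)

def keyword_search(query: str) -> set:
    # Return docs matching ALL query tokens (AND semantics, assumed).
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = index[tokens[0]].copy()
    for t in tokens[1:]:
        result &= index[t]
    return result

add_doc("scene-1", "rooftop chase at night")
add_doc("scene-2", "quiet dinner scene")
matches = keyword_search("rooftop chase")
```

Lookups are set intersections over pre-built postings, which is why no model is needed and responses stay sub-millisecond.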
With small LLM (350M-3B) — strategy-based agent:
1. Question comes in (/api/ask)
2. Heuristic classifies: is this a search? aggregate? comparison? deep read?
3. Deterministic tool chain runs (search → read → get analyses → aggregate)
4. LLM gets the gathered data + question → writes a 2-4 sentence answer
5. The model never picks tools — it only summarizes data (its strong suit)
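The deterministic chain in steps 3-4 can be sketched as follows (function names are hypothetical stand-ins; the real tool set lives behind /api/tools):

```python
# Sketch of the strategy agent's deterministic tool chain.
# The tool functions are injected, so the flow itself never branches on
# model output -- the LLM only summarizes what was gathered.
def run_keyword_search_strategy(question, search, read_node, get_analyses):
    hits = search(question)                     # 1. search
    top = hits[:3]                              # keep top results
    docs = [read_node(h) for h in top]          # 2. read each result
    notes = [get_analyses(h) for h in top]      # 3. fetch AI analyses
    # 4. This context + the question goes to the LLM for a short answer.
    return {"question": question, "docs": docs, "analyses": notes}

# Toy stand-ins for the real tools:
ctx = run_keyword_search_strategy(
    "what happens on the rooftop?",
    search=lambda q: ["scene-1", "scene-2"],
    read_node=lambda i: f"content of {i}",
    get_analyses=lambda i: {"mood": "tense"},
)
```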
With small LLM + orchestration — parallel decomposition:
1. Big question comes in (/api/pipeline)
2. LLM splits into 3-4 sub-questions
3. Each sub-question runs step 2-4 above IN PARALLEL
4. LLM checks: are there gaps? → spawns more searches
5. Tournament merge: pairs of results merged → merged → final report
With big LLM (7B+, Claude, GPT) — full agentic:
The big model gets 16 tools and picks what to call freely.
It can chain search → read_node → get_analyses → find_similar → respond.
Works through the standard /api/ask endpoint with tool_choice=required.
All modes coexist. Switch per request with "no_llm": true or "model": "big-model".
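Because mode selection is per request, the same endpoint serves every tier. Request bodies using the documented override fields might look like:

```python
# Per-request mode switching via documented override fields
# ("no_llm", "model"); other fields are illustrative.
pure_search = {"question": "list all scenes", "no_llm": True}
big_model = {"question": "compare act 1 and act 2", "model": "big-model"}
```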
```
┌──────────────────────────────────────────────────────────┐
│                      Cortex Server                       │
│                                                          │
│  ┌───────────┐   ┌───────────┐   ┌──────────────┐        │
│  │   Nodes   │   │  Analyses │   │  Embeddings  │        │
│  │ (DashMap) │   │ (DashMap) │   │ (VectorStore │        │
│  │           │   │           │   │  SIMD flat)  │        │
│  └─────┬─────┘   └─────┬─────┘   └──────┬───────┘        │
│        │               │                │                │
│  ┌─────┴───────────────┴────────────────┴────────┐       │
│  │                 Search Engine                 │       │
│  │    Keyword (inverted index)                   │       │
│  │    + Vector (SIMD dot product)                │       │
│  │    + RRF fusion                               │       │
│  │    + Domain-partitioned pre-filtering         │       │
│  └───────────────────────┬───────────────────────┘       │
│                          │                               │
│  ┌───────────────────────┴───────────────────────┐       │
│  │                  Agent Layer                  │       │
│  │    Strategy agent (small models)              │       │
│  │    Full agentic (big models)                  │       │
│  │    Orchestrator (tree decomposition)          │       │
│  │    Pipeline (recursive gather+merge)          │       │
│  └───────────────────────────────────────────────┘       │
└──────────────────────────────────────────────────────────┘
```
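The RRF fusion step in the search engine combines the keyword and vector rankings by summed reciprocal ranks. A sketch (the constant k=60 is the common default from the RRF literature, assumed here, not confirmed by this codebase):

```python
# Reciprocal Rank Fusion: score each doc by sum of 1 / (k + rank) over
# every ranking it appears in, then sort by fused score.
def rrf_fuse(keyword_ranked, vector_ranked, k=60):
    scores = {}
    for ranking in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" tops both lists' upper ranks, so it wins the fused ranking.
fused = rrf_fuse(["a", "b", "c"], ["b", "c", "d"])
```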
Everything is a Node in a tree:
```
Movie                    Paper                 Codebase
├── Act 1                ├── Abstract          ├── src/
│   ├── Scene 1          │   └── chunk         │   ├── auth/
│   │   ├── Shot 1       ├── Methods           │   │   ├── login.ts
│   │   └── Dialogue 1   │   └── Equation 1    │   │   └── middleware.ts
│   └── Scene 2          └── Results           │   └── api/
└── Act 2                                      └── tests/
```
Each node can have:
- Multiple embeddings (semantic, visual, audio, code — any modality)
- AI analyses (mood, sentiment, complexity, topics)
- Edges to other nodes (cites, implements, similar_to)
- Tags, metadata, temporal ranges (t_start/t_end for media)
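A hierarchical ingest payload for POST /api/nodes could look like the sketch below. Field names here are illustrative assumptions; query /api/schema for the actual shape.

```python
# Hypothetical ingest body: a movie with one act and one scene.
# "t_start"/"t_end" are the temporal ranges mentioned above.
movie = {
    "id": "movie-1",
    "kind": "movie",
    "children": [
        {
            "id": "act-1",
            "kind": "act",
            "children": [
                {
                    "id": "scene-1",
                    "kind": "scene",
                    "t_start": 0.0,
                    "t_end": 92.5,
                    "tags": ["opening", "chase"],
                },
            ],
        },
    ],
}

def count_nodes(node):
    # Recursively count a node plus all its descendants.
    return 1 + sum(count_nodes(c) for c in node.get("children", []))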
| Endpoint | Method | Description |
|---|---|---|
| `/api/search` | POST | Hybrid keyword + vector search |
| `/api/query` | POST | Filter by kind, domain, tags, time, metadata |
| `/api/count` | POST | Count matching nodes |
| `/api/aggregate` | POST | Stats across analyses (avg, min, max, distribution) |
| `/api/distinct` | POST | Unique values for a metadata key |
| `/api/nodes` | POST | Ingest hierarchical data |
| `/api/nodes/{id}` | GET | Read a node |
| `/api/nodes/{id}/children` | GET | Direct children |
| `/api/nodes/{id}/subtree` | GET | Full tree below |
| `/api/nodes/{id}/ancestors` | GET | Path to root |
| `/api/nodes/{id}/siblings` | GET | Same-level neighbors |
| `/api/nodes/{id}/similar` | POST | Vector similarity |
| `/api/nodes/{id}/analyses` | GET | AI analyses |
| `/api/nodes/{id}/edges` | GET | Cross-references |
| `/api/edges` | POST | Add edge between nodes |
| `/api/analyses` | POST | Add analysis to a node |
| `/api/embeddings` | POST | Store embedding vector |
| `/api/embeddings/batch` | POST | Batch store embeddings |
| `/api/schema` | GET | Database structure overview |
| `/api/tools` | GET | Agent tool definitions (JSON Schema) |
| `/api/config` | GET | Server config + all endpoints |
| `/api/models` | GET | Available LLM models |
| `/api/stats` | GET | Database statistics |
| Endpoint | Method | Description |
|---|---|---|
| `/api/ask` | POST | Ask a question; the agent searches and synthesizes |
| `/api/orchestrate` | POST | Tree decomposition (quick/medium/deep/hyper) |
| `/api/pipeline` | POST | Full recursive: decompose → gather → gap fill → merge |
| Endpoint | Method | Description |
|---|---|---|
| `/api/tasks` | POST | Create a task for the agent swarm |
| `/api/tasks/pending` | GET | List pending tasks |
| `/api/tasks/{id}/claim` | POST | Claim a task |
| `/api/tasks/{id}/complete` | POST | Complete a task with a result |
| `/api/tasks/{id}/tool-call` | POST | Record tool usage |
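The lifecycle these endpoints imply (create → claim → complete) can be modeled with a toy in-memory queue. This is purely illustrative; the real task store and its fields are not shown here.

```python
# Toy model of the task lifecycle behind /api/tasks.
tasks = {}

def create_task(task_id, goal):
    # POST /api/tasks: task starts out pending.
    tasks[task_id] = {"goal": goal, "status": "pending", "result": None}

def claim_task(task_id, worker):
    # POST /api/tasks/{id}/claim: only a pending task may be claimed.
    t = tasks[task_id]
    assert t["status"] == "pending", "only pending tasks can be claimed"
    t["status"] = "claimed"
    t["worker"] = worker

def complete_task(task_id, result):
    # POST /api/tasks/{id}/complete: attach the result.
    t = tasks[task_id]
    t["status"] = "done"
    t["result"] = result

create_task("t1", "summarize act 1")
claim_task("t1", "worker-a")
complete_task("t1", "Act 1 sets up the heist.")
```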
Every AI endpoint accepts optional overrides (host assumed to be the default localhost:8200):

```sh
# Switch model per request
curl http://localhost:8200/api/ask -d '{"question": "...", "model": "qwen3.5-9b"}'

# Disable LLM (pure deterministic)
curl http://localhost:8200/api/ask -d '{"question": "...", "no_llm": true}'

# Force strategy
curl http://localhost:8200/api/ask -d '{"question": "...", "strategy": "aggregate"}'

# Different LLM server
curl http://localhost:8200/api/ask -d '{"question": "...", "llm_url": "http://other:8000/v1"}'

# Pipeline with raw mode (no synthesis)
curl http://localhost:8200/api/pipeline -d '{"question": "...", "raw_mode": true}'

# Orchestrate with hyper intensity
curl http://localhost:8200/api/orchestrate -d '{"question": "...", "intensity": "hyper"}'
```

The `/api/ask` endpoint uses six strategies with heuristic routing:
| Strategy | When | What it does |
|---|---|---|
| `keyword_search` | Default | Search → read top results → get analyses |
| `filter_query` | "list all", "every" | Filter by exact criteria → get analyses |
| `aggregate` | "most", "highest", "average" | Compute stats → rank items |
| `explore_tree` | "structure", "parts of" | Navigate hierarchy → show children |
| `compare` | "compare", "vs" | Search each item → aggregate per item |
| `deep_read` | "what happens", "describe" | Full detail: content, siblings, analyses, edges |
The /api/pipeline endpoint runs a recursive analysis:
- Decompose: LLM splits question into independent sub-questions
- Gather: Parallel leaf agents search for each sub-question
- Evaluate: LLM identifies gaps in evidence → spawns more searches
- Merge: Tournament-style pairwise merge into final report
Options: raw_mode (skip synthesis), skip_gap_fill, merge_style (tournament/direct).
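The tournament merge step combines partial results pairwise, round by round, until one report remains. A sketch (the merge function here is a toy concatenation standing in for an LLM merge call):

```python
# Tournament-style pairwise merge: halve the result list each round.
def tournament_merge(results, merge):
    while len(results) > 1:
        nxt = []
        for i in range(0, len(results), 2):
            pair = results[i:i + 2]
            # Odd one out advances unmerged to the next round.
            nxt.append(merge(pair[0], pair[1]) if len(pair) == 2 else pair[0])
        results = nxt
    return results[0]

report = tournament_merge(["A", "B", "C", "D"], merge=lambda x, y: f"({x}+{y})")
```

Compared with a direct merge of all results at once, pairwise rounds keep each merge prompt small, which matters for small-context models.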
| Mode | Depth | Max Tasks | Tool Calls/Leaf | Use Case |
|---|---|---|---|---|
| `quick` | 0 | 1 | 8 | Simple fact lookup |
| `medium` | 1 | 15 | 20 | Most questions |
| `deep` | 3 | 50 | 35 | Thorough analysis |
| `hyper` | 4 | 200 | 50 | Exhaustive research |
- Pool-based concurrent fastembed (multilingual-e5-small, 384-dim)
- Background auto-embedding of new nodes
- External API backend support (OpenAI, Ollama, custom)
- Batch store endpoint for any modality (image CLIP, audio Whisper, code embeddings)
- SIMD-optimized vector search (50-100M comparisons/sec)
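The flat vector scan being SIMD-accelerated is, at its core, brute-force dot-product scoring over every stored vector. A pure-Python sketch of the same logic (no SIMD here, and the store's real API is not shown):

```python
# Brute-force top-k over a flat vector store using dot-product scores.
def top_k(query, vectors, k=2):
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    # Score every stored vector against the query, highest first.
    scored = sorted(vectors.items(), key=lambda kv: dot(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
best = top_k([1.0, 0.0], vecs, k=1)
```

A flat scan avoids index-build cost and recall loss; SIMD widens each dot product so the linear pass stays fast at the scales quoted above.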
```sh
# Start cortex
./target/release/formica-core --llm-url http://localhost:8000/v1 --llm-model your-model

# Run test suite (seeds data + tests all endpoints)
python tests/test_all.py
```

| Flag | Description |
|---|---|
| `--listen` | Listen address (default: `0.0.0.0:8200`) |
| `--data-dir` | Snapshot directory (default: `./formica-data`) |
| `--snapshot-interval` | Seconds between snapshots (default: 300, 0 = disabled) |
| `--restore` | Load snapshot on startup |
| `--embed-pool` | Embedding pool size (default: 2, 0 = disabled) |
| `--embed-batch` | Embedding batch size (default: 64) |
| `--embed-api-url` | External embedding API URL |
| `--llm-url` | LLM endpoint (vLLM/Ollama, OpenAI-compatible) |
| `--llm-model` | LLM model name |
Apache-2.0