NodeNestor/Formica

Formica

Universal hierarchical chunk database with AI-powered search agents.

Early stage — functional but barely tested. The core DB, search engine, and agent pipeline work end-to-end. Tested with LFM2-350M on vLLM. Fine-tuning will make the agent layer production-grade.

Store anything — movies, papers, codebases, products — as hierarchical nodes with multi-modal embeddings, AI analyses, and cross-references. Query with keyword search, vector similarity, or let AI agents explore the data for you.

Quick Start

# Build
cargo build --release

# Run (no embeddings, no LLM — pure search mode)
./target/release/formica-core

# Run with LLM agent (vLLM/Ollama)
./target/release/formica-core --llm-url http://localhost:8000/v1 --llm-model your-model

# Run with local embeddings (fastembed)
./target/release/formica-core --embed-pool 2

Cortex listens on http://localhost:8200 by default.
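
A client call can be sketched with the Python standard library. The request fields below ("query", "limit") are illustrative assumptions, not the confirmed schema — GET /api/config and GET /api/tools report the actual endpoint shapes:

```python
# Minimal client sketch for the pure-search endpoint.
# NOTE: the body fields are assumptions; check GET /api/tools for real schemas.
import json
import urllib.request

CORTEX = "http://localhost:8200"  # default listen address

def search_request(query: str, limit: int = 5) -> urllib.request.Request:
    """Build a POST /api/search request; send it with urllib.request.urlopen."""
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        f"{CORTEX}/api/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = search_request("tense rooftop scenes")
print(req.full_url)      # http://localhost:8200/api/search
print(req.get_method())  # POST
```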

How It Works

Without LLM — pure search engine:

You call /api/search, /api/query, /api/aggregate directly.
The DB does keyword matching, vector similarity, filtering, stats.
No model needed. Sub-millisecond responses.

With small LLM (350M-3B) — strategy-based agent:

1. Question comes in (/api/ask)
2. Heuristic classifies: is this a search? aggregate? comparison? deep read?
3. Deterministic tool chain runs (search → read → get analyses → aggregate)
4. LLM gets the gathered data + question → writes a 2-4 sentence answer
5. The model never picks tools — it only summarizes data (its strong suit)

With small LLM + orchestration — parallel decomposition:

1. Big question comes in (/api/pipeline)
2. LLM splits into 3-4 sub-questions
3. Each sub-question runs steps 2-4 of the strategy agent above, in parallel
4. LLM checks: are there gaps? → spawns more searches
5. Tournament merge: results are merged in pairs, round by round, into a final report

With big LLM (7B+, Claude, GPT) — full agentic:

The big model gets 16 tools and picks what to call freely.
It can chain search → read_node → get_analyses → find_similar → respond.
Works through the standard /api/ask endpoint with tool_choice=required.

All modes coexist. Switch per request with "no_llm": true or "model": "big-model".

Architecture

┌─────────────────────────────────────────────────────────┐
│                      Cortex Server                      │
│                                                         │
│  ┌──────────┐  ┌──────────┐  ┌──────────────┐           │
│  │  Nodes   │  │ Analyses │  │  Embeddings  │           │
│  │ (DashMap)│  │ (DashMap)│  │ (VectorStore,│           │
│  │          │  │          │  │  SIMD flat)  │           │
│  └────┬─────┘  └────┬─────┘  └──────┬───────┘           │
│       │             │               │                   │
│  ┌────┴─────────────┴───────────────┴─────┐             │
│  │           Search Engine                │             │
│  │  Keyword (inverted index)              │             │
│  │  + Vector (SIMD dot product)           │             │
│  │  + RRF fusion                          │             │
│  │  + Domain-partitioned pre-filtering    │             │
│  └───────────────────┬────────────────────┘             │
│                      │                                  │
│  ┌───────────────────┴────────────────────┐             │
│  │           Agent Layer                  │             │
│  │  Strategy agent (small models)         │             │
│  │  Full agentic (big models)             │             │
│  │  Orchestrator (tree decomposition)     │             │
│  │  Pipeline (recursive gather+merge)     │             │
│  └────────────────────────────────────────┘             │
└─────────────────────────────────────────────────────────┘
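
The RRF fusion step in the search engine can be illustrated with a minimal sketch. The actual implementation is Rust and its constant may differ; k=60 below is the conventional RRF default, not a value confirmed from the source:

```python
# Reciprocal Rank Fusion: combine a keyword ranking and a vector ranking
# into one list. Each list contributes 1/(k + rank) per item; items that
# rank well in both lists float to the top.
def rrf_fuse(rankings, k=60):
    """rankings: list of ranked node-id lists -> ids sorted by fused score."""
    scores = {}
    for ranked in rankings:
        for rank, node_id in enumerate(ranked):
            scores[node_id] = scores.get(node_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["n3", "n1", "n7"]
vector_hits  = ["n1", "n9", "n3"]
print(rrf_fuse([keyword_hits, vector_hits]))  # n1 and n3 lead: both lists agree
```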

Data Model

Everything is a Node in a tree:

Movie                          Paper                       Codebase
├── Act 1                      ├── Abstract                ├── src/
│   ├── Scene 1                │   └── chunk               │   ├── auth/
│   │   ├── Shot 1             ├── Methods                 │   │   ├── login.ts
│   │   └── Dialogue 1         │   └── Equation 1          │   │   └── middleware.ts
│   └── Scene 2                └── Results                 │   └── api/
└── Act 2                                                  └── tests/

Each node can have:

  • Multiple embeddings (semantic, visual, audio, code — any modality)
  • AI analyses (mood, sentiment, complexity, topics)
  • Edges to other nodes (cites, implements, similar_to)
  • Tags, metadata, temporal ranges (t_start/t_end for media)
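
An ingest payload for POST /api/nodes might look like the sketch below. Every field name here ("kind", "title", "children", "tags") is a hypothetical stand-in for the real schema; only t_start/t_end are named in the list above:

```python
# Hypothetical hierarchical ingest body for POST /api/nodes.
# Field names are assumptions -- GET /api/schema shows the real structure.
import json

movie = {
    "kind": "movie",
    "title": "Example Film",
    "children": [
        {
            "kind": "act",
            "title": "Act 1",
            "children": [
                # leaf chunk with a temporal range (t_start/t_end, for media)
                {"kind": "scene", "title": "Scene 1",
                 "tags": ["rooftop"], "t_start": 0.0, "t_end": 312.5},
            ],
        },
    ],
}

payload = json.dumps(movie)
print(movie["children"][0]["children"][0]["title"])  # Scene 1
```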

API Endpoints

Pure Search (no LLM needed)

Endpoint                     Method  Description
/api/search                  POST    Hybrid keyword + vector search
/api/query                   POST    Filter by kind, domain, tags, time, metadata
/api/count                   POST    Count matching nodes
/api/aggregate               POST    Stats across analyses (avg, min, max, distribution)
/api/distinct                POST    Unique values for a metadata key
/api/nodes                   POST    Ingest hierarchical data
/api/nodes/{id}              GET     Read a node
/api/nodes/{id}/children     GET     Direct children
/api/nodes/{id}/subtree      GET     Full tree below
/api/nodes/{id}/ancestors    GET     Path to root
/api/nodes/{id}/siblings     GET     Same-level neighbors
/api/nodes/{id}/similar      POST    Vector similarity
/api/nodes/{id}/analyses     GET     AI analyses
/api/nodes/{id}/edges        GET     Cross-references
/api/edges                   POST    Add edge between nodes
/api/analyses                POST    Add analysis to a node
/api/embeddings              POST    Store an embedding vector
/api/embeddings/batch        POST    Batch-store embeddings
/api/schema                  GET     Database structure overview
/api/tools                   GET     Agent tool definitions (JSON Schema)
/api/config                  GET     Server config + all endpoints
/api/models                  GET     Available LLM models
/api/stats                   GET     Database statistics
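
As a sketch of driving the analytics endpoints, here are hypothetical request bodies. All field names are guesses; GET /api/tools returns the authoritative JSON Schemas:

```python
# Hypothetical bodies for the analytics endpoints -- field names are assumptions.
import json

aggregate_body = {              # POST /api/aggregate
    "analysis_key": "tension",  # which analysis score to aggregate
    "op": "avg",                # avg | min | max | distribution
    "filter": {"kind": "scene", "domain": "movies"},
}
distinct_body = {               # POST /api/distinct
    "key": "director",          # metadata key to list unique values for
}
count_body = {                  # POST /api/count
    "filter": {"tags": ["rooftop"]},
}
print(json.dumps(aggregate_body))
```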

AI-Powered (requires LLM)

Endpoint          Method  Description
/api/ask          POST    Ask a question — agent searches and synthesizes
/api/orchestrate  POST    Tree decomposition (quick/medium/deep/hyper)
/api/pipeline     POST    Full recursive: decompose → gather → gap fill → merge

Agent Swarm (external workers)

Endpoint                  Method  Description
/api/tasks                POST    Create task for agent swarm
/api/tasks/pending        GET     Get pending tasks
/api/tasks/{id}/claim     POST    Claim a task
/api/tasks/{id}/complete  POST    Complete with result
/api/tasks/{id}/tool-call POST    Record tool usage

Per-Request Configuration

Every AI endpoint accepts optional overrides:

# Switch model per request
curl localhost:8200/api/ask -H 'Content-Type: application/json' -d '{"question": "...", "model": "qwen3.5-9b"}'

# Disable LLM (pure deterministic)
curl localhost:8200/api/ask -H 'Content-Type: application/json' -d '{"question": "...", "no_llm": true}'

# Force strategy
curl localhost:8200/api/ask -H 'Content-Type: application/json' -d '{"question": "...", "strategy": "aggregate"}'

# Different LLM server
curl localhost:8200/api/ask -H 'Content-Type: application/json' -d '{"question": "...", "llm_url": "http://other:8000/v1"}'

# Pipeline with raw mode (no synthesis)
curl localhost:8200/api/pipeline -H 'Content-Type: application/json' -d '{"question": "...", "raw_mode": true}'

# Orchestrate with hyper intensity
curl localhost:8200/api/orchestrate -H 'Content-Type: application/json' -d '{"question": "...", "intensity": "hyper"}'

Agent Strategies

The /api/ask endpoint uses 6 strategies with heuristic routing:

Strategy        Trigger phrases                What it does
keyword_search  (default)                      Search → read top results → get analyses
filter_query    "list all", "every"            Filter by exact criteria → get analyses
aggregate       "most", "highest", "average"   Compute stats → rank items
explore_tree    "structure", "parts of"        Navigate hierarchy → show children
compare         "compare", "vs"                Search each item → aggregate per item
deep_read       "what happens", "describe"     Full detail: content, siblings, analyses, edges
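
The routing heuristic can be sketched directly from the trigger phrases in the table above. The real classifier lives in the Rust agent layer and may weigh more signals; this is only the shape of the idea:

```python
# Heuristic strategy router: first matching trigger phrase wins,
# otherwise fall back to keyword_search.
def pick_strategy(question: str) -> str:
    q = question.lower()
    rules = [
        ("compare",      ["compare", " vs "]),
        ("aggregate",    ["most", "highest", "average"]),
        ("filter_query", ["list all", "every"]),
        ("explore_tree", ["structure", "parts of"]),
        ("deep_read",    ["what happens", "describe"]),
    ]
    for strategy, triggers in rules:
        if any(t in q for t in triggers):
            return strategy
    return "keyword_search"

print(pick_strategy("Compare Act 1 vs Act 2"))    # compare
print(pick_strategy("What happens in Scene 3?"))  # deep_read
print(pick_strategy("rooftop chase at night"))    # keyword_search
```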

Pipeline Layers

The /api/pipeline endpoint runs a recursive analysis:

  1. Decompose: LLM splits question into independent sub-questions
  2. Gather: Parallel leaf agents search for each sub-question
  3. Evaluate: LLM identifies gaps in evidence → spawns more searches
  4. Merge: Tournament-style pairwise merge into final report

Options: raw_mode (skip synthesis), skip_gap_fill, merge_style (tournament/direct).
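
The tournament merge in step 4 can be sketched as repeated pairwise reduction. Here the merge function just concatenates strings; in Formica each merge is an LLM call that fuses two partial drafts:

```python
# Tournament-style merge: pair up partial results and merge each pair,
# repeating round by round until a single report remains.
def tournament_merge(parts, merge):
    while len(parts) > 1:
        parts = [
            merge(parts[i], parts[i + 1]) if i + 1 < len(parts) else parts[i]
            for i in range(0, len(parts), 2)
        ]
    return parts[0]

report = tournament_merge(["a", "b", "c", "d"], lambda x, y: f"({x}+{y})")
print(report)  # ((a+b)+(c+d))
```

An odd leftover result simply advances to the next round unmerged.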

Search Intensity Modes

Mode    Depth  Max Tasks  Tool Calls/Leaf  Use Case
quick   0      1          8                Simple fact lookup
medium  1      15         20               Most questions
deep    3      50         35               Thorough analysis
hyper   4      200        50               Exhaustive research

Embedding Pipeline

  • Pool-based concurrent fastembed (multilingual-e5-small, 384-dim)
  • Background auto-embedding of new nodes
  • External API backend support (OpenAI, Ollama, custom)
  • Batch store endpoint for any modality (image CLIP, audio Whisper, code embeddings)
  • SIMD-optimized vector search (50-100M comparisons/sec)
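
A batch-store request for externally computed vectors might look like this sketch. The field names ("modality", "items", "node_id", "vector") are assumptions; the vector dimension must match what the store was configured with (384 for the bundled multilingual-e5-small):

```python
# Hypothetical POST /api/embeddings/batch body: vectors from any external
# model (CLIP, Whisper, code embedders), keyed by node id.
import json

batch = {
    "modality": "semantic",
    "items": [
        {"node_id": "scene-1", "vector": [0.0] * 384},  # 384-dim, e5-small
        {"node_id": "scene-2", "vector": [0.1] * 384},
    ],
}
encoded = json.dumps(batch)
print(len(batch["items"][0]["vector"]))  # 384
```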

Testing

# Start cortex
./target/release/formica-core --llm-url http://localhost:8000/v1 --llm-model your-model

# Run test suite (seeds data + tests all endpoints)
python tests/test_all.py

CLI Options

--listen          Listen address (default: 0.0.0.0:8200)
--data-dir        Snapshot directory (default: ./formica-data)
--snapshot-interval  Seconds between snapshots (default: 300, 0=disabled)
--restore         Load snapshot on startup
--embed-pool      Embedding pool size (default: 2, 0=disabled)
--embed-batch     Embedding batch size (default: 64)
--embed-api-url   External embedding API URL
--llm-url         LLM endpoint (vLLM/Ollama, OpenAI-compatible)
--llm-model       LLM model name

License

Apache-2.0
