The fastest, most trustworthy memory layer for coding agents.
FlowState-QMD turns local markdown knowledge into shared project memory for Codex-, Claude-, Cursor-, and MCP-style agents. It combines a durable markdown memory store with a FlowState anticipatory cache so agents can pull relevant context before they fall into a reactive search loop.
- Anticipatory memory, not just stored memory. FlowState prefetches context into `~/.cache/qmd/intuition.json` so agents can start from the right documents instead of deciding to search after the fact.
- Built for coding agents. It works best on docs, ADRs, RFCs, notes, runbooks, changelogs, migration logs, and benchmark writeups.
- Local-first and inspectable. Everything runs on your machine with SQLite, `sqlite-vec`, and local GGUF models via `node-llama-cpp`.
- MCP-native. The default experience is a clean MCP server plus a packaged agent skill.
- Trust over magic. Results keep file paths, doc IDs, snippets, and explain traces so the agent can show its work.
- Durable knowledge: indexed markdown files in named collections
- Working memory: the FlowState anticipatory cache at `~/.cache/qmd/intuition.json`
- Context overlays: human-authored collection/path summaries added with `qmd context add`

git clone https://github.com/amanning3390/flowstate-qmd.git
cd flowstate-qmd
bun install
# Verify the host, recommend a profile, and emit wrapper configs
qmd init --target all
# Inspect readiness any time
qmd doctor
# Index your repo memory
qmd collection add ./docs --name docs
qmd collection add ./notes --name notes
qmd embed
# Start the coding-agent memory server
qmd mcp

If you want FlowState anticipatory memory too:

# Watch the current agent session log and keep the anticipatory cache warm
qmd flow ~/.codex/sessions/current.log --lite
# Compatibility alias also supported:
qmd flow start --watch ~/.codex/sessions/current.log --lite

The best live demo for judges or teammates:

- Index a repo's `docs/`, `notes/`, `CHANGELOG.md`, and ADRs.
- Start `qmd flow` on an active coding-agent session log.
- Ask a question like:
  - "Why did we roll back the auth migration?"
  - "What changed after the database incident?"
  - "What did we decide about the API contract in the ADR?"
- Have the agent call `fetch_anticipatory_context` before a regular search.
- Show that the returned memories already include the ADR, changelog, or meeting note the agent needed.

See docs/DEMO.md for an exact script.
Traditional RAG for agents looks like this:
user asks → agent decides to search → tool call → wait → result processing → answer
FlowState-QMD changes the loop:
user asks → anticipatory context already exists → agent answers or deepens with query/get
That difference matters most in coding workflows where the missing context is often already in:
- design docs
- changelogs
- migration notes
- incident writeups
- RFCs / ADRs
- benchmark reports
qmd collection add . --name repo-memory --mask '**/*.md'
# Compatibility alias
qmd index . --name repo-memory --mask '**/*.md'

qmd context add qmd://repo-memory/docs "Project documentation and architecture notes"
qmd context add qmd://repo-memory/adr "Architecture decision records and tradeoff history"
qmd context add / "Answer as a coding agent using the repo's documented decisions"

qmd embed

# Auto-expand + rerank
qmd query "why was the rollout reverted"
# Structured search for better control
qmd query $'lex: "auth rollback" migration\nvec: why did we revert the auth migration?'
# Inspect why memories were chosen
qmd query --json --explain "performance regression in auth service"

qmd get "#abc123"
qmd multi-get "docs/**/*.md,CHANGELOG.md"

qmd mcp

The flagship tool order for coding agents is:

- `fetch_anticipatory_context`
- `query`
- `get` / `multi_get`
- `status`

{
"mcpServers": {
"qmd": { "command": "qmd", "args": ["mcp"] }
}
}

qmd init --target hermes
qmd init --target gemini,kiro,vscode
qmd doctor --json

`qmd init` now:

- profiles the host and recommends `standard` or `lite`
- writes a bootstrap report to `~/.cache/qmd/bootstrap-report.json`
- emits or installs config for Hermes, Claude Code, Codex, Gemini CLI, Kiro, VS Code, OpenClaw, and pi
- keeps the canonical `qmd` MCP namespace and tool order across every client

- `hermes`: installs `~/.hermes/config.yaml`
- `claude-code`: installs `~/.claude.json`
- `codex`: installs `~/.codex/config.toml`
- `gemini`: installs `~/.gemini/settings.json`
- `kiro`: installs `.kiro/settings/mcp.json`
- `vscode`: installs `.vscode/mcp.json`
- `openclaw`, `pi`: emits ready-to-apply artifacts under `~/.cache/qmd/targets/`

qmd skill install --global --yes

The packaged skill is optimized for coding-agent memory and teaches the agent to:

- use `fetch_anticipatory_context` first
- use `query` for deeper retrieval
- use `get` / `multi_get` for exact evidence

fetch_anticipatory_context prefers the FlowState cache and falls back to a live project-memory query when needed.
Example payload:
{
"recent_conversation": "What changed in the auth rollback plan and why did we revert the migration?",
"refresh": false,
"lite_mode": false
}

Typical uses:

- current coding question with recent local context
- migration/debugging follow-up questions
- ADR/changelog lookups where the agent should feel preloaded
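
The cache-first behaviour described for `fetch_anticipatory_context` can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the `IntuitionCache` shape, the `maxAgeMs` staleness limit, and the `liveQuery` callback are all hypothetical names introduced here, and only the `refresh` flag mirrors a field from the example payload.

```typescript
// Hypothetical shapes: the real cache schema is not documented in this README.
interface Memory {
  path: string;
  snippet: string;
}

interface IntuitionCache {
  updatedAt: number; // epoch milliseconds of the last refresh (assumed field)
  memories: Memory[];
}

interface FetchOptions {
  refresh: boolean; // mirrors the payload's "refresh" field
  maxAgeMs: number; // assumed staleness limit, not a documented setting
  now: number; // current time in epoch milliseconds
}

type Source = "cache" | "live";

// Prefer the cache; fall back to a live query when the cache is missing,
// empty, stale, or when the caller forces a refresh.
function fetchAnticipatoryContext(
  cache: IntuitionCache | null,
  liveQuery: () => Memory[],
  opts: FetchOptions,
): { source: Source; memories: Memory[] } {
  if (cache !== null && !opts.refresh) {
    const ageMs = opts.now - cache.updatedAt;
    if (cache.memories.length > 0 && ageMs <= opts.maxAgeMs) {
      return { source: "cache", memories: cache.memories };
    }
  }
  return { source: "live", memories: liveQuery() };
}
```

Passing the clock and the live query in as arguments keeps the decision logic easy to test without touching the file system or the index.
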
qmd status
qmd collection list
qmd collection update-cmd docs 'git pull --rebase --ff-only'
qmd collection exclude archive
qmd ls repo-memory/docs
qmd cleanup

- Qwen3 embedding + reranker models
- best quality
- recommended for Apple Silicon with 16GB+ RAM or comparable Linux hardware
- Models: Qwen3-Embedding-0.6B + Qwen3-Reranker-0.6B (~1.2GB VRAM total)
- Use case: Laptops, CI environments, or demo setups with <8GB RAM
- Behavior: `hybridQuery()` caps results at 5, limits candidates to 15, and skips LLM reranking
- Auto-engaged: Systems with <8GB RAM automatically use Lite mode

Use:
qmd flow ~/.codex/sessions/current.log --lite

The `--lite` flag can also be passed to `qmd query` and `qmd embed` for consistent low-memory operation.
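
As an illustration of the Lite-mode rules listed above, the profile choice could look something like the sketch below. The `SearchProfile` type and `selectProfile` function are hypothetical, and the 8 GB cutoff is treated as 8 GiB here, which is an assumption. The standard profile's limits are not stated in this README, so they are left as `null` rather than guessed.

```typescript
// Hypothetical type: not the project's actual configuration object.
interface SearchProfile {
  name: "standard" | "lite";
  maxResults: number | null; // null = no cap stated in this README
  maxCandidates: number | null; // null = no limit stated in this README
  rerank: boolean; // whether LLM reranking runs
}

const GIB = 1024 * 1024 * 1024;

// Lite engages when requested explicitly or when total RAM is under 8 GiB (assumed unit).
function selectProfile(totalRamBytes: number, forceLite = false): SearchProfile {
  if (forceLite || totalRamBytes < 8 * GIB) {
    return { name: "lite", maxResults: 5, maxCandidates: 15, rerank: false };
  }
  return { name: "standard", maxResults: null, maxCandidates: null, rerank: true };
}
```

Here `forceLite` stands in for the `--lite` flag, so a well-provisioned machine can still opt into the smaller footprint.
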
- The best results come from markdown knowledge, not arbitrary binary project assets.
- Anticipatory memory depends on having an active session log to watch.
- First model-backed search can be slower while local models warm up.
- `qmd collection add`, `qmd embed`, and `qmd update` are intentionally manual operations; they are not auto-run by this repo's agents.

Use Bun in this repo.
bun install
npx vitest run --reporter=verbose test/
bun test --preload ./src/test-preload.ts test/

Important commands:

bun src/cli/qmd.ts <command>
qmd skill show
qmd mcp --http --daemon
qmd mcp stop

When multiple agents index the same project, FlowState-QMD prevents duplicate memories:

# Agent A indexes a doc about auth rollback
# Agent B tries to index a semantically identical doc
# → QMD detects 0.94 cosine similarity (threshold: 0.90)
# → Annotates existing memory instead of duplicating

Dedup stats are tracked via `getDedupStats()` and visible in `qmd status`.
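
The similarity check behind this dedup step can be sketched as below. Only the 0.90 threshold comes from this README; `cosineSimilarity`, `findDuplicate`, and the `StoredMemory` shape are hypothetical names for illustration, and the real ingest path may differ.

```typescript
const DEDUP_THRESHOLD = 0.9; // threshold stated in this README

// Hypothetical record shape for an already-indexed memory.
interface StoredMemory {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the most similar existing memory at or above the threshold, or null.
function findDuplicate(
  existing: StoredMemory[],
  candidate: number[],
): { id: string; similarity: number } | null {
  let best: { id: string; similarity: number } | null = null;
  for (const memory of existing) {
    const similarity = cosineSimilarity(memory.embedding, candidate);
    if (similarity >= DEDUP_THRESHOLD && (best === null || similarity > best.similarity)) {
      best = { id: memory.id, similarity };
    }
  }
  return best;
}
```

A non-null result would mean "annotate the existing memory", while `null` would mean the document is new enough to index on its own.
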
FlowState tracks cache hit/miss rates and refresh latency in ~/.cache/qmd/telemetry.json, surfaced via qmd status and the MCP status tool.
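
As a rough sketch of how such counters turn into the numbers a status view might show, consider the following. The `CacheTelemetry` field names are assumptions made for this example; the actual layout of `telemetry.json` is not specified in this README.

```typescript
// Hypothetical shape: field names are illustrative, not the real telemetry.json schema.
interface CacheTelemetry {
  hits: number;
  misses: number;
  refreshLatenciesMs: number[];
}

// Derive a hit rate (0..1) and the mean refresh latency from raw counters.
function summarizeTelemetry(t: CacheTelemetry): { hitRate: number; meanRefreshMs: number } {
  const lookups = t.hits + t.misses;
  const hitRate = lookups === 0 ? 0 : t.hits / lookups;
  const totalMs = t.refreshLatenciesMs.reduce((sum, ms) => sum + ms, 0);
  const meanRefreshMs =
    t.refreshLatenciesMs.length === 0 ? 0 : totalMs / t.refreshLatenciesMs.length;
  return { hitRate, meanRefreshMs };
}
```

Both ratios default to zero when no data has been recorded, which avoids dividing by zero on a fresh install.
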
Reproducible benchmarks in bench/:
bun bench/latency.ts # FTS vs hybrid vs hybrid-lite search latency
bun bench/cache-hit.ts # Intuition cache read/parse latency (1000 rounds)
bun bench/tool-calls.ts # MCP tool round-trip simulation

All output JSON for programmatic consumption.

┌─────────────────────────────────────────────────────────────┐
│  Agent (Hermes, Claude Code, Codex, ...)                    │
│       │                                                     │
│       │  MCP tool calls                                     │
│       ▼                                                     │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  MCP Server (stdio / HTTP)                          │    │
│  │  fetch_anticipatory_context │ query │ get │ status  │    │
│  └──────────────┬──────────────┴───────┴──┬──┴─────────┘    │
│                 │                         │                 │
│     ┌───────────▼──────────┐     ┌────────▼────────┐        │
│     │  FlowState Engine    │     │  Store (SQLite) │        │
│     │  fs.watch + debounce │     │  FTS5 + vec     │        │
│     │  intuition.json      │     │  BM25 + cosine  │        │
│     │  telemetry.json      │     │  RRF fusion     │        │
│     └──────────────────────┘     │  LLM reranking  │        │
│                                  │  Idempotency    │        │
│                                  └────────┬────────┘        │
│                                           │                 │
│                                ┌──────────▼──────────┐      │
│                                │  node-llama-cpp     │      │
│                                │  Qwen3 Embed + Rank │      │
│                                └─────────────────────┘      │
└─────────────────────────────────────────────────────────────┘

- Store: SQLite + FTS5 + `sqlite-vec`
- Retrieval: BM25 + vector search + reciprocal rank fusion + reranking
- FlowState: event-driven watcher that keeps `intuition.json` fresh
- Idempotency: cosine similarity dedup at 0.90 threshold on document ingest
- Telemetry: cache hit/miss tracking persisted to `telemetry.json`
- Interfaces: CLI, MCP server, SDK, packaged agent skill

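
To make the retrieval step more concrete, here is a minimal sketch of reciprocal rank fusion, which merges a BM25 ranking and a vector ranking into a single list. It uses the textbook formula `score(d) = Σ 1 / (k + rank(d))` with `k = 60`, a commonly used default; whether this project uses the same constant or adds per-list weights is not stated here, so treat both as assumptions. The function name is hypothetical.

```typescript
// Fuse several ranked lists of document IDs into one list, best first.
// Each input list is ordered best-first; ranks are 1-based.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60, // assumed smoothing constant, not confirmed for this project
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Example: one lexical ranking and one vector ranking over the same corpus.
const bm25Ranking = ["adr/auth.md", "CHANGELOG.md", "notes/incident.md"];
const vectorRanking = ["CHANGELOG.md", "notes/incident.md", "adr/auth.md"];
const fused = reciprocalRankFusion([bm25Ranking, vectorRanking]);
```

Because the formula only looks at ranks, it needs no score normalization between BM25 and cosine similarity, which is the usual reason for choosing it in hybrid search.
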
- Demo script: docs/DEMO.md
- Submission writeup: SUBMISSION.md
- Promo/demo video assets: assets/video
FlowState-QMD is not trying to be every kind of memory system. It is trying to be the best coding-agent memory layer: fast, local, inspectable, shared across agents, and strong enough to make project context feel native.
