This document describes the current reset-era user-memory retrieval stack for:
- user_memory_entries
- user_memory_views
- runtime context injection
- /api/v1/user-memory* query endpoints
User-memory now runs on a hybrid pipeline:
- PostgreSQL remains the source of truth for entries and views
- Milvus stores the user-memory vector index
- PostgreSQL FTS / trigram / structured filters provide lexical and symbolic retrieval
- the final ranking path uses hybrid fusion and reranking
Write path:
- session ledger -> observations -> entries/views
- the same transaction enqueues embedding jobs in user_memory_embedding_jobs
- the indexing worker writes entry/view embeddings into the active Milvus collection
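The "same transaction" guarantee on the write path can be sketched as follows. This is a minimal illustration only: it uses an in-memory SQLite database in place of PostgreSQL so it runs anywhere, and the column names (content, entry_id, status) and both helper functions are hypothetical, not the real schema.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory stand-in for the two tables (columns are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_memory_entries ("
        "id INTEGER PRIMARY KEY, content TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE user_memory_embedding_jobs ("
        "id INTEGER PRIMARY KEY, entry_id INTEGER NOT NULL, status TEXT NOT NULL)"
    )
    return conn


def write_entry_with_job(conn, content):
    """Insert an entry and enqueue its embedding job atomically."""
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so an entry can never be stored without its embedding job.
    with conn:
        cur = conn.execute(
            "INSERT INTO user_memory_entries (content) VALUES (?)", (content,)
        )
        entry_id = cur.lastrowid
        conn.execute(
            "INSERT INTO user_memory_embedding_jobs (entry_id, status) "
            "VALUES (?, 'pending')",
            (entry_id,),
        )
    return entry_id
```

Because the job row commits together with the entry, the indexing worker can treat user_memory_embedding_jobs as a complete list of rows that still need vectors.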
Read path:
- planner builds query variants, keyword terms, and structured filters
- semantic retrieval queries Milvus
- lexical retrieval queries PostgreSQL search indexes
- structured retrieval filters PostgreSQL metadata fields
- candidates are merged with reciprocal-rank fusion
- reranking applies model rerank when configured, otherwise heuristic rerank
- the final relevance gate applies user_memory.retrieval.similarity_threshold
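The fusion and gating steps of the read path can be sketched as below. This is a generic reciprocal-rank fusion, not the project's actual implementation: the function names are hypothetical, the constant k=60 is a commonly used default rather than a documented setting, and the gate is modelled as a plain minimum-score filter because this document does not state which score the real gate compares against the threshold.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per item."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


def apply_relevance_gate(scored, threshold):
    """Keep only candidates whose final score reaches the threshold (assumed semantics)."""
    return [(item_id, score) for item_id, score in scored if score >= threshold]


# Hypothetical candidate IDs from the three retrievers.
semantic = ["a", "b", "c"]
lexical = ["b", "a", "d"]
structured = ["b"]
fused = reciprocal_rank_fusion([semantic, lexical, structured])
```

An item that appears in several retrievers accumulates score from each, so here "b" outranks "a" even though "a" is first in the semantic list.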
Cleanup path:
- backfill_user_memory_embeddings.py builds the active collection
- reconcile_user_memory_embeddings.py checks missing/orphan vectors
- periodic vector cleanup runs reconcile + optional compaction
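The missing/orphan check reduces to a set comparison between the IDs PostgreSQL expects to be indexed and the IDs actually present in the vector collection. A minimal sketch, assuming both sides can be listed as plain ID collections (the function name and return shape are hypothetical, not the script's real interface):

```python
def reconcile(expected_ids, indexed_ids):
    """Compare source-of-truth IDs against vector-index IDs."""
    expected = set(expected_ids)
    indexed = set(indexed_ids)
    return {
        # Rows in PostgreSQL that have no vector yet: candidates for re-embedding.
        "missing": sorted(expected - indexed),
        # Vectors whose source row no longer exists: candidates for deletion.
        "orphan": sorted(indexed - expected),
    }
```

A dry run would report this result only; a real run would act on it.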
Runtime and API share the same hybrid core but use different planner policies.
Runtime:
- planner_mode=runtime_light
- deterministic planning only
- no extra LLM call
- no reflection
API:
- planner_mode=api_full
- one optional LLM planning pass
- up to one reflection round when enabled and worthwhile
/api/v1/user-memory/profilescopes retrieval toview_type=user_profile/api/v1/user-memory/episodesscopes retrieval toview_type=episodeand supplements withfact_kind=eventwhen needed
Wildcard queries ("" or "*") bypass the full hybrid path and return recent active rows from PostgreSQL.
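The two planner policies and the wildcard bypass can be summarised in a small sketch. The mode names come from this document; the PlannerPolicy class, its field names, and the choose_path helper are hypothetical and only illustrate the decision, not the real code structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannerPolicy:
    mode: str
    allow_llm_planning: bool     # whether one optional LLM planning pass may run
    max_reflection_rounds: int   # upper bound on reflection rounds


# Runtime: deterministic planning only, no extra LLM call, no reflection.
RUNTIME_LIGHT = PlannerPolicy("runtime_light", False, 0)
# API: one optional LLM planning pass and at most one reflection round.
API_FULL = PlannerPolicy("api_full", True, 1)


def choose_path(query, policy):
    """Return which retrieval path a query takes under a given policy."""
    if query in ("", "*"):
        # Wildcards skip planning, fusion, and reranking entirely.
        return "recent_rows"
    return "hybrid:" + policy.mode
```

Both policies feed the same hybrid core; only the planning budget differs.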
Canonical user-memory retrieval config:
- user_memory.embedding.*
- user_memory.retrieval.hybrid_enabled
- user_memory.retrieval.similarity_threshold
- user_memory.retrieval.vector.*
- user_memory.retrieval.lexical.*
- user_memory.retrieval.structured.*
- user_memory.retrieval.fusion.*
- user_memory.retrieval.rerank.*
- user_memory.retrieval.planner.*
- user_memory.retrieval.reflection.*
- user_memory.vector_indexing.*
- user_memory.vector_cleanup.*
- user_memory.extraction.*
- user_memory.consolidation.*
- session_ledger.*
- runtime_context.enable_*
GET /api/v1/user-memory/config also returns:
- user_memory.indexState.activeCollection
- user_memory.indexState.activeSignature
- user_memory.indexState.buildState
- user_memory.indexState.lastBackfillStartedAt
- user_memory.indexState.lastBackfillCompletedAt
- user_memory.indexState.lastReconcileAt
- user_memory.indexState.reindexRequired
The API no longer exposes these legacy retrieval keys:
- user_memory.retrieval.legacy_fallback_enabled
- user_memory.retrieval.strict_keyword_fallback
- flat rerank keys such as user_memory.retrieval.rerank_provider
- nested user_memory.retrieval.milvus.*
If these keys still exist in an older config file, the config endpoint canonicalizes them into the new hybrid structure and omits the legacy names from responses.
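One way such canonicalization could work is sketched below on a flat dictionary of dotted keys. The target names are assumptions made for illustration: this document does not specify where each legacy key lands, so mapping rerank_<name> to rerank.<name>, mapping milvus.* to vector.*, and dropping the two fallback flags are all hypothetical, as is the top_k key in the example.

```python
PREFIX = "user_memory.retrieval."

# Assumption: the fallback flags have no equivalent in the hybrid structure.
DROPPED = {
    PREFIX + "legacy_fallback_enabled",
    PREFIX + "strict_keyword_fallback",
}


def canonical_name(key):
    """Map a legacy key to its assumed canonical name; None means 'omit'."""
    if key in DROPPED:
        return None
    if key.startswith(PREFIX + "rerank_"):
        return PREFIX + "rerank." + key[len(PREFIX + "rerank_"):]
    if key.startswith(PREFIX + "milvus."):
        return PREFIX + "vector." + key[len(PREFIX + "milvus."):]
    return key


def canonicalize(flat_config):
    """Return a config with legacy names rewritten or removed."""
    result = {}
    # First pass: keys that are already canonical always win.
    for key, value in flat_config.items():
        if canonical_name(key) == key:
            result[key] = value
    # Second pass: legacy keys only fill gaps left by canonical ones.
    for key, value in flat_config.items():
        new_key = canonical_name(key)
        if new_key is not None and new_key != key:
            result.setdefault(new_key, value)
    return result
```

Giving canonical keys precedence means a half-migrated file that contains both spellings behaves the same as a fully migrated one.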
Use these scripts for vector-index lifecycle management:
- backend/scripts/bootstrap_user_memory_vector_index.py
- backend/scripts/backfill_user_memory_embeddings.py
- backend/scripts/reconcile_user_memory_embeddings.py
- backend/scripts/verify_user_memory_cutover.py
Recommended cutover sequence:
- bootstrap the active collection
- backfill embeddings
- run verify to compare hybrid retrieval against the recent-row baseline and inspect dry-run reconcile output
- deploy the application
- run reconcile daily during the first post-cutover week