Summary
When OpenViking is used behind a context-engine style integration, naive recall injection can become prompt-cache hostile even if retrieval quality is good.
In our downstream integration, we observed much lower cache reuse after switching from a more stable context layer to OpenViking-backed dynamic recall. The main reason is that retrieval results are re-injected into the prompt prefix on many turns, often with unstable ordering/content and sometimes with full L2 memory bodies.
This issue is not claiming that OpenViking core rewrites session history. The concern is that current recall usage patterns around OpenViking make it too easy for integrations to destroy prompt-cache locality.
Why this matters
For long-running agent sessions, cache stability is a major cost/latency lever.
If each turn does all of the following:
- re-run retrieval
- re-rank/re-select top memories
- inject a slightly different set/order of memories
- inject full memory bodies instead of stable summaries / references
then the prompt prefix becomes unstable and cache hit rate drops sharply, even when the visible chat history is mostly unchanged.
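To make the failure mode concrete, here is a minimal self-contained sketch (plain strings, no OpenViking APIs) showing how re-ordering recall results between turns collapses the shared prompt prefix that a cache could otherwise reuse:

```python
# Illustrative only: unstable memory ordering shrinks the cacheable
# shared prefix between two consecutive prompts.

def build_prompt(memories, history):
    # Memories are injected into the prompt prefix, before the chat history.
    block = "<relevant-memories>\n" + "\n".join(memories) + "\n</relevant-memories>\n"
    return block + history

def common_prefix_len(a, b):
    # Length of the shared prefix, a proxy for prompt-cache reuse.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

history = "user: hello\nassistant: hi\n"

# Stable injection: same memories, same order -> the whole memory block
# plus the old history is a reusable prefix.
p1 = build_prompt(["mem-A summary", "mem-B summary"], history)
p2 = build_prompt(["mem-A summary", "mem-B summary"], history + "user: more\n")

# Unstable injection: re-ranked order -> the shared prefix ends at the
# first reordered memory, so almost nothing before the history is reused.
p3 = build_prompt(["mem-B summary", "mem-A summary"], history + "user: more\n")
```

Running `common_prefix_len(p1, p2)` versus `common_prefix_len(p1, p3)` shows the stable variant sharing the entire old prompt as a prefix, while the reordered variant diverges a few characters into the memory block.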
Observed downstream pattern
In our case, the integration behavior looked like this on many turns:
- query the latest user text
- run `find(...)`
- read the selected memory URIs
- prepend a `<relevant-memories>` block to the next prompt
- sometimes also prepend another dynamic helper block
Operationally, logs repeatedly showed lines like `injecting 6 memories into context` across many turns in the same session.
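The per-turn pattern above can be sketched roughly as follows. This is a hypothetical reconstruction: `client`, `find`, and `read` stand in for whatever the integration actually calls and are not a claim about OpenViking's real API surface.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    uri: str

class FakeClient:
    """Stand-in for an OpenViking-backed client (hypothetical method names)."""
    def __init__(self, store):
        self.store = store  # uri -> full memory body
    def find(self, query, limit=6):
        # Real retrieval would rank by relevance; ranking drift between
        # turns is exactly what destabilizes the prompt prefix.
        return [Hit(uri) for uri in list(self.store)[:limit]]
    def read(self, uri):
        return self.store[uri]

def build_turn_prompt(client, session_history, user_text, k=6):
    hits = client.find(user_text, limit=k)        # re-run retrieval every turn
    bodies = [client.read(h.uri) for h in hits]   # full L2 bodies, not summaries
    block = "<relevant-memories>\n" + "\n\n".join(bodies) + "\n</relevant-memories>\n"
    # The block is prepended to the prompt prefix: any change in hit set or
    # order between turns changes the prefix and breaks cache reuse, even
    # when session_history barely changed.
    return block + session_history + f"user: {user_text}\n"
```

Nothing in this loop is wrong from a retrieval-quality perspective; the problem is purely that the prefix it produces is not stable across turns.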
Why I am filing this here
Although the immediate implementation lives in a downstream integration, the design question feels upstream-relevant:
What is the recommended OpenViking recall contract for cache-friendly integrations?
Right now it is very natural for an integration to treat recall as “retrieve + inject full content every turn”, but this tends to work against prompt caching.
Suggested upstream direction
It would be valuable if OpenViking documented and/or supported a more cache-friendly recall pattern, for example:
- prefer stable L0/L1 outputs for prompt injection
  - encourage integrations to inject short summaries / stable bullets instead of full L2 bodies by default
- session-window stability guidance
  - recommend keeping recall results sticky within a short session window unless query novelty crosses a threshold
- stable ordering / stable identifiers
  - make it easier to preserve deterministic ordering and reference-based rendering
- two-tier recall contract
  - tier A: cache-friendly prompt injection (short, stable)
  - tier B: on-demand deep read of L2 memory bodies only when necessary
- best-practice docs for context-engine integrations
  - specifically covering prompt cache trade-offs
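To illustrate what the sticky, two-tier direction could look like from an integration's side, here is a sketch. Every name here (`StickyRecall`, `FakeClient`, `find`, `read`, the novelty heuristic) is hypothetical, not a proposed API:

```python
from collections import namedtuple

Hit = namedtuple("Hit", ["uri", "summary"])

class FakeClient:
    """Stand-in client; find/read are placeholders, not real OpenViking calls."""
    def __init__(self, memories):
        self.memories = memories  # uri -> (summary, full_body)
    def find(self, query, limit=4):
        return [Hit(u, s) for u, (s, _) in list(self.memories.items())[:limit]]
    def read(self, uri):
        return self.memories[uri][1]

class StickyRecall:
    def __init__(self, client, window=5, novelty_threshold=0.5):
        self.client = client
        self.window = window                  # turns to keep results pinned
        self.novelty_threshold = novelty_threshold
        self.turns_since_refresh = 0
        self.pinned = []                      # (uri, summary) pairs, stable order

    def _novelty(self, query):
        # Crude word-overlap novelty score; a real integration might compare
        # embeddings of the query against the pinned summaries instead.
        pinned_text = " ".join(s for _, s in self.pinned)
        words = set(query.lower().split())
        if not words:
            return 0.0
        overlap = len(words & set(pinned_text.lower().split()))
        return 1.0 - overlap / len(words)

    def tier_a_block(self, query):
        """Tier A: stable prefix block of short summaries, refreshed only
        when the window has elapsed AND the query looks novel."""
        if not self.pinned or (
            self.turns_since_refresh >= self.window
            and self._novelty(query) > self.novelty_threshold
        ):
            hits = self.client.find(query, limit=4)
            # Sort by URI so rendering order is deterministic across refreshes.
            self.pinned = sorted((h.uri, h.summary) for h in hits)
            self.turns_since_refresh = 0
        self.turns_since_refresh += 1
        lines = [f"- {uri}: {summary}" for uri, summary in self.pinned]
        return "<relevant-memories>\n" + "\n".join(lines) + "\n</relevant-memories>\n"

    def tier_b_read(self, uri):
        """Tier B: deep read of a full L2 body, only when actually needed."""
        return self.client.read(uri)
```

The point of the sketch is the contract shape: tier A emits an identical block for consecutive turns within the window (so the prefix stays cache-friendly), and full bodies only ever enter the context through an explicit tier-B read.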
Concrete ask
Could maintainers comment on the intended best practice here?
In particular:
- Do you agree that OpenViking integrations should generally avoid injecting full L2 memory content on every turn?
- Is there already a preferred pattern for “stable recall for prompt construction”?
- If not, would you be open to adding guidance or API affordances aimed at cache-friendly context-engine integrations?
I think this is becoming important as OpenViking gets used more often as the retrieval layer behind agent frameworks, where prompt-cache behavior directly affects usability and cost.