Summary
When OpenViking is used behind a context-engine style integration, naive recall injection can become prompt-cache hostile even if retrieval quality is good.
In our downstream integration, we observed much lower cache reuse after switching from a more stable context layer to OpenViking-backed dynamic recall. The main reason is that retrieval results are re-injected into the prompt prefix on many turns, often with unstable ordering/content and sometimes with full L2 memory bodies.
This issue is not claiming that OpenViking core rewrites session history. The concern is that current recall usage patterns around OpenViking make it too easy for integrations to destroy prompt-cache locality.
Why this matters
For long-running agent sessions, cache stability is a major cost/latency lever.
If each turn does all of the following:
- re-run retrieval
- re-rank/re-select top memories
- inject a slightly different set/order of memories
- inject full memory bodies instead of stable summaries / references
then the prompt prefix becomes unstable and cache hit rate drops sharply, even when the visible chat history is mostly unchanged.
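To make the failure mode concrete, here is a minimal self-contained sketch (plain strings, no OpenViking APIs) showing how re-ordering recall results between turns collapses the shared prompt prefix that a cache could otherwise reuse:

```python
# Illustrative only: unstable memory ordering shrinks the cacheable
# shared prefix between two consecutive prompts.

def build_prompt(memories, history):
    # Memories are injected into the prompt prefix, before the chat history.
    block = "<relevant-memories>\n" + "\n".join(memories) + "\n</relevant-memories>\n"
    return block + history

def common_prefix_len(a, b):
    # Length of the shared prefix, a proxy for prompt-cache reuse.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

history = "user: hello\nassistant: hi\n"

# Stable injection: same memories, same order -> the whole memory block
# plus the old history is a reusable prefix.
p1 = build_prompt(["mem-A summary", "mem-B summary"], history)
p2 = build_prompt(["mem-A summary", "mem-B summary"], history + "user: more\n")

# Unstable injection: re-ranked order -> the shared prefix ends at the
# first reordered memory, so almost nothing before the history is reused.
p3 = build_prompt(["mem-B summary", "mem-A summary"], history + "user: more\n")
```

Running `common_prefix_len(p1, p2)` versus `common_prefix_len(p1, p3)` shows the stable variant sharing the entire old prompt as a prefix, while the reordered variant diverges a few characters into the memory block.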
Observed downstream pattern
In our case, the integration behavior looked like this on many turns:
- query the latest user text
- run `find(...)`
- read the selected memory URIs
- prepend a `<relevant-memories>` block to the next prompt
- sometimes also prepend another dynamic helper block
Operationally, logs repeatedly showed lines like `injecting 6 memories into context` across many turns in the same session.
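The per-turn pattern above can be sketched roughly as follows. This is a hypothetical reconstruction: `client`, `find`, and `read` stand in for whatever the integration actually calls and are not a claim about OpenViking's real API surface.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    uri: str

class FakeClient:
    """Stand-in for an OpenViking-backed client (hypothetical method names)."""
    def __init__(self, store):
        self.store = store  # uri -> full memory body
    def find(self, query, limit=6):
        # Real retrieval would rank by relevance; ranking drift between
        # turns is exactly what destabilizes the prompt prefix.
        return [Hit(uri) for uri in list(self.store)[:limit]]
    def read(self, uri):
        return self.store[uri]

def build_turn_prompt(client, session_history, user_text, k=6):
    hits = client.find(user_text, limit=k)        # re-run retrieval every turn
    bodies = [client.read(h.uri) for h in hits]   # full L2 bodies, not summaries
    block = "<relevant-memories>\n" + "\n\n".join(bodies) + "\n</relevant-memories>\n"
    # The block is prepended to the prompt prefix: any change in hit set or
    # order between turns changes the prefix and breaks cache reuse, even
    # when session_history barely changed.
    return block + session_history + f"user: {user_text}\n"
```

Nothing in this loop is wrong from a retrieval-quality perspective; the problem is purely that the prefix it produces is not stable across turns.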
Why I am filing this here
Although the immediate implementation lives in a downstream integration, the design question feels upstream-relevant:
What is the recommended OpenViking recall contract for cache-friendly integrations?
Right now it is very natural for an integration to treat recall as “retrieve + inject full content every turn”, but this tends to work against prompt caching.
Suggested upstream direction
It would be valuable if OpenViking documented and/or supported a more cache-friendly recall pattern, for example:
- prefer stable L0/L1 outputs for prompt injection
  - encourage integrations to inject short summaries / stable bullets instead of full L2 bodies by default
- session-window stability guidance
  - recommend keeping recall results sticky within a short session window unless query novelty crosses a threshold
- stable ordering / stable identifiers
  - make it easier to preserve deterministic ordering and reference-based rendering
- two-tier recall contract
  - tier A: cache-friendly prompt injection (short, stable)
  - tier B: on-demand deep read of L2 memory bodies only when necessary
- best-practice docs for context-engine integrations
  - specifically covering prompt cache trade-offs
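To illustrate what the sticky, two-tier direction could look like from an integration's side, here is a sketch. Every name here (`StickyRecall`, `FakeClient`, `find`, `read`, the novelty heuristic) is hypothetical, not a proposed API:

```python
from collections import namedtuple

Hit = namedtuple("Hit", ["uri", "summary"])

class FakeClient:
    """Stand-in client; find/read are placeholders, not real OpenViking calls."""
    def __init__(self, memories):
        self.memories = memories  # uri -> (summary, full_body)
    def find(self, query, limit=4):
        return [Hit(u, s) for u, (s, _) in list(self.memories.items())[:limit]]
    def read(self, uri):
        return self.memories[uri][1]

class StickyRecall:
    def __init__(self, client, window=5, novelty_threshold=0.5):
        self.client = client
        self.window = window                  # turns to keep results pinned
        self.novelty_threshold = novelty_threshold
        self.turns_since_refresh = 0
        self.pinned = []                      # (uri, summary) pairs, stable order

    def _novelty(self, query):
        # Crude word-overlap novelty score; a real integration might compare
        # embeddings of the query against the pinned summaries instead.
        pinned_text = " ".join(s for _, s in self.pinned)
        words = set(query.lower().split())
        if not words:
            return 0.0
        overlap = len(words & set(pinned_text.lower().split()))
        return 1.0 - overlap / len(words)

    def tier_a_block(self, query):
        """Tier A: stable prefix block of short summaries, refreshed only
        when the window has elapsed AND the query looks novel."""
        if not self.pinned or (
            self.turns_since_refresh >= self.window
            and self._novelty(query) > self.novelty_threshold
        ):
            hits = self.client.find(query, limit=4)
            # Sort by URI so rendering order is deterministic across refreshes.
            self.pinned = sorted((h.uri, h.summary) for h in hits)
            self.turns_since_refresh = 0
        self.turns_since_refresh += 1
        lines = [f"- {uri}: {summary}" for uri, summary in self.pinned]
        return "<relevant-memories>\n" + "\n".join(lines) + "\n</relevant-memories>\n"

    def tier_b_read(self, uri):
        """Tier B: deep read of a full L2 body, only when actually needed."""
        return self.client.read(uri)
```

The point of the sketch is the contract shape: tier A emits an identical block for consecutive turns within the window (so the prefix stays cache-friendly), and full bodies only ever enter the context through an explicit tier-B read.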
Concrete ask
Could maintainers comment on the intended best practice here?
In particular:
- Do you agree that OpenViking integrations should generally avoid injecting full L2 memory content on every turn?
- Is there already a preferred pattern for “stable recall for prompt construction”?
- If not, would you be open to adding guidance or API affordances aimed at cache-friendly context-engine integrations?
I think this is becoming important as OpenViking gets used more often as the retrieval layer behind agent frameworks, where prompt-cache behavior directly affects usability and cost.