Skip to content

[Feature]: prompt caching - Explicit and Implicit #367

@qdrddr

Description

@qdrddr

Use Case

An agent that uses a memory. Both explicit and Implicit caching requires prompt to be at the beginning of the request.

Problem Statement

  1. LLM use is expensive. I want to benefit from Explicit and Implicit caching available with various providers supporting this feature.
  2. Explicit caching is a two-edged sword: it can provide the benefit of significant savings, but it is not free. Explicit caching can save money, but it may end up consuming more money than it saves.

How This Feature Would Help

Reduces LLM costs

Proposed Solution

Move repetitive instruction text to the top of the prompt over the API requests.
Keep track of the statistics for repetitive text to evaluate whether the text was eligible for caching and allow the user to evaluate whether it was actually cached.

Explicit Caching

Providers such as Google and Anthropic support explicit caching.
When used with Anthropic SDK, Gemini SDK, and LiteLLM SDK (with Anthropic/Gemini provider or OpenRouter Provider with the Anthropic/Gemini models), I'd like to be able to enable explicit caching.

TTL

Via CLI/UI/Envs, I'd like to be able to:

  1. Manipulate the cache TTL
  2. Monitor cache spending vs. savings with a given TTL in production, so I could try adjusting TTL and later see how it affects my savings with the previous TTL.

Alternatives Considered

No response

Priority

Nice to have

Additional Context

The goal is to

  1. Ensure Prompt is cachable and can benefit the most from cache
  2. Provide reasonable defaults for the Explicit cache
  3. Allow the user to find the best TTL settings to optimize caching benefit

Checklist

  • I would be willing to contribute this feature

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions