Conversation

@esafwan esafwan commented Dec 31, 2025

PR: Implement LiteLLM Prompt Caching

Overview

This PR integrates LiteLLM's prompt caching capabilities into HUF. This feature allows agents to cache and reuse prompt prefixes (system instructions, tool definitions, conversation history), significantly reducing inference costs and latency for supported models (e.g., Anthropic Claude 3.5 Sonnet, GPT-4o).

Changes

1. Data Model & Validation (feat(core))

  • Agent DocType:
    • Added enable_prompt_caching field.
    • Added validation logic in Agent.validate() that checks whether the selected provider/model supports prompt caching via litellm.supports_prompt_caching(); a warning is shown if it does not (see the sketch after this list).
  • Agent Run DocType:
    • Added cached_tokens (Int) field to track token savings per run.
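A minimal sketch of the validation hook described above, assuming a standard Frappe DocType controller. Only the `enable_prompt_caching` field and `litellm.supports_prompt_caching()` are taken from this PR; the `model` field name and the warning text are illustrative.

```python
import frappe
import litellm
from frappe.model.document import Document


class Agent(Document):
    def validate(self):
        # Warn (non-blocking) if caching is enabled for a model that cannot use it.
        if self.enable_prompt_caching and not litellm.supports_prompt_caching(model=self.model):
            frappe.msgprint(
                f"Model {self.model} does not support prompt caching; the setting will have no effect.",
                indicator="orange",
                alert=True,
            )
```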

2. AI Provider Logic (feat(ai))

  • Agent Integration (agent_integration.py):
    • Propagates the enable_prompt_caching setting from the Agent document to the execution context.
  • LiteLLM Provider (litellm.py):
    • Updated run() and run_stream() to send the system prompt in the content-array format with cache_control markers when caching is enabled (see the sketch after this list).
    • Implemented logic to extract prompt_tokens_details.cached_tokens from the LLM response usage data.
    • Correctly maps cached tokens to the Agent Run document.
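A hedged sketch of the provider-side flow, assuming the LiteLLM content-array caching format. Helper names (`build_messages`, `record_cached_tokens`, `run`) and attributes such as `agent.system_prompt` are illustrative, not the exact identifiers used in litellm.py.

```python
import litellm


def build_messages(system_prompt, history, enable_prompt_caching):
    """Return messages, marking the stable system prefix as cacheable when enabled."""
    if enable_prompt_caching:
        system_msg = {
            "role": "system",
            "content": [{
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }],
        }
    else:
        system_msg = {"role": "system", "content": system_prompt}
    return [system_msg, *history]


def record_cached_tokens(response, run_doc):
    """Copy the provider-reported cached token count onto the Agent Run document."""
    usage = getattr(response, "usage", None)
    details = getattr(usage, "prompt_tokens_details", None) if usage else None
    run_doc.cached_tokens = getattr(details, "cached_tokens", 0) or 0


def run(agent, history, run_doc):
    messages = build_messages(agent.system_prompt, history, agent.enable_prompt_caching)
    response = litellm.completion(model=agent.model, messages=messages)
    record_cached_tokens(response, run_doc)
    return response
```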

3. Documentation (docs)

  • Added PROMPT_CACHING_IMPLEMENTATION.md: A comprehensive guide covering:
    • Architecture explanation.
    • Supported providers (Anthropic, OpenAI, DeepSeek, etc.).
    • Configuration steps.
    • Verification and troubleshooting.

Benefits

  • Cost Efficiency: Cached tokens are typically ~90% cheaper (e.g., Anthropic charges 10% of the input price for cached reads); a worked example follows this list.
  • Performance: Time-to-First-Token (TTFT) is reduced for requests hitting the cache.
  • Visibility: Users can now see exactly how many tokens were cached in the Agent Run logs.
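For illustration only, a back-of-the-envelope savings calculation under assumed pricing. The $3.00/MTok input rate is a placeholder, not a figure quoted in this PR; only the "cache reads at 10% of the input price" ratio comes from the text above.

```python
# Hypothetical pricing: $3.00 per 1M input tokens, cache reads at 10% of that.
input_price_per_mtok = 3.00
cache_read_price_per_mtok = 0.30

prompt_tokens = 12_000   # total prompt tokens in the request
cached_tokens = 10_000   # portion served from the prompt cache

uncached_cost = prompt_tokens * input_price_per_mtok / 1_000_000
cached_cost = ((prompt_tokens - cached_tokens) * input_price_per_mtok
               + cached_tokens * cache_read_price_per_mtok) / 1_000_000

print(f"uncached: ${uncached_cost:.4f}  with caching: ${cached_cost:.4f}")
# uncached: $0.0360  with caching: $0.0090 (75% cheaper on this request)
```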

How to Test

Prerequisites

  • Ensure litellm is updated to a version supporting prompt caching (e.g., >=1.40.0); a quick console check is sketched after this list.
  • Run bench migrate to apply schema changes.
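As an optional sanity check, you can confirm from a bench/Python console that the installed litellm build recognizes the model as cache-capable; the model string below is just an example.

```python
import litellm

# Should print True on a litellm version that supports prompt caching metadata.
print(litellm.supports_prompt_caching(model="anthropic/claude-3-5-sonnet-20240620"))
```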

Steps

  1. Configure Agent:
    • Open an Agent document.
    • Select a model that supports caching (e.g., anthropic/claude-3-5-sonnet-20240620).
    • Check the new "Enable Prompt Caching" box.
    • Save. (Verify no warnings appear).
  2. Run 1 (Cache Creation):
    • Send a message to the agent.
    • Check the resulting Agent Run.
    • Cached Tokens should be 0 (or greater if using a shared system prompt that was already cached globally).
  3. Run 2 (Cache Read):
    • Send another message to the same agent (or same agent config).
    • Check the Agent Run.
    • Cached Tokens should be > 0 (a scripted version of this check is sketched after these steps).
  4. Validation Check:
    • Switch the model to one that does not support caching (e.g., an older openai/gpt-3.5-turbo variant).
    • Try to save. You should see a warning.
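Steps 2–3 can also be reproduced outside HUF from a bench/Python console to confirm that LiteLLM itself reports cache hits. The model string and system prompt below are placeholders, and providers only cache prefixes above a minimum size (roughly 1024 tokens for Claude 3.5 Sonnet).

```python
import litellm

# Placeholder; must exceed the provider's minimum cacheable prefix size.
long_system_prompt = "Long, stable system instructions for the agent. " * 300

messages = [
    {"role": "system",
     "content": [{"type": "text",
                  "text": long_system_prompt,
                  "cache_control": {"type": "ephemeral"}}]},
    {"role": "user", "content": "Hello"},
]

for attempt in (1, 2):
    resp = litellm.completion(model="anthropic/claude-3-5-sonnet-20240620", messages=messages)
    details = getattr(resp.usage, "prompt_tokens_details", None)
    print(f"run {attempt}: cached_tokens =", getattr(details, "cached_tokens", 0))
# Expected: 0 on the first run (cache write), > 0 on the second (cache read).
```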

Commits

  • feat(core): Add prompt caching schema & validation
  • feat(ai): Implement prompt caching execution logic
  • docs: Add Prompt Caching implementation guide

esafwan and others added 7 commits December 31, 2025 20:07
- Update Agent DocType to include enable_prompt_caching field
- Update Agent Run DocType to track cached tokens usage
- Add validation logic to Agent controller to check caching support
- Update agent integration to pass caching config to provider
- Implement caching logic in LiteLLM provider
- Add logic to track and record cached write tokens in run logs
- Document prompt caching architecture and configuration
- Explain validation logic and supported providers
- Provide usage examples and implementation details
Adds prompt caching configuration to agents and tracks cached tokens. This reduces costs by reusing cached prompt content.

Co-authored-by: esafwan <esafwan@gmail.com>
- Gated Caching: Modified litellm.py to only use content array formats and
  cache_control headers if 'enable_prompt_caching' is checked in the Agent.
- Token Tracking: Fixed issue where tokens, costs, and cache showed as 0
  by correctly extracting attributes from the LiteLLM Usage object.
- Stability: Resolved SyntaxError in litellm.py by fixing incomplete
  try-except blocks.
@esafwan esafwan force-pushed the feature/litellm-prompt-caching-implementation branch from d3ae41b to 33a99b9 on January 5, 2026 12:45
@Sanjusha-tridz Sanjusha-tridz marked this pull request as ready for review January 5, 2026 13:08
@Sanjusha-tridz Sanjusha-tridz merged commit c02ce5a into develop Jan 5, 2026
1 of 3 checks passed
@Sanjusha-tridz Sanjusha-tridz deleted the feature/litellm-prompt-caching-implementation branch January 6, 2026 04:46