Feature/litellm prompt caching implementation #94
# PR: Implement LiteLLM Prompt Caching

## Overview
This PR integrates LiteLLM's prompt caching capabilities into HUF, allowing agents to cache and reuse prompt prefixes (system instructions, tool definitions, conversation history). This significantly reduces inference cost and latency for supported models (e.g., Anthropic Claude 3.5 Sonnet, GPT-4o).
## Changes
### 1. Data Model & Validation (`feat(core)`)
- Added an `enable_prompt_caching` field.
- Updated `Agent.validate()` to check whether the selected provider/model supports prompt caching, using `litellm.supports_prompt_caching()`. A warning is shown if unsupported (see the sketch after this list).
- Added a `cached_tokens` (Int) field to track token savings per run.
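
The validation hook could look roughly like the following. This is a minimal sketch, assuming a Frappe `Agent` doctype: `ai_model` and the warning wording are illustrative placeholders, not taken from the PR diff, while `enable_prompt_caching` and `litellm.supports_prompt_caching()` are the names used above.

```python
# Minimal sketch of the validation described above (illustrative field names).
import frappe
from frappe import _
from frappe.model.document import Document

import litellm


class Agent(Document):
    def validate(self):
        # `ai_model` is a hypothetical field holding the litellm model string.
        if self.enable_prompt_caching and not litellm.supports_prompt_caching(
            model=self.ai_model
        ):
            frappe.msgprint(
                _("The selected model does not support prompt caching; the setting will have no effect."),
                indicator="orange",
                alert=True,
            )
```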
### 2. AI Provider Logic (`feat(ai)`)
- Agent Integration (`agent_integration.py`): passes the `enable_prompt_caching` setting from the Agent document to the execution context.
- LiteLLM Handler (`litellm.py`):
  - Updated `run()` and `run_stream()` to pass `cached_messages=True` to `litellm.completion`.
  - Extracts `prompt_tokens_details.cached_tokens` from the LLM response usage data (see the sketch after this list).
  - Stores the value on the `Agent Run` document.
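
On the response side, reading the cached-token count can be done along these lines. This is a sketch of the accounting only: it omits streaming and the `cached_messages=True` flag mentioned above, the helper name is made up, and the guarded attribute access reflects the fact that `prompt_tokens_details` may be absent when the provider returns no cache information.

```python
# Sketch: pull cached-token usage out of a litellm completion response.
import litellm


def completion_with_cache_accounting(model: str, messages: list[dict]) -> tuple[str, int]:
    response = litellm.completion(model=model, messages=messages)

    # Usage details can be missing for providers/models without caching support,
    # so fall back to 0 instead of raising.
    usage = getattr(response, "usage", None)
    details = getattr(usage, "prompt_tokens_details", None) if usage else None
    cached_tokens = getattr(details, "cached_tokens", 0) or 0

    return response.choices[0].message.content, cached_tokens
```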
### 3. Documentation (`docs`)
- Added `PROMPT_CACHING_IMPLEMENTATION.md`: a comprehensive guide covering the benefits of prompt caching and how cached tokens appear in `Agent Run` logs.

## How to Test
### Prerequisites
- Ensure `litellm` is updated to a version supporting prompt caching (e.g., `>=1.40.0`); a quick check follows this list.
- Run `bench migrate` to apply the schema changes.
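
A quick environment check before testing (a sketch; run it from `bench console` or any shell using the site's Python environment):

```python
# Sanity-check the installed litellm build before running the test steps.
import litellm
from packaging.version import Version

print(litellm.__version__)
assert Version(litellm.__version__) >= Version("1.40.0"), "litellm too old for prompt caching"
```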
### Steps
1. Create (or edit) an agent that uses a supported model (e.g., `anthropic/claude-3-5-sonnet-20240620`) and enable prompt caching.
2. Run the agent. On the first run, `Cached Tokens` should be `0` (or greater if using a shared system prompt that was already cached globally).
3. Run the agent again with the same prompt prefix. `Cached Tokens` should be > 0 (a console query for checking this follows the list).
4. Switch to an unsupported model (e.g., older `openai/gpt-3.5-turbo` versions) and confirm the validation warning appears.
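
To inspect the recorded savings without clicking through the UI, the most recent runs can be pulled from `bench console`. This is a hypothetical query: it assumes the `Agent Run` doctype links back to its agent through an `agent` field and carries the new `cached_tokens` field, and the agent name is a placeholder.

```python
# Hypothetical bench-console check of the two most recent runs for one agent.
import frappe

runs = frappe.get_all(
    "Agent Run",
    filters={"agent": "My Caching Agent"},  # placeholder agent name
    fields=["name", "creation", "cached_tokens"],
    order_by="creation desc",
    limit=2,
)
for run in runs:
    print(run.name, run.creation, run.cached_tokens)
# Expect: the newer run shows cached_tokens > 0, the older one 0.
```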
## Commits
- `feat(core): Add prompt caching schema & validation`
- `feat(ai): Implement prompt caching execution logic`
- `docs: Add Prompt Caching implementation guide`