feat: implement explicit create_cache API #138
Merged
This commit refactors the persistent context caching mechanism in Pollux, replacing the implicit `enable_caching=True` flag in `Config` with an explicit `create_cache` API. This decouples cache upload/warm-up from text generation, allowing for stricter validation (e.g., rejecting `system_instruction` or `tools` usage alongside a `CacheHandle` to match Gemini API constraints) and more predictable caching behavior.
added 13 commits
March 4, 2026 02:34
Addresses P1 and P2 findings from code review:
- Raises ConfigurationError if options.cache is used alongside sources.
- Validates tools as dictionaries before creation in Gemini provider.
Updates caching.md to not pass sources alongside a cache handle, which is now explicitly rejected by ConfigurationError.
Move create_cache implementation from __init__.py into cache.py (create_cache_impl, _resolve_file_parts, module-level _registry), leaving __init__.create_cache as a thin provider-lifecycle wrapper. Shift cache-handle conflict validation (provider/model mismatch, system_instruction/tools/tool_choice/sources conflicts) from execute_plan() into build_plan() so errors surface at planning time before any network I/O. Retain a single runtime persistent_cache capability check in execute_plan() as a safety net for hand-built handles. The new _resolve_file_parts memoizes uploads by (file_path, mime_type) to avoid duplicate uploads for repeated file sources within a single create_cache() call.
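The upload memoization described in this commit can be sketched roughly as follows. This is an illustration only, not the actual `_resolve_file_parts`: the function name `resolve_file_parts`, the `UploadFn` type, and the fake upload are hypothetical stand-ins, and the real parts are presumably richer than `(file_path, mime_type)` tuples.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stand-in for a provider upload call; returns a remote file URI.
UploadFn = Callable[[str, str], Awaitable[str]]


async def resolve_file_parts(
    parts: list[tuple[str, str]], upload: UploadFn
) -> list[str]:
    """Upload each distinct (file_path, mime_type) pair once, preserving order."""
    uploaded: dict[tuple[str, str], str] = {}
    resolved: list[str] = []
    for file_path, mime_type in parts:
        key = (file_path, mime_type)
        if key not in uploaded:
            # First time this pair is seen within the call: upload it.
            uploaded[key] = await upload(file_path, mime_type)
        resolved.append(uploaded[key])
    return resolved


async def _demo() -> tuple[list[str], int]:
    calls = 0

    async def fake_upload(path: str, mime: str) -> str:
        nonlocal calls
        calls += 1
        return f"files/{calls}"

    parts = [
        ("a.pdf", "application/pdf"),
        ("b.txt", "text/plain"),
        ("a.pdf", "application/pdf"),  # repeated source, should not re-upload
    ]
    uris = await resolve_file_parts(parts, fake_upload)
    return uris, calls


demo_uris, demo_upload_calls = asyncio.run(_demo())
```

The memo here lives only for the duration of one call, matching the "within a single create_cache() call" scope in the commit message.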
…alls Move _resolve_file_parts() into the single-flight work function inside get_or_create_cache() so concurrent callers for the same cache key share both uploads and cache creation. Previously uploads ran before the single-flight boundary, causing duplicate uploads when two coroutines raced past the registry miss. get_or_create_cache() now accepts raw_parts (unresolved placeholders) and resolves them inside _work(). Add test_cache_single_flight_deduplicates_file_uploads to verify upload_calls==1 and cache_calls==1 under concurrency.
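To illustrate why moving the uploads inside the single-flight boundary matters, here is a minimal asyncio sketch. The `SingleFlight` class and the `work` function are hypothetical and are not Pollux's `get_or_create_cache()`; the sketch only shows that when the upload and the cache creation both run inside the shared work function, concurrent callers for the same key trigger each of them once.

```python
import asyncio
from typing import Awaitable, Callable


class SingleFlight:
    """Share one in-flight task among concurrent callers using the same key."""

    def __init__(self) -> None:
        self._inflight: dict[str, asyncio.Task] = {}

    async def do(self, key: str, work: Callable[[], Awaitable[str]]) -> str:
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key starts the work; later callers join it.
            task = asyncio.ensure_future(work())
            self._inflight[key] = task
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        return await task


async def _demo() -> tuple[list[str], int, int]:
    upload_calls = 0
    cache_calls = 0
    flight = SingleFlight()

    async def work() -> str:
        nonlocal upload_calls, cache_calls
        upload_calls += 1          # simulated file upload
        await asyncio.sleep(0.01)
        cache_calls += 1           # simulated cache creation
        await asyncio.sleep(0.01)
        return "cache-1"

    names = await asyncio.gather(*(flight.do("key", work) for _ in range(5)))
    return list(names), upload_calls, cache_calls


demo_names, demo_uploads, demo_caches = asyncio.run(_demo())
```

This sketch deduplicates in-flight work only; a registry that also remembers completed results, as the commit describes, would sit on top of it.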
Include api_key in compute_cache_key() so different credentials for the same provider/model produce distinct cache entries. Prevents silent cross-account handle reuse in multi-tenant or multi-key scenarios. Also fix the create_cache() docstring example which referenced an undefined `config` variable (now uses `cfg` consistently).
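A rough sketch of the idea, assuming the key is a hash over provider, model, API key, and per-source content hashes. The function name `sketch_cache_key`, its parameters, and the use of SHA-256 are assumptions for illustration; the actual `compute_cache_key()` signature is not shown in this PR description.

```python
import hashlib


def sketch_cache_key(
    provider: str, model: str, api_key: str, content_hashes: list[str]
) -> str:
    """Derive a registry key that differs whenever any input differs."""
    h = hashlib.sha256()
    for field in (provider, model, api_key, *content_hashes):
        data = field.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


key_a = sketch_cache_key("gemini", "model-x", "key-A", ["abc123"])
key_b = sketch_cache_key("gemini", "model-x", "key-B", ["abc123"])
```

With the API key in the hash input, `key_a` and `key_b` differ even though provider, model, and content are identical, so a handle created under one credential is not returned for another.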
- Validate tool items are dicts in create_cache_impl before uploads, preventing wasted file uploads on invalid input
- Validate system_instruction type at the API boundary, converting a raw TypeError into a ConfigurationError with hint
- Pass through ConfigurationError in wrap_provider_error instead of re-wrapping as CacheError
An expired CacheHandle passed via Options(cache=handle) was silently accepted, leading to a cryptic provider error. Now caught eagerly with a clear ConfigurationError before any network I/O.
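The eager expiration check might look something like the sketch below. The field names come from the PR description; everything else is assumed for illustration, including the `expires_at` type (a Unix timestamp here), the helper name `check_not_expired`, and the stand-in `ConfigurationError` class.

```python
import time
from dataclasses import dataclass
from typing import Optional


class ConfigurationError(Exception):
    """Stand-in for the library's ConfigurationError."""


@dataclass(frozen=True)
class CacheHandle:
    name: str
    model: str
    provider: str
    expires_at: float  # assumption: seconds since the Unix epoch


def check_not_expired(handle: CacheHandle, now: Optional[float] = None) -> None:
    """Raise before any network I/O if the handle has already expired."""
    current = time.time() if now is None else now
    if handle.expires_at <= current:
        raise ConfigurationError(
            f"cache handle {handle.name!r} expired; create a new cache"
        )
```

Accepting `now` as a parameter keeps the check deterministic in tests without patching the clock.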
Summary
Replaces the implicit `Config(enable_caching=True, ttl_seconds=...)` mechanism with an explicit `create_cache()` → `CacheHandle` → `Options(cache=handle)` flow. This is a breaking change to the caching API.

Why
The implicit approach coupled cache lifecycle to generation calls: caches were created as a side effect of `run()`, with no way to control timing, share across calls, or handle errors independently. The explicit API separates cache creation from usage, giving callers control over when uploads and cache warming happen.

Design decisions
- `CacheHandle` is opaque and frozen. It carries `name`, `model`, `provider`, and `expires_at`, but callers don't inspect internals; they just pass it to `Options(cache=handle)`.
- `create_cache_impl` validates all inputs (types, provider capability, tool structure) before any I/O. `build_plan` validates handle compatibility (provider/model match, expiration, conflicting options) before any network calls. This ordering is load-bearing; see commit history for the cost of getting it wrong.
- `create_cache()` calls with identical content share one upload + one API call via `singleflight_cached`. The registry is keyed by content hash, scoped by provider and API key.
- `Config` no longer owns caching config. `enable_caching` and `ttl_seconds` are removed from `Config`. TTL is now per-cache, passed to `create_cache(ttl_seconds=...)`.

What changed
| File | Change |
|---|---|
| `src/pollux/__init__.py` | `create_cache()` public API, `CacheHandle` export |
| `src/pollux/cache.py` | `CacheHandle`, `CacheRegistry`, `create_cache_impl`, content-hash keying, file upload dedup |
| `src/pollux/config.py` | Removed `enable_caching`, `ttl_seconds` fields |
| `src/pollux/plan.py` | |
| `src/pollux/execute.py` | |
| `src/pollux/options.py` | `Options.cache` field accepts `CacheHandle` |
| `src/pollux/providers/` | `create_cache()` method on provider interface; `wrap_provider_error` passes through `ConfigurationError` |
| `tests/test_pipeline.py` | |
| `docs/`, `cookbook/` | |

Related issue
None
Test plan
`just check` passes (lint + typecheck + 176 tests)

Notes
`create_cache_impl` is documented with a scaling note: if the parameter surface grows, a validated `CacheSpec` dataclass would keep the boundary manageable.
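As a rough idea of what such a `CacheSpec` could look like, the sketch below validates in `__post_init__` so that an invalid spec can never be constructed. No such class exists in this PR: the field set, the default TTL of 3600 seconds, and the stand-in `ConfigurationError` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


class ConfigurationError(Exception):
    """Stand-in for the library's ConfigurationError."""


@dataclass(frozen=True)
class CacheSpec:
    """Hypothetical bundle of create_cache inputs, validated on construction."""

    model: str
    sources: tuple = ()
    system_instruction: Optional[str] = None
    tools: tuple = ()
    ttl_seconds: int = 3600  # assumed default, not taken from the PR

    def __post_init__(self) -> None:
        # All checks run before any upload or network call could happen.
        if self.system_instruction is not None and not isinstance(
            self.system_instruction, str
        ):
            raise ConfigurationError("system_instruction must be a string")
        for i, tool in enumerate(self.tools):
            if not isinstance(tool, dict):
                raise ConfigurationError(f"tools[{i}] must be a dict")
        if self.ttl_seconds <= 0:
            raise ConfigurationError("ttl_seconds must be positive")
```

Collecting the checks in one place would keep the "validate before any I/O" ordering intact as parameters are added, since the implementation function would only ever receive an already-validated spec.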