

@neilmehta24
Member

This PR starts tracking the tokens and the KV cache in cache_wrapper. When we receive a follow-up prompt, we no longer reprocess the images.
Cases:

  • The hash of the input images changes --> Always do full reprocessing
  • The prompt is extended --> Do text-only prompt processing of the new tokens
  • The prompt is trimmed --> Trim the cache to the common prefix and do text-only prompt processing
  • The cache cannot be trimmed --> Do full vision reprocessing
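As a rough illustration, the case analysis above can be sketched as a small planning function. This is a minimal sketch, not the cache_wrapper code from this PR: the names hash_images, CachedState and plan_prefill are hypothetical, it assumes the cached token list includes the image placeholder tokens, and it assumes a trim never cuts into the image tokens (otherwise text-only processing would not be enough).

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


def hash_images(images: Sequence[bytes]) -> str:
    """Hash raw image bytes in order; changing any image changes the hash."""
    h = hashlib.sha256()
    for img in images:
        # Length-prefix each image so [b"ab", b"c"] and [b"a", b"bc"] differ.
        h.update(len(img).to_bytes(8, "little"))
        h.update(img)
    return h.hexdigest()


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class CachedState:
    """Hypothetical record of what the KV cache currently holds."""
    image_hash: Optional[str] = None
    tokens: List[int] = field(default_factory=list)
    trimmable: bool = True  # e.g. False for a sliding-window cache that has wrapped


def plan_prefill(state: CachedState, images: Sequence[bytes],
                 tokens: Sequence[int]) -> Tuple[str, int]:
    """Return (action, number_of_cached_tokens_to_keep)."""
    if state.image_hash != hash_images(images):
        return ("full", 0)  # images changed: rerun the vision stack
    prefix = common_prefix_len(state.tokens, tokens)
    if prefix == len(state.tokens) and prefix < len(tokens):
        return ("extend", prefix)  # text-only prefill of tokens[prefix:]
    # Prompt diverged, shrank, or is identical: keep at most len(tokens) - 1
    # cached tokens so at least one token is fed to produce fresh logits.
    keep = max(0, min(prefix, len(tokens) - 1))
    if state.trimmable:
        return ("trim", keep)  # trim cache to `keep` tokens, then text-only prefill
    return ("full", 0)  # cannot roll back the cache: full vision reprocessing
```

Returning a plan rather than acting on the cache directly keeps the decision easy to unit-test separately from the model.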

I added two tests: one for SWA (sliding-window attention) caches, which often cannot be trimmed, and one for non-SWA caches, which are usually trimmable.
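To show why SWA caches are often not trimmable, here is a toy ring-buffer cache. It is a hypothetical sketch (ToyRotatingCache is not a real library class) that models only the bookkeeping: once more than `window` tokens have been written, old entries have been overwritten, so rolling back would leave later positions attending to context that no longer exists.

```python
from typing import Any, List, Optional


class ToyRotatingCache:
    """Hypothetical sliding-window cache: a fixed-size ring buffer of entries."""

    def __init__(self, window: int):
        self.window = window
        self.buf: List[Optional[Any]] = [None] * window
        self.offset = 0  # total number of tokens written so far

    def append(self, entry: Any) -> None:
        # Once offset >= window, this overwrites the oldest entry.
        self.buf[self.offset % self.window] = entry
        self.offset += 1

    def is_trimmable(self) -> bool:
        # Nothing has been overwritten until more than `window` tokens are
        # written, so rolling back is only safe up to that point.
        return self.offset <= self.window

    def trim(self, n: int) -> int:
        if not self.is_trimmable():
            raise ValueError("sliding-window cache has wrapped; cannot trim")
        n = min(n, self.offset)
        self.offset -= n
        return n
```

A non-SWA cache keeps every entry, so in this model it would always be trimmable; the SWA test is what exercises the full-reprocessing fallback.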

There is still room for improvement. In particular, we could cache the embeddings per image so that they can be selectively reused when the cache cannot be trimmed. This can be added in a future PR.
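The per-image embedding cache suggested above could look roughly like the following. This is only a sketch of the idea, assuming embeddings are keyed by a SHA-256 of the raw image bytes and bounded with an LRU policy; ImageEmbeddingCache and the `encode` callback are hypothetical stand-ins for the real vision encoder.

```python
import hashlib
from collections import OrderedDict
from typing import Any, Callable, List, Sequence


class ImageEmbeddingCache:
    """Hypothetical LRU cache of vision-encoder outputs keyed by image hash."""

    def __init__(self, encode: Callable[[bytes], Any], max_entries: int = 32):
        self._encode = encode  # stands in for the real vision encoder
        self._store: OrderedDict = OrderedDict()
        self.max_entries = max_entries
        self.misses = 0

    def get(self, images: Sequence[bytes]) -> List[Any]:
        out = []
        for img in images:
            key = hashlib.sha256(img).hexdigest()
            if key in self._store:
                self._store.move_to_end(key)  # mark as most recently used
                emb = self._store[key]
            else:
                emb = self._encode(img)  # only unseen images are encoded
                self._store[key] = emb
                self.misses += 1
                if len(self._store) > self.max_entries:
                    self._store.popitem(last=False)  # evict least recently used
            out.append(emb)
        return out
```

With this, a follow-up prompt that keeps some images and swaps others would only re-encode the new images, even when the KV cache itself has to be rebuilt.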

Note that this does not cache images that go through the non-unified stack; that can also be added in a future PR.

@github-actions bot added the "CLA signed" label (indicates that all contributors have signed) on Oct 29, 2025