Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request?
Currently preventing usage
Please provide a clear description of the problem this feature solves
Every call to:
seems to reload the embedder model onto the GPU.
Describe the feature, and optionally a solution or implementation and any alternatives
The model should stay loaded on the device and be reused across calls, so that each subsequent query skips the load step and runs faster.
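To illustrate the requested behavior, here is a minimal sketch of one possible approach: caching the loaded model so that it is created once and reused. This is not the library's actual code. `DummyEmbedder`, `get_embedder`, `query`, and the model name `"example-embedder"` are all hypothetical names made up for this example, and the "load" is simulated with a counter instead of a real GPU transfer.

```python
from functools import lru_cache

LOAD_COUNT = 0  # counts how many times the (simulated) expensive load runs


class DummyEmbedder:
    """Hypothetical stand-in for an embedder model that lives on a device."""

    def __init__(self, name: str, device: str) -> None:
        self.name = name
        self.device = device

    def embed(self, text: str) -> list[float]:
        # Placeholder "embedding": not a real model output.
        return [float(len(text))]


@lru_cache(maxsize=None)
def get_embedder(name: str, device: str = "cuda") -> DummyEmbedder:
    """Load the embedder once per distinct set of arguments, then reuse it."""
    global LOAD_COUNT
    LOAD_COUNT += 1  # in real code, the model would be moved to the GPU here
    return DummyEmbedder(name, device)


def query(text: str) -> list[float]:
    # Every query reuses the cached instance instead of reloading the model.
    return get_embedder("example-embedder").embed(text)


query("first query")
query("second query")
print(LOAD_COUNT)  # the simulated load ran only once
```

With this pattern, only the first query pays the load cost. A real implementation would also need to decide when to release GPU memory, for example by exposing a way to clear the cache.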
Additional context
No response