From f2d1d52752f752bbb1c875e75bf1f784852368c6 Mon Sep 17 00:00:00 2001
From: fzowl
---
+<p>
+  An embedder that uses the VoyageAI embedding API
+  to generate high-quality embeddings for semantic search. This embedder calls the VoyageAI API service
+  and does not require local model files or ONNX inference.
+</p>
+{% highlight xml %}
+<container id="default" version="1.0">
+    <component id="voyage" type="voyage-ai-embedder">
+        <model>voyage-3.5</model>
+        <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    </component>
+</container>
+{% endhighlight %}
+
+<ul>
+  <li>
+    <code>model</code> specifies which VoyageAI model to use.
+    Available models include <code>voyage-3.5</code> (1024 dimensions, latest and best),
+    <code>voyage-3.5-lite</code> (512 dimensions, fastest),
+    <code>voyage-code-3</code> (optimized for code), and others.
+    See the VoyageAI documentation for the full list.
+  </li>
+  <li>
+    <code>api-key-secret-ref</code> references a secret in Vespa's
+    secret store containing your VoyageAI API key.
+    This is required for authentication.
+  </li>
+</ul>
+<p>
+  Add your VoyageAI API key to the secret store:
+</p>
+<pre>
+vespa secret add voyage_api_key --value "pa-xxxxx..."
+</pre>
+<p>
+  See the reference for all configuration parameters, including caching, retry logic, and performance tuning.
+</p>
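Under the hood the embedder issues HTTP requests to the VoyageAI REST endpoint. As a rough illustration of what such a request body looks like, here is a sketch: `build_embedding_request` is a hypothetical helper, while the field names `model`, `input`, and `input_type` follow the public VoyageAI embeddings API.

```python
import json

def build_embedding_request(texts, model="voyage-3.5", input_type="document"):
    """Build the JSON body for a VoyageAI /v1/embeddings call (illustrative)."""
    return {
        "model": model,           # e.g. voyage-3.5 or voyage-3.5-lite
        "input": list(texts),     # one or more strings to embed
        "input_type": input_type, # "query" or "document"
    }

body = build_embedding_request(["machine learning tutorials"], input_type="query")
print(json.dumps(body))
```

The API key itself never appears in the payload; it is sent as a bearer token header, which is why it can live in the secret store rather than in `services.xml`.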
+<p>
+  VoyageAI offers several embedding models optimized for different use cases.
+  The resulting tensor type can be <code>float</code> or
+  <code>bfloat16</code> for storage efficiency.
+</p>
+<p>Latest general-purpose models (recommended):</p>
+<ul>
+  <li><code>voyage-3.5</code>: tensor&lt;float&gt;(x[1024]) - latest and best quality, state-of-the-art for most applications</li>
+  <li><code>voyage-3.5-lite</code>: tensor&lt;float&gt;(x[512]) - newest lite model, excellent quality at lower cost and faster speed</li>
+</ul>
+<p>Previous generation general-purpose models:</p>
+<ul>
+  <li><code>voyage-3</code>: tensor&lt;float&gt;(x[1024]) - high quality (use voyage-3.5 for best results)</li>
+  <li><code>voyage-3-lite</code>: tensor&lt;float&gt;(x[512]) - cost-efficient (use voyage-3.5-lite for better performance)</li>
+</ul>
+<p>Specialized models:</p>
+<ul>
+  <li><code>voyage-code-3</code>: tensor&lt;float&gt;(x[1024]) - optimized for code search and technical content</li>
+  <li><code>voyage-finance-2</code>: tensor&lt;float&gt;(x[1024]) - optimized for financial documents</li>
+  <li><code>voyage-law-2</code>: tensor&lt;float&gt;(x[1024]) - optimized for legal documents</li>
+  <li><code>voyage-multilingual-2</code>: tensor&lt;float&gt;(x[1024]) - supports 100+ languages</li>
+</ul>
+<p>
+  VoyageAI models distinguish between query and document embeddings for improved retrieval quality.
+  The embedder automatically detects the context and sets the appropriate input type:
+</p>
+<ul>
+  <li>Query-time <code>embed()</code> expressions use the <code>query</code> input type.</li>
+  <li>Document indexing uses the <code>document</code> input type.</li>
+</ul>
+<p>You can disable auto-detection and set a fixed input type:</p>
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <auto-detect-input-type>false</auto-detect-input-type>
+    <default-input-type>query</default-input-type>
+</component>
+{% endhighlight %}
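The selection rule just described can be sketched as a tiny function. `resolve_input_type` is a hypothetical helper illustrating the documented behaviour (auto-detect from context, otherwise fall back to the configured default), not the component's actual code:

```python
def resolve_input_type(auto_detect: bool, is_query_context: bool,
                       default_input_type: str = "document") -> str:
    """Pick the VoyageAI input type the way the docs describe:
    auto-detect from the calling context, or use the configured
    default when auto-detection is disabled."""
    if auto_detect:
        return "query" if is_query_context else "document"
    return default_input_type

# With auto-detection on, the context decides; with it off, the default wins.
print(resolve_input_type(True, True))    # a query-time embed() call
print(resolve_input_type(False, True, default_input_type="query"))
```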
+
+<p>The VoyageAI embedder includes several performance optimizations:</p>
+<ul>
+  <li>An LRU cache of recent embeddings, sized with <code>cache-size</code> (default: 1000).</li>
+  <li>An HTTP connection pool, sized with <code>pool-size</code> (default: 5).</li>
+  <li>Optional L2 normalization of embedding vectors, enabled by setting <code>normalize</code> to true.</li>
+</ul>
+<p>Example with performance tuning:</p>
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <cache-size>10000</cache-size>
+    <pool-size>20</pool-size>
+    <normalize>true</normalize>
+</component>
+{% endhighlight %}
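The caching behaviour can be sketched in a few lines. This is an illustrative LRU keyed the way the reference describes (text, input type, and embedder ID); `EmbeddingCache` is a hypothetical class, not the embedder's source:

```python
from collections import OrderedDict

class EmbeddingCache:
    """Illustrative LRU cache: key = (embedder_id, input_type, text)."""

    def __init__(self, cache_size: int = 1000):
        self.cache_size = cache_size
        self._entries = OrderedDict()

    def get(self, embedder_id: str, input_type: str, text: str):
        key = (embedder_id, input_type, text)
        if key in self._entries:
            self._entries.move_to_end(key)       # mark as most recently used
            return self._entries[key]
        return None                              # miss -> caller makes an API call

    def put(self, embedder_id: str, input_type: str, text: str, vector):
        if self.cache_size == 0:                 # cache-size 0 disables caching
            return
        key = (embedder_id, input_type, text)
        self._entries[key] = vector
        self._entries.move_to_end(key)
        while len(self._entries) > self.cache_size:
            self._entries.popitem(last=False)    # evict least recently used

cache = EmbeddingCache(cache_size=2)
cache.put("voyage", "document", "hello", [0.1, 0.2])
cache.put("voyage", "query", "hello", [0.3, 0.4])  # different input type, separate entry
print(cache.get("voyage", "document", "hello"))
```

Because the input type is part of the key, the same text embedded at query time and at indexing time correctly produces two cache entries.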
+<p>Complete example showing document indexing and query-time embedding:</p>
+<p>Schema definition:</p>
+
+schema doc {
+ document doc {
+ field text type string {
+ indexing: summary | index
+ }
+ }
+
+ field embedding type tensor<float>(x[1024]) {
+ indexing: input text | embed voyage | attribute | index
+ attribute {
+ distance-metric: angular
+ }
+ }
+
+ rank-profile semantic {
+ inputs {
+ query(q) tensor<float>(x[1024])
+ }
+ first-phase {
+ expression: closeness(field, embedding)
+ }
+ }
+}
+
+
+Query with embedding:
+{% highlight bash %}
+vespa query \
+ 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding,q)' \
+ 'input.query(q)=embed(voyage, "machine learning tutorials")'
+{% endhighlight %}
+
+<p>
+  When <code>normalize</code> is set to true, use
+  <code>distance-metric: prenormalized-angular</code>
+  for more efficient similarity computation.
+</p>
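The note above can be checked numerically: after L2 normalization every vector has unit length, so a plain dot product already equals cosine (angular) similarity, which is why a prenormalized metric can skip the per-vector norm computations. A small sketch:

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (what normalize=true does per embedding)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [3.0, 4.0], [1.0, 2.0]
na, nb = l2_normalize(a), l2_normalize(b)
# For unit-length vectors, the dot product and cosine similarity coincide,
# so the norm computation can be skipped at query time.
print(abs(dot(na, nb) - cosine(a, b)) < 1e-9)
```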
Embedding inference can be resource-intensive for larger embedding models. Factors that impact performance:
diff --git a/en/reference/rag/embedding.html b/en/reference/rag/embedding.html
index 0bf9d40a98..7c24a1d232 100644
--- a/en/reference/rag/embedding.html
+++ b/en/reference/rag/embedding.html
@@ -478,6 +478,206 @@
+<p>
+  An embedder that uses the VoyageAI API
+  to generate embeddings. This is an API-based embedder that does not require local model files or ONNX inference.
+  It calls the VoyageAI service to generate high-quality embeddings optimized for semantic search.
+</p>
+<p>
+  The VoyageAI embedder is configured in services.xml,
+  within the <code>container</code> tag:
+</p>
{% highlight xml %}
+<container id="default" version="1.0">
+    <component id="voyage" type="voyage-ai-embedder">
+        <model>voyage-3.5</model>
+        <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    </component>
+</container>
+{% endhighlight %}
+
+<p>
+  The VoyageAI API key must be stored in Vespa's
+  secret store for secure management:
+</p>
+<pre>
+vespa secret add voyage_api_key --value "pa-xxxxx..."
+</pre>
+<p>
+  The <code>api-key-secret-ref</code> parameter references the secret name.
+  Secrets are automatically refreshed when rotated, without requiring an application restart.
+</p>
+<table class="table">
+  <thead>
+    <tr><th>Name</th><th>Occurrence</th><th>Description</th><th>Type</th><th>Default</th></tr>
+  </thead>
+  <tbody>
+    <tr><td>api-key-secret-ref</td><td>One</td><td>Required. Reference to the secret in Vespa's secret store containing the VoyageAI API key.</td><td>string</td><td>N/A</td></tr>
+    <tr><td>model</td><td>One</td><td>The VoyageAI model to use. Available models include voyage-3.5, voyage-3.5-lite, voyage-code-3, voyage-finance-2, voyage-law-2, and voyage-multilingual-2.</td><td>string</td><td>voyage-3.5</td></tr>
+    <tr><td>endpoint</td><td>One</td><td>VoyageAI API endpoint URL. Can be overridden for custom proxies or regional endpoints.</td><td>string</td><td>https://api.voyageai.com/v1/embeddings</td></tr>
+    <tr><td>timeout</td><td>One</td><td>Request timeout in milliseconds. Also serves as the bound for retry attempts: retries stop when the total elapsed time would exceed this timeout. Minimum value: 1000 ms.</td><td>numeric</td><td>30000</td></tr>
+    <tr><td>max-retries</td><td>One</td><td>Maximum number of retry attempts for failed requests. Used as a safety limit in addition to the timeout-based retry bound.</td><td>numeric</td><td>10</td></tr>
+    <tr><td>default-input-type</td><td>One</td><td>Default input type when auto-detection is disabled. Valid values: query or document. VoyageAI models use different optimizations for queries vs documents.</td><td>enum</td><td>document</td></tr>
+    <tr><td>auto-detect-input-type</td><td>One</td><td>Whether to automatically detect the input type based on context. When enabled, uses the query type for query-time embeddings and the document type for indexing.</td><td>boolean</td><td>true</td></tr>
+    <tr><td>normalize</td><td>One</td><td>Whether to apply L2 normalization to embeddings. When enabled, all embedding vectors are normalized to unit length. Use with the prenormalized-angular distance metric for efficient similarity computation.</td><td>boolean</td><td>false</td></tr>
+    <tr><td>truncate</td><td>One</td><td>Whether to truncate input text exceeding model limits. When enabled, text is automatically truncated. When disabled, requests with too-long text fail.</td><td>boolean</td><td>true</td></tr>
+    <tr><td>pool-size</td><td>One</td><td>HTTP connection pool size. Higher values improve throughput for concurrent requests but use more resources.</td><td>numeric</td><td>5</td></tr>
+    <tr><td>cache-size</td><td>One</td><td>LRU cache size for storing recent embeddings. Reduces duplicate API calls. Set to 0 to disable caching. The cache key includes the text, input type, and embedder ID.</td><td>numeric</td><td>1000</td></tr>
+  </tbody>
+</table>
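The interplay between timeout and max-retries in the table can be sketched as a retry loop that stops at whichever bound is reached first. `call_with_retries` is illustrative logic for those documented semantics, not the component's implementation:

```python
import time

def call_with_retries(request_fn, timeout_ms=30000, max_retries=10,
                      clock=time.monotonic):
    """Retry request_fn until it succeeds, until max_retries retries are
    exhausted, or until the total elapsed time would exceed timeout_ms."""
    start = clock()
    last_error = None
    for attempt in range(1 + max_retries):       # first try plus retries
        if (clock() - start) * 1000.0 >= timeout_ms:
            break                                # the timeout bounds retries
        try:
            return request_fn()
        except Exception as err:                 # a real client would only
            last_error = err                     # retry transient errors
    raise RuntimeError(f"embedding request failed: {last_error}")

# A request that fails twice with a transient error, then succeeds:
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient")
    return [0.1, 0.2, 0.3]

print(call_with_retries(flaky, timeout_ms=5000, max_retries=10))
```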
Basic configuration (recommended):
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+</component>
+{% endhighlight %}
+
+High-performance configuration:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <cache-size>10000</cache-size>
+    <pool-size>20</pool-size>
+    <timeout>60000</timeout>
+</component>
+{% endhighlight %}
+
+Fast and cost-efficient configuration:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5-lite</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <cache-size>10000</cache-size>
+</component>
+{% endhighlight %}
+
+Query-optimized configuration:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <default-input-type>query</default-input-type>
+    <auto-detect-input-type>false</auto-detect-input-type>
+    <normalize>true</normalize>
+</component>
+{% endhighlight %}
+
+Code search configuration:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-code-3</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <truncate>true</truncate>
+</component>
+{% endhighlight %}
+
+<p>The VoyageAI embedder includes several features to reduce API costs and improve performance:</p>
+<ul>
+  <li>
+    Use <code>voyage-3.5-lite</code> for cost-sensitive applications
+    (512 dimensions vs 1024 dimensions reduces costs while maintaining excellent quality).
+    For best quality, use <code>voyage-3.5</code>.
+  </li>
+</ul>
+<p>
+  For detailed performance monitoring, the embedder emits standard Vespa embedder metrics
+  (see Container Metrics).
+  Monitor API usage and costs through the VoyageAI dashboard.
+</p>
The Huggingface tokenizer embedder is configured in services.xml,
From e8e64717a9fdd0281b0f1af7e558f9f21656c7c5 Mon Sep 17 00:00:00 2001
From: fzowl
---
@@ -587,8 +587,7 @@ VoyageAI performance features
-<p>The VoyageAI embedder includes several performance optimizations:</p>
-  <li>An LRU cache of recent embeddings, sized with <code>cache-size</code> (default: 1000).</li>
-  <li>An HTTP connection pool, sized with <code>pool-size</code> (default: 5).</li>
+<p>The VoyageAI embedder includes several features to reduce API costs and improve performance:</p>
+  <li>An HTTP connection pool, sized with <code>max-idle-connections</code> (default: 5).</li>
+  <li>Retries for failed requests, bounded by <code>max-retries</code> (default: 10).</li>
   <li>Optional L2 normalization of embedding vectors, enabled by setting <code>normalize</code> to true.</li>
@@ ... @@ VoyageAI embedder reference config
-    <tr><td>pool-size</td><td>One</td><td>HTTP connection pool size. Higher values improve throughput for concurrent requests but use more resources.</td><td>numeric</td><td>5</td></tr>
+    <tr><td>max-idle-connections</td><td>One</td><td>Maximum number of idle HTTP connections to keep in the connection pool. Higher values improve throughput for concurrent requests but use more resources.</td><td>numeric</td><td>5</td></tr>
@@ -628,8 +621,7 @@
-    <tr><td>cache-size</td><td>One</td><td>LRU cache size for storing recent embeddings. Reduces duplicate API calls. Set to 0 to disable caching. The cache key includes the text, input type, and embedder ID.</td><td>numeric</td><td>1000</td></tr>
@@ ... @@ Example Configurations
@@ ... @@ Example Configurations
@@ ... @@ Example Configurations
@@ ... @@ Cost and Performance Optimization
From 3f34645206f61c28cf09a004ff439ac217cd4540 Mon Sep 17 00:00:00 2001
From: fzowl
---
@@ ... @@ VoyageAI embedder models
 <li><code>voyage-multilingual-2</code>: tensor&lt;float&gt;(x[1024]) - supports 100+ languages</li>
+<li><strong>Contextual model:</strong> <code>voyage-context-3</code>: tensor&lt;float&gt;(x[1024]) (configurable: 256, 512, 1024, 2048) -
+  contextualized embeddings for document chunks with surrounding context awareness</li>
+<li><strong>Multimodal model (preview):</strong> <code>voyage-multimodal-3.5</code>: tensor&lt;float&gt;(x[1024]) (configurable: 256, 512, 1024, 2048) -
+  multimodal embeddings for text, images, and video in a shared vector space</li>
@@ ... @@ VoyageAI embedder reference config
 voyage-finance-2 (1024 dims) - Financial documents
 voyage-law-2 (1024 dims) - Legal documents
 voyage-multilingual-2 (1024 dims) - Multilingual support
+voyage-context-3 (1024 dims, configurable: 256/512/1024/2048) - Contextualized document chunk embeddings
+voyage-multimodal-3.5 (1024 dims, configurable: 256/512/1024/2048) - Multimodal embeddings (text, images, video) [preview]
 string