diff --git a/en/rag/embedding.html b/en/rag/embedding.html index e21b1a6a66..21fccad8c0 100644 --- a/en/rag/embedding.html +++ b/en/rag/embedding.html @@ -490,6 +490,162 @@
An embedder that uses the VoyageAI embedding API
+to generate high-quality embeddings for semantic search. This embedder calls the VoyageAI API service
+and does not require local model files or ONNX inference.
+
+{% highlight xml %}
+<container id="default" version="1.0">
+    <component id="voyage" type="voyage-ai-embedder">
+        <model>voyage-3.5</model>
+        <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    </component>
+</container>
+{% endhighlight %}
+
+<ul>
+  <li><code>model</code> specifies which VoyageAI model to use.
+    Available models include voyage-3.5 (1024 dimensions, latest and best),
+    voyage-3.5-lite (512 dimensions, fastest),
+    voyage-code-3 (optimized for code), and others.
+    See the VoyageAI documentation for the full list.</li>
+  <li><code>api-key-secret-ref</code> references a secret in Vespa's
+    secret store containing your VoyageAI API key.
+    This is required for authentication.</li>
+</ul>
+<p>Add your VoyageAI API key to the secret store:</p>
+<pre>
+vespa secret add voyage_api_key --value "pa-xxxxx..."
+</pre>
See the reference
+for all configuration parameters, including caching, retry logic, and performance tuning.
+ +
+ VoyageAI offers several embedding models optimized for different use cases.
+ The resulting tensor type can be float or
+ bfloat16 for storage efficiency.
+
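+<p>For example, a field holding voyage-3.5-lite embeddings stored as bfloat16 could be declared as
+follows (a sketch; the embedder id <code>voyage</code> and the 512 dimension are assumptions that
+must match your configuration and chosen model):</p>
+<pre>
+field embedding type tensor&lt;bfloat16&gt;(x[512]) {
+    indexing: input text | embed voyage | attribute
+}
+</pre>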
Latest general-purpose models (recommended):
+<ul>
+  <li>voyage-3.5: tensor&lt;float&gt;(x[1024]) - latest and best quality, state-of-the-art for most applications</li>
+  <li>voyage-3.5-lite: tensor&lt;float&gt;(x[512]) - newest lite model, excellent quality at lower cost and faster speed</li>
+</ul>
+<p>Previous generation general-purpose models:</p>
+<ul>
+  <li>voyage-3: tensor&lt;float&gt;(x[1024]) - high quality (use voyage-3.5 for best results)</li>
+  <li>voyage-3-lite: tensor&lt;float&gt;(x[512]) - cost-efficient (use voyage-3.5-lite for better performance)</li>
+</ul>
+<p>Specialized models:</p>
+<ul>
+  <li>voyage-code-3: tensor&lt;float&gt;(x[1024]) - optimized for code search and technical content</li>
+  <li>voyage-finance-2: tensor&lt;float&gt;(x[1024]) - optimized for financial documents</li>
+  <li>voyage-law-2: tensor&lt;float&gt;(x[1024]) - optimized for legal documents</li>
+  <li>voyage-multilingual-2: tensor&lt;float&gt;(x[1024]) - supports 100+ languages</li>
+</ul>
+<p>Contextual model:</p>
+<ul>
+  <li>voyage-context-3: tensor&lt;float&gt;(x[1024]) (configurable: 256, 512, 1024, 2048) -
+    contextualized embeddings for document chunks with surrounding context awareness</li>
+</ul>
+<p>Multimodal model (preview):</p>
+<ul>
+  <li>voyage-multimodal-3: tensor&lt;float&gt;(x[1024]) (configurable: 256, 512, 1024, 2048) -
+    multimodal embeddings for text, images, and video in a shared vector space</li>
+</ul>
+
+<p>VoyageAI models distinguish between query and document embeddings for improved retrieval quality.
+The embedder automatically detects the context and sets the appropriate input type:</p>
+<ul>
+  <li>Query-time embeddings created with <code>embed()</code> use input type <code>query</code></li>
+  <li>Document embeddings created during indexing use input type <code>document</code></li>
+</ul>
+<p>You can disable auto-detection and set a fixed input type:</p>
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <auto-detect-input-type>false</auto-detect-input-type>
+    <default-input-type>query</default-input-type>
+</component>
+{% endhighlight %}
+
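The input type ultimately travels to VoyageAI as a field of the request body. A minimal sketch of how such a payload could be built (the field names `input`, `model`, and `input_type` follow the public VoyageAI embeddings API; the helper function itself is hypothetical, not part of the embedder):

```python
import json

def build_embedding_request(texts, model="voyage-3.5", input_type=None):
    """Build the JSON body for a POST to the VoyageAI embeddings endpoint.

    input_type is None (no hint), "query", or "document", mirroring the
    embedder's auto-detected or fixed default-input-type setting.
    """
    body = {"input": texts, "model": model}
    if input_type is not None:
        body["input_type"] = input_type  # "query" or "document"
    return json.dumps(body)

# Query-time embedding: the embedder would send input_type "query"
print(build_embedding_request(["machine learning tutorials"], input_type="query"))
```

With auto-detection enabled, the embedder picks `"query"` or `"document"` for you; with it disabled, the configured `default-input-type` is used for every request.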
+The VoyageAI embedder includes several performance optimizations:
+<ul>
+  <li>HTTP connections are pooled and reused; the pool size is configured with max-idle-connections (default: 5).</li>
+  <li>Failed requests are retried automatically, bounded by the request timeout and max-retries (default: 10).</li>
+  <li>Embeddings can be L2-normalized by the embedder with normalize set to true.</li>
+</ul>
+<p>Example with performance tuning:</p>
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <max-idle-connections>20</max-idle-connections>
+    <normalize>true</normalize>
+</component>
+{% endhighlight %}
+
+Complete example showing document indexing and query-time embedding:
+
+Schema definition:
+
+schema doc {
+ document doc {
+ field text type string {
+ indexing: summary | index
+ }
+ }
+
+ field embedding type tensor<float>(x[1024]) {
+ indexing: input text | embed voyage | attribute | index
+ attribute {
+ distance-metric: angular
+ }
+ }
+
+ rank-profile semantic {
+ inputs {
+ query(q) tensor<float>(x[1024])
+ }
+ first-phase {
+ expression: closeness(field, embedding)
+ }
+ }
+}
+
+
+Query with embedding:
+{% highlight bash %}
+vespa query \
+ 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding,q)' \
+ 'input.query(q)=embed(voyage, "machine learning tutorials")'
+{% endhighlight %}
+
+When using normalize set to true, use
+distance-metric: prenormalized-angular
+for more efficient similarity computation.
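The reason prenormalized-angular is cheaper: for unit-length vectors, cosine similarity reduces to a plain dot product, so the magnitude computation of the angular metric can be skipped. A small numeric sketch of this identity (plain Python, not Vespa code):

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length, as the embedder does when normalize is true."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = l2_normalize([3.0, 4.0])
b = l2_normalize([1.0, 2.0])

# cosine(a, b) = dot(a, b) / (|a| * |b|); for pre-normalized vectors the
# denominator is 1, so the dot product alone already equals the cosine.
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
assert abs(cosine - dot(a, b)) < 1e-12
```

This is why `normalize` and `distance-metric: prenormalized-angular` belong together: normalizing once at embedding time saves work on every distance computation at query time.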
Embedding inference can be resource-intensive for larger embedding models. Factors that impact performance:
diff --git a/en/reference/rag/embedding.html b/en/reference/rag/embedding.html index 0bf9d40a98..3dcfdba557 100644 --- a/en/reference/rag/embedding.html +++ b/en/reference/rag/embedding.html @@ -478,6 +478,199 @@
+<p>An embedder that uses the VoyageAI API
+to generate embeddings. This is an API-based embedder that does not require local model files or ONNX inference.
+It calls the VoyageAI service to generate high-quality embeddings optimized for semantic search.</p>
+
+ The VoyageAI embedder is configured in services.xml,
+ within the container tag:
+
{% highlight xml %}
+<container id="default" version="1.0">
+    <component id="voyage" type="voyage-ai-embedder">
+        <model>voyage-3.5</model>
+        <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    </component>
+</container>
+{% endhighlight %}
+
+<p>The VoyageAI API key must be stored in Vespa's
+secret store for secure management:</p>
+<pre>
+vespa secret add voyage_api_key --value "pa-xxxxx..."
+</pre>
+ The api-key-secret-ref parameter references the secret name.
+ Secrets are automatically refreshed when rotated without requiring application restart.
+
+<table class="table">
+  <thead>
+    <tr><th>Name</th><th>Occurrence</th><th>Description</th><th>Type</th><th>Default</th></tr>
+  </thead>
+  <tbody>
+  <tr><td>api-key-secret-ref</td><td>One</td>
+    <td>Required. Reference to the secret in Vespa's secret store containing the VoyageAI API key.</td>
+    <td>string</td><td>N/A</td></tr>
+  <tr><td>model</td><td>One</td>
+    <td>The VoyageAI model to use. Available models include voyage-3.5, voyage-3.5-lite, voyage-3, voyage-3-lite,
+      voyage-code-3, voyage-finance-2, voyage-law-2, voyage-multilingual-2, voyage-context-3 and voyage-multimodal-3.
+      See the VoyageAI documentation for the full list.</td>
+    <td>string</td><td>voyage-3.5</td></tr>
+  <tr><td>endpoint</td><td>One</td>
+    <td>VoyageAI API endpoint URL. Can be overridden for custom proxies or regional endpoints.</td>
+    <td>string</td><td>https://api.voyageai.com/v1/embeddings</td></tr>
+  <tr><td>timeout</td><td>One</td>
+    <td>Request timeout in milliseconds. Also serves as the bound for retry attempts - retries stop when total
+      elapsed time would exceed this timeout. Minimum value: 1000&nbsp;ms.</td>
+    <td>numeric</td><td>30000</td></tr>
+  <tr><td>max-retries</td><td>One</td>
+    <td>Maximum number of retry attempts for failed requests. Used as a safety limit in addition to the
+      timeout-based retry bound.</td>
+    <td>numeric</td><td>10</td></tr>
+  <tr><td>default-input-type</td><td>One</td>
+    <td>Default input type when auto-detection is disabled. Valid values: query or document.
+      VoyageAI models use different optimizations for queries vs documents.</td>
+    <td>enum</td><td>document</td></tr>
+  <tr><td>auto-detect-input-type</td><td>One</td>
+    <td>Whether to automatically detect the input type based on context. When enabled, uses the query type for
+      query-time embeddings and the document type for indexing.</td>
+    <td>boolean</td><td>true</td></tr>
+  <tr><td>normalize</td><td>One</td>
+    <td>Whether to apply L2 normalization to embeddings. When enabled, all embedding vectors are normalized
+      to unit length. Use with the prenormalized-angular distance-metric for efficient similarity computation.</td>
+    <td>boolean</td><td>false</td></tr>
+  <tr><td>truncate</td><td>One</td>
+    <td>Whether to truncate input text exceeding model limits. When enabled, text is automatically truncated.
+      When disabled, requests with too-long text will fail.</td>
+    <td>boolean</td><td>true</td></tr>
+  <tr><td>max-idle-connections</td><td>One</td>
+    <td>Maximum number of idle HTTP connections to keep in the connection pool. Higher values improve
+      throughput for concurrent requests but use more resources.</td>
+    <td>numeric</td><td>5</td></tr>
+  </tbody>
+</table>
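The interaction between timeout and max-retries described above can be sketched as follows. This is illustrative pseudologic, not the embedder's actual implementation; the exponential-backoff schedule is an assumption:

```python
import time

def call_with_retries(request_fn, timeout_ms=30000, max_retries=10):
    """Retry request_fn until it succeeds, max_retries is exhausted,
    or the total elapsed time would exceed timeout_ms."""
    deadline = time.monotonic() + timeout_ms / 1000.0
    backoff = 0.05  # assumed initial backoff, doubling per attempt
    last_error = None
    for _ in range(max_retries + 1):
        if time.monotonic() >= deadline:
            break  # the timeout bounds retries, regardless of max_retries
        try:
            return request_fn()
        except Exception as e:
            last_error = e
            # back off, but never sleep past the deadline
            time.sleep(min(backoff, max(0.0, deadline - time.monotonic())))
            backoff *= 2
    raise TimeoutError(f"embedding request did not succeed: {last_error}")
```

The key point is that both limits apply at once: a generous max-retries never extends a request beyond its timeout, and a generous timeout never allows more than max-retries attempts.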
Basic configuration (recommended):
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+</component>
+{% endhighlight %}
+
+High-performance configuration:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <max-idle-connections>20</max-idle-connections>
+    <timeout>60000</timeout>
+</component>
+{% endhighlight %}
+
+Fast and cost-efficient configuration:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5-lite</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+</component>
+{% endhighlight %}
+
+Query-optimized configuration:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <default-input-type>query</default-input-type>
+    <auto-detect-input-type>false</auto-detect-input-type>
+    <normalize>true</normalize>
+</component>
+{% endhighlight %}
+
+Code search configuration:
+{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-code-3</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <normalize>true</normalize>
+</component>
+{% endhighlight %}
+
+The VoyageAI embedder includes several features to reduce API costs and improve performance:
+<ul>
+  <li>HTTP connections are pooled and reused; configure the pool size with max-idle-connections (default: 5).</li>
+  <li>Failed requests are retried automatically, bounded by the request timeout and max-retries (default: 10).</li>
+  <li>Use voyage-3.5-lite for cost-sensitive applications (512 dimensions vs 1024 dimensions reduces costs
+    while maintaining excellent quality). For best quality, use voyage-3.5.</li>
+</ul>
+<p>For detailed performance monitoring, the embedder emits standard Vespa embedder metrics
+(see Container Metrics).
+Monitor API usage and costs through the VoyageAI dashboard.</p>
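As a back-of-the-envelope check on the cost advice above, per-vector attribute memory scales with the number of dimensions times the cell size. A quick sketch, assuming 4 bytes per float cell and 2 bytes per bfloat16 cell and ignoring Vespa's per-attribute overhead:

```python
def embedding_bytes(dimensions: int, bytes_per_cell: int) -> int:
    """Approximate in-memory size of one embedding vector."""
    return dimensions * bytes_per_cell

# voyage-3.5 stored as float: 1024 cells * 4 bytes
print(embedding_bytes(1024, 4))  # 4096 bytes per document
# voyage-3.5-lite stored as bfloat16: 512 cells * 2 bytes
print(embedding_bytes(512, 2))   # 1024 bytes per document, a 4x reduction
```

Halving the dimension count and the cell width together quarters the embedding footprint, which compounds across large corpora.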
+
+
The Huggingface tokenizer embedder is configured in services.xml,