Adding VoyageAI integration #35273

Merged
bjorncs merged 12 commits into vespa-engine:master from voyage-ai:voyageai_integration
Nov 24, 2025

Conversation

@fzowl
Contributor

@fzowl fzowl commented Nov 9, 2025

I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.

@bratseth bratseth requested a review from bjorncs November 10, 2025 11:46
Member

@bjorncs bjorncs left a comment

Thank you for your contribution, and for taking the time to improve Vespa!
Overall the changes look good and align well with our codebase.
Here is some initial feedback. We'll come back with more later.

element model { xsd:string }? &
element api-key-secret-name { xsd:string } &
element endpoint { xsd:string }? &
element max-batch-size { xsd:positiveInteger }? &
Member

The max batch size parameter is not used by the implementation. Remove?

Member

Please also remove the max-batch-size definition here.

Member

We want to move the user facing documentation to our documentation repository (https://github.com/vespa-engine/documentation, https://docs.vespa.ai/en/embedding.html). Please create a PR there after we have settled on the design.

For any implementation details just add them as (javadoc) comments to VoyageAIEmbedder.

Contributor Author

@fzowl fzowl Nov 12, 2025

@bjorncs Sure, I'll check the repo and raise a PR.

@fzowl fzowl requested a review from bjorncs November 12, 2025 17:46
@bjorncs
Member

bjorncs commented Nov 13, 2025

@fzowl Some unit tests are failing in the config-model module.

[ERROR] Failures:
[ERROR]   VoyageAIEmbedderTest.testVoyageAIEmbedderMissingApiKey:87 expected: <true> but was: <false>
[ERROR] Errors:
[ERROR]   VoyageAIEmbedderTest.testMultipleVoyageAIEmbedders:157->getVoyageAIEmbedderConfig:181 » IllegalArgument No enum constant com.yahoo.embedding.voyageai.VoyageAiEmbedderConfig.DefaultInputType.Enum.QUERY
[ERROR]   VoyageAIEmbedderTest.testVoyageAIEmbedderWithFullConfiguration:30->getVoyageAIEmbedderConfig:181 » IllegalArgument No enum constant com.yahoo.embedding.voyageai.VoyageAiEmbedderConfig.DefaultInputType.Enum.QUERY
[INFO]
[ERROR] Tests run: 1819, Failures: 1, Errors: 2, Skipped: 5
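The two `No enum constant ... QUERY` errors above are what Java's `Enum.valueOf` throws when the name does not match a constant exactly, including case. The sketch below reproduces that behaviour with a stand-in enum. The lowercase constant names and the `InputTypeLookup` helper are assumptions made for illustration, not the generated `VoyageAiEmbedderConfig` code, and whether the fix belongs in the test or in the config definition is for the PR to decide.

```java
import java.util.Locale;

// Hypothetical stand-in, not code from the PR.
final class InputTypeLookup {

    // Assumption: the config enum's constants are lowercase.
    enum InputType { query, document }

    private InputTypeLookup() {}

    // Exact-match lookup: throws IllegalArgumentException("No enum constant ...")
    // when the name differs from a constant in any way, including case.
    static InputType strict(String name) {
        return InputType.valueOf(name);
    }

    // Case-insensitive lookup: normalizes the name before the exact-match lookup.
    static InputType lenient(String name) {
        return InputType.valueOf(name.toLowerCase(Locale.ROOT));
    }
}
```

With these definitions, `InputTypeLookup.strict("QUERY")` fails in the same way as the tests, while `InputTypeLookup.strict("query")` and `InputTypeLookup.lenient("QUERY")` both resolve to `InputType.query`.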

Member

@bjorncs bjorncs left a comment

See comments

element model { xsd:string }? &
element api-key-secret-name { xsd:string } &
element endpoint { xsd:string }? &
element max-batch-size { xsd:positiveInteger }? &
Member

Please also remove the max-batch-size definition here.

element normalize { xsd:boolean }? &
element truncate { xsd:boolean }? &
element pool-size { xsd:positiveInteger }? &
element cache-size { xsd:nonNegativeInteger }?
Member

See my comment regarding caching.

return new OkHttpClient.Builder()
.connectTimeout(Duration.ofMillis(config.timeout()))
.readTimeout(Duration.ofMillis(config.timeout()))
.writeTimeout(Duration.ofMillis(config.timeout()))
Member

Please specify the callTimeout as well for a proper request-response timeout on the client side.

Later we want to include a timeout value in the Context, so that the timeout is tied to the logical operation (respecting the overall feed/query time). For now we'll use the timeout configured in the embedder; later we will override the call timeout per request and use the lower of the two.
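The "lower of the two" rule could be sketched as below. This is only an illustration: `CallTimeouts` and its `effective` method are hypothetical names, and the "remaining" duration stands in for whatever timeout the Context might eventually carry. The resulting value is what one would pass to OkHttp's `callTimeout`, which treats zero as "no timeout", so the sketch rejects an exhausted budget instead of returning zero.

```java
import java.time.Duration;

// Hypothetical helper, not part of the PR: picks the timeout for one embedding call.
final class CallTimeouts {

    private CallTimeouts() {}

    /**
     * Returns the lower of the embedder's configured timeout and the time
     * remaining for the enclosing feed/query operation.
     * A null remaining budget means the caller supplied none, so the
     * configured timeout is used as-is.
     */
    static Duration effective(Duration configured, Duration remaining) {
        if (remaining == null) {
            return configured;
        }
        // A zero timeout would mean "no timeout" in OkHttp, so fail fast instead.
        if (remaining.isZero() || remaining.isNegative()) {
            throw new IllegalStateException("Operation deadline already exceeded");
        }
        return remaining.compareTo(configured) < 0 ? remaining : configured;
    }
}
```

For example, with a configured timeout of 5000 ms and 1200 ms left on the operation, `CallTimeouts.effective(...)` yields 1200 ms. With no per-request budget it falls back to the configured 5000 ms.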

@fzowl fzowl requested a review from bjorncs November 17, 2025 12:35
@fzowl
Contributor Author

fzowl commented Nov 17, 2025

@bjorncs, can you please double-check the PR?

Member

@bjorncs bjorncs left a comment

See comments. The PR build fails with a compilation error:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.13.0:compile (default-compile) on project config-model: Compilation failure: Compilation failure:
--
  | [ERROR] /workspace/build/buildkite/vespaai/vespa-engine-vespa/config-model/src/main/java/com/yahoo/vespa/model/container/component/VoyageAIEmbedder.java:[167,20] cannot find symbol
  | [ERROR]   symbol:   method poolSize(java.lang.Integer)
  | [ERROR]   location: variable builder of type com.yahoo.embedding.voyageai.VoyageAiEmbedderConfig.Builder
  | [ERROR] /workspace/build/buildkite/vespaai/vespa-engine-vespa/config-model/src/main/java/com/yahoo/vespa/model/container/component/VoyageAIEmbedder.java:[170,20] cannot find symbol
  | [ERROR]   symbol:   method cacheSize(java.lang.Integer)
  | [ERROR]   location: variable builder of type com.yahoo.embedding.voyageai.VoyageAiEmbedderConfig.Builder


VoyageAIEmbedder =
attribute type { "voyage-ai-embedder" } &
element model { xsd:string }? &
Member

In your API the 'model' parameter is required. We should do the same here so that Vespa does not define the default model.

endpoint string default="https://api.voyageai.com/v1/embeddings"

# VoyageAI model name (e.g., voyage-3.5, voyage-3.5-lite, voyage-code-3, voyage-finance-2)
model string default="voyage-3.5"
Member

Please remove the default value (see my comment regarding <model>).

@fzowl
Contributor Author

fzowl commented Nov 17, 2025

@bjorncs Ah, sorry about these. I corrected the issues and updated the documentation PR as well. vespa-engine/documentation#4237

@bjorncs
Member

bjorncs commented Nov 18, 2025

FYI some unit tests failed after making model mandatory

[ERROR] Failures:
[ERROR]   VoyageAIEmbedderTest.testVoyageAIEmbedderInvalidInputType:124 expected: <true> but was: <false>
[ERROR]   VoyageAIEmbedderTest.testVoyageAIEmbedderInvalidTimeout:105 expected: <true> but was: <false>
[ERROR] Errors:
[ERROR]   VoyageAIEmbedderTest.testMultipleVoyageAIEmbedders:148->loadModel:166 » IllegalArgument Invalid XML according to XML schema, error in services.xml: element "component" incomplete; missing required element "model" [25:21], input:
22:        <!-- VoyageAI Embedder with minimal configuration -->
23:        <component id="voyage-minimal" type="voyage-ai-embedder">
24:            <api-key-secret-ref>voyage_key</api-key-secret-ref>
25:        </component>
26:    </container>
27:</services>
[ERROR]   VoyageAIEmbedderTest.testVoyageAIEmbedderWithFullConfiguration:24->loadModel:166 » IllegalArgument Invalid XML according to XML schema, error in services.xml: element "component" incomplete; missing required element "model" [25:21], input:
22:        <!-- VoyageAI Embedder with minimal configuration -->
23:        <component id="voyage-minimal" type="voyage-ai-embedder">
24:            <api-key-secret-ref>voyage_key</api-key-secret-ref>
25:        </component>
26:    </container>
27:</services>
[ERROR]   VoyageAIEmbedderTest.testVoyageAIEmbedderWithMinimalConfiguration:48->loadModel:166 » IllegalArgument Invalid XML according to XML schema, error in services.xml: element "component" incomplete; missing required element "model" [25:21], input:
22:        <!-- VoyageAI Embedder with minimal configuration -->
23:        <component id="voyage-minimal" type="voyage-ai-embedder">
24:            <api-key-secret-ref>voyage_key</api-key-secret-ref>
25:        </component>
26:    </container>
27:</services>

@fzowl
Contributor Author

fzowl commented Nov 18, 2025

@bjorncs Should be fine now.

@fzowl fzowl requested a review from bjorncs November 18, 2025 10:58
@bjorncs bjorncs merged commit a7552a0 into vespa-engine:master Nov 24, 2025
2 of 3 checks passed