feat:Add POST endpoints for corpus size and document summary to Vectara API #43

HavenDV · 2025-02-27T01:44:19Z

Summary by CodeRabbit

New Features
- Added an endpoint to compute corpus size, returning comprehensive metrics.
- Introduced an endpoint that generates summaries for individual documents.
- Implemented additional response codes to handle duplicate resource conflicts.
- Enhanced overall query processing with intelligent rewriting and improved response details.

coderabbitai · 2025-02-27T01:44:25Z

Walkthrough

This update extends the OpenAPI specification for the Vectara REST API. Two new POST endpoints have been added: one to compute the size of a corpus by its key and another to summarize a document by its ID. The modifications include new response codes (403, 404, 409) and expanded schemas with additional properties related to query rewriting and computed corpus details. Existing request and response schemas have been updated to incorporate parameters for intelligent query rewriting.

Changes

| File Path | Change Summary |
|---|---|
| `src/.../Vectara/openapi.yaml` | - Added POST `/v2/corpora/{corpus_key}/compute_size` endpoint with parameters and responses (200, 403, 404).<br>- Added POST `/v2/corpora/{corpus_key}/documents/{document_id}/summarize` endpoint.<br>- Added 409 Conflict response for existing corpus and document creation.<br>- Added new schema: `ComputeCorpusSizeResponse`.<br>- Updated `QueryRequest`, `ChatRequest`, `QueryFullResponse`, and `StreamSearchResponse` schemas with `intelligent_query_rewriting` & `rewritten_queries`. |

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant A as Vectara API
    participant D as Database/Processing

    C->>A: POST /v2/corpora/{corpus_key}/compute_size
    A->>D: Retrieve corpus size details (documents, parts, chars)
    D-->>A: Return size data
    A-->>C: Response (200/403/404)

    C->>A: POST /v2/corpora/{corpus_key}/documents/{document_id}/summarize
    A->>D: Process document summarization
    D-->>A: Return summary data
    A-->>C: Response (200/403/404/409)

Poem

I'm a rabbit in the code, hopping by,
New endpoints bloom, reaching for the sky.
Summarize and compute, oh what a sight, 🐇
With schemas enriched and responses bright.
Bounce into our API world, delight!
Code flows sweetly, day and night.
Happy hops and cheers to the new light!

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d99e99e and 489ea0e.

⛔ Files ignored due to path filters (101)

src/libs/Vectara/Generated/JsonConverters.CreateLLMRequest.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/JsonConverters.CreateLLMRequestDiscriminatorType.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/JsonConverters.CreateLLMRequestDiscriminatorTypeNullable.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/JsonConverters.QueryHistorySpan.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/JsonConverters.RemoteAuth.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/JsonConverters.RemoteAuthDiscriminatorType.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/JsonConverters.RemoteAuthDiscriminatorTypeNullable.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/JsonConverters.RewrittenQueryWarning.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/JsonConverters.RewrittenQueryWarningNullable.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/JsonConverters.SummarizeDocumentStreamedResponse.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/JsonConverters.SummarizeDocumentStreamedResponseDiscriminatorType.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/JsonConverters.SummarizeDocumentStreamedResponseDiscriminatorTypeNullable.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/JsonSerializerContext.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/JsonSerializerContextTypes.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.APIKeysClient.ListApiKeys.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.ChatsClient.CreateChat.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.ChatsClient.CreateChatTurn.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.CorporaClient.ComputeCorpusSize.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.CorporaClient.CreateCorpus.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.CorporaClient.UpdateCorpus.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.DocumentsClient.SummarizeCorpusDocument.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.IAPIKeysClient.ListApiKeys.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.IChatsClient.CreateChat.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.IChatsClient.CreateChatTurn.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.ICorporaClient.ComputeCorpusSize.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.ICorporaClient.CreateCorpus.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.ICorporaClient.UpdateCorpus.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.IDocumentsClient.SummarizeCorpusDocument.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.ILargeLanguageModelsClient.CreateLLM.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.IQueriesClient.Query.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.IQueriesClient.QueryCorpus.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.IQueriesClient.SearchCorpus.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.IUsersClient.CreateUser.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.IUsersClient.ResetUserPassword.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.IndexClient.CreateCorpusDocument.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.LargeLanguageModelsClient.CreateLLM.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.BearerAuth.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.BearerAuth.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.ChatFullResponse.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.ChatRequest.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.ComputeCorpusSizeResponse.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.ComputeCorpusSizeResponse.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.Corpus.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.CreateCorpusRequest.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.CreateLLMRequest.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.CreateLLMRequest.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.CreateLLMRequestDiscriminator.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.CreateLLMRequestDiscriminator.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.CreateLLMRequestDiscriminatorType.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.CreateOpenAILLMRequest.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.CreateOpenAILLMRequest.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.CreateOpenAILLMRequestTestModelParameters.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.CreateOpenAILLMRequestTestModelParameters.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.CreateUserResponse2.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.CreateUserResponse2.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.FilterExtraction.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.FilterExtraction.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.GenerationParameters.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.GenerationParametersModelParameters.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.GenerationPreset.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.HeaderAuth.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.HeaderAuth.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.KeyedSearchCorpusVariant2.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.QueryCorpusRequest.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.QueryFullResponse.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.QueryHistorySpan.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.QueryHistorySpanDiscriminatorType.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.QueryRequest.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.QueryWarning.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.RemoteAuth.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.RemoteAuth.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.RemoteAuthDiscriminator.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.RemoteAuthDiscriminator.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.RemoteAuthDiscriminatorType.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.ResetUserPasswordResponse.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.ResetUserPasswordResponse.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.RewrittenQuery.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.RewrittenQuery.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.RewrittenQuerySpan.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.RewrittenQuerySpan.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.RewrittenQuerySpanWarnings.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.RewrittenQuerySpanWarnings.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.RewrittenQueryWarning.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.StreamSearchResponse.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.SummarizeDocumentRequest.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.SummarizeDocumentRequest.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.SummarizeDocumentRequestModelParameters.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.SummarizeDocumentRequestModelParameters.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.SummarizeDocumentResponse.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.SummarizeDocumentResponse.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.SummarizeDocumentStreamedResponse.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.SummarizeDocumentStreamedResponse.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.SummarizeDocumentStreamedResponseDiscriminator.Json.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.SummarizeDocumentStreamedResponseDiscriminator.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.SummarizeDocumentStreamedResponseDiscriminatorType.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.Models.UpdateCorpusRequest.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.QueriesClient.Query.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.QueriesClient.QueryCorpus.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.QueriesClient.SearchCorpus.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.UsersClient.CreateUser.g.cs is excluded by !**/generated/**
src/libs/Vectara/Generated/Vectara.UsersClient.ResetUserPassword.g.cs is excluded by !**/generated/**

📒 Files selected for processing (1)

src/libs/Vectara/openapi.yaml (33 hunks)

🔇 Additional comments (15)

src/libs/Vectara/openapi.yaml (15)

52-57: New Response Code '409' Added for Corpus Creation
A new response block with a 409 status code has been added to the Create Corpus endpoint. This appropriately handles the case where a corpus with the same key already exists. Ensure that client applications and backend error handlers are updated accordingly.
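
As a rough illustration of the client-side update mentioned above, the sketch below classifies the status code of a create call so that a duplicate can be handled separately from other failures. The enum and function names are hypothetical, and treating any 2xx code as success is an assumption, since the exact success code is not shown in this review.

```python
from enum import Enum


class CreateOutcome(Enum):
    """Possible results of a create call (hypothetical names)."""
    CREATED = "created"
    ALREADY_EXISTS = "already_exists"
    FAILED = "failed"


def classify_create_response(status: int) -> CreateOutcome:
    """Classify the HTTP status returned by a create call.

    409 is the code this PR adds for duplicates. Treating every 2xx code
    as success is an assumption made for this sketch.
    """
    if 200 <= status < 300:
        return CreateOutcome.CREATED
    if status == 409:
        return CreateOutcome.ALREADY_EXISTS
    return CreateOutcome.FAILED
```

A caller that wants idempotent setup could treat `ALREADY_EXISTS` as a non-error and continue, while still surfacing `FAILED`.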

274-308: New Endpoint for Computing Corpus Size
A new POST endpoint (/v2/corpora/{corpus_key}/compute_size) has been introduced to return detailed metrics (documents, parts, characters, and metadata characters) for a corpus. The documentation and schema referencing look thorough. Please verify that the backend implementation supports these fields and that integration tests cover this endpoint.
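
For integration tests, a request for this endpoint might be built roughly as follows. This is a minimal sketch: the base URL and the API-key header name are assumptions that do not appear in this PR, and the status descriptions are illustrative readings of the codes listed for the endpoint. The request is only constructed, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed values: neither the base URL nor the auth header is stated in this PR.
BASE_URL = "https://api.vectara.io"
API_KEY_HEADER = "x-api-key"


def build_compute_size_request(corpus_key: str, api_key: str) -> Request:
    """Build (but do not send) a POST request for the compute_size endpoint."""
    # Percent-encode the key so that unusual characters cannot break the path.
    path = f"/v2/corpora/{quote(corpus_key, safe='')}/compute_size"
    return Request(
        BASE_URL + path,
        data=b"",  # POST with an empty body
        headers={API_KEY_HEADER: api_key, "Accept": "application/json"},
        method="POST",
    )


def describe_status(status: int) -> str:
    """Map the status codes listed for this endpoint to a short label."""
    labels = {200: "ok", 403: "forbidden", 404: "corpus not found"}
    return labels.get(status, "unexpected")
```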

348-352: Updated Query Parameters: Save History & Intelligent Query Rewriting
The changes add a proper schema for the save_history parameter and enrich the description of intelligent_query_rewriting in the query endpoint. These adjustments make the API self-descriptive and consistent. Double-check that the behavior for these boolean options is uniformly implemented across both query and chat endpoints.

578-583: New Response Code '409' for Document Addition
A 409 response has been added to the Add Document endpoint to signal that the document already exists. This improves error signaling for clients when attempting duplicate document uploads.

3547-3550: Intelligent Query Rewriting in ChatRequest
Within the ChatRequest schema, the intelligent_query_rewriting parameter has been added with a clear description and default value. This enhancement parallels the query endpoint and provides clients with control over query rewriting for chat interactions.
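
A client could opt in to the feature along these lines. The sketch assumes `intelligent_query_rewriting` is a top-level boolean property of the request body; the `query` key is only an illustrative placeholder for the rest of the payload.

```python
import json


def with_query_rewriting(body: dict, enabled: bool = True) -> dict:
    """Return a copy of a request body with intelligent_query_rewriting set."""
    updated = dict(body)  # shallow copy so the caller's dict is left untouched
    updated["intelligent_query_rewriting"] = enabled
    return updated


# "query" is a placeholder field, not taken from the schema in this PR.
original = {"query": "What is our refund policy?"}
payload = with_query_rewriting(original)
print(json.dumps(payload, indent=2))
```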

4356-4375: ComputeCorpusSizeResponse Schema Addition
The new ComputeCorpusSizeResponse schema defines metrics such as the number of documents, parts, total characters, and metadata characters. This comprehensive schema will help consumers understand corpus usage. Verify that the numeric formats (e.g., int64) and error scenarios are documented consistently in the implementation.
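
A consumer might map the response onto a small typed record, as in the sketch below. The JSON property names used here (`documents`, `parts`, `characters`, `metadata_characters`) are placeholders chosen for illustration; the actual names are defined by the schema in `openapi.yaml` and are not quoted in this review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusSize:
    """Typed view of the four metrics described for the response."""
    documents: int
    parts: int
    characters: int
    metadata_characters: int


def parse_corpus_size(payload: dict) -> CorpusSize:
    """Convert a decoded JSON object into a CorpusSize.

    The keys below are hypothetical placeholders. A missing key raises
    KeyError, so a schema mismatch is noticed early.
    """
    return CorpusSize(
        documents=int(payload["documents"]),
        parts=int(payload["parts"]),
        characters=int(payload["characters"]),
        metadata_characters=int(payload["metadata_characters"]),
    )
```

Python integers are arbitrary precision, so int64 values need no special handling here; clients in other languages would need a 64-bit type.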

2841-2845: Inclusion of Rewritten Queries in QueryFullResponse
The addition of the rewritten_queries array within the QueryFullResponse schema provides valuable insight when intelligent query rewriting is enabled. This feature can help clients debug and improve query outcomes.

3597-3601: Rewritten Queries in ChatFullResponse
Similarly, the ChatFullResponse now exposes a rewritten_queries field, which is beneficial for transparency in chat interactions when the rewriting feature is active. Ensure that downstream processing makes use of this field as intended.

4301-4325: RewrittenQuerySpan Schema Update
The updated RewrittenQuerySpan schema now details properties such as the corpus key, latency, warnings, and filter extraction. This granularity is useful for tracing how queries are modified internally. Confirm that the discriminator and overall structure are in sync with the rewriting engine's behavior.

4327-4334: Expanded RewrittenQueryWarning Enum
The enum for rewritten query warnings now explicitly lists possible issues like no_filter_attrs, extracted_empty_filter, and others. This should help clients handle and log specific rewriting failures. Make sure these values match the error reporting from the backend.
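
Because the enum may grow, a client can separate the values it recognizes from those it does not instead of failing on an unfamiliar one. The sketch below knows only the two values named in this review; the remaining enum members are not listed here, so they fall into the unrecognized group.

```python
from typing import Iterable, List, Tuple

# Only the two values named in the review; the full enum has more members.
KNOWN_WARNINGS = frozenset({"no_filter_attrs", "extracted_empty_filter"})


def split_warnings(warnings: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split warning codes into (recognized, unrecognized), preserving order."""
    recognized: List[str] = []
    unrecognized: List[str] = []
    for code in warnings:
        if code in KNOWN_WARNINGS:
            recognized.append(code)
        else:
            unrecognized.append(code)
    return recognized, unrecognized
```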

4442-4460: SummarizeDocumentRequest Schema
The SummarizeDocumentRequest schema now requires llm_name and includes fields for prompt_template, optional model parameters, and a streaming option. This clear structure will support flexible summarization requests. Verify that the default values and descriptions are consistent with the summarize service’s behavior.
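
A request body matching this description could be assembled as sketched below. `llm_name` and `prompt_template` are named in the review; the keys `model_parameters` and `stream_response` are assumed spellings for the optional model parameters and the streaming option.

```python
from typing import Any, Dict, Optional


def build_summarize_request(
    llm_name: str,
    prompt_template: Optional[str] = None,
    model_parameters: Optional[Dict[str, Any]] = None,
    stream_response: bool = False,
) -> Dict[str, Any]:
    """Build a summarize request body, omitting optional fields left unset.

    "model_parameters" and "stream_response" are assumed key names.
    """
    if not llm_name:
        raise ValueError("llm_name is required")
    body: Dict[str, Any] = {"llm_name": llm_name}
    if prompt_template is not None:
        body["prompt_template"] = prompt_template
    if model_parameters is not None:
        body["model_parameters"] = model_parameters
    if stream_response:
        body["stream_response"] = True
    return body
```

Omitting unset fields lets the service apply whatever defaults the schema declares.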

4461-4470: SummarizeDocumentResponse Schema
The response schema for document summarization is concise, returning the generated summary and the rendered prompt. It is recommended to ensure that any error handling specific to document summarization is addressed separately in the API documentation or through additional response codes.

4470-4487: SummarizeDocumentStreamedResponse for Streaming Summaries
The streamed response for document summarization is structured using a oneOf construct to handle different event types (e.g., generation chunks, end events, errors). Ensure that the discriminator property and mapping are correctly implemented so that clients can seamlessly process streamed events.
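
On the client side, a discriminated union like this is usually handled by dispatching on the discriminator property. The sketch below assumes the property is called `type` and that the event kinds are spelled `generation_chunk`, `end`, and `error`; the payload keys are also assumptions, since none of these spellings are quoted in this review.

```python
import json
from typing import List


def handle_event(raw: str, chunks: List[str]) -> bool:
    """Process one JSON-encoded stream event.

    Appends generated text to `chunks` and returns True once the stream
    has ended. All event and field names are assumed for this sketch.
    """
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "generation_chunk":
        chunks.append(event.get("generation_chunk", ""))
        return False
    if kind == "error":
        raise RuntimeError("; ".join(event.get("messages", [])) or "stream error")
    if kind == "end":
        return True
    # Unknown event kinds are skipped so that new variants do not break clients.
    return False
```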

3421-3429: CreateLLMRequest Schema Update via OneOf
The CreateLLMRequest now leverages a oneOf construct to support different LLM types (e.g., OpenAI-compatible models). This design promotes flexibility. Please confirm that the discriminator is correctly set and that consuming services can correctly interpret the provided model details.

3431-3465: CreateOpenAILLMRequest Schema Enhancements
The CreateOpenAILLMRequest schema has been enhanced to require fields such as type, name, model, and uri (with format validations) as well as to include authentication details via RemoteAuth. These improvements ensure that all critical configurations for OpenAI-compatible LLMs are captured.
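
A client can run a quick pre-flight check on those required fields before sending the request. In the sketch below, the list of required fields comes from the comment above; the `http`/`https` check is only an assumed interpretation of the URI format validation, and the helper names are hypothetical.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse

# Required fields as listed in the review comment.
REQUIRED_FIELDS = ("type", "name", "model", "uri")


def missing_fields(request: Dict[str, Any]) -> List[str]:
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not request.get(field)]


def has_valid_uri(request: Dict[str, Any]) -> bool:
    """Check that 'uri' is an absolute http(s) URL (an assumed rule)."""
    parsed = urlparse(str(request.get("uri", "")))
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```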

feat: Updated OpenAPI spec

489ea0e

github-actions bot approved these changes Feb 27, 2025

View reviewed changes

github-actions bot merged commit bea516d into main Feb 27, 2025
3 of 4 checks passed

coderabbitai bot changed the title ~~feat:@coderabbitai~~ feat:Add POST endpoints for corpus size and document summary to Vectara API Feb 27, 2025

coderabbitai bot mentioned this pull request Mar 27, 2025

feat:Update Vectara REST API OpenAPI spec with new endpoints and pagination #48

Merged
