
@usnavy13

Summary

Add support for the `responseModalities` parameter in the Google and VertexAI LLM classes so that native Gemini image generation models (`gemini-2.5-flash-image`, `gemini-3-pro-image-preview`, etc.) can return images alongside text.

Changes

  • Add `responseModalities?: ('TEXT' | 'IMAGE' | 'AUDIO')[]` to the `GoogleClientOptions` and `VertexAIClientOptions` types
  • Pass `responseModalities` to `generationConfig` in the `CustomChatGoogleGenerativeAI` constructor
  • Handle `inlineData` (image) parts in response processing (`convertResponseContentToChatGenerationChunk` and `mapGenerateContentResultToChatResult`), converting them to `image_url` content blocks with base64 data URLs
  • Add `responseModalities` to the generation config in the VertexAI `CustomChatConnection.formatData`
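The `inlineData` handling described above can be sketched roughly as follows. The part and block shapes here are simplified assumptions for illustration, not the exact types used in the PR:

```typescript
// Sketch of converting a Gemini inlineData part into an image_url
// content block with a base64 data URL. The interfaces below are
// simplified stand-ins for the real response/content-block types.
interface InlineDataPart {
  inlineData: { mimeType: string; data: string }; // data is base64-encoded
}

interface ImageUrlBlock {
  type: 'image_url';
  image_url: { url: string };
}

function inlineDataToImageUrl(part: InlineDataPart): ImageUrlBlock {
  const { mimeType, data } = part.inlineData;
  return {
    type: 'image_url',
    image_url: { url: `data:${mimeType};base64,${data}` },
  };
}

const block = inlineDataToImageUrl({
  inlineData: { mimeType: 'image/png', data: 'iVBORw0KGgo=' },
});
```

Encoding the image as a `data:` URL keeps the block shape compatible with existing `image_url` consumers, so downstream code that already renders image URLs needs no changes.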

Usage

```ts
const llmConfig = {
  provider: Providers.GOOGLE,
  model: 'gemini-2.5-flash-image',
  responseModalities: ['TEXT', 'IMAGE'],
};
```

Tested Models

  • `gemini-2.5-flash-image` - returns text + PNG image
  • `gemini-3-pro-image-preview` - returns JPEG image

