🍌 feat: Gemini Image Generation Tool (Nano Banana) #10676
* Refactored the credentials path to follow a consistent pattern with other Google service integrations, allowing for an environment variable override.
* Updated documentation in README-GeminiNanoBanana.md to reflect the new credentials handling approach and removed references to hardcoded paths.

- Bump @google/genai package version to ^1.19.0 for improved functionality.
- Refactor GeminiImageGen to createGeminiImageTool for better clarity and consistency.
- Enhance manifest.json for Gemini Image Tools with updated descriptions and icon.
- Add SVG icon for Gemini Image Tools.
- Implement progress tracking for Gemini image generation in the UI.
- Introduce new toolkit and context handling for image generation tools.
This update improves the Gemini image generation capabilities and user experience.

…icon
- Deleted the obsolete PNG file for Gemini image generation.
- Updated the SVG icon with a new design featuring a gradient and shadow effect, enhancing visual appeal and consistency.
|
@danny-avila Corresponding Docs PR LibreChat-AI/librechat.ai#452 |
|
shouldn't it also work natively? |
|
nvm, that should invoke tools too lmao |
I was thinking about that but this would be a departure from how the project handles image tools. I organized it similar to the openai tools so the workflows stay the same for users |
|
@danny-avila with native multimodal image generation models appearing, it would be great to implement this functionality actually!
|
|
It's great that this is a tool - it can be called by other models. But yes, there are more models that can natively return text AND images (and audio?), so it would be good if it can handle that too. |
|
@danny-avila can you review this? |
- Updated .env.example to include new environment variables for Google Cloud region, service account configuration, and Gemini API key options.
- Modified GeminiImageGen.js to support both user-provided API keys and Vertex AI service accounts, improving flexibility in client initialization.
- Updated manifest.json to reflect changes in authentication methods for the Gemini Image Tools.
- Bumped @google/genai package version to 1.19.0 in package-lock.json for compatibility with new features.
|
Correct me if I'm wrong, but looking at the PR, I get the impression that the tool will only work with the Gemini or Vertex AI API. I think it would be nice to have the option to make it work with any OpenAI-compatible API, such as OpenRouter. |
- Adjusted the return statement in getDefaultServiceKeyPath function for improved readability by formatting it across multiple lines. This change enhances code clarity without altering functionality.
Resolved conflicts:
- api/package.json: Keep @google/genai (new SDK), accept dev changes
- package-lock.json: Regenerated with npm install

Resolved conflicts in peerDependencies by keeping both:
- @google/genai (from feature branch)
- @aws-sdk/client-bedrock-runtime (from dev)
Also merged transitive dependencies in package-lock.json.
|
@danny-avila |
Pull request overview
This PR adds a comprehensive Gemini Image Generation tool that integrates Google's Gemini image generation models (Nano Banana and Nano Banana Pro) into LibreChat's agent system. The implementation takes a tool-based approach due to architectural limitations in the current agents package that prevent native model support for image generation with responseModalities parameters.
Key Changes:
- Multi-authentication support (user-provided keys, admin API keys, or Vertex AI service accounts) with flexible fallback
- Complete image generation workflow including text-to-image and image-to-image with context
- Integration with all LibreChat storage strategies (local, S3, Azure, Firebase)
Reviewed changes
Copilot reviewed 17 out of 19 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| api/app/clients/tools/structured/GeminiImageGen.js | Core tool implementation with authentication, image generation, and storage handling |
| packages/api/src/tools/toolkits/gemini.ts | Tool schema definition with comprehensive parameters for aspect ratio and image size |
| packages/api/src/tools/toolkits/imageContext.ts | Reusable helper for building image context strings for tools |
| packages/api/src/endpoints/google/initialize.ts | Updated service key path to use 'api/data/auth.json' for consistency |
| api/app/clients/tools/util/handleTools.js | Tool loading integration with image context builder |
| api/app/clients/tools/manifest.json | Tool registration with flexible authentication config |
| packages/api/package.json | Added @google/genai dependency |
| api/models/tx.js | Token pricing configuration for image generation models |
| packages/data-provider/src/config.ts | Added gemini_image_gen to imageGenTools set |
| client/src/components/Chat/Messages/Content/Part.tsx | Client-side tool detection for rendering |
| client/src/components/Chat/Messages/Content/Parts/OpenAIImageGen/ProgressText.tsx | Progress display messaging for Gemini image generation |
| client/public/assets/gemini_image_gen.svg | Custom icon for the tool |
| .env.example | Comprehensive documentation for configuration options |
    if (!fs.existsSync(userDir)) {
      fs.mkdirSync(userDir, { recursive: true });
    }

    const filePath = path.join(userDir, imageName);
    fs.writeFileSync(filePath, Buffer.from(base64Data, 'base64'));
Copilot
AI
Jan 3, 2026
The synchronous file system operations (fs.existsSync, fs.mkdirSync, fs.writeFileSync) can block the event loop and impact performance under load. Consider using the asynchronous equivalents (fs.promises.access, fs.promises.mkdir, fs.promises.writeFile) to avoid blocking, especially since this function is already async and the caller expects a promise.
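A minimal sketch of the asynchronous variant the comment describes, using only Node's built-in `fs.promises` API. The function name `saveImage` and its parameters are illustrative and are not taken from the PR.

```javascript
const fs = require('fs');
const path = require('path');

/**
 * Writes a base64-encoded image into userDir without blocking the event loop.
 * saveImage, userDir, imageName and base64Data are illustrative names.
 */
async function saveImage(userDir, imageName, base64Data) {
  // With recursive: true, mkdir succeeds even if the directory already exists,
  // so no separate existence check is needed.
  await fs.promises.mkdir(userDir, { recursive: true });

  const filePath = path.join(userDir, imageName);
  await fs.promises.writeFile(filePath, Buffer.from(base64Data, 'base64'));
  return filePath;
}
```

Dropping the `existsSync` check also removes the small window between checking for the directory and creating it.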
    "@azure/identity": "^4.7.0",
    "@azure/search-documents": "^12.0.0",
    "@azure/storage-blob": "^12.27.0",
    "@google/genai": "^1.19.0",
Copilot
AI
Jan 3, 2026
The package.json specifies version ^1.19.0 for @google/genai, but the package-lock.json has installed version 1.33.0. This version mismatch should be resolved by updating the package.json to match the actual installed version or by ensuring the lock file reflects the specified version constraint. Consider updating package.json to "^1.33.0" to match what's actually installed.
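For context when weighing this comment: under npm's semver rules a caret range such as `^1.19.0` accepts any version from 1.19.0 up to, but not including, 2.0.0, so a lock file entry of 1.33.0 falls inside the declared range. The sketch below is a simplified check, not the `semver` package: it assumes a major version of 1 or higher and ignores prerelease tags.

```javascript
/** Parses "1.33.0" into [1, 33, 0]. Prerelease and build tags are not handled. */
function parseVersion(version) {
  return version.split('.').map(Number);
}

/**
 * Returns true when `version` satisfies a caret range such as "^1.19.0".
 * Simplified sketch: caret ranges on 0.x versions follow different rules
 * and are rejected here.
 */
function satisfiesCaret(range, version) {
  const [rMajor, rMinor, rPatch] = parseVersion(range.replace(/^\^/, ''));
  const [vMajor, vMinor, vPatch] = parseVersion(version);

  if (rMajor < 1) {
    throw new Error('0.x caret ranges are not covered by this sketch');
  }
  if (vMajor !== rMajor) return false; // a caret never crosses a major version
  if (vMinor !== rMinor) return vMinor > rMinor;
  return vPatch >= rPatch;
}

console.log(satisfiesCaret('^1.19.0', '1.33.0')); // true
console.log(satisfiesCaret('^1.19.0', '2.0.0'));  // false
```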
     */
    function getDefaultServiceKeyPath() {
      return (
        process.env.GOOGLE_SERVICE_KEY_FILE || path.join(__dirname, '../../../..', 'data', 'auth.json')
Copilot
AI
Jan 3, 2026
The path construction using relative path segments from __dirname ('../../../../data/auth.json') is inconsistent with the updated path in initialize.ts which uses 'api/data/auth.json'. Given that this file is located at api/app/clients/tools/structured/GeminiImageGen.js, the path '../../../..' would resolve to the project root, making it 'data/auth.json'. However, the updated Google endpoint uses 'api/data/auth.json'. For consistency with the main Google endpoint and to match the updated path pattern, this should be changed to use path.join(process.cwd(), 'api', 'data', 'auth.json') instead of the __dirname-based approach.
Suggested change:
    - process.env.GOOGLE_SERVICE_KEY_FILE || path.join(__dirname, '../../../..', 'data', 'auth.json')
    + process.env.GOOGLE_SERVICE_KEY_FILE ||
    + path.join(process.cwd(), 'api', 'data', 'auth.json')
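The two path expressions can be compared directly. The sketch below assumes a hypothetical checkout at `/repo` and uses `path.posix` so the output is the same on every platform. Only the relative layout (`api/app/clients/tools/structured`) comes from the PR. Under that assumption, four `..` segments from the tool's directory land on `api`, not on the repository root, and the `process.cwd()` form matches only when the server is started from the repository root.

```javascript
const path = require('path');

// Hypothetical checkout location; only the relative layout is from the PR.
const repoRoot = '/repo';
const toolDir = path.posix.join(repoRoot, 'api/app/clients/tools/structured');

// __dirname-based default: the four ".." segments climb
// structured -> tools -> clients -> app -> api.
const fromDirname = path.posix.join(toolDir, '../../../..', 'data', 'auth.json');

// cwd-based default from the suggestion, parameterised so the effect of the
// working directory is visible.
function fromCwd(cwd) {
  return path.posix.join(cwd, 'api', 'data', 'auth.json');
}

console.log(fromDirname);                 // /repo/api/data/auth.json
console.log(fromCwd(repoRoot));           // /repo/api/data/auth.json
console.log(fromCwd(repoRoot + '/api'));  // /repo/api/api/data/auth.json
```

The trade-off is that the `__dirname` form breaks if the file moves, and the `process.cwd()` form breaks if the process is launched from a different directory.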
    # Set this to enable Vertex AI and allow tool without requiring API keys
    # GEMINI_VERTEX_ENABLED=true

    # Vertex AI model for image generation (defaults to gemini-2.5-flash-image)
Copilot
AI
Jan 3, 2026
The documentation states "Vertex AI model for image generation (defaults to gemini-2.5-flash-image)" but refers to it as "Nano Banana" in the PR description. The model name "gemini-2.5-flash-image" is the actual API model identifier, while "Nano Banana" appears to be an informal name. Consider clarifying this in the comment to avoid confusion, perhaps: "Model for image generation (defaults to gemini-2.5-flash-image, known as Nano Banana)".
Suggested change:
    - # Vertex AI model for image generation (defaults to gemini-2.5-flash-image)
    + # Vertex AI model for image generation (defaults to gemini-2.5-flash-image, known as "Nano Banana")
… resolution
- Changed the default service key path to use process.cwd() for better compatibility.
- Replaced synchronous file system operations with asynchronous promises for mkdir and writeFile, enhancing performance and error handling.
- Added error handling for credential file access to prevent crashes when the file does not exist.

- Refactored API key checks to improve clarity and consistency.
- Removed redundant checks for user-provided keys, enhancing code readability.
- Ensured proper logging for API key usage across different configurations.

- Added a check to ensure imageSize is only applied if the gemini model does not include 'gemini-2.5-flash-image', improving compatibility.
- Enhanced the logic for setting imageConfig to prevent potential issues with unsupported configurations.
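The conditional `imageSize` handling described in this commit could look roughly like the sketch below. The function name `buildImageConfig` and the shape of the returned object are assumptions for illustration. The commit message only states that `imageSize` is skipped when the model name includes `gemini-2.5-flash-image`.

```javascript
/**
 * Builds an image config object, omitting imageSize for models that the
 * commit describes as not supporting it. buildImageConfig is a hypothetical
 * helper, not the PR's actual function.
 */
function buildImageConfig(model, { aspectRatio, imageSize } = {}) {
  const imageConfig = {};

  if (aspectRatio) {
    imageConfig.aspectRatio = aspectRatio;
  }

  // Per the commit message: only apply imageSize when the model name does
  // not include 'gemini-2.5-flash-image'.
  const supportsImageSize = !model.includes('gemini-2.5-flash-image');
  if (imageSize && supportsImageSize) {
    imageConfig.imageSize = imageSize;
  }

  // Return undefined rather than an empty object so callers can skip the field.
  return Object.keys(imageConfig).length > 0 ? imageConfig : undefined;
}
```

For example, `buildImageConfig('gemini-2.5-flash-image', { aspectRatio: '16:9', imageSize: '2K' })` keeps only the aspect ratio, while the same options with `gemini-3-pro-image` keep both.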
- Simplified the handling of API keys by removing redundant checks for user-provided keys.
- Updated logging to reflect the new priority order for API key usage, enhancing clarity and consistency.
- Improved code readability by consolidating key retrieval logic.
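A consolidated key lookup of the kind this commit describes might be sketched as follows. The PR description lists `GEMINI_API_KEY` ahead of `GOOGLE_KEY`, and `.env.example` mentions `GEMINI_VERTEX_ENABLED`. Placing Vertex AI last as a fallback, the function name `resolveGeminiAuth`, and the returned object shape are all assumptions rather than the PR's actual logic.

```javascript
/**
 * Resolves which credentials to use for the Gemini image tool.
 * Hypothetical helper: the env var names come from the PR, the fallback
 * position of Vertex AI and the return shape are assumptions.
 */
function resolveGeminiAuth(env = process.env) {
  if (env.GEMINI_API_KEY) {
    return { type: 'api_key', source: 'GEMINI_API_KEY', apiKey: env.GEMINI_API_KEY };
  }
  if (env.GOOGLE_KEY) {
    return { type: 'api_key', source: 'GOOGLE_KEY', apiKey: env.GOOGLE_KEY };
  }
  if (env.GEMINI_VERTEX_ENABLED === 'true') {
    return { type: 'vertex', source: 'service_account' };
  }
  return null; // nothing configured; the caller decides how to report this
}
```

Passing `env` as a parameter keeps the lookup order testable without modifying `process.env`.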
|
Made a few necessary changes to get this ready for merge:
I was hesitant to merge this since LC is moving away from native tools (especially AI provider API wrappers) in favor of MCP, but I am only merging given the amount of work already put into this by others. Maintaining this tool will not be high priority. |
|
Also worth mentioning: |
Do you have any plans for multi-modal image generation? |
Yes but not a high priority feature right now. It would be better to allow multi-modal generation via chat instead of tool, but at least the tool extends "nano banana" to other models. |
|
Appreciate it man! |
Thanks, and I agree it's not a great template for future tools but it fits the weird place we are in with all the SDKaos going on right now. Ideally native multimodal input and output models should be supported like other models. I took a shot at implementing but ended up deep in the agents package and decided to bring this home instead. |
|
Does it respect GOOGLE_REVERSE_PROXY in .env? |
* Added fully functioning Agent Tool supporting Google's Nano Banana
* 🔧 refactor: Update Google credentials handling in GeminiImageGen.js
  * Refactored the credentials path to follow a consistent pattern with other Google service integrations, allowing for an environment variable override.
  * Updated documentation in README-GeminiNanoBanana.md to reflect the new credentials handling approach and removed references to hardcoded paths.
* 🛠️ refactor: Remove unnecessary whitespace in handleTools.js
* 🔧 feat: Update Gemini Image Generation Tool
  - Bump @google/genai package version to ^1.19.0 for improved functionality.
  - Refactor GeminiImageGen to createGeminiImageTool for better clarity and consistency.
  - Enhance manifest.json for Gemini Image Tools with updated descriptions and icon.
  - Add SVG icon for Gemini Image Tools.
  - Implement progress tracking for Gemini image generation in the UI.
  - Introduce new toolkit and context handling for image generation tools.
  This update improves the Gemini image generation capabilities and user experience.
* 🗑️ chore: Remove outdated Gemini image generation PNG and update SVG icon
  - Deleted the obsolete PNG file for Gemini image generation.
  - Updated the SVG icon with a new design featuring a gradient and shadow effect, enhancing visual appeal and consistency.
* fix: ESLint formatting and unused variable in GeminiImageGen
* fix: Update default model to gemini-2.5-flash-image
* ✨ feat: Enhance Gemini Image Generation Configuration
  - Updated .env.example to include new environment variables for Google Cloud region, service account configuration, and Gemini API key options.
  - Modified GeminiImageGen.js to support both user-provided API keys and Vertex AI service accounts, improving flexibility in client initialization.
  - Updated manifest.json to reflect changes in authentication methods for the Gemini Image Tools.
  - Bumped @google/genai package version to 1.19.0 in package-lock.json for compatibility with new features.
* 🔧 fix: Format Default Service Key Path in GeminiImageGen.js
  - Adjusted the return statement in getDefaultServiceKeyPath function for improved readability by formatting it across multiple lines. This change enhances code clarity without altering functionality.
* ✨ feat: Enhance Gemini Image Generation with Token Usage Tracking
  - Added `recordTokenUsage` function to track token usage for balance management.
  - Integrated token recording into the image generation process.
  - Updated Gemini image generation tool to accept optional `aspectRatio` and `imageSize` parameters for improved image customization.
  - Updated token values for new Gemini models in the transaction model.
  - Improved documentation for image generation tool descriptions and parameters.
* ✨ feat: Add new Gemini models for image generation token limits
  - Introduced token limits for 'gemini-3-pro-image' and 'gemini-2.5-flash-image' models.
  - Updated token values to enhance the Gemini image generation capabilities.
* 🔧 fix: Update Google Service Key Path for Consistency in Initialization (danny-avila#11001)
* 🔧 refactor: Update GeminiImageGen for improved file handling and path resolution
  - Changed the default service key path to use process.cwd() for better compatibility.
  - Replaced synchronous file system operations with asynchronous promises for mkdir and writeFile, enhancing performance and error handling.
  - Added error handling for credential file access to prevent crashes when the file does not exist.
* 🔧 refactor: Update GeminiImageGen to streamline API key handling
  - Refactored API key checks to improve clarity and consistency.
  - Removed redundant checks for user-provided keys, enhancing code readability.
  - Ensured proper logging for API key usage across different configurations.
* 🔧 fix: Update GeminiImageGen to handle imageSize support conditionally
  - Added a check to ensure imageSize is only applied if the gemini model does not include 'gemini-2.5-flash-image', improving compatibility.
  - Enhanced the logic for setting imageConfig to prevent potential issues with unsupported configurations.
* 🔧 refactor: Simplify local storage condition in createGeminiImageTool function
* 🔧 feat: Enhance image format handling in GeminiImageGen with conversion support
* 🔧 refactor: Streamline API key initialization in GeminiImageGen
  - Simplified the handling of API keys by removing redundant checks for user-provided keys.
  - Updated logging to reflect the new priority order for API key usage, enhancing clarity and consistency.
  - Improved code readability by consolidating key retrieval logic.
---------
Co-authored-by: Dev Bhanushali <dev.bhanushali@hingehealth.com>
Co-authored-by: Danny Avila <danny@librechat.ai>
|
Why does it always generate multiple images in response to a single user request? Even if we ask it in the agent's prompt to generate only one image per request, it still generates multiple images. |
I do not have this issue |
This also happened to me when, in the configuration for the Agent, I set the "Model" to |


Summary
This PR adds a comprehensive Gemini Image Generation tool for LibreChat Agents with flexible authentication supporting both Gemini API and Google Cloud Vertex AI.
Key Features:
- Nano Banana (gemini-2.5-flash-image) and Nano Banana Pro (gemini-3-pro-image)
- Same loadServiceKey pattern as the main Google endpoint and Anthropic Vertex AI

Why a Tool-Based Approach?
Gemini's image generation models (gemini-2.5-flash-image, gemini-3-pro-image-preview, etc.) require special API parameters to return images:

Architectural Constraint
LibreChat's current chat architecture routes all endpoint requests (including Google) through the unified agents system (/api/agents/chat/:endpoint), which uses the @librechat/agents package with LangChain.

The limitation:
- The @librechat/agents package's CustomChatGoogleGenerativeAI class creates the Google client with a hardcoded generationConfig that doesn't include responseModalities
- The @langchain/google-genai package doesn't currently expose this parameter
- Without responseModalities: ['TEXT', 'IMAGE'], Gemini returns text descriptions instead of actual images

Why native model selection won't work (yet):
The Tool Approach Works
This tool uses the @google/genai SDK directly (not LangChain), which:
- responseModalities: ['TEXT', 'IMAGE']

Future Native Support
Native support for Gemini image models as selectable endpoints would require:
- @librechat/agents to support responseModalities in CustomChatGoogleGenerativeAI

A separate feature request has been opened for the @librechat/agents package.
danny-avila/agents#41
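To make the `responseModalities` requirement concrete, the sketch below builds the request object and extracts image parts from a response. The helper names `buildImageRequest` and `extractImages` are hypothetical. The request and response shapes follow my understanding of the `@google/genai` SDK (`generateContent` taking `model`, `contents` and `config`, with images returned as `inlineData` parts) and should be checked against the SDK version in use. The SDK call itself appears only in a comment so the block runs without network access or credentials.

```javascript
/**
 * Builds generateContent parameters that ask for both text and image output.
 * buildImageRequest is a hypothetical helper, not part of the PR.
 */
function buildImageRequest(prompt, model = 'gemini-2.5-flash-image') {
  return {
    model,
    contents: prompt,
    // Per the PR description, Gemini returns only text without this setting.
    config: { responseModalities: ['TEXT', 'IMAGE'] },
  };
}

/**
 * Collects base64 image payloads from a response, assuming images arrive as
 * parts carrying inlineData on the first candidate.
 */
function extractImages(response) {
  const parts = response?.candidates?.[0]?.content?.parts ?? [];
  return parts
    .filter((part) => part.inlineData?.data)
    .map((part) => ({ mimeType: part.inlineData.mimeType, data: part.inlineData.data }));
}

// With the real SDK the call would look roughly like this (assumed usage):
//   const { GoogleGenAI } = require('@google/genai');
//   const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
//   const response = await ai.models.generateContent(buildImageRequest('a banana'));
//   const images = extractImages(response);
```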
Configuration Options
Authentication Priority:
- GEMINI_API_KEY env var (admin-configured)
- GOOGLE_KEY env var (shared with Google chat endpoint)

This allows:

Related:
- feat/anthropic-vertex-ai branch

Change Type
Testing
Tested locally with multiple authentication configurations:
Vertex AI Configuration:
- GEMINI_VERTEX_ENABLED=true
- gemini-2.5-flash-image)

API Key Configuration:
- GEMINI_API_KEY env var
- GOOGLE_KEY env var

General Testing:
Test Configuration:
Checklist
.env.example)