A comprehensive web application that combines AI-powered logo generation, 3D model creation, video generation, image editing, enterprise-grade image quality scoring, and PowerPoint presentation creation using Azure OpenAI services, Azure AI Vision, and Hugging Face Spaces. Create professional logos, generate 3D models from images, edit images with AI, analyze multiple images, and generate compelling presentations with professional-grade quality assessment - all in one platform.
- Uses Azure OpenAI's GPT Image API (gpt-image-1)
- Generate multiple images at once (configurable 1-10 images)
- Professional, business-ready results
- One-click individual downloads
- NEW: Generate 3D models from images using Hugging Face Spaces (frogleo/Image-to-3D)
- Upload any image to create 3D mesh files (GLB/OBJ formats)
- Interactive 3D preview in the browser
- Download generated 3D models for use in other applications
- Authenticated Hugging Face token support for enhanced performance
- Timing reports for generation duration tracking
- NEW: Generate videos using Azure OpenAI's Sora model
- Create short video clips (3-15 seconds) from text descriptions
- Multiple resolution options (480x480, 1280x720, 1920x1080)
- Realistic and imaginative video scenes
- Download generated videos in MP4 format
- Professional video quality with AI-powered content creation
- Batch Editing: Edit multiple images simultaneously
- Iterative Editing: Sequential editing workflow with history tracking
- Mask-Based Editing: Precise editing with custom masks
- Canvas Mask Editor: Create masks with brush/eraser tools
- ✨ Prompt Enhancement System: Structured prompt building with comprehensive categories
- Product Details: Material, color, and finish specifications
- Background & Surface: Environment and surface selection
- Lighting & Camera: Professional photography settings
- Style Options: Photorealistic, minimalistic, lifestyle, and artistic styles
- Smart Prompt Generation: Structured format for better AI understanding
- Product Integrity Protection: Maintains original product structure and proportions
- Undo/redo functionality with edit history
- Multi-image upload and analysis using GPT-4o Vision
- Intelligent slide content generation
- Editable JSON content for customization
- PowerPoint file generation and download
- Copy-formatted content for API usage
- Multi-Method Analysis: Three complementary scoring approaches for comprehensive evaluation
- Azure AI Vision Integration: Professional image captioning with confidence scores
- GPT-4o Vision Analysis: Advanced AI-powered detailed image descriptions (25-word limit)
- Azure Computer Vision Multimodal Embeddings: Direct image-text comparison in shared vector space
- Multi-Model Embedding Comparison: Side-by-side analysis across Ada-002, 3-Small, and 3-Large models
- Realistic Similarity Guidelines: Updated thresholds - Extremely Similar (85-100%), Very Similar (70-85%), Moderate (50-70%), Somewhat Related (30-50%), Very Different (0-30%)
- Advanced Discrimination: 3072-dimensional embeddings for nuanced semantic understanding
- Token Usage Tracking: Cost monitoring across all AI model interactions
- Professional Accuracy: Production-ready scoring with comprehensive debugging and validation
- Responsive Design: Perfect on desktop and mobile
- Intuitive Navigation: Shared navigation across all features
- Accessibility: WCAG 2.1 AA compliance features
- Real-time Feedback: Loading states and progress indicators
- Error Handling: User-friendly error messages
Before running this application, you need:
- Azure Services:
- Azure OpenAI Resource: For GPT Image, GPT-4o, Sora, and text embeddings
- Azure AI Vision Resource: For enterprise-grade image captioning
- Sora Model Access: For video generation (requires preview access)
- Model Deployments:
- Deploy the gpt-image-1 model for image generation/editing
- Deploy the gpt-4o model for vision and analysis
- Deploy the sora model for video generation
- Deploy the text-embedding-ada-002 model for semantic similarity
- API Access:
- GPT Image API access (limited - apply here)
- GPT-4o Vision API access
- Sora Model API access (requires preview access from Azure OpenAI)
- Azure AI Vision API access for Computer Vision services
- Hugging Face account and token (recommended for 3D generation)
- PowerPoint Generator Service (Required for PowerPoint features):
- The PowerPoint generation functionality requires a separate service
- Download and run the PowerPoint service from: https://github.com/ahemavathy/Powerpoint
- By default, the service should run on http://localhost:5000
- Configure the base URL in your .env.local file if running on a different port
The platform consists of six main sections:
Generate professional logos, images, 3D models, and videos from descriptions or uploaded images
Upload images, analyze with GPT-4o, and create PowerPoint presentations
Upload and edit multiple images simultaneously with AI, featuring comprehensive prompt enhancement system with structured categories for product details, environment, lighting, and style options
Advanced editing workflow with history tracking and mask support
Create precise editing masks with canvas-based tools
Comprehensive image-prompt alignment evaluation using three scoring methods: Azure AI Vision captioning, GPT-4o detailed analysis, and direct multimodal embeddings with comparative analysis across multiple embedding models (Ada-002, 3-Small, 3-Large)
# Clone the repository
git clone <your-repo-url>
cd GPTImage
# Install dependencies
npm install
- Copy the example environment file:
copy .env.example .env.local
- Edit .env.local and add your Azure OpenAI credentials:
# Azure OpenAI GPT Image API (for logo generation and editing)
AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com
AZURE_OPENAI_API_KEY=your_gpt_image_api_key_here
AZURE_OPENAI_DEPLOYMENT_NAME=your_gpt_image_deployment_name
# Azure OpenAI GPT-4o API (for image analysis and presentation generation)
AZURE_OPENAI_GPT4O_ENDPOINT=https://your-gpt4o-resource.openai.azure.com
AZURE_OPENAI_GPT4O_API_KEY=your_gpt4o_api_key_here
AZURE_OPENAI_GPT4O_DEPLOYMENT_NAME=your_gpt4o_deployment_name
# Azure AI Vision (Computer Vision) for image quality scoring
AZURE_AI_VISION_ENDPOINT=https://your-computer-vision-resource.cognitiveservices.azure.com/
AZURE_AI_VISION_API_KEY=your_computer_vision_api_key_here
# Azure OpenAI Embeddings (for comprehensive similarity analysis)
# Note: Uses same AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY as above
# Requires deployments: text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large
# GPT-4o System Prompt (configurable)
AZURE_OPENAI_GPT4O_SYSTEM_PROMPT="Your system prompt for presentation generation..."
# PowerPoint Generator Service (if running on different port)
POWERPOINT_API_BASE_URL=http://localhost:5000
# Azure OpenAI Sora API (for video generation - separate resource)
AZURE_OPENAI_SORA_ENDPOINT=https://your-sora-resource.openai.azure.com
AZURE_OPENAI_SORA_API_KEY=your_sora_api_key_here
AZURE_OPENAI_SORA_DEPLOYMENT_NAME=your_sora_deployment_name
# Hugging Face Token (for 3D model generation)
HUGGINGFACE_HUB_TOKEN=your_hugging_face_token_here
Important: The PowerPoint generation features require a separate service to be running.
- Download the PowerPoint Service:
git clone https://github.com/ahemavathy/Powerpoint.git
cd Powerpoint
- Follow the setup instructions in the PowerPoint repository to install dependencies and run the service
- Start the PowerPoint Service (typically runs on http://localhost:5000)
- Verify the service is running by visiting http://localhost:5000 in your browser
Note: If you run the PowerPoint service on a different port, update the POWERPOINT_API_BASE_URL in your .env.local file accordingly.
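A missing or blank variable is the most common cause of the "configuration is missing" error, so it can help to check the environment before starting the app. The following is a minimal sketch, not the project's actual startup code: the helper names `findMissingEnvVars` and `powerPointBaseUrl` are hypothetical, and the list of required variables is assumed from the template above.

```typescript
// Variable names taken from the .env.local template above.
// Treating exactly these eight as required is an assumption of this sketch.
const REQUIRED_ENV_VARS = [
  "AZURE_OPENAI_ENDPOINT",
  "AZURE_OPENAI_API_KEY",
  "AZURE_OPENAI_DEPLOYMENT_NAME",
  "AZURE_OPENAI_GPT4O_ENDPOINT",
  "AZURE_OPENAI_GPT4O_API_KEY",
  "AZURE_OPENAI_GPT4O_DEPLOYMENT_NAME",
  "AZURE_AI_VISION_ENDPOINT",
  "AZURE_AI_VISION_API_KEY",
] as const;

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the names of required variables that are unset or blank.
function findMissingEnvVars(env: Env): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Hypothetical helper: falls back to the documented default port when the
// variable is unset, and strips trailing slashes so paths can be appended safely.
function powerPointBaseUrl(env: Env): string {
  return (env.POWERPOINT_API_BASE_URL ?? "http://localhost:5000").replace(/\/+$/, "");
}
```

In a Node.js context, `findMissingEnvVars(process.env)` would return an empty array when all eight variables are set.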
# Development mode
npm run dev
# Production build
npm run build
npm start
The application will be available at http://localhost:3000
Note: The image scoring feature requires Python dependencies. If you encounter scoring errors, ensure you've installed the requirements: pip install -r requirements.txt
- Navigate to Home: The main page (/) for content generation
- Choose Content Type: Toggle between "Generate Images", "Generate 3D Model", and "Generate Video (Sora)"
- Enter Business Description: Describe your business with details about:
- Type of business and industry
- Preferred colors and style
- Target audience
- Set Image Count: Choose how many logo variations to generate (1-10)
- Generate: Click "Generate Logo" and wait for AI processing
- Download: Download individual logos or all at once
- Upload Image: Select an image file to convert into a 3D model
- Generate: Click "Generate 3D Model" and wait for processing (1-3 minutes)
- Preview: View the interactive 3D model in your browser
- Download: Download the generated 3D model files (GLB/OBJ formats)
- Enter Video Concept: Describe your video scene in detail:
- Action or scene you want to see
- Setting and environment
- Style and mood
- Set Parameters:
- Duration: Choose video length (3-15 seconds)
- Resolution: Select from 480x480, 1280x720, or 1920x1080
- Generate: Click "Generate Video with Sora" and wait for processing (can take several minutes)
- Preview & Download: Watch the generated video and download the MP4 file
- Navigate to Analyze: Go to the /analyze page
- Upload Images: Drag and drop multiple images (supports various formats)
- Edit Guidelines: Modify or use the pre-filled presentation guidelines
- Generate Content: Click "Generate ppt slide content" for AI analysis
- Edit JSON: Customize the generated slide content in the editable pane
- Create PowerPoint: Click "Generate PowerPoint" to create and download PPT file
- Copy for API: Use the formatted content for external API calls
- Navigate to Edit: Go to the /edit page
- Upload Images: Select one or multiple images to edit
- Add Mask (Optional): Upload a mask image for precise editing areas (collapsible section)
- Describe Changes: Enter detailed editing instructions
- ✨ Enhance Prompt (Optional): Use the comprehensive prompt enhancement system:
- Product Details: Select material (ceramic, glass, metal), color, and finish options
- Background & Surface: Choose surfaces (wooden table, marble countertop) and backgrounds
- Lighting: Pick from soft ambient, studio lighting, natural sunlight options
- Camera Angle: Select viewpoint (top-down, front view, 45° angle, macro shot)
- Style: Choose photorealistic, minimalistic, lifestyle, or artistic styles
- Preview & Apply: See enhanced prompt preview before applying
- Set Output Count: Choose how many edited versions to generate
- Process: Click "Edit Image" and wait for AI processing
- Download: Save individual or all edited images
- Navigate to Iterative: Go to the /iterative page
- Upload Base Image: Start with your initial image
- Sequential Edits: Make progressive changes with each edit building on the previous
- Use History: View edit history and undo/redo changes
- Apply Masks: Upload and apply masks for precise control
- Track Progress: See your editing journey on the canvas
- Navigate to Mask Editor: Go to the /mask page
- Upload Image: Start with your base image
- Select Tools: Use brush (paint) or eraser tools
- Adjust Size: Modify brush size for precision
- Create Mask: Paint areas to be edited (black) or kept (white)
- Download Mask: Save your custom mask for use in editing
- Navigate to Scoring: Go to the /scoring page
- Upload Image: Select an AI-generated image to evaluate
- Enter Prompt: Input the original text prompt used to generate the image
- Score Image: Click "Score Image Quality" to analyze alignment across three methods
- View Comprehensive Results:
- Azure Vision Caption Score: Uses 3-Large model (3072D) for improved discrimination
- GPT-4o Similarity Score: Advanced AI analysis with detailed image descriptions
- MM Embedding Score: Direct multimodal comparison in shared vector space
- Embedding Model Comparison: Side-by-side analysis of Ada-002, 3-Small, and 3-Large performance
- Advanced Analytics:
- Token Usage Tracking: Monitor costs across all AI models
- Processing Time: Performance metrics for each scoring method
- Model Insights: Understand which embedding models provide better discrimination
- Realistic Interpretation: Updated similarity guidelines based on actual cosine similarity performance
- Multiple Tests: Clear and test different image-prompt combinations with comprehensive feedback
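The embedding-based scores above come down to a cosine similarity between two vectors, read against the similarity guidelines listed in the features section. The following is a minimal sketch of that calculation, not the code in the scoring route. Two points are assumptions because the published ranges share their endpoints: a value exactly on a boundary (for example 85%) goes to the higher band, and the percentage is the cosine value multiplied by 100. The function names are hypothetical.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; report no similarity instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Maps a similarity percentage to the guideline labels.
// Assumption: boundary values fall into the higher band.
function interpretSimilarity(percent: number): string {
  if (percent >= 85) return "Extremely Similar";
  if (percent >= 70) return "Very Similar";
  if (percent >= 50) return "Moderate";
  if (percent >= 30) return "Somewhat Related";
  return "Very Different";
}
```

For two embeddings `promptVec` and `captionVec`, `interpretSimilarity(cosineSimilarity(promptVec, captionVec) * 100)` would yield the label to display.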
- "Modern tech startup focused on sustainable energy solutions, minimalist design, green and blue colors"
- "Luxury restaurant specializing in Italian cuisine, elegant and sophisticated style, gold and black colors"
- "Fitness gym targeting young professionals, bold and energetic design, red and black colors"
- "Eco-friendly cleaning products company, nature-inspired, green and white palette"
- "Create a compelling presentation that highlights our new air fryer's sophistication and strengthens our brand's positioning as a premium kitchen appliance"
- "Develop a professional pitch deck for our sustainable fashion startup targeting eco-conscious millennials"
- "Generate slides for a quarterly business review focusing on growth metrics and future opportunities"
- "A cat playing piano in a jazz bar with warm, moody lighting"
- "A drone flyover of a modern city skyline at sunset"
- "Close-up of coffee being poured into a cup in slow motion"
- "A person walking through a field of sunflowers on a sunny day"
- "Ocean waves crashing against rocks on a cliff during golden hour"
- "A chef preparing a gourmet dish in a professional kitchen"
- Product photos: sneakers, furniture, electronics, toys
- Character images: people, animals, cartoon characters
- Objects: vehicles, buildings, sculptures, logos
- Artwork: paintings, sketches, digital art
- Natural objects: fruits, plants, rocks, shells
Basic Prompts:
- "Change the background to a sunset with mountains"
- "Add sunglasses to the person in the photo"
- "Make the logo more colorful and vibrant"
- "Remove the background and make it transparent"
- "Add a professional business suit to the person"
- "Replace the sky with a starry night" (with mask covering sky area)
- "Change only the shirt color to red" (with mask covering shirt area)
- "Add modern architectural elements to the building facade"
Enhanced Prompts (using prompt enhancement feature):
- "glossy white ceramic blender on a marble countertop, soft ambient light, front view, minimalistic style"
- "matte black stainless steel coffee maker with plain white background, studio lighting, 45° angle, product photography style"
- "textured wood cutting board on a wooden table, natural sunlight, top-down view, lifestyle shot style"
- "brushed metal kitchen appliance with blurred kitchen background, golden hour light, three-quarter view, photorealistic style"
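The enhanced prompts above share a structure: product details first, then surface or background, lighting, camera angle, and style. The following sketch shows one way such a prompt could be assembled from the selected options. It is an illustration only; the `EnhancementOptions` shape and the `buildEnhancedPrompt` function are hypothetical and may not match how the app builds its prompts.

```typescript
// Hypothetical option shape mirroring the enhancement categories described in this README.
interface EnhancementOptions {
  finish?: string;
  color?: string;
  material?: string;
  surface?: string;
  background?: string;
  lighting?: string;
  cameraAngle?: string;
  style?: string;
}

// Assembles a comma-separated prompt; any option left unset is skipped.
function buildEnhancedPrompt(product: string, options: EnhancementOptions): string {
  // Product details come first: finish, color, material, then the product itself.
  const subject = [options.finish, options.color, options.material, product]
    .filter((part): part is string => Boolean(part))
    .join(" ");

  let scene = subject;
  if (options.surface) scene += ` on a ${options.surface}`;
  if (options.background) scene += ` with ${options.background}`;

  const parts = [
    scene,
    options.lighting,
    options.cameraAngle,
    options.style ? `${options.style} style` : undefined,
  ];
  return parts.filter((part): part is string => Boolean(part)).join(", ");
}
```

Calling it with the product "blender" and the options glossy, white, ceramic, marble countertop, soft ambient light, front view, and minimalistic reproduces the first enhanced prompt in the list above.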
Testing Prompts with Generated Images:
- "professional business logo" → Test with corporate logo images
- "modern tech startup logo" → Evaluate with minimalist tech designs
- "luxury restaurant branding" → Score against elegant dining imagery
- "eco-friendly product design" → Assess green/sustainable themed images
- "fitness gym promotional image" → Rate energetic workout visuals
- "coffee shop atmosphere" → Score cozy cafe environment photos
- Create an Azure OpenAI resource in the Azure portal
- Deploy the gpt-image-1 model:
- Go to Azure OpenAI Studio
- Navigate to Deployments
- Create a new deployment with the gpt-image-1 model
- Get your endpoint and API key from the Azure portal
The application uses the Azure OpenAI Image Generation API with these parameters:
- Model: gpt-image-1
- Size: 1024x1024
- Quality: high
- Output Format: PNG
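To make these parameters concrete, the sketch below builds the HTTP request for an image generation call without sending it. It is not the project's `generate-logo` route. The function name is hypothetical, and the `api-version` default of `2025-04-01-preview` is an assumption that should be checked against the version enabled for your resource. The URL shape and the `api-key` header follow the Azure OpenAI REST conventions.

```typescript
interface ImageGenerationRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds (but does not send) an image generation request.
function buildImageGenerationRequest(
  endpoint: string,
  apiKey: string,
  deployment: string,
  prompt: string,
  count: number,
  apiVersion = "2025-04-01-preview", // assumption: verify for your resource
): ImageGenerationRequest {
  // The README documents a configurable range of 1-10 images per request.
  if (!Number.isInteger(count) || count < 1 || count > 10) {
    throw new RangeError("count must be an integer between 1 and 10");
  }
  const base = endpoint.replace(/\/+$/, "");
  const url =
    `${base}/openai/deployments/${encodeURIComponent(deployment)}` +
    `/images/generations?api-version=${apiVersion}`;
  return {
    url,
    headers: { "Content-Type": "application/json", "api-key": apiKey },
    // Parameters listed above: size 1024x1024, quality high, PNG output.
    body: JSON.stringify({
      prompt,
      n: count,
      size: "1024x1024",
      quality: "high",
      output_format: "png",
    }),
  };
}
```

The result can be passed to `fetch(request.url, { method: "POST", headers: request.headers, body: request.body })` from a server-side route, which keeps the API key out of the browser.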
- Framework: Next.js 15 with App Router
- Language: TypeScript
- Styling: Tailwind CSS
- Icons: Lucide React
- APIs:
- Azure OpenAI GPT Image API (gpt-image-1)
- Azure OpenAI GPT-4o Vision API
- Azure OpenAI Sora API (video generation)
- Azure OpenAI Embeddings (Ada-002, 3-Small, 3-Large)
- Azure AI Vision (Computer Vision)
- Azure Computer Vision Multimodal Embeddings
- Hugging Face Spaces API (frogleo/Image-to-3D)
- External PowerPoint Generator API
- State Management: React Hooks
- File Handling: FormData, Buffer processing
- Accessibility: ARIA attributes, semantic HTML
GPTImage/
├── .env.example                    # Environment template
├── ARCHITECTURE.md                 # Detailed architecture documentation
├── README.md                       # This file
├── requirements.txt                # Python dependencies for image scoring
├── package.json                    # Dependencies and scripts
├── next.config.js                  # Next.js configuration
├── tailwind.config.js              # Tailwind CSS configuration
├── tsconfig.json                   # TypeScript configuration
├── scripts/
│   ├── working_3d_gen.py           # 3D model generation script
│   └── test_api_route.py           # API testing utilities
├── .github/
│   └── copilot-instructions.md     # AI assistant guidelines
├── public/
│   ├── generated-images/           # Generated logo storage
│   ├── generated-videos/           # Generated video storage
│   ├── generated-3d-models/        # Generated 3D model storage
│   ├── edited-images/              # Edited image storage
│   └── .gitkeep                    # Preserve directory structure
└── src/
    ├── components/
    │   ├── Navigation.tsx          # Shared navigation component
    │   └── ImageScorer.tsx         # Advanced scoring component with multi-model analysis
    └── app/
        ├── globals.css             # Global styles
        ├── layout.tsx              # Root layout
        ├── page.tsx                # Home - Logo generation
        ├── analyze/
        │   └── page.tsx            # PPT generation & analysis
        ├── edit/
        │   └── page.tsx            # Batch image editing
        ├── iterative/
        │   └── page.tsx            # Sequential editing workflow
        ├── mask/
        │   └── page.tsx            # Canvas-based mask editor
        ├── scoring/
        │   └── page.tsx            # Image quality scoring interface
        └── api/
            ├── generate-logo/
            │   └── route.ts        # Logo generation API
            ├── generate-video/
            │   └── route.ts        # Video generation API
            ├── generate-3d/
            │   └── route.ts        # 3D model generation API
            ├── analyze-images/
            │   └── route.ts        # Image analysis API
            ├── edit-image/
            │   └── route.ts        # Image editing API
            └── score-image/
                └── route.ts        # Multi-method scoring API: Azure Vision, GPT-4o, Multimodal Embeddings
- "Azure OpenAI configuration is missing"
  - Ensure all environment variables are set in .env.local
  - Check that both GPT Image and GPT-4o resources are active
  - Verify deployment names match your Azure deployments
- "Failed to generate/edit images"
  - Verify your API keys and deployment names
  - Check if you have access to both APIs
  - Ensure your Azure subscription has sufficient credits
- "PowerPoint generation failed"
  - Service Not Running: Ensure the PowerPoint service is running on the configured port
    - Check if http://localhost:5000 (or your configured URL) is accessible
    - Download and set up the service from: https://github.com/ahemavathy/Powerpoint
  - Port Configuration: Verify POWERPOINT_API_BASE_URL in your .env.local matches the running service
  - Image Upload Issues: Check if image upload to PowerPoint API succeeded
  - JSON Format: Ensure slide content JSON is properly formatted
  - Service Dependencies: Make sure the PowerPoint service has all required dependencies installed
- "The default export is not a React Component"
  - Clear Next.js cache: rm -rf .next
  - Restart the development server
  - Check for syntax errors in page components
- Content Filter Errors
  - Ensure descriptions don't contain inappropriate content
  - Try rephrasing prompts with more professional language
  - Use the prompt enhancement feature for structured, professional prompts
  - Check Azure OpenAI content policy guidelines
- Prompt Enhancement Issues
  - Dropdown Visibility: All dropdown options now have improved text contrast for better readability
  - Option Selection: Choose complementary options (e.g., ceramic + glossy finish + studio lighting)
  - Preview Not Updating: Check that enhancement options are selected before clicking "Apply Enhanced Prompt"
  - Reset Feature: Use "Reset All" to clear all enhancement selections and start over
- Slow image generation: Azure OpenAI processing time varies
- Large file uploads: Check file size limits and network speed
- Memory issues: Reduce number of images processed simultaneously
Azure OpenAI services have rate limits:
- Wait between requests if hitting rate limits
- Consider upgrading your Azure OpenAI pricing tier
- Monitor usage in Azure portal
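Waiting between requests can be automated with exponential backoff. The following is a generic sketch rather than code from this project: it assumes the failing call throws an error object carrying a numeric `status` property, with 429 meaning "rate limited". All function names and the default delays are hypothetical.

```typescript
// Assumption: rate-limit failures surface as error objects with status === 429.
function isRateLimitError(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { status?: number }).status === 429
  );
}

// Delay doubles with each attempt (1s, 2s, 4s, ...) and is capped at maxDelayMs.
function backoffDelayMs(attempt: number, baseDelayMs = 1000, maxDelayMs = 30000): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

// Retries the operation only on rate-limit errors, up to maxAttempts tries in total.
async function withRetry<T>(operation: () => Promise<T>, maxAttempts = 4): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const isLastAttempt = attempt >= maxAttempts - 1;
      // Other errors, or exhausting all attempts, propagate to the caller.
      if (!isRateLimitError(error) || isLastAttempt) throw error;
      await new Promise<void>((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

If the service returns a `Retry-After` header, honoring that value is usually preferable to a fixed schedule; the doubling delay here is only a fallback.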
- Clone & Install: git clone <repo> && cd GPTImage && npm install
- Configure: Copy .env.example to .env.local and add your Azure credentials
- PowerPoint Service: Clone and run https://github.com/ahemavathy/Powerpoint (required for presentation features)
- Run: npm run dev and open http://localhost:3000
- Explore:
  - Generate logos on the home page
  - Analyze images and create presentations on /analyze
  - Edit images on /edit or /iterative
  - Create masks on /mask
  - Score image quality on /scoring
| Feature | Description | Page |
|---|---|---|
| Logo Generation | AI-powered logo creation from business descriptions | / |
| Video Generation | AI-powered video creation using Sora model | / |
| 3D Model Generation | Generate 3D models (GLB/OBJ) from images | / |
| Multi-Image Analysis | GPT-4o vision analysis for presentation content | /analyze |
| Batch Editing | Edit multiple images simultaneously | /edit |
| Iterative Editing | Sequential editing with history tracking | /iterative |
| Mask Editor | Create precise editing masks | /mask |
| Image Quality Scoring | Azure AI Vision + Azure OpenAI embeddings scoring | /scoring |
| PowerPoint Generation | Convert analysis to downloadable presentations | /analyze |
- ARCHITECTURE.md: Detailed technical architecture
- Azure OpenAI Documentation
- GPT Image API Access
- Next.js Documentation
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes and test thoroughly
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
For issues related to:
- Application bugs: Create an issue in this repository
- Azure OpenAI: Check the Azure OpenAI documentation
- GPT Image API access: Apply here
- GPT-4o Vision API: Check Azure OpenAI service availability
If you find this project useful, please consider giving it a star! It helps others discover the project and motivates continued development.
Built with ❤️ using Azure OpenAI, Next.js, and TypeScript