A full-stack web application that leverages cutting-edge AI models for image analysis and generation. Built with FastAPI (backend) and Next.js (frontend).
- Caption Generation - Automatically generates descriptive captions for images
- Visual Question Answering (VQA) - Ask questions about images and get AI-powered answers
- Object Detection - Identifies and lists objects present in images
- Text-to-Image - Generate images from text descriptions
- Image Variation - Create variations of existing images with custom modifications
Backend:
- FastAPI (Python web framework)
- SQLAlchemy (Database ORM)
- Google Gemini 2.5 Flash (Vision analysis)
- HuggingFace Stable Diffusion XL (Image generation)
- AWS S3 (Image storage)
- SQLite/PostgreSQL (Database)
Frontend:
- Next.js 16 (React framework)
- TypeScript
- Tailwind CSS
- App Router
- Python 3.11+
- Node.js 18+
- AWS Account (S3 bucket)
- Google AI Studio API Key
- HuggingFace API Token
- Navigate to the backend directory:

  ```bash
  cd backend
  ```

- Install dependencies:

  ```bash
  uv sync
  ```

- Create a `.env` file:

  ```env
  # AI API Keys
  GOOGLE_API_KEY=your_google_api_key
  HUGGINGFACE_TOKEN=your_huggingface_token

  # AWS S3
  AWS_ACCESS_KEY_ID=your_aws_access_key
  AWS_SECRET_ACCESS_KEY=your_aws_secret_key
  AWS_REGION=ap-southeast-1
  AWS_BUCKET_NAME=your-bucket-name

  # Database (optional - defaults to SQLite)
  DATABASE_URL=sqlite:///./app.db

  # Generation Provider (optional)
  GENERATION_PROVIDER=huggingface
  ```

- Start the backend server:

  ```bash
  uv run uvicorn app.main:app --reload --port 8000
  ```

  The backend will be available at http://127.0.0.1:8000
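The optional variables above presumably receive defaults in the backend's configuration code; a minimal sketch of that fallback logic (the helper names are hypothetical, not necessarily how the project implements it):

```python
import os

# Hypothetical helpers illustrating how the optional .env settings
# above might fall back to the documented defaults when unset.

def get_database_url() -> str:
    # DATABASE_URL is optional; the README says it defaults to SQLite.
    return os.getenv("DATABASE_URL", "sqlite:///./app.db")

def get_generation_provider() -> str:
    # GENERATION_PROVIDER is optional; huggingface is assumed to be the default.
    return os.getenv("GENERATION_PROVIDER", "huggingface")
```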
- Navigate to the frontend directory:

  ```bash
  cd frontend
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Create a `.env.local` file:

  ```env
  NEXT_PUBLIC_API_URL=http://127.0.0.1:8000
  ```

- Start the development server:

  ```bash
  npm run dev
  ```

  The frontend will be available at http://localhost:3000
Easy way to start both servers at once:

Windows (PowerShell):

```powershell
.\scripts\start-app-split.ps1
```

macOS/Linux:

```bash
chmod +x scripts/*.sh  # Make scripts executable (first time only)
./scripts/start-app-split.sh
```

This opens two terminal windows:
- Backend server on http://127.0.0.1:8000
- Frontend server on http://localhost:3000
See scripts/README.md for more options (single terminal, individual servers, etc.)
- Start both backend and frontend servers (see Quick Start above)
- Open http://localhost:3000 in your browser
- Select a feature from the home page
- Upload images or enter prompts
- View AI-generated results
```
┌─────────────┐      HTTP/REST      ┌──────────────┐
│  Next.js    │◄───────────────────►│   FastAPI    │
│  Frontend   │   (CORS enabled)    │   Backend    │
│ (Port 3000) │                     │ (Port 8000)  │
└─────────────┘                     └──────┬───────┘
                                           │
                   ┌───────────────────────┼───────────────────────┐
                   │                       │                       │
             ┌─────▼──────┐         ┌─────▼──────┐         ┌─────▼──────┐
             │   Google   │         │ HuggingFace│         │   AWS S3   │
             │   Gemini   │         │   Stable   │         │   Storage  │
             │  2.5 Flash │         │  Diffusion │         │            │
             │  (Vision)  │         │   (Gen)    │         │ (Presigned │
             └────────────┘         └────────────┘         │    URLs)   │
                                                           └─────┬──────┘
                                                                 │
                                                           ┌─────▼──────┐
                                                           │  SQLite/   │
                                                           │ PostgreSQL │
                                                           │  Database  │
                                                           └────────────┘
```
Frontend Layer (Next.js)
- UI Components: Reusable React components (ImageUpload, LoadingSpinner, etc.)
- Pages: Route-specific pages for each feature (/caption, /vqa, etc.)
- API Service: TypeScript client for backend communication with type safety
- State Management: React hooks for local state and async operations
Backend Layer (FastAPI)
- API Endpoints: RESTful endpoints for analysis and generation
- AI Service: Integration layer for AI model APIs (Gemini, HuggingFace)
- Database Models: SQLAlchemy models for data persistence
- S3 Service: Image upload/storage with presigned URL generation
- Background Tasks: Async job processing for image generation
Data Flow
- User uploads image → Frontend sends it to the `/upload` endpoint
- Backend uploads to S3 → Returns presigned URL
- User triggers analysis/generation → API calls AI services
- Results stored in database → Returned to frontend
- Frontend displays results with images from S3
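Seen from a client's side, the flow above can be sketched with Python's standard library (the helper names and the condensed `caption_image` wrapper are illustrative, not code from this repository):

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:8000"  # assumed local dev server address

def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request against the backend API."""
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def caption_image(image_url: str) -> str:
    """Steps 3-5 of the flow above, condensed: request a caption for an uploaded image."""
    req = build_request("/api/analyze/caption", {"image_url": image_url})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["data"]["caption"]
```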
Caption Generation

Request:

```bash
curl -X POST "http://127.0.0.1:8000/api/analyze/caption" \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://your-bucket.s3.amazonaws.com/uploads/image.png"
  }'
```

Response:

```json
{
  "success": true,
  "data": {
    "id": 1,
    "caption": "A serene mountain landscape with clouds at sunset",
    "created_at": "2025-12-20T10:30:00"
  }
}
```

Visual Question Answering (VQA)

Request:
```bash
curl -X POST "http://127.0.0.1:8000/api/analyze/vqa" \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://your-bucket.s3.amazonaws.com/uploads/image.png",
    "question": "What color is the car?"
  }'
```

Response:

```json
{
  "success": true,
  "data": {
    "id": 2,
    "question": "What color is the car?",
    "answer": "The car is red",
    "created_at": "2025-12-20T10:31:00"
  }
}
```

Text-to-Image

Step 1 - Start Generation:
```bash
curl -X POST "http://127.0.0.1:8000/api/generate/text-to-image" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A futuristic city with flying cars at night"
  }'
```

Response:

```json
{
  "success": true,
  "message": "Image generation job started",
  "data": {
    "job_id": "abc123",
    "status": "pending",
    "check_status_url": "/api/jobs/abc123"
  }
}
```

Step 2 - Check Status (Poll every 3-5 seconds):

```bash
curl -X GET "http://127.0.0.1:8000/api/jobs/abc123"
```

Response (completed):
```json
{
  "success": true,
  "data": {
    "job_id": "abc123",
    "task_type": "text_to_image",
    "status": "completed",
    "result_image_url": "https://your-bucket.s3.amazonaws.com/generated/result.png",
    "created_at": "2025-12-20T10:32:00",
    "completed_at": "2025-12-20T10:32:25"
  }
}
```

Google AI Studio API Key

- Visit Google AI Studio
- Sign in with your Google account
- Click "Get API Key" → "Create API key"
- Copy the key and add it to the backend `.env`:

```env
GOOGLE_API_KEY=AIzaSyXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
```
HuggingFace API Token

- Visit HuggingFace Tokens
- Sign in or create an account
- Click "New token" → Select "Read" access
- Copy the token and add it to the backend `.env`:

```env
HUGGINGFACE_TOKEN=hf_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
```
AWS S3 Setup

- Log into the AWS Console
- Navigate to IAM → Users → Create User
- Attach policy: `AmazonS3FullAccess` (or a custom policy)
- Create Access Key → Download credentials
- Create an S3 bucket in your preferred region
- Configure CORS (see S3_CORS_SETUP.md for details)
- Add credentials to the backend `.env`:

```env
AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
AWS_REGION=ap-southeast-1
AWS_BUCKET_NAME=your-bucket-name
```
Important:
- For image downloads to work properly, configure S3 CORS (see S3_CORS_SETUP.md)
- For production, use IAM roles and restrict S3 bucket policies appropriately
- Download feature works without CORS using fallback methods, but CORS improves UX
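The CORS setup that S3_CORS_SETUP.md covers presumably resembles the following sketch. The origin, methods, and rule values here are placeholder assumptions, not the repo's actual configuration:

```python
# Illustrative CORS rules allowing the local frontend to fetch bucket objects.
# The authoritative rules live in S3_CORS_SETUP.md; treat these values as a sketch.
CORS_RULES = {
    "CORSRules": [
        {
            "AllowedOrigins": ["http://localhost:3000"],  # assumed dev origin
            "AllowedMethods": ["GET", "HEAD"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }
    ]
}

def apply_cors(bucket_name: str) -> None:
    """Apply the rules via boto3 (imported lazily so this sketch has no hard dependency)."""
    import boto3  # assumed available in the backend environment
    boto3.client("s3").put_bucket_cors(
        Bucket=bucket_name, CORSConfiguration=CORS_RULES
    )
```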
```
Multi-Modal-Image-Analysis-and-Generation-Platform/
├── backend/
│   ├── app/
│   │   ├── main.py               # FastAPI app & endpoints
│   │   ├── ai_service.py         # AI model integration
│   │   ├── models.py             # Database models
│   │   ├── database.py           # Database configuration
│   │   └── s3_service.py         # S3 upload handling
│   ├── requirements.txt
│   └── pyproject.toml
├── frontend/
│   ├── app/
│   │   ├── page.tsx              # Home page
│   │   ├── caption/              # Caption generation page
│   │   ├── vqa/                  # Visual Q&A page
│   │   ├── object-detection/     # Object detection page
│   │   ├── text-to-image/        # Text-to-image page
│   │   └── variation/            # Image variation page
│   ├── components/               # Shared React components
│   ├── lib/
│   │   └── api.ts                # API service layer
│   └── package.json
└── EVALUATION.md                 # Testing documentation
```
- `POST /upload` - Upload image to S3
- `POST /api/analyze/caption` - Generate image caption
- `POST /api/analyze/vqa` - Visual question answering
- `POST /api/analyze/object-detection` - Detect objects
- `POST /api/generate/text-to-image` - Generate image from text
- `POST /api/generate/variation` - Create image variation
- `GET /api/jobs/{job_id}` - Check generation job status
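Clients of `GET /api/jobs/{job_id}` typically poll in a loop until the job leaves the pending state. A sketch of such a loop, where the `fetch_status` callable stands in for an actual HTTP GET and all names are illustrative:

```python
import time

def wait_for_job(job_id: str, fetch_status,
                 interval: float = 3.0, timeout: float = 120.0) -> dict:
    """Poll `fetch_status(job_id)` until the job completes or fails."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status(job_id)["data"]
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval)  # the examples above suggest polling every 3-5 seconds
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```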
All features have been tested and verified working. See EVALUATION.md for detailed test results and examples.
Test Results: 6/6 features passing (100%)
Demo Video: Link
To create a demo video:
- Record a 5-minute walkthrough showing all features
- Upload to YouTube, Loom, or similar platform
- Add the link above
- S3 bucket uses presigned URLs (7-day expiry)
- CORS enabled for frontend access
- API keys managed via environment variables
- Database credentials supplied via environment variables, not hardcoded
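For context on the 7-day expiry above: seven days is also the maximum lifetime AWS allows for SigV4 presigned URLs. Expressed in the seconds unit that boto3's `generate_presigned_url` takes via its `ExpiresIn` parameter:

```python
# 7 days is the maximum AWS permits for SigV4 presigned URLs
# (X-Amz-Expires caps at 604800 seconds).
PRESIGNED_URL_EXPIRY_SECONDS = 7 * 24 * 60 * 60
```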
This project is for educational purposes.
Akhilesh Malthi
Note: Make sure both backend and frontend servers are running for the application to work properly.