
🎨 Immersive Craft AI Platform

Python FastAPI Pydantic License

Bridge the gap between handmade quality and professional online presentation.

The Immersive Craft AI Platform empowers household artists to showcase their work professionally. This repository houses the backend microservices ecosystem, currently featuring the Text Service (Live) and the Image Service (In Development).


🧩 Microservices Overview

The platform is built on a scalable microservices architecture, separating lightweight CPU tasks from heavy GPU processing.

1. ✍️ Immersive Text Service (Live)

  • Status: ✅ Production Ready
  • Role: The "AI Storyteller"
  • Tech: FastAPI, Async I/O, Google Gemini (Multimodal)
  • Function: Transforms raw images and simple seller notes into evocative, SEO-ready product descriptions in milliseconds.

2. 📸 Immersive Image Service (In Development)

  • Status: 🏗️ Construction in Progress
  • Role: The "AI Photographer"
  • Tech: Python, Celery, Redis, Kubernetes (GKE), KEDA
  • Function: Asynchronous GPU pipeline for professional image correction:
    • Perspective Correction
    • Shadow Removal & Lighting Enhancement
    • AI Super-Resolution (Upscaling)
    • Background Manipulation
    • Intelligent Cropping
    • Interactive 3D Model

πŸ—οΈ Architecture

The platform uses a Strategy Pattern for flexible AI providers and an asynchronous event-driven architecture for handling heavy workloads.

Text Service Workflow (Strategy Pattern)

A central Factory determines which AI Provider to instantiate based on runtime configuration, ensuring a unified interface (AIModelProvider) regardless of the underlying model.

```mermaid
graph LR
    Client[Client App] -->|POST Request| API[FastAPI Endpoint]
    API -->|Get Provider| Factory[Provider Factory]
    Factory -->|Read Config| Settings[Settings / .env]

    Factory -->|Instantiate| Strategy{Strategy Selection}

    Strategy -->|Default| Gemini[Gemini Provider]
    Strategy -->|Debug| Mock[Mock Provider]

    Gemini -->|Async Call| Google[Google Gemini API]
    Mock -->|Simulate| Local[Local Response]
```
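
The strategy selection above can be sketched in Python. The names `AIModelProvider`, `generate_content`, and `_providers` come from the Developer Guide in this README; the class bodies here are illustrative stand-ins, not the repo's actual implementation:

```python
# Minimal sketch of the Strategy + Factory pattern used by the Text Service.
# AIModelProvider / generate_content / _providers follow the names in the
# Developer Guide; the bodies are simplified stand-ins.
from abc import ABC, abstractmethod


class AIModelProvider(ABC):
    """Unified interface every provider strategy must implement."""

    @abstractmethod
    async def generate_content(self, image_url: str, seller_inputs: dict) -> dict:
        ...


class MockProvider(AIModelProvider):
    """Zero-cost provider, used when MOCK_MODE is enabled."""

    async def generate_content(self, image_url: str, seller_inputs: dict) -> dict:
        return {"title": f"Mock title for {seller_inputs.get('item_name', 'item')}"}


class GeminiProvider(AIModelProvider):
    """Placeholder for the real Gemini-backed implementation."""

    async def generate_content(self, image_url: str, seller_inputs: dict) -> dict:
        raise NotImplementedError("The real provider calls the Gemini API here")


_providers = {"mock": MockProvider, "gemini": GeminiProvider}


def get_provider(name: str) -> AIModelProvider:
    """Factory: resolve a provider name (from settings) to an instance."""
    try:
        return _providers[name]()
    except KeyError:
        raise ValueError(f"Unknown provider: {name}") from None
```

Because every strategy satisfies the same interface, the endpoint code never needs to know which backend is active.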

Image Service Workflow (Async GPU Pipeline)

Asynchronous GPU pipeline for professional image correction

Work in Progress (stay tuned)


✨ Key Features

The Text Service is a high-performance microservice that uses Multimodal AI to transform raw images and simple seller notes into evocative, SEO-ready product descriptions.

  • 🧠 Multimodal AI: Analyzes visual cues (images) combined with text inputs using Google Gemini.
  • 🔌 Provider Factory Pattern: Flexible architecture allowing hot-swapping of AI backends (Gemini, OpenAI, etc.) via configuration.
  • ⚡ High Performance: Fully asynchronous (non-blocking) I/O using httpx and FastAPI.
  • 🛡️ Robust Configuration: Type-safe settings management using Pydantic Settings with nested environment variable support.
  • 🧪 Developer Friendly: Built-in Mock Mode for zero-cost testing and rapid UI development.

The Image Service, an asynchronous GPU pipeline for professional image correction, is a work in progress (stay tuned).


🚀 Quick Start

Prerequisites

  • Python 3.10 or higher (check with python3 --version)

  • uv package manager (recommended) or pip

  • A Google API key for Gemini (get one from Google AI Studio)

Installation

  1. Clone the repository (if not already done)

    git clone https://github.com/SuvroBaner/immersive
    cd immersive/services/text-service
  2. Install packages & dependencies (this will automatically create and manage the virtual environment)

    # Using uv (recommended)
    uv sync
    source .venv/bin/activate  # On macOS/Linux
    # .venv\Scripts\Activate.ps1  # On Windows PowerShell

    Or using standard Python:

    python -m venv .venv
    source .venv/bin/activate
    pip install -e .  # install dependencies from pyproject.toml
  3. Configure environment variables

    Create a .env file in services/text-service/:

    # Create .env from the provided template
    cp example.env .env

    Add your configuration:

    # Default provider (gemini, mock)
    DEFAULT_PROVIDER=gemini
    
    # Enable mock mode for testing (set to true to bypass API calls)
    MOCK_MODE=false
    
    # Provider-specific settings (Gemini)
    # Note: Use double underscores (__) to nest settings
    PROVIDER_SETTINGS__GEMINI__API_KEY=your_google_api_key_here
    PROVIDER_SETTINGS__GEMINI__MODEL_NAME=gemini-2.5-flash
    
    # OPENAI (for future use/testing)
    # PROVIDER_SETTINGS__OPENAI__API_KEY=sk-yyy
    # PROVIDER_SETTINGS__OPENAI__MODEL_NAME=gpt-4o-mini

    Important: The nested configuration format (PROVIDER_SETTINGS__GEMINI__API_KEY) is required. This allows Pydantic to properly map environment variables to the nested ProviderConfig structure.
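
    To make the convention concrete, here is a stdlib-only sketch of the mapping that pydantic-settings performs for you (via its `env_nested_delimiter="__"` option); the helper function is illustrative, not part of the repo:

    ```python
    # How double-underscore env vars map onto a nested settings structure.
    # pydantic-settings does this itself; this sketch only illustrates it.
    def nest_env(env: dict) -> dict:
        nested = {}
        for key, value in env.items():
            parts = key.lower().split("__")
            node = nested
            for part in parts[:-1]:
                node = node.setdefault(part, {})
            node[parts[-1]] = value
        return nested


    env = {
        "PROVIDER_SETTINGS__GEMINI__API_KEY": "your_google_api_key_here",
        "PROVIDER_SETTINGS__GEMINI__MODEL_NAME": "gemini-2.5-flash",
    }
    # nest_env(env) -> {"provider_settings": {"gemini": {"api_key": ...,
    #                                                    "model_name": ...}}}
    ```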

  4. Run the Server

    # From services/text-service/
    uvicorn app.main:app --reload --port 8000

    The API will be available at http://127.0.0.1:8000


📚 API Documentation (Text Service)

Endpoints

GET /

Health check endpoint.

Response:

{
  "status": "ok",
  "service": "text-service"
}

GET /v1/providers

List available AI providers and their configuration status. Useful for checking whether your .env keys are loaded correctly (without leaking the actual keys).

Response:

{
  "default_provider": "gemini",
  "mock_mode": false,
  "available_providers": ["gemini", "mock"],
  "configured_providers": {
    "gemini": {
      "model_name": "gemini-2.5-flash",
      "api_key_configured": true
    }
  }
}

POST /v1/content/generate

Generate product descriptions, titles, and structured marketing content from an image and seller inputs.

Query Parameters:

  • provider (optional): Override the default provider (e.g., ?provider=gemini)

Request Body:

{
  "image_url": "https://example.com/product-image.jpg",
  "seller_inputs": {
    "item_name": "Handcrafted Clay Pot",
    "materials": "Natural terracotta clay, white paint",
    "inspiration": "Made during the rainy season, inspired by my garden",
    "category": "Pottery"
  },
  "config": {
    "tone": "evocative",
    "language": "en-IN",
    "target_platform": "web"
  }
}

Response:

{
  "generated_content": {
    "title": "Handcrafted Clay Pot - Pottery Collection",
    "description": "This exquisite clay pot showcases the artistry...",
    "product_facts": [
      "Handcrafted pottery made with premium natural terracotta clay",
      "Unique design inspired by the rainy season garden",
      "One-of-a-kind piece, no two items are exactly alike"
    ],
    "blog_snippet_idea": "Discover the story behind this stunning clay pot..."
  },
  "ai_model_used": "gemini-2.5-flash",
  "latency_ms": 1250.5,
  "metadata": {
    "provider": "gemini"
  }
}

Example cURL:

curl -X POST "http://127.0.0.1:8000/v1/content/generate?provider=gemini" \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://images.unsplash.com/photo-1604264726154-26480e76f4e1",
    "seller_inputs": {
      "item_name": "Clay Pot",
      "materials": "Natural terracotta clay, white paint",
      "inspiration": "Made this during the rainy season, inspired by my garden",
      "category": "Pottery"
    },
    "config": {
      "tone": "evocative",
      "language": "en-IN",
      "target_platform": "web"
    }
  }'
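
Example Python client (a hypothetical equivalent of the cURL call above; it assumes the service is running locally on port 8000 as in the Quick Start):

```python
# Mirror of the cURL example using httpx, the HTTP client the service
# itself depends on. The endpoint and payload match this README; the
# script is illustrative and requires a running service to execute.
import json

payload = {
    "image_url": "https://images.unsplash.com/photo-1604264726154-26480e76f4e1",
    "seller_inputs": {
        "item_name": "Clay Pot",
        "materials": "Natural terracotta clay, white paint",
        "inspiration": "Made this during the rainy season, inspired by my garden",
        "category": "Pottery",
    },
    "config": {"tone": "evocative", "language": "en-IN", "target_platform": "web"},
}

if __name__ == "__main__":
    import httpx

    resp = httpx.post(
        "http://127.0.0.1:8000/v1/content/generate",
        params={"provider": "gemini"},
        json=payload,
        timeout=30.0,
    )
    resp.raise_for_status()
    print(json.dumps(resp.json(), indent=2))
```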

πŸ—οΈ Project Structure

The repository is structured to support multiple microservices in a monorepo format.

immersive/
├── README.md
├── LICENSE
└── services/
    └── text-service/
        ├── app/
        │   ├── main.py              # FastAPI application and routes
        │   ├── models.py            # Pydantic request/response models
        │   ├── config.py            # Settings and configuration
        │   └── core/
        │       ├── base.py          # Abstract provider interface
        │       ├── factory.py       # Provider factory pattern
        │       ├── providers/
        │       │   ├── gemini.py    # Google Gemini implementation
        │       │   └── mock.py      # Mock provider for testing
        │       └── prompts/
        │           ├── base_templates.py
        │           └── provider_specific/
        │               └── gemini_templates.py
        ├── pyproject.toml           # Project dependencies
        ├── Dockerfile               # Container configuration
        └── .env                     # Environment variables (create this)

🔧 Developer Guide

Adding a New AI Provider (Text Service)

The platform is designed to be extensible. To add a new provider (e.g., Anthropic):

  1. Create a new provider class: Create app/core/providers/anthropic.py inheriting from AIModelProvider.
  2. Implement Interface: Implement the generate_content method.
  3. Register: Add the class to the _providers dict in app/core/factory.py.
  4. Configure: Add a new ProviderConfig entry in app/config.py defaults.
  5. Add to the .env file: Add environment variable support following the PROVIDER_SETTINGS__PROVIDER__KEY pattern.
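
Steps 1–2 can be sketched as follows. The real `AIModelProvider` lives in app/core/base.py; a minimal stand-in is defined here so the sketch is self-contained, and the Anthropic call itself (plus the default model name) is a placeholder:

```python
# Sketch of a new provider class (steps 1-2 above). Stand-in base class;
# the actual interface is in app/core/base.py.
from abc import ABC, abstractmethod


class AIModelProvider(ABC):  # stand-in for app.core.base.AIModelProvider
    @abstractmethod
    async def generate_content(self, image_url: str, seller_inputs: dict) -> dict:
        ...


class AnthropicProvider(AIModelProvider):
    def __init__(self, api_key: str = "", model_name: str = "claude-3-haiku"):
        self.api_key = api_key
        self.model_name = model_name

    async def generate_content(self, image_url: str, seller_inputs: dict) -> dict:
        # Placeholder: a real implementation would call the Anthropic API here.
        return {"title": f"[{self.model_name}] {seller_inputs.get('item_name', '')}"}
```

Step 3 then amounts to one line in app/core/factory.py: registering the class under a new key in the _providers dict.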

Running Tests

cd services/text-service
pytest

πŸ› Troubleshooting

api_key_configured: false in /v1/providers

This means your API key isn't being loaded. Ensure:

  • Your .env file uses the nested format: PROVIDER_SETTINGS__GEMINI__API_KEY=your_key (double underscore)
  • The .env file is in services/text-service/ directory
  • The service has been restarted after adding the key

model_name: null in /v1/providers

Add the model name to your .env:

PROVIDER_SETTINGS__GEMINI__MODEL_NAME=gemini-2.5-flash

500 Internal Server Error on /v1/content/generate

Check the service logs for detailed error messages. Common issues:

  • Invalid or missing API key
  • Network issues fetching the image URL
  • Provider-specific API errors
  • Template/Prompt Error: Usually caused by missing double braces {{ }} in JSON prompt templates; check the logs.
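
The double-brace issue comes from Python's str.format, where a single `{` opens a placeholder. A quick illustration (the strings below are examples, not the repo's actual templates in app/core/prompts/):

```python
# Literal JSON braces in a str.format template must be escaped as {{ }}.
template = 'Return only JSON: {{"materials": "{materials}"}}'
prompt = template.format(materials="terracotta clay")
# prompt == 'Return only JSON: {"materials": "terracotta clay"}'

# Without the doubled braces, format treats the JSON body itself as a
# placeholder name and raises an error:
bad_template = 'Return only JSON: {"materials": "{materials}"}'
# bad_template.format(materials="terracotta clay")  -> KeyError
```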

ModuleNotFoundError / Virtual Environment Issues

  • Ensure the virtual environment is activated: source .venv/bin/activate
  • Ensure dependencies are installed: uv sync (or pip install -e .)

πŸ› οΈ Engineering Platform

We believe in a "Golden Path" for development. Instead of managing complex Kubernetes manifests manually, our engineering team uses Canvas, an internal developer platform that abstracts infrastructure complexity.

  • Standardization: Every microservice (Text, Image) is defined by a simple canvas.yaml blueprint.
  • Automation: The Canvas Engine handles security hardening, logging sidecars, and scaling automatically.
  • Velocity: Developers focus on code, not clusters.

👉 Read the full Canvas Platform Documentation


πŸ“ License

See LICENSE file for details.


🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


📧 Support

For questions or issues, please open an issue in the repository.
