Vision QA Server

Description

Vision QA Server is a backend application that answers natural-language questions about images. Built with FastAPI, it uses the Google Cloud Vision API to extract image metadata (objects, text, faces, colors, landmarks) and a custom heuristic engine to interpret each question and compose a context-aware answer from that metadata.

Unlike a thin API wrapper, the project implements a QuestionAnalyzer that classifies the intent of each question (for example, counting objects, identifying colors, or reading text) and uses that intent to compose a natural-language response. It supports both stateless REST calls and stateful WebSocket connections for real-time applications.
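To illustrate the kind of intent classification described above, here is a minimal keyword-based sketch. It is not the project's QuestionAnalyzer: the function name classify_question, the regular expressions, and the rule order are assumptions made for illustration. Only the category labels (count, identify, read_text, color, location, yes_no) come from this README.

```python
import re

# Hypothetical keyword rules; the real QuestionAnalyzer may use different
# patterns and precedence. Rules are checked in order and the first match wins.
_RULES: list[tuple[str, re.Pattern[str]]] = [
    ("count", re.compile(r"\bhow many\b|\bcount\b|\bnumber of\b")),
    ("read_text", re.compile(r"\b(read|text|say|written|words?)\b")),
    ("color", re.compile(r"\bcolou?rs?\b")),
    ("location", re.compile(r"\bwhere\b|\blocated\b|\blandmark\b")),
    ("yes_no", re.compile(r"^(is|are|does|do|can|was|were|has|have)\b")),
    ("identify", re.compile(r"\bwhat\b|\bwho\b|\bwhich\b|\bidentify\b")),
]


def classify_question(question: str) -> str:
    """Return a coarse intent label for a natural-language question."""
    q = question.strip().lower()
    for label, pattern in _RULES:
        if pattern.search(q):
            return label
    return "identify"  # fallback when no rule matches
```

Ordering matters in a scheme like this: "What does the sign say?" contains both "what" and "say", and placing read_text before identify routes it to text reading.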

Features

- Smart Question Analysis: Automatically categorizes questions into types such as count, identify, read_text, color, location, and yes_no, then generates an answer suited to each type.
- Comprehensive Image Analysis: Detects objects, labels, text (OCR), faces (with emotion), landmarks, logos, and dominant colors.
- Real-Time WebSockets: Full WebSocket support for continuous interaction, including connection management and live status updates.
- Cloud Native: Deep integration with Google Cloud Platform:
  - Vision API: Core image intelligence.
  - Cloud Storage: Automatically uploads and hosts analyzed images.
  - Secret Manager: Securely manages credentials for production deployments.
- Production Ready: Includes a Dockerfile for containerization and shell scripts for streamlined deployment to Google Cloud Run.
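As a hedged example of how detected metadata could be turned into a human-like answer, the sketch below handles count questions from object-detection results. The input shape (a list of {"name", "score"} dicts), the 0.5 confidence threshold, the naive "add an s" pluralization, and the names answer_count and _plural are all assumptions; the server's actual answer logic may differ.

```python
from collections import Counter


def _plural(name: str, n: int) -> str:
    # Naive English pluralization; good enough for a sketch.
    return name if n == 1 else f"{name}s"


def answer_count(question: str, objects: list[dict], min_score: float = 0.5) -> str:
    """Answer a "how many" question from detected objects.

    `objects` is assumed to be a list of {"name": str, "score": float} dicts.
    Detections below `min_score` are ignored.
    """
    counts = Counter(
        obj["name"].lower() for obj in objects if obj.get("score", 0.0) >= min_score
    )
    q = question.lower()
    for name, n in counts.items():
        # Substring match also covers the plural form ("car" in "cars").
        if name in q:
            return f"I can see {n} {_plural(name, n)} in this image."
    if counts:
        # Object not mentioned in the question: summarize what was detected.
        summary = ", ".join(f"{n} {_plural(name, n)}" for name, n in counts.most_common())
        return f"I couldn't find that object, but I detected: {summary}."
    return "I couldn't detect any objects in this image."
```

For instance, with two confident "Car" detections and one "Person", "How many cars are in this image?" yields "I can see 2 cars in this image."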

Installation

Prerequisites

- Python 3.11+
- A Google Cloud Platform account with the Vision API enabled
- Google Cloud SDK (gcloud), needed for deployment

Local Setup

Clone the repository:
```
git clone https://github.com/nikelroid/qa-image-server.git
cd qa-image-server
```

Create and activate a virtual environment:
```
python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
```

Install dependencies:
```
pip install -r requirements.txt
```
(Dependencies include fastapi, uvicorn, google-cloud-vision, and pydantic.)
Configure credentials: either set the GOOGLE_CREDENTIALS_JSON environment variable to the contents of your service account key, or save the key file (for example, in the project root) and point GOOGLE_APPLICATION_CREDENTIALS at its path.
```
export GOOGLE_CREDENTIALS_JSON='{...your_service_account_json...}'
# OR
export GOOGLE_APPLICATION_CREDENTIALS="path/to/your-key.json"
```
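A minimal sketch of how a server might read the GOOGLE_CREDENTIALS_JSON variable is shown below. It is an assumption, not the code in app.py: the function name load_credentials_info and the "type": "service_account" check are illustrative. When the variable is unset, it returns None so that Google client libraries can fall back to Application Default Credentials, which read GOOGLE_APPLICATION_CREDENTIALS.

```python
import json
import os
from typing import Optional


def load_credentials_info() -> Optional[dict]:
    """Return service-account info parsed from GOOGLE_CREDENTIALS_JSON, if set.

    Returns None when the variable is unset, leaving Application Default
    Credentials (GOOGLE_APPLICATION_CREDENTIALS) to the client libraries.
    """
    raw = os.environ.get("GOOGLE_CREDENTIALS_JSON")
    if not raw:
        return None
    try:
        info = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("GOOGLE_CREDENTIALS_JSON is not valid JSON") from exc
    # Service account key files carry "type": "service_account".
    if not isinstance(info, dict) or info.get("type") != "service_account":
        raise ValueError("GOOGLE_CREDENTIALS_JSON does not look like a service account key")
    return info
```

When a dict is returned, one option is to pass it to google.oauth2.service_account.Credentials.from_service_account_info(info) and hand the result to vision.ImageAnnotatorClient(credentials=...).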

Usage

Running Locally

Start the server using Uvicorn:
```
uvicorn app:app --host 0.0.0.0 --port 8080 --reload
```

The API will be available at http://localhost:8080.

API Endpoints

- POST /analyze-image: Send a base64-encoded image and a question as JSON. Example request body:
  ```
  {
    "image": "base64_encoded_string_here...",
    "question": "How many cars are in this image?"
  }
  ```
- GET /health: Check service status and Google Cloud connections.
- WS /ws/{client_id}: Connect via WebSocket for streamed analysis.
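The request body for POST /analyze-image can be produced with the standard library alone. The sketch below is a hypothetical client: the function names build_analyze_request and analyze_image, the default base URL, and the 30-second timeout are assumptions. Only the endpoint path and the "image"/"question" fields come from this README.

```python
import base64
import json
import urllib.request


def build_analyze_request(image_bytes: bytes, question: str,
                          base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST /analyze-image request with a base64-encoded image."""
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "question": question,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/analyze-image",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_image(path: str, question: str,
                  base_url: str = "http://localhost:8080") -> dict:
    """Send an image file and a question; return the decoded JSON response."""
    with open(path, "rb") as f:
        req = build_analyze_request(f.read(), question, base_url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

For example, analyze_image("street.jpg", "How many cars are in this image?") sends a local file (street.jpg is a placeholder name) to a server started as in Running Locally. The response schema is not documented here, so the function returns the decoded JSON as-is.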

Deployment

The project includes a streamlined deployment script for Google Cloud Run.

Edit deploy.sh to set your PROJECT_ID, REGION, and BUCKET_NAME.
Run the script:
```
chmod +x deploy.sh
./deploy.sh
```
This script handles project configuration, Secret Manager permissions, Docker builds, and Cloud Run deployment automatically.

Contributing

Contributions are welcome!

1. Fork the repository.
2. Create a feature branch (git checkout -b feature/NewFeature).
3. Commit your changes.
4. Push to the branch.
5. Open a Pull Request.

License

Distributed under the Apache License, Version 2.0. See LICENSE for more information.

Contact

For support or inquiries, please open an issue in the GitHub repository.
