Vision QA Server is a backend application that "sees" and interprets images in order to answer natural language questions about them. Built with FastAPI, it uses the Google Cloud Vision API to extract image metadata (objects, text, faces, colors, landmarks) and a custom internal heuristic engine to interpret user questions and provide context-aware answers.
Unlike simple API wrappers, this project implements a QuestionAnalyzer that categorizes intent (e.g., counting objects, identifying colors, reading text) to synthesize human-like responses. It supports both stateless REST calls and stateful WebSocket connections for real-time applications.
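To illustrate the intent categorization described above, here is a minimal sketch of a keyword-based classifier. The rule table, the `categorize` function, and the fallback to `identify` are assumptions made for this example; the project's actual QuestionAnalyzer may use different rules.

```python
import re

# Hypothetical keyword rules (not taken from the project's source).
# Order matters: the first rule whose pattern matches wins.
RULES = [
    ("count", re.compile(r"\bhow many\b|\bnumber of\b|\bcount\b")),
    ("read_text", re.compile(r"\bread\b|\btext\b|\bwritten\b|\bsays?\b")),
    ("color", re.compile(r"\bcolou?rs?\b")),
    ("location", re.compile(r"\bwhere\b|\blandmark\b|\blocat")),
    ("yes_no", re.compile(r"^(is|are|does|do|can|has|have|was|were)\b")),
    ("identify", re.compile(r"\bwhat\b|\bwho\b|\bwhich\b|\bidentify\b")),
]


def categorize(question: str) -> str:
    """Return the first category whose pattern matches the question."""
    q = question.strip().lower()
    for category, pattern in RULES:
        if pattern.search(q):
            return category
    # Assumed default when nothing matches.
    return "identify"


print(categorize("How many cars are in this image?"))
```

Placing the more specific rules (`count`, `read_text`, `color`) ahead of the generic `identify` rule keeps a question such as "What color is the car?" from being classified by its leading "what".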
- Smart Question Analysis: Automatically categorizes questions into types such as `count`, `identify`, `read_text`, `color`, `location`, and `yes_no` to generate relevant answers.
- Comprehensive Image Analysis: Detects objects, labels, text (OCR), faces (with emotion), landmarks, logos, and dominant colors.
- Real-Time WebSockets: Full WebSocket support for continuous interaction, including connection management and live status updates.
- Cloud Native: Deeply integrated with Google Cloud Platform:
  - Vision API: For core image intelligence.
  - Cloud Storage: Automatically uploads and hosts analyzed images.
  - Secret Manager: Securely manages credentials for production deployments.
- Production Ready: Includes a Dockerfile for containerization and shell scripts for streamlined deployment to Google Cloud Run.
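As a rough sketch of how detected objects might be turned into an answer for a counting question, the example below tallies object names and matches them against the question. The `answer_count` function, its input shape (a plain list of object names), and the answer wording are all hypothetical; they are not the project's actual response format.

```python
import re
from collections import Counter


def answer_count(question: str, detected_objects: list[str]) -> str:
    """Answer a counting question from a list of detected object names."""
    counts = Counter(name.lower() for name in detected_objects)
    q = question.lower()
    for name, n in counts.items():
        # Match the object name in the question, allowing a naive plural "s".
        if re.search(rf"\b{re.escape(name)}s?\b", q):
            noun = name if n == 1 else f"{name}s"
            return f"I can see {n} {noun} in the image."
    total = sum(counts.values())
    return f"No matching object found; {total} object(s) detected overall."


print(answer_count("How many cars are in this image?", ["Car", "Person", "Car"]))
```

The naive pluralization only handles regular nouns; a real implementation would likely need a more careful mapping between question wording and detected labels.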
- Python 3.11+
- Google Cloud Platform Account with Vision API enabled.
- Google Cloud SDK (`gcloud`) installed (for deployment).
- Clone the repository:

      git clone https://github.com/nikelroid/qa-image-server.git
      cd qa-image-server

- Create a virtual environment:

      python -m venv .venv
      source .venv/bin/activate  # On Windows use: .venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

  (Dependencies include `fastapi`, `uvicorn`, `google-cloud-vision`, and `pydantic`.)

- Configure credentials: Set the `GOOGLE_CREDENTIALS_JSON` environment variable, or place your Service Account JSON in the root directory and reference it:

      export GOOGLE_CREDENTIALS_JSON='{...your_service_account_json...}'
      # OR
      export GOOGLE_APPLICATION_CREDENTIALS="path/to/your-key.json"
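The two credential options above could be resolved at startup roughly as follows. This is a sketch under assumptions: the `load_credentials_info` helper is hypothetical, and the precedence (inline JSON first, then key file path) is a guess rather than the server's documented behavior.

```python
import json
import os


def load_credentials_info(env=os.environ):
    """Return ("inline", dict) or ("file", path) depending on what is configured.

    Assumed precedence: GOOGLE_CREDENTIALS_JSON wins over
    GOOGLE_APPLICATION_CREDENTIALS when both are set.
    """
    raw = env.get("GOOGLE_CREDENTIALS_JSON")
    if raw:
        try:
            return "inline", json.loads(raw)
        except json.JSONDecodeError as exc:
            raise ValueError("GOOGLE_CREDENTIALS_JSON is not valid JSON") from exc
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path:
        return "file", path
    raise RuntimeError("No Google Cloud credentials configured")
```

With the `google-auth` package installed, the parsed dictionary from the inline case can be passed to `google.oauth2.service_account.Credentials.from_service_account_info`. Failing fast on malformed JSON makes a misconfigured deployment easier to diagnose than a later authentication error.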
Start the server using Uvicorn:

    uvicorn app:app --host 0.0.0.0 --port 8080 --reload

The API will be available at http://localhost:8080.
- POST `/analyze-image`: Send a base64-encoded image and a question.

      {
        "image": "base64_encoded_string_here...",
        "question": "How many cars are in this image?"
      }

- GET `/health`: Check service status and Cloud connections.
- WS `/ws/{client_id}`: Connect via WebSocket for streamed analysis.
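A client for the POST endpoint above might look like the following sketch, which uses only the Python standard library. The helper names `build_analyze_request` and `send` are hypothetical, and the response is returned as parsed JSON without assuming any particular schema, since the response format is not described here.

```python
import base64
import json
import urllib.request


def build_analyze_request(image_bytes: bytes, question: str,
                          base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST request for /analyze-image with a base64-encoded image."""
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "question": question,
    }
    return urllib.request.Request(
        f"{base_url}/analyze-image",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request):
    """Send the request and return the decoded JSON body (requires a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To try it against a local server, read an image file in binary mode, pass its bytes and a question to `build_analyze_request`, and hand the result to `send`.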
The project includes a streamlined deployment script for Google Cloud Run.
- Edit `deploy.sh` to set your `PROJECT_ID`, `REGION`, and `BUCKET_NAME`.
- Run the script:

      chmod +x deploy.sh
      ./deploy.sh

This script handles project configuration, Secret Manager permissions, Docker builds, and Cloud Run deployment automatically.
Contributions are welcome!
- Fork the repository.
- Create a feature branch (`git checkout -b feature/NewFeature`).
- Commit your changes.
- Push to the branch.
- Open a Pull Request.
Distributed under the Apache License, Version 2.0. See LICENSE for more information.
For support or inquiries, please open an issue in the GitHub repository.