
Archillesjakins/Health-Transcribe-main


Health Transcribe

A web-based application that enables real-time, multilingual translation between patients and healthcare providers. This application converts spoken input into text, provides a live transcript, and offers a translated version with audio playback.

Features

  • Voice-to-Text: Uses the browser-based Web Speech API for transcription, with optional Google Speech-to-Text API enhancement
  • Medical Terminology Enhancement: Uses Google Gemini to improve medical term accuracy
  • Real-Time Translation: Translates text between languages using Google Cloud Translation API (with Gemini as fallback)
  • Audio Playback: "Speak" button for audio playback of translated text
  • Mobile-First Design: Responsive and optimized for both mobile and desktop use
  • Dual Transcript Display: Shows both original and translated transcripts in real-time
  • Language Selection: Allows users to choose input and output languages

Tech Stack

  • Backend: FastAPI
  • Frontend: React with Next.js
  • Speech Recognition: Browser-based Web Speech API (with optional Google Speech-to-Text API)
  • Translation: Google Cloud Translation API (with Gemini as fallback)
  • AI Enhancement: Google Gemini API
  • Deployment: Google Cloud, Vercel, Docker (with options for Google Cloud Run and other platforms)

API Requirements

The application is designed to work with minimal API requirements:

  • Required: Google Gemini API (for medical term enhancement and fallback translation)
  • Optional: Google Cloud Translation API (for better translation quality)
  • Optional: Google Cloud Speech-to-Text API (for better transcription quality)

If the optional APIs are not configured, the application will fall back to browser-based speech recognition and Gemini-based translation.
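The fallback rule above, together with the "Auto" option of the "Translation Quality" dropdown described under Google Cloud API Setup, can be sketched as a small selection function. This is a minimal sketch, not the backend's actual code. The `TranslationProvider` values, the helper names, and the assumption that "Google Cloud Translation is available" means "GOOGLE_APPLICATION_CREDENTIALS points at an existing file" are all illustrative. The sketch also ignores the other api.json search locations.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class TranslationProvider(str, Enum):
    """Hypothetical values mirroring the "Translation Quality" dropdown."""
    AUTO = "auto"
    GOOGLE = "google"
    GEMINI = "gemini"


def google_translate_available(env: Optional[Mapping[str, str]] = None) -> bool:
    """Assume Google Cloud Translation is usable when the credentials file exists."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    return bool(path) and os.path.isfile(path)


def resolve_provider(requested: TranslationProvider, google_ok: bool) -> TranslationProvider:
    """Pick the provider for a request, falling back to Gemini (the only required API)."""
    if requested is TranslationProvider.GEMINI:
        return TranslationProvider.GEMINI
    # AUTO and GOOGLE both prefer Google Cloud Translation when it is configured...
    if google_ok:
        return TranslationProvider.GOOGLE
    # ...and degrade to Gemini-based translation otherwise.
    return TranslationProvider.GEMINI
```

For example, `resolve_provider(TranslationProvider.AUTO, google_translate_available())` returns `GOOGLE` only when a credentials file is configured, and `GEMINI` otherwise. In this sketch an explicit Google request without credentials also degrades to Gemini, matching the fallback described above. An implementation could instead report an error in that case.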

Google Cloud API Setup

Setting up Google Cloud Translation API

  1. Create a Google Cloud Project:

    • Go to the Google Cloud Console
    • Create a new project or select an existing one
    • Note your Project ID
  2. Enable the Translation API:

    • In the Google Cloud Console, go to "APIs & Services" > "Library"
    • Search for "Cloud Translation API"
    • Click on it and press "Enable"
  3. Create Service Account Credentials:

    • Go to "APIs & Services" > "Credentials"
    • Click "Create Credentials" > "Service Account"
    • Fill in the service account details and grant it the "Cloud Translation API User" role
    • Create a key for this service account (JSON format)
    • Download the JSON key file and rename it to api.json
  4. Place the api.json file: The application will look for your credentials file in these locations (in order):

    • The path specified in the GOOGLE_APPLICATION_CREDENTIALS environment variable
    • Current directory (api.json)
    • Backend directory (backend/api.json)
    • Backend app directory (backend/app/api.json)

    Using the setup script (recommended): We've provided a setup script that will help you place the credentials file in the correct location:

    ./setup_google_credentials.sh
    

    This script will:

    • Ask for the location of your api.json file if not found in the current directory
    • Copy it to the backend directory
    • Set the GOOGLE_APPLICATION_CREDENTIALS environment variable in your shell configuration
  5. Verify Your Credentials: Run the test script to verify your credentials are working:

    cd backend
    python -m app.test_credentials
    
  6. Select Translation Provider in the UI:

    • In the application, use the "Translation Quality" dropdown to select:
      • "Google Translate API" for the best translation quality
      • "Gemini API" for enhanced context in medical translations
      • "Auto" to use Google if available, otherwise Gemini
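The credentials search order in step 4 can be expressed as a short lookup function. This is a hypothetical helper: `find_credentials_file` is not taken from the repository. It only mirrors the documented order and returns the first file that actually exists, treating the relative paths as relative to the current directory by default.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def find_credentials_file(
    base: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Path]:
    """Return the first existing credentials file in the documented search order."""
    base = Path.cwd() if base is None else base
    env = os.environ if env is None else env
    candidates: List[Path] = []
    # 1. An explicit path from GOOGLE_APPLICATION_CREDENTIALS takes priority.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        candidates.append(Path(explicit))
    # 2-4. Then api.json in the current, backend/ and backend/app/ directories.
    candidates += [
        base / "api.json",
        base / "backend" / "api.json",
        base / "backend" / "app" / "api.json",
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None
```

Calling `find_credentials_file()` from the repository root returns `None` when no key is found. That is the case in which the application falls back to Gemini-based translation.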

Setting up Google Cloud Speech-to-Text API (Optional)

  1. Enable the Speech-to-Text API:

    • In the Google Cloud Console, go to "APIs & Services" > "Library"
    • Search for "Speech-to-Text API"
    • Click on it and press "Enable"
  2. Use the same service account credentials as for the Translation API

Setup Instructions

Prerequisites

  • Python 3.9+
  • Node.js 16+
  • Google Gemini API key (required)
  • Google Cloud account with Translation API enabled (optional but recommended)
  • Google Cloud account with Speech-to-Text API enabled (optional)

Backend Setup

  1. Navigate to the backend directory:

    cd backend
    
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Create a .env file with your API keys:

    # Required
    GEMINI_API_KEY=your_gemini_api_key
    
    # Optional (for Google Cloud APIs)
    GOOGLE_APPLICATION_CREDENTIALS=path/to/your/credentials.json
    
  5. Run the server:

    uvicorn app.main:app --reload --port 8000
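To check the .env file before starting the server, the sketch below parses the simple `KEY=value` format shown in step 4 and enforces the required/optional split. It is an illustrative helper, not the backend's own loader. The function names are hypothetical, and the parser deliberately handles only plain `KEY=value` lines and `#` comments. In practice, a library such as python-dotenv (`load_dotenv`) covers the full format.

```python
from pathlib import Path
from typing import Dict


def load_env_file(path: Path) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and '#' comments (hypothetical helper)."""
    values: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def check_backend_config(values: Dict[str, str]) -> bool:
    """Raise if GEMINI_API_KEY is missing; return True if Google Cloud credentials are set."""
    if not values.get("GEMINI_API_KEY"):
        raise RuntimeError("GEMINI_API_KEY is required in .env")
    has_google = bool(values.get("GOOGLE_APPLICATION_CREDENTIALS"))
    if not has_google:
        print("GOOGLE_APPLICATION_CREDENTIALS not set: Gemini will be used for translation")
    return has_google
```

From the backend directory, `check_backend_config(load_env_file(Path(".env")))` fails fast on a missing Gemini key and reports whether the optional Google Cloud APIs can be used.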
    

Frontend Setup

  1. Navigate to the frontend directory:

    cd frontend
    
  2. Install dependencies:

    npm install
    
  3. Create a .env.local file with your API endpoint:

    NEXT_PUBLIC_API_URL=http://localhost:8000
    
    The URL must match the host and port the backend server is running on.
    
  4. Run the development server:

    npm run dev
    
  5. Open http://localhost:3000 in your browser

Using the Application Without Google Cloud APIs

The application is designed to work even without Google Cloud APIs:

  1. Speech Recognition: Uses the browser's built-in Web Speech API
  2. Translation: Uses Google Gemini as a fallback translator
  3. Language Support: Provides a default set of common languages

While the application works without Google Cloud APIs, using them will provide:

  • Better transcription accuracy, especially for medical terms
  • More reliable translation
  • Support for more languages

Deployment

The application is configured for deployment on Vercel and Google Cloud. You can deploy both the frontend and backend to Google Cloud Run using the provided scripts.

The deployment script will guide you through deploying:

  • Backend only
  • Frontend only
  • Both backend and frontend

Manual Deployment

If you prefer to deploy manually:

Backend Deployment

  1. Navigate to the backend directory:

    cd backend
    
  2. Deploy to Vercel:

    vercel --prod
    
  3. Set up environment variables in the Vercel project settings:

    • GEMINI_API_KEY: Your Google Gemini API key
    • GOOGLE_CREDENTIALS_JSON: The JSON string generated by the prepare_vercel_credentials.py script
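Environment variables hold strings, not files, so the service-account key has to be passed to Vercel as JSON text. The prepare_vercel_credentials.py script itself is not shown here. The sketch below is a hypothetical stand-in that illustrates the idea, assuming the key is stored in api.json: serialize the key file to a single line, then parse `GOOGLE_CREDENTIALS_JSON` back into a dict at runtime. Both function names are illustrative.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Mapping, Optional


def credentials_to_env_value(key_file: Path) -> str:
    """Serialize a service-account key file into a compact one-line JSON string."""
    data = json.loads(key_file.read_text(encoding="utf-8"))
    # Compact separators drop spaces; json.dumps escapes the private key's newlines as \n.
    return json.dumps(data, separators=(",", ":"))


def credentials_from_env(env: Optional[Mapping[str, str]] = None) -> Optional[Dict[str, Any]]:
    """Rebuild the key dict from GOOGLE_CREDENTIALS_JSON, or return None if it is unset."""
    env = os.environ if env is None else env
    raw = env.get("GOOGLE_CREDENTIALS_JSON")
    return json.loads(raw) if raw else None
```

Running `print(credentials_to_env_value(Path("api.json")))` produces the value to paste into the Vercel settings. The dict returned by `credentials_from_env()` has the same shape as the downloaded key file, so it could be passed to `google.oauth2.service_account.Credentials.from_service_account_info` from the google-auth library. Whether the backend does exactly this is an assumption.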

For more reliable and consistent deployment, we've added Docker support to the application. This allows you to containerize both the frontend and backend, making deployment more predictable across different environments.

Docker Setup

  1. Make sure you have Docker and Docker Compose installed on your machine.

  2. For local development with Docker:

    # Start both frontend and backend
    docker-compose up

License

MIT

About

An AI-powered web application that enables real-time, multilingual translation, using Gemini and Google Cloud APIs for more context-aware translation.
