A web-based application that enables real-time, multilingual translation between patients and healthcare providers. This application converts spoken input into text, provides a live transcript, and offers a translated version with audio playback.
- Voice-to-Text: Uses browser-based Web Speech API for transcription, with optional Google Speech-to-Text API enhancement
- Medical Terminology Enhancement: Uses Google Gemini to improve medical term accuracy
- Real-Time Translation: Translates text between languages using Google Cloud Translation API (with Gemini as fallback)
- Audio Playback: "Speak" button for audio playback of translated text
- Mobile-First Design: Responsive and optimized for both mobile and desktop use
- Dual Transcript Display: Shows both original and translated transcripts in real-time
- Language Selection: Allows users to choose input and output languages
- Backend: FastAPI
- Frontend: React with Next.js
- Speech Recognition: Browser-based Web Speech API (with optional Google Speech-to-Text API)
- Translation: Google Cloud Translation API (with Gemini as fallback)
- AI Enhancement: Google Gemini API
- Deployment: Google Cloud, Docker (with options for Google Cloud Run and other platforms)
The application is designed to work with minimal API requirements:
- Required: Google Gemini API (for medical term enhancement and fallback translation)
- Optional: Google Cloud Translation API (for better translation quality)
- Optional: Google Cloud Speech-to-Text API (for better transcription quality)
If the optional APIs are not configured, the application will fall back to browser-based speech recognition and Gemini-based translation.
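As a sketch, the startup check implied by these requirements might look like the following (the env var names come from the setup instructions below; the function name and return shape are illustrative, not the app's actual code):

```python
import os

def detect_backends() -> dict:
    """Sketch of the fallback logic described above: Gemini is required;
    Google Cloud credentials unlock the higher-quality services."""
    if not os.environ.get("GEMINI_API_KEY"):
        raise RuntimeError("GEMINI_API_KEY is required")
    has_google = bool(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"))
    return {
        "translation": "google-cloud-translation" if has_google else "gemini",
        "speech": "google-speech-to-text" if has_google else "web-speech-api",
    }
```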
- Create a Google Cloud Project:
  - Go to the Google Cloud Console
  - Create a new project or select an existing one
  - Note your Project ID
- Enable the Translation API:
  - In the Google Cloud Console, go to "APIs & Services" > "Library"
  - Search for "Cloud Translation API"
  - Click on it and press "Enable"
- Create Service Account Credentials:
  - Go to "APIs & Services" > "Credentials"
  - Click "Create Credentials" > "Service Account"
  - Fill in the service account details and grant it the "Cloud Translation API User" role
  - Create a key for this service account (JSON format)
  - Download the JSON key file and rename it to `api.json`
- Place the `api.json` file: The application will look for your credentials file in these locations (in order):
  - The path specified in the `GOOGLE_APPLICATION_CREDENTIALS` environment variable
  - Current directory (`api.json`)
  - Backend directory (`backend/api.json`)
  - Backend app directory (`backend/app/api.json`)
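That lookup order can be sketched in a few lines (a simplified stand-in for whatever the backend actually does, not its real code):

```python
import os
from typing import Optional

def find_credentials() -> Optional[str]:
    """Check the candidate key-file locations in the order listed above
    and return the first one that exists."""
    candidates = [
        os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"),
        "api.json",
        "backend/api.json",
        "backend/app/api.json",
    ]
    for path in candidates:
        if path and os.path.isfile(path):
            return path
    return None
```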
- Using the setup script (recommended): We've provided a setup script that will help you place the credentials file in the correct location:

  ```bash
  ./setup_google_credentials.sh
  ```

  This script will:
  - Ask for the location of your `api.json` file if not found in the current directory
  - Copy it to the backend directory
  - Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable in your shell configuration
- Verify Your Credentials: Run the test script to verify your credentials are working:

  ```bash
  cd backend
  python -m app.test_credentials
  ```
- Select Translation Provider in the UI: In the application, use the "Translation Quality" dropdown to select:
  - "Google Translate API" for the best translation quality
  - "Gemini API" for enhanced context in medical translations
  - "Auto" to use Google if available, otherwise Gemini
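The "Auto" rule amounts to a one-line preference; a hypothetical mapping from the dropdown values to a backend (the real UI's internal values may differ):

```python
def resolve_provider(choice: str, google_available: bool) -> str:
    """Map the "Translation Quality" dropdown to a translation backend
    (illustrative value names, not the app's actual identifiers)."""
    if choice == "google":
        return "google"
    if choice == "gemini":
        return "gemini"
    # "auto": prefer Google Translate when configured, else fall back to Gemini
    return "google" if google_available else "gemini"
```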
- Enable the Speech-to-Text API:
  - In the Google Cloud Console, go to "APIs & Services" > "Library"
  - Search for "Speech-to-Text API"
  - Click on it and press "Enable"
- Use the same service account credentials as for the Translation API
- Python 3.9+
- Node.js 16+
- Google Gemini API key (required)
- Google Cloud account with Translation API enabled (optional but recommended)
- Google Cloud account with Speech-to-Text API enabled (optional)
- Navigate to the backend directory:

  ```bash
  cd backend
  ```
- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Create a `.env` file with your API keys:

  ```bash
  # Required
  GEMINI_API_KEY=your_gemini_api_key

  # Optional (for Google Cloud APIs)
  GOOGLE_APPLICATION_CREDENTIALS=path/to/your/credentials.json
  ```
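The `.env` file is conventionally loaded with a library like python-dotenv; a minimal stdlib stand-in shows what that amounts to (a sketch, not the app's actual loading code):

```python
import os

def load_env(path: str = ".env") -> None:
    """Tiny .env loader: read KEY=VALUE lines, skip comments and blanks,
    and export the values without overriding variables already set."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```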
- Run the server:

  ```bash
  uvicorn app.main:app --reload --port 6000
  ```
- Navigate to the frontend directory:

  ```bash
  cd frontend
  ```
- Install dependencies:

  ```bash
  npm install
  ```
- Create a `.env.local` file with your API endpoint:

  ```bash
  NEXT_PUBLIC_API_URL=http://localhost:8000
  ```
- Run the development server:

  ```bash
  npm run dev
  ```
- Open http://localhost:3000 in your browser
The application is designed to work even without Google Cloud APIs:
- Speech Recognition: Uses the browser's built-in Web Speech API
- Translation: Uses Google Gemini as a fallback translator
- Language Support: Provides a default set of common languages
While the application works without Google Cloud APIs, using them will provide:
- Better transcription accuracy, especially for medical terms
- More reliable translation
- Support for more languages
The application is configured for deployment on Vercel and Google Cloud. You can deploy both the frontend and backend to Google Cloud Run using the provided scripts.
This script will guide you through deploying:
- Backend only
- Frontend only
- Both backend and frontend
If you prefer to deploy manually:
- Navigate to the backend directory:

  ```bash
  cd backend
  ```
- Deploy to Vercel:

  ```bash
  vercel --prod
  ```
- Set up environment variables in the Vercel project settings:
  - `GEMINI_API_KEY`: Your Google Gemini API key
  - `GOOGLE_CREDENTIALS_JSON`: The JSON string generated by the `prepare_vercel_credentials.py` script
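Serverless hosts can't ship the key file itself, which is why the JSON is passed through an environment variable. A common pattern is to write it back out to a temp file at startup so the Google client libraries can find it; a hedged sketch (the function name is illustrative, and the app's or script's actual handling may differ):

```python
import json
import os
import tempfile
from typing import Optional

def materialize_credentials() -> Optional[str]:
    """Write the JSON from GOOGLE_CREDENTIALS_JSON to a temp file and
    point GOOGLE_APPLICATION_CREDENTIALS at it."""
    raw = os.environ.get("GOOGLE_CREDENTIALS_JSON")
    if not raw:
        return None
    info = json.loads(raw)  # fail early on malformed JSON
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as fh:
        json.dump(info, fh)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path
    return path
```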
For more reliable and consistent deployment, we've added Docker support to the application. This allows you to containerize both the frontend and backend, making deployment more predictable across different environments.
- Make sure you have Docker and Docker Compose installed on your machine.
- For local development with Docker:

  ```bash
  # Start both frontend and backend
  docker-compose up
  ```
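For reference, a compose file along these lines would start both services; this is only a sketch, and the service names, ports, and env-file paths are assumptions rather than the repo's actual `docker-compose.yml`:

```yaml
# Illustrative docker-compose.yml (assumed layout: ./backend and ./frontend)
services:
  backend:
    build: ./backend
    ports:
      - "6000:6000"       # matches the uvicorn port used above
    env_file:
      - ./backend/.env    # GEMINI_API_KEY, optional Google credentials
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
```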
MIT