A real-time speech translation application with facial tracking that displays translations in a 3D speech bubble following your mouth movements.
Polyglot Mirror combines speech recognition, translation, and face tracking technologies to create an interactive language learning and communication tool. Speak in English, and watch as your words are translated in real-time into one of 11 supported languages, displayed in a speech bubble that moves with your face.
- Real-time Speech Recognition: Browser-based Web Speech API captures your voice
- Instant Translation: Google Cloud Translation API translates to 11 languages (500K characters/month free)
- Face Tracking: MediaPipe detects and tracks facial landmarks in real-time
- 3D Speech Bubble: React Three Fiber renders a speech bubble that follows your mouth
- Low Latency: Optimized pipeline for minimal delay between speech and translation
- Frontend: Next.js 16, React 19, TypeScript
- 3D Graphics: React Three Fiber, Three.js, GSAP
- Face Tracking: MediaPipe Tasks Vision
- Speech Recognition: Web Speech API (browser-based)
- Translation: Google Cloud Translation API
- Styling: Tailwind CSS
- Camera Capture: MediaPipe accesses your webcam and detects facial landmarks
- Speech Input: Web Speech API listens to your microphone and converts speech to text
- Translation: Text is sent to Google Cloud Translation API
- Display: Translated text appears in a 3D speech bubble that follows your mouth position
- Real-time Updates: The bubble moves smoothly as you speak and move your head
Webcam → MediaPipe Face Detection → Mouth Coordinates
↓
Microphone → Web Speech API → Text → Translation API → Translated Text
↓
React Three Fiber Speech Bubble
polyglot-mirror/
├── src/
│   └── app/
│       ├── page.tsx                          # Entry point
│       ├── PolyglotMirror.tsx                # Main application component
│       ├── Experience.tsx                    # 3D scene setup (camera, lighting)
│       ├── SpeechBubble.tsx                  # 3D speech bubble component
│       ├── hooks/
│       │   ├── useSpeechRecognition.ts       # Speech-to-text hook
│       │   └── useTranslation.ts             # Translation API hook
│       ├── components/
│       │   ├── LanguageSelector.tsx          # Language dropdown selector
│       │   ├── TranscriptDisplay.tsx         # Shows original text
│       │   └── ControlPanel.tsx              # Start/Stop listening controls
│       └── api/
│           └── translate/
│               └── route.ts                  # Translation API endpoint
├── public/                                   # Static assets
├── .env.local                                # Environment variables (not committed)
├── package.json                              # Dependencies
└── README.md                                 # This file
The application uses a modular architecture with separation of concerns:
- Hooks: Encapsulate speech recognition and translation logic
- Components: Reusable UI elements (language selector, controls, transcript)
- API Routes: Server-side translation endpoint
- 3D Scene: React Three Fiber components for rendering
- Node.js: Version 20 or higher
- Modern Browser: Chrome, Edge, or Safari with WebRTC support
- Google Cloud Account: For Translation API (free tier: 500K characters/month)
- Webcam & Microphone: For face tracking and speech recognition
git clone https://github.com/yourusername/polyglot-mirror.git
cd polyglot-mirror
npm install

- Go to Google Cloud Console
- Create a new project or select an existing one
- Enable the Cloud Translation API:
  - Navigate to "APIs & Services" → "Library"
  - Search for "Cloud Translation API"
  - Click "Enable"
- Create an API Key:
  - Go to "APIs & Services" → "Credentials"
  - Click "Create Credentials" → "API Key"
  - (Recommended) Click "Restrict Key" and limit it to "Cloud Translation API"
  - Copy your API key
Create a .env.local file in the project root:
GOOGLE_CLOUD_API_KEY=your_api_key_here

Important: Never commit your .env.local file to version control. It's already included in .gitignore.
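A minimal sketch of how a server-side route might read this key. The helper name `requireApiKey` is hypothetical, and the error text simply mirrors the "Google Cloud API key not configured" message listed under troubleshooting; the project's actual route.ts may be structured differently.

```typescript
// Hypothetical helper: returns the configured key or throws if it is missing.
// `env` is passed in so the function can be exercised without a real process.env.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env["GOOGLE_CLOUD_API_KEY"]?.trim();
  if (!key) {
    // Covers an unset variable as well as an empty or whitespace-only value.
    throw new Error("Google Cloud API key not configured");
  }
  return key;
}
```

In a Next.js route handler this would be called as `requireApiKey(process.env)`. Because the variable name has no `NEXT_PUBLIC_` prefix, Next.js does not expose it to browser code.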
npm run dev

Open http://localhost:3000 in your browser.
- Grant Permissions:
  - Allow camera access when prompted (for face tracking)
  - Allow microphone access when prompted (for speech recognition)
- Select Target Language:
  - Use the language dropdown to choose your desired translation language
  - Default is Spanish (Español)
- Start Listening:
  - Click the "Start Listening" button
  - The button will turn red and say "Stop Listening"
  - Begin speaking in English
- See Translation:
  - Your original text appears at the bottom of the screen
  - Translated text appears in the speech bubble above your face
  - The bubble follows your mouth movements in real-time
- Stop Listening:
  - Click "Stop Listening" to pause speech recognition
- Speak clearly and at a moderate pace
- Ensure good lighting for accurate face tracking
- Position your face within the camera frame
- Minimize background noise for better speech recognition
- Keep your browser tab active (Web Speech API pauses on inactive tabs)
The application supports translation into 11 languages:
| Language | Code | Language | Code |
|---|---|---|---|
| Spanish | es | Japanese | ja |
| French | fr | Korean | ko |
| German | de | Chinese | zh |
| Italian | it | Arabic | ar |
| Portuguese | pt | Hindi | hi |
| Gujarati | gu | | |
To add a new language:
- Open src/app/api/translate/route.ts
- Add the language code and name to the LANGUAGE_NAMES object:

      const LANGUAGE_NAMES: Record<string, string> = {
        // ... existing languages
        ru: 'Russian', // Add new language
      };

- Open src/app/components/LanguageSelector.tsx
- Add the language to the options array:

      <option value="ru">Russian (Русский)</option>
- Speech Bubble: Edit src/app/SpeechBubble.tsx to customize appearance, size, or animation
- Controls: Modify src/app/components/ControlPanel.tsx for button styling or layout
- Language Selector: Update src/app/components/LanguageSelector.tsx for dropdown customization
- Add Text-to-Speech: Integrate Web Speech Synthesis API to read translations aloud
- Multiple Languages: Support translating from any language (not just English)
- Recording: Add ability to record and save translation sessions
- Offline Mode: Cache translations for common phrases
npm run build
npm start

- Technology: Web Speech API (built into modern browsers)
- Language: English (en-US) by default
- Continuous Mode: Captures speech continuously until stopped
- Interim Results: Real-time transcription as you speak
Implementation: See src/app/hooks/useSpeechRecognition.ts:1
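The settings above can be sketched as follows. This is an illustration under stated assumptions, not the contents of useSpeechRecognition.ts: the `RecognizerLike` and `ResultLike` types and both function names are hypothetical stand-ins so the code compiles outside a browser.

```typescript
// Minimal structural types so the sketch does not depend on browser typings.
interface RecognizerLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
}

interface ResultLike {
  isFinal: boolean;
  transcript: string;
}

// Apply the settings listed above to a SpeechRecognition-like object.
function configureRecognizer<T extends RecognizerLike>(recognizer: T, lang = "en-US"): T {
  recognizer.lang = lang;
  recognizer.continuous = true; // keep listening until explicitly stopped
  recognizer.interimResults = true; // emit partial text while the user speaks
  return recognizer;
}

// Separate confirmed text from the still-changing interim tail.
function splitTranscript(results: ResultLike[]): { finalText: string; interimText: string } {
  let finalText = "";
  let interimText = "";
  for (const r of results) {
    if (r.isFinal) finalText += r.transcript;
    else interimText += r.transcript;
  }
  return { finalText: finalText.trim(), interimText: interimText.trim() };
}
```

In the browser, the real object would come from `window.SpeechRecognition` or the prefixed `window.webkitSpeechRecognition`, and each entry of `event.results` exposes `isFinal` and `result[0].transcript`. Sending only `finalText` to the translation endpoint is one way to limit API calls, at the cost of slightly later translations.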
- Service: Google Cloud Translation API v2
- Endpoint: https://translation.googleapis.com/language/translate/v2
- Method: REST API (no additional packages required)
- Free Tier: 500,000 characters/month
- Latency: ~200-500ms per translation
Implementation: See src/app/api/translate/route.ts:1
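As a rough sketch of what a call to the v2 endpoint involves, the helpers below build the request and read the response. The function names are hypothetical and the project's route.ts may differ; the request fields (`q`, `source`, `target`, `format`) and the `data.translations[].translatedText` response shape follow the v2 REST API.

```typescript
const TRANSLATE_ENDPOINT = "https://translation.googleapis.com/language/translate/v2";

interface TranslateRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the URL and fetch options for a single translation request.
function buildTranslateRequest(text: string, target: string, apiKey: string, source = "en"): TranslateRequest {
  return {
    url: `${TRANSLATE_ENDPOINT}?key=${encodeURIComponent(apiKey)}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // format: "text" asks the API to treat the input as plain text, not HTML.
      body: JSON.stringify({ q: text, source, target, format: "text" }),
    },
  };
}

// Extract the first translation from a v2 response body.
function parseTranslateResponse(json: unknown): string {
  const body = json as { data?: { translations?: { translatedText?: string }[] } } | null;
  const first = body?.data?.translations?.[0]?.translatedText;
  if (typeof first !== "string") {
    throw new Error("Translation service error");
  }
  return first;
}
```

A server-side caller would then do something like `const { url, init } = buildTranslateRequest("Hello", "es", key)`, pass both to `fetch`, and hand the parsed JSON to `parseTranslateResponse`. Keeping the key on the server, rather than calling the endpoint from the browser, avoids exposing it to users.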
- Technology: MediaPipe Face Landmarker
- Model: Face detection with 478 landmarks
- Target: Mouth region (for speech bubble positioning)
- Performance: ~30-60 FPS depending on hardware
Key landmarks used:
- Upper lip center
- Lower lip center
- Mouth corners
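One way to turn these landmarks into a single anchor point is to average them. The sketch below is illustrative only: the indices (13, 14, 61, 291) are the ones commonly cited for the face mesh topology and are an assumption here, so check them against the model's landmark map; `mouthCenter` is a hypothetical name.

```typescript
interface Landmark {
  x: number; // normalized 0..1 across image width
  y: number; // normalized 0..1 across image height
  z: number; // relative depth
}

// Assumed landmark indices; verify against the Face Landmarker documentation.
const MOUTH = { upperLip: 13, lowerLip: 14, leftCorner: 61, rightCorner: 291 } as const;

// Average the four mouth landmarks into one point, or return null
// when the landmark array is missing or incomplete (e.g. no face detected).
function mouthCenter(landmarks: Landmark[]): Landmark | null {
  const upper = landmarks[MOUTH.upperLip];
  const lower = landmarks[MOUTH.lowerLip];
  const left = landmarks[MOUTH.leftCorner];
  const right = landmarks[MOUTH.rightCorner];
  if (!upper || !lower || !left || !right) return null;
  return {
    x: (upper.x + lower.x + left.x + right.x) / 4,
    y: (upper.y + lower.y + left.y + right.y) / 4,
    z: (upper.z + lower.z + left.z + right.z) / 4,
  };
}
```

Averaging four points rather than tracking a single lip landmark tends to make the anchor less sensitive to the mouth opening and closing during speech.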
- Library: React Three Fiber (React renderer for Three.js)
- Animation: GSAP for smooth speech bubble movement
- Scene: Orthographic camera positioned in front of face
- Lighting: Ambient + directional lights for depth
Implementation: See src/app/Experience.tsx:1 and src/app/SpeechBubble.tsx:1
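To position the bubble, a normalized landmark coordinate has to be mapped into the orthographic scene. The following is a sketch under assumptions, not the code in Experience.tsx or SpeechBubble.tsx: the function names, the mirrored preview, and the default vertical offset are all illustrative choices.

```typescript
interface Point2D {
  x: number;
  y: number;
}

// Convert a normalized position (0..1, origin top-left, y pointing down)
// into orthographic scene units (origin at the center, y pointing up).
// viewWidth/viewHeight are the visible extents of the camera frustum;
// mirror flips x for a selfie-style (mirrored) video preview.
function landmarkToScene(p: Point2D, viewWidth: number, viewHeight: number, mirror = true): Point2D {
  const nx = mirror ? 1 - p.x : p.x;
  return {
    x: (nx - 0.5) * viewWidth,
    y: (0.5 - p.y) * viewHeight,
  };
}

// Target position for the bubble: the mouth position shifted upward.
// The default offset (15% of the view height) is an arbitrary illustrative value.
function bubbleTarget(
  mouth: Point2D,
  viewWidth: number,
  viewHeight: number,
  offsetY = 0.15 * viewHeight,
): Point2D {
  const scenePos = landmarkToScene(mouth, viewWidth, viewHeight);
  return { x: scenePos.x, y: scenePos.y + offsetY };
}
```

The resulting target could then be handed to GSAP, for example `gsap.to(mesh.position, { x, y, duration: 0.2 })`, so the bubble eases toward each new position instead of jumping frame by frame.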
- Ensure you've granted camera permissions in your browser
- Check that no other application is using the camera
- Try refreshing the page and re-granting permissions
- Verify your browser supports WebRTC (Chrome, Edge, Safari recommended)
- Grant microphone permissions when prompted
- Check browser settings: chrome://settings/content/microphone
- Ensure microphone is not muted in system settings
- Test microphone in another application to verify it works
- "Google Cloud API key not configured": Ensure .env.local exists with GOOGLE_CLOUD_API_KEY
- "Translation service error": Verify API key is valid and Cloud Translation API is enabled
- Rate limit errors: Check you haven't exceeded 500K characters/month free tier
- Network errors: Verify internet connection
- Only works in HTTPS: Ensure you're using https:// or localhost
- Browser compatibility: Use Chrome, Edge, or Safari (Firefox has limited support)
- Language: Web Speech API requires specific language codes (currently set to en-US)
- Background tab: Keep the browser tab active (API pauses on background tabs)
- Poor lighting: Ensure adequate lighting on your face
- Face position: Keep your face centered and within camera frame
- Performance: Close other applications if tracking is laggy
- Model loading: Wait a few seconds for MediaPipe model to load on first run
- Hardware acceleration: Enable GPU acceleration in browser settings
- Close unused tabs: Free up system resources
- Update browser: Ensure you're on the latest version
- Reduce quality: Lower camera resolution in browser settings if needed
- Free Tier: 500,000 characters/month
- Pricing after free tier: $20 per 1 million characters
- Rate Limits: 1000 requests/minute
- Character Counting: Counts input characters (not output)
Estimated Usage:
- Average sentence: 50 characters
- 10,000 translations/month = ~500,000 characters
- Conclusion: Free tier is sufficient for personal use and demos
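The estimate above can be reproduced with a small calculation. The function name is hypothetical; the constants are the free tier and per-million price quoted in this section, and the 50-character average is the same assumption used in the estimate.

```typescript
const FREE_TIER_CHARS = 500_000; // free characters per month
const USD_PER_MILLION_CHARS = 20; // price after the free tier

// Estimate monthly character volume and cost for a given number of translations.
function estimateMonthlyCostUsd(
  translations: number,
  avgCharsPerTranslation = 50,
): { chars: number; costUsd: number } {
  const chars = translations * avgCharsPerTranslation;
  // Only input characters beyond the free tier are billed.
  const billable = Math.max(0, chars - FREE_TIER_CHARS);
  return { chars, costUsd: (billable / 1_000_000) * USD_PER_MILLION_CHARS };
}
```

With these numbers, 10,000 translations land exactly on the free-tier limit, while 30,000 translations (1.5 million characters) would leave 1 million billable characters.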
- Support translating from any language (not just English)
- Add text-to-speech to hear translations
- Implement conversation mode (two-way translation)
- Add translation history/session recording
- Support multiple faces (group conversations)
- Add gesture recognition for commands
- Implement offline mode with cached translations
- Add custom vocabulary/phrase books
- Support video recording of sessions
- Create mobile app version
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch: git checkout -b feature/your-feature-name
- Commit your changes: git commit -m 'Add some feature'
- Push to the branch: git push origin feature/your-feature-name
- Open a Pull Request
This project is open source and available under the MIT License.
- MediaPipe: Google's open-source ML framework for face tracking
- Google Cloud Translation: Reliable, fast translation service
- Web Speech API: Browser-based speech recognition
- React Three Fiber: Declarative 3D rendering in React
- Next.js: Full-stack React framework
For questions, issues, or feedback:
- Open an issue on GitHub
- Email: your.email@example.com
Built by [Your Name]
Status: Active Development | Version: 0.1.0 | Last Updated: January 2026