Real-time multilingual translation platform powered by AI and Computer Vision
Transearly is an AI-powered translation platform that combines OCR-based computer vision with language-model translation to translate content in real time. It supports four translation modes: text, voice, image, and document.
- 📸 Image Translation with OCR - Detect and translate text in images using Google Cloud Vision API
- 🎤 Voice Translation - Real-time speech-to-text translation
- 📄 Document Translation - Support for PDF, DOCX, XLSX, PPTX, CSV formats
- 💬 Text Translation - Fast and accurate text translation across 8+ languages
- 📱 Mobile-First Design - Beautiful React Native mobile application
- ⚡ Real-time Processing - WebSocket integration for live translation updates
- 🎨 Interactive UI - Bounding box detection with tap-to-translate popups
- Smart OCR Detection - Uses Google Cloud Vision API for accurate text detection
- Interactive Bounding Boxes - Tap on detected text regions to view translations
- Multi-language Support - Automatically detects and translates text in images
- Paragraph Grouping - Intelligently groups text into meaningful segments
- Real-time Recording - Record and translate voice input
- Speech-to-Text - Google Cloud Speech API for accurate transcription
- Audio File Support - MP3, WAV, FLAC, OGG, WEBM formats
- Auto Translation - Automatically translates transcribed text
- Multiple Languages - Support for 8+ languages
- File Format Support - PDF, DOCX, XLSX, PPTX, CSV
- Batch Processing - Queue-based translation for large documents
- Progress Tracking - Real-time translation progress via WebSocket
- Format Preservation - Maintains original document formatting
- Instant Translation - Fast text-to-text translation
- Context-Aware - Preserves semantic meaning
- Copy to Clipboard - Easy result sharing
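The audio and document formats listed above suggest a simple client-side check before uploading. The following is a minimal sketch, assuming files are routed by extension; classifyUpload and endpointFor are hypothetical helpers, not part of the Transearly codebase, and the endpoint paths are the ones documented in the API section of this README.

```typescript
// Formats taken from the feature list above.
const AUDIO_EXTENSIONS = ["mp3", "wav", "flac", "ogg", "webm"];
const DOCUMENT_EXTENSIONS = ["pdf", "docx", "xlsx", "pptx", "csv"];

type UploadKind = "audio" | "document" | "unsupported";

// Hypothetical helper: classify a file name by its extension (case-insensitive).
function classifyUpload(fileName: string): UploadKind {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0 || dot === fileName.length - 1) return "unsupported";
  const ext = fileName.slice(dot + 1).toLowerCase();
  if (AUDIO_EXTENSIONS.indexOf(ext) !== -1) return "audio";
  if (DOCUMENT_EXTENSIONS.indexOf(ext) !== -1) return "document";
  return "unsupported";
}

// Hypothetical helper: pick the endpoint for a file, or null if unsupported.
function endpointFor(fileName: string): string | null {
  const kind = classifyUpload(fileName);
  if (kind === "audio") return "/translator/audio";
  if (kind === "document") return "/translator/upload";
  return null;
}
```

Rejecting unsupported files on the device avoids a round trip to the server; the backend would still need to validate uploads itself.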
┌─────────────────────────────────────────────────────────────┐
│                     TRANSEARLY PLATFORM                     │
└─────────────────────────────────────────────────────────────┘
┌──────────────────────┐         ┌──────────────────────────┐
│  Mobile App          │         │  Backend API             │
│  (React Native)      │◄───────►│  (NestJS)                │
│                      │  REST   │                          │
│  • Camera Screen     │WebSocket│  • Translation Service   │
│  • Voice Recording   │         │  • Queue Processing      │
│  • Text Input        │         │  • Google Vision OCR     │
│  • File Upload       │         │  • AI Translation        │
└──────────────────────┘         └──────────────────────────┘
                                              │
                       ┌──────────────────────┼──────────────────┐
                       │                      │                  │
                 ┌─────▼──────┐       ┌───────▼──────┐    ┌──────▼─────┐
                 │  Google    │       │  OpenRouter  │    │   Redis    │
                 │  Cloud     │       │  (Gemini)    │    │   Queue    │
                 │  Vision    │       │              │    │            │
                 └────────────┘       └──────────────┘    └────────────┘
transearly/
├── transearly-api/             # Backend NestJS API Server
│   ├── src/
│   │   ├── modules/
│   │   │   └── translator/     # Translation services & controllers
│   │   ├── main.ts
│   │   └── app.module.ts
│   └── package.json
│
└── mobile-app/                 # React Native Mobile Application
    ├── src/
    │   ├── screens/            # App screens
    │   ├── components/         # Reusable components
    │   ├── services/           # API integration
    │   └── navigation/         # Navigation setup
    └── package.json
- Node.js 18+
- npm or yarn
- Expo CLI (for mobile app)
- Google Cloud Vision API credentials
- OpenRouter API key
git clone https://github.com/your-org/transearly.git
cd transearly
cd transearly-api
npm install
# Create .env file
cp .env.example .env
# Add your credentials to .env
GOOGLE_APPLICATION_CREDENTIALS=./google-cloud-key.json
OPENROUTER_API_KEY=your_openrouter_key
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1/chat/completions
OPENROUTER_MODEL=google/gemini-2.0-flash-exp:free
# Start the server
npm run start:dev
The API will run on http://localhost:5010
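A missing credential tends to surface only when the first translation request fails, so it can help to check the .env values at startup. The sketch below assumes the four variables listed above are all required; missingEnvKeys and assertEnv are hypothetical helpers and are not part of the Transearly source.

```typescript
// Variable names come from the .env listing above.
const REQUIRED_ENV_KEYS = [
  "GOOGLE_APPLICATION_CREDENTIALS",
  "OPENROUTER_API_KEY",
  "OPENROUTER_BASE_URL",
  "OPENROUTER_MODEL",
];

type Env = { [key: string]: string | undefined };

// Hypothetical helper: return the required variables that are missing or blank.
function missingEnvKeys(env: Env): string[] {
  return REQUIRED_ENV_KEYS.filter(function (key) {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Hypothetical helper: throw one readable error listing everything missing.
function assertEnv(env: Env): void {
  const missing = missingEnvKeys(env);
  if (missing.length > 0) {
    throw new Error("Missing environment variables: " + missing.join(", "));
  }
}
```

In a Node.js entry point this could be called as assertEnv(process.env) before the server starts listening.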
cd mobile-app
npm install
# Create .env file
cp .env.example .env
# Update API_URL in src/config/api.config.js
# For local development: http://YOUR_LOCAL_IP:5010
# Start the app
npm start
# or use tunnel mode
npm run tunnel
- Go to Google Cloud Console
- Create a new project or select an existing one
- Enable the following APIs:
  - Cloud Vision API (for image OCR)
  - Cloud Speech-to-Text API (for audio transcription)
- Create a Service Account and download its JSON key
- Save the key as google-cloud-key.json in transearly-api/
- Framework: NestJS (Node.js)
- Queue: Bull + Redis
- OCR: Google Cloud Vision API
- Speech-to-Text: Google Cloud Speech API
- AI Translation: OpenRouter (Gemini Flash)
- WebSocket: Socket.io
- Document Processing: pdf-lib, mammoth, exceljs, pptxgenjs
- Framework: React Native + Expo
- UI Components: React Native core components
- Navigation: React Navigation
- HTTP Client: Axios
- Real-time: Socket.io Client
- Media: Expo Camera, Expo Audio
- Google Cloud Vision API - Text detection and OCR
- Google Cloud Speech API - Speech-to-Text transcription
- OpenRouter - AI translation (Gemini Flash model)
- Redis - Queue management
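OpenRouter's chat completions endpoint accepts an OpenAI-style JSON body with a model name and a list of messages, authorized by a Bearer token. The sketch below shows how a translation request could be assembled from the .env values; buildTranslationRequest, its config parameter, and the prompt wording are assumptions for illustration and do not reflect the prompt the backend actually uses. It only builds the request and does not send it.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ChatRequest {
  url: string;
  headers: { [name: string]: string };
  body: { model: string; messages: ChatMessage[] };
}

// Hypothetical helper: build (but do not send) a translation request.
// The prompt wording is an assumption, not the project's actual prompt.
function buildTranslationRequest(
  text: string,
  targetLanguage: string,
  config: { apiKey: string; baseUrl: string; model: string }
): ChatRequest {
  return {
    url: config.baseUrl,
    headers: {
      Authorization: "Bearer " + config.apiKey,
      "Content-Type": "application/json",
    },
    body: {
      model: config.model,
      messages: [
        {
          role: "system",
          content:
            "Translate the user's text into " +
            targetLanguage +
            ". Reply with the translation only.",
        },
        { role: "user", content: text },
      ],
    },
  };
}
```

Keeping request construction separate from the HTTP call makes the prompt easy to unit test without spending API quota.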
🇻🇳 Vietnamese | 🇬🇧 English | 🇪🇸 Spanish | 🇫🇷 French | 🇩🇪 German | 🇯🇵 Japanese | 🇰🇷 Korean | 🇨🇳 Chinese
┌─────────────────────────────────┐
│ [<]  Image Translation     [🇻🇳] │
├─────────────────────────────────┤
│                                 │
│    ┌──────────────────┐         │
│    │ 祝通宝玉萃院...  │◄─ Tap   │
│    └──────────────────┘         │
│                                 │
│   ┌─────────────────────────┐   │
│   │ Translation             │   │
│   │ Original: 祝通宝玉...   │   │
│   │ Translated: Chúc bạn... │   │
│   └─────────────────────────┘   │
│                                 │
│         [📷 Translate]          │
└─────────────────────────────────┘
POST /translator/text # Translate text
POST /translator/image # Translate image with OCR
POST /translator/audio # Translate audio/voice (Speech-to-Text + Translation)
POST /translator/upload # Upload document for translation
GET /translator/status/:jobId # Check translation job status
GET /translator/download/:fileName # Download translated document
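The status and download routes take a path parameter, so job IDs and file names should be URL-encoded before they are placed in the path (translated file names may contain spaces or non-ASCII characters). A minimal sketch, assuming the base URL is the API address from the setup steps; joinUrl, statusUrl, and downloadUrl are hypothetical helper names.

```typescript
// Hypothetical helper: join a base URL and a path without doubling slashes.
function joinUrl(baseUrl: string, path: string): string {
  return baseUrl.replace(/\/+$/, "") + path;
}

// URL for GET /translator/status/:jobId
function statusUrl(baseUrl: string, jobId: string): string {
  return joinUrl(baseUrl, "/translator/status/" + encodeURIComponent(jobId));
}

// URL for GET /translator/download/:fileName
function downloadUrl(baseUrl: string, fileName: string): string {
  return joinUrl(baseUrl, "/translator/download/" + encodeURIComponent(fileName));
}
```

For example, statusUrl("http://localhost:5010", "42") yields http://localhost:5010/translator/status/42.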
{
"success": true,
"targetLanguage": "Vietnamese",
"segments": [
{
"position": {
"x": 15.5,
"y": 20.3,
"width": 30.2,
"height": 5.1
},
"original": "祝 通 宝 玉 萃 院 之 旅 充 满 快 乐",
"translated": "Chúc chuyến đi đến học viện Ngọc Bảo Tụy đầy niềm vui"
}
]
}

{
"success": true,
"originalText": "Hello, how are you today?",
"translatedText": "Xin chào, hôm nay bạn khỏe không?",
"audioDetails": {
"duration": 3,
"language": "en-US"
}
}
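To draw tap-to-translate boxes, the position of each image segment has to be mapped onto the rendered image. The sketch below assumes the position values are percentages (0 to 100) of the image width and height, which the sample values suggest but this README does not state; toPixelRect and containsPoint are hypothetical helpers.

```typescript
interface SegmentPosition {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface PixelRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Assumption: position values are percentages (0-100) of the image size.
function toPixelRect(
  position: SegmentPosition,
  imageWidth: number,
  imageHeight: number
): PixelRect {
  return {
    left: (position.x / 100) * imageWidth,
    top: (position.y / 100) * imageHeight,
    width: (position.width / 100) * imageWidth,
    height: (position.height / 100) * imageHeight,
  };
}

// Hypothetical hit test: does a tap at (px, py) fall inside the rectangle?
function containsPoint(rect: PixelRect, px: number, py: number): boolean {
  return (
    px >= rect.left &&
    px <= rect.left + rect.width &&
    py >= rect.top &&
    py <= rect.top + rect.height
  );
}
```

If the API instead returns absolute pixel coordinates, the conversion step can be dropped and only the hit test is needed.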
- Trương Nguyễn Tiến Đạt - Full-stack Developer
- Nguyễn Minh Thắng - Full-stack Developer
- Nguyễn Bá Trung Nguyên - Backend Developer
- Nguyễn Hữu Anh Tuấn - Backend Developer
- ✅ Text Translation
- ✅ Image Translation with OCR
- ✅ Voice Recording & Translation (Speech-to-Text + AI Translation)
- ✅ Audio File Support (MP3, WAV, FLAC, OGG, WEBM)
- ✅ Document Translation (PDF, DOCX, XLSX, PPTX, CSV)
- ✅ Real-time WebSocket Updates
- ✅ Interactive Bounding Box UI
- 🚧 Multi-language Audio Output
- 🚧 Offline Translation Mode
- 🚧 Translation History & Favorites
We welcome contributions! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Google Cloud Vision API for powerful OCR capabilities
- OpenRouter for AI translation services
- NestJS for the robust backend framework
- Expo for streamlined React Native development
For questions or support, please open an issue or contact the team.