[2025 ICT Award Korea, University Division: Grand Prize, Korea Information Processing Society]
"Hear the road, see the road." VisualVroom is an innovative wearable application that pairs smartphones and smartwatches to provide deaf drivers with real-time visual and haptic alerts for traffic sounds. Using AI-powered audio analysis, the app detects emergency vehicles, motorcycles, and car horns while determining their direction, delivering critical safety information through visual cues and vibration patterns.
- Vehicle Type Detection: Distinguishes between sirens, motorcycles, and car horns
- Directional Awareness: Uses smartphone stereo microphones to identify sound direction (left/right)
- Real-time Processing: Continuous audio monitoring with instant alerts
- Live Transcription: Converts speech to text using Google Speech-to-Text
- Sign Language Generation: Creates sign language images using Google Gemini
- Accessibility Support: Helps deaf drivers communicate with law enforcement and others
- Audio Capture: Stereo microphones capture ambient sound
- Feature Extraction:
- Generate spectrograms for frequency analysis
- Extract MFCC (Mel-Frequency Cepstral Coefficients) features
- Stitch features into a unified image representation
- AI Classification: Vision Transformer (ViT) model processes the stitched image representation of the audio
- Direction Detection: Amplitude analysis determines left/right orientation
- Alert Delivery: Results sent to smartwatch for haptic and visual feedback
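The pipeline above can be sketched in a few lines of Python. This is a minimal, self-contained illustration using only NumPy; the production backend uses librosa and a ViT model, and the function names here (`stitch_features`, `detect_direction`) are illustrative, not the project's actual API.

```python
import numpy as np

def spectrogram(mono: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Magnitude spectrogram via a short-time FFT (a stand-in for librosa.stft)."""
    frames = [mono[i:i + n_fft] for i in range(0, len(mono) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames) * np.hanning(n_fft), axis=1)).T

def stitch_features(spec: np.ndarray, mfcc_like: np.ndarray) -> np.ndarray:
    """Stack feature maps vertically into one grayscale 'image' for the classifier."""
    width = min(spec.shape[1], mfcc_like.shape[1])
    return np.vstack([spec[:, :width], mfcc_like[:, :width]])

def detect_direction(left: np.ndarray, right: np.ndarray) -> str:
    """Amplitude (RMS) comparison between the stereo channels decides left vs. right."""
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    return "L" if rms(left) > rms(right) else "R"

# Toy stereo clip: the same tone, louder on the left channel.
sr = 48_000
t = np.linspace(0, 1, sr, endpoint=False)
left = 0.8 * np.sin(2 * np.pi * 700 * t)
right = 0.2 * np.sin(2 * np.pi * 700 * t)

spec = spectrogram(left)
image = stitch_features(spec, spec[:20])  # MFCC stand-in: a few spectrogram rows
print(detect_direction(left, right))      # -> L
```

In the real app the stitched image is resized to the model's input resolution before inference; here the shapes are only meant to show the stacking step.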
- Language: Java
- IDE: Android Studio
- Framework: Android SDK (API Level 30+)
- Wearable: WearOS by Google
- UI Components:
- Lottie animations
- Material Design components
- ViewPager2 for tabbed interface
- Framework: FastAPI (Python)
- AI/ML:
- PyTorch with Vision Transformer (ViT)
- librosa for audio processing
- Whisper AI for speech-to-text
- Audio Processing:
- soundfile, pydub for audio manipulation
- numpy for numerical operations
- Infrastructure: Google Compute Engine
- Google Speech-to-Text: Audio transcription
- Google Gemini: Sign language image generation
- Google Wearable API: Watch communication
mobile/
├── src/main/java/edu/skku/cs/visualvroomandroid/
│ ├── MainActivity.java # Main activity with tab navigation
│ ├── AudioRecorderFragment.java # Sound detection interface
│ ├── SpeechToTextFragment.java # Speech-to-sign conversion
│ ├── AudioRecorder.java # Audio recording logic
│ ├── AudioRecordingService.java # Background audio service
│ ├── WearNotificationService.java # Watch communication
│ └── dto/ # Data transfer objects
├── src/main/res/
│ ├── layout/ # UI layouts (portrait/landscape)
│ ├── raw/ # Lottie animation files
│ └── values/ # App resources
└── AndroidManifest.xml # App permissions and services
wear/
├── src/main/java/edu/skku/cs/visualvroomandroid/presentation/
│ └── MainActivity.java # Watch app main activity
├── src/main/res/layout/
│ └── activity_main.xml # Watch UI layout
└── AndroidManifest.xml # Watch app manifest
backend/
└── main.py # FastAPI server with:
# - ViT model inference
# - Audio processing pipeline
# - Whisper transcription
# - API endpoints
- Android Studio (latest version)
- Android device with API level 30+
- WearOS smartwatch (optional but recommended)
- Python 3.8+ (for backend development)
- Clone the frontend repository:
git clone https://github.com/GDG-SKKU/VisualVroom_Android_GDG.git
cd VisualVroom_Android_GDG
- Open the project in Android Studio and build it
- Grant required permissions:
- Microphone access
- Location access
- Notification permissions
- Clone the backend repository:
git clone https://github.com/GDG-SKKU/VisualVroom_Backend_GDG.git
cd VisualVroom_Backend_GDG
- Install dependencies:
pip install -r requirements.txt
- Run the FastAPI server:
python main.py
Sound Detection Mode:
- Launch the app and navigate to "Audio Recorder" tab
- Tap the microphone button to start continuous monitoring
- Visual alerts appear on phone, haptic feedback on watch
Speech-to-Sign Mode:
- Navigate to "Speech to Text" tab
- Tap record button and speak
- View transcribed text and generated sign language images
- Sample Rate: 16kHz for speech, 48kHz for sound detection
- Channels: Stereo recording for directional detection
- Processing Interval: 3-second windows for continuous monitoring
- Confidence Threshold: 97% for high-accuracy alerts
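The capture settings above imply a simple processing loop: slice the stereo stream into 3-second windows and raise an alert only when classifier confidence clears 0.97. A NumPy sketch of that loop; the `should_alert` helper and constant names are illustrative, not the app's actual identifiers.

```python
import numpy as np

SAMPLE_RATE = 48_000   # sound-detection rate from the settings above
WINDOW_SEC = 3         # processing interval
CONF_THRESHOLD = 0.97  # only high-confidence predictions trigger alerts

def windows(stereo: np.ndarray, sr: int = SAMPLE_RATE, sec: int = WINDOW_SEC):
    """Yield consecutive non-overlapping (samples, 2) windows from a stereo buffer."""
    step = sr * sec
    for start in range(0, stereo.shape[0] - step + 1, step):
        yield stereo[start:start + step]

def should_alert(confidence: float) -> bool:
    return confidence >= CONF_THRESHOLD

# 10 seconds of stereo audio -> three full 3-second windows (the remainder waits).
audio = np.zeros((SAMPLE_RATE * 10, 2))
chunks = list(windows(audio))
print(len(chunks), chunks[0].shape)  # -> 3 (144000, 2)
```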
- Architecture: Vision Transformer (ViT-B/16)
- Classes: 6 total (Siren_L, Siren_R, Bike_L, Bike_R, Horn_L, Horn_R)
- Input Size: 224x224 grayscale images
- Checkpoint:
feb_25_checkpoint.pth
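Each of the six classes encodes both a vehicle type and a direction. A small sketch of decoding a 6-way logit vector into a (vehicle, side, confidence) triple; the class ordering is taken from the list above, while the softmax decode itself is an assumption about how the checkpoint's outputs are consumed.

```python
import numpy as np

# Class order as listed above: vehicle type crossed with left/right.
CLASSES = ["Siren_L", "Siren_R", "Bike_L", "Bike_R", "Horn_L", "Horn_R"]

def decode(logits: np.ndarray):
    """Softmax over the 6 logits, then split the winning label into (vehicle, side)."""
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    probs = exp / exp.sum()
    idx = int(probs.argmax())
    vehicle, side = CLASSES[idx].split("_")
    return vehicle, side, float(probs[idx])

vehicle, side, conf = decode(np.array([0.1, 0.2, 4.0, 0.3, 0.1, 0.2]))
print(vehicle, side)  # -> Bike L
```

Pairing the confidence with the 97% threshold from the audio configuration would then gate whether the watch receives an alert at all.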
- Free Tier: 0-60 minutes per month
- Paid Tier: $0.016/minute beyond 60 minutes
- Monthly Estimate: ~$4 per driver (based on 300 minutes usage)
- Cost: Sign language generation assumed free (not billed in this estimate)
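The monthly estimate follows directly from the pricing tiers above. A quick check of the arithmetic, assuming the quoted 300 minutes of monthly usage:

```python
FREE_MINUTES = 60         # free tier: first 60 minutes per month
RATE_PER_MINUTE = 0.016   # USD per minute beyond the free tier

def monthly_cost(minutes_used: float) -> float:
    """Bill only the minutes beyond the free tier."""
    billable = max(0, minutes_used - FREE_MINUTES)
    return billable * RATE_PER_MINUTE

print(monthly_cost(300))  # (300 - 60) * 0.016 = 3.84, i.e. ~$4 per driver
```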
