ReturnZero Speech-to-Text Kotlin Client

This project demonstrates how to use ReturnZero's Speech-to-Text API with Kotlin using gRPC for streaming audio recognition.

Download Audio Example

YOUTUBE_ID=''  # ID of the video to download
yt-dlp --extract-audio --audio-format wav --audio-quality 0 "https://www.youtube.com/watch?v=${YOUTUBE_ID}" -o "${YOUTUBE_ID}.wav"

Prerequisites

JDK 17 or higher
Gradle (for local execution)
Docker (for containerized execution)
RTZR AI API credentials (client ID and client secret)
Audio file for testing (WAV, AU, or AIFF format)

Project Structure

├── Dockerfile                         # Docker configuration
├── build.gradle.kts                   # Gradle build configuration
├── settings.gradle.kts                # Gradle settings
├── run-local.sh                       # Script to run locally with Gradle
├── build-and-run.sh                   # Script to build and run with Docker
├── src/
│   ├── main/
│   │   ├── kotlin/
│   │   │   └── ai/
│   │   │       └── returnzero/
│   │   │           ├── Main.kt                # Main application
│   │   │           ├── ReturnZeroClient.kt    # API client
│   │   │           └── FileStreamer.kt        # Audio file streaming utility
│   │   └── proto/
│   │       └── vito-stt-client.proto          # gRPC protocol definition

Running the Application

You can run this application in two ways:

1. Local Execution (run-local.sh)

Run directly on your local machine using Gradle.

Set your API credentials as environment variables:

export RTZR_CLIENT_ID="your_client_id"
export RTZR_CLIENT_SECRET="your_client_secret"

Make the script executable:
```
chmod +x run-local.sh
```

Run the application with an audio file:

# Normal mode (play once)
./run-local.sh /path/to/your/audio/file.wav

# Repeat mode (infinite loop)
./run-local.sh --repeat /path/to/your/audio/file.wav

2. Docker Execution (build-and-run.sh)

Build and run in a Docker container.

Set your API credentials as environment variables:

export RTZR_CLIENT_ID="your_client_id"
export RTZR_CLIENT_SECRET="your_client_secret"

Make the script executable:
```
chmod +x build-and-run.sh
```

Run the application with an audio file:

# Normal mode (play once)
./build-and-run.sh /path/to/your/audio/file.wav

# Repeat mode (infinite loop)
./build-and-run.sh --repeat /path/to/your/audio/file.wav

This script will:

Build a Docker image for the application
Mount the directory containing your audio file
Run the container with your API credentials
Process the audio file and output the transcription results

Command Line Options

The application supports the following command line options:

--repeat: Enable infinite loop mode for audio streaming. The audio file will be repeated continuously until the application is terminated.
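
As a rough illustration of how the `--repeat` flag could be separated from the audio path, here is a minimal Kotlin sketch. `CliOptions` and `parseCliOptions` are hypothetical names invented for this example; the actual parsing in Main.kt may differ.

```kotlin
// Hypothetical holder for the parsed command line; not taken from Main.kt.
data class CliOptions(val repeat: Boolean, val audioPath: String)

// Parses `[--repeat] <audio-file>` and throws IllegalArgumentException on bad input.
fun parseCliOptions(args: Array<String>): CliOptions {
    val repeat = "--repeat" in args
    // Everything that is not the known flag must be the single audio path.
    val positional = args.filter { it != "--repeat" }
    require(positional.size == 1) { "Usage: [--repeat] <audio-file>" }
    val unknown = positional.firstOrNull { it.startsWith("--") }
    require(unknown == null) { "Unknown option: $unknown" }
    return CliOptions(repeat, positional.single())
}
```

Calling `parseCliOptions(arrayOf("--repeat", "/path/to/your/audio/file.wav"))` would yield options with `repeat` set to `true`.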

How It Works

The application authenticates with the RTZR STT API to obtain an access token.
It establishes a gRPC connection to the streaming STT service.
The audio file is read and streamed in chunks to simulate real-time audio.
The API returns both interim and final transcription results.
Results are printed to the console as they are received.
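
The chunked, real-time streaming step can be sketched roughly as follows. This is not the code from FileStreamer.kt: the function names, the 100 ms default chunk length, and the assumption of 16-bit mono PCM (LINEAR16) are all illustrative, and `send` stands in for whatever writes a chunk to the gRPC request stream.

```kotlin
// Bytes of 16-bit mono PCM that cover `chunkMillis` of audio at `sampleRate` Hz.
fun chunkSizeBytes(sampleRate: Int, chunkMillis: Int): Int =
    sampleRate * 2 * chunkMillis / 1000

// Sends `pcm` chunk by chunk, pausing after each chunk so that data leaves
// at roughly playback speed. `sleep` is injectable so the pacing can be
// disabled in tests.
fun streamInRealTime(
    pcm: ByteArray,
    sampleRate: Int,
    chunkMillis: Int = 100,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    send: (ByteArray) -> Unit,
) {
    val size = chunkSizeBytes(sampleRate, chunkMillis)
    require(size > 0) { "chunk size must be positive" }
    var offset = 0
    while (offset < pcm.size) {
        val end = minOf(offset + size, pcm.size)
        send(pcm.copyOfRange(offset, end))  // the last chunk may be shorter
        sleep(chunkMillis.toLong())
        offset = end
    }
}
```

At 8000 Hz, a 100 ms chunk is 1600 bytes, so the service receives about as much audio per second as a live microphone would produce.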

Customizing

Speech Recognition Parameters

To modify the speech recognition parameters, edit the DecoderConfig in Main.kt:

val config = DecoderConfig.newBuilder()
    .setSampleRate(8000)                                // Audio sample rate (Hz)
    .setEncoding(DecoderConfig.AudioEncoding.LINEAR16)  // Audio encoding
    .setUseItn(true)                                    // Inverse text normalization
    .setUseDisfluencyFilter(false)                      // Filter disfluencies
    .setUseProfanityFilter(false)                       // Filter profanity
    .setModelName("sommers_ko")                         // Language model to use
    .build()
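
The value passed to `setSampleRate` presumably has to match the audio that is actually streamed. One way to avoid hard-coding it is to read the rate from the file header with the JDK's `javax.sound.sampled` package, as in the sketch below. `detectSampleRate` is a hypothetical helper, and whether the sample project reads the header this way is an assumption.

```kotlin
import java.io.File
import javax.sound.sampled.AudioSystem

// Reads the sample rate (Hz) from the header of an audio file that the JDK
// can parse, such as WAV, AU, or AIFF. Throws UnsupportedAudioFileException
// for formats the JDK does not recognize.
fun detectSampleRate(file: File): Int =
    AudioSystem.getAudioFileFormat(file).format.sampleRate.toInt()
```

The result could then be passed to the builder, for example `.setSampleRate(detectSampleRate(File(path)))`, where `path` is the audio file given on the command line.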

Keyword Boosting

You can enhance recognition for specific keywords by adding them to the configuration:

.addAllKeywords(listOf(
    "keyword1",          // Default score: 2.0
    "keyword2:3.5",      // Higher score: 3.5 (better recognition)
    "keyword3:-1"        // Lower score: -1 (reduced recognition)
))

Notes for keyword boosting:

Scores must be between -5.0 and 5.0.
Korean keywords must be written as they are pronounced in Korean (e.g., "에스티티" instead of "STT").
Each keyword can be at most 20 characters long, and you can add up to 100 keywords.
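
The limits in these notes can be checked before the keywords are sent. The sketch below is one possible client-side check, not part of the sample project: `validateKeywords` and the constant names are hypothetical, and it assumes that "characters" corresponds to Kotlin's `String.length`.

```kotlin
// Limits quoted in the notes above.
const val MAX_KEYWORDS = 100
const val MAX_KEYWORD_LENGTH = 20
val SCORE_RANGE = -5.0..5.0

// Returns a list of human-readable problems; an empty list means the input passed.
fun validateKeywords(entries: List<String>): List<String> {
    val problems = mutableListOf<String>()
    if (entries.size > MAX_KEYWORDS) {
        problems += "too many keywords: ${entries.size} > $MAX_KEYWORDS"
    }
    for (entry in entries) {
        // An entry is either "keyword" or "keyword:score".
        val hasScore = ':' in entry
        val keyword = if (hasScore) entry.substringBeforeLast(':') else entry
        if (keyword.isEmpty() || keyword.length > MAX_KEYWORD_LENGTH) {
            problems += "'$entry': keyword must be 1..$MAX_KEYWORD_LENGTH characters"
        }
        if (hasScore) {
            val score = entry.substringAfterLast(':').toDoubleOrNull()
            if (score == null || score !in SCORE_RANGE) {
                problems += "'$entry': score must be a number in $SCORE_RANGE"
            }
        }
    }
    return problems
}
```

Running the check on the list before passing it to `addAllKeywords` reports out-of-range scores or overlong keywords locally instead of leaving them for the API to reject.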

Troubleshooting

Authentication Error: Verify your client ID and client secret are correct.
File Format Error: Ensure your audio file is in one of the supported formats (WAV, AU, or AIFF).
Docker Issues: Make sure Docker is installed and running correctly.

License

This project is licensed under the MIT License - see the LICENSE file for details.
