SpeechToText

Overview

This script implements a speech recognition service that listens for audio input and provides recognized text through a FastAPI-based web server. It uses a speech recognition engine (default: Vosk) to continuously process speech from the microphone and updates the recognized text in real time.
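Vosk reports recognition results as JSON strings. `KaldiRecognizer.Result()` and `FinalResult()` return an object with a `"text"` field once an utterance is complete, and `PartialResult()` returns a `"partial"` field with the words heard so far. Below is a minimal sketch of turning those strings into text, assuming only that format. `parse_vosk_result` is a hypothetical helper, not a function from speech_to_text.py:

```python
import json


def parse_vosk_result(raw: str) -> tuple[str, bool]:
    """Turn a Vosk JSON result string into (text, is_final).

    Final results (Result / FinalResult) carry a "text" field;
    partial results (PartialResult) carry a "partial" field.
    """
    data = json.loads(raw)
    if "text" in data:
        return data["text"], True
    # Partial result, or an empty object: fall back to an empty string.
    return data.get("partial", ""), False
```

In a typical Vosk loop, the return value of `recognizer.AcceptWaveform(chunk)` decides which method to read. When it returns True, `recognizer.Result()` holds the finished utterance. Otherwise, `recognizer.PartialResult()` holds the text recognized so far.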

The FastAPI server exposes several endpoints for clients to connect to, including a WebSocket endpoint that pushes recognized text to connected clients in real time.

The script runs the speech recognition loop in a separate thread, allowing it to continuously capture speech while the server handles API requests and WebSocket connections concurrently.
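The split between a background recognition thread and the server can be sketched with the standard library alone. `RecognizedText`, `recognition_loop` and `start_recognition_thread` are hypothetical names that are not taken from speech_to_text.py. The recognizer is replaced by a plain iterable of strings so the sketch runs without a microphone:

```python
import threading
from typing import Iterable


class RecognizedText:
    """Thread-safe holder for the latest recognized text (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._text = ""
        self._version = 0  # bumped on every change so readers can detect updates

    def set(self, text: str) -> None:
        with self._lock:
            if text != self._text:
                self._text = text
                self._version += 1

    def get(self) -> tuple[str, int]:
        with self._lock:
            return self._text, self._version


def recognition_loop(results: Iterable[str], store: RecognizedText,
                     stop: threading.Event) -> None:
    """Copy recognizer output into `store` until the input ends or `stop` is set."""
    for text in results:
        if stop.is_set():
            break
        store.set(text)


def start_recognition_thread(results: Iterable[str],
                             store: RecognizedText) -> tuple[threading.Thread, threading.Event]:
    """Run recognition_loop in a daemon thread so the main thread stays free for the server."""
    stop = threading.Event()
    worker = threading.Thread(target=recognition_loop, args=(results, store, stop), daemon=True)
    worker.start()
    return worker, stop
```

In the real script, `results` would be the loop that reads microphone audio and yields recognized text. A FastAPI route could return `store.get()[0]`, and a WebSocket handler could poll `store.get()` and send the text only when the version number has changed. This mirrors the design described above but is not the project's actual code.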

Instructions

Warning: Do not use Python 3.13, because most of the required packages are not yet compatible with it. Use Python 3.11 or 3.12 instead.
I currently use 3.12.8.
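A small guard at the top of a script can catch the wrong interpreter early, which is handy when several Python versions are installed. This is a hypothetical sketch: `is_supported` and `require_supported_python` are not part of the project, and the supported versions are taken from the warning above:

```python
import sys

# Supported (major, minor) versions, taken from the warning above.
SUPPORTED_VERSIONS = {(3, 11), (3, 12)}


def is_supported(version_info: tuple = tuple(sys.version_info)) -> bool:
    """Return True if the (major, minor) version is 3.11 or 3.12."""
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


def require_supported_python() -> None:
    """Exit with a helpful message when running under an unsupported interpreter."""
    if not is_supported():
        version = ".".join(map(str, sys.version_info[:3]))
        sys.exit(f"Python {version} ({sys.executable}) is not supported; use 3.11 or 3.12.")
```

The exit message includes `sys.executable`, so calling `require_supported_python()` before importing the speech packages shows exactly which installation the script was started with.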

How to run

If you have multiple Python installations and the default one is not 3.12:

In your terminal, navigate to the Python312 installation folder.
For me it's: C:\Users\Admin\AppData\Local\Programs\Python\Python312

Make sure the terminal's working directory is this folder!

In the terminal, run: python.exe -m pip install RequiredPackageNameHere
(see the Required Packages section below).
Then, in VS Code or your editor of choice, open a terminal (it does not need to be in the Python312 folder)
and run: py -V:3.12 .\speech_to_text.py

OR (recommended): set up a venv (virtual environment) based on your Python 3.12 installation.
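The standard library's `venv` module can create such an environment. A venv always uses the interpreter that creates it, so run this with Python 3.12 (for example via `py -V:3.12`). `create_env` and the `.venv` folder name are illustrative choices, not part of the project:

```python
from __future__ import annotations

import venv
from pathlib import Path


def create_env(env_dir: str | Path = ".venv", with_pip: bool = True) -> Path:
    """Create a virtual environment based on the interpreter running this code."""
    env_path = Path(env_dir)
    # with_pip=True bootstraps pip (via ensurepip) so packages can be installed afterwards.
    venv.create(env_path, with_pip=with_pip)
    return env_path
```

After `create_env()` finishes, install the dependencies with `.venv\Scripts\python.exe -m pip install -r requirements.txt` on Windows (`.venv/bin/python` on Linux and macOS).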

OR you can switch the global installation by opening the Windows environment variables and changing the entries

C:\Users\Admin\AppData\Local\Programs\Python\Python313\Scripts\

and

C:\Users\Admin\AppData\Local\Programs\Python\Python313\

in the system Path variable so that they point to the Python312 folders instead (make sure Python 3.12 is installed).

To use a different Vosk model for speech recognition, change the vosk_model_path value in the corresponding config file (for example keyword_recognition_config.json).
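Because the model is selected only through the config file, a wrong path is an easy mistake to make. Below is a minimal sketch of reading and checking the setting. It assumes the config is a JSON object with a `vosk_model_path` key; any other keys in the real file are ignored here. `load_vosk_model_path` is a hypothetical helper:

```python
from __future__ import annotations

import json
from pathlib import Path


def load_vosk_model_path(config_file: str | Path) -> Path:
    """Read "vosk_model_path" from a JSON config file and check the folder exists."""
    config_path = Path(config_file)
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    if "vosk_model_path" not in config:
        raise KeyError(f"'vosk_model_path' is missing from {config_path}")
    # Relative paths are resolved against the current working directory here;
    # the real script may resolve them differently.
    model_path = Path(config["vosk_model_path"])
    if not model_path.is_dir():
        raise FileNotFoundError(f"Vosk model folder not found: {model_path}")
    return model_path
```

For example, with `"vosk_model_path": "voskModels/vosk-model-small-en-us-0.15"` and the repository root as the working directory, the function returns that model folder or raises a clear error if it has not been downloaded.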

You can download additional Vosk models from https://alphacephei.com/vosk/models.

Required Packages

Important:
Check the requirements.txt for a detailed overview!

PyAudio
PyAudio provides Python bindings for PortAudio v19, the cross-platform audio I/O library.
soundfile
Library for reading and writing sound files.
uvicorn
Uvicorn is an ASGI web server implementation for Python.
fastapi
FastAPI is a modern, high-performance web framework for building APIs with Python based on standard Python type hints.
Levenshtein
The Levenshtein package computes the Levenshtein distance, a string metric that counts the minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn one string into another. In speech-to-text, this is useful for fuzzy keyword matching that tolerates mispronunciations and transcription errors.
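The fuzzy matching described above can be sketched in pure Python. The Levenshtein package provides the same metric as `Levenshtein.distance`. The version below spells out the dynamic-programming algorithm so it is visible. `find_keywords` and its default threshold of one edit are illustrative assumptions, not the project's keyword logic:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


def find_keywords(transcript: str, keywords: list[str], max_distance: int = 1) -> list[str]:
    """Return the keywords that match some transcript word within max_distance edits."""
    words = transcript.lower().split()
    return [kw for kw in keywords
            if any(levenshtein(kw.lower(), w) <= max_distance for w in words)]
```

For example, `find_keywords("jamp left", ["jump"])` returns `["jump"]`, because "jamp" is one substitution away from "jump". A fixed threshold works poorly for very short keywords, so scaling it with keyword length is a common variation.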

Depending on the model(s) you want to use, not all of these packages may be required. Select the ones you need.

vosk (Recommended!)
Vosk is an offline open source speech recognition toolkit.
tensorflow
TensorFlow is an open source software library for high performance numerical computation.
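Whisper (requires the packages below)
- torch
  PyTorch, a deep learning framework.
- whisper (not sure if required)
  Note: the PyPI package named whisper is an unrelated project. It is neither a text-to-speech library nor OpenAI's model.
- openai-whisper
  OpenAI's Whisper model for speech-to-text transcription.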
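To see which of these optional backends are installed without actually importing them (TensorFlow and PyTorch are slow to import), you can use `importlib.util.find_spec`. The mapping from PyPI names to import names below is an assumption to check against requirements.txt. Note that openai-whisper is imported as `whisper`:

```python
import importlib.util

# PyPI package name -> top-level import name (assumed mapping).
OPTIONAL_BACKENDS = {
    "vosk": "vosk",
    "tensorflow": "tensorflow",
    "torch": "torch",
    "openai-whisper": "whisper",
}


def available_backends(backends: dict[str, str] = OPTIONAL_BACKENDS) -> list[str]:
    """Return the PyPI names whose top-level module can be found, without importing it."""
    return [pkg for pkg, module in backends.items()
            if importlib.util.find_spec(module) is not None]
```

Printing `available_backends()` in the target environment shows which models can be selected before the script tries to load one.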

Folders and files

images
model
voskModels/vosk-model-small-en-us-0.15
.gitignore
LICENSE
README.md
example_godot_script.gd
keyword_recognition_config.json
requirements.txt
speech_to_text.py
