Skip to content

Maiki04/PythonSpeechRecognition1

 
 

Repository files navigation

SpeechToText

Overview

This script implements a speech recognition service that listens for audio input and provides recognized text through a FastAPI-based web server. It uses a speech recognition engine (default: Vosk) to continuously process speech from the microphone and updates the recognized text in real time.

The FastAPI server offers multiple endpoints for connectivity, including WebSocket support for real-time updates to connected clients.

The script runs the speech recognition loop in a separate thread, allowing it to continuously capture speech while the server handles API requests and WebSocket connections concurrently.

Instructions

Warning: Do not use with Python 3.13, as most plugins are not yet compatible. Instead, use Python 3.11 or 3.12.
I personally use 3.12.8 at the moment.

How to run

... if you have multiple python installations (and the default installation is not 3.12):

  • In your terminal locate the Python312 folder.
    For me it's: C:\Users\Admin\AppData\Local\Programs\Python\Python312

Make sure the terminal runs on this folder!

  • In the terminal run: python.exe -m pip install RequiredPackageNameHere
    Refer to the package section below.

  • In VSCode or your editor of choice open a terminal (doesn't need to be on the Python312 folder path.)
    and run: py -V:3.12 .\speech_to_text.py

OR (Recommended) you can setup a venv (virtual environment) based on your python 3.12 installation.

OR you can switch the global installation by going into the windows enviroment variables and changing the entries

  • C:\Users\Admin\AppData\Local\Programs\Python\Python313\Scripts\

and

  • C:\Users\Admin\AppData\Local\Programs\Python\Python313\

in the system (variable) path to use Python312 (make sure python 3.12 is installed).


When you want to target a different Vosk model for speech recognition just change the vosk_model_path path in the corresponding config.json (example: keyword_recognition_config.json).

You can download additional Vosk models from https://alphacephei.com/vosk/models.

Required Packages

Important:
Check the requirements.txt for a detailed overview!

  • PyAudio
    PyAudio provides Python bindings for PortAudio v19, the cross-platform audio I/O library.

  • soundfile
    Library for reading and writing sound files.

  • uvicorn
    Uvicorn is an ASGI web server implementation for Python.

  • fastapi
    FastAPI is a modern, high-performance, web framework for building APIs with Python based on standard Python type hints.

  • Levenshtein
    Levenshtein is a string metric that measures the minimum number of single-character edits (insertions, deletions, etc.). In STT the package is useful for implementing fuzzy matching to account for mispronunciations or transcription errors in keyword detection.


Depending on the model(s) you want to use, not all of these packages may be required. Select the ones you need.

  • vosk (Recommended!)
    Vosk is an offline open source speech recognition toolkit.

  • tensorflow
    TensorFlow is an open source software library for high performance numerical computation.

  • openai-whisper

    • torch
      PyTorch, a deep learning framework.

    • whisper (Not sure if required!)
      A text-to-speech library.

    • openai-whisper
      OpenAI's Whisper model for speech-to-text transcription.

Sources

Speech recognition in python (great for general learning)

About

This project showcases real-time speech-to-text using Python and web frameworks, integrated with Godot. The Python script processes audio input, converts it to text, and serves it via RESTful or WebSocket endpoints. The Godot example fetches and processes the text, demonstrating the integration of speech recognition into an application.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 64.6%
  • GDScript 35.4%