Speech-to-text App (Flask & OpenAI)

speech-to-text.mov

Run app locally using Docker

Install Docker

Run the following:

docker pull ameliaes/speech-to-text:latest
docker run -p 5000:5000 ameliaes/speech-to-text

View app at http://127.0.0.1:5000/

What I have learnt:

Set up a Flask web server to serve HTML and handle POST requests, using Flask's @app.route() decorator to map URLs to functions.
Used OpenAI Whisper to transcribe audio files on the server side. The model runs locally, without calling an API or a paid service!
Learnt how to record audio in the browser using the MediaRecorder API.
Learnt some more JavaScript concepts, e.g.
- Asynchronous programming with JavaScript. This is a technique that lets your program start a long-running task while still responding to other events as that task runs. In this project it was used when asking for permission to access the user's microphone in the browser.
  - Callbacks can be used to implement asynchronous functions, but they can become deeply nested and hard to debug (i.e. "callback hell"). Instead, we can use JavaScript promises: the function starts the operation and returns a Promise object. We can attach handlers to this Promise object, which are executed when the operation succeeds or fails.
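
To make the promise idea concrete, here is a minimal sketch rather than the app's actual code. In the browser, navigator.mediaDevices.getUserMedia({ audio: true }) returns a Promise; the requestMicrophone function below is a hypothetical stand-in for it so the example also runs in Node.js, and the startRecording names and returned strings are likewise made up for illustration.

```javascript
// Hypothetical stand-in for navigator.mediaDevices.getUserMedia({ audio: true }),
// so the sketch can run outside a browser (e.g. in Node.js).
function requestMicrophone(userAllows) {
  return new Promise((resolve, reject) => {
    // setTimeout simulates the wait while the user answers the permission prompt.
    setTimeout(() => {
      if (userAllows) {
        resolve({ kind: "audio-stream" }); // placeholder for a real MediaStream
      } else {
        reject(new Error("Permission denied"));
      }
    }, 10);
  });
}

// Attach handlers to the promise: .then runs on success, .catch on failure.
function startRecording(userAllows) {
  return requestMicrophone(userAllows)
    .then((stream) => `recording from ${stream.kind}`)
    .catch((err) => `cannot record: ${err.message}`);
}

// The same flow with async/await, which is alternative syntax for promises.
async function startRecordingAsync(userAllows) {
  try {
    const stream = await requestMicrophone(userAllows);
    return `recording from ${stream.kind}`;
  } catch (err) {
    return `cannot record: ${err.message}`;
  }
}

// The calls return immediately; the messages are logged once each promise settles.
startRecording(true).then(console.log);
startRecordingAsync(false).then(console.log);
```

Either style avoids nesting one callback inside another: each step returns a promise, so success and failure handling stay flat and in one place.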

What next...

I'm working on a couple of GitHub issues to make this app better: adding more E2E tests, alerting the user (whilst recording) if the recording becomes too large, and checking the security of the app.

Essentially, this project was a quick introduction to using Flask with OpenAI Whisper. In the long term, the goal is to set up an offline home voice assistant. So instead of using Alexa and sending our data to Amazon, we can have our own local system running off a Raspberry Pi. It's pretty amazing that the offline models from OpenAI are freely available to make this home system possible.
