ssafar/whisper_win32

This is a GUI for OpenAI's Whisper speech-to-text model... or anything with a compatible API.

[Screenshot: main window]

Project status

As you might have guessed from the existence of various buttons for which there isn't a lot of explanation, this is a side project of mine that I have, for some reason, decided to put on the internet. Do not expect it to work in any meaningful way. (It might work though.)

The code might also provide excellent examples for

  • how to do some stuff in win32 (tray icons! global hotkeys! COM automation!)
  • how to not do stuff in win32 (the code is ugly, could use some cleanup, and this was my first attempt at tray icons, global hotkeys and COM automation.)

Usage

You hit record. You talk. You then press the button again (it is now labeled "Stop").

This will cause your recorded speech to be converted to MP3 and sent to Whisper. Once it responds, we insert the result into the application in the foreground.

You might already have noticed that this has questionable levels of usability, given that "the application in the foreground" is this GUI itself. To solve this, we register a global hotkey on F8, which does the same thing as the record / stop button.
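For the curious, here is roughly what a global F8 hotkey looks like. This is a sketch in Python via ctypes for brevity, not the project's actual code: `RecordToggle`, `run_hotkey_loop` and `HOTKEY_ID` are made-up names, and only `RegisterHotKey`, `UnregisterHotKey`, `GetMessageW` and the `VK_F8` / `WM_HOTKEY` constants come from Win32 itself.

```python
import sys

VK_F8 = 0x77        # Win32 virtual-key code for F8
WM_HOTKEY = 0x0312  # message posted when a registered hotkey is pressed
HOTKEY_ID = 1       # arbitrary id chosen for this sketch


class RecordToggle:
    """Tracks whether the next hotkey press starts or stops recording."""

    def __init__(self):
        self.recording = False

    def press(self):
        """Flip the state and return the label the button should now show."""
        self.recording = not self.recording
        return "Stop" if self.recording else "Record"


def run_hotkey_loop(toggle):
    """Register F8 system-wide and toggle on each press (Windows only)."""
    if sys.platform != "win32":
        raise OSError("RegisterHotKey is only available on Windows")
    import ctypes
    from ctypes import wintypes

    user32 = ctypes.windll.user32
    # hWnd=None delivers WM_HOTKEY to the calling thread's message queue.
    if not user32.RegisterHotKey(None, HOTKEY_ID, 0, VK_F8):
        raise ctypes.WinError()
    try:
        msg = wintypes.MSG()
        # GetMessageW returns 0 on WM_QUIT and -1 on error.
        while user32.GetMessageW(ctypes.byref(msg), None, 0, 0) > 0:
            if msg.message == WM_HOTKEY and msg.wParam == HOTKEY_ID:
                print("button now reads:", toggle.press())
    finally:
        user32.UnregisterHotKey(None, HOTKEY_ID)
```

Calling `run_hotkey_loop(RecordToggle())` on Windows blocks and reacts to F8 no matter which application has focus, which is the whole point. Registration fails if another program has already claimed F8.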

Setup

[Screenshot: settings]

You should start by either setting an OpenAI API key for the official OpenAI Whisper API or, if you're running Whisper locally, pointing the app at an endpoint that exposes the same API.
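"The same API" here presumably means OpenAI's `/audio/transcriptions` endpoint: a multipart POST with a `file` and a `model` field, answered with JSON that has a `text` field. A minimal sketch of such a request, in Python for brevity (the app itself builds against libcurl). The function names, the `speech.mp3` filename and the `http://localhost:8000/v1` address in the usage note are assumptions made for illustration, not settings taken from this project.

```python
import json
import urllib.request
import uuid

OPENAI_BASE_URL = "https://api.openai.com/v1"


def build_transcription_request(mp3_bytes, api_key="",
                                base_url=OPENAI_BASE_URL, model="whisper-1"):
    """Build (but do not send) a multipart POST to <base_url>/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="speech.mp3"\r\n'
        "Content-Type: audio/mpeg\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        # A local server may not need a key, so the header is optional here.
        headers["Authorization"] = f"Bearer {api_key}"
    url = base_url.rstrip("/") + "/audio/transcriptions"
    return urllib.request.Request(url, data=head + mp3_bytes + tail,
                                  headers=headers, method="POST")


def transcribe(request):
    """Send the request and return the "text" field of the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

For the official API you would call `build_transcription_request(data, api_key="sk-...")`. For a local server, something like `build_transcription_request(data, base_url="http://localhost:8000/v1")` should do, assuming the server mirrors the same path and fields.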

Building

You need libcurl & liblame to be available in c:\devel to compile this. At some point I should put the zip file containing them somewhere.

FAQ

How do we insert the text?

If the application in the foreground happens to be Emacs, we try to connect to its server and insert the text that way. For every other application, we copy the text to the clipboard and send a literal Ctrl-V to the app in the foreground.
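The decision described above can be sketched like this, in Python for brevity. Several things here are assumptions rather than facts about this project: that Emacs is recognized by its window class name (`"Emacs"`), that the server is reached through `emacsclient --eval`, and the hypothetical `paste` callback, which stands in for "copy to clipboard, then send Ctrl-V".

```python
import subprocess


def elisp_string(text):
    """Quote text as an Emacs Lisp string literal (escape backslash and quote)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def emacs_insert_command(text):
    """Build an emacsclient call that inserts text into the selected window's buffer."""
    form = f"(with-current-buffer (window-buffer) (insert {elisp_string(text)}))"
    return ["emacsclient", "--eval", form]


def choose_strategy(window_class):
    """Pick the insertion path based on the foreground window's class name."""
    return "emacs" if window_class == "Emacs" else "clipboard"


def insert_text(text, window_class, paste):
    """Insert text via the Emacs server if possible, else via the paste callback."""
    if choose_strategy(window_class) == "emacs":
        subprocess.run(emacs_insert_command(text), check=True)
    else:
        paste(text)
```

Going through the Emacs server leaves the clipboard untouched, which is probably why it gets special treatment. The Ctrl-V path overwrites whatever the user had copied before.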
