This is a GUI for OpenAI's Whisper speech-to-text model... or anything with a compatible API.
As you might have guessed from the existence of various buttons for which there isn't a lot of explanation, this is a side project of mine that I have, for some reason, decided to put on the internet. Do not expect it to work in any meaningful way. (It might work though.)
The code might also provide excellent examples for:
- how to do some stuff in win32 (tray icons! global hotkeys! COM automation!)
- how to not do stuff in win32 (the code is ugly, could use some cleanup, and this was my first attempt at tray icons, global hotkeys and COM automation.)
You hit record. You talk. You then press the button again (it is now labeled "Stop").
This will cause your recorded speech to be converted to MP3 and sent to Whisper. Once it responds, we insert the result into the application in the foreground.
You might already have noticed that this has questionable levels of usability: while you are clicking the button, the application in the foreground is this GUI. To solve this, we register a global hotkey on F8 that does the same thing as the record / stop button.
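For the curious, here is a minimal sketch of how a global F8 hotkey can drive a record / stop toggle. It is not lifted from the actual code: `Recorder`, `toggle_recording`, `run_hotkey_loop` and `HOTKEY_ID` are made-up names, and the real thing hangs off a window with a tray icon rather than a bare message loop. The Win32 part only compiles on Windows; the toggle itself is plain C.

```c
#include <stdbool.h>

/* Recording state, flipped by every hotkey press (or button click). */
typedef struct {
    bool recording;
} Recorder;

/* Returns true if this press started a recording, false if it stopped one. */
bool toggle_recording(Recorder *r)
{
    r->recording = !r->recording;
    return r->recording;
}

#ifdef _WIN32
#include <windows.h>

#define HOTKEY_ID 1  /* hypothetical id; any value unique within the app works */

/* Registers F8 system-wide and toggles the recorder on every press.
   Blocks until the thread receives WM_QUIT. Returns 0 on success, -1 if
   the hotkey could not be registered (e.g. another app already owns F8). */
int run_hotkey_loop(Recorder *r)
{
    /* hWnd == NULL: WM_HOTKEY is posted to this thread's message queue. */
    if (!RegisterHotKey(NULL, HOTKEY_ID, 0, VK_F8))
        return -1;

    MSG msg;
    while (GetMessage(&msg, NULL, 0, 0) > 0) {
        if (msg.message == WM_HOTKEY && msg.wParam == HOTKEY_ID) {
            if (toggle_recording(r)) {
                /* placeholder: start capturing audio */
            } else {
                /* placeholder: stop, encode to MP3, send to Whisper */
            }
        } else {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
    }

    UnregisterHotKey(NULL, HOTKEY_ID);
    return 0;
}
#endif
```

Because the hotkey is registered with a NULL window handle, presses arrive as thread messages, so the loop has to check for `WM_HOTKEY` itself instead of relying on a window procedure.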
You should start by either setting an OpenAI API key for the official OpenAI Whisper API, or, if you're running Whisper locally, pointing it at an endpoint that has the same API.
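To make "an endpoint that has the same API" a bit more concrete: the official API takes transcription requests at `https://api.openai.com/v1/audio/transcriptions`, as a multipart/form-data POST with a `file` and a `model` field and the key in an `Authorization: Bearer ...` header. The sketch below shows one way to derive that URL from a configurable base. The function name, the "empty means official API" rule and the localhost address in the usage note are assumptions for illustration, not necessarily what this application does.

```c
#include <stdio.h>
#include <string.h>

/* Base URL of the official API; used when no custom endpoint is configured. */
#define DEFAULT_API_BASE "https://api.openai.com/v1"

/* Writes "<base>/audio/transcriptions" into out.
   A NULL or empty base falls back to DEFAULT_API_BASE.
   Returns 0 on success, -1 if out is too small. */
int transcription_url(const char *base, char *out, size_t out_size)
{
    if (base == NULL || base[0] == '\0')
        base = DEFAULT_API_BASE;

    /* Ignore trailing slashes so we never produce "//audio". */
    size_t len = strlen(base);
    while (len > 0 && base[len - 1] == '/')
        --len;

    int n = snprintf(out, out_size, "%.*s/audio/transcriptions",
                     (int)len, base);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With a local server you would pass something like `http://localhost:8000/v1` (a hypothetical address) as `base`; everything else about the request stays the same.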
You need libcurl & liblame to be available in c:\devel to compile this. At some point I should put the zip file containing them somewhere.
If the application in the foreground happens to be Emacs, we try to connect to its server and insert the text that way. For everyone else, we copy the text to the clipboard and send a literal Ctrl-V to the app in the foreground.
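Both paths can be sketched roughly as follows. This is an illustration under assumptions, not the code in this repository: it assumes the Emacs side boils down to evaluating an `(insert "...")` form on the server (for example via `emacsclient --eval`), and the names `build_emacs_insert_expr` and `send_ctrl_v` are made up. Putting the text on the clipboard is left out here, and the Ctrl-V part only compiles on Windows.

```c
#include <stddef.h>
#include <string.h>

/* Appends s to out at *pos, keeping out NUL-terminated.
   Returns -1 if it does not fit. */
static int append(char *out, size_t size, size_t *pos, const char *s)
{
    size_t n = strlen(s);
    if (*pos + n >= size)
        return -1;  /* need room for the terminating NUL as well */
    memcpy(out + *pos, s, n);
    *pos += n;
    out[*pos] = '\0';
    return 0;
}

/* Builds the Emacs Lisp form (insert "<text>") in out, escaping backslashes
   and double quotes and writing newlines as \n.
   Returns 0 on success, -1 if out is too small. */
int build_emacs_insert_expr(const char *text, char *out, size_t size)
{
    size_t pos = 0;
    if (append(out, size, &pos, "(insert \"") != 0)
        return -1;
    for (const char *p = text; *p != '\0'; ++p) {
        char plain[2] = { *p, '\0' };
        const char *piece = plain;
        if (*p == '\\')      piece = "\\\\";
        else if (*p == '"')  piece = "\\\"";
        else if (*p == '\n') piece = "\\n";
        if (append(out, size, &pos, piece) != 0)
            return -1;
    }
    return append(out, size, &pos, "\")");
}

#ifdef _WIN32
#include <windows.h>

/* Sends Ctrl down, V down, V up, Ctrl up to the foreground application.
   Returns 0 if all four events were injected. */
int send_ctrl_v(void)
{
    INPUT in[4];
    ZeroMemory(in, sizeof in);
    for (int i = 0; i < 4; ++i)
        in[i].type = INPUT_KEYBOARD;
    in[0].ki.wVk = VK_CONTROL;
    in[1].ki.wVk = 'V';
    in[2].ki.wVk = 'V';
    in[2].ki.dwFlags = KEYEVENTF_KEYUP;
    in[3].ki.wVk = VK_CONTROL;
    in[3].ki.dwFlags = KEYEVENTF_KEYUP;
    return SendInput(4, in, sizeof(INPUT)) == 4 ? 0 : -1;
}
#endif
```

Note that the escaping above only makes the text safe inside an Emacs Lisp string. If the form is handed to `emacsclient` on a command line, it needs command-line quoting on top of that.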
