New Features: LLM Post-Processing, Find-and-Replace, Updated Whisper Models, More Providers (Claude, Gemini, Deepgram, Groq), and More #102
- Added comprehensive support for multiple transcription models and APIs
  - New Whisper models (distil-large-v3, large-v3-turbo)
  - Vosk model support
  - Deepgram, Groq transcription APIs
- Introduced LLM post-processing features
  - Cleanup and instruction modes
  - Support for multiple LLM providers (OpenAI, Anthropic, Google, Groq, Ollama)
- Enhanced configuration and usability
  - Updated Python support to 3.12
  - Modernized build system
  - Improved settings window
- Added clipboard and find/replace text processing
- Improved media and recording controls
- Enhanced security with keyring API key storage
|
I just checked out this fork, and it's a great change. I recommend this PR get merged. |
|
@gamedevsam thanks! I will say that I haven't been able to test it on Linux, and it's entirely possible I broke the |
|
I have been using this fork for about 2 weeks now and it's been working great. @savbell I see that you haven't made contributions to this repo in some time. Would you be interested in allowing Tom to become its maintainer? He seems to be engaged and to have good command of the codebase. I can help with Linux testing and fixing merge-related issues. Thoughts? FYI @TomFrankly. |
|
@gamedevsam I'm really glad it's working for you! That said, this project was a 2-week fever dream of vibe-coding for me, and I probably wouldn't be able to maintain it long-term. I'm kinda hoping an intrepid developer can take the ideas I've added and turn them into properly polished software! I'm working on a video all about voice typing, so my main motivation for making my fork was to establish something that works on Windows. When I started the research process, I couldn't find much (as opposed to macOS, which has several good options). |
|
Out of all people, never would I have thought that Thomas Frank would be improving one of my most used tools. All things considered, I think it's pretty clear that if WhisperWriter is to become more than just a solid base, it needs a new maintainer. |
|
really nice pull request here -- thanks for sharing. quite sad there's no real progress in the base SW. |
|
@bjspi it seems the original dev has moved on to other projects. I don't have time to maintain the project either, but since the code is GPL, anyone can fork it and even rebrand it as a new project if they like! |
|
@bjspi I hear your frustration, but keep in mind that the original developer doesn't owe anyone anything. This is an open source project, and anyone is free to fork it and continue development on the fork. @TomFrankly has done some great work but is understandably not willing to take on the responsibilities of being a core maintainer. I'm personally on the fence about it and will think about it a bit more. I have come to rely on this tool for my core workflows, so I have an interest in ensuring it continues to work correctly. For the record, I am NOT OK with anyone making direct or indirect comments that imply @savbell is doing anything wrong by choosing not to maintain this project. It's her right to do whatever she wants, including deleting this repo if she sees fit. Developers of open source software don't have to maintain their software, regardless of how popular it gets. It's up to them to maintain it, abandon it, or hand it over to someone else. Anyone who insinuates they're doing anything wrong by exercising their rights is entitled, and I'm not going to be quiet about it. @savbell and @TomFrankly deserve nothing but gratitude for sharing their work with the world. Voicing any feelings other than gratitude for these two individuals in a public forum is inappropriate in my opinion. It's not sad that this PR has not been merged; it's an opportunity for you if you want to take up the mantle of maintainer, and if not, it's an opportunity for someone else who has the time, skills, and interest to fork the repo and keep it going. |
|
@gamedevsam also, my PR shouldn't be merged because I'm pretty sure Claude mercilessly destroyed any Linux functionality that may or may not have been working in the original version 😁 My version is only tested on Windows, so folks may find that it doesn't work on other platforms! |
|
Thanks for your detailed reply and for sharing your perspective! Just to clarify, I absolutely didn't mean to criticize savbell or anyone involved here. I totally get (and fully respect) that open source maintainers owe nobody anything (and I never actually wrote anything to the contrary), and I'm genuinely grateful for all the work that's gone into this project. 🙏 My point was simply that it's a bit of a pity to see such a great project slow down, especially when there are so many useful contributions sitting around in all sorts of different PRs, and it gets harder to find improvements or keep track of what's still relevant. That's all I wanted to say 😅 It seems you read far more into my message than I actually wrote. Anyway, thanks for keeping the conversation going and for all your efforts in keeping the project alive! |
|
Have any of you considered using NVIDIA's Parakeet or Qwen rather than Whisper? Parakeet ends up being faster with lower error rates, but forces you to use an NVIDIA card. https://huggingface.co/spaces/hf-audio/open_asr_leaderboard I just started making changes to a Parakeet fork of whisper-writer, but it looks like some of these whisper-writer forks already have the features I'd want (minus using Parakeet/Qwen). |
An interesting suggestion; improved performance is always on the table. That said, I currently use Whisper Large V3 Turbo, which is virtually real-time on an RTX 4070 while also offering robust multilingual support. Still, I'd be interested in seeing how well your implementation performs.
And yes, the state of the main repo is a "wild west" right now, which I can understand, because maintaining such a tool does require some effort. So pick the fork that you appreciate the most and enjoy the work. |
|
@AJolly if I could clone myself, I'd get one of those clones working on Parakeet inclusion! I'd be quite interested to see its performance impact on lower-end machines. But yeah, this repo is the wild west. This PR I made also needs some work. Pretty sure it needs Torch to be upgraded to a new version. And for the foreseeable future, I truly have no time to work on it. I mostly added all my features as part of a YouTube video project that I still haven't gotten around to making 😅 |
|
@AJolly I've been using Tom's fork successfully, and I think it offers meaningful improvements to whisper-writer that I couldn't live without. I don't mind becoming the maintainer since Tom doesn't have the time to work on it right now. @TomFrankly I won't commit to any timelines, but will look into testing your fork on Linux. I don't mind reviewing PRs, testing out changes, and helping to set the current & future direction for the project. If that sounds good to you, feel free to fork my version of whisper-writer and open a PR once you're satisfied with your changes. |
|
Exciting stuff, definitely! I have forked @TomFrankly's work and merged some of the pull requests I found on there. |
|
@JdotCarver Parakeet "feels" better to me on a 3090. I'm pretty sensitive to latency improvements and use 144 Hz screens and extremely fast hardware (14900KS, 9950X), so it may not matter as much to other users. Anecdotally, other people like Parakeet because it works reasonably well even on CPUs. (I think this is for CPU usage, but have not tested it - https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/nemo-transducer-models.html#sherpa-onnx-nemo-parakeet-tdt-0-6b-v2-int8-english) Whisper v3 large is 60x real-time speed and Parakeet is 3380x (with fewer parameters, less VRAM, etc.), so more than 50x faster. The more useful part is that it feels a bit more accurate than Whisper. I have not run extensive testing on the error rates; this is just from my own personal day-to-day usage. I've stopped using my fancy $500 BP894 microphone and am now mostly using a much cheaper wireless ModMic, since it's easier and Parakeet handles the drop in audio quality. Parakeet v3 came out very recently and adds multilingual support. I have not yet tried hosting Parakeet behind an emulated OpenAI API, which might be the easier way to go. Examples: https://github.com/Shadowfita/parakeet-tdt-0.6b-v2-fastapi or https://github.com/ScottMcMac/parakeetv2API/
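For anyone who wants to sanity-check the "more than 50x" figure, here is a rough sketch of the arithmetic. It takes the two real-time factors quoted above (60x and 3380x) at face value; the helper name `seconds_to_transcribe` and the one-hour clip are made up for illustration and don't come from any of the forks.

```python
# Rough arithmetic only: the two speed factors are the ones quoted in the
# comment above, not measurements made here.
WHISPER_LARGE_V3_RTFX = 60.0   # seconds of audio processed per second of compute
PARAKEET_RTFX = 3380.0


def seconds_to_transcribe(audio_seconds: float, rtfx: float) -> float:
    """Hypothetical helper: compute time for a clip at a given real-time factor."""
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


# Relative speed-up is just the ratio of the two factors.
speedup = PARAKEET_RTFX / WHISPER_LARGE_V3_RTFX

one_hour = 3600.0  # illustrative clip length in seconds
whisper_time = seconds_to_transcribe(one_hour, WHISPER_LARGE_V3_RTFX)
parakeet_time = seconds_to_transcribe(one_hour, PARAKEET_RTFX)

print(f"speed-up: {speedup:.1f}x")                      # speed-up: 56.3x
print(f"Whisper large v3, 1 h of audio: {whisper_time:.2f} s")   # 60.00 s
print(f"Parakeet, 1 h of audio: {parakeet_time:.2f} s")          # 1.07 s
```

These are throughput numbers, so for short dictation clips the latency you actually feel probably depends at least as much on model loading and audio capture overhead as on the raw factor.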
|
The fact that STT engines have progressed so much that they have become accessible to people who do not have professional microphones is really a godsend. That is great to read.
The TomFrankly fork, which many of us, myself included, are running, should be a very good starting point for this, as it introduced support for many other APIs. Though you might want to look at @gamedevsam's fork, as they are putting effort into re-centralizing and maintaining the project. |

Hey all,
I've spent the last week working on a fork that added a bunch of features I wanted, and I'm opening this PR so y'all can include them if you like.
First, I'd like to say a huge thank-you to @savbell and the other contributors. This was easily the best Windows-based alternative to MacWhisper's dictation feature that I could find when I started this project.
I should also note that I'm not a Python developer. I believe this entire project was originally pair-programmed with LLMs, and my contributions are no exception. I have a decent JavaScript foundation, but this was my first time working with Python. Additionally, all of my edits are Windows-focused. I don't have a Linux machine to test on.
Added
Changed
Fixed
Security