New Features: LLM Post-Processing, Find-and-Replace, Updated Whisper Models, More Providers (Claude, Gemini, Deepgram, Groq), and More#102

Open
TomFrankly wants to merge 4 commits into savbell:main from TomFrankly:main

Conversation

@TomFrankly

Hey all,

I've spent the last week working on a fork that added a bunch of features I wanted, and I'm opening this PR so y'all can include them if you like.

First, I'd like to say a huge thank-you to @savbell and the other contributors. This was easily the best Windows-based alternative to MacWhisper's dictation feature that I could find when I started this project.

I should also note that I'm not a Python developer. I believe this entire project was originally pair-programmed with LLMs, and my contributions are no exception. I have a decent JavaScript foundation, but this was my first time working with Python. Additionally, all of my edits are Windows-focused. I don't have a Linux machine to test on.

Added

  • Support for newer Whisper models, such as distil-large-v3 and large-v3-turbo
  • Support for Vosk models
  • Support for additional transcription APIs and models:
    • Deepgram (nova-3, nova-2)
    • Groq (whisper-large-v3-turbo, distil-whisper-large-v3-en, whisper-large-v3)
  • LLM Cleanup and LLM Instruction modes
    • Two new activation modes, each of which sends the transcript to a chosen LLM provider/model with custom system instructions.
    • Ideal use: Use one as a "cleanup" mode and the other as an "instruction" mode.
    • In addition to the system instructions in the settings, you can point to a text file and have its contents appended as additional instructions.
  • LLM model providers
    • OpenAI (ChatGPT)
    • Anthropic (Claude)
    • Google (Gemini)
    • Groq (all models)
    • Ollama (local LLM processing - I recommend "airat/karen-the-editor-v2-strict" for cleanup and "llama3.2" for instruction)
  • Text (Clipboard) Cleanup Feature
    • Another activation key that sends the current clipboard text to an LLM provider/model with custom system instructions.
    • Ideal use: Clean up text that has already been typed into the current application.
  • Find and Replace Feature
    • Specify a TXT or JSON file containing your own custom find-and-replace values.
    • Use a TXT file for simple find-and-replace operations. Each line should contain a comma-separated find and replace pair, e.g. "find,replace".
    • Use a JSON file to get regex support and the ability to do text transformation on regex capture groups (see updated README for a tutorial).
  • More graceful degradation for lower-powered computers
    • If you don't have an NVIDIA GPU or don't want to install CUDA tools, you can use your CPU with smaller models, or use API providers for transcription.
  • Clipboard Input and Threshold
    • If the transcript contains more characters than the Clipboard Threshold, it will be pasted into the current application. This will replace the default behavior of simulating keyboard input, which can be unreliable.
  • Setting to pause currently playing audio during recording
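As a sketch of the two rule-file formats described above: the function names, and the `find`/`replace` keys assumed for the JSON schema, are my own illustration rather than the fork's actual implementation (see the updated README for the real format).

```python
# Sketch (assumed, not the fork's actual code) of applying find-and-replace rules.
import json
import re

def load_txt_rules(text: str):
    """Each non-empty line is "find,replace"; split on the first comma only."""
    rules = []
    for line in text.splitlines():
        if line.strip() and "," in line:
            find, replace = line.split(",", 1)
            rules.append((find, replace))
    return rules

def apply_rules(transcript: str, rules) -> str:
    """Plain literal substitution for TXT-style rules."""
    for find, replace in rules:
        transcript = transcript.replace(find, replace)
    return transcript

def apply_json_rules(transcript: str, json_text: str) -> str:
    """JSON-style rules are treated as regexes, so capture groups work."""
    for rule in json.loads(json_text):
        transcript = re.sub(rule["find"], rule["replace"], transcript)
    return transcript

# Literal rules fix common misrecognitions:
rules = load_txt_rules("teh,the\nrecieve,receive\n")
print(apply_rules("teh cat will recieve", rules))  # -> the cat will receive
```

A regex rule such as `{"find": "(\\d{3}) (\\d{4})", "replace": "\\1-\\2"}` would then reformat phone-number-like digit groups using backreferences.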

Changed

  • Updated Python requirement to 3.12
  • Updated dependencies
  • Replaced the sound device setting with a dropdown showing available input devices
  • Modernized build system using hatchling and a pyproject.toml file instead of requirements.txt
  • Clicking "X" to exit Settings no longer closes the app
  • Increased the default font size in Settings, added scrollable sections for longer tabs

Fixed

  • Fixed bug that would activate recording when only modifier keys were pressed
  • Fixed bug that didn't allow for non-Space keys to be used when setting hotkeys

Security

  • Switched to using keyring to store API keys
  • Added an explicit "Continuous API" checkbox in recording settings. Must be checked to use Continuous mode while using any remote API.
  • Added a "Continuous Timeout" setting in recording settings. After this many seconds of silence, continuous mode will automatically deactivate.
  • Set Recording status window to pulse when continuous mode is activated and a remote API is being used.
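For illustration, the "Continuous Timeout" behavior described above might look roughly like this; it is a minimal sketch of the assumed logic (class and method names are mine), not the fork's actual code:

```python
# Minimal sketch (assumed logic, not the fork's implementation) of the
# "Continuous Timeout" setting: continuous mode deactivates after
# `timeout_s` seconds without detected speech.
class ContinuousSession:
    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_speech = 0.0
        self.active = True

    def on_audio_chunk(self, now: float, is_speech: bool) -> None:
        """Call once per audio chunk with a timestamp and a VAD verdict."""
        if not self.active:
            return
        if is_speech:
            self.last_speech = now
        elif now - self.last_speech > self.timeout_s:
            self.active = False  # silence exceeded the timeout: auto-deactivate
```

This caps how long a continuous session can keep streaming audio to a paid remote API if the user walks away from the microphone.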

@gamedevsam

I just checked out this fork, and it's a great change. I recommend this PR get merged.

@TomFrankly
Author

@gamedevsam thanks!

I will say that I haven't been able to test it on Linux, and it's entirely possible I broke the evdev functionality with my changes.

@gamedevsam

I have been using this fork now for about 2 weeks and it's been working great. @savbell I see that you haven't made contributions to this repo in some time; would you be interested in allowing Tom to become its maintainer? He seems engaged and to have a good command of the codebase. I can help with Linux testing and fixing merge-related issues. Thoughts? FYI @TomFrankly.

@TomFrankly
Author

@gamedevsam I'm really glad it's working for you!

That said, this project was a 2-week fever dream of vibe-coding for me, and I probably wouldn't be able to maintain it long-term. I'm kinda hoping an intrepid developer can take the ideas I've added and turn them into properly polished software!

I'm working on a video all about voice typing, so my main motivation for making my fork was to establish something that works on Windows. When I started the research process, I couldn't find much (as opposed to MacOS, which has several good options).

@JdotCarver

Out of all people, never would I have thought that Thomas Frank would be improving one of my most used tools.
Never been a fan of your content, but love the work ethic. 👍

All things considered, I think it's pretty clear that if WhisperWriter is to become more than just a solid base, it needs a new maintainer.

@bjspi

bjspi commented Aug 4, 2025

Really nice pull request here -- thanks for sharing.
It's a shame it hasn't made it into the main repository, and no one really notices it unless they search for it here...

Quite sad there's no real progress in the base software.

@TomFrankly
Author

@bjspi it seems the original dev has moved on to other projects. I don't have time to maintain the project either, but since the code is GPL, anyone can fork it and even rebrand it as a new project if they like!

@gamedevsam

@bjspi I hear your frustration, but keep in mind the original developer doesn't owe anyone anything. This is an open source project and anyone is free to fork it and continue development on the fork. @TomFrankly has done some great work but is understandably not willing to take on the responsibilities of being a core maintainer, I'm personally on the fence about it, will think about it a bit more. I have come to rely on this tool for my core workflows, so I have an interest in ensuring it continues to work correctly.

For the record, I am NOT OK with anyone making direct or indirect comments that imply @savbell is doing anything wrong by choosing to not maintain this project. It's her right to do whatever she wants, including deleting this repo if she sees fit. Developers of open source software don't have to maintain their software, regardless of how popular it gets. It's up to them to maintain it, abandon it, or hand it over to someone else. Anyone who insinuates they're doing anything wrong by exercising their rights is entitled and I'm not going to be quiet about it.

@savbell and @TomFrankly deserve nothing but gratitude for sharing their work with the world. Voicing any feelings other than gratitude for these two individuals in a public forum is inappropriate in my opinion. It's not sad that this PR has not been merged, it's an opportunity for you if you want to take the mantle of maintainer, and if not, it's an opportunity for someone else who has the time, skills and interest to fork the repo and keep it going.

@TomFrankly
Author

@gamedevsam also, my PR shouldn't be merged because I'm pretty sure Claude mercilessly destroyed any Linux functionality that may or may not have been working in the original version 😁

My version is only tested on Windows, so folks may find that it doesn't work on other platforms!

@bjspi

bjspi commented Aug 5, 2025

@gamedevsam

thanks for your detailed reply and for sharing your perspective!

Just to clarify, I absolutely didn't mean to criticize savbell or anyone involved here. I totally get (and fully respect) that open source maintainers owe nobody anything (and I never wrote anything opposed to this), and I'm genuinely grateful for all the work that's gone into this project. 🙏

My point was simply that it's a bit of a pity to see such a great project slow down, especially when there are so many useful contributions sitting around in all sorts of different PRs, and it gets harder to find improvements or keep track of what's still relevant. That's all I wanted to say 😅 it seems you read more into my message than I actually wrote.

Anyway, thanks for keeping the conversation going and for all your efforts in keeping the project alive!

@AJolly

AJolly commented Aug 27, 2025

Have any of you considered using Nvidia's Parakeet or Qwen rather than Whisper? Parakeet ends up being faster with lower error rates, but forces you to use an Nvidia card. https://huggingface.co/spaces/hf-audio/open_asr_leaderboard

I just started making changes to a parakeet fork of whisper-writer, but it looks like some of these whisper-writer forks already have the features I'd want (minus using parakeet/qwen).

@JdotCarver

> Have any of you considered using Nvidias parakeet or qwen rather than Whisper? Parakeet ends up being faster with lower error rates, but forces you to use an nvidia card. https://huggingface.co/spaces/hf-audio/open_asr_leaderboard

An interesting suggestion, improved performance is always on the table.

That said, I currently use Whisper Large V3 Turbo, which is virtually real-time on an RTX 4070 while also offering robust multilingual support.
In practical terms, the performance gains from switching to another model —even if measurable— are likely to be marginal relative to the engineering effort required. For example, even a 50% speedup on a 300ms operation only saves 150ms, which is often negligible in real-world usage.

That said, I'd be interested in seeing how well your implementation performs.

> I just started making changes to a parakeet fork of whisper-writer, but it looks like some of these whisper-writer forks already have the features I'd want (minus using parakeet/qwen).

And yes, the state of the main repo is "wild west" as of now. Which I can understand because maintaining such a tool does require some effort. So pick the fork that you appreciate the most and enjoy the work.

@TomFrankly
Author

@AJolly if I could clone myself, I'd get one of those clones working on Parakeet inclusion! I'd be quite interested to see its performance impact on lower-end machines.

But yeah, this repo is the wild west. This PR I made also needs some work. Pretty sure it needs Torch to be upgraded to a new version. And for the foreseeable future, I truly have no time to work on it. I mostly added all my features as part of a YouTube video project that I still haven't gotten around to making 😅

@gamedevsam

gamedevsam commented Aug 27, 2025

@AJolly I've been using Tom's fork successfully, I think it offers meaningful improvements to whisper-writer I couldn't live without. I don't mind becoming the maintainer since Tom doesn't have the time to work on it right now.

@TomFrankly I won't commit to any timelines, but will look into testing your fork in Linux.

I don't mind reviewing PRs, testing out changes, and helping to set the current & future direction for the project.

If that sounds good to you, feel free to fork my version of whisper-writer and open a PR once you're satisfied with your changes.

@JdotCarver

Exciting stuff, definitely!

I have forked @TomFrankly 's work and merged some of the pull requests I found on there.
I also slightly improved the RegEx functionality. So have a look-see if you find anything useful there.

https://github.com/JdotCarver/whisper-writer

@AJolly

AJolly commented Sep 4, 2025

@JdotCarver Parakeet "feels" better to me on a 3090. I'm pretty sensitive to latency improvements and use 144 Hz screens and extremely fast hardware (14900KS, 9950X), so it may not matter as much to other users. Anecdotally, other people like Parakeet because it works reasonably well even on CPUs. (I think this is for CPU usage, but I have not tested it: https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/nemo-transducer-models.html#sherpa-onnx-nemo-parakeet-tdt-0-6b-v2-int8-english)

Whisper v3 large runs at about 60x real-time speed, Parakeet at 3380x (with fewer parameters, less VRAM, etc.), so more than 50x faster.

The more useful part is it feels a bit more accurate than whisper. I have not run extensive testing on the error rates, just for my own personal day to day usage. I've stopped using my fancy $500 BP894 microphone and instead I'm now mostly using a much cheaper wireless modmic, since it's easier and parakeet handles the drop in audio quality.

Parakeet v3 came out very recently, adding multilingual support.

I have not yet tried hosting Parakeet behind an OpenAI-compatible API, which might be the easier way to go. Examples: https://github.com/Shadowfita/parakeet-tdt-0.6b-v2-fastapi or https://github.com/ScottMcMac/parakeetv2API/
That way we can keep the entire whisper-writer codebase as is and just connect to a different backend.
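As a sketch of that approach: if a local Parakeet server exposes an OpenAI-style /v1/audio/transcriptions endpoint (as the projects linked above aim to do), the client side can stay a plain multipart POST. The model name, port, and the `{"text": ...}` response shape below are assumptions about those servers, not verified details:

```python
# Sketch: posting audio to an assumed OpenAI-compatible /v1/audio/transcriptions
# endpoint served by a local Parakeet wrapper. Model name, port, and the
# {"text": ...} response shape are assumptions, not verified against either project.
import json
import urllib.request
import uuid

def build_multipart(file_name: str, audio_bytes: bytes, model: str):
    """Build a multipart/form-data body with "model" and "file" fields."""
    boundary = uuid.uuid4().hex
    body = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'.encode()
        + f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
          f'filename="{file_name}"\r\nContent-Type: audio/wav\r\n\r\n'.encode()
        + audio_bytes
        + f"\r\n--{boundary}--\r\n".encode()
    )
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return body, headers

def transcribe(base_url: str, wav_path: str, model: str = "parakeet-tdt-0.6b-v2") -> str:
    with open(wav_path, "rb") as f:
        body, headers = build_multipart("audio.wav", f.read(), model)
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body, headers=headers, method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]

# Usage (requires a running local server, so not executed here):
# text = transcribe("http://localhost:8000", "recording.wav")
```

If the wrapper really is OpenAI-compatible, whisper-writer's existing remote-API code path could likely be pointed at it just by changing the base URL.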


@JdotCarver

JdotCarver commented Sep 7, 2025

@AJolly

> The more useful part is it feels a bit more accurate than whisper. I have not run extensive testing on the error rates, just for my own personal day to day usage. I've stopped using my fancy $500 BP894 microphone and instead I'm now mostly using a much cheaper wireless modmic, since it's easier and parakeet handles the drop in audio quality.

The fact that STT engines have progressed so much that they have become accessible to people who do not have professional microphones is really a godsend. That is great to read.
I do have a setup where microphone quality is a non-issue, so I was not aware of that.

> That way we can keep the entire whisper-writer code base as is, and just connect to a different backend.

The TomFrankly fork that many of us, myself included, are running should be a very good starting point for this, as it introduced support for many other APIs. Though you might want to look at @gamedevsam's fork, as they are putting effort into re-centralizing and maintaining the project.
