SpeechScan is a lightweight desktop application (PyQt5) that transcribes audio recordings using AssemblyAI and counts word occurrences.
It supports two input modes:
- Local file (`.mp3` or other formats converted to `.mp3`),
- YouTube link (audio automatically downloaded via `yt-dlp`).
The app supports ~99 languages (English, Polish, German, French, Spanish, Italian, Portuguese, Russian, Japanese, Turkish…), automatically detected by AssemblyAI.
License: CC0-1.0 (see LICENSE for details)
- Python 3.12–3.14
- PyQt5 – GUI
- yt-dlp – YouTube audio download
- requests – API communication
- AssemblyAI API – transcription
- Poetry – dependency and package management
- requirements.txt / requirements-dev.txt – pip installation
- mkdocs – documentation (`docs/`, `mkdocs.yml`)
- pre-commit – auto-formatting and linting
- CI/CD – GitHub workflows (`.github/`)
- logging_config – colored logs with environment detection
- PyInstaller – `.exe` build (`SpeechScan.spec`)
- mypy – static type checking
- Black – code formatting
- Ruff – linting and style enforcement
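The "colored logs with environment detection" item can be illustrated with a small sketch. It is not the project's `logging_config` module: the names `supports_color`, `ColorFormatter` and `build_handler`, the color palette, and the `NO_COLOR` check are all assumptions chosen for illustration. The idea is to emit ANSI codes only when the output stream looks like a terminal.

```python
import logging
import os
import sys

# ANSI color codes per level; the palette is an arbitrary choice for this sketch.
_COLORS = {
    logging.DEBUG: "\033[36m",       # cyan
    logging.INFO: "\033[32m",        # green
    logging.WARNING: "\033[33m",     # yellow
    logging.ERROR: "\033[31m",       # red
    logging.CRITICAL: "\033[1;31m",  # bold red
}
_RESET = "\033[0m"


def supports_color(stream) -> bool:
    """Guess whether `stream` can render ANSI colors."""
    if os.environ.get("NO_COLOR"):  # common opt-out convention (assumed here)
        return False
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


class ColorFormatter(logging.Formatter):
    """Formatter that wraps each formatted record in a level-specific color."""

    def __init__(self, fmt: str, use_color: bool) -> None:
        super().__init__(fmt)
        self.use_color = use_color

    def format(self, record: logging.LogRecord) -> str:
        text = super().format(record)
        if not self.use_color:
            return text
        color = _COLORS.get(record.levelno, "")
        return f"{color}{text}{_RESET}"


def build_handler(stream=None) -> logging.Handler:
    """Create a stream handler whose coloring depends on the target stream."""
    stream = stream or sys.stderr
    handler = logging.StreamHandler(stream)
    fmt = "%(levelname)s %(name)s: %(message)s"
    handler.setFormatter(ColorFormatter(fmt, supports_color(stream)))
    return handler
```

Attaching the handler with `logging.getLogger().addHandler(build_handler())` would give colored output in a terminal and plain text when the output is redirected to a file or pipe.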
- User input
  - Choose File (local audio) or YouTube (download audio via `yt-dlp`).
  - Provide AssemblyAI API key.
- Audio processing
  - File is uploaded to AssemblyAI.
  - A transcription job is created.
  - The app polls the API until transcription completes.
- Text analysis
  - Transcript is cleaned and normalized.
  - Word frequency table is generated.
- Presentation
  - Results are displayed in the GUI.
  - Logs are saved (colored console, optional file logging).
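The audio-processing steps (upload, create a job, poll until done) might look roughly like the sketch below. It is an illustration, not the project's `transcribe_audio.py`: the function names, intervals and timeouts are hypothetical, and the endpoints and fields reflect AssemblyAI's v2 REST API as commonly documented, so they should be checked against the current API reference. The polling loop takes the status fetcher as a parameter so it can be exercised without network access.

```python
import time
from typing import Callable

# Base URL of AssemblyAI's v2 REST API (verify against the official docs).
API_BASE = "https://api.assemblyai.com/v2"


def poll_transcript(
    fetch_status: Callable[[], dict],
    interval: float = 3.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch_status` until the job completes; return the transcript text."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "completed":
            return job.get("text") or ""
        if status == "error":
            raise RuntimeError(job.get("error", "transcription failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError("transcription did not finish in time")
        sleep(interval)  # still queued or processing: wait and retry


def transcribe(path: str, api_key: str) -> str:
    """Upload a file, create a job, then poll. Requires the `requests` package."""
    import requests  # imported lazily so the polling helper stays dependency-free

    headers = {"authorization": api_key}

    # 1. Upload the raw audio bytes; the response carries a private upload URL.
    with open(path, "rb") as audio:
        upload = requests.post(
            f"{API_BASE}/upload", headers=headers, data=audio, timeout=300
        )
    upload.raise_for_status()

    # 2. Create the transcription job with automatic language detection.
    job = requests.post(
        f"{API_BASE}/transcript",
        headers=headers,
        json={"audio_url": upload.json()["upload_url"], "language_detection": True},
        timeout=30,
    )
    job.raise_for_status()
    job_id = job.json()["id"]

    # 3. Poll the job until it reports "completed" or "error".
    def fetch_status() -> dict:
        response = requests.get(
            f"{API_BASE}/transcript/{job_id}", headers=headers, timeout=30
        )
        response.raise_for_status()
        return response.json()

    return poll_transcript(fetch_status)
```

Separating `poll_transcript` from the HTTP calls is a design choice of this sketch: it makes the wait loop easy to test with canned responses and easy to run inside a worker thread.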
All long-running tasks (download, transcription, counting) run in QThreads to keep the UI responsive.
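The text-analysis step (clean, normalize, count) can be sketched with the standard library alone. This is a minimal illustration, not the project's `count_words.py`: the tokenization rule (Unicode letters and digits, with inner apostrophes kept) and the sort order are assumptions.

```python
import re
from collections import Counter

# Runs of Unicode letters/digits, allowing inner apostrophes ("don't").
_WORD = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def count_words(transcript: str) -> list[tuple[str, int]]:
    """Return (word, count) pairs, most frequent first, ties in alphabetical order."""
    # casefold() normalizes case more aggressively than lower(), which helps
    # with non-English transcripts.
    words = _WORD.findall(transcript.casefold())
    counts = Counter(words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

For example, `count_words("The cat. The CAT, the dog!")` yields `[("the", 3), ("cat", 2), ("dog", 1)]`; the resulting pairs map directly onto the rows of a frequency table.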
## 🗂️ Project Structure
SpeechScan/
├─ .gitignore # Git ignore rules
├─ .pre-commit-config.yaml # pre-commit hooks (Black, Ruff, mypy, etc.)
├─ LICENSE # License (CC0-1.0, see exceptions inside)
├─ mkdocs.yml # MkDocs site configuration
├─ poetry.lock # Poetry lockfile
├─ pyproject.toml # Poetry project config (deps, tools)
├─ README.md # Project readme
├─ requirements.txt # runtime dependencies
├─ requirements-dev.txt # dev dependencies
├─ SpeechScan.spec # PyInstaller build specification
│
├─ .github/
│ └─ workflows/
│   ├─ build.yml # Build workflow (package/test build)
│   ├─ ci.yml # CI workflow (lint, tests)
│   └─ release.yml # Release workflow (PyInstaller, publish artifacts)
│
├─ docs/ # Documentation (MkDocs site content)
│ ├─ index.md # Project introduction (homepage)
│ ├─ css/
│ │ ├─ mkdocstrings.css # Styling for mkdocstrings plugin
│ │ └─ theme-variants.css # Additional theme variants
│ └─ gen_ref_pages/ # Scripts for generating API reference pages
│   ├─ config.py
│   ├─ context.py
│   ├─ generate.py
│   ├─ gen_ref_pages.py
│   ├─ helpers.py
│   └─ traverse.py
│
├─ screenshots/ # Screenshots for README
│ ├─ main_screen.png # Main screen
│ ├─ youtube_input.png # YouTube input window
│ ├─ file_input.png # File input window
│ └─ result_screen.png # Result view (YouTube transcription)
│
├─ src/speechscan/
│ ├─ __main__.py # Entry point (python -m speechscan)
│ ├─ app.py # QApplication init, style, UI setup
│ ├─ logging_config.py # logging config (colors, ANSI detection)
│ │
│ ├─ assets/
│ │ ├─ img/
│ │ │ ├─ icon.ico # Windows icon
│ │ │ ├─ icon.png # App icon
│ │ │ └─ loading.gif # Loading animation
│ │ └─ style/
│ │   └─ style.qss # Qt stylesheet
│ │
│ ├─ services/
│ │ ├─ text/
│ │ │ └─ count_words.py # Transcript cleanup + word frequency counting
│ │ └─ transcription/
│ │   └─ transcribe_audio.py # AssemblyAI client (upload, poll, fetch text)
│ │
│ ├─ threads/
│ │ ├─ check_url_thread.py # YouTube URL validator
│ │ ├─ count_words_thread.py # Run counting in worker thread
│ │ └─ download_video_thread.py # Download YouTube audio (yt-dlp)
│ │
│ ├─ ui/
│ │ ├─ file_window.py # File input window
│ │ ├─ start_window.py # Start screen
│ │ ├─ youtube_window.py # YouTube input window
│ │ └─ views/
│ │   ├─ file_window.ui # Qt Designer layout (file mode)
│ │   ├─ open_window.ui # Qt Designer layout (start screen)
│ │   └─ youtube_window.ui # Qt Designer layout (YouTube mode)
│ │
│ └─ utils/
│   └─ paths.py # Resource paths (dev vs exe)
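The tree describes `utils/paths.py` as resolving resource paths in development versus the packaged `.exe`. A common way to do this with PyInstaller is sketched below; it is an assumption about the general pattern, not a copy of the project's module. The function name `resource_path` and the `dev_base` parameter are hypothetical. PyInstaller sets `sys._MEIPASS` to the directory where bundled files are unpacked at run time.

```python
import sys
from pathlib import Path
from typing import Optional


def resource_path(relative: str, dev_base: Optional[Path] = None) -> Path:
    """Resolve a bundled resource both from source and from a PyInstaller build."""
    # In a PyInstaller build, sys._MEIPASS points at the unpacked bundle folder.
    bundle_dir = getattr(sys, "_MEIPASS", None)
    if bundle_dir is not None:
        base = Path(bundle_dir)
    elif dev_base is not None:
        base = dev_base
    else:
        # Assumption: when run from source, resources live next to this module.
        base = Path(__file__).resolve().parent
    return base / relative
```

A call such as `resource_path("assets/style/style.qss")` would then return a usable path in both modes, provided the `.spec` file bundles the assets under the same relative layout.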
Using pip, users (runtime only):

    python -m venv .venv
    source .venv/bin/activate   # Linux/macOS
    .venv\Scripts\activate      # Windows
    pip install --upgrade pip
    pip install -r requirements.txt
    pip install -e .

Developers (runtime + dev):

    pip install -r requirements.txt -r requirements-dev.txt
    pip install -e .

Using Poetry, users (without dev):

    poetry install --without dev
    poetry run speechscan

Developers (with dev):

    poetry install
    poetry run speechscan

The documentation is built with mkdocs.

    mkdocs serve   # local preview (http://127.0.0.1:8000)
    mkdocs build   # build into site/

To build a Windows .exe with PyInstaller:

    pyinstaller SpeechScan.spec

The resulting binary will be in dist/.
This project uses additional tools to keep the codebase clean and consistent:

    mypy src/
    ruff check src/
    black src/
    pre-commit run --all-files

To run the app, use one of:

    python -m speechscan
    poetry run speechscan
    speechscan

You need an AssemblyAI API key. You can create one for free at https://www.assemblyai.com (account needed).
- Launch the app (`speechscan`).
- Choose File or YouTube mode.
- Provide your AssemblyAI API key (if prompted).
- Click Count and wait for the transcription.
- View the word frequency table in the GUI.
Result view (YouTube transcription):

Released under CC0-1.0 (public domain). You may copy, modify, distribute, and use it commercially without asking for permission.


