SpeechScan

SpeechScan is a lightweight desktop application (PyQt5) that transcribes audio recordings using AssemblyAI and counts word occurrences.
It supports two input modes:

Local file (.mp3 or other formats converted to .mp3),
YouTube link (audio automatically downloaded via yt-dlp).

The app supports ~99 languages (English, Polish, German, French, Spanish, Italian, Portuguese, Russian, Japanese, Turkish…), automatically detected by AssemblyAI.

License: CC0-1.0 (see LICENSE for details)

⚙️ Technologies

Runtime

Python 3.12–3.14
PyQt5 – GUI
yt-dlp – YouTube audio download
requests – API communication
AssemblyAI API – transcription

Production / Development

Poetry – dependency and package management
requirements.txt / requirements-dev.txt – pip installation
mkdocs – documentation (docs/, mkdocs.yml)
pre-commit – auto-formatting and linting
CI/CD – GitHub workflows (.github/)
logging_config – colored logs with environment detection
PyInstaller – .exe build (SpeechScan.spec)
mypy – static type checking
Black – code formatting
Ruff – linting and style enforcement

🧠 How It Works

User input
- Choose File (local audio) or YouTube (download audio via yt-dlp).
- Provide AssemblyAI API key.
Audio processing
- File is uploaded to AssemblyAI.
- A transcription job is created.
- The app polls the API until transcription completes.
Text analysis
- Transcript is cleaned and normalized.
- Word frequency table is generated.
Presentation
- Results are displayed in the GUI.
- Logs are saved (colored console, optional file logging).

All long-running tasks (download, transcription, counting) run in QThreads to keep the UI responsive.

🗂️ Project Structure

## 🗂️ Project Structure

SpeechScan/
├─ .gitignore                        # Git ignore rules
├─ .pre-commit-config.yaml           # pre-commit hooks (Black, Ruff, mypy, etc.)
├─ LICENSE                           # License (CC0-1.0, see exceptions inside)
├─ mkdocs.yml                        # MkDocs site configuration
├─ poetry.lock                       # Poetry lockfile
├─ pyproject.toml                    # Poetry project config (deps, tools)
├─ README.md                         # Project readme
├─ requirements.txt                  # runtime dependencies
├─ requirements-dev.txt              # dev dependencies
├─ SpeechScan.spec                   # PyInstaller build specification
│
├─ .github/
│  └─ workflows/
│     ├─ build.yml                   # Build workflow (package/test build)
│     ├─ ci.yml                      # CI workflow (lint, tests)
│     └─ release.yml                 # Release workflow (PyInstaller, publish artifacts)
│
├─ docs/                             # Documentation (MkDocs site content)
│  ├─ index.md                       # Project introduction (homepage)
│  ├─ css/
│  │  ├─ mkdocstrings.css            # Styling for mkdocstrings plugin
│  │  └─ theme-variants.css          # Additional theme variants
│  └─ gen_ref_pages/                 # Scripts for generating API reference pages
│     ├─ config.py
│     ├─ context.py
│     ├─ generate.py
│     ├─ gen_ref_pages.py
│     ├─ helpers.py
│     └─ traverse.py
│
├─ screenshots/                      # Screenshots for README
│  ├─ main_screen.png                # Main screen
│  ├─ youtube_input.png              # YouTube input window
│  ├─ file_input.png                 # File input window
│  └─ result_screen.png              # Result view (YouTube transcription)
│
├─ src/speechscan/
│  ├─ __main__.py                    # Entry point (python -m speechscan)
│  ├─ app.py                         # QApplication init, style, UI setup
│  ├─ logging_config.py              # logging config (colors, ANSI detection)
│  │
│  ├─ assets/
│  │  ├─ img/
│  │  │  ├─ icon.ico                 # Windows icon
│  │  │  ├─ icon.png                 # App icon
│  │  │  └─ loading.gif              # Loading animation
│  │  └─ style/
│  │     └─ style.qss                # Qt stylesheet
│  │
│  ├─ services/
│  │  ├─ text/
│  │  │  └─ count_words.py           # Transcript cleanup + word frequency counting
│  │  └─ transcription/
│  │     └─ transcribe_audio.py      # AssemblyAI client (upload, poll, fetch text)
│  │
│  ├─ threads/
│  │  ├─ check_url_thread.py         # YouTube URL validator
│  │  ├─ count_words_thread.py       # Run counting in worker thread
│  │  └─ download_video_thread.py    # Download YouTube audio (yt-dlp)
│  │
│  ├─ ui/
│  │  ├─ file_window.py              # File input window
│  │  ├─ start_window.py             # Start screen
│  │  ├─ youtube_window.py           # YouTube input window
│  │  └─ views/
│  │     ├─ file_window.ui           # Qt Designer layout (file mode)
│  │     ├─ open_window.ui           # Qt Designer layout (start screen)
│  │     └─ youtube_window.ui        # Qt Designer layout (YouTube mode)
│  │
│  └─ utils/
│     └─ paths.py                    # Resource paths (dev vs exe)

🔧 Installation

Option A — pip

Users (runtime only):

python -m venv .venv
source .venv/bin/activate      # Linux/macOS
.venv\Scripts\activate         # Windows

pip install --upgrade pip
pip install -r requirements.txt
pip install -e .

Developers (runtime + dev):

pip install -r requirements.txt -r requirements-dev.txt
pip install -e .

Option B — Poetry

Users (without dev):

poetry install --without dev
poetry run speechscan

Developers (with dev):

poetry install
poetry run speechscan

📚 Documentation

Built with mkdocs.

mkdocs serve      # local preview (http://127.0.0.1:8000)
mkdocs build      # build into site/

🏗️ Build Executable

To build a Windows .exe with PyInstaller:

pyinstaller SpeechScan.spec

Resulting binary will be in dist/.

🛠️ Development Tools

This project uses additional tools to keep the codebase clean and consistent:

Type checking

mypy src/

Linting

ruff check src/

Auto-formatting

black src/

Run all pre-commit hooks locally

pre-commit run --all-files

▶️ Running the App

From source

python -m speechscan

With Poetry

poetry run speechscan

After installation

speechscan

🔑 AssemblyAI API Key

You need an AssemblyAI API key.
Create it for free at https://www.assemblyai.com (account needed).

💻 Usage

Launch the app (speechscan).
Choose File or YouTube mode.
Provide your AssemblyAI API key (if prompted).
Click Count and wait for the transcription.
View the word frequency table in the GUI.

🖼️ Screenshots

Main screen:

YouTube input:

File input:

Result view (YouTube transcription):

📜 License

Released under CC0-1.0 (public domain). You may copy, modify, distribute, and use it commercially without asking for permission.
⚠️ Note: not all files in the repository are covered by CC0 — see LICENSE.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SpeechScan

⚙️ Technologies

Runtime

Production / Development

🧠 How It Works

🗂️ Project Structure

🔧 Installation

Option A — pip

Option B — Poetry

📚 Documentation

🏗️ Build Executable

🛠️ Development Tools

Type checking

Linting

Auto-formatting

Run all pre-commit hooks locally

▶️ Running the App

From source

With Poetry

After installation

🔑 AssemblyAI API Key

💻 Usage

🖼️ Screenshots

📜 License

About

Uh oh!

Releases 1

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
.github/workflows		.github/workflows
docs		docs
screenshots		screenshots
src/speechscan		src/speechscan
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
SpeechScan.spec		SpeechScan.spec
mkdocs.yml		mkdocs.yml
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt

License

Antek-N/SpeechScan

Folders and files

Latest commit

History

Repository files navigation

SpeechScan

⚙️ Technologies

Runtime

Production / Development

🧠 How It Works

🗂️ Project Structure

🔧 Installation

Option A — pip

Option B — Poetry

📚 Documentation

🏗️ Build Executable

🛠️ Development Tools

Type checking

Linting

Auto-formatting

Run all pre-commit hooks locally

▶️ Running the App

From source

With Poetry

After installation

🔑 AssemblyAI API Key

💻 Usage

🖼️ Screenshots

📜 License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Languages